1. How do I calculate the estimated usable disk space for a hybrid cluster?

In the common case, with one replica (copy) on an SSD placement group, the usable capacity is approximately the combined capacity of all SSDs, reduced by a 10% overhead:

\[\frac{\sum (SSD\ capacity)}{1.1} = usable\ space\]

The only exception is when the space available for the other two copies in the HDD placement group is less than twice the space available in the SSD placement group, i.e. the HDDs are the limiting factor. In that case the calculation becomes:

\[\frac{\sum (HDD\ capacity)}{2.2} = usable\ space\]

The same applies, with the reverse logic, to a cluster with two replicas on SSD media and a single replica on HDDs:

\[\frac{\sum (SSD\ capacity)}{2.2} = usable\ space\]
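The rules above amount to taking the smaller of two limits, one per placement group. Below is a minimal Python sketch, assuming the 1.1 overhead factor from the formulas and assuming that each placement group is limited by its own capacity divided by 1.1 times the number of replicas it holds. The function name and the example capacities are hypothetical.

```python
def usable_space(ssd_capacity, hdd_capacity, ssd_replicas, hdd_replicas,
                 overhead=1.1):
    """Estimate the usable space of a hybrid cluster (same unit as the inputs).

    Assumption: each placement group can hold at most
    capacity / (overhead * replicas) of data, and the smaller of the
    two limits is the usable space.
    """
    if ssd_replicas < 1 or hdd_replicas < 1:
        raise ValueError("each placement group must hold at least one replica")
    ssd_limit = ssd_capacity / (overhead * ssd_replicas)
    hdd_limit = hdd_capacity / (overhead * hdd_replicas)
    return min(ssd_limit, hdd_limit)


if __name__ == "__main__":
    # One replica on SSDs, two on HDDs; hypothetical capacities in TB.
    print(f"{usable_space(15.36, 96.0, 1, 2):.2f} TB")  # SSD-limited: 13.96 TB
    print(f"{usable_space(15.36, 24.0, 1, 2):.2f} TB")  # HDD-limited: 10.91 TB
```

With one SSD replica the first call reduces to the SSD/1.1 formula, and the second shows the HDD/2.2 case taking over once the HDDs have less than twice the SSD space.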

2. What are the current capacity, provisioned space and available space of the cluster?

This information can be obtained with the storpool template status CLI command, or from the statistics collected over time for the cluster.

# storpool template status
| template             | place head | place all  | place tail | repl. | volumes | snapshots/removing |    size | capacity |  avail. | avail. all | avail. tail |
| hybrid               | hdd        | hdd        | ssd        |     3 |       2 |         0/0        |  8.0 TB |    11 TB |  9.2 TB |      56 TB |      9.2 TB |

The size column shows the provisioned space. The capacity column shows the total capacity that can be filled with the template's placement and replication settings. The avail. column shows how much space is currently available.
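When these values are needed in a script, the pipe-delimited output can be parsed directly. The following is a minimal Python sketch that handles the format shown in the example output; the function names are hypothetical, and the 1024-based unit conversion in to_tb() is an assumption rather than something stated here.

```python
# Assumption: units are 1024-based (GB -> TB divides by 1024).
UNIT_TO_TB = {"GB": 1 / 1024, "TB": 1.0, "PB": 1024.0}

SAMPLE = """\
# storpool template status
| template | place head | place all | place tail | repl. | volumes | snapshots/removing | size | capacity | avail. | avail. all | avail. tail |
| hybrid | hdd | hdd | ssd | 3 | 2 | 0/0 | 8.0 TB | 11 TB | 9.2 TB | 56 TB | 9.2 TB |
"""


def parse_template_status(text):
    """Turn pipe-delimited table rows into a list of {column: cell} dicts."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip the shell prompt and anything that is not a table row
        cells = [cell.strip() for cell in line.split("|")[1:-1]]
        if all(set(cell) <= set("-=+:") for cell in cells):
            continue  # skip separator rows made only of dashes, if present
        rows.append(cells)
    if not rows:
        return []
    header, *data = rows
    return [dict(zip(header, row)) for row in data]


def to_tb(value):
    """Convert a size such as '9.2 TB' to a float number of TB."""
    number, unit = value.split()
    return float(number) * UNIT_TO_TB[unit]


if __name__ == "__main__":
    for row in parse_template_status(SAMPLE):
        print(row["template"], to_tb(row["size"]),
              to_tb(row["capacity"]), to_tb(row["avail."]))
```

In practice the text would come from running storpool template status instead of the embedded sample.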

The avail. value might be lower than the actual raw free space because of imbalance between the disks in a placement group, or because the placement group contains drives of different sizes. It is calculated as the smallest amount of free space on any drive in the placement group, minus a safety margin of 10 GiB, multiplied by the number of drives in the placement group.

The estimate is deliberately conservative, so that it provides early enough warning before the cluster runs out of free space.
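The per-placement-group calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the StorPool implementation: it covers a single placement group only, does not model how replication or multiple placement groups are combined into the template-level avail., and clamping the result at zero is an assumption.

```python
GIB = 1024 ** 3
SAFETY_MARGIN = 10 * GIB  # minimum safety margin per drive, as described above


def placement_group_avail(free_bytes_per_drive):
    """Conservative available-space estimate for one placement group.

    Takes the smallest free space of any drive, subtracts the safety
    margin, and multiplies by the number of drives. Clamping at zero
    is an assumption made for this sketch.
    """
    if not free_bytes_per_drive:
        return 0
    per_drive = max(0, min(free_bytes_per_drive) - SAFETY_MARGIN)
    return per_drive * len(free_bytes_per_drive)


if __name__ == "__main__":
    free = [500 * GIB, 520 * GIB, 480 * GIB]  # hypothetical, slightly imbalanced drives
    print(sum(free) // GIB, "GiB raw free")                   # 1500 GiB raw free
    print(placement_group_avail(free) // GIB, "GiB avail.")  # 1410 GiB avail.
```

The example shows why avail. can be noticeably lower than the raw free space: the least-filled drive is ignored, and every drive is treated as if it had only as much room as the fullest one.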

3. What is the current thin provisioning gain of the cluster?

The thin provisioning gain can be calculated from the values reported by the storpool template status CLI command (see the example output above) using the following formula:

\[\frac{size}{capacity - avail} = gain\]

For example:

\[\frac{8}{11 - 9.2} \approx 4.44\]

The calculated thin provisioning gain is x4.44.
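The same formula as a minimal Python sketch, applied to the size, capacity and avail. values from the example output. The function name is hypothetical; all three arguments must be in the same unit.

```python
def thin_provisioning_gain(size, capacity, avail):
    """Thin provisioning gain = size / (capacity - avail).

    size, capacity and avail must use the same unit (e.g. TB).
    capacity - avail is the space actually used.
    """
    used = capacity - avail
    if used <= 0:
        raise ValueError("capacity - avail must be positive")
    return size / used


if __name__ == "__main__":
    print(f"x{thin_provisioning_gain(8.0, 11.0, 9.2):.2f}")  # x4.44
```

Because the CLI output is rounded, the result is an approximation; values from the collected statistics give a more precise figure.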

4. How does StorPool handle writes?

StorPool uses a copy-on-write strategy when writing to the drives, which is needed to guarantee data consistency and allows very fast storage operations. However, this requires aggregation for random workloads, which usually happens in the background when no significant load is being pushed to the backend storage. This background activity can be observed with iostat and with storpool disk {diskId} activeRequests, and is perfectly normal.

In production systems and in naturally gathered blktraces, we have observed that high IOPS demand comes in short bursts, while the average IOPS demand is much lower. This allows the system to aggregate and cope with the workload without affecting storage operations.

When the storage system is hammered with unusually high, artificial random workloads for long periods of time, it starts aggregating while serving the workload, which slows down storage operations. Eventually performance settles at a level where aggregation and the random workload are balanced, and it stays there until the drives are full.

5. Why are there partitions on all SATA drives used by StorPool?

The first reason is that partitions make it possible to enforce proper alignment. For example, starting the partition at 2 MiB covers any internal alignment of the underlying device smaller than 2 MiB, even when the disk is a virtual device exposed by a RAID controller.
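A minimal Python sketch of the alignment check, assuming the device's internal alignment is a power of two (e.g. 4 KiB sectors or 64 KiB and 1 MiB stripes), which is the common case. It relies on the Linux sysfs attribute /sys/class/block/&lt;partition&gt;/start, which reports the partition start in 512-byte units; the helper names and the partition name sda1 are only examples.

```python
# Assumption: the device's internal alignment is a power of two.
PARTITION_ALIGNMENT = 2 * 1024 * 1024  # partition start at 2 MiB
SYSFS_SECTOR = 512  # sysfs reports partition starts in 512-byte units


def read_partition_start(partition):
    """Read the start sector of a partition such as 'sda1' from sysfs."""
    with open(f"/sys/class/block/{partition}/start") as f:
        return int(f.read())


def is_aligned(start_sector, alignment=PARTITION_ALIGNMENT):
    """True if the partition start (in 512-byte sectors) is a multiple of alignment."""
    return (start_sector * SYSFS_SECTOR) % alignment == 0


def covers(device_alignment, partition_alignment=PARTITION_ALIGNMENT):
    """True if a start aligned to partition_alignment is also aligned for the device."""
    return partition_alignment % device_alignment == 0


if __name__ == "__main__":
    print(is_aligned(4096))  # 4096 * 512 bytes = 2 MiB start -> True
    print(is_aligned(63))    # legacy DOS-style start at sector 63 -> False
    print([covers(a) for a in (4096, 65536, 1024 * 1024)])  # [True, True, True]
```

On a live system, is_aligned(read_partition_start("sda1")) checks an existing partition.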

Another reason is that some controllers change the order of the drives on each boot, and sometimes the boot drive holding the operating system's root device sits on the same controller. In such cases, installing GRUB on all other devices as well is the only workaround for consistent booting, which requires a partition layout that leaves room for the boot loader.

We have also seen the kernel get confused by the data in the first few sectors of an unpartitioned drive, with the odd side effect of detecting a phantom partition in the last few gigabytes of the drive and obstructing normal operation of the disk. In this case the best option is to re-create the same disk ID (i.e. rebalance it out, then rebalance it back in) on a properly aligned partition on the same drive.

This rule does not apply to NVMe devices because of the way they are managed (due to kernel bypass, the kernel does not see the devices managed by StorPool). Even then, properly aligned partitions are required if the device is used for journals for other devices (e.g. an Optane drive in front of HDDs), or if the same drive needs to be split among multiple server instances for performance reasons.

6. Why do the StorPool processes seem to be at 100% CPU usage all the time?

TL;DR: They are not using 100% CPU; the kernel is misled into reporting that they are.

Longer explanation: processes such as storpool_server and storpool_block are sometimes reported by top at 100% CPU usage all the time. This is the expected behaviour when hardware sleep is enabled (the SP_SLEEP_TYPE parameter set to hsleep), and is related to how time-critical services are implemented in StorPool. The actual CPU usage is much lower than top reports, and can be monitored with cpupower monitor through the Mperf C0 values for the CPUs dedicated to these processes. The only exception to this rule is nodes or services configured with an SP_SLEEP_TYPE other than the default: with ksleep the processes show variable usage, and with no sleep they really do stay at 100% CPU usage.

7. What addresses does StorPool use for monitoring?

The IP addresses from which we access the servers, and to which the servers send monitoring/statistics data back, are, and. The ports used for sending monitoring/statistics data are 443 (HTTPS) and 2266 (SSH). Because we look up the host by name, a DNS service should be available and configured on the nodes.

A simple test is to try connecting to on port 443, and on port 2266 with SSH.
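The name resolution and port checks can be scripted. Below is a minimal Python sketch using only the standard socket module; the host name monitoring.example.com is a hypothetical placeholder, since the actual addresses are not listed here, and should be replaced with the real one.

```python
import socket

# Hypothetical placeholder: substitute the actual monitoring host.
MONITORING_HOST = "monitoring.example.com"


def resolves(host):
    """True if the host name can be resolved, i.e. DNS works for it."""
    try:
        socket.getaddrinfo(host, None)
        return True
    except socket.gaierror:
        return False


def port_open(host, port, timeout=5.0):
    """True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print("DNS:", resolves(MONITORING_HOST))
    for port in (443, 2266):
        print(f"port {port}:", port_open(MONITORING_HOST, port))
```

A successful TCP connection on port 2266 only shows that the port is reachable; running ssh -p 2266 against the host also verifies the SSH service itself.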

8. What is required when I add/change memory modules on a hypervisor?

When memory modules are added or changed, the current best practice is to run a full memtest before the hypervisor is returned to production. We also have a reduced set of memory tests, usually available for execution as ~/storpool/, which re-validates the memory by running memtester in parallel. The usual run time is between 10 and 40 minutes, depending on the number of CPU cores and the amount of memory installed in the hypervisor.
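The idea of parallel memtester runs can be illustrated with a short Python sketch. This is a hypothetical example, not the StorPool test script: the helper names, the one-process-per-core split and testing 90% of the currently free memory are all assumptions. memtester takes the amount of memory (e.g. 1024M) and an optional loop count, and usually needs root to lock the memory it tests.

```python
import os
import subprocess


def memtester_commands(total_mib, workers, loops=1):
    """Build one memtester command line per worker, splitting total_mib evenly."""
    if workers < 1 or total_mib < workers:
        raise ValueError("need at least one worker and 1 MiB per worker")
    per_worker = total_mib // workers
    return [["memtester", f"{per_worker}M", str(loops)] for _ in range(workers)]


def run_parallel(commands):
    """Start all commands at once and return their exit codes."""
    procs = [subprocess.Popen(cmd) for cmd in commands]
    return [proc.wait() for proc in procs]


if __name__ == "__main__":
    page = os.sysconf("SC_PAGE_SIZE")
    avail_mib = os.sysconf("SC_AVPHYS_PAGES") * page // (1024 * 1024)
    workers = os.cpu_count() or 1
    # Assumption: test 90% of the currently free memory, leaving headroom for the OS.
    codes = run_parallel(memtester_commands(int(avail_mib * 0.9), workers))
    print("PASS" if all(code == 0 for code in codes) else "FAIL", codes)
```

Splitting the memory across one process per core is what keeps the run time in the range of minutes rather than hours on hosts with a lot of RAM.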

If memory was added, the existing cgroup limits have to be updated, which in most cases is as easy as:

storpool_cg print # to see the presently configured limits
storpool_cg conf -NME # -N for noop to check what will be changed
storpool_cg conf -ME # to actually perform the changes live

The above example is for hypervisor-only nodes. Nodes that also expose disks to the cluster (i.e. run one or more storpool_server services) need the converged=1 parameter on the storpool_cg conf ... command line, because this cannot be auto-detected. Detailed information about the cgroups configuration and the storpool_cg tool is available in Control Groups.

In any case, if unsure, please open a ticket with StorPool support.


1. StorPool not working on a VLAN interface on an I350 NIC

Generally, 1GE interfaces are not supported with StorPool, but they are occasionally useful for test installations. For I350-based NICs, VLAN offload must be disabled on the parent NIC.

Example of verifying the current state, disabling VLAN offload, and confirming the change:

# verify
[root@s21 ~]# ethtool -k eth1 | grep vlan-offload
rx-vlan-offload: on
tx-vlan-offload: on
# disable
[root@s21 ~]# ethtool -K eth1 rxvlan off
Actual changes:
rx-vlan-offload: off
tx-vlan-offload: off [requested on]
# confirm
[root@s21 ~]# ethtool -k eth1 | grep vlan-offload
rx-vlan-offload: off
tx-vlan-offload: off [requested on]

This configuration works only without hardware acceleration, with CPU cores reserved for the NIC interrupts and the storpool_rdma user-space threads (the iface_acc=false flag in storpool_cg).
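The ethtool -k check shown above can also be done from a script, e.g. as part of a node health check. Below is a minimal Python sketch that parses the "feature: state [note]" lines printed by ethtool -k and tests rx-vlan-offload, the feature switched off with rxvlan off in the example. The function names are hypothetical, and eth1 is taken from the example.

```python
import subprocess


def parse_ethtool_features(text):
    """Parse `ethtool -k` output into {feature: (state, note)}.

    Lines look like "rx-vlan-offload: off" or "tx-vlan-offload: off [requested on]".
    """
    features = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep or not rest.strip():
            continue  # blank lines and headers such as "Features for eth1:"
        state, _, note = rest.strip().partition(" ")
        features[name.strip()] = (state, note.strip())
    return features


def vlan_offload_disabled(iface):
    """Run `ethtool -k` on iface and check that rx-vlan-offload is off."""
    out = subprocess.run(["ethtool", "-k", iface],
                         capture_output=True, text=True, check=True).stdout
    return parse_ethtool_features(out).get("rx-vlan-offload", ("", ""))[0] == "off"


if __name__ == "__main__":
    print(vlan_offload_disabled("eth1"))
```

Keeping the note (e.g. "[requested on]") separate from the state makes it easy to see when the driver did not apply a requested setting.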