Rebalancing the cluster

Overview

In some situations the data in the StorPool cluster needs to be rebalanced. This is performed by the balancer and relocator tools. The relocator is an integral part of the StorPool management service; the balancer is currently a separate tool, available on and executed from some of the nodes with access to the API.

Note

Be advised that the balancer tool will create some files it needs in the current working directory.
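
It is therefore convenient to run it from a dedicated working directory, as the examples in this document do:

mkdir -p ~/storpool/balancer && cd ~/storpool/balancer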

Rebalancing procedure

The rebalancing operation is performed in the following steps:

  • The balancer tool is executed to calculate the new state of the cluster.

  • The results from the balancer are verified by a set of automated scripts.

  • The results are also manually reviewed to check whether they contain any inconsistencies and whether they achieve the intended goals. These results are available by running storpool balancer disks, and will also be printed at the end of balancer.sh.

    If the result is not satisfactory, the balancer is executed with different parameters, until a satisfactory result is obtained.

  • Once the proposed end result is satisfactory, the calculated state is loaded into the relocator tool, by doing storpool balancer commit.

    Note that this step can be reversed only with the --restore-state option, which will revert to the initial state. If a balancing operation has run for a while and for some reason needs to be “cancelled”, that is currently not supported.

  • The relocator tool performs the actual move of the data.

    The progress of the relocator tool can be monitored by storpool task list for the currently running tasks, storpool relocator status for an overview of the relocator state and storpool relocator disks (warning: slow command) for the full relocation state.
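
    A minimal monitoring sketch using these commands:

    storpool task list        # currently running relocation tasks
    storpool relocator status # overview of the relocator state
    storpool relocator disks  # full relocation state (warning: slow command)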

Options

The balancer tool is executed via the /usr/lib/storpool/balancer.sh wrapper and accepts the following options:

-A

Don’t only move data from fuller to emptier drives.

-b placementGroup

Use disks in the specified placement group to restore replication in critical conditions.

-c factor

Factor for how much data to try to move around, from 0 to 10; the default is 0, and the factor argument is required when the option is used. In most cases -c is not needed. The main use case, with -c 10, is for 3-node clusters, where data needs to be “rotated” through the cluster.

-d diskId [-d diskId]

Put data only on the selected disks.

-D diskId [-D diskId]

Don’t move data from those disks.

--do-whatever-you-can

Decrease the redundancy level.

Warning

The --do-whatever-you-can option is for emergency use only, after the balancer has failed!

-E 0-99

Don’t empty drives whose fill level is below this percentage (0-99).

--empty-down-disks

Proceed with balancing even when there are down disks, and remove all data from them.

-f percent

Allow drives to be filled up to this percentage, from 0 to 99. Default 90.

-F

Only move data from fuller to emptier drives.

-g placementGroup

Work only on the specified placement group.

--ignore-down-disks

Proceed with balancing even when there are down disks, and do not remove data from them.

--ignore-src-pg-violations

Exactly what the name says: ignore placement group violations on the source disks.

-m maxAgCount

Limit the maximum allocation group count on drives to this (effectively their usable size).

-M maxDataToAdd

Limit the amount of data to copy to a single drive, to be able to rebalance “in pieces”.

--max-disbalance-before-striping X

The maximum disbalance allowed before striping, in percent.

--min-disk-full X

Don’t remove data from disk if it is not at least this X% full.

--min-replication R

Minimum replication required.

-o overridesPgName

Specify the override placement group name (required only if the override template has not been created).

--only-empty-disk diskId

Empty only the specified disk; equivalent to specifying -D for all other disks.

-R

Only restore replication for degraded volumes.

--restore-state

Revert to the initial state of the disks (before the balancer commit execution).

-S

Prefer tail SSD.

-V vagId [-V vagId]

Skip balancing vagId.

-v

Verbose output. Shows how all drives in the cluster would be affected by the balancer's proposal. This differs from the later output of storpool balancer disks, which is the point of view of storpool_mgmt and also takes into account all currently loaded relocations.

-A and -F are the reverse of each other and mutually exclusive.

The -c value is essentially the trade-off between the uniformity of the data on the disks and the amount of data moved to accomplish it. A lower factor means less data moved around, but sometimes more inequality between the data on the disks; a higher factor means more data moved, but sometimes a better result in terms of how equally the data is spread over the drives.
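
For example, one could compare the proposals from two runs before committing; a sketch (-c 4 here is just an arbitrary middle value):

/usr/lib/storpool/balancer.sh -F -c 0   # least data moved around
/usr/lib/storpool/balancer.sh -F -c 4   # more data moved, possibly a more uniform result
storpool balancer disks                 # review the proposed state before committing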

On clusters with drives of unsupported size (HDDs larger than 4 TB) the -m option is required. It limits the data moved onto these drives to the set number of allocation groups. This is done because the performance per TB of larger drives is lower, which would degrade the performance of the whole cluster in high-performance use cases.

The -M option is useful when a full rebalancing would involve many tasks until completion and could impact other operations (such as remote transfers, or the time required for a currently running recovery to complete). With -M the amount of data loaded by the balancer for each disk can be reduced, and a more balanced state is achieved through several smaller rebalancing operations.
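
A sketch of such a piecewise rebalancing; note that the exact value format for -M is an assumption here, so check the usage output of balancer.sh:

/usr/lib/storpool/balancer.sh -F -M 100G   # assumed syntax: cap the data copied to any single drive in this pass
storpool balancer commit
# wait for the relocations to finish, then repeat until no further moves are proposed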

The -f option is required on clusters whose drives are more than 90% full. Extreme care should be taken when balancing in such cases.
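
For example, on a cluster whose drives are around 92% full (a sketch):

/usr/lib/storpool/balancer.sh -F -f 95   # allow filling the drives up to 95%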

The -b option could be used to move data between placementGroups (in most cases from SSDs to HDDs).

Restoring volume redundancy on a failed drive

Situation: we have lost drive 1802 in placementGroup ssd. We want to remove it from the cluster and restore the redundancy of the data. We need to do the following:

storpool disk 1802 forget                               # this will also remove the drive from all placement groups it participated in
mkdir -p ~/storpool/balancer && cd ~/storpool/balancer  # it's recommended to run the following commands in a screen/tmux session
/usr/lib/storpool/balancer.sh -R
storpool balancer commit                                # to actually load the data into the relocator and start the re-balancing operation

Restoring volume redundancy for two failed drives (single-copy situation)

(Emergency) Situation: we have lost drives 1802 and 1902 in placementGroup ssd. We want to remove them from the cluster and restore the redundancy of the data. We need to do the following:

storpool disk 1802 forget                               # this will also remove the drive from all placement groups it participated in
storpool disk 1902 forget                               # this will also remove the drive from all placement groups it participated in
mkdir -p ~/storpool/balancer && cd ~/storpool/balancer  # it's recommended to run the following commands in a screen/tmux session
/usr/lib/storpool/balancer.sh -F --min-replication 2    # first balancing run, to create a second copy of the data
storpool balancer commit                                # to actually load the data into the relocator and start the re-balancing operation
# wait for the balancing to finish

/usr/lib/storpool/balancer.sh -R                        # second balancing run, to restore full redundancy
storpool balancer commit                                # to actually load the data into the relocator and start the re-balancing operation

Adding new drives and rebalancing data on them

Situation: we have added SSDs 1201, 1202 and HDDs 1510, 1511, that need to go into placement groups ssd and hdd respectively, and we want to re-balance the cluster data so that it is re-dispersed onto the new disks as well. We have no other placement groups in the cluster.

storpool placementGroup ssd addDisk 1201 addDisk 1202
storpool placementGroup hdd addDisk 1510 addDisk 1511
mkdir -p ~/storpool/balancer && cd ~/storpool/balancer  # it's recommended to run the following commands in a screen/tmux session
/usr/lib/storpool/balancer.sh -F -c 0                   # rebalance all placement groups, move data from fuller to emptier drives
storpool balancer commit                                # to actually load the data into the relocator and start the re-balancing operation

Restoring volume redundancy with rebalancing data on other placementGroup

Situation: we have to restore the redundancy of a hybrid cluster (2 copies on HDDs, one on SSDs) while the ssd placementGroup is out of free space because a few SSDs have recently failed. We can’t replace the failed drives with new ones for the moment.

mkdir -p ~/storpool/balancer && cd ~/storpool/balancer  # it's recommended to run the following commands in a screen/tmux session
/usr/lib/storpool/balancer.sh -R -c 0 -b hdd            # use placementGroup ``hdd`` as a backup and move some data from SSDs
storpool balancer commit                                # to actually load the data into the relocator and start the re-balancing operation

Note

The -f argument can further be used to instruct the balancer how full to keep the drives, and thus control how much data will be moved to the backup placement group.

Decommissioning a live node

Situation: a node in the cluster needs to be decommissioned, so that the data on its drives needs to be moved away. The drive numbers on that node are 101, 102 and 103.

Note

You have to make sure you have enough space to restore the redundancy before proceeding.

storpool disk 101 softEject                             # mark all drives for evacuation
storpool disk 102 softEject
storpool disk 103 softEject
mkdir -p ~/storpool/balancer && cd ~/storpool/balancer  # it's recommended to run the following commands in a screen/tmux session
/usr/lib/storpool/balancer.sh -R -c 0                   # rebalance all placement groups, -F has the same effect in this case
storpool balancer commit                                # to actually load the data into the relocator and start the re-balancing operation

Decommissioning a dead node

Situation: a node in the cluster needs to be decommissioned, as it has died and cannot be brought back. The drive numbers on that node are 101, 102 and 103.

Note

You have to make sure you have enough space to restore the redundancy before proceeding.

storpool disk 101 forget                                # remove the drives from all placement groups
storpool disk 102 forget
storpool disk 103 forget
mkdir -p ~/storpool/balancer && cd ~/storpool/balancer  # it's recommended to run the following commands in a screen/tmux session
/usr/lib/storpool/balancer.sh -R -c 0                   # rebalance all placement groups
storpool balancer commit                                # to actually load the data into the relocator and start the re-balancing operation

Tip

Alternatively, you can try adding the disks from the dead node into another live node in the cluster, and then running another re-balance operation.

Resolving imbalances in the drive usage

Situation: we have an imbalance in the drive usage in the whole cluster and we want to improve it.

mkdir -p ~/storpool/balancer && cd ~/storpool/balancer  # it's recommended to run the following commands in a screen/tmux session
/usr/lib/storpool/balancer.sh -F -c 0                   # rebalance all placement groups
storpool balancer commit                                # to actually load the data into the relocator and start the re-balancing operation

Resolving imbalances in the drive usage with three-node clusters

Situation: we have an imbalance in the drive usage in the whole cluster and we want to improve it. We have a three-node hybrid cluster and proper balancing requires larger moves of “unrelated” data:

mkdir -p ~/storpool/balancer && cd ~/storpool/balancer  # it's recommended to run the following commands in a screen/tmux session
/usr/lib/storpool/balancer.sh -F -c 0                   # rebalance all placement groups
/usr/lib/storpool/balancer.sh -A -c 10                  # retry to see if we get a better result with more data movements
storpool balancer commit                                # to actually load the data into the relocator and start the re-balancing operation

Reverting balancer to a previous state

Situation: we have committed a rebalancing operation, but want to revert back to the previous state:

cd ~/storpool/balancer                                             # it's recommended to run the following commands in a screen/tmux session
ls                                                                 # list all saved states and choose what to revert to
/usr/lib/storpool/balancer.sh --restore-state 2022-10-28-15-39-40  # revert to 2022-10-28-15-39-40
storpool balancer commit                                           # to actually load the data into the relocator and start the re-balancing operation

Reading the output of storpool balancer disks

Here is an example output from storpool balancer disks:

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|     disk | server |   size   |                  stored                  |                 on-disk                  |                     objects                      |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|        1 |   14.0 |   373 GB |   4.7 MB -> 4.7 MB      (0  B / 0  B)    |   6.1 MB -> 6.1 MB      (0  B / 0  B)    |      26 -> 26           (+0 / +0)      / 405000  |
|     1101 |   11.0 |   447 GB |    16 GB -> 15 GB    (-1.0 GB / 1.4 GB)  |    18 GB -> 17 GB    (-1.1 GB / 1.4 GB)  |   11798 -> 10040     (-1758 / +3932)   / 480000  |
|     1102 |   11.0 |   447 GB |    16 GB -> 15 GB    (-268 MB / 1.3 GB)  |    17 GB -> 17 GB    (-301 MB / 1.4 GB)  |   10843 -> 10045      (-798 / +4486)   / 480000  |
|     1103 |   11.0 |   447 GB |    16 GB -> 15 GB    (-1.0 GB / 1.8 GB)  |    18 GB -> 16 GB    (-1.2 GB / 1.9 GB)  |   12123 -> 10039     (-2084 / +3889)   / 480000  |
|     1104 |   11.0 |   447 GB |    16 GB -> 15 GB    (-757 MB / 1.3 GB)  |    17 GB -> 16 GB    (-899 MB / 1.3 GB)  |   11045 -> 10072      (-973 / +4279)   / 480000  |
|     1111 |   11.0 |   466 GB |   4.7 MB -> 4.7 MB      (0  B / 0  B)    |   5.1 MB -> 5.1 MB      (0  B / 0  B)    |      26 -> 26           (+0 / +0)      / 495000  |
|     1112 |   11.0 |   366 GB |   4.7 MB -> 4.7 MB      (0  B / 0  B)    |   5.1 MB -> 5.1 MB      (0  B / 0  B)    |      26 -> 26           (+0 / +0)      / 390000  |
|     1121 |   11.0 |   931 GB |    22 GB -> 21 GB    (-1009 MB / 830 MB)  |    22 GB -> 21 GB    (-1.0 GB / 872 MB)  |   13713 -> 12698     (-1015 / +3799)   / 975000  |
|     1122 |   11.0 |   931 GB |    21 GB -> 21 GB    (-373 MB / 2.0 GB)  |    22 GB -> 21 GB    (-379 MB / 2.0 GB)  |   13469 -> 12742      (-727 / +3801)   / 975000  |
|     1123 |   11.0 |   931 GB |    22 GB -> 21 GB    (-1.1 GB / 1.9 GB)  |    22 GB -> 21 GB    (-1.1 GB / 2.0 GB)  |   14859 -> 12629     (-2230 / +4102)   / 975000  |
|     1124 |   11.0 |   931 GB |    21 GB -> 21 GB      (36 MB / 1.8 GB)  |    21 GB -> 21 GB      (92 MB / 1.9 GB)  |   13806 -> 12743     (-1063 / +3389)   / 975000  |
|     1201 |   12.0 |   447 GB |    18 GB -> 15 GB    (-2.9 GB / 633 MB)  |    19 GB -> 16 GB    (-3.0 GB / 658 MB)  |   14148 -> 10070     (-4078 / +3050)   / 480000  |
|     1202 |   12.0 |   447 GB |    17 GB -> 15 GB    (-2.1 GB / 787 MB)  |    19 GB -> 16 GB    (-2.3 GB / 815 MB)  |   13243 -> 10067     (-3176 / +2576)   / 480000  |
|     1203 |   12.0 |   447 GB |    17 GB -> 15 GB    (-2.0 GB / 3.3 GB)  |    19 GB -> 16 GB    (-2.4 GB / 3.5 GB)  |   12746 -> 10062     (-2684 / +3375)   / 480000  |
|     1204 |   12.0 |   447 GB |    18 GB -> 15 GB    (-2.7 GB / 1.1 GB)  |    19 GB -> 16 GB    (-2.9 GB / 1.1 GB)  |   12835 -> 10075     (-2760 / +3248)   / 480000  |
|     1212 |   12.0 |   366 GB |   4.7 MB -> 4.7 MB      (0  B / 0  B)    |   6.0 MB -> 6.0 MB      (0  B / 0  B)    |      26 -> 26           (+0 / +0)      / 390000  |
|     1221 |   12.0 |   931 GB |    20 GB -> 21 GB     (569 MB / 1.5 GB)  |    21 GB -> 21 GB     (587 MB / 1.6 GB)  |   13115 -> 12616      (-499 / +3736)   / 975000  |
|     1222 |   12.0 |   931 GB |    22 GB -> 21 GB    (-979 MB / 307 MB)  |    22 GB -> 21 GB    (-1013 MB / 317 MB)  |   12938 -> 12697      (-241 / +3291)   / 975000  |
|     1223 |   12.0 |   931 GB |    22 GB -> 21 GB    (-1.1 GB / 781 MB)  |    22 GB -> 21 GB    (-1.2 GB / 812 MB)  |   13968 -> 12718     (-1250 / +3302)   / 975000  |
|     1224 |   12.0 |   931 GB |    21 GB -> 21 GB    (-784 MB / 332 MB)  |    22 GB -> 21 GB    (-810 MB / 342 MB)  |   13741 -> 12692     (-1049 / +3314)   / 975000  |
|     1225 |   12.0 |   931 GB |    21 GB -> 21 GB    (-681 MB / 849 MB)  |    22 GB -> 21 GB    (-701 MB / 882 MB)  |   13608 -> 12748      (-860 / +3420)   / 975000  |
|     1226 |   12.0 |   931 GB |    22 GB -> 21 GB    (-1.1 GB / 825 MB)  |    22 GB -> 21 GB    (-1.1 GB / 853 MB)  |   13066 -> 12692      (-374 / +3817)   / 975000  |
|     1301 |   13.0 |   447 GB |    13 GB -> 15 GB     (2.6 GB / 4.2 GB)  |    14 GB -> 17 GB     (2.7 GB / 4.4 GB)  |    7244 -> 10038     (+2794 / +6186)   / 480000  |
|     1302 |   13.0 |   447 GB |    12 GB -> 15 GB     (3.0 GB / 3.7 GB)  |    13 GB -> 17 GB     (3.1 GB / 3.9 GB)  |    7507 -> 10063     (+2556 / +5619)   / 480000  |
|     1303 |   13.0 |   447 GB |    14 GB -> 15 GB     (1.3 GB / 3.2 GB)  |    15 GB -> 17 GB     (1.3 GB / 3.4 GB)  |    7888 -> 10038     (+2150 / +5884)   / 480000  |
|     1304 |   13.0 |   447 GB |    13 GB -> 15 GB     (2.7 GB / 3.7 GB)  |    14 GB -> 17 GB     (2.8 GB / 3.9 GB)  |    7660 -> 10045     (+2385 / +5870)   / 480000  |
|     1311 |   13.0 |   466 GB |   4.7 MB -> 4.7 MB      (0  B / 0  B)    |   6.1 MB -> 6.1 MB      (0  B / 0  B)    |      26 -> 26           (+0 / +0)      / 495000  |
|     1312 |   13.0 |   366 GB |   4.7 MB -> 4.7 MB      (0  B / 0  B)    |   6.1 MB -> 6.1 MB      (0  B / 0  B)    |      26 -> 26           (+0 / +0)      / 390000  |
|     1321 |   13.0 |   931 GB |    21 GB -> 21 GB    (-193 MB / 1.1 GB)  |    21 GB -> 21 GB    (-195 MB / 1.2 GB)  |   13365 -> 12765      (-600 / +5122)   / 975000  |
|     1322 |   13.0 |   931 GB |    22 GB -> 21 GB    (-1.4 GB / 1.1 GB)  |    23 GB -> 21 GB    (-1.4 GB / 1.1 GB)  |   12749 -> 12739       (-10 / +4651)   / 975000  |
|     1323 |   13.0 |   931 GB |    21 GB -> 21 GB    (-504 MB / 2.2 GB)  |    22 GB -> 21 GB    (-496 MB / 2.3 GB)  |   13386 -> 12695      (-691 / +4583)   / 975000  |
|     1325 |   13.0 |   931 GB |    21 GB -> 20 GB    (-698 MB / 557 MB)  |    22 GB -> 21 GB    (-717 MB / 584 MB)  |   13113 -> 12768      (-345 / +2668)   / 975000  |
|     1326 |   13.0 |   931 GB |    21 GB -> 21 GB    (-507 MB / 724 MB)  |    22 GB -> 21 GB    (-522 MB / 754 MB)  |   13690 -> 12704      (-986 / +3327)   / 975000  |
|     1401 |   14.0 |   223 GB |   8.3 GB -> 7.6 GB   (-666 MB / 868 MB)  |   9.3 GB -> 8.5 GB   (-781 MB / 901 MB)  |    3470 -> 5043      (+1573 / +2830)   / 240000  |
|     1402 |   14.0 |   447 GB |   9.8 GB -> 15 GB     (5.6 GB / 5.7 GB)  |    11 GB -> 17 GB     (5.8 GB / 6.0 GB)  |    4358 -> 10060     (+5702 / +6667)   / 480000  |
|     1403 |   14.0 |   224 GB |   8.2 GB -> 7.6 GB   (-623 MB / 1.1 GB)  |   9.3 GB -> 8.6 GB   (-710 MB / 1.2 GB)  |    4547 -> 5036       (+489 / +2814)   / 240000  |
|     1404 |   14.0 |   224 GB |   8.4 GB -> 7.6 GB   (-773 MB / 1.5 GB)  |   9.4 GB -> 8.5 GB   (-970 MB / 1.6 GB)  |    4369 -> 5031       (+662 / +2368)   / 240000  |
|     1411 |   14.0 |   466 GB |   4.7 MB -> 4.7 MB      (0  B / 0  B)    |   6.0 MB -> 6.0 MB      (0  B / 0  B)    |      26 -> 26           (+0 / +0)      / 495000  |
|     1412 |   14.0 |   366 GB |   4.7 MB -> 4.7 MB      (0  B / 0  B)    |   6.0 MB -> 6.0 MB      (0  B / 0  B)    |      26 -> 26           (+0 / +0)      / 390000  |
|     1421 |   14.0 |   931 GB |    19 GB -> 21 GB     (1.9 GB / 2.6 GB)  |    19 GB -> 21 GB     (2.0 GB / 2.7 GB)  |   10670 -> 12624     (+1954 / +6196)   / 975000  |
|     1422 |   14.0 |   931 GB |    19 GB -> 21 GB     (1.6 GB / 3.2 GB)  |    20 GB -> 21 GB     (1.6 GB / 3.3 GB)  |   10653 -> 12844     (+2191 / +6919)   / 975000  |
|     1423 |   14.0 |   931 GB |    19 GB -> 21 GB     (1.9 GB / 2.5 GB)  |    19 GB -> 21 GB     (2.0 GB / 2.6 GB)  |   10715 -> 12688     (+1973 / +5846)   / 975000  |
|     1424 |   14.0 |   931 GB |    18 GB -> 20 GB     (2.2 GB / 2.9 GB)  |    19 GB -> 21 GB     (2.3 GB / 3.0 GB)  |   10723 -> 12686     (+1963 / +5505)   / 975000  |
|     1425 |   14.0 |   931 GB |    19 GB -> 21 GB     (1.3 GB / 2.5 GB)  |    20 GB -> 21 GB     (1.4 GB / 2.6 GB)  |   10702 -> 12689     (+1987 / +5486)   / 975000  |
|     1426 |   14.0 |   931 GB |    20 GB -> 21 GB     (1.0 GB / 2.5 GB)  |    20 GB -> 21 GB     (1.0 GB / 2.6 GB)  |   10737 -> 12609     (+1872 / +5771)   / 975000  |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|       45 |    4.0 |    29 TB |   652 GB -> 652 GB    (512 MB / 69 GB)   |   686 GB -> 685 GB   (-240 MB / 72 GB)   |  412818 -> 412818       (+0 / +159118) / 30885000 |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Let’s start with the last line. Here’s the meaning, field by field:

  • There are 45 drives in total.

  • There are 4 server instances.

  • The total disk capacity is 29 TB.

  • The stored data is 652 GB and will remain 652 GB after the rebalancing. The net change across all drives is 512 MB, and the total amount of data changed on the drives is 69 GB (i.e. how much they will “recover” from other drives).

  • The same is repeated for the on-disk size. Here the total amount of changes is roughly the amount of data that would need to be copied.

  • The total number of objects will not change (i.e. from 412818 to 412818), 0 new objects will be created, the total number of objects to be moved is 159118, and the total number of possible objects in the cluster is 30885000.

The difference between the “stored” and “on-disk” sizes is that the latter also includes the size of checksums and metadata.

For the rest of the lines, the data is basically the same, just per disk.
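
For example, take disk 1402 above: its stored data grows from 9.8 GB to 15 GB, a net change of +5.6 GB out of 5.7 GB moved onto the drive in total; its on-disk data grows from 11 GB to 17 GB; and its object count grows from 4358 to 10060 (+5702 net, +6667 moved in total) out of a maximum of 480000 objects.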

What needs to be taken into account is:

  • Are there drives that will have too much data on them? Here, both data size and objects must be checked, and they should be close to the average percentage for the placement group.

  • Is the data stored on the drives balanced, i.e. are all the drives’ usages close to the average?

  • Are there drives that should have data on them, but nothing is scheduled to be moved?

    This usually happens because a drive wasn’t added to the right placement group.

  • Will there be too much data to be moved?

To illustrate the difference in the amount of data to be moved, here is the output of storpool balancer disks from a run with -c 10:

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|     disk | server |   size   |                  stored                  |                 on-disk                  |                     objects                      |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|        1 |   14.0 |   373 GB |   4.7 MB -> 4.7 MB      (0  B / 0  B)    |   6.1 MB -> 6.1 MB      (0  B / 0  B)    |      26 -> 26           (+0 / +0)      / 405000  |
|     1101 |   11.0 |   447 GB |    16 GB -> 15 GB    (-1.0 GB / 1.7 GB)  |    18 GB -> 17 GB    (-1.1 GB / 1.7 GB)  |   11798 -> 10027     (-1771 / +5434)   / 480000  |
|     1102 |   11.0 |   447 GB |    16 GB -> 15 GB    (-263 MB / 1.7 GB)  |    17 GB -> 17 GB    (-298 MB / 1.7 GB)  |   10843 -> 10000      (-843 / +5420)   / 480000  |
|     1103 |   11.0 |   447 GB |    16 GB -> 15 GB    (-1.0 GB / 3.6 GB)  |    18 GB -> 16 GB    (-1.2 GB / 3.8 GB)  |   12123 -> 10005     (-2118 / +6331)   / 480000  |
|     1104 |   11.0 |   447 GB |    16 GB -> 15 GB    (-752 MB / 2.7 GB)  |    17 GB -> 16 GB    (-907 MB / 2.8 GB)  |   11045 -> 10098      (-947 / +5214)   / 480000  |
|     1111 |   11.0 |   466 GB |   4.7 MB -> 4.7 MB      (0  B / 0  B)    |   5.1 MB -> 5.1 MB      (0  B / 0  B)    |      26 -> 26           (+0 / +0)      / 495000  |
|     1112 |   11.0 |   366 GB |   4.7 MB -> 4.7 MB      (0  B / 0  B)    |   5.1 MB -> 5.1 MB      (0  B / 0  B)    |      26 -> 26           (+0 / +0)      / 390000  |
|     1121 |   11.0 |   931 GB |    22 GB -> 21 GB    (-1003 MB / 6.4 GB)  |    22 GB -> 21 GB    (-1018 MB / 6.7 GB)  |   13713 -> 12742      (-971 / +9712)   / 975000  |
|     1122 |   11.0 |   931 GB |    21 GB -> 21 GB    (-368 MB / 5.8 GB)  |    22 GB -> 21 GB    (-272 MB / 6.1 GB)  |   13469 -> 12718      (-751 / +8929)   / 975000  |
|     1123 |   11.0 |   931 GB |    22 GB -> 21 GB    (-1.1 GB / 5.9 GB)  |    22 GB -> 21 GB    (-1.1 GB / 6.1 GB)  |   14859 -> 12699     (-2160 / +8992)   / 975000  |
|     1124 |   11.0 |   931 GB |    21 GB -> 21 GB      (57 MB / 7.4 GB)  |    21 GB -> 21 GB     (113 MB / 7.7 GB)  |   13806 -> 12697     (-1109 / +9535)   / 975000  |
|     1201 |   12.0 |   447 GB |    18 GB -> 15 GB    (-2.8 GB / 1.2 GB)  |    19 GB -> 17 GB    (-3.0 GB / 1.2 GB)  |   14148 -> 10033     (-4115 / +4853)   / 480000  |
|     1202 |   12.0 |   447 GB |    17 GB -> 15 GB    (-2.0 GB / 1.6 GB)  |    19 GB -> 16 GB    (-2.2 GB / 1.7 GB)  |   13243 -> 10055     (-3188 / +4660)   / 480000  |
|     1203 |   12.0 |   447 GB |    17 GB -> 15 GB    (-2.0 GB / 2.3 GB)  |    19 GB -> 16 GB    (-2.3 GB / 2.4 GB)  |   12746 -> 10070     (-2676 / +4682)   / 480000  |
|     1204 |   12.0 |   447 GB |    18 GB -> 15 GB    (-2.7 GB / 2.1 GB)  |    19 GB -> 16 GB    (-2.8 GB / 2.2 GB)  |   12835 -> 10110     (-2725 / +5511)   / 480000  |
|     1212 |   12.0 |   366 GB |   4.7 MB -> 4.7 MB      (0  B / 0  B)    |   6.0 MB -> 6.0 MB      (0  B / 0  B)    |      26 -> 26           (+0 / +0)      / 390000  |
|     1221 |   12.0 |   931 GB |    20 GB -> 21 GB     (620 MB / 6.3 GB)  |    21 GB -> 21 GB     (805 MB / 6.7 GB)  |   13115 -> 12542      (-573 / +9389)   / 975000  |
|     1222 |   12.0 |   931 GB |    22 GB -> 21 GB    (-981 MB / 2.9 GB)  |    22 GB -> 21 GB    (-1004 MB / 3.0 GB)  |   12938 -> 12793      (-145 / +8795)   / 975000  |
|     1223 |   12.0 |   931 GB |    22 GB -> 21 GB    (-1.1 GB / 5.9 GB)  |    22 GB -> 21 GB    (-1.1 GB / 6.1 GB)  |   13968 -> 12698     (-1270 / +10094)  / 975000  |
|     1224 |   12.0 |   931 GB |    21 GB -> 21 GB    (-791 MB / 4.5 GB)  |    22 GB -> 21 GB    (-758 MB / 4.7 GB)  |   13741 -> 12684     (-1057 / +8616)   / 975000  |
|     1225 |   12.0 |   931 GB |    21 GB -> 21 GB    (-671 MB / 4.8 GB)  |    22 GB -> 21 GB    (-677 MB / 4.9 GB)  |   13608 -> 12690      (-918 / +8559)   / 975000  |
|     1226 |   12.0 |   931 GB |    22 GB -> 21 GB    (-1.1 GB / 6.2 GB)  |    22 GB -> 21 GB    (-1.1 GB / 6.4 GB)  |   13066 -> 12737      (-329 / +9386)   / 975000  |
|     1301 |   13.0 |   447 GB |    13 GB -> 15 GB     (2.6 GB / 4.5 GB)  |    14 GB -> 17 GB     (2.7 GB / 4.6 GB)  |    7244 -> 10077     (+2833 / +6714)   / 480000  |
|     1302 |   13.0 |   447 GB |    12 GB -> 15 GB     (3.0 GB / 4.9 GB)  |    13 GB -> 17 GB     (3.2 GB / 5.2 GB)  |    7507 -> 10056     (+2549 / +7011)   / 480000  |
|     1303 |   13.0 |   447 GB |    14 GB -> 15 GB     (1.3 GB / 3.2 GB)  |    15 GB -> 17 GB     (1.3 GB / 3.3 GB)  |    7888 -> 10020     (+2132 / +6926)   / 480000  |
|     1304 |   13.0 |   447 GB |    13 GB -> 15 GB     (2.7 GB / 4.7 GB)  |    14 GB -> 17 GB     (2.8 GB / 4.9 GB)  |    7660 -> 10075     (+2415 / +7049)   / 480000  |
|     1311 |   13.0 |   466 GB |   4.7 MB -> 4.7 MB      (0  B / 0  B)    |   6.1 MB -> 6.1 MB      (0  B / 0  B)    |      26 -> 26           (+0 / +0)      / 495000  |
|     1312 |   13.0 |   366 GB |   4.7 MB -> 4.7 MB      (0  B / 0  B)    |   6.1 MB -> 6.1 MB      (0  B / 0  B)    |      26 -> 26           (+0 / +0)      / 390000  |
|     1321 |   13.0 |   931 GB |    21 GB -> 21 GB    (-200 MB / 4.1 GB)  |    21 GB -> 21 GB    (-192 MB / 4.3 GB)  |   13365 -> 12690      (-675 / +9527)   / 975000  |
|     1322 |   13.0 |   931 GB |    22 GB -> 21 GB    (-1.3 GB / 6.9 GB)  |    23 GB -> 21 GB    (-1.3 GB / 7.2 GB)  |   12749 -> 12698       (-51 / +10047)  / 975000  |
|     1323 |   13.0 |   931 GB |    21 GB -> 21 GB    (-495 MB / 6.1 GB)  |    22 GB -> 21 GB    (-504 MB / 6.3 GB)  |   13386 -> 12693      (-693 / +9524)   / 975000  |
|     1325 |   13.0 |   931 GB |    21 GB -> 21 GB    (-620 MB / 6.6 GB)  |    22 GB -> 21 GB    (-612 MB / 6.9 GB)  |   13113 -> 12768      (-345 / +9942)   / 975000  |
|     1326 |   13.0 |   931 GB |    21 GB -> 21 GB    (-498 MB / 7.1 GB)  |    22 GB -> 21 GB    (-414 MB / 7.4 GB)  |   13690 -> 12697      (-993 / +9759)   / 975000  |
|     1401 |   14.0 |   223 GB |   8.3 GB -> 7.6 GB   (-670 MB / 950 MB)  |   9.3 GB -> 8.5 GB   (-789 MB / 993 MB)  |    3470 -> 5061      (+1591 / +3262)   / 240000  |
|     1402 |   14.0 |   447 GB |   9.8 GB -> 15 GB     (5.6 GB / 7.1 GB)  |    11 GB -> 17 GB     (5.8 GB / 7.5 GB)  |    4358 -> 10052     (+5694 / +7092)   / 480000  |
|     1403 |   14.0 |   224 GB |   8.2 GB -> 7.6 GB   (-619 MB / 730 MB)  |   9.3 GB -> 8.5 GB   (-758 MB / 759 MB)  |    4547 -> 5023       (+476 / +2567)   / 240000  |
|     1404 |   14.0 |   224 GB |   8.4 GB -> 7.6 GB   (-790 MB / 915 MB)  |   9.4 GB -> 8.5 GB   (-918 MB / 946 MB)  |    4369 -> 5062       (+693 / +2483)   / 240000  |
|     1411 |   14.0 |   466 GB |   4.7 MB -> 4.7 MB      (0  B / 0  B)    |   6.0 MB -> 6.0 MB      (0  B / 0  B)    |      26 -> 26           (+0 / +0)      / 495000  |
|     1412 |   14.0 |   366 GB |   4.7 MB -> 4.7 MB      (0  B / 0  B)    |   6.0 MB -> 6.0 MB      (0  B / 0  B)    |      26 -> 26           (+0 / +0)      / 390000  |
|     1421 |   14.0 |   931 GB |    19 GB -> 21 GB     (2.0 GB / 6.8 GB)  |    19 GB -> 21 GB     (2.1 GB / 7.0 GB)  |   10670 -> 12695     (+2025 / +10814)  / 975000  |
|     1422 |   14.0 |   931 GB |    19 GB -> 21 GB     (1.6 GB / 7.4 GB)  |    20 GB -> 21 GB     (1.7 GB / 7.7 GB)  |   10653 -> 12702     (+2049 / +10414)  / 975000  |
|     1423 |   14.0 |   931 GB |    19 GB -> 21 GB     (2.0 GB / 7.4 GB)  |    19 GB -> 21 GB     (2.1 GB / 7.8 GB)  |   10715 -> 12683     (+1968 / +10418)  / 975000  |
|     1424 |   14.0 |   931 GB |    18 GB -> 21 GB     (2.2 GB / 8.0 GB)  |    19 GB -> 21 GB     (2.3 GB / 8.3 GB)  |   10723 -> 12824     (+2101 / +9573)   / 975000  |
|     1425 |   14.0 |   931 GB |    19 GB -> 21 GB     (1.3 GB / 5.8 GB)  |    20 GB -> 21 GB     (1.4 GB / 6.1 GB)  |   10702 -> 12686     (+1984 / +10231)  / 975000  |
|     1426 |   14.0 |   931 GB |    20 GB -> 21 GB     (1.0 GB / 6.5 GB)  |    20 GB -> 21 GB     (1.2 GB / 6.8 GB)  |   10737 -> 12650     (+1913 / +10974)  / 975000  |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|       45 |    4.0 |    29 TB |   652 GB -> 653 GB    (1.2 GB / 173 GB)  |   686 GB -> 687 GB    (1.2 GB / 180 GB)  |  412818 -> 412818       (+0 / +288439) / 30885000 |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------

This time the total amount of data to be moved is 180 GB. It's possible to have a difference of an order of magnitude in the total data to be moved between -c 0 and -c 10. Usually the best results are achieved by using -F directly, with rare occasions requiring a full re-balance (i.e. no -F and higher -c values).

Balancer tool output

Here’s an example of the output of the balancer tool, in non-verbose mode:

 1 -== BEFORE BALANCE ==-
 2 shards with decreased redundancy 0 (0, 0, 0)
 3 server constraint violations 0
 4 stripe constraint violations 6652
 5 placement group violations 1250
 6 pg hdd score 0.6551, objectsScore 0.0269
 7 pg ssd score 0.6824, objectsScore 0.0280
 8 pg hdd estFree  45T
 9 pg ssd estFree  19T
10 Constraint violations detected, doing a replication-restore update first
11 server constraint violations 0
12 stripe constraint violations 7031
13 placement group violations 0
14 -== POST BALANCE ==-
15 shards with decreased redundancy 0 (0, 0, 0)
16 server constraint violations 0
17 stripe constraint violations 6592
18 placement group violations 0
19 moves 14387, (1864GiB) (tail ssd 14387)
20 pg hdd score 0.6551, objectsScore 0.0269, maxDataToSingleDrive 33 GiB
21 pg ssd score 0.6939, objectsScore 0.0285, maxDataToSingleDrive 76 GiB
22 pg hdd estFree  47T
23 pg ssd estFree  19T

The run of the balancer tool has multiple steps.

First, it shows the current state of the system (lines 2-9):

  • Shards (volume pieces) with decreased redundancy.

  • Server constraint violations mean that there are pieces of data which have two or more of their copies on the same server. This is an error condition.

  • “stripe constraint violation” means that specific pieces of data are not optimally striped on the drives of a specific server. This is NOT an error condition.

  • “placement group violations” means that some data is not placed according to its placement group (in most cases due to a missing drive). This is an error condition.

  • Lines 6 and 7 show the current average “score” (usage in %) of the placement groups, for data and objects.

  • Lines 8 and 9 show the estimated free space for the placement groups.

Then, in this run, the tool has detected problems (in this case placement group violations, which in most cases mean a missing drive) and has done a pre-run to restore the redundancy (line 10), printing the resulting state on lines 11-13.

Finally, it runs the balancing and reports the results. The main difference here is that for each placement group it also reports the maximum amount of data that will be added to a single drive. As the balancing happens in parallel on all drives, this is a handy measure of how long the rebalancing would take (in comparison with a different balancing run that might not add as much data to a single drive).

Errors from the balancer tool

If the balancer tool doesn’t complete successfully, its output MUST be examined and the root cause fixed.

Miscellaneous

If for any reason the currently running rebalancing operation needs to be paused, this can be done via storpool relocator off. In such cases StorPool Support should also be contacted, as this should not normally be needed. Re-enabling the relocator is done via storpool relocator on.
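
For reference, the corresponding commands:

storpool relocator off   # pause the currently running rebalancing operation
storpool relocator on    # re-enable the relocator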