Rebalancing the cluster

Overview

In some situations the data in the StorPool cluster needs to be rebalanced. This is performed by the balancer and relocator tools. The relocator is an integral part of the StorPool management service; the balancer is currently a separate tool, available on and executed from some of the nodes with access to the API.

Note

Be advised that the balancer tool will create some files it needs in the current working directory.
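
It is therefore convenient to run it from a dedicated working directory, as the examples in this document do:

mkdir -p ~/storpool/balancer && cd ~/storpool/balancer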

Rebalancing procedure

The rebalancing operation is performed in the following steps:

  • The balancer tool is executed to calculate the new state of the cluster.

  • The results from the balancer are verified by a set of automated scripts.

  • The results are also manually reviewed to check whether they contain any inconsistencies and whether they achieve the intended goals. These results are available by running storpool balancer disks, and will also be printed at the end of balancer.sh.

    If the result is not satisfactory, the balancer is executed with different parameters, until a satisfactory result is obtained.

  • Once the proposed end result is satisfactory, the calculated state is loaded into the relocator tool, by doing storpool balancer commit.

    Note that this step can be reversed only with the --restore-state option, which will revert to the initial state. If a balancing operation has run for a while and for some reason needs to be “cancelled”, that is currently not supported.

  • The relocator tool performs the actual move of the data.

    The progress of the relocator tool can be monitored by storpool task list for the currently running tasks, storpool relocator status for an overview of the relocator state and storpool relocator disks (warning: slow command) for the full relocation state.
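
    A minimal monitoring sketch using these commands:

    storpool task list        # currently running relocation tasks
    storpool relocator status # overview of the relocator state
    storpool relocator disks  # full relocation state (warning: slow command)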

Options

The balancer tool is executed via the /usr/lib/storpool/balancer.sh wrapper and accepts the following options:

-A

Don’t only move data from fuller to emptier drives.

-b placementGroup

Use disks in the specified placement group to restore replication in critical conditions.

-c factor

Factor for how much data to try to move around, from 0 to 10; the default is 0, and the factor argument is required when the option is used. In most cases -c is not needed. The main use case, with -c 10, is for 3-node clusters, where data needs to be “rotated” through the cluster.

-d diskId [-d diskId]

Put data only on the selected disks.

-D diskId [-D diskId]

Don’t move data from those disks.

--do-whatever-you-can

Decrease the redundancy level.

Warning

The --do-whatever-you-can option is for emergency use only, after the balancer has failed!

-E 0-99

Don’t empty drives whose fill level is below this percentage (0-99).

--empty-down-disks

Proceed with balancing even when there are down disks, and remove all data from them.

-f percent

Allow drives to be filled up to this percentage, from 0 to 99. Default 90.

-F

Only move data from fuller to emptier drives.

-g placementGroup

Work only on the specified placement group.

--ignore-down-disks

Proceed with balancing even when there are down disks, and do not remove data from them.

--ignore-src-pg-violations

Exactly what the name says: ignore placement group violations on the source disks.

-m maxAgCount

Limit the maximum allocation group count on drives to this (effectively their usable size).

-M maxDataToAdd

Limit the amount of data to copy to a single drive, to be able to rebalance “in pieces”.

--max-disbalance-before-striping X

The maximum disbalance allowed before striping, in percent.

--min-disk-full X

Don’t remove data from disk if it is not at least this X% full.

--min-replication R

Minimum replication required.

-o overridesPgName

Specify the override placement group name (required only if the override template has not been created).

--only-empty-disk diskId

Empty only the specified disk; equivalent to specifying -D for all other disks.

-R

Only restore replication for degraded volumes.

--restore-state

Revert to the initial state of the disks (before the balancer commit execution).

-S

Prefer tail SSD.

-V vagId [-V vagId]

Skip balancing vagId.

-v

Verbose output. Shows how all drives in the cluster would be affected by the balancer's proposal. This differs from the later output of storpool balancer disks, which is the point of view of storpool_mgmt and also takes into account all currently loaded relocations.

-A and -F are the reverse of each other and mutually exclusive.

The -c value is essentially the trade-off between the uniformity of the data on the disks and the amount of data moved to accomplish it. A lower factor means less data moved around, but sometimes more inequality between the data on the disks; a higher factor means more data moved, but sometimes a better result in terms of how equally the data is spread over the drives.
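
For example, one could compare the proposals from two runs before committing; a sketch (-c 4 here is just an arbitrary middle value):

/usr/lib/storpool/balancer.sh -F -c 0   # least data moved around
/usr/lib/storpool/balancer.sh -F -c 4   # more data moved, possibly a more uniform result
storpool balancer disks                 # review the proposed state before committing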

On clusters with drives of unsupported size (HDDs larger than 4 TB) the -m option is required. It limits the data moved onto these drives to the set number of allocation groups. This is done because the performance per TB of larger drives is lower, which would degrade the performance of the whole cluster in high-performance use cases.

The -M option is useful when a full rebalancing would involve many tasks until completion and could impact other operations (such as remote transfers, or the time required for a currently running recovery to complete). With -M the amount of data loaded by the balancer for each disk can be reduced, and a more balanced state is achieved through several smaller rebalancing operations.
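
A sketch of such a piecewise rebalancing; note that the exact value format for -M is an assumption here, so check the usage output of balancer.sh:

/usr/lib/storpool/balancer.sh -F -M 100G   # assumed syntax: cap the data copied to any single drive in this pass
storpool balancer commit
# wait for the relocations to finish, then repeat until no further moves are proposed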

The -f option is required on clusters whose drives are more than 90% full. Extreme care should be taken when balancing in such cases.
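
For example, on a cluster whose drives are around 92% full (a sketch):

/usr/lib/storpool/balancer.sh -F -f 95   # allow filling the drives up to 95%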

The -b option could be used to move data between placementGroups (in most cases from SSDs to HDDs).

Restoring volume redundancy on a failed drive

Situation: we have lost drive 1802 in placementGroup ssd. We want to remove it from the cluster and restore the redundancy of the data. We need to do the following:

storpool disk 1802 forget                               # this will also remove the drive from all placement groups it participated in
mkdir -p ~/storpool/balancer && cd ~/storpool/balancer  # it's recommended to run the following commands in a screen/tmux session
/usr/lib/storpool/balancer.sh -R
storpool balancer commit                                # to actually load the data into the relocator and start the re-balancing operation

Restoring volume redundancy for two failed drives (single-copy situation)

(Emergency) Situation: we have lost drives 1802 and 1902 in placementGroup ssd. We want to remove them from the cluster and restore the redundancy of the data. We need to do the following:

storpool disk 1802 forget                               # this will also remove the drive from all placement groups it participated in
storpool disk 1902 forget                               # this will also remove the drive from all placement groups it participated in
mkdir -p ~/storpool/balancer && cd ~/storpool/balancer  # it's recommended to run the following commands in a screen/tmux session
/usr/lib/storpool/balancer.sh -F --min-replication 2    # first balancing run, to create a second copy of the data
storpool balancer commit                                # to actually load the data into the relocator and start the re-balancing operation
# wait for the balancing to finish

/usr/lib/storpool/balancer.sh -R                        # second balancing run, to restore full redundancy
storpool balancer commit                                # to actually load the data into the relocator and start the re-balancing operation

Adding new drives and rebalancing data on them

Situation: we have added SSDs 1201, 1202 and HDDs 1510, 1511, that need to go into placement groups ssd and hdd respectively, and we want to re-balance the cluster data so that it is re-dispersed onto the new disks as well. We have no other placement groups in the cluster.

storpool placementGroup ssd addDisk 1201 addDisk 1202
storpool placementGroup hdd addDisk 1510 addDisk 1511
mkdir -p ~/storpool/balancer && cd ~/storpool/balancer  # it's recommended to run the following commands in a screen/tmux session
/usr/lib/storpool/balancer.sh -F -c 0                   # rebalance all placement groups, move data from fuller to emptier drives
storpool balancer commit                                # to actually load the data into the relocator and start the re-balancing operation

Restoring volume redundancy with rebalancing data on other placementGroup

Situation: we have to restore the redundancy of a hybrid cluster (2 copies on HDDs, one on SSDs) while the ssd placementGroup is out of free space because a few SSDs have recently failed. We can’t replace the failed drives with new ones for the moment.

mkdir -p ~/storpool/balancer && cd ~/storpool/balancer  # it's recommended to run the following commands in a screen/tmux session
/usr/lib/storpool/balancer.sh -R -c 0 -b hdd            # use placementGroup ``hdd`` as a backup and move some data from SSDs
storpool balancer commit                                # to actually load the data into the relocator and start the re-balancing operation

Note

The -f argument can further be used to instruct the balancer how full to keep the drives, and thus control how much data will be moved to the backup placement group.

Decommissioning a live node

Situation: a node in the cluster needs to be decommissioned, so that the data on its drives needs to be moved away. The drive numbers on that node are 101, 102 and 103.

Note

You have to make sure you have enough space to restore the redundancy before proceeding.

storpool disk 101 softEject                             # mark all drives for evacuation
storpool disk 102 softEject
storpool disk 103 softEject
mkdir -p ~/storpool/balancer && cd ~/storpool/balancer  # it's recommended to run the following commands in a screen/tmux session
/usr/lib/storpool/balancer.sh -R -c 0                   # rebalance all placement groups, -F has the same effect in this case
storpool balancer commit                                # to actually load the data into the relocator and start the re-balancing operation

Decommissioning a dead node

Situation: a node in the cluster needs to be decommissioned, as it has died and cannot be brought back. The drive numbers on that node are 101, 102 and 103.

Note

You have to make sure you have enough space to restore the redundancy before proceeding.

storpool disk 101 forget                                # remove the drives from all placement groups
storpool disk 102 forget
storpool disk 103 forget
mkdir -p ~/storpool/balancer && cd ~/storpool/balancer  # it's recommended to run the following commands in a screen/tmux session
/usr/lib/storpool/balancer.sh -R -c 0                   # rebalance all placement groups
storpool balancer commit                                # to actually load the data into the relocator and start the re-balancing operation

Tip

Alternatively, you can try adding the disks from the dead node into another live node in the cluster, and then running another re-balance operation.

Resolving imbalances in the drive usage

Situation: we have an imbalance in the drive usage in the whole cluster and we want to improve it.

mkdir -p ~/storpool/balancer && cd ~/storpool/balancer  # it's recommended to run the following commands in a screen/tmux session
/usr/lib/storpool/balancer.sh -F -c 0                   # rebalance all placement groups
storpool balancer commit                                # to actually load the data into the relocator and start the re-balancing operation

Resolving imbalances in the drive usage with three-node clusters

Situation: we have an imbalance in the drive usage in the whole cluster and we want to improve it. We have a three-node hybrid cluster and proper balancing requires larger moves of “unrelated” data:

mkdir -p ~/storpool/balancer && cd ~/storpool/balancer  # it's recommended to run the following commands in a screen/tmux session
/usr/lib/storpool/balancer.sh -F -c 0                   # rebalance all placement groups
/usr/lib/storpool/balancer.sh -A -c 10                  # retry to see if we get a better result with more data movements
storpool balancer commit                                # to actually load the data into the relocator and start the re-balancing operation

Reverting balancer to a previous state

Situation: we have committed a rebalancing operation, but want to revert back to the previous state:

cd ~/storpool/balancer                                             # it's recommended to run the following commands in a screen/tmux session
ls                                                                 # list all saved states and choose what to revert to
/usr/lib/storpool/balancer.sh --restore-state 2022-10-28-15-39-40  # revert to 2022-10-28-15-39-40
storpool balancer commit                                           # to actually load the data into the relocator and start the re-balancing operation

Reading the output of storpool balancer disks

Here is an example output from storpool balancer disks:

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|     disk | server |   size   |                  stored                  |                 on-disk                  |                     objects                      |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|        1 |   14.0 |   373 GB |   4.7 MB -> 4.7 MB      (0  B / 0  B)    |   6.1 MB -> 6.1 MB      (0  B / 0  B)    |      26 -> 26           (+0 / +0)      / 405000  |
|     1101 |   11.0 |   447 GB |    16 GB -> 15 GB    (-1.0 GB / 1.4 GB)  |    18 GB -> 17 GB    (-1.1 GB / 1.4 GB)  |   11798 -> 10040     (-1758 / +3932)   / 480000  |
|     1102 |   11.0 |   447 GB |    16 GB -> 15 GB    (-268 MB / 1.3 GB)  |    17 GB -> 17 GB    (-301 MB / 1.4 GB)  |   10843 -> 10045      (-798 / +4486)   / 480000  |
|     1103 |   11.0 |   447 GB |    16 GB -> 15 GB    (-1.0 GB / 1.8 GB)  |    18 GB -> 16 GB    (-1.2 GB / 1.9 GB)  |   12123 -> 10039     (-2084 / +3889)   / 480000  |
|     1104 |   11.0 |   447 GB |    16 GB -> 15 GB    (-757 MB / 1.3 GB)  |    17 GB -> 16 GB    (-899 MB / 1.3 GB)  |   11045 -> 10072      (-973 / +4279)   / 480000  |
|     1111 |   11.0 |   466 GB |   4.7 MB -> 4.7 MB      (0  B / 0  B)    |   5.1 MB -> 5.1 MB      (0  B / 0  B)    |      26 -> 26           (+0 / +0)      / 495000  |
|     1112 |   11.0 |   366 GB |   4.7 MB -> 4.7 MB      (0  B / 0  B)    |   5.1 MB -> 5.1 MB      (0  B / 0  B)    |      26 -> 26           (+0 / +0)      / 390000  |
|     1121 |   11.0 |   931 GB |    22 GB -> 21 GB    (-1009 MB / 830 MB)  |    22 GB -> 21 GB    (-1.0 GB / 872 MB)  |   13713 -> 12698     (-1015 / +3799)   / 975000  |
|     1122 |   11.0 |   931 GB |    21 GB -> 21 GB    (-373 MB / 2.0 GB)  |    22 GB -> 21 GB    (-379 MB / 2.0 GB)  |   13469 -> 12742      (-727 / +3801)   / 975000  |
|     1123 |   11.0 |   931 GB |    22 GB -> 21 GB    (-1.1 GB / 1.9 GB)  |    22 GB -> 21 GB    (-1.1 GB / 2.0 GB)  |   14859 -> 12629     (-2230 / +4102)   / 975000  |
|     1124 |   11.0 |   931 GB |    21 GB -> 21 GB      (36 MB / 1.8 GB)  |    21 GB -> 21 GB      (92 MB / 1.9 GB)  |   13806 -> 12743     (-1063 / +3389)   / 975000  |
|     1201 |   12.0 |   447 GB |    18 GB -> 15 GB    (-2.9 GB / 633 MB)  |    19 GB -> 16 GB    (-3.0 GB / 658 MB)  |   14148 -> 10070     (-4078 / +3050)   / 480000  |
|     1202 |   12.0 |   447 GB |    17 GB -> 15 GB    (-2.1 GB / 787 MB)  |    19 GB -> 16 GB    (-2.3 GB / 815 MB)  |   13243 -> 10067     (-3176 / +2576)   / 480000  |
|     1203 |   12.0 |   447 GB |    17 GB -> 15 GB    (-2.0 GB / 3.3 GB)  |    19 GB -> 16 GB    (-2.4 GB / 3.5 GB)  |   12746 -> 10062     (-2684 / +3375)   / 480000  |
|     1204 |   12.0 |   447 GB |    18 GB -> 15 GB    (-2.7 GB / 1.1 GB)  |    19 GB -> 16 GB    (-2.9 GB / 1.1 GB)  |   12835 -> 10075     (-2760 / +3248)   / 480000  |
|     1212 |   12.0 |   366 GB |   4.7 MB -> 4.7 MB      (0  B / 0  B)    |   6.0 MB -> 6.0 MB      (0  B / 0  B)    |      26 -> 26           (+0 / +0)      / 390000  |
|     1221 |   12.0 |   931 GB |    20 GB -> 21 GB     (569 MB / 1.5 GB)  |    21 GB -> 21 GB     (587 MB / 1.6 GB)  |   13115 -> 12616      (-499 / +3736)   / 975000  |
|     1222 |   12.0 |   931 GB |    22 GB -> 21 GB    (-979 MB / 307 MB)  |    22 GB -> 21 GB    (-1013 MB / 317 MB)  |   12938 -> 12697      (-241 / +3291)   / 975000  |
|     1223 |   12.0 |   931 GB |    22 GB -> 21 GB    (-1.1 GB / 781 MB)  |    22 GB -> 21 GB    (-1.2 GB / 812 MB)  |   13968 -> 12718     (-1250 / +3302)   / 975000  |
|     1224 |   12.0 |   931 GB |    21 GB -> 21 GB    (-784 MB / 332 MB)  |    22 GB -> 21 GB    (-810 MB / 342 MB)  |   13741 -> 12692     (-1049 / +3314)   / 975000  |
|     1225 |   12.0 |   931 GB |    21 GB -> 21 GB    (-681 MB / 849 MB)  |    22 GB -> 21 GB    (-701 MB / 882 MB)  |   13608 -> 12748      (-860 / +3420)   / 975000  |
|     1226 |   12.0 |   931 GB |    22 GB -> 21 GB    (-1.1 GB / 825 MB)  |    22 GB -> 21 GB    (-1.1 GB / 853 MB)  |   13066 -> 12692      (-374 / +3817)   / 975000  |
|     1301 |   13.0 |   447 GB |    13 GB -> 15 GB     (2.6 GB / 4.2 GB)  |    14 GB -> 17 GB     (2.7 GB / 4.4 GB)  |    7244 -> 10038     (+2794 / +6186)   / 480000  |
|     1302 |   13.0 |   447 GB |    12 GB -> 15 GB     (3.0 GB / 3.7 GB)  |    13 GB -> 17 GB     (3.1 GB / 3.9 GB)  |    7507 -> 10063     (+2556 / +5619)   / 480000  |
|     1303 |   13.0 |   447 GB |    14 GB -> 15 GB     (1.3 GB / 3.2 GB)  |    15 GB -> 17 GB     (1.3 GB / 3.4 GB)  |    7888 -> 10038     (+2150 / +5884)   / 480000  |
|     1304 |   13.0 |   447 GB |    13 GB -> 15 GB     (2.7 GB / 3.7 GB)  |    14 GB -> 17 GB     (2.8 GB / 3.9 GB)  |    7660 -> 10045     (+2385 / +5870)   / 480000  |
|     1311 |   13.0 |   466 GB |   4.7 MB -> 4.7 MB      (0  B / 0  B)    |   6.1 MB -> 6.1 MB      (0  B / 0  B)    |      26 -> 26           (+0 / +0)      / 495000  |
|     1312 |   13.0 |   366 GB |   4.7 MB -> 4.7 MB      (0  B / 0  B)    |   6.1 MB -> 6.1 MB      (0  B / 0  B)    |      26 -> 26           (+0 / +0)      / 390000  |
|     1321 |   13.0 |   931 GB |    21 GB -> 21 GB    (-193 MB / 1.1 GB)  |    21 GB -> 21 GB    (-195 MB / 1.2 GB)  |   13365 -> 12765      (-600 / +5122)   / 975000  |
|     1322 |   13.0 |   931 GB |    22 GB -> 21 GB    (-1.4 GB / 1.1 GB)  |    23 GB -> 21 GB    (-1.4 GB / 1.1 GB)  |   12749 -> 12739       (-10 / +4651)   / 975000  |
|     1323 |   13.0 |   931 GB |    21 GB -> 21 GB    (-504 MB / 2.2 GB)  |    22 GB -> 21 GB    (-496 MB / 2.3 GB)  |   13386 -> 12695      (-691 / +4583)   / 975000  |
|     1325 |   13.0 |   931 GB |    21 GB -> 20 GB    (-698 MB / 557 MB)  |    22 GB -> 21 GB    (-717 MB / 584 MB)  |   13113 -> 12768      (-345 / +2668)   / 975000  |
|     1326 |   13.0 |   931 GB |    21 GB -> 21 GB    (-507 MB / 724 MB)  |    22 GB -> 21 GB    (-522 MB / 754 MB)  |   13690 -> 12704      (-986 / +3327)   / 975000  |
|     1401 |   14.0 |   223 GB |   8.3 GB -> 7.6 GB   (-666 MB / 868 MB)  |   9.3 GB -> 8.5 GB   (-781 MB / 901 MB)  |    3470 -> 5043      (+1573 / +2830)   / 240000  |
|     1402 |   14.0 |   447 GB |   9.8 GB -> 15 GB     (5.6 GB / 5.7 GB)  |    11 GB -> 17 GB     (5.8 GB / 6.0 GB)  |    4358 -> 10060     (+5702 / +6667)   / 480000  |
|     1403 |   14.0 |   224 GB |   8.2 GB -> 7.6 GB   (-623 MB / 1.1 GB)  |   9.3 GB -> 8.6 GB   (-710 MB / 1.2 GB)  |    4547 -> 5036       (+489 / +2814)   / 240000  |
|     1404 |   14.0 |   224 GB |   8.4 GB -> 7.6 GB   (-773 MB / 1.5 GB)  |   9.4 GB -> 8.5 GB   (-970 MB / 1.6 GB)  |    4369 -> 5031       (+662 / +2368)   / 240000  |
|     1411 |   14.0 |   466 GB |   4.7 MB -> 4.7 MB      (0  B / 0  B)    |   6.0 MB -> 6.0 MB      (0  B / 0  B)    |      26 -> 26           (+0 / +0)      / 495000  |
|     1412 |   14.0 |   366 GB |   4.7 MB -> 4.7 MB      (0  B / 0  B)    |   6.0 MB -> 6.0 MB      (0  B / 0  B)    |      26 -> 26           (+0 / +0)      / 390000  |
|     1421 |   14.0 |   931 GB |    19 GB -> 21 GB     (1.9 GB / 2.6 GB)  |    19 GB -> 21 GB     (2.0 GB / 2.7 GB)  |   10670 -> 12624     (+1954 / +6196)   / 975000  |
|     1422 |   14.0 |   931 GB |    19 GB -> 21 GB     (1.6 GB / 3.2 GB)  |    20 GB -> 21 GB     (1.6 GB / 3.3 GB)  |   10653 -> 12844     (+2191 / +6919)   / 975000  |
|     1423 |   14.0 |   931 GB |    19 GB -> 21 GB     (1.9 GB / 2.5 GB)  |    19 GB -> 21 GB     (2.0 GB / 2.6 GB)  |   10715 -> 12688     (+1973 / +5846)   / 975000  |
|     1424 |   14.0 |   931 GB |    18 GB -> 20 GB     (2.2 GB / 2.9 GB)  |    19 GB -> 21 GB     (2.3 GB / 3.0 GB)  |   10723 -> 12686     (+1963 / +5505)   / 975000  |
|     1425 |   14.0 |   931 GB |    19 GB -> 21 GB     (1.3 GB / 2.5 GB)  |    20 GB -> 21 GB     (1.4 GB / 2.6 GB)  |   10702 -> 12689     (+1987 / +5486)   / 975000  |
|     1426 |   14.0 |   931 GB |    20 GB -> 21 GB     (1.0 GB / 2.5 GB)  |    20 GB -> 21 GB     (1.0 GB / 2.6 GB)  |   10737 -> 12609     (+1872 / +5771)   / 975000  |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|       45 |    4.0 |    29 TB |   652 GB -> 652 GB    (512 MB / 69 GB)   |   686 GB -> 685 GB   (-240 MB / 72 GB)   |  412818 -> 412818       (+0 / +159118) / 30885000 |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Let’s start with the last line. Here’s the meaning, field by field:

  • There are 45 drives in total.

  • There are 4 server instances.

  • The total disk capacity is 29 TB.

  • The stored data is 652 GB and will remain 652 GB after the rebalancing. The net change across all drives is 512 MB, and the total amount of data changed on the drives is 69 GB (i.e. how much they will “recover” from other drives).

  • The same is repeated for the on-disk size. Here the total amount of changes is roughly the amount of data that would need to be copied.

  • The total number of objects will not change (i.e. from 412818 to 412818), 0 new objects will be created, the total number of objects to be moved is 159118, and the total number of possible objects in the cluster is 30885000.

The difference between the “stored” and “on-disk” sizes is that the latter also includes the size of checksums and metadata.

For the rest of the lines, the data is basically the same, just per disk.
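
For example, take disk 1402 above: its stored data grows from 9.8 GB to 15 GB, a net change of +5.6 GB out of 5.7 GB moved onto the drive in total; its on-disk data grows from 11 GB to 17 GB; and its object count grows from 4358 to 10060 (+5702 net, +6667 moved in total) out of a maximum of 480000 objects.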

What needs to be taken into account is:

  • Are there drives that will have too much data on them? Here, both data size and objects must be checked, and they should be close to the average percentage for the placement group.

  • Is the data stored on the drives balanced, i.e. are all the drives’ usages close to the average?

  • Are there drives that should have data on them, but nothing is scheduled to be moved?

    This usually happens because a drive wasn’t added to the right placement group.

  • Will there be too much data to be moved?

To illustrate the difference in the amount of data to be moved, here is the output of storpool balancer disks from a run with -c 10:

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|     disk | server |   size   |                  stored                  |                 on-disk                  |                     objects                      |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|        1 |   14.0 |   373 GB |   4.7 MB -> 4.7 MB      (0  B / 0  B)    |   6.1 MB -> 6.1 MB      (0  B / 0  B)    |      26 -> 26           (+0 / +0)      / 405000  |
|     1101 |   11.0 |   447 GB |    16 GB -> 15 GB    (-1.0 GB / 1.7 GB)  |    18 GB -> 17 GB    (-1.1 GB / 1.7 GB)  |   11798 -> 10027     (-1771 / +5434)   / 480000  |
|     1102 |   11.0 |   447 GB |    16 GB -> 15 GB    (-263 MB / 1.7 GB)  |    17 GB -> 17 GB    (-298 MB / 1.7 GB)  |   10843 -> 10000      (-843 / +5420)   / 480000  |
|     1103 |   11.0 |   447 GB |    16 GB -> 15 GB    (-1.0 GB / 3.6 GB)  |    18 GB -> 16 GB    (-1.2 GB / 3.8 GB)  |   12123 -> 10005     (-2118 / +6331)   / 480000  |
|     1104 |   11.0 |   447 GB |    16 GB -> 15 GB    (-752 MB / 2.7 GB)  |    17 GB -> 16 GB    (-907 MB / 2.8 GB)  |   11045 -> 10098      (-947 / +5214)   / 480000  |
|     1111 |   11.0 |   466 GB |   4.7 MB -> 4.7 MB      (0  B / 0  B)    |   5.1 MB -> 5.1 MB      (0  B / 0  B)    |      26 -> 26           (+0 / +0)      / 495000  |
|     1112 |   11.0 |   366 GB |   4.7 MB -> 4.7 MB      (0  B / 0  B)    |   5.1 MB -> 5.1 MB      (0  B / 0  B)    |      26 -> 26           (+0 / +0)      / 390000  |
|     1121 |   11.0 |   931 GB |    22 GB -> 21 GB    (-1003 MB / 6.4 GB)  |    22 GB -> 21 GB    (-1018 MB / 6.7 GB)  |   13713 -> 12742      (-971 / +9712)   / 975000  |
|     1122 |   11.0 |   931 GB |    21 GB -> 21 GB    (-368 MB / 5.8 GB)  |    22 GB -> 21 GB    (-272 MB / 6.1 GB)  |   13469 -> 12718      (-751 / +8929)   / 975000  |
|     1123 |   11.0 |   931 GB |    22 GB -> 21 GB    (-1.1 GB / 5.9 GB)  |    22 GB -> 21 GB    (-1.1 GB / 6.1 GB)  |   14859 -> 12699     (-2160 / +8992)   / 975000  |
|     1124 |   11.0 |   931 GB |    21 GB -> 21 GB      (57 MB / 7.4 GB)  |    21 GB -> 21 GB     (113 MB / 7.7 GB)  |   13806 -> 12697     (-1109 / +9535)   / 975000  |
|     1201 |   12.0 |   447 GB |    18 GB -> 15 GB    (-2.8 GB / 1.2 GB)  |    19 GB -> 17 GB    (-3.0 GB / 1.2 GB)  |   14148 -> 10033     (-4115 / +4853)   / 480000  |
|     1202 |   12.0 |   447 GB |    17 GB -> 15 GB    (-2.0 GB / 1.6 GB)  |    19 GB -> 16 GB    (-2.2 GB / 1.7 GB)  |   13243 -> 10055     (-3188 / +4660)   / 480000  |
|     1203 |   12.0 |   447 GB |    17 GB -> 15 GB    (-2.0 GB / 2.3 GB)  |    19 GB -> 16 GB    (-2.3 GB / 2.4 GB)  |   12746 -> 10070     (-2676 / +4682)   / 480000  |
|     1204 |   12.0 |   447 GB |    18 GB -> 15 GB    (-2.7 GB / 2.1 GB)  |    19 GB -> 16 GB    (-2.8 GB / 2.2 GB)  |   12835 -> 10110     (-2725 / +5511)   / 480000  |
|     1212 |   12.0 |   366 GB |   4.7 MB -> 4.7 MB      (0  B / 0  B)    |   6.0 MB -> 6.0 MB      (0  B / 0  B)    |      26 -> 26           (+0 / +0)      / 390000  |
|     1221 |   12.0 |   931 GB |    20 GB -> 21 GB     (620 MB / 6.3 GB)  |    21 GB -> 21 GB     (805 MB / 6.7 GB)  |   13115 -> 12542      (-573 / +9389)   / 975000  |
|     1222 |   12.0 |   931 GB |    22 GB -> 21 GB    (-981 MB / 2.9 GB)  |    22 GB -> 21 GB    (-1004 MB / 3.0 GB)  |   12938 -> 12793      (-145 / +8795)   / 975000  |
|     1223 |   12.0 |   931 GB |    22 GB -> 21 GB    (-1.1 GB / 5.9 GB)  |    22 GB -> 21 GB    (-1.1 GB / 6.1 GB)  |   13968 -> 12698     (-1270 / +10094)  / 975000  |
|     1224 |   12.0 |   931 GB |    21 GB -> 21 GB    (-791 MB / 4.5 GB)  |    22 GB -> 21 GB    (-758 MB / 4.7 GB)  |   13741 -> 12684     (-1057 / +8616)   / 975000  |
|     1225 |   12.0 |   931 GB |    21 GB -> 21 GB    (-671 MB / 4.8 GB)  |    22 GB -> 21 GB    (-677 MB / 4.9 GB)  |   13608 -> 12690      (-918 / +8559)   / 975000  |
|     1226 |   12.0 |   931 GB |    22 GB -> 21 GB    (-1.1 GB / 6.2 GB)  |    22 GB -> 21 GB    (-1.1 GB / 6.4 GB)  |   13066 -> 12737      (-329 / +9386)   / 975000  |
|     1301 |   13.0 |   447 GB |    13 GB -> 15 GB     (2.6 GB / 4.5 GB)  |    14 GB -> 17 GB     (2.7 GB / 4.6 GB)  |    7244 -> 10077     (+2833 / +6714)   / 480000  |
|     1302 |   13.0 |   447 GB |    12 GB -> 15 GB     (3.0 GB / 4.9 GB)  |    13 GB -> 17 GB     (3.2 GB / 5.2 GB)  |    7507 -> 10056     (+2549 / +7011)   / 480000  |
|     1303 |   13.0 |   447 GB |    14 GB -> 15 GB     (1.3 GB / 3.2 GB)  |    15 GB -> 17 GB     (1.3 GB / 3.3 GB)  |    7888 -> 10020     (+2132 / +6926)   / 480000  |
|     1304 |   13.0 |   447 GB |    13 GB -> 15 GB     (2.7 GB / 4.7 GB)  |    14 GB -> 17 GB     (2.8 GB / 4.9 GB)  |    7660 -> 10075     (+2415 / +7049)   / 480000  |
|     1311 |   13.0 |   466 GB |   4.7 MB -> 4.7 MB      (0  B / 0  B)    |   6.1 MB -> 6.1 MB      (0  B / 0  B)    |      26 -> 26           (+0 / +0)      / 495000  |
|     1312 |   13.0 |   366 GB |   4.7 MB -> 4.7 MB      (0  B / 0  B)    |   6.1 MB -> 6.1 MB      (0  B / 0  B)    |      26 -> 26           (+0 / +0)      / 390000  |
|     1321 |   13.0 |   931 GB |    21 GB -> 21 GB    (-200 MB / 4.1 GB)  |    21 GB -> 21 GB    (-192 MB / 4.3 GB)  |   13365 -> 12690      (-675 / +9527)   / 975000  |
|     1322 |   13.0 |   931 GB |    22 GB -> 21 GB    (-1.3 GB / 6.9 GB)  |    23 GB -> 21 GB    (-1.3 GB / 7.2 GB)  |   12749 -> 12698       (-51 / +10047)  / 975000  |
|     1323 |   13.0 |   931 GB |    21 GB -> 21 GB    (-495 MB / 6.1 GB)  |    22 GB -> 21 GB    (-504 MB / 6.3 GB)  |   13386 -> 12693      (-693 / +9524)   / 975000  |
|     1325 |   13.0 |   931 GB |    21 GB -> 21 GB    (-620 MB / 6.6 GB)  |    22 GB -> 21 GB    (-612 MB / 6.9 GB)  |   13113 -> 12768      (-345 / +9942)   / 975000  |
|     1326 |   13.0 |   931 GB |    21 GB -> 21 GB    (-498 MB / 7.1 GB)  |    22 GB -> 21 GB    (-414 MB / 7.4 GB)  |   13690 -> 12697      (-993 / +9759)   / 975000  |
|     1401 |   14.0 |   223 GB |   8.3 GB -> 7.6 GB   (-670 MB / 950 MB)  |   9.3 GB -> 8.5 GB   (-789 MB / 993 MB)  |    3470 -> 5061      (+1591 / +3262)   / 240000  |
|     1402 |   14.0 |   447 GB |   9.8 GB -> 15 GB     (5.6 GB / 7.1 GB)  |    11 GB -> 17 GB     (5.8 GB / 7.5 GB)  |    4358 -> 10052     (+5694 / +7092)   / 480000  |
|     1403 |   14.0 |   224 GB |   8.2 GB -> 7.6 GB   (-619 MB / 730 MB)  |   9.3 GB -> 8.5 GB   (-758 MB / 759 MB)  |    4547 -> 5023       (+476 / +2567)   / 240000  |
|     1404 |   14.0 |   224 GB |   8.4 GB -> 7.6 GB   (-790 MB / 915 MB)  |   9.4 GB -> 8.5 GB   (-918 MB / 946 MB)  |    4369 -> 5062       (+693 / +2483)   / 240000  |
|     1411 |   14.0 |   466 GB |   4.7 MB -> 4.7 MB      (0  B / 0  B)    |   6.0 MB -> 6.0 MB      (0  B / 0  B)    |      26 -> 26           (+0 / +0)      / 495000  |
|     1412 |   14.0 |   366 GB |   4.7 MB -> 4.7 MB      (0  B / 0  B)    |   6.0 MB -> 6.0 MB      (0  B / 0  B)    |      26 -> 26           (+0 / +0)      / 390000  |
|     1421 |   14.0 |   931 GB |    19 GB -> 21 GB     (2.0 GB / 6.8 GB)  |    19 GB -> 21 GB     (2.1 GB / 7.0 GB)  |   10670 -> 12695     (+2025 / +10814)  / 975000  |
|     1422 |   14.0 |   931 GB |    19 GB -> 21 GB     (1.6 GB / 7.4 GB)  |    20 GB -> 21 GB     (1.7 GB / 7.7 GB)  |   10653 -> 12702     (+2049 / +10414)  / 975000  |
|     1423 |   14.0 |   931 GB |    19 GB -> 21 GB     (2.0 GB / 7.4 GB)  |    19 GB -> 21 GB     (2.1 GB / 7.8 GB)  |   10715 -> 12683     (+1968 / +10418)  / 975000  |
|     1424 |   14.0 |   931 GB |    18 GB -> 21 GB     (2.2 GB / 8.0 GB)  |    19 GB -> 21 GB     (2.3 GB / 8.3 GB)  |   10723 -> 12824     (+2101 / +9573)   / 975000  |
|     1425 |   14.0 |   931 GB |    19 GB -> 21 GB     (1.3 GB / 5.8 GB)  |    20 GB -> 21 GB     (1.4 GB / 6.1 GB)  |   10702 -> 12686     (+1984 / +10231)  / 975000  |
|     1426 |   14.0 |   931 GB |    20 GB -> 21 GB     (1.0 GB / 6.5 GB)  |    20 GB -> 21 GB     (1.2 GB / 6.8 GB)  |   10737 -> 12650     (+1913 / +10974)  / 975000  |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|       45 |    4.0 |    29 TB |   652 GB -> 653 GB    (1.2 GB / 173 GB)  |   686 GB -> 687 GB    (1.2 GB / 180 GB)  |  412818 -> 412818       (+0 / +288439) / 30885000 |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------

This time the total amount of data to be moved is 180 GB. It's possible to have a difference of an order of magnitude in the total data to be moved between -c 0 and -c 10. Usually the best results are achieved by using -F directly, with rare occasions requiring a full re-balance (i.e. no -F and higher -c values).

Balancer tool output

Here’s an example of the output of the balancer tool, in non-verbose mode:

 1 -== BEFORE BALANCE ==-
 2 shards with decreased redundancy 0 (0, 0, 0)
 3 server constraint violations 0
 4 stripe constraint violations 6652
 5 placement group violations 1250
 6 pg hdd score 0.6551, objectsScore 0.0269
 7 pg ssd score 0.6824, objectsScore 0.0280
 8 pg hdd estFree  45T
 9 pg ssd estFree  19T
10 Constraint violations detected, doing a replication-restore update first
11 server constraint violations 0
12 stripe constraint violations 7031
13 placement group violations 0
14 -== POST BALANCE ==-
15 shards with decreased redundancy 0 (0, 0, 0)
16 server constraint violations 0
17 stripe constraint violations 6592
18 placement group violations 0
19 moves 14387, (1864GiB) (tail ssd 14387)
20 pg hdd score 0.6551, objectsScore 0.0269, maxDataToSingleDrive 33 GiB
21 pg ssd score 0.6939, objectsScore 0.0285, maxDataToSingleDrive 76 GiB
22 pg hdd estFree  47T
23 pg ssd estFree  19T

The run of the balancer tool has multiple steps.

First, it shows the current state of the system (lines 2-9):

  • Shards (volume pieces) with decreased redundancy.

  • Server constraint violations mean that there are pieces of data which have two or more of their copies on the same server. This is an error condition.

  • “stripe constraint violation” means that specific pieces of data are not optimally striped on the drives of a specific server. This is NOT an error condition.

  • “placement group violations” means that some data is not placed according to its placement group (in most cases due to a missing drive). This is an error condition.

  • Lines 6 and 7 show the current average “score” (usage in %) of the placement groups, for data and objects.

  • Lines 8 and 9 show the estimated free space for the placement groups.

Then, in this run, the tool has detected problems (in this case placement group violations, which in most cases mean a missing drive) and has done a pre-run to restore the redundancy (line 10), printing the resulting state on lines 11-13.

Finally, it runs the balancing and reports the results. The main difference here is that for each placement group it also reports the maximum amount of data that will be added to a single drive. As the balancing happens in parallel on all drives, this is a handy measure of how long the rebalancing would take (in comparison with a different balancing run that might not add as much data to a single drive).

Errors from the balancer tool

If the balancer tool doesn’t complete successfully, its output MUST be examined and the root cause fixed.

Miscellaneous

If for any reason the currently running rebalancing operation needs to be paused, this can be done via storpool relocator off. In such cases StorPool Support should also be contacted, as this should not normally be needed. Re-enabling the relocator is done via storpool relocator on.
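
For reference, the corresponding commands:

storpool relocator off   # pause the currently running rebalancing operation
storpool relocator on    # re-enable the relocator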