Rebalancing the cluster
Overview
In some situations the data in the StorPool cluster needs to be rebalanced. This is performed by the balancer and the relocator tools. The relocator is an integral part of the StorPool management service, while the balancer is currently an external tool, available and executed on some of the nodes with access to the API.
Note
Be advised that the balancer tool will create some files it needs in the current working directory.
Rebalancing procedure
The rebalancing operation is performed in the following steps (a short monitoring sketch follows the list):
1. The balancer tool is executed to calculate the new state of the cluster.
2. The results from the balancer are verified by a set of automated scripts.
3. The results are also manually reviewed to check whether they contain any inconsistencies and whether they achieve the intended goals. These results are available by running storpool balancer disks and are printed at the end of balancer.sh.
4. If the result is not satisfactory, the balancer is executed with different parameters until a satisfactory result is obtained.
5. Once the proposed end result is satisfactory, the calculated state is loaded into the relocator tool with storpool balancer commit. Note that this step can be reversed only with the --restore-state option, which will revert to the initial state. If a balancing operation has run for a while and for some reason needs to be cancelled, that is currently not supported.
6. The relocator tool performs the actual move of the data.
7. The progress of the relocator tool can be monitored with storpool task list for the currently running tasks, storpool relocator status for an overview of the relocator state, and storpool relocator disks (warning: slow command) for the full relocation state.
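A minimal monitoring sketch, assuming a rebalancing has already been committed; it uses only the commands listed in the steps above:

storpool task list          # currently running relocation tasks
storpool relocator status   # overview of the relocator state
storpool relocator disks    # full relocation state (warning: slow command)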
Options
The balancer tool is executed via the /usr/lib/storpool/balancer.sh wrapper and accepts the following options:
- -A
Don’t only move data from fuller to emptier drives.
- -b placementGroup
Use disks in the specified placement group to restore replication in critical conditions.
- -c factor
Factor for how much data to try to move around, from 0 to 10 (the default is 0; the parameter is required). In most cases -c is not needed; the main use case for -c 10 is three-node clusters, where data needs to be “rotated” through the cluster.
- -d diskId [-d diskId]
Put data only on the selected disks.
- -D diskId [-D diskId]
Don’t move data from those disks.
- --do-whatever-you-can
Decrease the redundancy level.
Warning
The --do-whatever-you-can option is for emergency use only, after the balancer has failed!
- -E 0-99
Do not empty drives that are filled below this threshold, in percent.
- --empty-down-disks
Proceed with balancing even when there are down disks, and remove all data from them.
- -f percent
Allow drives to be filled up to this percentage, from 0 to 99. Default 90.
- -F
Only move data from fuller to emptier drives.
- -g placementGroup
Work only on the specified placement group.
- --ignore-down-disks
Proceed with balancing even when there are down disks, and do not remove data from them.
- --ignore-src-pg-violations
Ignore placement group violations on the source disks (exactly what the name says).
- -m maxAgCount
Limit the maximum allocation group count on drives to this (effectively their usable size).
- -M maxDataToAdd
Limit the amount of data to copy to a single drive, to be able to rebalance “in pieces”.
- --max-disbalance-before-striping X
The maximum disbalance before striping; X is in percent.
- --min-disk-full X
Don’t remove data from disk if it is not at least this X% full.
- --min-replication R
Minimum replication required.
- -o overridesPgName
Specify the override placement group name (required only if the override template is not created).
- --only-empty-disk diskId
Like -D for all other disks; only the selected disk will be emptied.
- -R
Only restore replication for degraded volumes.
- --restore-state
Revert to the initial state of the disks (before the balancer commit execution).
- -S
Prefer tail SSD.
- -V vagId [-V vagId]
Skip balancing vagId.
- -v
Verbose output. Shows how all drives in the cluster would be affected according to the balancer. This differs from the later output of storpool balancer disks, which is the point of view of storpool_mgmt and also takes into account all currently loaded relocations.
-A and -F are the reverse of each other and mutually exclusive.
The -c value is essentially the trade-off between the uniformity of the data on the disks and the amount of data moved to accomplish it. A lower factor means less data is moved around, but sometimes more inequality between the data on the disks; a higher factor means more data is moved, but sometimes with a better result in terms of how evenly the data is spread across the drives.
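A minimal sketch of comparing a conservative run with a more aggressive one before committing; both command lines also appear in the examples later in this document:

/usr/lib/storpool/balancer.sh -F -c 0    # conservative: only move data from fuller to emptier drives
storpool balancer disks                  # review the proposed moves and the amount of data to be copied
/usr/lib/storpool/balancer.sh -A -c 10   # aggressive: allow more data movement for a more even result
storpool balancer disks                  # compare with the previous result
storpool balancer commit                 # commit the last calculated state, once it is satisfactory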
On clusters with drives of unsupported size (HDDs larger than 4 TB) the -m option is required. It limits the data moved onto these drives to up to the set number of allocation groups. This is done because the performance per TB of larger drives is lower, which degrades the performance of the whole cluster in high-performance use cases.
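A minimal sketch, assuming a hypothetical cap of 4096 allocation groups per drive; the actual value depends on the drives in the cluster and is not taken from this document:

/usr/lib/storpool/balancer.sh -F -c 0 -m 4096   # hypothetical allocation group cap per drive
storpool balancer disks                         # verify that no large drive receives more data than intended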
The -M option is useful when a full rebalancing would involve many tasks before completion and could impact other operations (such as remote transfers, or the time required for a currently running recovery to complete). With the -M option the amount of data loaded by the balancer for each disk may be reduced, and a more balanced state is achieved through several smaller rebalancing operations.
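A minimal sketch of rebalancing “in pieces”; the value format for maxDataToAdd shown below is an assumption used only for illustration:

/usr/lib/storpool/balancer.sh -F -c 0 -M 100G   # assumed value format: limit the data added to any single drive
storpool balancer commit                        # load the partial rebalancing into the relocator
# wait for the relocation to finish (storpool relocator status), then repeat until the cluster is balanced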
The -f option is required on clusters whose drives are filled above 90%. Extreme care should be taken when balancing in such cases.
The -b option can be used to move data between placement groups (in most cases from SSDs to HDDs).
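A minimal sketch for a cluster whose drives are already filled above the default 90% limit, assuming that filling them up to 95% is acceptable:

/usr/lib/storpool/balancer.sh -F -c 0 -f 95   # allow drives to be filled up to 95%
storpool balancer disks                       # double-check the projected fill levels before committing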
Restoring volume redundancy on a failed drive
Situation: we have lost drive 1802 in placementGroup ssd. We want to remove it from the cluster and restore the redundancy of the data. We need to do the following:
storpool disk 1802 forget # this will also remove the drive from all placement groups it participated in
mkdir -p ~/storpool/balancer && cd ~/storpool/balancer # it's recommended to run the following commands in a screen/tmux session
/usr/lib/storpool/balancer.sh -R
storpool balancer commit # to actually load the data into the relocator and start the re-balancing operation
Restoring volume redundancy for two failed drives (single-copy situation)
(Emergency) Situation: we have lost drives 1802 and 1902 in placementGroup ssd. We want to remove them from the cluster and restore the redundancy of the data. We need to do the following:
storpool disk 1802 forget # this will also remove the drive from all placement groups it participated in
storpool disk 1902 forget # this will also remove the drive from all placement groups it participated in
mkdir -p ~/storpool/balancer && cd ~/storpool/balancer # it's recommended to run the following commands in a screen/tmux session
/usr/lib/storpool/balancer.sh -F --min-replication 2 # first balancing run, to create a second copy of the data
storpool balancer commit # to actually load the data into the relocator and start the re-balancing operation
# wait for the balancing to finish
/usr/lib/storpool/balancer.sh -R # second balancing run, to restore full redundancy
storpool balancer commit # to actually load the data into the relocator and start the re-balancing operation
Adding new drives and rebalancing data on them
Situation: we have added SSDs 1201 and 1202 and HDDs 1510 and 1511, which need to go into placement groups ssd and hdd respectively, and we want to re-balance the cluster data so that it is re-dispersed onto the new disks as well. We have no other placement groups in the cluster.
storpool placementGroup ssd addDisk 1201 addDisk 1202
storpool placementGroup hdd addDisk 1510 addDisk 1511
mkdir -p ~/storpool/balancer && cd ~/storpool/balancer # it's recommended to run the following commands in a screen/tmux session
/usr/lib/storpool/balancer.sh -F -c 0 # rebalance all placement groups, move data from fuller to emptier drives
storpool balancer commit # to actually load the data into the relocator and start the re-balancing operation
Restoring volume redundancy by rebalancing data onto another placementGroup
Situation: we have to restore the redundancy of a hybrid cluster (two copies on HDDs, one on SSDs) while the ssd placementGroup is out of free space because a few SSDs have recently failed. We cannot replace the failed drives with new ones for the moment.
mkdir -p ~/storpool/balancer && cd ~/storpool/balancer # it's recommended to run the following commands in a screen/tmux session
/usr/lib/storpool/balancer.sh -R -c 0 -b hdd # use placementGroup ``hdd`` as a backup and move some data from SSDs
storpool balancer commit # to actually load the data into the relocator and start the re-balancing operation
Note
The -f argument can additionally be used to instruct the balancer how full to keep the cluster, and thus control how much data will be moved to the backup placement group.
Decommissioning a live node
Situation: a node in the cluster needs to be decommissioned, so the data on its drives needs to be moved away. The drive numbers on that node are 101, 102, and 103.
Note
You have to make sure you have enough space to restore the redundancy before proceeding.
storpool disk 101 softEject # mark all drives for evacuation
storpool disk 102 softEject
storpool disk 103 softEject
mkdir -p ~/storpool/balancer && cd ~/storpool/balancer # it's recommended to run the following commands in a screen/tmux session
/usr/lib/storpool/balancer.sh -R -c 0 # rebalance all placement groups, -F has the same effect in this case
storpool balancer commit # to actually load the data into the relocator and start the re-balancing operation
Decommissioning a dead node
Situation: a node in the cluster needs to be decommissioned, as it has died and cannot be brought back. The drive numbers on that node are 101, 102, and 103.
Note
You have to make sure you have enough space to restore the redundancy before proceeding.
storpool disk 101 forget # remove the drives from all placement groups
storpool disk 102 forget
storpool disk 103 forget
mkdir -p ~/storpool/balancer && cd ~/storpool/balancer # it's recommended to run the following commands in a screen/tmux session
/usr/lib/storpool/balancer.sh -R -c 0 # rebalance all placement groups
storpool balancer commit # to actually load the data into the relocator and start the re-balancing operation
Tip
Alternatively, you can try adding the disks from the dead node into another live node in the cluster, and then running another re-balance operation.
Resolving imbalances in the drive usage
Situation: we have an imbalance in the drive usage in the whole cluster and we want to improve it.
mkdir -p ~/storpool/balancer && cd ~/storpool/balancer # it's recommended to run the following commands in a screen/tmux session
/usr/lib/storpool/balancer.sh -F -c 0 # rebalance all placement groups
storpool balancer commit # to actually load the data into the relocator and start the re-balancing operation
Resolving imbalances in the drive usage with three-node clusters
Situation: we have an imbalance in the drive usage in the whole cluster and we want to improve it. We have a three-node hybrid cluster and proper balancing requires larger moves of “unrelated” data:
mkdir -p ~/storpool/balancer && cd ~/storpool/balancer # it's recommended to run the following commands in a screen/tmux session
/usr/lib/storpool/balancer.sh -F -c 0 # rebalance all placement groups
/usr/lib/storpool/balancer.sh -A -c 10 # retry to see if we get a better result with more data movements
storpool balancer commit # to actually load the data into the relocator and start the re-balancing operation
Reverting balancer to a previous state
Situation: we have committed a rebalancing operation, but want to revert back to the previous state:
cd ~/storpool/balancer # it's recommended to run the following commands in a screen/tmux session
ls # list all saved states and choose what to revert to
/usr/lib/storpool/balancer.sh --restore-state 2022-10-28-15-39-40 # revert to 2022-10-28-15-39-40
storpool balancer commit # to actually load the data into the relocator and start the re-balancing operation
Reading the output of storpool balancer disks
Here is an example output from storpool balancer disks:
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| disk | server | size | stored | on-disk | objects |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 1 | 14.0 | 373 GB | 4.7 MB -> 4.7 MB (0 B / 0 B) | 6.1 MB -> 6.1 MB (0 B / 0 B) | 26 -> 26 (+0 / +0) / 405000 |
| 1101 | 11.0 | 447 GB | 16 GB -> 15 GB (-1.0 GB / 1.4 GB) | 18 GB -> 17 GB (-1.1 GB / 1.4 GB) | 11798 -> 10040 (-1758 / +3932) / 480000 |
| 1102 | 11.0 | 447 GB | 16 GB -> 15 GB (-268 MB / 1.3 GB) | 17 GB -> 17 GB (-301 MB / 1.4 GB) | 10843 -> 10045 (-798 / +4486) / 480000 |
| 1103 | 11.0 | 447 GB | 16 GB -> 15 GB (-1.0 GB / 1.8 GB) | 18 GB -> 16 GB (-1.2 GB / 1.9 GB) | 12123 -> 10039 (-2084 / +3889) / 480000 |
| 1104 | 11.0 | 447 GB | 16 GB -> 15 GB (-757 MB / 1.3 GB) | 17 GB -> 16 GB (-899 MB / 1.3 GB) | 11045 -> 10072 (-973 / +4279) / 480000 |
| 1111 | 11.0 | 466 GB | 4.7 MB -> 4.7 MB (0 B / 0 B) | 5.1 MB -> 5.1 MB (0 B / 0 B) | 26 -> 26 (+0 / +0) / 495000 |
| 1112 | 11.0 | 366 GB | 4.7 MB -> 4.7 MB (0 B / 0 B) | 5.1 MB -> 5.1 MB (0 B / 0 B) | 26 -> 26 (+0 / +0) / 390000 |
| 1121 | 11.0 | 931 GB | 22 GB -> 21 GB (-1009 MB / 830 MB) | 22 GB -> 21 GB (-1.0 GB / 872 MB) | 13713 -> 12698 (-1015 / +3799) / 975000 |
| 1122 | 11.0 | 931 GB | 21 GB -> 21 GB (-373 MB / 2.0 GB) | 22 GB -> 21 GB (-379 MB / 2.0 GB) | 13469 -> 12742 (-727 / +3801) / 975000 |
| 1123 | 11.0 | 931 GB | 22 GB -> 21 GB (-1.1 GB / 1.9 GB) | 22 GB -> 21 GB (-1.1 GB / 2.0 GB) | 14859 -> 12629 (-2230 / +4102) / 975000 |
| 1124 | 11.0 | 931 GB | 21 GB -> 21 GB (36 MB / 1.8 GB) | 21 GB -> 21 GB (92 MB / 1.9 GB) | 13806 -> 12743 (-1063 / +3389) / 975000 |
| 1201 | 12.0 | 447 GB | 18 GB -> 15 GB (-2.9 GB / 633 MB) | 19 GB -> 16 GB (-3.0 GB / 658 MB) | 14148 -> 10070 (-4078 / +3050) / 480000 |
| 1202 | 12.0 | 447 GB | 17 GB -> 15 GB (-2.1 GB / 787 MB) | 19 GB -> 16 GB (-2.3 GB / 815 MB) | 13243 -> 10067 (-3176 / +2576) / 480000 |
| 1203 | 12.0 | 447 GB | 17 GB -> 15 GB (-2.0 GB / 3.3 GB) | 19 GB -> 16 GB (-2.4 GB / 3.5 GB) | 12746 -> 10062 (-2684 / +3375) / 480000 |
| 1204 | 12.0 | 447 GB | 18 GB -> 15 GB (-2.7 GB / 1.1 GB) | 19 GB -> 16 GB (-2.9 GB / 1.1 GB) | 12835 -> 10075 (-2760 / +3248) / 480000 |
| 1212 | 12.0 | 366 GB | 4.7 MB -> 4.7 MB (0 B / 0 B) | 6.0 MB -> 6.0 MB (0 B / 0 B) | 26 -> 26 (+0 / +0) / 390000 |
| 1221 | 12.0 | 931 GB | 20 GB -> 21 GB (569 MB / 1.5 GB) | 21 GB -> 21 GB (587 MB / 1.6 GB) | 13115 -> 12616 (-499 / +3736) / 975000 |
| 1222 | 12.0 | 931 GB | 22 GB -> 21 GB (-979 MB / 307 MB) | 22 GB -> 21 GB (-1013 MB / 317 MB) | 12938 -> 12697 (-241 / +3291) / 975000 |
| 1223 | 12.0 | 931 GB | 22 GB -> 21 GB (-1.1 GB / 781 MB) | 22 GB -> 21 GB (-1.2 GB / 812 MB) | 13968 -> 12718 (-1250 / +3302) / 975000 |
| 1224 | 12.0 | 931 GB | 21 GB -> 21 GB (-784 MB / 332 MB) | 22 GB -> 21 GB (-810 MB / 342 MB) | 13741 -> 12692 (-1049 / +3314) / 975000 |
| 1225 | 12.0 | 931 GB | 21 GB -> 21 GB (-681 MB / 849 MB) | 22 GB -> 21 GB (-701 MB / 882 MB) | 13608 -> 12748 (-860 / +3420) / 975000 |
| 1226 | 12.0 | 931 GB | 22 GB -> 21 GB (-1.1 GB / 825 MB) | 22 GB -> 21 GB (-1.1 GB / 853 MB) | 13066 -> 12692 (-374 / +3817) / 975000 |
| 1301 | 13.0 | 447 GB | 13 GB -> 15 GB (2.6 GB / 4.2 GB) | 14 GB -> 17 GB (2.7 GB / 4.4 GB) | 7244 -> 10038 (+2794 / +6186) / 480000 |
| 1302 | 13.0 | 447 GB | 12 GB -> 15 GB (3.0 GB / 3.7 GB) | 13 GB -> 17 GB (3.1 GB / 3.9 GB) | 7507 -> 10063 (+2556 / +5619) / 480000 |
| 1303 | 13.0 | 447 GB | 14 GB -> 15 GB (1.3 GB / 3.2 GB) | 15 GB -> 17 GB (1.3 GB / 3.4 GB) | 7888 -> 10038 (+2150 / +5884) / 480000 |
| 1304 | 13.0 | 447 GB | 13 GB -> 15 GB (2.7 GB / 3.7 GB) | 14 GB -> 17 GB (2.8 GB / 3.9 GB) | 7660 -> 10045 (+2385 / +5870) / 480000 |
| 1311 | 13.0 | 466 GB | 4.7 MB -> 4.7 MB (0 B / 0 B) | 6.1 MB -> 6.1 MB (0 B / 0 B) | 26 -> 26 (+0 / +0) / 495000 |
| 1312 | 13.0 | 366 GB | 4.7 MB -> 4.7 MB (0 B / 0 B) | 6.1 MB -> 6.1 MB (0 B / 0 B) | 26 -> 26 (+0 / +0) / 390000 |
| 1321 | 13.0 | 931 GB | 21 GB -> 21 GB (-193 MB / 1.1 GB) | 21 GB -> 21 GB (-195 MB / 1.2 GB) | 13365 -> 12765 (-600 / +5122) / 975000 |
| 1322 | 13.0 | 931 GB | 22 GB -> 21 GB (-1.4 GB / 1.1 GB) | 23 GB -> 21 GB (-1.4 GB / 1.1 GB) | 12749 -> 12739 (-10 / +4651) / 975000 |
| 1323 | 13.0 | 931 GB | 21 GB -> 21 GB (-504 MB / 2.2 GB) | 22 GB -> 21 GB (-496 MB / 2.3 GB) | 13386 -> 12695 (-691 / +4583) / 975000 |
| 1325 | 13.0 | 931 GB | 21 GB -> 20 GB (-698 MB / 557 MB) | 22 GB -> 21 GB (-717 MB / 584 MB) | 13113 -> 12768 (-345 / +2668) / 975000 |
| 1326 | 13.0 | 931 GB | 21 GB -> 21 GB (-507 MB / 724 MB) | 22 GB -> 21 GB (-522 MB / 754 MB) | 13690 -> 12704 (-986 / +3327) / 975000 |
| 1401 | 14.0 | 223 GB | 8.3 GB -> 7.6 GB (-666 MB / 868 MB) | 9.3 GB -> 8.5 GB (-781 MB / 901 MB) | 3470 -> 5043 (+1573 / +2830) / 240000 |
| 1402 | 14.0 | 447 GB | 9.8 GB -> 15 GB (5.6 GB / 5.7 GB) | 11 GB -> 17 GB (5.8 GB / 6.0 GB) | 4358 -> 10060 (+5702 / +6667) / 480000 |
| 1403 | 14.0 | 224 GB | 8.2 GB -> 7.6 GB (-623 MB / 1.1 GB) | 9.3 GB -> 8.6 GB (-710 MB / 1.2 GB) | 4547 -> 5036 (+489 / +2814) / 240000 |
| 1404 | 14.0 | 224 GB | 8.4 GB -> 7.6 GB (-773 MB / 1.5 GB) | 9.4 GB -> 8.5 GB (-970 MB / 1.6 GB) | 4369 -> 5031 (+662 / +2368) / 240000 |
| 1411 | 14.0 | 466 GB | 4.7 MB -> 4.7 MB (0 B / 0 B) | 6.0 MB -> 6.0 MB (0 B / 0 B) | 26 -> 26 (+0 / +0) / 495000 |
| 1412 | 14.0 | 366 GB | 4.7 MB -> 4.7 MB (0 B / 0 B) | 6.0 MB -> 6.0 MB (0 B / 0 B) | 26 -> 26 (+0 / +0) / 390000 |
| 1421 | 14.0 | 931 GB | 19 GB -> 21 GB (1.9 GB / 2.6 GB) | 19 GB -> 21 GB (2.0 GB / 2.7 GB) | 10670 -> 12624 (+1954 / +6196) / 975000 |
| 1422 | 14.0 | 931 GB | 19 GB -> 21 GB (1.6 GB / 3.2 GB) | 20 GB -> 21 GB (1.6 GB / 3.3 GB) | 10653 -> 12844 (+2191 / +6919) / 975000 |
| 1423 | 14.0 | 931 GB | 19 GB -> 21 GB (1.9 GB / 2.5 GB) | 19 GB -> 21 GB (2.0 GB / 2.6 GB) | 10715 -> 12688 (+1973 / +5846) / 975000 |
| 1424 | 14.0 | 931 GB | 18 GB -> 20 GB (2.2 GB / 2.9 GB) | 19 GB -> 21 GB (2.3 GB / 3.0 GB) | 10723 -> 12686 (+1963 / +5505) / 975000 |
| 1425 | 14.0 | 931 GB | 19 GB -> 21 GB (1.3 GB / 2.5 GB) | 20 GB -> 21 GB (1.4 GB / 2.6 GB) | 10702 -> 12689 (+1987 / +5486) / 975000 |
| 1426 | 14.0 | 931 GB | 20 GB -> 21 GB (1.0 GB / 2.5 GB) | 20 GB -> 21 GB (1.0 GB / 2.6 GB) | 10737 -> 12609 (+1872 / +5771) / 975000 |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 45 | 4.0 | 29 TB | 652 GB -> 652 GB (512 MB / 69 GB) | 686 GB -> 685 GB (-240 MB / 72 GB) | 412818 -> 412818 (+0 / +159118) / 30885000 |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Let’s start with the last line. Here’s the meaning, field by field:
There are 45 drives in total.
There are 4 server instances.
The total disk capacity is 29 TB.
The stored data is 652 GB and will change to 652 GB. The total change for all drives afterwards is 512 MB, and the total amount of changes for the drives is 69 GB (i.e. how much data they will “recover” from other drives).
The same is repeated for the on-disk size. Here the total amount of changes is roughly the amount of data that would need to be copied.
The total current number of objects will not change (i.e. from 412818 to 412818), 0 new objects will be created, the total amount of objects to be moved is 159118, and the total number of possible objects in the cluster is 30885000.
The difference between “stored” and “on-disk” size is that the latter also includes the size of checksums and metadata.
For the rest of the lines, the data is basically the same, just per disk.
What needs to be taken into account is:
Are there drives that will have too much data on them? Here, both data size and objects must be checked, and they should be close to the average percentage for the placement group.
Is the data stored on the drives balanced, i.e. are all the drives’ usages close to the average?
Are there drives that should have data on them, but nothing is scheduled to be moved?
This usually happens because a drive wasn’t added to the right placement group.
Will there be too much data to be moved?
To illustrate the difference in the amount of data to be moved, here is the output of storpool balancer disks from a run with -c 10:
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| disk | server | size | stored | on-disk | objects |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 1 | 14.0 | 373 GB | 4.7 MB -> 4.7 MB (0 B / 0 B) | 6.1 MB -> 6.1 MB (0 B / 0 B) | 26 -> 26 (+0 / +0) / 405000 |
| 1101 | 11.0 | 447 GB | 16 GB -> 15 GB (-1.0 GB / 1.7 GB) | 18 GB -> 17 GB (-1.1 GB / 1.7 GB) | 11798 -> 10027 (-1771 / +5434) / 480000 |
| 1102 | 11.0 | 447 GB | 16 GB -> 15 GB (-263 MB / 1.7 GB) | 17 GB -> 17 GB (-298 MB / 1.7 GB) | 10843 -> 10000 (-843 / +5420) / 480000 |
| 1103 | 11.0 | 447 GB | 16 GB -> 15 GB (-1.0 GB / 3.6 GB) | 18 GB -> 16 GB (-1.2 GB / 3.8 GB) | 12123 -> 10005 (-2118 / +6331) / 480000 |
| 1104 | 11.0 | 447 GB | 16 GB -> 15 GB (-752 MB / 2.7 GB) | 17 GB -> 16 GB (-907 MB / 2.8 GB) | 11045 -> 10098 (-947 / +5214) / 480000 |
| 1111 | 11.0 | 466 GB | 4.7 MB -> 4.7 MB (0 B / 0 B) | 5.1 MB -> 5.1 MB (0 B / 0 B) | 26 -> 26 (+0 / +0) / 495000 |
| 1112 | 11.0 | 366 GB | 4.7 MB -> 4.7 MB (0 B / 0 B) | 5.1 MB -> 5.1 MB (0 B / 0 B) | 26 -> 26 (+0 / +0) / 390000 |
| 1121 | 11.0 | 931 GB | 22 GB -> 21 GB (-1003 MB / 6.4 GB) | 22 GB -> 21 GB (-1018 MB / 6.7 GB) | 13713 -> 12742 (-971 / +9712) / 975000 |
| 1122 | 11.0 | 931 GB | 21 GB -> 21 GB (-368 MB / 5.8 GB) | 22 GB -> 21 GB (-272 MB / 6.1 GB) | 13469 -> 12718 (-751 / +8929) / 975000 |
| 1123 | 11.0 | 931 GB | 22 GB -> 21 GB (-1.1 GB / 5.9 GB) | 22 GB -> 21 GB (-1.1 GB / 6.1 GB) | 14859 -> 12699 (-2160 / +8992) / 975000 |
| 1124 | 11.0 | 931 GB | 21 GB -> 21 GB (57 MB / 7.4 GB) | 21 GB -> 21 GB (113 MB / 7.7 GB) | 13806 -> 12697 (-1109 / +9535) / 975000 |
| 1201 | 12.0 | 447 GB | 18 GB -> 15 GB (-2.8 GB / 1.2 GB) | 19 GB -> 17 GB (-3.0 GB / 1.2 GB) | 14148 -> 10033 (-4115 / +4853) / 480000 |
| 1202 | 12.0 | 447 GB | 17 GB -> 15 GB (-2.0 GB / 1.6 GB) | 19 GB -> 16 GB (-2.2 GB / 1.7 GB) | 13243 -> 10055 (-3188 / +4660) / 480000 |
| 1203 | 12.0 | 447 GB | 17 GB -> 15 GB (-2.0 GB / 2.3 GB) | 19 GB -> 16 GB (-2.3 GB / 2.4 GB) | 12746 -> 10070 (-2676 / +4682) / 480000 |
| 1204 | 12.0 | 447 GB | 18 GB -> 15 GB (-2.7 GB / 2.1 GB) | 19 GB -> 16 GB (-2.8 GB / 2.2 GB) | 12835 -> 10110 (-2725 / +5511) / 480000 |
| 1212 | 12.0 | 366 GB | 4.7 MB -> 4.7 MB (0 B / 0 B) | 6.0 MB -> 6.0 MB (0 B / 0 B) | 26 -> 26 (+0 / +0) / 390000 |
| 1221 | 12.0 | 931 GB | 20 GB -> 21 GB (620 MB / 6.3 GB) | 21 GB -> 21 GB (805 MB / 6.7 GB) | 13115 -> 12542 (-573 / +9389) / 975000 |
| 1222 | 12.0 | 931 GB | 22 GB -> 21 GB (-981 MB / 2.9 GB) | 22 GB -> 21 GB (-1004 MB / 3.0 GB) | 12938 -> 12793 (-145 / +8795) / 975000 |
| 1223 | 12.0 | 931 GB | 22 GB -> 21 GB (-1.1 GB / 5.9 GB) | 22 GB -> 21 GB (-1.1 GB / 6.1 GB) | 13968 -> 12698 (-1270 / +10094) / 975000 |
| 1224 | 12.0 | 931 GB | 21 GB -> 21 GB (-791 MB / 4.5 GB) | 22 GB -> 21 GB (-758 MB / 4.7 GB) | 13741 -> 12684 (-1057 / +8616) / 975000 |
| 1225 | 12.0 | 931 GB | 21 GB -> 21 GB (-671 MB / 4.8 GB) | 22 GB -> 21 GB (-677 MB / 4.9 GB) | 13608 -> 12690 (-918 / +8559) / 975000 |
| 1226 | 12.0 | 931 GB | 22 GB -> 21 GB (-1.1 GB / 6.2 GB) | 22 GB -> 21 GB (-1.1 GB / 6.4 GB) | 13066 -> 12737 (-329 / +9386) / 975000 |
| 1301 | 13.0 | 447 GB | 13 GB -> 15 GB (2.6 GB / 4.5 GB) | 14 GB -> 17 GB (2.7 GB / 4.6 GB) | 7244 -> 10077 (+2833 / +6714) / 480000 |
| 1302 | 13.0 | 447 GB | 12 GB -> 15 GB (3.0 GB / 4.9 GB) | 13 GB -> 17 GB (3.2 GB / 5.2 GB) | 7507 -> 10056 (+2549 / +7011) / 480000 |
| 1303 | 13.0 | 447 GB | 14 GB -> 15 GB (1.3 GB / 3.2 GB) | 15 GB -> 17 GB (1.3 GB / 3.3 GB) | 7888 -> 10020 (+2132 / +6926) / 480000 |
| 1304 | 13.0 | 447 GB | 13 GB -> 15 GB (2.7 GB / 4.7 GB) | 14 GB -> 17 GB (2.8 GB / 4.9 GB) | 7660 -> 10075 (+2415 / +7049) / 480000 |
| 1311 | 13.0 | 466 GB | 4.7 MB -> 4.7 MB (0 B / 0 B) | 6.1 MB -> 6.1 MB (0 B / 0 B) | 26 -> 26 (+0 / +0) / 495000 |
| 1312 | 13.0 | 366 GB | 4.7 MB -> 4.7 MB (0 B / 0 B) | 6.1 MB -> 6.1 MB (0 B / 0 B) | 26 -> 26 (+0 / +0) / 390000 |
| 1321 | 13.0 | 931 GB | 21 GB -> 21 GB (-200 MB / 4.1 GB) | 21 GB -> 21 GB (-192 MB / 4.3 GB) | 13365 -> 12690 (-675 / +9527) / 975000 |
| 1322 | 13.0 | 931 GB | 22 GB -> 21 GB (-1.3 GB / 6.9 GB) | 23 GB -> 21 GB (-1.3 GB / 7.2 GB) | 12749 -> 12698 (-51 / +10047) / 975000 |
| 1323 | 13.0 | 931 GB | 21 GB -> 21 GB (-495 MB / 6.1 GB) | 22 GB -> 21 GB (-504 MB / 6.3 GB) | 13386 -> 12693 (-693 / +9524) / 975000 |
| 1325 | 13.0 | 931 GB | 21 GB -> 21 GB (-620 MB / 6.6 GB) | 22 GB -> 21 GB (-612 MB / 6.9 GB) | 13113 -> 12768 (-345 / +9942) / 975000 |
| 1326 | 13.0 | 931 GB | 21 GB -> 21 GB (-498 MB / 7.1 GB) | 22 GB -> 21 GB (-414 MB / 7.4 GB) | 13690 -> 12697 (-993 / +9759) / 975000 |
| 1401 | 14.0 | 223 GB | 8.3 GB -> 7.6 GB (-670 MB / 950 MB) | 9.3 GB -> 8.5 GB (-789 MB / 993 MB) | 3470 -> 5061 (+1591 / +3262) / 240000 |
| 1402 | 14.0 | 447 GB | 9.8 GB -> 15 GB (5.6 GB / 7.1 GB) | 11 GB -> 17 GB (5.8 GB / 7.5 GB) | 4358 -> 10052 (+5694 / +7092) / 480000 |
| 1403 | 14.0 | 224 GB | 8.2 GB -> 7.6 GB (-619 MB / 730 MB) | 9.3 GB -> 8.5 GB (-758 MB / 759 MB) | 4547 -> 5023 (+476 / +2567) / 240000 |
| 1404 | 14.0 | 224 GB | 8.4 GB -> 7.6 GB (-790 MB / 915 MB) | 9.4 GB -> 8.5 GB (-918 MB / 946 MB) | 4369 -> 5062 (+693 / +2483) / 240000 |
| 1411 | 14.0 | 466 GB | 4.7 MB -> 4.7 MB (0 B / 0 B) | 6.0 MB -> 6.0 MB (0 B / 0 B) | 26 -> 26 (+0 / +0) / 495000 |
| 1412 | 14.0 | 366 GB | 4.7 MB -> 4.7 MB (0 B / 0 B) | 6.0 MB -> 6.0 MB (0 B / 0 B) | 26 -> 26 (+0 / +0) / 390000 |
| 1421 | 14.0 | 931 GB | 19 GB -> 21 GB (2.0 GB / 6.8 GB) | 19 GB -> 21 GB (2.1 GB / 7.0 GB) | 10670 -> 12695 (+2025 / +10814) / 975000 |
| 1422 | 14.0 | 931 GB | 19 GB -> 21 GB (1.6 GB / 7.4 GB) | 20 GB -> 21 GB (1.7 GB / 7.7 GB) | 10653 -> 12702 (+2049 / +10414) / 975000 |
| 1423 | 14.0 | 931 GB | 19 GB -> 21 GB (2.0 GB / 7.4 GB) | 19 GB -> 21 GB (2.1 GB / 7.8 GB) | 10715 -> 12683 (+1968 / +10418) / 975000 |
| 1424 | 14.0 | 931 GB | 18 GB -> 21 GB (2.2 GB / 8.0 GB) | 19 GB -> 21 GB (2.3 GB / 8.3 GB) | 10723 -> 12824 (+2101 / +9573) / 975000 |
| 1425 | 14.0 | 931 GB | 19 GB -> 21 GB (1.3 GB / 5.8 GB) | 20 GB -> 21 GB (1.4 GB / 6.1 GB) | 10702 -> 12686 (+1984 / +10231) / 975000 |
| 1426 | 14.0 | 931 GB | 20 GB -> 21 GB (1.0 GB / 6.5 GB) | 20 GB -> 21 GB (1.2 GB / 6.8 GB) | 10737 -> 12650 (+1913 / +10974) / 975000 |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 45 | 4.0 | 29 TB | 652 GB -> 653 GB (1.2 GB / 173 GB) | 686 GB -> 687 GB (1.2 GB / 180 GB) | 412818 -> 412818 (+0 / +288439) / 30885000 |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
This time the total amount of data to be moved is 180 GB. It is possible to have a difference of an order of magnitude in the total data to be moved between -c 0 and -c 10. Usually the best results are achieved by using -F directly, with rare occasions requiring a full re-balancing (i.e. no -F and higher -c values).
Balancer tool output
Here’s an example of the output of the balancer tool, in non-verbose mode:
1 -== BEFORE BALANCE ==-
2 shards with decreased redundancy 0 (0, 0, 0)
3 server constraint violations 0
4 stripe constraint violations 6652
5 placement group violations 1250
6 pg hdd score 0.6551, objectsScore 0.0269
7 pg ssd score 0.6824, objectsScore 0.0280
8 pg hdd estFree 45T
9 pg ssd estFree 19T
10 Constraint violations detected, doing a replication-restore update first
11 server constraint violations 0
12 stripe constraint violations 7031
13 placement group violations 0
14 -== POST BALANCE ==-
15 shards with decreased redundancy 0 (0, 0, 0)
16 server constraint violations 0
17 stripe constraint violations 6592
18 placement group violations 0
19 moves 14387, (1864GiB) (tail ssd 14387)
20 pg hdd score 0.6551, objectsScore 0.0269, maxDataToSingleDrive 33 GiB
21 pg ssd score 0.6939, objectsScore 0.0285, maxDataToSingleDrive 76 GiB
22 pg hdd estFree 47T
23 pg ssd estFree 19T
The run of the balancer tool has multiple steps.
First, it shows the current state of the system (lines 2-8):
Shards (volume pieces) with decreased redundancy.
“Server constraint violations” means that there are pieces of data which have two or more of their copies on the same server. This is an error condition.
“stripe constraint violation” means that specific pieces of data are not optimally striped on the drives of a specific server. This is NOT an error condition.
“Placement group violations” also indicate an error condition (in most cases caused by a missing drive).
Lines 6 and 7 show the current average “score” (usage in %) of the placement groups, for data and for objects.
Lines 8 and 9 show the estimated free space for the placement groups.
Then, in this run, it has detected problems (in this case placement group violations, which in most cases means a missing drive) and has done a pre-run to correct the redundancy (line 10), printing the state again on lines 11-13.
Finally, it runs the balancing and reports the results. The main difference here is that for each placement group it also reports the maximum amount of data that will be added to a single drive. As the balancing happens in parallel on all drives, this is a handy measure of how long the rebalancing would take (in comparison with a different balancing run that might not add that much data to a single drive).
Errors from the balancer tool
If the balancer tool does not complete successfully, its output MUST be examined and the root cause fixed.
Miscellaneous
If for any reason the currently running rebalancing operation needs to be paused, this can be done via storpool relocator off. In such cases StorPool Support should also be contacted, as this should not normally be necessary. Re-enabling the relocator is done via storpool relocator on.
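A minimal sketch of pausing and resuming the relocator, using only the commands mentioned above:

storpool relocator off      # pause the currently running rebalancing operation
storpool relocator status   # confirm the relocator state
storpool relocator on       # resume the rebalancing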