StorPool capacity planner
The StorPool capacity planner is a helper tool for planning cluster capacity upgrades. It can be used for all replication and erasure coding schemes. The tool is available starting with release 21.0 revision 21.0.242.e4067e0e4.
For general information about how the total, free, and used space of a StorPool cluster are determined, see Cluster capacity.
Introduction
Using the capacity planning tool, you can find out which redundancy modes a specific StorPool cluster can support (see Redundancy). These modes are:
| Scheme | Nodes | Raw space used | Overhead |
|---|---|---|---|
| 3 | 3+ | 3.3x | 230% |
| 2+2 | 5+ | 2.4x | 140% |
| 4+2 | 7+ | 1.8x | 80% |
| 8+2 | 11+ | 1.5x | 50% |
The “3” scheme (also known as “3R”) is StorPool’s standard triple replication mode. The rest – in the form of “N+2” – are the Erasure Coding (EC) modes.
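The last two columns of the table are two views of the same number: a raw space factor of 3.3x is an overhead of 230%, and so on. The following is a minimal sketch of that arithmetic. The names `RAW_SPACE_FACTOR`, `overhead_percent`, and `estimate_usable_tb` are illustrative and not part of any StorPool API, and the assumption that usable space equals raw space divided by the factor is inferred from the example output on this page, not from the tool itself.

```python
# Raw space factors per scheme, copied from the table above.
RAW_SPACE_FACTOR = {"3": 3.3, "2+2": 2.4, "4+2": 1.8, "8+2": 1.5}


def overhead_percent(scheme):
    """Extra raw space needed on top of the stored data, in percent."""
    return round((RAW_SPACE_FACTOR[scheme] - 1) * 100)


def estimate_usable_tb(raw_tb, scheme):
    """Assumption: usable space is raw space divided by the raw space factor."""
    return round(raw_tb / RAW_SPACE_FACTOR[scheme], 3)


for scheme in RAW_SPACE_FACTOR:
    print(scheme, f"{overhead_percent(scheme)}%", estimate_usable_tb(35.505, scheme))
```

For a cluster with 35.505TB of raw space this gives about 10.759TB usable with triple replication and about 14.794TB with the 2+2 scheme, provided the cluster actually qualifies for the mode.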
Capacity upgrade calculations done by hand are both difficult and error-prone. The tool provided by StorPool gives consistent suggestions for planning capacity upgrades, as well as suggestions for better placement with a particular erasure coding scheme.
How it works
The capacity planner works in the following way:
Gets information about a StorPool cluster either using the StorPool CLI (see CLI tutorial), or from a JSON or CSV file provided by you.
Calculates if the cluster can support the redundancy modes described above.
Prints out the results of the calculation: if the cluster can support a mode as-is, what changes are needed to support it (if any), or if the mode is unsupported completely.
The tool can output its result in CSV format, which can be saved as a file, edited, and then fed back to the capacity planner. This allows you to explore new potential storage configurations.
Confirming supported modes
Each StorPool redundancy mode has requirements on the number of nodes and on their capacity relative to the other nodes. You can run the tool on an existing cluster to check whether the cluster can be used in a specific mode and, if it cannot, why.
Mode support calculation
The total raw capacity of each fault set must be smaller than or equal to the total raw capacity of the cluster divided by the minimum number of nodes for the mode. Every node is its own fault set, apart from those that are combined in custom fault sets.
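The rule can be checked by hand for a small cluster. The sketch below applies it to a mapping of fault set names to raw sizes; `MIN_NODES` and `check_mode` are hypothetical names, the minimum node counts come from the table in the Introduction, and counting fault sets in place of nodes is an assumption. It only illustrates the stated rule and is not the planner's actual implementation.

```python
# Minimum node counts per mode, from the "Nodes" column in the Introduction.
MIN_NODES = {"3R": 3, "EC 2+2": 5, "EC 4+2": 7, "EC 8+2": 11}


def check_mode(fault_sets_tb, mode):
    """Return the reasons a mode is not supported as-is (empty list if it is)."""
    min_nodes = MIN_NODES[mode]
    # Assumption: each fault set counts as one node for the minimum.
    if len(fault_sets_tb) < min_nodes:
        return [f"not enough fault sets: {len(fault_sets_tb)} < {min_nodes}"]
    # Each fault set may hold at most total / min_nodes of raw capacity.
    limit = sum(fault_sets_tb.values()) / min_nodes
    return [
        f"{name} is too large: {size} > {limit:.3f}"
        for name, size in fault_sets_tb.items()
        if size > limit
    ]


cluster = {"node1": 10.0, "node2": 7.999, "node3": 5.999,
           "node4": 2.0, "node5": 2.0, "node6": 2.0}
for mode in MIN_NODES:
    print(mode, check_mode(cluster, mode) or "OK as-is")
```

An empty list means the layout satisfies the rule without changes; a non-empty list names the fault sets whose capacity would have to be reduced, or states that there are too few fault sets.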
Usage
The /usr/lib/storpool/storpool_capacity_planner tool can be run in one of the following ways:
Online: performed on a machine with the storpool CLI installed (the data is taken from a live cluster).
Offline: by passing the --disk-list-json FILE and --fault-set-list-json FILE options.
Planning: by passing the --csv-file CSV_FILE option with a CSV file that has been generated by a previous run of the tool.
Here is the full list of options you can use:
- --disk-list-json FILE
Provide input information about the disks in a cluster using a JSON file; you can generate such a file with storpool -j disk list (see Listings disks)
- --fault-set-list-json FILE
Provide input information about the fault sets in a cluster using a JSON file; you can generate such a file with storpool -j faultSet list (see Fault sets)
- --csv-file FILE
Provide input information about the cluster using a CSV file; you can generate such a file with the -vv option, see below
- --size-unit
What units to report sizes in; default is TB
- --combine-partitions
Whether partitions (if found) should be combined into their parent device; default is False
- --ag-step
How many allocation groups to decrement when trying to make a cluster support a mode; integer, default is 1
- --iterations
How many iterations to do when trying to make a cluster support a mode; integer, default is 10000
- -v, --verbose
Each such flag adds more information about the computations to the output; use -vv to obtain the result in CSV format
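For the offline workflow, the two JSON listings are saved on a machine that has the storpool CLI and then passed to the planner, possibly on another machine. The following sketch wraps those steps in Python. The helper names `dump_cluster_json` and `offline_command` and the file names are hypothetical; only the commands and options listed above are used, and nothing is executed until you call the helpers yourself.

```python
import subprocess

PLANNER = "/usr/lib/storpool/storpool_capacity_planner"


def dump_cluster_json(disk_path, fault_set_path):
    """Save both JSON listings; needs a machine with the storpool CLI."""
    listings = (
        (["storpool", "-j", "disk", "list"], disk_path),
        (["storpool", "-j", "faultSet", "list"], fault_set_path),
    )
    for args, path in listings:
        with open(path, "w") as out:
            subprocess.run(args, stdout=out, check=True)


def offline_command(disk_path, fault_set_path, verbosity=0):
    """Build the planner command line for an offline run."""
    cmd = [PLANNER,
           "--disk-list-json", disk_path,
           "--fault-set-list-json", fault_set_path]
    if verbosity:
        cmd.append("-" + "v" * verbosity)  # for example -vv
    return cmd


# Only prints the command; pass the list to subprocess.run() to execute it.
print(" ".join(offline_command("disks.json", "fault_sets.json", verbosity=2)))
```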
Online example: checking for mode eligibility
Running storpool_capacity_planner on a live cluster will report
which modes it can support:
$ /usr/lib/storpool/storpool_capacity_planner
Original cluster:
Cluster: raw=35.505TB, fault sets:
fault set: name=FAKE_FOR_NODE_1 size=5.118TB
fault set: name=FAKE_FOR_NODE_2 size=3.518TB
fault set: name=FAKE_FOR_NODE_3 size=3.518TB
fault set: name=FAKE_FOR_NODE_4 size=5.758TB
fault set: name=FAKE_FOR_NODE_5 size=5.118TB
fault set: name=FAKE_FOR_NODE_6 size=5.117TB
fault set: name=FAKE_FOR_NODE_7 size=1.599TB
fault set: name=FAKE_FOR_NODE_8 size=1.919TB
fault set: name=FAKE_FOR_NODE_9 size=3.839TB
3R/3.3/3: Result: OK
Cluster: raw=35.505TB (0.0TB), usable=10.759TB, mode=3R/3.3/3
EC(2+2)/2.4/5: Result: OK
Cluster: raw=35.505TB (0.0TB), usable=14.794TB, mode=EC(2+2)/2.4/5
EC(4+2)/1.8/7: Result OK (cluster has been modified!)
Cluster: raw=33.586TB (-1.919TB), usable=18.659TB, mode=EC(4+2)/1.8/7
EC(8+2)/1.5/11: Result: Not compatible, reasons:
Cluster does not have enough nodes. Actual: 9, expected: >= 11
The tool has calculated the following about the cluster:
It supports “3R” and “EC 2+2” without any modifications to the underlying capacity.
It can support “EC 4+2” only in a modified form: ~1.9TB of raw capacity would have to be removed.
“EC 8+2” is not supported because there are not enough nodes.
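If you want to compare several runs, the per-mode summary lines can be collected with a small parser. This is a sketch that assumes the output looks exactly like the listing above and that sizes are reported in the default TB unit; `RESULT_LINE` and `parse_results` are illustrative names, and the output format is not a documented interface.

```python
import re

# Matches lines such as:
#   Cluster: raw=33.586TB (-1.919TB), usable=18.659TB, mode=EC(4+2)/1.8/7
RESULT_LINE = re.compile(
    r"Cluster: raw=(?P<raw>[\d.]+)TB \((?P<delta>-?[\d.]+)TB\), "
    r"usable=(?P<usable>[\d.]+)TB, mode=(?P<mode>\S+)"
)


def parse_results(output):
    """Map each reported mode to its raw size, raw change, and usable size."""
    return {
        m["mode"]: {
            "raw": float(m["raw"]),
            "delta": float(m["delta"]),
            "usable": float(m["usable"]),
        }
        for m in RESULT_LINE.finditer(output)
    }


sample = """\
3R/3.3/3: Result: OK
Cluster: raw=35.505TB (0.0TB), usable=10.759TB, mode=3R/3.3/3
EC(4+2)/1.8/7: Result OK (cluster has been modified!)
Cluster: raw=33.586TB (-1.919TB), usable=18.659TB, mode=EC(4+2)/1.8/7
"""
for mode, values in parse_results(sample).items():
    note = "modified" if values["delta"] < 0 else "as-is"
    print(mode, values["usable"], note)
```

Modes that are reported as not compatible have no such summary line, so they are simply absent from the result.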
Planning example: adding disks to a cluster
Note
Depending on the storage system and its configuration, adding more disks does not always result directly in more storage capacity. This is the case for distributed systems like StorPool: disk additions need to be planned considering the specific cluster, its nodes, and fault sets.
Consider the following cluster with capacities in TB:
| Node | Size (TB) | Disks (TB) |
|---|---|---|
| Node 1 | 10 | 5 + 5 |
| Node 2 | 8 | 4 + 4 |
| Node 3 | 6 | 2 + 2 + 2 |
| Node 4 | 2 | 1 + 1 |
| Node 5 | 2 | 1 + 1 |
| Node 6 | 2 | 1 + 1 |
Running the capacity planner on this cluster reports that it supports 3R after a negligible adjustment. The raw capacity that is used is raw=29.997TB (-0.001TB). The space that is available to users from this capacity is usable=9.09TB.
$ /usr/lib/storpool/storpool_capacity_planner
Original cluster:
Cluster: raw=29.998TB, fault sets:
fault set: name=FS_NODE_1 size=10.0TB
fault set: name=FS_NODE_2 size=7.999TB
fault set: name=FS_NODE_3 size=5.999TB
fault set: name=FS_NODE_4 size=2.0TB
fault set: name=FS_NODE_5 size=2.0TB
fault set: name=FS_NODE_6 size=2.0TB
3R/3.3/3: Result OK (cluster has been modified!)
Cluster: raw=29.997TB (-0.001TB), usable=9.09TB, mode=3R/3.3/3
...
Working with CSV files
You can generate a CSV representation of the cluster (use -vv, see Usage) and modify the disk layout.
Using this option on the example cluster would provide the following results (which would be used as a basis for the examples below):
disk_id,fault_set_name,node_id,device,serial,size
1001,FS_NODE_1,10,0000:c1:00.0-p1,194828550042,5TB
1002,FS_NODE_1,10,0000:c1:00.0-p2,194828550042,5TB
1101,FS_NODE_2,11,0000:c2:00.0-p1,194828550869,4TB
1102,FS_NODE_2,11,0000:c2:00.0-p2,194828550869,4TB
1201,FS_NODE_3,12,0000:c1:00.0-p1,194828550942,2TB
1202,FS_NODE_3,12,0000:c1:00.0-p2,194828550942,2TB
1203,FS_NODE_3,12,0000:82:00.0-p1,S6CKNE0T711426,2TB
1301,FS_NODE_4,13,0000:c2:00.0-p1,S6CKNG0T217281,1TB
1302,FS_NODE_4,13,0000:c2:00.0-p2,S6CKNG0T217281,1TB
1402,FS_NODE_5,14,0000:c2:00.0-p2,S6CKNG0T217250,1TB
1403,FS_NODE_5,14,0000:c3:00.0-p1,S6CKNG0T217285,1TB
1501,FS_NODE_6,15,0000:c1:00.0-p1,S6CKNG0T217093,1TB
1502,FS_NODE_6,15,0000:c1:00.0-p2,S6CKNG0T217093,1TB
Afterwards, you can provide the modified CSV to the tool to check the potential effects of adding, removing, or shuffling disks in the cluster. When editing the CSV file you should consider the following rules:
The first line should be the following: disk_id,fault_set_name,node_id,device,serial,size
There should be no empty lines.
The data format for the entries should be like the one you get on the CLI; for details, see Listings disks.
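Editing the file by hand works well for a few disks; for repeated experiments the same change can be scripted. The sketch below appends one disk to a copy of the CSV while keeping to the rules above. `add_disk` and the file names are hypothetical, and the duplicate disk_id check is an extra precaution assumed here, not a documented requirement.

```python
import csv

HEADER = ["disk_id", "fault_set_name", "node_id", "device", "serial", "size"]


def add_disk(src_path, dst_path, new_disk):
    """Copy the planner CSV to dst_path with one extra disk row appended."""
    with open(src_path, newline="") as src:
        rows = [row for row in csv.reader(src) if row]  # skip empty lines
    if not rows or rows[0] != HEADER:
        raise ValueError("unexpected header line")
    # Assumption: disk IDs should stay unique within the file.
    if any(row[0] == new_disk[0] for row in rows[1:]):
        raise ValueError(f"disk_id {new_disk[0]} is already present")
    rows.append(list(new_disk))
    with open(dst_path, "w", newline="") as dst:
        csv.writer(dst, lineterminator="\n").writerows(rows)
```

For example, `add_disk("cluster.csv", "with_extra_disk.csv", ("1503", "FS_NODE_6", "15", "0000:c1:00.0-p1", "20012BF1F991", "1TB"))` writes a new file with the extra row and leaves the original untouched, so the resulting file can be passed to the planner with --csv-file.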
Adding a disk to different nodes
Assume you have a CSV file called with_extra_1tb_to_node_1.csv, and you modify it so that there is an extra 1TB disk added to Node 1.
That is, you add a line like this one: 1003,FS_NODE_1,10,0000:c1:00.0-p1,20012BF1F991,1TB
You provide this file to the tool using the --csv-file option:
$ /usr/lib/storpool/storpool_capacity_planner --csv-file with_extra_1tb_to_node_1.csv
Original cluster:
Cluster: raw=30.998TB, fault sets:
fault set: name=FS_NODE_1 size=11.0TB
fault set: name=FS_NODE_2 size=7.999TB
fault set: name=FS_NODE_3 size=5.999TB
fault set: name=FS_NODE_4 size=2.0TB
fault set: name=FS_NODE_5 size=2.0TB
fault set: name=FS_NODE_6 size=2.0TB
3R/3.3/3: Result OK (cluster has been modified!)
Cluster: raw=29.997TB (-1.001TB), usable=9.09TB, mode=3R/3.3/3
...
As shown in the example above, this would not increase the storage capacity. The tool would report the following:
The usable size would remain usable=9.09TB.
The raw space would be decreased by 1TB: raw=29.997TB (-1.001TB).
Using the same approach, you can check what would happen if you add a 1TB disk to Node 6 instead.
You can do this using a with_extra_1tb_to_node_6.csv CSV file and adding to it a line like this one: 1503,FS_NODE_6,15,0000:c1:00.0-p1,20012BF1F991,1TB
The result would be the following:
$ /usr/lib/storpool/storpool_capacity_planner --csv-file with_extra_1tb_to_node_6.csv
Original cluster:
Cluster: raw=30.998TB, fault sets:
fault set: name=FS_NODE_1 size=10.0TB
fault set: name=FS_NODE_2 size=7.999TB
fault set: name=FS_NODE_3 size=5.999TB
fault set: name=FS_NODE_4 size=2.0TB
fault set: name=FS_NODE_5 size=2.0TB
fault set: name=FS_NODE_6 size=3.0TB
3R/3.3/3: Result: OK
Cluster: raw=30.998TB (0.0TB), usable=9.393TB, mode=3R/3.3/3
...
The results now are better than those when the disk was added to Node 1:
The cluster’s usable space has increased. Originally it was usable=9.09TB, and now it is usable=9.393TB.
Note that the cluster has not rejected the extra capacity: raw=30.998TB (0.0TB) (no negative values inside the brackets).
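The difference between the two placements follows from the fault set rule in Mode support calculation: with 3R no fault set may hold more than a third of the raw capacity, so space added to an already oversized node is trimmed away again. The sketch below models this by capping the largest fault sets until the rule holds. It is a simplified model under stated assumptions, not the planner's algorithm: the real tool decrements allocation groups iteratively (see --ag-step and --iterations), so its figures may differ slightly on other clusters. The names `largest_allowed_cap` and `plan` are illustrative.

```python
def largest_allowed_cap(sizes_tb, min_nodes):
    """Largest per-fault-set size c with c <= sum(min(s, c)) / min_nodes."""
    ordered = sorted(sizes_tb, reverse=True)
    if len(ordered) < min_nodes:
        raise ValueError("not enough fault sets for this mode")
    # Try capping the 0, 1, 2, ... largest fault sets until the rest fit.
    for capped in range(min_nodes):
        cap = sum(ordered[capped:]) / (min_nodes - capped)
        if ordered[capped] <= cap:
            return cap


def plan(sizes_tb, min_nodes=3, factor=3.3):
    """Return (raw, usable) in TB after trimming oversized fault sets."""
    cap = largest_allowed_cap(sizes_tb, min_nodes)
    raw = sum(min(size, cap) for size in sizes_tb)
    # Assumption: usable space is raw space divided by the raw space factor.
    return round(raw, 3), round(raw / factor, 3)


extra_on_node_1 = [11.0, 7.999, 5.999, 2.0, 2.0, 2.0]
extra_on_node_6 = [10.0, 7.999, 5.999, 2.0, 2.0, 3.0]
print(plan(extra_on_node_1))  # (29.997, 9.09)
print(plan(extra_on_node_6))  # (30.998, 9.393)
```

With the disk on Node 1, the largest fault set is capped at about 9.999TB and the usable space does not change. With the disk on Node 6, every fault set already fits under the limit, so all of the new capacity counts.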