Drive management
Introduction
When working with a StorPool cluster, there will be times when you need to add or remove drives from nodes. Here are a few examples:
Adding a new node with new drives.
Adding a new drive to a node.
Replacing an existing drive with a new one.
Removing a drive when it fails, or if it is not needed anymore.
Whenever you add or remove a drive, take care to maintain or improve the redundancy levels. It is recommended to avoid periods of decreased redundancy, or, when that is not possible, to minimize their length.
Adding drives
After a new drive is added to a node, you need to define partitions on the drive and set a StorPool-specific ID. For details, see Storage devices.
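As an illustration only (the authoritative procedure is in Storage devices), initializing a new drive might look like the sketch below. The device /dev/sdb, the partition /dev/sdb1, the disk ID 1101, and the partitioning flags are all assumptions here:
# parted -s --align optimal /dev/sdb mklabel gpt -- mkpart primary 2MiB 100%
# storpool_initdisk 1101 /dev/sdb1
The first command creates a single GPT partition spanning the drive; the second assigns the hypothetical StorPool disk ID 1101 to it.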
Ejecting drives
In StorPool, marking a drive as temporarily unavailable is referred to as ejecting. When you eject a drive, the cluster will do the following:
Stop the data replication for it.
Keep the metadata about the placement groups in which it participated, and which volume objects it contained.
The following sections show some examples with ejected drives. For more information about when and how to eject a drive, see Disk.
Removing a drive without replacement
Follow this procedure when a drive has to be removed from the cluster. Data replicas will be restored on other available drives before the drive is removed. This procedure does not leave any volumes with degraded redundancy at any point. It assumes there is enough free space in the cluster to hold the data from the drive being removed.
Note
If the drive is to be replaced by another drive and there are available slots for the new drive without the need to remove the old drive, follow the instructions to add the new drive first, and then continue with this procedure to remove the old drive. In this case there is no need to rebalance when adding the new drive. Rebalance can be done only once, with the removal of the old drive.
When removing multiple drives (for example, when decommissioning a server), it is recommended to rebalance only once, after all drives have been ejected.
To remove a drive and balance the load:
Balance out the drive:
Mark the drive for removal (repeat for all drives to be removed):
# storpool disk <diskID> softEject
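When several drives are being removed at once, the same command can be repeated in a small loop; a minimal sketch, using hypothetical disk IDs 1101 and 1102:
# Mark each drive to be removed; the IDs are hypothetical examples:
for diskID in 1101 1102; do
    storpool disk "$diskID" softEject
done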
Use the balancer to restore the replication level:
# /usr/lib/storpool/balancer.sh -R
If you experience issues with the rebalancing (for example, lack of space for performing the operation), refer to Rebalancing the cluster for details and examples on balancer usage.
Commit the changes and start the relocation:
# storpool balancer commit
Note
Rebalancing operations can take multiple hours, up to a few days. This depends on the amount of data that needs to be moved per disk, and on the load of the cluster.
Wait for the data on the drive to be moved to other drives.
Monitor the process:
# storpool task list
# storpool -j relocator status
 "data" : {
    "recoveringFromRemote" : 0,
    "status" : "on",
    "vagsToCleanup" : 0,
    "vagsToRelocate" : 443,
    "vagsToRelocateWaitingForBalancer" : 0,
    "volumesToRelocate" : 880
 },
The relevant fields for rebalancing are vagsToRelocate and volumesToRelocate.
Monitor the remaining data to be moved:
# storpool relocator disks
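Because frequent polling of storpool relocator disks is discouraged (see the note below), a cheaper way to watch progress is to poll the JSON status shown above. A minimal sketch, assuming the jq tool is installed:
# Poll every 5 minutes until nothing is left to relocate:
while true; do
    left=$(storpool -j relocator status | jq '.data.volumesToRelocate')
    echo "volumes left to relocate: $left"
    [ "$left" -eq 0 ] && break
    sleep 300
done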
Attention
The storpool relocator disks command consumes a lot of resources. When executed on big clusters with a large number of volumes it may introduce latency and delays in queries to the cluster management service (it cannot affect I/O operations). Avoid frequent use of this command if possible.
When a drive is balanced-out it will go into ejected mode. You can check this in one of the following ways:
Using the storpool command:
# storpool disk list
 disk | server | size | used | est.free |  %  | free entries | on-disk size | allocated objects | errors | flags
  802 |    8.0 |    - |    - |        - | - % |            - |            - |             - / - |  - / - |
The ejected drive is shown only with its ID and dashes in all other fields.
Using the storpool_initdisk command, until you see the “no volumes to relocate” result:
# storpool_initdisk --list
0000:83:00.0-p2, diskId 812, version 10009, server instance 5, cluster baaa.b, EJECTED, SSD
/dev/sdb2, diskId 912, version 10009, server instance 2, cluster baa.b, EJECTED, SSD
The ejected drive is marked with an EJECTED flag.
After ensuring the drive is ejected, remove it from the configuration:
# storpool disk <diskID> forget
Now you can physically unplug the drive.
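Putting the steps above together, the whole removal could be scripted roughly as follows. This is a sketch only, assuming a single hypothetical disk ID 802 and the jq polling shown earlier:
# 1. Mark the drive for removal and rebalance the data off it:
storpool disk 802 softEject
/usr/lib/storpool/balancer.sh -R
storpool balancer commit

# 2. Wait until the relocation finishes (this may take hours or days):
while [ "$(storpool -j relocator status | jq '.data.volumesToRelocate')" -gt 0 ]; do
    sleep 300
done

# 3. Verify the drive is EJECTED (run on the node hosting it), then remove it:
storpool_initdisk --list | grep "diskId 802"
storpool disk 802 forget
In practice you would review the balancer output before committing, as described in Rebalancing the cluster.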
Replacing a drive without balancing-out the old drive
This procedure will leave data with reduced redundancy during the replacement and the subsequent data recovery. Using it is not recommended; apply it only when there are no other options, for example when there is no space available in the cluster to recover redundancy before removing the old drive.
When replacing an SSD drive on hybrid clusters with one copy on SSD, this procedure will cause increased read latency until all data is fully recovered on the replaced SSD, as some read operations will be performed from HDDs. On all-SSD/NVMe clusters and hybrid clusters with two copies on SSD, performance will not be degraded during the replacement.
To replace a drive without balancing the load you should first eject it and then add the new drive, as described in the sections below.
Ejecting the old drive
Make sure the new drive you will use meets the following requirements:
It is of the same type; for example, replacing a SATA SSD with a SATA SSD or an NVMe, an NVMe with an NVMe, or an HDD with an HDD.
It is the same size or bigger.
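The type and size of the candidate drive can be confirmed with standard Linux tools before proceeding; a small sketch, assuming the new drive is the hypothetical /dev/sdc:
# lsblk -b -d -o NAME,SIZE,ROTA /dev/sdc
Here SIZE is reported in bytes, and ROTA is 1 for rotational drives (HDD) and 0 for SSD/NVMe.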
Put the node on which the drive is to be ejected in Maintenance mode. Note the following:
Make sure the maintenance period you set is long enough to complete the whole drive replacement procedure.
If this cannot be done (for example, if there is a degraded volume), follow the procedure in Removing an ejected drive without replacement to restore full redundancy. After that you can proceed with ejecting the drive.
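The degraded-volume check mentioned above can be done from the CLI; a minimal sketch, assuming storpool volume status reports a per-volume state and flags affected volumes with the word "degraded" (the exact wording is an assumption to verify against your CLI version):
# storpool volume status | grep -i degraded
An empty result means no volume is degraded and it is safe to proceed with the eject.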
Eject the old drive (see Ejecting disks and internal server tests):
# storpool disk <diskID> eject
Adding the new drive
Remove the old drive physically, and insert the new one.
Follow the instructions in Adding a drive to a running cluster to create partitions on the new drive, and assign the same ID.
Note
Make sure the disk ID and server instance ID are the same (and, optionally, other parameters). You can use the --list and -i options of the storpool_initdisk tool.
After the new drive is initialized it will be added automatically, and the cluster will start recovering data on it.
To check the drive’s status:
# storpool disk list
To check the data recovery process:
# storpool task list
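To follow the recovery without retyping the command, a simple watch loop can be used; a minimal sketch:
# watch -n 60 storpool task list
This refreshes the list of outstanding recovery tasks every 60 seconds; an empty list means the recovery has finished.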
Complete the maintenance mode for the node.
Replacing an ejected drive
A drive can be ejected from a cluster as described in Removing a drive without replacement, or for other reasons (for example, failure). Follow the procedure below when you need to replace the ejected drive with a new one:
Make sure the new drive you will use meets the following requirements:
It is of the same type.
It is the same size or bigger.
Verify the old drive is ejected before physically unplugging it:
# storpool disk list
Proceed with the steps in the Adding the new drive section.
Removing an ejected drive without replacement
Follow this procedure if the drive is ejected due to failure, but there are no immediate plans to replace the drive. You can restore the data redundancy as follows:
Remove the drive from the list of reported drives and all placement groups in which it participates:
# storpool disk <diskID> forget
Use the balancer to restore the replication level:
# /usr/lib/storpool/balancer.sh -R
Commit the changes and start the relocation:
# storpool balancer commit
For more information about using the balancer tool, see Rebalancing the cluster.
Recovering drives
When you want to return an ejected disk to the cluster:
# storpool_initdisk -r <device>
Here, <device> should be in the /dev/sdXN format, where X is the drive letter and N is the partition number. For more information about the storpool_initdisk tool, see Drive initialization options.
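For example, if the ejected drive's data partition were the hypothetical /dev/sdb2, the call would look like this:
# storpool_initdisk -r /dev/sdb2
# storpool_initdisk --list
The second command confirms the drive is no longer marked with the EJECTED flag.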
Wait for the drive to get back in the cluster (check with storpool disk list), and then wait for the recovery tasks to complete. If the drive is unable to get back for some reason, you can do one of the following:
Remove it, as described in Removing an ejected drive without replacement.
Replace it, as described in Replacing an ejected drive.
Note
If you’re returning a drive that was previously balanced-out from the cluster, you should reformat it first.
More information
Whenever you add a drive to the cluster, consider also adding the drive to a placement group and then rebalancing it into the cluster. For details, see Placement groups and Rebalancing the cluster.