Drive management

Introduction

When working with a StorPool cluster, there will be times when you need to add or remove drives from nodes. Here are a few examples:

  • Adding a new node with new drives.

  • Adding a new drive to a node.

  • Replacing an existing drive with a new one.

  • Removing a drive when it fails, or if it is not needed anymore.

Whenever you add or remove a drive, take care to maintain or improve the redundancy levels. It is recommended to avoid periods of decreased redundancy or, when that is not possible, to minimize their length.

Adding drives

After a new drive is added to a node, you need to define partitions on the drive and set a StorPool-specific ID. For details, see Storage devices.
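
As a quick illustration, initializing a new drive's partition might look like the following. The disk ID 1101 and the device /dev/sdc2 are placeholders, and the plain diskId-then-device invocation is an assumption; see Storage devices for the authoritative steps.

  # storpool_initdisk 1101 /dev/sdc2
  # storpool_initdisk --list

The --list output should show the new partition with its assigned disk ID.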

Ejecting drives

In StorPool, marking a drive as temporarily unavailable is referred to as ejecting. When you eject a drive, the cluster will do the following:

  • Stop the data replication for it.

  • Keep the metadata about the placement groups in which it participated, and which volume objects it contained.

The following sections show some examples with ejected drives. For more information about when and how to eject a drive, see Disk.

Removing a drive without replacement

Follow this procedure when a drive has to be removed from the cluster. Data replicas are restored on other available drives before the drive is removed, so no volumes are left with degraded redundancy at any point. The procedure assumes there is enough free space in the cluster to hold the data from the drive being removed.

Note

If the drive is to be replaced by another drive and there is a free slot for the new one, so the old drive does not have to be removed first, follow the instructions to add the new drive, and then continue with this procedure to remove the old one. In this case there is no need to rebalance when adding the new drive; the rebalance can be done only once, when removing the old drive.

When removing multiple drives (for example, when decommissioning a server), it is recommended to rebalance only once, after all drives have been marked for removal.
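
For example, when decommissioning a server you could mark all of its drives for removal and then run a single rebalance. A minimal sketch, with hypothetical disk IDs:

  # Hypothetical disk IDs of the drives being decommissioned.
  for d in 801 802 803; do
      storpool disk "$d" softEject
  done

  # A single rebalance then restores redundancy for all of them at once.
  /usr/lib/storpool/balancer.sh -R
  storpool balancer commit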

To remove a drive and balance the load:

  1. Balance out the drive:

    1. Mark the drive for removal (repeat for all drives to be removed):

      # storpool disk <diskID> softEject
      
    2. Use the balancer to restore the replication level:

      # /usr/lib/storpool/balancer.sh -R
      

      If you experience issues with the rebalancing (for example, lack of space for performing the operation), see Rebalancing the cluster for details and examples of balancer usage.

    3. Commit the changes and start the relocation:

      # storpool balancer commit
      

    Note

    Rebalancing operations can take multiple hours, up to a few days, depending on the amount of data that needs to be moved per disk and on the load of the cluster.

  2. Wait for the data on the drive to be moved to other drives. Instead of re-running the commands below by hand, you can also wait in a script; see the polling sketch after this procedure.

    • Monitor the process:

      # storpool task list
      # storpool -j relocator status
        "data" : {
           "recoveringFromRemote" : 0,
           "status" : "on",
           "vagsToCleanup" : 0,
           "vagsToRelocate" : 443,
           "vagsToRelocateWaitingForBalancer" : 0,
           "volumesToRelocate" : 880
        },
      

      The relevant fields for rebalancing are vagsToRelocate and volumesToRelocate.

    • Monitor the remaining data to be moved:

      # storpool relocator disks
      

    Attention

    The storpool relocator disks command consumes a lot of resources. When executed on big clusters with a large number of volumes, it may introduce latency and delays in queries to the cluster's management service (it does not affect I/O operations). Avoid frequent use of this command if possible.

  3. When a drive is balanced out, it goes into ejected mode. You can check this in one of the following ways:

    • Using the storpool command:

      # storpool disk list
        disk |  server  |    size    |    used    |  est.free  |      %  |  free entries  |  on-disk size  |  allocated objects |  errors  |  flags
        802  |     8.0  |         -  |         -  |         -  |    - %  |             -  |             -  |        - / -       |   - / -  |
      

      The ejected drive is shown only with its ID and dashes in all other fields.

    • Using the storpool_initdisk command:

      # storpool_initdisk --list
        0000:83:00.0-p2, diskId 812, version 10009, server instance 5, cluster baaa.b, EJECTED, SSD
        /dev/sdb2, diskId 912, version 10009, server instance 2, cluster baa.b, EJECTED, SSD
      

      The ejected drive is marked with an EJECTED flag.

  4. After ensuring the drive is ejected, remove it from the configuration:

    # storpool disk <diskID> forget
    
  5. Now you can physically unplug the drive.
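
As referenced in step 2, instead of re-running the monitoring commands by hand you can wait in a small polling loop. A minimal sketch, assuming the jq tool is installed; it watches the volumesToRelocate counter from storpool -j relocator status, and the 60-second interval is arbitrary:

  # Wait until the relocator reports no more volumes to relocate.
  while true; do
      left=$(storpool -j relocator status | jq '.data.volumesToRelocate')
      [ "$left" -eq 0 ] && break
      echo "$(date): ${left} volumes left to relocate"
      sleep 60
  done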

Replacing a drive without balancing out the old drive

This procedure leaves data with reduced redundancy during the replacement and the subsequent data recovery. Using it is not recommended; apply it only when there are no other options, for example, when there is no space available in the cluster to restore redundancy before removing the old drive.

When replacing an SSD drive on hybrid clusters with one copy on SSD, this procedure will cause increased read latency until all data is fully recovered on the replacement SSD, as some read operations will be performed from HDDs. On all-SSD/NVMe clusters and on hybrid clusters with two copies on SSD, performance will not be degraded during the replacement.

To replace a drive without balancing the load, first eject it and then add the new drive, as described in the sections below.

Ejecting the old drive

  1. Make sure the new drive you will use meets the following requirements:

    • It is of the same type; for example, replacing a SATA SSD with a SATA SSD or an NVMe, an NVMe with an NVMe, or an HDD with an HDD.

    • It is the same size or bigger.

  2. Put the node on which the drive is to be ejected in Maintenance mode. Note the following:

    • Make sure the maintenance period you set is long enough to complete the whole drive replacement procedure.

    • If this cannot be done (for example, if there is a degraded volume), follow the procedure in Removing an ejected drive without replacement to restore full redundancy. After that, you can proceed with ejecting the drive.

  3. Eject the old drive (see Ejecting disks and internal server tests):

    # storpool disk <diskID> eject
    
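
For example, with a hypothetical disk ID of 812, the eject and a quick check that the drive is now marked EJECTED could look like this:

  # storpool disk 812 eject
  # storpool_initdisk --list | grep 812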

Adding the new drive

  1. Remove the old drive physically, and insert the new one.

  2. Follow the instructions in Adding a drive to a running cluster to create partitions on the new drive, and assign it the same ID as the old drive.

    Note

    Make sure the disk ID and server instance ID are the same (and, optionally, other parameters). You can use the --list and -i options of the storpool_initdisk tool; see the sketch after this procedure.

  3. After the new drive is initialized, it will be added automatically, and the cluster will start recovering data on it.

    • To check the drive's status:

      # storpool disk list
      
    • To check the data recovery process:

      # storpool task list
      
  4. Complete the maintenance mode for the node.
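
As a sketch of step 2: suppose storpool_initdisk --list showed the old drive with diskId 812 on server instance 5, and the new drive's partition appears as /dev/sdc2. These values are placeholders, and the -i <instance> <diskID> <device> invocation form is an assumption to confirm against Drive initialization options:

  # storpool_initdisk --list
  # storpool_initdisk -i 5 812 /dev/sdc2
  # storpool disk list
  # storpool task list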

Replacing an ejected drive

A drive can be ejected from a cluster as described in Removing a drive without replacement, or for other reasons (for example, failure). Follow the procedure below when you need to replace the ejected drive with a new one:

  1. Make sure the new drive you will use meets the following requirements:

    • It is of the same type.

    • It is the same size or bigger.

  2. Verify the old drive is ejected before physically unplugging it:

    # storpool disk list
    
  3. Proceed with the steps in the Adding the new drive section.

Removing an ejected drive without replacement

Follow this procedure if the drive was ejected due to failure but there are no immediate plans to replace it. You can restore the data redundancy as follows:

  1. Remove the drive from the list of reported drives and all placement groups in which it participates:

    # storpool disk <diskID> forget
    
  2. Use the balancer to restore the replication level:

    # /usr/lib/storpool/balancer.sh -R
    
  3. Commit the changes and start the relocation:

    # storpool balancer commit
    

For more information about using the balancer tool, see Rebalancing the cluster.
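
Putting the steps together for a hypothetical failed disk with ID 802:

  # storpool disk 802 forget
  # /usr/lib/storpool/balancer.sh -R
  # storpool balancer commit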

Recovering drives

When you want to return an ejected disk to the cluster:

  # storpool_initdisk -r <device>

Here, <device> should be in the /dev/sdXN format, where X is the drive letter and N is the partition number. For more information about the storpool_initdisk tool, see Drive initialization options.
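
For example, if the ejected disk is on the second partition of /dev/sdb (a placeholder device name):

  # storpool_initdisk -r /dev/sdb2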

Wait for the drive to come back into the cluster (check with storpool disk list), and then wait for the recovery tasks to complete. If the drive cannot come back for some reason, you can do one of the following:

  • Replace it with a new drive, as described in Replacing an ejected drive.

  • Remove it from the cluster without replacement, as described in Removing an ejected drive without replacement.

Note

If you’re returning a drive that was previously balanced out of the cluster, you should reformat it first.

More information

Whenever you add a drive to the cluster, consider also adding the drive to a placement group and then rebalancing it into the cluster. For details, see Placement groups and Rebalancing the cluster.
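
As an illustration, assuming a placement group named ssd and a new drive with ID 1101 (both placeholders), adding the drive to the group might look like this; the addDisk subcommand form is an assumption to confirm in Placement groups:

  # storpool placementGroup ssd addDisk 1101

After that, run the balancer and commit the changes, as shown in the procedures above.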