Adding and removing nodes

The servers participating in a StorPool cluster are referred to as nodes. Adding, removing, and recovering nodes are among the most common administrative tasks; the procedures below describe how to perform them safely, without affecting the availability or performance of the cluster.

Terminology

  • Storage node - a cluster node where StorPool disks are present and a storpool_server instance is running (see StorPool services).

  • Client node - a cluster node where StorPool services (storpool_beacon, storpool_block, etc.) are running but no StorPool disks or storpool_server instances are present.

  • Voting node - a cluster node configured with the option SP_NODE_NON_VOTING=0 (this is the default). Voting nodes take part in the cluster quorum vote calculation (see Identification and voting).

Note

For the cluster to have quorum and stay online, more than 50% of the voting nodes must be up and reachable. All storage nodes must always be configured as voting, as their storpool_server instances will not start otherwise. Client nodes can be configured as voting or non-voting as needed, though most often they are set to be non-voting.
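
For example, a cluster with 5 voting nodes keeps quorum with up to 2 of them down, while a cluster with 4 voting nodes can tolerate only 1. The current quorum state can be checked at any time with the command below (the numbers shown are illustrative):

  $ storpool net list | grep Quorum
  Quorum status: 4 voting beacons up out of 4 expected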

Adding nodes

Prerequisites

This procedure applies to cases where your cluster meets the following conditions before the new nodes are added:

  • The cluster has two or more voting nodes.

  • The current number of voting nodes in the cluster is greater than the number of nodes you are adding. This is to ensure that the cluster will have quorum even in the event of an issue with the new nodes during their installation.

  • StorPool is installed on the new nodes and their StorPool services are stopped; for details, see Installation and setup and Managing services with storpool_ctl.

  • All cluster nodes, aside from the new nodes that have not been added yet, are online, reachable, and in the cluster. This can be checked with storpool net list; see Network for details.

In cases where the number of voting nodes to be added is equal to or greater than the number of voting nodes currently in the cluster, please add the new nodes in batches, to ensure that the above prerequisites are met.

For cases where a one-node cluster is being expanded, please review the procedure in the Expanding one-node clusters section.

Procedure

  1. Notify StorPool support that the nodes in question will be added.

  2. Ensure that the /etc/storpool.conf file on the nodes currently in the cluster is the same everywhere. This can be done by checking that the sha256sum of /etc/storpool.conf on all cluster nodes is the same:

    $ sha256sum /etc/storpool.conf
    e06535759636807e27a36d33a4e103d41206ec659cf7cbd44c4c065fe8c0834a  /etc/storpool.conf
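
    The checksums can also be collected in a single pass from one host; this is only a convenience sketch that assumes passwordless SSH access and uses hypothetical hostnames:

    $ for h in node1 node2 node3 node4; do ssh "$h" sha256sum /etc/storpool.conf; done
    e06535759636807e27a36d33a4e103d41206ec659cf7cbd44c4c065fe8c0834a  /etc/storpool.conf
    ...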
    
  3. Log on to one of the available nodes and open the /etc/storpool.conf configuration file with your favorite text editor.

  4. Add the per-host configuration section for the node(s) being added.
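
    As an illustration, a per-host section uses the same format as the existing ones in the file: a section named after the node's hostname, containing its node-specific options. The hostname and values below are hypothetical; use the real values from your deployment plan:

    [node5]
    SP_OURID=5
    # For a non-voting client node you would also add:
    # SP_NODE_NON_VOTING=1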

  5. If any of the nodes being added are voting, increase the value of the SP_EXPECTED_NODES option to the new expected number of voting nodes; for more information, see Expected nodes:

    Attention

    It is important to ensure that the SP_EXPECTED_NODES value in /etc/storpool.conf on all nodes is correctly set to the number of expected voting nodes in the cluster. If SP_EXPECTED_NODES is incorrectly set to a value that is too high on any node, and the storpool_beacon on that node is started or rebooted, then this incorrect value will be applied to the cluster, possibly causing quorum to be lost and the cluster to go down.
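
    For example, when growing a cluster from 4 to 5 voting nodes, the option should read as follows after the edit (the value is illustrative):

    $ grep SP_EXPECTED_NODES /etc/storpool.conf
    SP_EXPECTED_NODES=5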

  6. Save the file.

  7. Copy the updated /etc/storpool.conf file to all other nodes in the cluster. When updating /etc/storpool.conf to add nodes, there is no need to restart the StorPool services on the existing nodes.
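
    One way to distribute the file from the node where it was edited is shown below; this is only a convenience sketch that assumes passwordless SSH access and uses hypothetical hostnames:

    $ for h in node2 node3 node4 node5; do scp /etc/storpool.conf "$h":/etc/storpool.conf; done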

  8. Start and enable the StorPool services on the new node(s) by running the commands storpool_ctl enable and storpool_ctl start:

    $ storpool_ctl enable
    Created symlink /etc/systemd/system/sysinit.target.wants/storpool_hugepages.service → /usr/lib/systemd/system/storpool_hugepages.service.
    Created symlink /etc/systemd/system/multi-user.target.wants/storpool_beacon.service → /usr/lib/systemd/system/storpool_beacon.service.
    Created symlink /etc/systemd/system/storpool_server.service.wants/storpool_nvmed.service → /usr/lib/systemd/system/storpool_nvmed.service.
    ...
    storpool_controller    not_running
    storpool_volumecare    not_running
    storpool_block         not_running
    ...
    storpool_nvmed         not_running
    
    $ storpool_ctl start
    storpool_nvmed         running
    storpool_cgmove        running
    storpool_beacon        running
    ...
    storpool_nvmed         running
    
  9. The node(s) should now be in the cluster, which can be verified by running the storpool net list command; see Network for details.

Expanding one-node clusters

The procedure for expanding a one-node cluster is the same as the standard procedure for adding new nodes, with one exception: when adding a new voting node to a one-node cluster, do not increase the SP_EXPECTED_NODES value in step 5. Its value must remain 1 until all other steps of the standard procedure have been completed, and it can only be safely updated to 2 after the new node has been added to the cluster. The reason is that if SP_EXPECTED_NODES is set to 2 in /etc/storpool.conf on the first node before the new node has been added, and the first node's storpool_beacon service is restarted, the cluster will not have quorum with only half of the voting nodes available, and it will not come back up until the second node is brought online.
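
Once the second node has joined and SP_EXPECTED_NODES has been updated to 2 in /etc/storpool.conf on both nodes, one way to apply the new value to the running cluster without restarting services is the beacon command shown in step 13 of the Removing nodes procedure below, run on the active management node (the values here assume a two-node cluster):

  $ echo "expected 2" | socat - unix-sendto:/var/run/storpool/beacon.cmd.sock
  $ storpool net list | grep 'expected'
  Quorum status: 2 voting beacons up out of 2 expected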

Removing nodes

Prerequisites

This procedure applies to cases where your cluster meets the following conditions before removing the nodes:

  • The cluster has three or more voting nodes.

  • The number of voting nodes to be removed is less than half of the current number of voting nodes; for example, in a cluster with 5 voting nodes, at most 2 can be removed at a time.

  • There is sufficient available space on the remaining StorPool disks to balance out the storage nodes being removed.

  • All voting cluster nodes are up. This can be checked with storpool net list; see Network for details.

Procedure

  1. Notify StorPool support that the nodes in question will be removed from the cluster.

  2. softEject and balance out all of the drives from the storage nodes being removed using the Removing a drive without replacement procedure.
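
    For example, soft-ejecting one of the drives of a node being removed (the disk ID is illustrative):

    $ storpool disk 401 softEject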

  3. After the disks have been balanced out, set a StorPool maintenance for the first node to be removed using storpool maintenance set node <SP_OURID> duration <duration> description <description>; for details, see Maintenance mode.
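
    For example, for a node with SP_OURID 4 (the values are illustrative; see Maintenance mode for the accepted duration format):

    $ storpool maintenance set node 4 duration 1h description "node removal"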

  4. Detach any currently attached volumes and snapshots from it; see Attachments.

  5. Check quorum. If the node you are planning to remove is configured as voting, verify that there will be enough voting nodes remaining in the cluster after this node is disconnected. More than 50% of the voting nodes need to be online and in the cluster for it to have quorum and remain up. Check the expected and actual number of voting nodes in the cluster:

    $ storpool net list | grep Quorum
    Quorum status: 4 voting beacons up out of 4 expected
    
  6. Stop and disable the StorPool services on the node:

    $ storpool_ctl stop
    $ storpool_ctl disable
    
  7. Check that the node is not part of the cluster anymore with the storpool service list command. The services on the stopped node should be in a “down” state.
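
    A quick way to filter for such services is shown below; see the services documentation for the full listing format:

    $ storpool service list | grep down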

  8. The node can now be physically shut down and disconnected from the storage network.

  9. If this was a storage node, forget each disk that was on this node from the cluster using storpool disk <disk_id> forget:

    $ storpool disk 401 forget
    
  10. Ensure that the /etc/storpool.conf file on the nodes currently in the cluster is the same everywhere. This can be done by checking that the sha256sum of /etc/storpool.conf on all cluster nodes is the same:

    $ sha256sum /etc/storpool.conf
    e06535759636807e27a36d33a4e103d41206ec659cf7cbd44c4c065fe8c0834a  /etc/storpool.conf
    
  11. Log on to one of the available nodes and update the /etc/storpool.conf configuration file:

    1. Remove the per-host configuration section of the node being removed.

    2. If the node being removed is voting, decrease the value of the SP_EXPECTED_NODES option by one.

    3. Save the file.

  12. Copy the updated storpool.conf file to all other nodes in the cluster.

  13. If the SP_EXPECTED_NODES value was decreased in /etc/storpool.conf, then this change needs to be applied to all storpool_mgmt instances using the following procedure:

    1. Log on to the currently active management node; the active management node can be found using the storpool service list | grep active command.

    2. Manually lower the number of expected voting nodes by one using the echo "expected ${EXPECTED}" | socat - unix-sendto:/var/run/storpool/beacon.cmd.sock command, where ${EXPECTED} is the new number of expected voting nodes:

      $ echo "expected 3" | socat - unix-sendto:/var/run/storpool/beacon.cmd.sock
      
    3. Check if the new expected number of voting nodes is correct:

      $ storpool net list | grep 'expected'
      Quorum status: 3 voting beacons up out of 3 expected
      
  14. After a node has been removed from the cluster, it will still be listed in the outputs of storpool net list and storpool service list as down. This is purely cosmetic, and can be fixed by doing the following:

    1. Restart the non-active storpool_mgmt instances one at a time.

    2. After the non-active storpool_mgmt services are back up, restart the active storpool_mgmt.

    3. The removed node should now no longer be listed in storpool net list and storpool service list.

    4. If the removed node is still listed, check and ensure there are no volumes or snapshots still listed as attached to it.

  15. Complete the StorPool maintenance for this node using storpool maintenance complete node <SP_OURID>; see Maintenance mode.
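
    For example, for the node with SP_OURID 4 used in the illustrative commands above:

    $ storpool maintenance complete node 4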

  16. Repeat steps 3-15 for each of the other nodes scheduled to be removed from the cluster.

Recovering nodes

Due to software or hardware issues, a node might stop participating in the cluster. Here are a few examples:

  • The storpool_server service is not available.

  • Incorrect host OS configuration.

  • The node was rebooted for a kernel or package upgrade that requires a reboot, but no StorPool kernel modules were installed for the new kernel.

  • A service (like storpool_server) was not configured to start when the node boots.

  • Network interface issues.

  • Drive or controller issues.

In such situations, you can try bringing the node back to the cluster as described in Degraded state.
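
A few quick, non-destructive checks on the affected node can help narrow down the cause before following the recovery steps; this is only a sketch using standard tools, and module and service names may vary between releases:

  # Which kernel is currently running, and are StorPool kernel modules loaded for it?
  $ uname -r
  $ lsmod | grep storpool

  # Which StorPool services are running, inactive, or failed?
  $ systemctl list-units --all 'storpool_*'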