Adding and removing nodes
The servers participating in a StorPool cluster are referred to as nodes. Adding, removing, and recovering nodes are among the most common administrative tasks, and the procedures below describe how to perform them safely, without affecting the availability or performance of the cluster.
Terminology
Storage node - a cluster node where StorPool disks are present and a storpool_server instance is running (see StorPool services).

Client node - a cluster node where StorPool services (storpool_beacon, storpool_block, etc.) are running, but no StorPool disks or storpool_server instances are present.

Voting node - a cluster node configured with the option SP_NODE_NON_VOTING=0 (this is the default). Voting nodes take part in the cluster quorum vote calculation (see Identification and voting).
Note
For the cluster to have quorum and stay online, more than 50% of the voting nodes must be up and reachable. All storage nodes must always be configured as voting, as their storpool_server instances will not start otherwise. Client nodes can be configured as voting or non-voting as needed, though most often they are set to non-voting.
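As an illustration, a per-host section in /etc/storpool.conf for a non-voting client node might look like the sketch below. The host name, SP_OURID value, and the exact set of per-host options shown are assumptions for the example only; use the values appropriate for your cluster.

# Per-host section for a hypothetical client node; values are examples only.
[client1.example.com]
SP_OURID=11
SP_NODE_NON_VOTING=1   # non-voting client; omit or set to 0 for a voting node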
Adding nodes
Prerequisites
This procedure applies to cases where your cluster meets the following conditions before the new nodes are added:
The cluster has two or more voting nodes.
The current number of voting nodes in the cluster is greater than the number of nodes you are adding. This is to ensure that the cluster will have quorum even in the event of an issue with the new nodes during their installation.
StorPool is installed on the new nodes and their StorPool services are stopped; for details, see Installation and setup and Managing services with storpool_ctl.
All cluster nodes, aside from the new nodes that have not been added yet, are online, reachable, and in the cluster. This can be checked with
storpool net list; see Network for details.
In cases where the number of voting nodes to be added is equal to or greater than the number of voting nodes currently in the cluster, please add the new nodes in batches, to ensure that the above prerequisites are met.
For cases where a one-node cluster is being expanded, please review the procedure in the Expanding one-node clusters section.
Procedure
Notify StorPool support that the nodes in question will be added.
Ensure that the /etc/storpool.conf file on the nodes currently in the cluster is the same everywhere. This can be done by checking that the sha256sum of /etc/storpool.conf is the same on all cluster nodes:

$ sha256sum /etc/storpool.conf
e06535759636807e27a36d33a4e103d41206ec659cf7cbd44c4c065fe8c0834a  /etc/storpool.conf
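One way to run this check from a single host is a small shell loop; the host names below are placeholders, and the loop assumes SSH access to all cluster nodes:

# Compare the checksum on every node from one place (host names are examples).
$ for h in node1 node2 node3; do ssh "$h" sha256sum /etc/storpool.conf; done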
Log on to one of the available nodes and open the /etc/storpool.conf configuration file with your favorite text editor.

Add the per-host configuration section for the node(s) being added.
If the node(s) being added are voting, increase the value of the SP_EXPECTED_NODES option to the new expected number of voting nodes; for more information, see Expected nodes.

Attention

It is important to ensure that the SP_EXPECTED_NODES value in /etc/storpool.conf on all nodes is correctly set to the number of expected voting nodes in the cluster. If SP_EXPECTED_NODES is incorrectly set to a value that is too high on any node, and the storpool_beacon service on that node is started or restarted, this incorrect value will be applied to the cluster, possibly causing quorum to be lost and the cluster to go down.

Save the file.
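For example, when adding a fourth voting node to a three-node cluster, the edit could look like the following sketch. The host name, ID, and per-host options are illustrative assumptions only; the real per-host section must contain all settings required for your environment (network interfaces, and so on).

# Raise the expected number of voting nodes (here from 3 to 4).
SP_EXPECTED_NODES=4

# New per-host section for the node being added (values are examples).
[node4.example.com]
SP_OURID=4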
Copy the updated /etc/storpool.conf file to all other nodes in the cluster; when updating /etc/storpool.conf to add nodes, there is no need to restart the StorPool services on the existing nodes. An example copy loop is shown below, after the service output.

Start and enable the StorPool services on the new node(s) by running the storpool_ctl enable and storpool_ctl start commands:

$ storpool_ctl enable
Created symlink /etc/systemd/system/sysinit.target.wants/storpool_hugepages.service → /usr/lib/systemd/system/storpool_hugepages.service.
Created symlink /etc/systemd/system/multi-user.target.wants/storpool_beacon.service → /usr/lib/systemd/system/storpool_beacon.service.
Created symlink /etc/systemd/system/storpool_server.service.wants/storpool_nvmed.service → /usr/lib/systemd/system/storpool_nvmed.service.
...
storpool_controller not_running
storpool_volumecare not_running
storpool_block not_running
...
storpool_nvmed not_running

$ storpool_ctl start
storpool_nvmed running
storpool_cgmove running
storpool_beacon running
...
storpool_nvmed running
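For the copy step above, a minimal way to push the updated file from the node where it was edited is a loop such as the one below; the host names and direct root SSH access are assumptions about your environment:

# Distribute the updated configuration to the other nodes (host names are examples).
$ for h in node2 node3 node4; do scp /etc/storpool.conf root@"$h":/etc/storpool.conf; done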
The node(s) should now be in the cluster, which can be verified by running the storpool net list command; see Network for details.
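If the added node is voting, the quorum line reported by storpool net list should also reflect the new expected count; for example, assuming the cluster has grown from three to four voting nodes:

$ storpool net list | grep Quorum
Quorum status: 4 voting beacons up out of 4 expected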
Expanding one-node clusters
The procedure for expanding a one-node cluster is the same as the standard procedure for adding new nodes, with one exception. When adding a new voting node to a one-node cluster, do not increase the SP_EXPECTED_NODES value at the point in the procedure where it would normally be raised. Its value must be left as 1 until all other steps of the standard procedure for adding nodes have been completed; it can only be safely updated to 2 after the new node has been added to the cluster. The reason is that if SP_EXPECTED_NODES is set to 2 in /etc/storpool.conf on the first node before the new node has been added, and that node's storpool_beacon.service is restarted, the cluster will not have quorum with only half of its voting nodes available, and it will not come back up until the second node is brought online.
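In other words, for a hypothetical two-node expansion the order of operations would be: add the new node's per-host section, copy the configuration, start the new node's services, confirm it has joined with storpool net list, and only then raise SP_EXPECTED_NODES to 2 on both nodes. If you want the running beacons to pick up the new value without a restart, one option is the beacon command socket used in the node-removal procedure below; using it to raise the value is an assumption based on that procedure, so confirm with StorPool support if in doubt.

# After the second node has joined the cluster and SP_EXPECTED_NODES=2 has been
# set in /etc/storpool.conf on both nodes (run on a node with a running beacon):
$ echo "expected 2" | socat - unix-sendto:/var/run/storpool/beacon.cmd.sock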
Removing nodes
Prerequisites
This procedure applies to cases where your cluster meets the following conditions before removing the nodes:
The cluster has three or more voting nodes.
The number of voting nodes to be removed is less than half of the current number of voting nodes; for example, in a cluster with 5 voting nodes, at most 2 can be removed at a time.
There is sufficient available space on the remaining StorPool disks to balance out the storage nodes being removed.
All voting cluster nodes are up. This can be checked with
storpool net list; see Network for details.
Procedure
Notify StorPool support that the nodes in question will be removed from the cluster.
softEject and balance out all of the drives from the storage nodes being removed using the Removing a drive without replacement procedure.
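As a sketch, ejecting a single drive and watching the data drain off it could look like the following; the disk ID is an example, and the authoritative steps are in the Removing a drive without replacement procedure:

# Soft-eject one of the drives on the node being removed (disk ID is an example).
$ storpool disk 401 softEject
# Watch the disk in the disk list until its data has been moved elsewhere.
$ storpool disk list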
After the disks have been balanced out, set a StorPool maintenance for the first node to be removed using storpool maintenance set node <SP_OURID> duration <duration> description <description>; for details, see Maintenance mode.

Detach any currently attached volumes and snapshots from it; see Attachments.
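For example, assuming the node being removed has SP_OURID 4, the maintenance could be set as shown below; the duration and description values are placeholders (see Maintenance mode for the accepted syntax), and the second command simply filters the attachments list for that client ID:

$ storpool maintenance set node 4 duration 1h description node-removal
# List what is still attached on the node before detaching it.
$ storpool attach list | grep -w 4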
Check quorum. If the node you are planning to remove is configured as voting, verify that there will be enough voting nodes remaining in the cluster after this node is disconnected. More than 50% of the voting nodes need to be online and in the cluster for the cluster to have quorum and remain up. Check the expected and actual number of voting nodes in the cluster:

$ storpool net list | grep Quorum
Quorum status: 4 voting beacons up out of 4 expected
Stop and disable the StorPool services on the node:
$ storpool_ctl stop
$ storpool_ctl disable
Check that the node is not part of the cluster anymore with the storpool service list command. The services on the stopped node should be in a “down” state.

The node can now be physically shut down and disconnected from the storage network.
If this was a server node, forget each disk that was on this node from the cluster using storpool disk <diskID> forget:

$ storpool disk 401 forget
Ensure that the /etc/storpool.conf file on the nodes currently in the cluster is the same everywhere. This can be done by checking that the sha256sum of /etc/storpool.conf is the same on all cluster nodes:

$ sha256sum /etc/storpool.conf
e06535759636807e27a36d33a4e103d41206ec659cf7cbd44c4c065fe8c0834a  /etc/storpool.conf
Log on to one of the available nodes and update the /etc/storpool.conf configuration file:

Remove the per-host configuration section of the node being removed.

If the node being removed is voting, decrease the value of the SP_EXPECTED_NODES option by one.

Save the file.
Copy the updated storpool.conf file to all other nodes in the cluster.

If the SP_EXPECTED_NODES value was decreased in /etc/storpool.conf, then this change needs to be applied to all storpool_mgmt instances using the following procedure:

Log on to the currently active management node; the active management node can be found using the storpool service list | grep active command.

Manually lower the number of expected voting nodes by one using the echo "expected ${EXPECTED}" | socat - unix-sendto:/var/run/storpool/beacon.cmd.sock command:

$ echo "expected 3" | socat - unix-sendto:/var/run/storpool/beacon.cmd.sock

Check if the new expected number of voting nodes is correct:

$ storpool net list | grep 'expected'
Quorum status: 3 voting beacons up out of 3 expected
After a node has been removed from the cluster, it will still be listed in the outputs of storpool net list and storpool service list as down. This is purely cosmetic, and can be fixed by doing the following:

Restart the non-active storpool_mgmt instances one at a time.

After the non-active storpool_mgmt services are back up, restart the active storpool_mgmt.

The removed node should now no longer be listed in storpool net list and storpool service list. If the removed node is still listed, check and ensure there are no volumes or snapshots still listed as attached to it.
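Assuming the management instances are managed through systemd like the other StorPool services shown earlier, the restart itself is a plain service restart; which instance is currently active can be checked with storpool service list:

# On each node running a non-active storpool_mgmt, then finally on the active one:
$ systemctl restart storpool_mgmt.service
# Identify the currently active management instance:
$ storpool service list | grep active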
Complete the StorPool maintenance for this node using storpool maintenance complete node <SP_OURID>; see Maintenance mode.

Repeat the steps above, from setting the maintenance for the node through completing it, for each of the other nodes scheduled to be removed from the cluster.
Recovering nodes
Due to software or hardware issues, a node might stop participating in the cluster. Here are a few examples of such issues:
The storpool_server service is not available.

Incorrect host OS configuration.
The node was rebooted for a kernel or package upgrade that requires a reboot, but no StorPool kernel modules were installed for the new kernel.
A service (like storpool_server) was not configured to start when the node boots.

Network interface issues.
Drive or controller issues.
In such situations, you can try bringing the node back to the cluster as described in Degraded state.
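Before following that procedure, a quick first look at the affected node usually starts with the standard systemd tools; this is only a triage sketch, not a replacement for the Degraded state procedure:

# Check whether the StorPool services are running and enabled on boot.
$ systemctl status storpool_beacon storpool_server
$ systemctl is-enabled storpool_server
# Review messages from the current boot for the affected service.
$ journalctl -b -u storpool_server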