Multi-site and multi-cluster
There are two sets of features that allow connections and operations between clusters in the same datacenter (multi-cluster) or in different locations (multi-site).
General distinction between the two:
Multi-cluster covers closely packed clusters (i.e. pods or racks) with a fast and low-latency connection between them.
Multi-site covers clusters in separate locations connected through an insecure and/or high-latency connection.
Multi-cluster
For a detailed overview, see Introduction to multi-cluster mode.
The main use case for multi-cluster mode is seamless scalability within the same datacenter. A volume can be live-migrated between the different sub-clusters in a multi-cluster setup, so workloads can be balanced between the sub-clusters in a location.
Multi site
Remotely connected clusters in different locations are referred to as multi-site. When two remote clusters are connected, they can efficiently transfer snapshots between each other. The usual use cases are remote backup and disaster recovery (DR).
Setup
Connecting clusters regardless of their locations requires the storpool_bridge service to be running on at least two nodes in each cluster.
Each node running the storpool_bridge service needs the following parameters configured in /etc/storpool.conf or /etc/storpool.conf.d/*.conf files:
SP_CLUSTER_NAME=<Human readable name of the cluster>
SP_CLUSTER_ID=<location ID>.<cluster ID>
SP_BRIDGE_HOST=<IP address>
The following is required when a single IP will be failed over between the bridges; see Single IP failed over between the nodes:
SP_BRIDGE_IFACE=<interface> # optional with IP failover
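As an illustration, a bridge node configuration might look like the fragment below. All values are hypothetical examples, not defaults; the real cluster ID is assigned by StorPool.

```ini
# /etc/storpool.conf.d/bridge.conf -- hypothetical example values
SP_CLUSTER_NAME=lab-sofia-1   # human-readable cluster name
SP_CLUSTER_ID=nmjc.b          # <location ID>.<sub-cluster ID>, assigned by StorPool
SP_BRIDGE_HOST=10.10.10.1     # IP address the bridge listens on (TCP port 3749)
SP_BRIDGE_IFACE=bond0.100     # only when a single IP is failed over between bridge nodes
```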
The SP_CLUSTER_NAME is a mandatory human-readable name for this cluster.
The SP_CLUSTER_ID is a unique ID assigned by StorPool for each existing cluster (example: nmjc.b). The cluster ID consists of two parts: the part before the dot (nmjc) is the location ID, and the part after the dot (b) is the sub-cluster ID.
The SP_BRIDGE_HOST is the IP address on which the bridge listens for connections from other bridges. Note that TCP port 3749 should be unblocked in the firewalls between the two locations.
A backup template should be configured through mgmtConfig (see Management configuration). The backup template instructs the local bridge which template should be used for incoming snapshots.
Warning
The backupTemplateName mgmtConfig option must be configured in the destination cluster for storpool volume XXX backup LOCATION to work (otherwise the transfer won't start).
The SP_BRIDGE_IFACE is required when two or more bridges are configured with the same public/private key pairs. In this case SP_BRIDGE_HOST is a floating IP address that will be configured on the SP_BRIDGE_IFACE interface on the host with the active bridge.
Connecting two clusters
In this example there are two clusters, named Cluster_A and Cluster_B. To have these two connected through their bridge services, each of them has to be introduced to the other.
Note
In case of a multi-cluster setup the location is the same for both clusters. The procedure is the same for both cases, with the slight difference that in a multi-cluster setup the remote bridges are usually configured with noCrypto.
Cluster A
The following parameters from Cluster_B will be required:
- The SP_CLUSTER_ID - locationBId.bId
- The SP_BRIDGE_HOST IP address - 10.10.20.1
- The public key located in /usr/lib/storpool/bridge/bridge.key.txt on the remote bridge host in Cluster_B - eeeeeeeeeeeee.ffffffffffff.ggggggggggggg.hhhhhhhhhhhhh
By using the CLI, we can add Cluster_B's location with the following commands in Cluster_A:
user@hostA # storpool location add locationBId location_b
user@hostA # storpool cluster add location_b bId
user@hostA # storpool cluster list
--------------------------------------------
| name | id | location |
--------------------------------------------
| location_b-cl1 | bId | location_b |
--------------------------------------------
The remote name is location_b-cl1, where the clN number is automatically generated based on the cluster ID. The last step in Cluster_A is to register Cluster_B's bridge. The command looks like this:
user@hostA # storpool remoteBridge register location_b-cl1 10.10.20.1 eeeeeeeeeeeee.ffffffffffff.ggggggggggggg.hhhhhhhhhhhhh
Registered bridges in Cluster_A:
user@hostA # storpool remoteBridge list
----------------------------------------------------------------------------------------------------------------------------
| ip | remote | minimumDeleteDelay | publicKey | noCrypto |
----------------------------------------------------------------------------------------------------------------------------
| 10.10.20.1 | location_b-cl1 | | eeeeeeeeeeeee.ffffffffffff.ggggggggggggg.hhhhhhhhhhhhh | 0 |
----------------------------------------------------------------------------------------------------------------------------
Hint
The public key in /usr/lib/storpool/bridge/bridge.key.txt is generated on the first run of the storpool_bridge service.
Note
The noCrypto option is usually set to 1 in a multi-cluster setup with a secure datacenter network, for higher throughput and lower latency during migrations.
Cluster B
Similarly, the following parameters from Cluster_A will be required for registering the location, cluster, and bridge(s) in Cluster_B:
- The SP_CLUSTER_ID - locationAId.aId
- The SP_BRIDGE_HOST IP address in Cluster_A - 10.10.10.1
- The public key in /usr/lib/storpool/bridge/bridge.key.txt on the remote bridge host in Cluster_A - aaaaaaaaaaaaa.bbbbbbbbbbbb.ccccccccccccc.ddddddddddddd
Similarly the commands will be:
user@hostB # storpool location add locationAId location_a
user@hostB # storpool cluster add location_a aId
user@hostB # storpool cluster list
--------------------------------------------
| name | id | location |
--------------------------------------------
| location_a-cl1 | aId | location_a |
--------------------------------------------
user@hostB # storpool remoteBridge register location_a-cl1 10.10.10.1 aaaaaaaaaaaaa.bbbbbbbbbbbb.ccccccccccccc.ddddddddddddd
user@hostB # storpool remoteBridge list
----------------------------------------------------------------------------------------------------------------------------
| ip         | remote         | minimumDeleteDelay | publicKey                                              | noCrypto |
----------------------------------------------------------------------------------------------------------------------------
| 10.10.10.1 | location_a-cl1 |                    | aaaaaaaaaaaaa.bbbbbbbbbbbb.ccccccccccccc.ddddddddddddd | 0        |
----------------------------------------------------------------------------------------------------------------------------
At this point, provided network connectivity is working, the two bridges will be connected.
Bridge redundancy
There are two ways to add redundancy for the bridge service, both by configuring and starting the storpool_bridge service on two (or more) nodes in each cluster.
In both cases only one bridge is active at a time, and it is failed over when the node or the active service is restarted.
Separate IP addresses
Configure and start the storpool_bridge service with a separate SP_BRIDGE_HOST address and a separate set of public/private key pairs on each node. In this case each of the bridge nodes has to be registered in the same way as explained in the Connecting two clusters section. The SP_BRIDGE_IFACE parameter is left unset, and the SP_BRIDGE_HOST address is expected to be present on each node where the storpool_bridge service is started.
In this case each of the bridge nodes in Cluster_A would have to be configured in Cluster_B and vice versa.
Single IP failed over between the nodes
For this, configure and start the storpool_bridge service on the first node. Then distribute the /usr/lib/storpool/bridge/bridge.key and /usr/lib/storpool/bridge/bridge.key.txt files to the next node where the storpool_bridge service will be running.
The SP_BRIDGE_IFACE parameter is required and specifies the interface on which the SP_BRIDGE_HOST address will be configured. The SP_BRIDGE_HOST address is up only on the node where the active bridge service is running, until either the service or the node itself gets restarted.
With this configuration there will be only one bridge registered in the remote cluster(s), regardless of the number of nodes running storpool_bridge in the local cluster.
The failover SP_BRIDGE_HOST setup is better suited for NAT/port-forwarding cases.
Bridge throughput performance
The throughput of a bridge connection depends on several factors (not in this exact sequence): network throughput, network latency, CPU speed, and disk latency. Each of them could become a bottleneck and could require additional tuning in order to get higher throughput from the available link between the two sites.
Network
For high-throughput links, latency is the most important factor for achieving high link utilization. For example, a low-latency 10 Gbps link will be easily saturated (provided crypto is off), but when the latency is higher, some tuning is required to optimize the TCP window size. The same applies to lower-bandwidth links with higher latency.
For these cases the send buffer size could be increased in small increments so that the TCP window is optimized. Check the Location section for more info on how to update the send buffer size in each location.
Note
To find the best send buffer size for throughput from the primary to the backup site, fill a volume with data in the primary (source) site, then create a backup to the backup (remote) site. While observing the bandwidth utilization, increase the send buffers in small increments in the source and the destination cluster until the throughput either stops rising or stays at an acceptable level.
Note that increasing the send buffers above this value can lead to delays when recovering a backup in the opposite direction.
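As a back-of-the-envelope aid (general TCP reasoning, not a StorPool-specific formula), a reasonable starting point for the send buffer size is the bandwidth-delay product of the link. The sketch below computes it for a hypothetical 10 Gbps link with a 20 ms round-trip time:

```python
def bandwidth_delay_product(bandwidth_bps: float, rtt_seconds: float) -> int:
    """Return the bandwidth-delay product in bytes: roughly the amount of
    data that must be in flight to keep the link fully utilized."""
    return int(bandwidth_bps * rtt_seconds / 8)

# Hypothetical example: 10 Gbps link with 20 ms RTT.
bdp = bandwidth_delay_product(10e9, 0.020)
print(bdp)  # 25000000 bytes, i.e. a ~25 MB TCP window to saturate the link
```

From such a starting value, the incremental tuning described in the note above still applies, since crypto, NIC driver behavior, and disk latency also affect the achievable throughput.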
Further sysctl changes might be required depending on the NIC driver; for more info, check /usr/share/doc/storpool/examples/bridge/90-StorPoolBridgeTcp.conf on the node with the storpool_bridge service.
CPU
The CPU usually becomes a bottleneck only when crypto is enabled; in that case it is sometimes helpful to move the bridge service to a node with a faster CPU.
If a faster CPU is not available in the same cluster, it could help to set the SP_BRIDGE_SLEEP_TYPE option (see Type of sleep for the bridge service) to hsleep, or even to no. Note that when this is configured, the storpool_cg tool will attempt to isolate a full CPU core (with the second thread free from other processes).
Disks throughput
The default remote recovery setting maxRemoteRecoveryRequests (see Local and remote recovery) is relatively low, especially for dedicated backup clusters. Thus, the underlying disks in the receiving cluster may stay underutilized (this does not happen with flash media) and become the bottleneck. This parameter could be tuned for higher parallelism. Here is an example: a small cluster of 3 nodes with 8 disks each translates to a default queue depth of 48 from the bridge, while there are 8 * 3 * 32 requests available from the underlying disks, and (by default with a 10 Gbps link) 2048 requests available from the bridge service (256 on a 1 Gbps link).
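The arithmetic in the example above can be sketched as follows (the per-disk queue depth of 32 and the bridge limits are the numbers quoted in the text):

```python
nodes = 3
disks_per_node = 8

# Queue depths from the example above.
default_bridge_queue_depth = 48                   # default from maxRemoteRecoveryRequests
disk_queue_depth = nodes * disks_per_node * 32    # what the underlying disks can accept
bridge_requests_10g = 2048                        # available from the bridge on a 10 Gbps link
bridge_requests_1g = 256                          # available on a 1 Gbps link

print(disk_queue_depth)  # 768 -- far above the default of 48, so the default
# remote recovery setting, not the disks or the bridge, limits parallelism here
```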
Exports
A snapshot in one of the clusters could be exported and become visible to all clusters in the location it was exported to. For example, a snapshot called snap1 could be exported with:
user@hostA # storpool snapshot snap1 export location_b
It becomes visible in Cluster_B, which is part of location_b, and could be listed with:
user@hostB # storpool snapshot list remote
-------------------------------------------------------------------------------------------------------
| location | remoteId | name | onVolume | size | creationTimestamp | tags |
-------------------------------------------------------------------------------------------------------
| location_b | locationAId.aId.1 | snap1 | | 107374182400 | 2019-08-11 15:18:02 | |
-------------------------------------------------------------------------------------------------------
The snapshot may also be exported to the location of the source cluster where the snapshot resides. This way it becomes visible to all sub-clusters in this location.
Remote clones
Any snapshot export could be cloned locally. For example, to clone a remote snapshot with a globalId of locationAId.aId.1 locally, we could use:
user@hostB # storpool snapshot snap1-copy template hybrid remote location_a locationAId.aId.1
The name of the clone of the snapshot in Cluster_B will be snap1-copy, with all parameters from the hybrid template.
Note
Note that the name of the snapshot in Cluster_B could also be exactly the same in all sub-clusters in a multi-cluster setup, as well as in clusters in different locations in a multi-site setup.
The transfer will start immediately. Only written parts of the snapshot are transferred between the sites. If snap1 has a size of 100 GB, but only 1 GB of data was ever written to the volume when it was snapshotted, eventually approximately that amount of data will be transferred between the two (sub-)clusters.
If another snapshot snap2 in the remote cluster is already based on snap1 and is then exported, the actual transfer will again include only the differences between snap1 and snap2, since snap1 already exists in Cluster_B.
The globalId for this snapshot will be the same at all sites it has been transferred to.
Creating a remote backup on a volume
The volume backup feature is in essence a set of steps that automates the backup procedure for a particular volume.
For example, to back up a volume named volume1 in Cluster_A to Cluster_B, we will use:
user@hostA # storpool volume volume1 backup Cluster_B
The above command will actually trigger the following set of events:
1. Creates a local temporary snapshot of volume1 in Cluster_A, to be transferred to Cluster_B.
2. Exports the temporary snapshot to Cluster_B.
3. Instructs Cluster_B to initiate the transfer for this snapshot.
4. Exports the transferred snapshot in Cluster_B, to be visible from Cluster_A.
5. Deletes the local temporary snapshot.
For example, if a backup operation has been initiated for a volume called volume1 in Cluster_A, the progress of the operation could be followed with:
user@hostA # storpool snapshot list exports
-------------------------------------------------------------
| location | snapshot | globalId | backingUp |
-------------------------------------------------------------
| location_b | volume1@1433 | locationAId.aId.p | true |
-------------------------------------------------------------
Once this operation completes, the temporary snapshot will no longer be visible as an export, and a snapshot with the same globalId will be visible remotely:
user@hostA # storpool snapshot list remote
------------------------------------------------------------------------------------------------------
| location | remoteId | name | onVolume | size | creationTimestamp | tags |
------------------------------------------------------------------------------------------------------
| location_b | locationAId.aId.p | volume1 | volume1 | 107374182400 | 2019-08-13 16:27:03 | |
------------------------------------------------------------------------------------------------------
Note
You must have a template configured in mgmtConfig backupTemplateName in Cluster_B for this to work.
Creating an atomic remote backup for multiple volumes
Sometimes a set of volumes is used simultaneously in the same virtual machine; an example would be different filesystems for a database and its journal. To be able to restore all volumes to the same point in time, a group backup could be initiated:
user@hostA# storpool volume groupBackup Cluster_B volume1 volume2
Note
The same underlying feature is used by the VolumeCare for keeping consistent snapshots for all volumes on a virtual machine.
Restoring a volume from remote snapshot
Restoring the volume to a previous state from a remote snapshot requires the following steps:
Create a local snapshot from the remotely exported one:
user@hostA # storpool snapshot volume1-snap template hybrid remote location_b locationAId.aId.p
OK
There are some bits to explain in the above example, from left to right:
- volume1-snap - the name of the local snapshot that will be created.
- template hybrid - instructs StorPool what the replication and placement of the locally created snapshot will be.
- remote location_b locationAId.aId.p - instructs StorPool where to look for this snapshot and what its globalId is.
Tip
If the bridges and the connection between the locations are operational, the transfer will begin immediately.
Next, create a volume with the newly created snapshot as a parent:
user@hostA # storpool volume volume1-tmp parent volume1-snap
Finally, the volume clone would have to be attached where it is needed.
The last two steps could be changed a bit: rename the old volume to something different, and directly create a volume with the original name from the restored snapshot. This is handled differently in different orchestration systems. The procedure for restoring multiple volumes from a group backup requires the same set of steps.
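For illustration, the rename-based variant might look like the console sketch below. The volume names are hypothetical, and the exact rename subcommand may differ between StorPool releases; check the CLI reference for your version.

```console
# Keep the current data around under a different name (hypothetical names):
user@hostA # storpool volume volume1 rename volume1-old
# Re-create the original volume name from the restored snapshot:
user@hostA # storpool volume volume1 parent volume1-snap
# Finally, attach volume1 wherever it is needed.
```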
See VolumeCare node info for an example implementation.
Note
From release 19.01 onwards, if the snapshot transfer has not yet completed when the volume is created, read operations on an object that is not yet transferred will be forwarded through the bridge and processed by the remote cluster.
Remote deferred deletion
Note
This feature is available for both multi-cluster and multi-site configurations. Note that the minimumDeleteDelay setting is per bridge, not per location, so all bridges to a remote location should be (re)registered with the setting.
The remote bridge could be registered with remote deferred deletion enabled. This feature allows a user in Cluster_A to unexport remote snapshots and set them for deferred deletion in Cluster_B.
An example for the case without deferred deletion enabled: Cluster_A and Cluster_B are two StorPool clusters in locations A and B, connected with a bridge. A volume named volume1 in Cluster_A has two backup snapshots in Cluster_B, called volume1@281 and volume1@294. The remote snapshots could be unexported from Cluster_A with the deleteAfter flag, but it will be silently ignored in Cluster_B.
To enable this feature, the following steps would have to be completed for the remote bridge for Cluster_A:
1. The bridge in Cluster_A should be registered with minimumDeleteDelay in Cluster_B.
2. Deferred snapshot deletion should be enabled in Cluster_B; for details, see Management configuration.
This will enable setting the deleteAfter parameter on an unexport operation in Cluster_B initiated from Cluster_A.
With the above example volume and remote snapshots, a user in Cluster_A could unexport the volume1@294 snapshot and set its deleteAfter flag to 7 days from the unexport with:
user@hostA # storpool snapshot remote location_b locationAId.aId.q unexport deleteAfter 7d
OK
After the completion of this operation the following events will occur:
- The volume1@294 snapshot immediately stops being visible in Cluster_A.
- The snapshot gets a deleteAfter flag with a timestamp a week from the time of the unexport call.
- A week later the snapshot will be deleted, but only if deferred snapshot deletion is still turned on.
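The timing above can be sketched as follows. This is illustrative only; the parsing of the "7d" shorthand is an assumption modeled on the example, not StorPool's actual implementation:

```python
from datetime import datetime, timedelta

UNITS = {"d": "days", "h": "hours", "m": "minutes", "s": "seconds"}

def parse_delay(spec: str) -> timedelta:
    """Parse a shorthand like '7d' into a timedelta (illustrative sketch)."""
    value, unit = int(spec[:-1]), spec[-1]
    return timedelta(**{UNITS[unit]: value})

# Hypothetical unexport time; the snapshot becomes eligible for deletion a week later.
unexport_time = datetime(2019, 8, 13, 16, 27, 3)
delete_after = unexport_time + parse_delay("7d")
print(delete_after)  # 2019-08-20 16:27:03
```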
Volume and snapshot move
Volume move
A volume could be moved to a neighboring sub-cluster in a multi-cluster environment, either while attached (live) or without attachments (offline). This is available only for multi-cluster setups; it is not possible for multi-site, where only snapshots could be transferred.
To move a volume use:
# storpool volume <volumeName> moveToRemote <clusterName>
The above command will succeed only if the volume is not attached on any of the nodes in this sub-cluster. To move the volume live while it is still attached, the additional option onAttached should instruct the cluster how to proceed. For example, this command:
Lab-D-cl1> volume test moveToRemote Lab-D-cl2 onAttached export
will move the volume to the Lab-D-cl2 sub-cluster and, if the volume is attached in the present cluster, will export it back to Lab-D-cl1.
This is equivalent to:
Lab-D-cl1> multiCluster on
[MC] Lab-D-cl1> cluster cmd Lab-D-cl2 attach volume test client 12
OK
or to directly executing the same CLI command in multi-cluster mode on a host in the Lab-D-cl2 cluster.
Note
Moving a volume will also trigger moving all of its snapshots. When there are parent snapshots with many child volumes, as a space-saving measure a parent snapshot might end up in each sub-cluster to which its child volumes were moved.
Snapshot move
Moving a snapshot is essentially the same as moving a volume, with the difference that a snapshot cannot be moved while it is attached.
For example:
Lab-D-cl1> snapshot testsnap moveToRemote Lab-D-cl2
This will succeed only if the snapshot is not attached locally.
Moving a snapshot that is part of a volume's snapshot chain will also trigger copying of the parent snapshots, which is automatically managed by the cluster.