Terminology

Here you can find explanations of the main terms in a disaster recovery setup.

Cloud Management Platform (CMP)

Manages resources in the zones - VMs, storage, computing, and so on. DR Engine supports configurations where all zones are controlled by a single CMP instance and configurations in which each zone is controlled by a dedicated CMP.

When there is more than one CMP, the CMP instances can either operate independently in a peer-to-peer mode, in which they send commands to each other, or be managed by a central Service Portal. In both cases the goal is to coordinate the operations of the disaster recovery process.

Service Portal

Manages the services in the cloud. Sometimes, the Service Portal is integrated with the CMP. The Service Portal communicates with the CMPs and the DR Engines to orchestrate the disaster recovery process.

DR Engine

Responsible for replicating data between storage clusters and managing recovery points. Each zone contains an instance of the DR Engine. DR Engines in different zones don’t communicate directly with each other. DR Engine uses StorPool Storage for replicating data between zones.

StorPool Storage

In the context of this document, “StorPool” denotes a single StorPool cluster or a multi-cluster deployed in one zone.

Zone

An independent site containing compute and storage resources, network, access to the Internet, and other components. Each zone can operate autonomously. There should be a connection (over the Internet or via a dedicated link) between the StorPool clusters in the zones, so that recovery points can be replicated from one zone to another, as well as connectivity to the CMP and/or the Service Portal. This allows recovery of VMs if a zone goes offline.

For disaster recovery, a zone may be connected only to a subset of all zones. It is not required that a zone can replicate recovery points to all other zones.
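As an illustration of partial connectivity, the sketch below models which zones can send recovery points to which. The zone names, the `REPLICATION_LINKS` mapping, and both helper functions are hypothetical and are not part of the DR Engine API.

```python
# Hypothetical topology: each zone maps to the zones it can replicate to.
# zone-a and zone-c are not connected to each other.
REPLICATION_LINKS: dict[str, set[str]] = {
    "zone-a": {"zone-b"},
    "zone-b": {"zone-a", "zone-c"},
    "zone-c": {"zone-b"},
}


def can_replicate(source: str, target: str) -> bool:
    """Return True if recovery points can be sent from source to target."""
    return target in REPLICATION_LINKS.get(source, set())


def valid_recovery_zones(primary_zone: str) -> set[str]:
    """Return the zones that could act as a recovery zone for a VM in primary_zone."""
    return set(REPLICATION_LINKS.get(primary_zone, set()))
```

With this topology, a VM running in `zone-a` could only use `zone-b` as its recovery zone.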

There is a deployment model without centralized management, where each zone is controlled by a local CMP. In this case, zones still need to be able to send control information to remote zones in a peer-to-peer manner.

All zones are equal: no zone is designated as a primary, secondary, or recovery zone. For a VM, however, there is a primary zone (where the VM is running) and a recovery zone (where recovery points are sent). One zone can be primary for one VM and recovery for another.

Recovery point

A recovery point is a crash-consistent snapshot of all virtual disks of a VM, with VM metadata included. A VM can have many recovery points created at different moments. A recovery point has the following features:

Contains snapshots of all virtual disks attached to the VM when the recovery point is created, and a snapshot of the VM metadata.
Is created at the primary zone (where the VM runs) and replicated to the recovery zone.
Can have copies in the primary and the remote zone simultaneously. It is then considered a single recovery point with multiple copies.

The recovery points are replicated at the storage layer, and the actual data replication is handled by StorPool storage clusters in both zones.
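The structure of a recovery point described above can be sketched as a small data model. This is only an illustration: the class names, field names, and the `replicate_to` method are assumptions and do not reflect the actual DR Engine data format.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime


@dataclass(frozen=True)
class DiskSnapshot:
    disk_id: str        # hypothetical identifier of a virtual disk
    snapshot_name: str  # hypothetical name of the storage-level snapshot


@dataclass
class RecoveryPoint:
    vm_id: str
    created_at: datetime
    disk_snapshots: tuple[DiskSnapshot, ...]  # one per disk attached at creation
    vm_metadata: str                          # snapshot of the VM metadata
    zones_with_copy: set[str] = field(default_factory=set)

    def replicate_to(self, zone: str) -> None:
        """Record a copy in another zone; it remains the same recovery point."""
        self.zones_with_copy.add(zone)
```

Keeping the zones in a set on a single object mirrors the idea that copies in several zones are still one recovery point.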

Disaster recovery service (DR service)

A DR service is a set of parameters and rules that defines the disaster recovery behaviour of a VM. A DR service must be created to enable disaster recovery of a VM. A VM can have at most one DR service, and a DR service serves one VM, so in most cases there is a one-to-one relationship between a DR service and a VM.

The DR service defines the policy: how often recovery points are created, how long they are kept, and which recovery zone they are sent to. The DR service is created in the primary zone, where the VM is running, and is available (visible) with the same identifier in both the primary and the recovery zones.
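The three policy elements (creation frequency, retention, and recovery zone) can be pictured with the minimal sketch below. The `DRPolicy` class, its field names, and the two helper functions are hypothetical; the actual parameter names and scheduling logic of the DR Engine may differ.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class DRPolicy:
    interval: timedelta   # how often recovery points are created (assumed name)
    retention: timedelta  # how long recovery points are kept (assumed name)
    recovery_zone: str    # zone that receives the recovery points (assumed name)


def next_due(policy: DRPolicy, last_created: datetime) -> datetime:
    """Return the time at which the next recovery point should be created."""
    return last_created + policy.interval


def expired(policy: DRPolicy, created_times: list[datetime], now: datetime) -> list[datetime]:
    """Return the creation times of recovery points older than the retention period."""
    return [t for t in created_times if now - t > policy.retention]
```

For example, with an hourly interval and a seven-day retention, a recovery point created nine days ago would be reported by `expired`, while one created yesterday would not.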

Usually, the DR service has a VM defined in only one zone. After a failover, however, the VM at the recovery zone becomes active, so the same DR service can have two different VMs defined, one in each zone.

The state of the DR service is not guaranteed to be consistent in all zones. This is by design: it ensures that the DR service API remains available even when there is no connection between the sites. For example, after a failover, if the primary zone is still operational or has been restored, the DR service will have different active VMs in each zone, and each zone will be the primary for the corresponding VM. In such a situation (when access to the primary zone is restored), the user should resolve the conflict by updating the DR service and marking the VM inactive.

A DR service cannot have more than one active VM in a zone. On the other hand, a DR service can have no active VM in a zone - this is the usual case with remote zones, where there are replicated recovery points from the primary zone.
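The per-zone rules above can be illustrated with a small sketch in which each zone holds its own view of the DR service. The `DRServiceView` class and the `has_conflict` function are hypothetical names used only for illustration; they are not DR Engine API calls.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DRServiceView:
    """One zone's local view of a DR service (hypothetical structure)."""
    service_id: str
    zone: str
    # A single optional field allows at most one active VM per zone.
    active_vm: Optional[str] = None


def has_conflict(views: list[DRServiceView]) -> bool:
    """Return True if the zones disagree about which VM is active."""
    active = {v.active_vm for v in views if v.active_vm is not None}
    return len(active) > 1
```

In the usual case only the primary zone's view has an active VM, so `has_conflict` returns `False`. After a failover with the primary zone still running, both views carry a different active VM and the function returns `True` until one of them is marked inactive.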

VM metadata

Each DR service can store the VM’s metadata (formatted as a text string). It is up to the CMP or the Service Portal whether and how this metadata is used.

Each recovery point contains a snapshot of the VM metadata. Its main purpose is to provide information about the VM configuration at the moment the recovery point is created, and to allow the CMP to restore the VM with the correct configuration when the VM is reverted to that recovery point.

The CMP or the Service Portal is responsible for storing all the necessary information in the VM metadata record of the DR service and for keeping it up to date when the VM configuration changes.
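Since the metadata is stored as a text string, a CMP could, for example, serialize the VM configuration before saving it and parse it again when restoring. The sketch below assumes JSON as the encoding and uses made-up configuration keys; neither the format nor the keys are prescribed by the DR Engine.

```python
import json


def encode_vm_metadata(config: dict) -> str:
    """Serialize a VM configuration into a text string (JSON is an assumption)."""
    return json.dumps(config, sort_keys=True)


def decode_vm_metadata(text: str) -> dict:
    """Parse the text string back into a VM configuration."""
    return json.loads(text)


# Hypothetical VM configuration as a CMP might record it.
example_config = {"cpus": 4, "memory_mb": 8192, "disks": ["disk-0", "disk-1"]}
metadata_text = encode_vm_metadata(example_config)
```

Sorting the keys keeps the string stable for the same configuration, which makes it easy to detect whether the stored metadata needs to be updated after a VM change.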

End-user

Uses the user interface and API of the Service Portal to create and control VMs in the corresponding zone and to initiate failover.