VM failover

If a disaster occurs in Zone A, a VM instance should be started in Zone B as described below. This scenario assumes that none of the resources in Zone A are accessible. Only Zone B, the Service Portal, and the CMP are used to recover the service.

Prerequisites

You are familiar with the Terminology and the following assumptions about the DR scenario:

  • There is a centralized Service Portal and CMP managing all zones.

  • Zones are named with single letters, like A, B, and C.

  • The name of the DR Engine (DRE) in a zone follows a policy similar to the one for CMP. For example, DR_A and DR_B.

  • The Service Portal is integrated with the CMP and the DR engines in all zones. The Service Portal keeps information about the VMs in all zones and the IDs of the corresponding DR services.

  • The user is called Alice.

Also, you must ensure the Service Portal and CMP are redundant and will be available in case of a full zone outage.

Procedure

  1. Alice selects the stub VM in Zone B to be used for the recovery operation.

  2. The Service Portal keeps the service ID for the stub VM. It uses this to retrieve the list of available recovery points for this service in Zone B by sending a request to DR_B using the GET /recoveryPoint method and providing the identifier of the service.

  3. The Service Portal provides Alice with a list of available recovery points in Zone B.

  4. Alice selects the recovery point to revert to in Zone B.

  5. The Service Portal sends a request to DR_B to obtain metadata about the selected recovery point using the GET /metadata method and providing the identifier of the recovery point.

  6. The Service Portal updates the stub VM configuration if needed. This is important if the stub VM configuration has changed since the recovery point was created and no longer matches the configuration the original VM had at that time.

  7. The Service Portal sends a request to DR_B to revert the stub VM to the selected recovery point using the POST /drRevert method and providing the identifiers of the VM and the recovery point.

    Optionally, in this request the Service Portal can define a new DR policy and location for the restored VM running in Zone B. Note the following:

    • This operation requires the volumes of the stub VM to match, in number, order, and size, the volumes of the original VM at the time the recovery point was created. Note that StorPool would resize the volumes accordingly.

    • The guest OS configuration may need to be changed to run in the new environment. This can include IP configuration, service discovery, and so on.

  8. The Service Portal starts the stub VM and promotes it to an active VM.

  9. The DR Engine in Zone B starts protecting the new active VM by creating periodic recovery points and sending them to the new recovery site, as defined in the POST /drRevert request.
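
The DR_B requests in steps 2, 5, and 7 can be sketched from the Service Portal side as follows. This is a minimal illustration, not the actual client. Only the three methods (GET /recoveryPoint, GET /metadata, POST /drRevert) come from the procedure above. The base URL, the parameter names (serviceId, recoveryPointId, vmId, policy, location), the response shapes, and the send and choose callables are assumptions made for the example. See the Disaster Recovery Engine API for the real request format.

```python
from typing import Callable, Optional
from urllib.parse import urlencode

# Hypothetical address of the DR Engine in Zone B (DR_B).
DR_B_URL = "https://dr-b.example.com"

# A transport takes (method, url, body) and returns the decoded response.
# It is passed in so the sketch does not depend on a particular HTTP client.
Send = Callable[[str, str, Optional[dict]], object]


def list_recovery_points(send: Send, service_id: str):
    """Step 2: GET /recoveryPoint for the service ID of the stub VM."""
    query = urlencode({"serviceId": service_id})  # assumed parameter name
    return send("GET", f"{DR_B_URL}/recoveryPoint?{query}", None)


def get_metadata(send: Send, recovery_point_id: str):
    """Step 5: GET /metadata for the selected recovery point."""
    query = urlencode({"recoveryPointId": recovery_point_id})  # assumed name
    return send("GET", f"{DR_B_URL}/metadata?{query}", None)


def revert(send: Send, vm_id: str, recovery_point_id: str,
           policy: Optional[dict] = None, location: Optional[str] = None):
    """Step 7: POST /drRevert with the VM and recovery point identifiers."""
    body = {"vmId": vm_id, "recoveryPointId": recovery_point_id}  # assumed names
    # The new DR policy and location are optional in the revert request.
    if policy is not None:
        body["policy"] = policy
    if location is not None:
        body["location"] = location
    return send("POST", f"{DR_B_URL}/drRevert", body)


def failover(send: Send, service_id: str, vm_id: str, choose: Callable):
    """Run steps 2 to 7 in order; `choose` stands in for Alice's selection."""
    points = list_recovery_points(send, service_id)   # step 2
    chosen = choose(points)                           # steps 3 and 4
    metadata = get_metadata(send, chosen)             # step 5
    # Step 6 (updating the stub VM configuration from the metadata) and
    # step 8 (starting the stub VM) happen in the CMP and are omitted here.
    revert(send, vm_id, chosen)                       # step 7
    return chosen, metadata
```

Passing the transport as a parameter keeps the order of the calls visible and lets the flow be exercised with a stub before any real DR Engine is involved.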

../../_images/dre_failover_real.png
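
The volume requirement noted for step 7 (matching number, order, and size) can be checked before the revert request is sent. The sketch below assumes that the volume sizes of the stub VM and of the original VM are available as ordered lists of integers. How these sizes are obtained, for example from the recovery point metadata, is not specified in this procedure, and the function name is hypothetical. Since StorPool is expected to handle resizing, a size difference may be treated as a warning rather than a hard error.

```python
from typing import List


def volume_mismatches(stub_sizes: List[int], original_sizes: List[int]) -> List[str]:
    """Compare stub VM volumes with the original VM volumes, position by position.

    Both lists hold volume sizes in attachment order (an assumed representation).
    Returns a list of human-readable problems; an empty list means they match.
    """
    problems = []
    # The number of volumes must be the same.
    if len(stub_sizes) != len(original_sizes):
        problems.append(
            f"volume count differs: stub has {len(stub_sizes)}, "
            f"original had {len(original_sizes)}"
        )
    # Order and size: the volume at each position must have the same size.
    for index, (stub, original) in enumerate(zip(stub_sizes, original_sizes)):
        if stub != original:
            problems.append(
                f"volume {index}: stub size {stub} != original size {original}"
            )
    return problems
```

For example, `volume_mismatches([10, 20], [10, 20])` returns an empty list, while swapping the two stub volumes reports a mismatch at both positions.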

More information

Disaster Recovery Engine API