Stale backup volumes and snapshots cleanup procedure

1. Introduction

During VM backups, OnApp temporarily creates backup volumes and snapshots, which are usually deleted automatically by the controller after the backup completes. Under specific conditions, however, the deletion procedure fails and the temporary volumes and snapshots created during the backup are left behind, unused and taking up space. Most cases are already handled by workarounds in LVMSP, but a small portion still requires manual intervention.

2. Volumes attached to nodes

Start by detaching and deleting all backup volumes and snapshots that are more than 2 days old, by executing the following code snippet on a node with storpool_mgmt installed:

while read -u 4 s; do
  v="${s%-SNP}"
  set -x
  storpool detach volume "$v" all && \
  storpool volume "$v" delete "$v" && \
  storpool snapshot "$s" delete "$s"
  set +x;
done 4< <(storpool_req SnapshotsList|jq -r "map(select(.name|contains(\":backup-\"))|select(.creationTimestamp<$(date --date='2 days ago' +%s)))[]|.name")

If the above code snippet fails for some of the volumes, follow the relevant section below, depending on the kind of node the volumes are attached to.

Note

The above code snippet takes the snapshots older than 2 days whose names contain ‘:backup-’ and end with ‘-SNP’ from the JSON snapshot list (via storpool_req SnapshotsList; the same information is available from storpool -j snapshot list), derives the corresponding volume name by stripping the ‘-SNP’ suffix, detaches and deletes that volume, and finally deletes the parent snapshot ending with ‘-SNP’. The commands issued in this sequence are the regular storpool detach volume <volume_name> all, storpool volume <volume_name> delete <volume_name> and storpool snapshot <snapshot_name> delete <snapshot_name>.
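
Optionally, the same query can be run on its own first, to only list the snapshots that the snippet would process, without deleting anything:

storpool_req SnapshotsList|jq -r "map(select(.name|contains(\":backup-\"))|select(.creationTimestamp<$(date --date='2 days ago' +%s)))[]|.name"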

3. Volumes exported through iSCSI (Hypervisor nodes)

Look for the Target ID (TID) of the iSCSI export:

tgtadm --mode target --op show | grep 'Target.*backup-xxxxxxxxxxxxxx'

Check the iSCSI exports for the backup volumes in question and note the TID (Target $TID: iqn…) of the corresponding entry, for example:

# tgtadm --mode target --op show | grep 'Target.*backup-fn7giuli19bswq'
...
Target 9: iqn.2019-03-07:onapp.com:backup-fn7giuli19bswq
...

On newer versions of OnApp, the command above may produce no output, because the target can be named after the snapshot rather than the backup volume. In that case, run tgtadm without the grep filter:

tgtadm --mode target --op show

Example output (note the second “Backing store path” line):

...
Target 9: iqn.2019-03-07:onapp.com:snapshot-ohdriiwuvbnssx
  System information:
      Driver: iscsi
      State: ready
  I_T nexus information:
  LUN information:
      LUN: 0
          Type: controller
          SCSI ID: IET     00010000
          SCSI SN: beaf10
          Size: 0 MB, Block size: 1
          Online: Yes
          Removable media: No
          Prevent removal: No
          Readonly: No
          Backing store type: null
          Backing store path: None
          Backing store flags:
      LUN: 1
          Type: disk
          SCSI ID: IET     00010001
          SCSI SN: beaf11
          Size: 429497 MB, Block size: 512
          Online: Yes
          Removable media: No
          Prevent removal: No
          Readonly: No
          Backing store type: rdwr
          Backing store path: /dev/onapp-qkwnlpjsuvxhox/backup-fn7giuli19bswq
          Backing store flags:
  Account information:
  ACL information:
      10.200.1.40
...

If the backup volume appears in a target, either in the target name itself or in one of its “Backing store path” lines, that target needs to be removed.
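
If needed, the TID can be extracted with a small helper like the one below (a sketch based on the output format shown above; replace the example volume name with the one in question):

# Works whether the volume name appears in the "Target N: iqn..." line
# or only in a "Backing store path" line
TID=$(tgtadm --mode target --op show | awk '/^Target /{tid=$2; sub(":","",tid)} /backup-fn7giuli19bswq/{print tid; exit}')
echo "$TID"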

Proceed with the removal of the iSCSI target:

tgtadm --mode target --op update --tid=$TID -n state -v offline
tgtadm --mode target --op delete --tid=$TID
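
Optionally, confirm that the target is gone; based on the output format shown above, the following should produce no output once the deletion has succeeded:

tgtadm --mode target --op show | grep "^Target $TID:"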

You can re-run the cleanup code snippet from section 2 at this point, or after going through all of the backup volumes in question.

4. Volumes held by the device mapper (Backup nodes)

Check for mounted volumes and unmount them:

mount | grep backup-xxxxxxxxxxxxxx | awk '{print $3}' | xargs umount -v

List the device mapper entries which are using the volumes and preventing their detachment:

for s in /dev/storpool/*; do
  echo "$s"
  sd=$(readlink "$s")
  for dm in $(ls "/sys/block/${sd##*/}/holders"); do echo "$dm"; done
done

The volumes in question should have device mapper entries (dm-$DID) listed below them, for example:

/dev/storpool/onapp-5ef2xsbvljxc2f:backup-bqaauthhrsndqk
dm-5
/dev/storpool/onapp-5ef2xsbvljxc2f:backup-cjttqdnpoopinm
dm-6
/dev/storpool/onapp-5ef2xsbvljxc2f:backup-dymkzwrgmunegq
dm-8
/dev/storpool/onapp-5ef2xsbvljxc2f:backup-zbpzhztdfidpxp
dm-3
/dev/storpool/onapp-5ef2xsbvljxc2f:backup-okcdxzqokcxidc
dm-2
/dev/storpool/onapp-5ef2xsbvljxc2f:backup-wyfrhcqwmnrqfm
dm-4
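
Before removing an entry, it is worth confirming that no process is still using it; a quick check, assuming fuser is available on the backup node, is:

# No output (and a non-zero exit code) means nothing is using the entry
fuser -v /dev/dm-$DID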

If there are no running processes using the device, the device mapper entry can be removed with the following command:

dmsetup remove /dev/dm-$DID

After removing the device mapper entries with dmsetup, two symlinks will be left for each entry: one linking a device file to the backup volume, and one with a similar device name linking to the raw StorPool disk device (/dev/sp-..). Both can be listed using:

find /dev/sp* -type l -exec ls -l {} +

Example output:

lrwxrwxrwx 1 root root      9 Nov 30 10:18 /dev/sp0p -> /dev/sp-0
lrwxrwxrwx 1 root root     56 Nov 30 10:18 /dev/sp0p1 -> /dev/mapper/onapp-5ef2xsbvljxc2f:backup-okcdxzqokcxidcX1
lrwxrwxrwx 1 root root      9 Nov 15 13:39 /dev/sp1p -> /dev/sp-1
lrwxrwxrwx 1 root root     56 Nov 15 13:39 /dev/sp1p1 -> /dev/mapper/onapp-5ef2xsbvljxc2f:backup-zbpzhztdfidpxpX1
lrwxrwxrwx 1 root root      9 Nov 30 12:04 /dev/sp2p -> /dev/sp-2
lrwxrwxrwx 1 root root     56 Nov 30 12:04 /dev/sp2p1 -> /dev/mapper/onapp-5ef2xsbvljxc2f:backup-wyfrhcqwmnrqfmX1
lrwxrwxrwx 1 root root      9 Dec 10 12:35 /dev/sp3p -> /dev/sp-3
lrwxrwxrwx 1 root root     56 Dec 10 12:35 /dev/sp3p1 -> /dev/mapper/onapp-5ef2xsbvljxc2f:backup-bqaauthhrsndqkX1
lrwxrwxrwx 1 root root      9 Dec 10 13:17 /dev/sp4p -> /dev/sp-4
lrwxrwxrwx 1 root root     56 Dec 10 13:17 /dev/sp4p1 -> /dev/mapper/onapp-5ef2xsbvljxc2f:backup-cjttqdnpoopinmX1
lrwxrwxrwx 1 root root      9 Nov 30 12:17 /dev/sp6p -> /dev/sp-6
lrwxrwxrwx 1 root root     56 Nov 30 12:17 /dev/sp6p1 -> /dev/mapper/onapp-5ef2xsbvljxc2f:backup-dymkzwrgmunegqX1

Each such pair of symlinks can be safely removed with rm.
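
For example, for the backup-cjttqdnpoopinm volume from the listing above, the pair would be removed with (adjust the device names to the actual pair on the node):

rm -v /dev/sp4p /dev/sp4p1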

When there are no holders left for the backup volumes in question, re-run the cleanup code snippet from section 2.