Enable TRIM/Discard operations for KVM-based clouds

By issuing TRIM/Discard commands (trims), the operating system notifies the underlying storage device which data has been marked as deleted in the filesystem and can therefore be erased from the physical media. TRIM in the ATA command set (known as UNMAP in the SCSI command set) is well known to storage vendors and has been in the Linux kernel for a long time. Supported by the most widely used Linux filesystems (ext4, XFS, Btrfs, etc.), these commands help reduce SSD wear and prevent the progressive degradation of write performance that comes with long-term use.

The following sections cover all the steps needed to configure and enable trims inside virtual machines (VMs) and to ensure these commands are propagated to the storage system. The steps apply to virtio-scsi (/dev/sd*) and virtio-blk (/dev/vd*) virtual disks.

Verify the storage solution

The first step is to ensure that the chosen storage solution supports trim operations on the virtual drives it provides. On a Linux host with an attached virtual drive, check the values of discard_granularity and discard_max_bytes. In a StorPool cluster with a volume (volume-1) attached on a client, issue the following command:

# cat /sys/block/$(realpath --relative-to=/dev/ /dev/storpool/volume-1)/queue/{discard_granularity,discard_max_bytes}

If the output is two non-zero numbers (e.g. 4096 and 1048576), trims are supported. If the result contains only zeroes, there is no trim support.
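
When several volumes are attached, a small loop (a sketch, assuming they are all exposed under /dev/storpool/) checks all of them at once:

# for vol in /dev/storpool/*; do
    dev=$(realpath --relative-to=/dev/ "$vol")
    echo "$dev:" $(cat /sys/block/"$dev"/queue/{discard_granularity,discard_max_bytes})
  done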

Virtualization stack

QEMU has supported trims for virtio-scsi devices since version 1.5; for virtio-blk, the capability arrived in QEMU 4.0. To check the version of the installed qemu-kvm, use:

For RHEL and its derivatives:

# /usr/libexec/qemu-kvm --version

For Ubuntu:

# /usr/bin/qemu-system-x86_64 --version

If your cloud hosts run a relatively new OS (RHEL 8.x or Ubuntu 20.x), trims are supported for both virtio-scsi and virtio-blk devices. Otherwise, it is probably time to consider an OS upgrade.
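
If a script needs the bare version number (a sketch; adjust the binary path to your distribution), note that the first line of the output has the form "QEMU emulator version X.Y.Z", so something like this works and prints just the version, e.g. 4.2.0 on RHEL 8.2:

# /usr/libexec/qemu-kvm --version | awk '/^QEMU emulator version/ {print $4}'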

Virtual machine type

As of QEMU 4.0, trims on virtio-blk devices are available, but there is one catch: only for VMs whose machine type is q35. With the older i440fx type, trims are available only for virtio-scsi drives.

To check the VM type, use:

# virsh qemu-monitor-command $VM_NAME --hmp info qom-tree | head -n 1

The output will be something like this:

/machine (pc-q35-rhel8.2.0-machine)

or

/machine (pc-i440fx-2.11-machine)

An alternative is to check the running VM's XML definition. This approach requires the xmllint command, provided by the libxml2 package (on RHEL) or libxml2-utils (on Ubuntu). To get the VM type:

# virsh dumpxml $VM_NAME | xmllint --xpath 'string(/domain/os/type/@machine)' - ; echo

And the output will be something like this:

pc-q35-rhel8.2.0

or

pc-i440fx-2.11
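
To audit the machine type of every running VM on a host in one pass, the two commands above can be combined into a small loop:

# for vm in $(virsh list --name); do
    echo "$vm: $(virsh dumpxml "$vm" | xmllint --xpath 'string(/domain/os/type/@machine)' -)"
  done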

Virtual drives discard

The next step is to verify that the VM is configured to propagate trims to the storage system. First, check the VM's definition for the necessary parameters. Get the definition with:

# virsh dumpxml $VM_NAME | less

Then, for each disk with type='block', ensure that there is discard='unmap' in its driver configuration. For virtio-blk, it should look like this:

...
<disk type='block' device='disk'>
  <driver name='qemu' type='raw' cache='none' io='native' discard='unmap'/>
  <target dev='vda' bus='virtio'/>
  <alias name='virtio-disk0'/>
...

And, for virtio-scsi, like this:

...
<disk type='block' device='disk'>
  <driver name='qemu' type='raw' cache='none' io='native' discard='unmap'/>
  <target dev='sda' bus='scsi'/>
  <alias name='scsi0-0-0-0'/>
...
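
If discard='unmap' is missing, add it with virsh edit $VM_NAME, or non-interactively with the virt-xml tool from the virt-install package (a sketch, assuming the disk's target is vda; the dotted driver.discard syntax needs virt-install 3.0 or newer). Power the VM off and start it again for the change to take effect:

# virt-xml $VM_NAME --edit target=vda --disk driver.discard=unmap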

Then check that these settings are actually in effect, using qemu-monitor-command and the alias of the device. For virtio-blk, the command is:

# virsh qemu-monitor-command $VM_NAME --cmd '{"execute": "qom-get", "arguments": { "path": "/machine/peripheral/virtio-disk0", "property": "discard" }}'

If trims are enabled, it will return something like:

{"return": true, "id": "libvirt-2872196"}

For virtio-scsi devices, check discard_granularity instead:

# virsh qemu-monitor-command $VM_NAME --cmd '{"execute": "qom-get", "arguments": { "path": "/machine/peripheral/scsi0-0-0-0", "property": "discard_granularity" }}'

{"return":4096, "id": "libvirt-2872649"}

If the value of return is greater than 0, trims are available.

The command line of a running VM can also be checked:

# ps axww | grep $VM_NAME
2972318 ? Sl 302:11 /usr/bin/qemu-kvm -name guest=$VM_NAME,debug-threads=on … -blockdev {"driver": "host_device", … "discard": "unmap"} … -blockdev {"driver": "host_device", … "discard": "unmap"} …

Guest configuration

Trim operations have been supported in the virtio-scsi kernel module for a long time, but for the virtio-blk kernel module they arrived relatively recently, in kernel version 4.20-rc7 (patch 1028619). Use a suitable guest kernel version to take advantage of this feature. Ubuntu 18.04 LTS offers kernel 5.4, the same version as in 20.x LTS. RHEL and its derivatives, such as AlmaLinux, added discard support in version 8.1, with plans to backport it to RHEL 7, too (Solution 4780831).
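
To confirm that the guest kernel is new enough (4.20 or later for virtio-blk discard; RHEL 8.1+ kernels carry the backported support), check its version. The output will be something like:

# uname -r
5.4.0-90-generic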

After the OS is chosen and installed, there are a few more steps. First, ensure that trims are available inside the VM:

# lsblk --discard

If the DISC-GRAN and DISC-MAX columns are not 0, the device is ready for trimming. Test it with:

# fstrim --verbose --all
/: 5.4 GiB (5793656832 bytes) trimmed on /dev/vda1

Output like this means that everything works as expected.

The last step for the administrator is to configure how trim operations will happen.

The first option is to have the OS issue a trim command after each delete. If that is the chosen strategy, add discard to the mount options of the device or partition in /etc/fstab:

UUID=f41e390f-835b-4223-a9bb-9b45984ddf8d  /  xfs  defaults,discard  0 0
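
To apply the new mount options without a reboot, remount the filesystem (here the root filesystem from the example above):

# mount -o remount /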

Once remounted, everything is done. However, observe how the VM behaves for a while, especially under frequent deletions: because the OS issues a trim command after each delete, overall VM performance can be impacted.

An alternative approach is to run fstrim periodically, once a day or even once a week, depending on the intensity of delete operations. If this is the chosen method, there is a systemd timer for it. Enable and start it:

# systemctl enable --now fstrim.timer

And the job is done.
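
To verify that the timer is active and see when it will fire next:

# systemctl list-timers fstrim.timer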

Note

One thing needs consideration with this approach: if fstrim.timer runs in multiple virtual machines, the administrator should spread the execution times evenly. Otherwise, if all the timers fire on the same day at the same time, there can be peaks in storage system utilization that impact performance, depending on the amount of data being trimmed.
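
One way to stagger the runs without hand-picking a different time for every VM is a systemd drop-in with a randomized delay (a sketch; the 12-hour window is an example to adapt):

# systemctl edit fstrim.timer

In the editor, add:

[Timer]
RandomizedDelaySec=12h

Each VM then delays its scheduled run by a random amount within that window, spreading the load on the storage system.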

Hints

  1. If the filesystem is placed on an LVM volume, find the option issue_discards in /etc/lvm/lvm.conf and set it to 1. That tells LVM to send discards to an LV's underlying physical volumes when the LV no longer uses their space, e.g. on lvremove or lvreduce (see the first snippet after this list).

  2. If the filesystem is on a LUKS-encrypted partition, add the discard option for that partition in /etc/crypttab (see the second snippet after this list). Edit and save the file, reboot, and all is done.

  3. If the qemu-guest-agent is installed and running inside the VM, and trims are allowed, the administrator can issue fstrim commands from the hypervisor:

# virsh domfstrim --domain $VM_NAME
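
For hint 1, the issue_discards setting lives in the devices section of /etc/lvm/lvm.conf:

devices {
    issue_discards = 1
}

For hint 2, a /etc/crypttab entry with discards enabled could look like this (a sketch; the name, UUID, and the rest of the options come from your existing entry):

luks-root  UUID=<partition-uuid>  none  luks,discard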