Enable TRIM/Discard operations for KVM-based clouds
By issuing TRIM/Discard commands (trims), the operating system notifies the underlying storage device which data is marked as deleted in the filesystem and can be erased from the physical media. TRIM in the ATA command set (known as UNMAP in the SCSI command set) is well known to storage vendors and has been supported in the Linux kernel for a long time. Supported by the most widely used Linux filesystems (ext4, XFS, Btrfs, and so on), these commands help reduce SSD wear and prevent the progressive degradation of write performance over long-term use.
This document covers all steps needed to configure and enable trims inside virtual machines (VMs) and ensure these commands are propagated to the storage system. The steps are for virtio-scsi (/dev/sd*) and virtio-blk (/dev/vd*) virtual disks.
Verify the storage solution
The first step is ensuring that the chosen storage solution supports trim operations on the virtual drives it provides. On a Linux host with an attached virtual drive, check the values of DISC-GRAN and DISC-MAX in the output of the lsblk --discard command. In a StorPool cluster with a volume (volume-1) attached on a client, issue the following command:
# lsblk --discard /dev/storpool/volume-1
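On a volume with trim support, the output might look like this (the values shown are illustrative):
NAME     DISC-ALN DISC-GRAN DISC-MAX DISC-ZERO
volume-1        0        4K       1M         0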
If the values for DISC-GRAN and DISC-MAX are non-zero (e.g. 4K and 1M), trims are supported. If the result contains only zeroes, there is no trim support.
Virtualization stack
Trims for virtio-scsi devices have been supported and can be enabled since qemu version 1.5. For virtio-blk, this capability arrived in qemu 4.0. To check the version of qemu-kvm in use:
For RHEL and its derivatives:
# /usr/libexec/qemu-kvm --version
For Ubuntu:
# /usr/bin/qemu-system-x86_64 --version
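Either command prints the version on its first line; typical output (the exact version will vary, and a distribution packaging suffix may follow in parentheses):
QEMU emulator version 4.2.0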
If your cloud hosts' OS is relatively new (RHEL 8.x or Ubuntu 20.x), trims are supported for both virtio-scsi and virtio-blk devices. Otherwise, it is probably time to consider an OS upgrade.
Virtual machine type
From qemu version 4.0, trims on virtio-blk devices are available. But there is one catch: only for VMs whose machine type is q35. With the older i440fx type, trims are available only for virtio-scsi drives.
To check the VM type, use:
# virsh qemu-monitor-command $VM_NAME --hmp info qom-tree | head -n 1
The output will be something like this:
/machine (pc-q35-rhel8.2.0-machine)
or
/machine (pc-i440fx-2.11-machine)
An alternative way is to check the running VM's XML definition. This approach requires the xmllint command, provided by the libxml2 package (in RHEL) or by libxml2-utils (in Ubuntu). To get the VM type:
# virsh dumpxml $VM_NAME | xmllint --xpath 'string(/domain/os/type/@machine)' - ; echo
And the output will be something like this:
pc-q35-rhel8.2.0
or
pc-i440fx-2.11
Virtual drives discard
The next step is verifying that the VM is configured to propagate trims to the storage system. First, let's check the VM's definition for the necessary parameters. Get the definition with:
# virsh dumpxml $VM_NAME | less
Then, for each disk with type='block', ensure that there is discard='unmap' in its driver configuration. For virtio-blk, it should look like this:
...
<disk type='block' device='disk'>
<driver name='qemu' type='raw' cache='none' io='native' discard='unmap'/>
<target dev='vda' bus='virtio'/>
<alias name='virtio-disk0'/>
...
And, for virtio-scsi, like this:
...
<disk type='block' device='disk'>
<driver name='qemu' type='raw' cache='none' io='native' discard='unmap'/>
<target dev='sda' bus='scsi'/>
<alias name='scsi0-0-0-0'/>
...
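If discard='unmap' is missing, one way to add it is to edit the VM definition and add the attribute to the relevant driver elements (note that the VM must be fully stopped and started again for the change to take effect):
# virsh edit $VM_NAME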
Then, check that these settings are in effect using qemu-monitor-command and the alias of the device. For virtio-blk, the command is:
# virsh qemu-monitor-command $VM_NAME --cmd '{"execute": "qom-get", "arguments": { "path": "/machine/peripheral/virtio-disk0", "property": "discard" }}'
If trims are in effect, the command will return something like:
{"return": true, "id": "libvirt-2872196"}
For virtio-scsi devices, check discard_granularity with:
# virsh qemu-monitor-command $VM_NAME --cmd '{"execute": "qom-get", "arguments": { "path": "/machine/peripheral/scsi0-0-0-0", "property": "discard_granularity" }}'
{"return":4096, "id": "libvirt-2872649"}
If the value of return is greater than 0, trims are available.
You can also check the command line of a running VM:
# ps axww | grep $VM_NAME
2972318 ? Sl 302:11 /usr/bin/qemu-kvm -name guest=$VM_NAME,debug-threads=on … -blockdev {"driver": "host_device", … "discard": "unmap"} … -blockdev {"driver": "host_device", … "discard": "unmap"} …
Guest configuration
Trim operations have been supported by the virtio-scsi kernel module for a long time, but they became available in the virtio-blk kernel module only relatively recently, introduced in kernel version 4.20-rc7 (patch 1028619). Use an appropriately recent guest kernel to take advantage of this feature. Ubuntu 18.04 LTS offers kernel 5.4, the same version as in 20.x LTS. RHEL and its derivatives, like AlmaLinux, added discard support in version 8.1, with plans to backport it to RHEL 7, too (Solution 4780831).
After the OS is chosen and installed, there are a few more steps. First, ensure that trims are available inside the VM:
# lsblk --discard
If the columns DISC-GRAN and DISC-MAX are not 0, the device is ready for trimming. Test it with:
# fstrim --verbose --all
/: 5.4 GiB (5793656832 bytes) trimmed on /dev/vda1
Output like this means that everything works as expected.
The last step for the administrator is to configure how trim operations will happen.
The first option is to have the OS issue a trim command after each delete. If that is the chosen strategy, add discard to the mount options of the device or partition in /etc/fstab:
UUID=f41e390f-835b-4223-a9bb-9b45984ddf8d / xfs defaults,discard 0 0
Remount with these options, and everything is done. However, observe how the VM behaves for a while, especially under frequent deletions: because the OS issues a trim command after each delete, there is a chance of an impact on overall VM performance.
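For example, to apply the new mount options to the root filesystem without a reboot:
# mount -o remount /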
An alternative approach is to execute fstrim periodically, once a day or even once a week, depending on the intensity of delete operations. If this is the chosen method, there is a systemd timer for that. Enable and start it:
# systemctl enable --now fstrim.timer
And the job is done.
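To verify that the timer is active and see when it will next run:
# systemctl list-timers fstrim.timer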
Note
There is one thing that needs consideration in this approach: if fstrim.timer runs in multiple virtual machines, the administrator should spread the execution times evenly. Otherwise, if all fstrim.timer units run on the same day at the same time, then, depending on the amount of trimmed data, there could be peaks in storage system utilization that impact performance.
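One simple way to spread the load is to add a randomized delay to the timer on each VM via a systemd drop-in (the 6h value below is only an example):
# systemctl edit fstrim.timer
And in the editor add:
[Timer]
RandomizedDelaySec=6h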
Hints
If the filesystem is placed on an LVM volume, find the option issue_discards in /etc/lvm/lvm.conf and set it to 1. This tells LVM to send discards to an LV's underlying physical volumes when the LV no longer uses their space, e.g. on lvremove or lvreduce.
If the filesystem is on a LUKS-encrypted partition, add the option discard for that partition in /etc/crypttab. Edit and save the file, reboot, and all is done.
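For example, in /etc/lvm/lvm.conf:
devices {
    issue_discards = 1
}
And in /etc/crypttab (the mapping name and UUID below are illustrative):
luks-data UUID=0a1b2c3d-1111-2222-3333-444455556666 none discard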
If the qemu-guest-agent is installed and running inside the VM, and trims are allowed, the administrator can issue fstrim commands from the hypervisor:
# virsh domfstrim --domain $VM_NAME