
I'm setting up a Linux system in KVM (QEMU) to test the effect of adding a writeback LVM cache on a fast disk in front of a logical volume that resides on a set of very slow disks (a RAID1 LV). This is modelled on an actual physical configuration that I don't want to touch until I know how it is likely to behave with the cache added.

The issue is that in KVM, all disks perform at the same speed, so the cache is rarely utilized, and I don't see any performance benefits. Ideally, I want the RAID1 mirror to struggle with I/O, allowing me to observe the cache volume filling up during writes and gradually writing back to the mirror set.

Is there a way to artificially throttle the speed of a disk in KVM/QEMU?

I'm currently using Debian 12 as the host for this experiment, with the KVM machine running Alpine Linux (and also testing with Debian 12). The KVM setup includes one main qcow2 image (the fast disk) and additional qcow2 images (the slow RAID mirror).
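
For reference, the operation I eventually want to test is roughly the following (a sketch only; vg, slowlv, fastcache and /dev/vdd are placeholder names, not the real ones):

# Carve a cache volume out of the fast disk and attach it, in writeback mode,
# in front of the slow RAID1 LV
lvcreate -n fastcache -L 20G vg /dev/vdd
lvconvert --type cache --cachevol fastcache --cachemode writeback vg/slowlv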

  • First idea to come into my head... write a simple FUSE filesystem (there are plenty of examples) that's deliberately slow (just add a delay every few KB read/written?), create a large file on that, and use that large file as a disk in KVM? Commented Jul 5 at 13:12
  • @StephenHarris Yes, putting the image files on slow media is one possibility. However, I was hoping there would be something that would allow me to set the I/O speeds a bit more precisely.
    – Kusalananda
    Commented Jul 5 at 13:14

4 Answers


QEMU has native support for Network Block Devices (NBD): --drive file=nbd://<host>[:<port>]/[<export>]. (You could also use nbd.ko in the guest itself.)

That's kind of useful, because nbdkit is an NBD server implementation that supports plugins and filters to simulate service distortions. Something like nbdkit --filter=delay file /path/to/backing/storage rdelay=300ms wdelay=300ms would simulate a hard drive with atrocious access delays, using the delay filter.

You'd probably want the spinning filter, combined with the rate filter, to come somewhere close to the behaviour of spinning platters.

Assuming you're not going to reuse the data and are doing this on a machine with plenty of RAM, it'd make sense to me to use the memory plugin (instead of the file plugin as in the example above), to get real storage imperfections out of the picture.
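
A minimal sketch of how the pieces could fit together (the size, rates, delays and port are placeholders; per nbdkit-rate-filter(1), the rate limit is given in bits per second, so 80M is roughly 10 MB/s):

# Serve an 8G RAM-backed export, bandwidth-limited and with added per-request latency
nbdkit --filter=rate --filter=delay memory 8G rate=80M rdelay=10ms wdelay=10ms

# Attach the export to the guest as a VirtIO disk (10809 is the default NBD port)
qemu-system-x86_64 ... -drive file=nbd://localhost:10809/,format=raw,if=virtio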

  • This looks like a neat approach. I will try to test it later. Thanks.
    – Kusalananda
    Commented Jul 5 at 13:17
  • You're welcome! I wouldn't have thought of it had I not tried to image a laptop SSD using nbdkit earlier today. Commented Jul 5 at 13:18
  • It's a bit fiddly, but it definitely solves the issue.
    – Kusalananda
    Commented Jul 5 at 15:01
  • One suggestion to add here: While it's possible to have the guest use the NBD devices itself instead of using them through QEMU, doing so means that the guest networking stack then has an impact on how everything performs. In my experience, that will usually be a bigger impact than having QEMU connect to the device itself and then expose it as a VirtIO block device to the guest, irrespective of what the guest network stack looks like. Commented Jul 6 at 22:55
  • Sorry, unaccepting this for now. I was a bit quick to accept it before. It does not mean it's a bad solution, only that I need to evaluate the other answers a bit more carefully first. Thanks again for mentioning nbdkit!
    – Kusalananda
    Commented 2 days ago

The device-mapper subsystem includes CONFIG_DM_DELAY, a device-mapper target that can be used to add delay to disk operations.

The Debian 12 standard kernel includes it as a module.

For example: create /dev/mapper/delayed, which will be the same as /dev/sdX, but with a 500 ms delay added:

echo "0 `blockdev --getsz /dev/sdX` delay /dev/sdX 0 500" | dmsetup create delayed

More info: https://docs.kernel.org/next/admin-guide/device-mapper/delay.html
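
For the scenario in the question, a sketch of doing this inside the guest could look like the following (assuming the two slow-mirror member disks appear in the guest as /dev/vdb and /dev/vdc; adjust the names and the delay to taste):

# Wrap both mirror members so every read and write is delayed by 500 ms
echo "0 $(blockdev --getsz /dev/vdb) delay /dev/vdb 0 500" | dmsetup create slow1
echo "0 $(blockdev --getsz /dev/vdc) delay /dev/vdc 0 500" | dmsetup create slow2

# Build the slow PVs (and the RAID1 LV) on the delayed devices instead of the raw disks
pvcreate /dev/mapper/slow1 /dev/mapper/slow2

# Tear down again once the VG on top has been deactivated
dmsetup remove slow1
dmsetup remove slow2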

  • This would probably also work, and I will test it at some later stage. I'm assuming you would do this in the guest VM?
    – Kusalananda
    Commented Jul 5 at 15:05
  • By using it in the guest VM, you would get to set it up and tear it down immediately as needed. If the VM's disk exists as a separate device at the host level, you could do it at the virtualization host too, but you would have to change the VM configuration to make it use the mapped device instead of the original, so it would probably require a VM reboot.
    – telcoM
    Commented Jul 5 at 15:08

QEMU provides internal support for limiting bandwidth (in bytes per second) and IO rates (in IO operations per second), in both cases with the option to set separate limits for read and write operations.

If invoking QEMU directly, the relevant additional options on the -drive flag are:

  • bps=x, bps_rd=x, bps_wr=x: To limit global, read, and write bandwidth to x bytes per second respectively.
  • bps_max=x, bps_rd_max=x, bps_wr_max=x: Similar, but to allow a temporary burst over the regular limit.
  • iops=x, iops_rd=x, iops_wr=x: To limit global, read, and write IO to x operations per second respectively.
  • iops_max=x, iops_rd_max=x, iops_wr_max=x: Similar, but for a temporary burst over the regular limit.
  • iops_size=y: To treat each y bytes of an IO request as a separate IO operation for the above limits (intended to help better cap IO performance).
  • group=g: To associate the device with a quota group named g. All devices in the same named group will use the same bandwidth and IOPS settings, and share their limits.
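
As an illustrative sketch only (the image names and numbers here are made up, not taken from the question's setup), the fast cache disk could be left unthrottled while the two mirror disks share one tight limit:

-drive file=fast.qcow2,if=virtio,format=qcow2
-drive file=slow1.qcow2,if=virtio,format=qcow2,bps=1000000,iops=50,group=slowmirror
-drive file=slow2.qcow2,if=virtio,format=qcow2,bps=1000000,iops=50,group=slowmirror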

libvirt also supports this interface in its domain XML format, using the <iotune> element (documented in the disk device section of the libvirt domain XML docs), providing all of the same functionality, but with a few extra knobs to fine-tune things better.

If you just need a quick test, this is probably the simplest approach as it is entirely self-contained. If the test is a throwaway setup, you could possibly also put the disk image on a tmpfs instance on the host side to limit impact from the host IO subsystem.
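
For example (the mount point and size are arbitrary; size the tmpfs to hold the images with some headroom):

mount -t tmpfs -o size=16G tmpfs /var/lib/libvirt/images/throwaway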

  • Ah, I was hoping there would be something simple like this! I will have to get back to this later to test it. Thanks!
    – Kusalananda
    Commented 2 days ago

I decided to try out Austin Hemmelgarn's method, and I can say definitively that it works... almost too well.

I wanted to run a quick test with the Fedora 40 Workstation VM I already have installed, so first I booted it up and ran a quick baseline disk benchmark in GNOME Disks, because pictures are pretty and precision wasn't my concern. That run looked like this:

"before" benchmark

(My system has a lot of other stuff going on, and it's an older machine, so despite the SSD, speeds are kind of all over the place in VM-world.)

Then I shut down the VM, opened up its KVM config, and made the following edit, setting the max. total transfer rate for the VM's boot/only virtual disk waaaaay down to 100K/s:

diff --git a/fedora40-wor.txt b/fedora40-wor.txt
index 25e7f9d..d46595a 100644
--- a/fedora40-wor.txt
+++ b/fedora40-wor.txt
@@ -53,6 +53,10 @@
       <driver name="qemu" type="qcow2" cache="writeback" discard="unmap"/>
       <source file="/home/ferd/.local/share/gnome-boxes/images/fedora40-wor"/>
       <target dev="vda" bus="virtio"/>
+      <iotune>
+        <total_bytes_sec>100000</total_bytes_sec>
+        <read_iops_sec>20000</read_iops_sec>
+      </iotune>
       <address type="pci" domain="0x0000" bus="0x04" slot="0x00" function="0x0"/>
     </disk>
     <disk type="file" device="cdrom">

(You can also add a <write_iops_sec> limiting value if you want to constrain write operations.)

...Then I booted it back up. Well... started to. I was able to monitor the VM's disk activity in virsh with domstats <domain> --block, and as I watched the block.0.rd.bytes= value slooowly creep upwards, I realized I'd forgotten just how much disk activity is involved in booting up a Linux system.
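
(Concretely, the monitoring loop looked something like the line below; the domain name is just whatever virsh list reports for the VM:)

watch -n 1 virsh domstats fedora40-wor --block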

So after around 60 seconds of getting nowhere, I gave up, destroy <domain>'d the VM instance, went back into the configuration, and added a pair of zeros to the total_bytes_sec value to make it a more tolerable 10M/s max rate.

That got the system to boot up in under a minute (actually, in only a few seconds), and I headed back into GNOME Disks to repeat the benchmark.

It's a good thing I wasn't looking for precision, because GNOME Disks definitely does NOT deliver it. As I said, I set the max transfer rate to 10M/s. As my benchmark run of 100 20MB samples progressed, I could see that each sample was taking roughly 2 seconds, which checks out.

Disks, though, continuously estimated the read speed at an impossible 30MB/s for most of the run. Except for a couple of spots where it jumped up to really impossible values, hence the crazy spike and nonsense numbers in this image. But the actual benchmark run took right around 200 seconds, 2 seconds per sample, which is consistent with the limits I'd set.

"after" benchmark

