fio benchmark on multiple devices
In this post, we study how to run fio benchmarks on multiple devices, and how the configured iodepth is reflected on each device.
We start with a single device. The following global parameters are used throughout; an equivalent job file is sketched after this list.
- blocksize=16k
- filesize=50G (write/read 50G of data on each device)
- iodepth=64 (explained further in the experiments below)
- end_fsync=1 (fsync file contents when the write phase completes)
- group_reporting
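As a side note, these parameters can be kept in a fio job file instead of a long command line. Below is a sketch of the two-device experiment as a job file; the file name and section names are our own choice:

; two-devices.fio -- one job per device, as recommended in this post
[global]
ioengine=libaio
direct=1
readwrite=write
blocksize=16k
filesize=50G
iodepth=64
end_fsync=1
group_reporting

[job1]
filename=/dev/nvme2n1

[job2]
filename=/dev/nvme3n1

It can be run with fio two-devices.fio.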
Write a single device
Using one job to write a single device, /dev/nvme2n1:
$ fio --ioengine=libaio --direct=1 --readwrite=write --blocksize=16k --filesize=50G --end_fsync=1 --iodepth=64 --group_reporting --name=job1 --filename=/dev/nvme2n1
job1: (g=0): rw=write, bs=(R) 16.0KiB-16.0KiB, (W) 16.0KiB-16.0KiB, (T) 16.0KiB-16.0KiB, ioengine=libaio, iodepth=64
fio-3.7
Starting 1 process
Jobs: 1 (f=1): [W(1)][100.0%][r=0KiB/s,w=1906MiB/s][r=0,w=122k IOPS][eta 00m:00s]
job1: (groupid=0, jobs=1): err= 0: pid=32600: Fri Apr 22 22:19:32 2022
write: IOPS=116k, BW=1820MiB/s (1908MB/s)(50.0GiB/28134msec)
slat (nsec): min=1362, max=64140, avg=2467.92, stdev=1052.40
clat (usec): min=4, max=4503, avg=546.60, stdev=554.19
lat (usec): min=12, max=4505, avg=549.15, stdev=554.20
clat percentiles (usec):
| 1.00th=[ 13], 5.00th=[ 18], 10.00th=[ 23], 20.00th=[ 34],
| 30.00th=[ 50], 40.00th=[ 90], 50.00th=[ 474], 60.00th=[ 619],
| 70.00th=[ 775], 80.00th=[ 1029], 90.00th=[ 1418], 95.00th=[ 1631],
| 99.00th=[ 1942], 99.50th=[ 2057], 99.90th=[ 2311], 99.95th=[ 2474],
| 99.99th=[ 3228]
bw ( MiB/s): min= 1664, max= 1966, per=100.00%, avg=1821.04, stdev=86.43, samples=56
iops : min=106554, max=125860, avg=116546.80, stdev=5531.48, samples=56
lat (usec) : 10=0.01%, 20=7.69%, 50=22.50%, 100=10.42%, 250=4.17%
lat (usec) : 500=5.92%, 750=17.86%, 1000=10.58%
lat (msec) : 2=20.14%, 4=0.71%, 10=0.01%
cpu : usr=12.36%, sys=36.52%, ctx=1591388, majf=0, minf=17
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued rwts: total=0,3276800,0,1 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=64
Run status group 0 (all jobs):
WRITE: bw=1820MiB/s (1908MB/s), 1820MiB/s-1820MiB/s (1908MB/s-1908MB/s), io=50.0GiB (53.7GB), run=28134-28134msec
Disk stats (read/write):
nvme2n1: ios=88/3276800, merge=0/0, ticks=10/1784551, in_queue=1784561, util=99.65%
In the iostat output, w/s (writes per second) on nvme2n1 is ~116k, which matches the fio IOPS. The avgqu-sz (average queue size) is ~63, essentially the fio iodepth of 64; a quick Little's law check follows the iostat output below.
$ iostat -ktdx 5 | egrep "Device|nvme2n1|nvme3n1"
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
nvme2n1 0.00 0.00 0.00 116077.60 0.00 1857241.60 32.00 63.36 0.55 0.00 0.55 0.01 100.00
nvme3n1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
nvme2n1 0.00 0.00 0.00 116236.40 0.00 1859782.40 32.00 63.48 0.55 0.00 0.55 0.01 100.02
nvme3n1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
nvme2n1 0.00 0.00 0.00 116508.00 0.00 1864128.00 32.00 63.50 0.55 0.00 0.55 0.01 99.98
nvme3n1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
nvme2n1 0.00 0.00 0.00 115832.60 0.00 1853321.60 32.00 63.49 0.55 0.00 0.55 0.01 100.02
nvme3n1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
nvme2n1 0.00 0.00 0.00 117750.00 0.00 1884000.00 32.00 63.48 0.54 0.00 0.54 0.01 100.00
nvme3n1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
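The match between avgqu-sz and iodepth is just Little's law: the average queue size equals the request rate times the average time a request spends in the queue. Plugging in the first iostat sample above (w_await is in milliseconds):

avgqu-sz ~= w/s x w_await = 116077.6/s x 0.55ms = 63.8

which agrees with the reported avgqu-sz of ~63.4: fio keeps ~64 writes in flight, so the device queue sits at the configured iodepth.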
Write two devices
Using two jobs to write two devices, /dev/nvme2n1 and /dev/nvme3n1, one device per job:
$ fio --ioengine=libaio --direct=1 --readwrite=write --blocksize=16k --filesize=50G --end_fsync=1 --iodepth=64 --group_reporting --name=job1 --filename=/dev/nvme2n1 --name=job2 --filename=/dev/nvme3n1
job1: (g=0): rw=write, bs=(R) 16.0KiB-16.0KiB, (W) 16.0KiB-16.0KiB, (T) 16.0KiB-16.0KiB, ioengine=libaio, iodepth=64
job2: (g=0): rw=write, bs=(R) 16.0KiB-16.0KiB, (W) 16.0KiB-16.0KiB, (T) 16.0KiB-16.0KiB, ioengine=libaio, iodepth=64
fio-3.7
Starting 2 processes
Jobs: 1 (f=1): [W(1),_(1)][100.0%][r=0KiB/s,w=2904MiB/s][r=0,w=186k IOPS][eta 00m:00s]
job1: (groupid=0, jobs=2): err= 0: pid=32892: Fri Apr 22 22:22:05 2022
write: IOPS=233k, BW=3648MiB/s (3825MB/s)(100GiB/28072msec)
slat (nsec): min=1356, max=57113, avg=2474.06, stdev=784.80
clat (usec): min=6, max=4165, avg=539.57, stdev=563.55
lat (usec): min=12, max=4167, avg=542.13, stdev=563.57
clat percentiles (usec):
| 1.00th=[ 13], 5.00th=[ 16], 10.00th=[ 21], 20.00th=[ 32],
| 30.00th=[ 44], 40.00th=[ 67], 50.00th=[ 490], 60.00th=[ 619],
| 70.00th=[ 775], 80.00th=[ 1037], 90.00th=[ 1450], 95.00th=[ 1647],
| 99.00th=[ 1926], 99.50th=[ 2024], 99.90th=[ 2278], 99.95th=[ 2376],
| 99.99th=[ 2900]
bw ( MiB/s): min= 1611, max= 1981, per=50.57%, avg=1844.83, stdev=83.99, samples=109
iops : min=103146, max=126830, avg=118069.08, stdev=5375.32, samples=109
lat (usec) : 10=0.01%, 20=9.06%, 50=24.62%, 100=11.34%, 250=2.22%
lat (usec) : 500=2.86%, 750=18.76%, 1000=10.17%
lat (msec) : 2=20.38%, 4=0.59%, 10=0.01%
cpu : usr=12.43%, sys=36.55%, ctx=3200368, majf=0, minf=32
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued rwts: total=0,6553600,0,2 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=64
Run status group 0 (all jobs):
WRITE: bw=3648MiB/s (3825MB/s), 3648MiB/s-3648MiB/s (3825MB/s-3825MB/s), io=100GiB (107GB), run=28072-28072msec
Disk stats (read/write):
nvme2n1: ios=88/3276800, merge=0/0, ticks=10/1782540, in_queue=1782549, util=99.67%
nvme3n1: ios=88/3276800, merge=0/0, ticks=13/1745688, in_queue=1745702, util=97.57%
Note that each job writes its own device. The IOPS doubles compared to the single-device write.
In the iostat output, the w/s on each device is ~116k and the total across the two devices is ~233k. The avgqu-sz on each device is ~64, matching the fio iodepth as expected.
$ iostat -ktdx 5 | egrep "Device|nvme2n1|nvme3n1"
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
nvme2n1 0.00 0.00 0.00 116848.20 0.00 1869571.20 32.00 63.56 0.54 0.00 0.54 0.01 100.00
nvme3n1 0.00 0.00 0.00 119530.00 0.00 1912480.00 32.00 63.57 0.53 0.00 0.53 0.01 100.00
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
nvme2n1 0.00 0.00 0.00 116253.00 0.00 1860048.00 32.00 63.57 0.55 0.00 0.55 0.01 100.00
nvme3n1 0.00 0.00 0.00 119619.80 0.00 1913916.80 32.00 63.58 0.53 0.00 0.53 0.01 100.00
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
nvme2n1 0.00 0.00 0.00 116381.20 0.00 1862099.20 32.00 63.56 0.55 0.00 0.55 0.01 100.08
nvme3n1 0.00 0.00 0.00 118331.00 0.00 1893296.00 32.00 63.57 0.54 0.00 0.54 0.01 100.00
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
nvme2n1 0.00 0.00 0.00 116712.20 0.00 1867395.20 32.00 63.57 0.54 0.00 0.54 0.01 100.00
nvme3n1 0.00 0.00 0.00 119082.40 0.00 1905318.40 32.00 63.56 0.53 0.00 0.53 0.01 100.00
Read a single device
Using one job to read a single device, /dev/nvme2n1:
$ fio --ioengine=libaio --direct=1 --readwrite=read --blocksize=16k --filesize=50G --end_fsync=1 --iodepth=64 --group_reporting --name=job1 --filename=/dev/nvme2n1
job1: (g=0): rw=read, bs=(R) 16.0KiB-16.0KiB, (W) 16.0KiB-16.0KiB, (T) 16.0KiB-16.0KiB, ioengine=libaio, iodepth=64
fio-3.7
Starting 1 process
Jobs: 1 (f=1): [R(1)][100.0%][r=2825MiB/s,w=0KiB/s][r=181k,w=0 IOPS][eta 00m:00s]
job1: (groupid=0, jobs=1): err= 0: pid=33037: Fri Apr 22 22:24:12 2022
read: IOPS=181k, BW=2820MiB/s (2957MB/s)(50.0GiB/18153msec)
slat (nsec): min=1274, max=52836, avg=1738.48, stdev=799.29
clat (usec): min=75, max=2997, avg=352.48, stdev=89.63
lat (usec): min=76, max=2999, avg=354.28, stdev=89.64
clat percentiles (usec):
| 1.00th=[ 192], 5.00th=[ 229], 10.00th=[ 245], 20.00th=[ 273],
| 30.00th=[ 302], 40.00th=[ 322], 50.00th=[ 351], 60.00th=[ 371],
| 70.00th=[ 392], 80.00th=[ 416], 90.00th=[ 461], 95.00th=[ 506],
| 99.00th=[ 627], 99.50th=[ 676], 99.90th=[ 807], 99.95th=[ 889],
| 99.99th=[ 988]
bw ( MiB/s): min= 2781, max= 2826, per=100.00%, avg=2820.55, stdev= 7.65, samples=36
iops : min=178016, max=180900, avg=180515.03, stdev=489.56, samples=36
lat (usec) : 100=0.01%, 250=11.85%, 500=82.74%, 750=5.23%, 1000=0.17%
lat (msec) : 2=0.01%, 4=0.01%
cpu : usr=14.06%, sys=44.38%, ctx=2003314, majf=0, minf=271
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued rwts: total=3276800,0,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=64
Run status group 0 (all jobs):
READ: bw=2820MiB/s (2957MB/s), 2820MiB/s-2820MiB/s (2957MB/s-2957MB/s), io=50.0GiB (53.7GB), run=18153-18153msec
Disk stats (read/write):
nvme2n1: ios=3274022/0, merge=0/0, ticks=1149877/0, in_queue=1149877, util=99.53%
In the iostat output, r/s (reads per second) on nvme2n1 is ~181k, which matches the fio IOPS. The avgqu-sz is again ~63, essentially the fio iodepth of 64.
$ iostat -ktdx 5 | egrep "Device|nvme2n1|nvme3n1"
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
nvme2n1 0.00 0.00 180561.80 0.00 2888988.80 0.00 32.00 63.44 0.35 0.35 0.00 0.01 100.02
nvme3n1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
nvme2n1 0.00 0.00 180625.60 0.00 2890009.60 0.00 32.00 63.42 0.35 0.35 0.00 0.01 100.00
nvme3n1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
nvme2n1 0.00 0.00 180660.80 0.00 2890572.80 0.00 32.00 63.44 0.35 0.35 0.00 0.01 100.04
nvme3n1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Read two devices
Using two jobs to read two devices, /dev/nvme2n1 and /dev/nvme3n1, one device per job:
$ fio --ioengine=libaio --direct=1 --readwrite=read --blocksize=16k --filesize=50G --end_fsync=1 --iodepth=64 --group_reporting --name=job1 --filename=/dev/nvme2n1 --name=job2 --filename=/dev/nvme3n1
job1: (g=0): rw=read, bs=(R) 16.0KiB-16.0KiB, (W) 16.0KiB-16.0KiB, (T) 16.0KiB-16.0KiB, ioengine=libaio, iodepth=64
job2: (g=0): rw=read, bs=(R) 16.0KiB-16.0KiB, (W) 16.0KiB-16.0KiB, (T) 16.0KiB-16.0KiB, ioengine=libaio, iodepth=64
fio-3.7
Starting 2 processes
Jobs: 2 (f=2): [R(2)][100.0%][r=5639MiB/s,w=0KiB/s][r=361k,w=0 IOPS][eta 00m:00s]
job1: (groupid=0, jobs=2): err= 0: pid=33148: Fri Apr 22 22:25:16 2022
read: IOPS=360k, BW=5628MiB/s (5901MB/s)(100GiB/18195msec)
slat (nsec): min=1272, max=54671, avg=1803.20, stdev=748.72
clat (usec): min=70, max=1344, avg=353.14, stdev=87.70
lat (usec): min=73, max=1352, avg=355.00, stdev=87.70
clat percentiles (usec):
| 1.00th=[ 186], 5.00th=[ 225], 10.00th=[ 245], 20.00th=[ 277],
| 30.00th=[ 306], 40.00th=[ 330], 50.00th=[ 351], 60.00th=[ 371],
| 70.00th=[ 392], 80.00th=[ 416], 90.00th=[ 461], 95.00th=[ 502],
| 99.00th=[ 611], 99.50th=[ 660], 99.90th=[ 775], 99.95th=[ 873],
| 99.99th=[ 979]
bw ( MiB/s): min= 2779, max= 2819, per=50.02%, avg=2814.82, stdev= 6.34, samples=72
iops : min=177878, max=180456, avg=180148.69, stdev=405.55, samples=72
lat (usec) : 100=0.01%, 250=11.87%, 500=83.07%, 750=4.92%, 1000=0.12%
lat (msec) : 2=0.01%
cpu : usr=14.32%, sys=45.95%, ctx=3778567, majf=0, minf=541
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued rwts: total=6553600,0,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=64
Run status group 0 (all jobs):
READ: bw=5628MiB/s (5901MB/s), 5628MiB/s-5628MiB/s (5901MB/s-5901MB/s), io=100GiB (107GB), run=18195-18195msec
Disk stats (read/write):
nvme2n1: ios=3265872/0, merge=0/0, ticks=1149825/0, in_queue=1149825, util=99.51%
nvme3n1: ios=3267478/0, merge=0/0, ticks=1149712/0, in_queue=1149712, util=99.52%
Note that each job reads its own device. The IOPS doubles compared to the single-device read.
In the iostat output, the r/s on each device is ~180k and the total across the two devices is ~360k. The avgqu-sz on each device is ~64, matching the fio iodepth as expected.
$ iostat -ktdx 5 | egrep "Device|nvme2n1|nvme3n1"
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
nvme2n1 0.00 0.00 180121.20 0.00 2881939.20 0.00 32.00 63.44 0.35 0.35 0.00 0.01 100.04
nvme3n1 0.00 0.00 180181.60 0.00 2882905.60 0.00 32.00 63.43 0.35 0.35 0.00 0.01 100.00
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
nvme2n1 0.00 0.00 180242.20 0.00 2883875.20 0.00 32.00 63.44 0.35 0.35 0.00 0.01 100.00
nvme3n1 0.00 0.00 180260.00 0.00 2884160.00 0.00 32.00 63.44 0.35 0.35 0.00 0.01 100.00
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
nvme2n1 0.00 0.00 180221.40 0.00 2883542.40 0.00 32.00 63.44 0.35 0.35 0.00 0.01 100.00
nvme3n1 0.00 0.00 180278.40 0.00 2884454.40 0.00 32.00 63.43 0.35 0.35 0.00 0.01 100.00
Incorrect way to write/read multiple devices
Using one job to write two devices, passing both as one colon-separated filename:
$ fio --ioengine=libaio --direct=1 --readwrite=write --blocksize=16k --filesize=50G --end_fsync=1 --iodepth=64 --group_reporting --name=job1 --filename=/dev/nvme2n1:/dev/nvme3n1
job1: (g=0): rw=write, bs=(R) 16.0KiB-16.0KiB, (W) 16.0KiB-16.0KiB, (T) 16.0KiB-16.0KiB, ioengine=libaio, iodepth=64
fio-3.7
Starting 1 process
Jobs: 1 (f=2): [W(1)][100.0%][r=0KiB/s,w=3537MiB/s][r=0,w=226k IOPS][eta 00m:00s]
job1: (groupid=0, jobs=1): err= 0: pid=33535: Fri Apr 22 22:59:18 2022
write: IOPS=210k, BW=3284MiB/s (3444MB/s)(100GiB/31177msec)
slat (nsec): min=1376, max=52293, avg=2321.83, stdev=1055.59
clat (usec): min=2, max=3190, avg=301.76, stdev=424.79
lat (usec): min=12, max=3192, avg=304.15, stdev=424.78
clat percentiles (usec):
| 1.00th=[ 13], 5.00th=[ 16], 10.00th=[ 20], 20.00th=[ 26],
| 30.00th=[ 32], 40.00th=[ 40], 50.00th=[ 57], 60.00th=[ 112],
| 70.00th=[ 334], 80.00th=[ 635], 90.00th=[ 979], 95.00th=[ 1254],
| 99.00th=[ 1663], 99.50th=[ 1811], 99.90th=[ 2073], 99.95th=[ 2147],
| 99.99th=[ 2343]
bw ( MiB/s): min= 2935, max= 3785, per=99.93%, avg=3282.15, stdev=221.08, samples=62
iops : min=187876, max=242266, avg=210057.40, stdev=14148.71, samples=62
lat (usec) : 4=0.01%, 10=0.01%, 20=11.44%, 50=35.69%, 100=11.67%
lat (usec) : 250=8.20%, 500=8.42%, 750=8.43%, 1000=6.59%
lat (msec) : 2=9.38%, 4=0.17%
cpu : usr=19.43%, sys=52.57%, ctx=1390981, majf=0, minf=22
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued rwts: total=0,6553600,0,2 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=64
Run status group 0 (all jobs):
WRITE: bw=3284MiB/s (3444MB/s), 3284MiB/s-3284MiB/s (3444MB/s-3444MB/s), io=100GiB (107GB), run=31177-31177msec
Disk stats (read/write):
nvme2n1: ios=59/3276800, merge=0/0, ticks=6/1239109, in_queue=1239116, util=99.68%
nvme3n1: ios=57/3276800, merge=0/0, ticks=7/649690, in_queue=649696, util=99.70%
In the iostat output, the w/s on each device is ~102k. However, the avgqu-sz on the two devices is different (~40 and ~22), and only their sum is about 64: the single job keeps at most iodepth=64 I/Os in flight in total, spread across both devices, so the depth each device sees depends on how completions happen to interleave. This is not what we want from a benchmark; we usually expect the queue depth to be identical on all devices under the benchmark workload.
$ iostat -ktdx 5 | egrep "Device|nvme2n1|nvme3n1"
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
nvme2n1 0.00 0.00 0.00 102307.80 0.00 1636924.80 32.00 39.91 0.39 0.00 0.39 0.01 100.00
nvme3n1 0.00 0.00 0.00 102309.40 0.00 1636950.40 32.00 22.12 0.22 0.00 0.22 0.01 100.04
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
nvme2n1 0.00 0.00 0.00 101318.40 0.00 1621094.40 32.00 40.48 0.40 0.00 0.40 0.01 100.00
nvme3n1 0.00 0.00 0.00 101295.80 0.00 1620732.80 32.00 21.48 0.21 0.00 0.21 0.01 100.00
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
nvme2n1 0.00 0.00 0.00 100715.00 0.00 1611440.00 32.00 39.92 0.40 0.00 0.40 0.01 100.00
nvme3n1 0.00 0.00 0.00 100736.60 0.00 1611785.60 32.00 22.21 0.22 0.00 0.22 0.01 100.02
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
nvme2n1 0.00 0.00 0.00 101647.20 0.00 1626355.20 32.00 40.23 0.40 0.00 0.40 0.01 100.00
nvme3n1 0.00 0.00 0.00 101632.20 0.00 1626115.20 32.00 21.81 0.21 0.00 0.21 0.01 99.98
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
nvme2n1 0.00 0.00 0.00 109138.00 0.00 1746208.00 32.00 43.43 0.40 0.00 0.40 0.01 100.04
nvme3n1 0.00 0.00 0.00 109157.60 0.00 1746521.60 32.00 16.06 0.15 0.00 0.15 0.01 100.00
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
nvme2n1 0.00 0.00 0.00 114420.80 0.00 1830732.80 32.00 36.36 0.32 0.00 0.32 0.01 100.00
nvme3n1 0.00 0.00 0.00 114415.40 0.00 1830646.40 32.00 21.52 0.19 0.00 0.19 0.01 100.00
Using two cloned jobs (numjobs=2), each writing both devices:
$ fio --ioengine=libaio --direct=1 --readwrite=write --blocksize=16k --filesize=50G --end_fsync=1 --iodepth=64 --numjobs=2 --group_reporting --name=job1 --filename=/dev/nvme2n1:/dev/nvme3n1
job1: (g=0): rw=write, bs=(R) 16.0KiB-16.0KiB, (W) 16.0KiB-16.0KiB, (T) 16.0KiB-16.0KiB, ioengine=libaio, iodepth=64
...
fio-3.7
Starting 2 processes
Jobs: 2 (f=4): [W(2)][100.0%][r=0KiB/s,w=3139MiB/s][r=0,w=201k IOPS][eta 00m:00s]
job1: (groupid=0, jobs=2): err= 0: pid=33670: Fri Apr 22 23:09:45 2022
write: IOPS=216k, BW=3378MiB/s (3542MB/s)(200GiB/60623msec)
slat (nsec): min=1361, max=55969, avg=2845.77, stdev=1030.55
clat (usec): min=5, max=6608, avg=588.58, stdev=938.21
lat (usec): min=11, max=6610, avg=591.51, stdev=938.21
clat percentiles (usec):
| 1.00th=[ 13], 5.00th=[ 14], 10.00th=[ 16], 20.00th=[ 21],
| 30.00th=[ 25], 40.00th=[ 29], 50.00th=[ 34], 60.00th=[ 40],
| 70.00th=[ 330], 80.00th=[ 1663], 90.00th=[ 2311], 95.00th=[ 2540],
| 99.00th=[ 2966], 99.50th=[ 3097], 99.90th=[ 3458], 99.95th=[ 3621],
| 99.99th=[ 4047]
bw ( MiB/s): min= 1498, max= 1895, per=50.01%, avg=1689.53, stdev=95.93, samples=242
iops : min=95908, max=121316, avg=108129.62, stdev=6139.49, samples=242
lat (usec) : 10=0.01%, 20=19.61%, 50=45.92%, 100=2.35%, 250=1.43%
lat (usec) : 500=2.03%, 750=1.82%, 1000=1.70%
lat (msec) : 2=9.28%, 4=15.82%, 10=0.01%
cpu : usr=12.52%, sys=35.52%, ctx=4384091, majf=0, minf=42
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued rwts: total=0,13107200,0,4 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=64
Run status group 0 (all jobs):
WRITE: bw=3378MiB/s (3542MB/s), 3378MiB/s-3378MiB/s (3542MB/s-3542MB/s), io=200GiB (215GB), run=60623-60623msec
Disk stats (read/write):
nvme2n1: ios=118/6553600, merge=0/0, ticks=11/5128161, in_queue=5128173, util=99.87%
nvme3n1: ios=90/6553600, merge=0/0, ticks=10/2544319, in_queue=2544330, util=99.86%
In this experiment, setting numjobs=2 creates two cloned jobs that run the same workload, and each job writes both devices.
In the iostat output, the w/s on each device is ~105k and the total is ~210k, which is close to the fio IOPS. However, the avgqu-sz on the two devices is very different (~113 vs. ~14); their sum, ~126, is close to the combined iodepth of the two jobs (2x64=128).
So even though the total w/s is close to the earlier experiment where two separate jobs each wrote one device, the per-device avgqu-sz is nowhere near the fio iodepth of 64.
Therefore, when benchmarking multiple devices, we prefer separate jobs, each writing its own device (see the sketch at the end of this post).
$ iostat -ktdx 5 | egrep "Device|nvme2n1|nvme3n1"
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
nvme2n1 0.00 0.00 0.00 104810.00 0.00 1676963.20 32.00 112.91 1.08 0.00 1.08 0.01 100.04
nvme3n1 0.00 0.00 0.00 104813.00 0.00 1677008.00 32.00 13.72 0.13 0.00 0.13 0.01 100.00
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
nvme2n1 0.00 0.00 0.00 103548.00 0.00 1656764.80 32.00 112.87 1.09 0.00 1.09 0.01 100.04
nvme3n1 0.00 0.00 0.00 103550.60 0.00 1656809.60 32.00 13.74 0.13 0.00 0.13 0.01 100.00
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
nvme2n1 0.00 0.00 0.00 104651.80 0.00 1674428.80 32.00 112.75 1.08 0.00 1.08 0.01 100.00
nvme3n1 0.00 0.00 0.00 104644.20 0.00 1674307.20 32.00 13.89 0.13 0.00 0.13 0.01 100.00
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
nvme2n1 0.00 0.00 0.00 105734.20 0.00 1691747.20 32.00 113.22 1.07 0.00 1.07 0.01 100.00
nvme3n1 0.00 0.00 0.00 105744.40 0.00 1691910.40 32.00 13.40 0.13 0.00 0.13 0.01 100.00
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
nvme2n1 0.00 0.00 0.00 105221.00 0.00 1683536.00 32.00 117.70 1.12 0.00 1.12 0.01 100.00
nvme3n1 0.00 0.00 0.00 105215.60 0.00 1683449.60 32.00 8.93 0.08 0.00 0.08 0.01 100.02
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
nvme2n1 0.00 0.00 0.00 114836.60 0.00 1837388.80 32.00 82.89 0.72 0.00 0.72 0.01 100.00
nvme3n1 0.00 0.00 0.00 114786.80 0.00 1836588.80 32.00 43.73 0.38 0.00 0.38 0.01 99.98
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
nvme2n1 0.00 0.00 0.00 115051.20 0.00 1840816.00 32.00 78.09 0.68 0.00 0.68 0.01 100.04
nvme3n1 0.00 0.00 0.00 115083.60 0.00 1841337.60 32.00 48.55 0.42 0.00 0.42 0.01 100.04
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
nvme2n1 0.00 0.00 0.00 114826.60 0.00 1837225.60 32.00 81.10 0.71 0.00 0.71 0.01 100.00
nvme3n1 0.00 0.00 0.00 114844.60 0.00 1837513.60 32.00 45.49 0.40 0.00 0.40 0.01 100.00
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
nvme2n1 0.00 0.00 0.00 111133.00 0.00 1778128.00 32.00 48.09 0.43 0.00 0.43 0.01 100.02
nvme3n1 0.00 0.00 0.00 111080.80 0.00 1777292.80 32.00 78.49 0.71 0.00 0.71 0.01 100.00
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
nvme2n1 0.00 0.00 0.00 106740.00 0.00 1707840.00 32.00 26.91 0.25 0.00 0.25 0.01 100.00
nvme3n1 0.00 0.00 0.00 106743.40 0.00 1707894.40 32.00 99.64 0.93 0.00 0.93 0.01 100.04
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
nvme2n1 0.00 0.00 0.00 106593.60 0.00 1705497.60 32.00 31.85 0.30 0.00 0.30 0.01 100.00
nvme3n1 0.00 0.00 0.00 106640.80 0.00 1706252.80 32.00 94.76 0.89 0.00 0.89 0.01 100.04
Using one job to read two devices:
$ fio --ioengine=libaio --direct=1 --readwrite=read --blocksize=16k --filesize=50G --end_fsync=1 --iodepth=64 --group_reporting --name=job1 --filename=/dev/nvme2n1:/dev/nvme3n1
job1: (g=0): rw=read, bs=(R) 16.0KiB-16.0KiB, (W) 16.0KiB-16.0KiB, (T) 16.0KiB-16.0KiB, ioengine=libaio, iodepth=64
fio-3.7
Starting 1 process
Jobs: 1 (f=2): [R(1)][100.0%][r=4056MiB/s,w=0KiB/s][r=260k,w=0 IOPS][eta 00m:00s]
job1: (groupid=0, jobs=1): err= 0: pid=33378: Fri Apr 22 22:29:06 2022
read: IOPS=258k, BW=4035MiB/s (4231MB/s)(100GiB/25375msec)
slat (nsec): min=1268, max=80634, avg=1910.78, stdev=947.79
clat (usec): min=51, max=2289, avg=245.54, stdev=92.47
lat (usec): min=53, max=2291, avg=247.52, stdev=92.47
clat percentiles (usec):
| 1.00th=[ 85], 5.00th=[ 113], 10.00th=[ 133], 20.00th=[ 151],
| 30.00th=[ 172], 40.00th=[ 204], 50.00th=[ 239], 60.00th=[ 289],
| 70.00th=[ 314], 80.00th=[ 338], 90.00th=[ 363], 95.00th=[ 383],
| 99.00th=[ 441], 99.50th=[ 457], 99.90th=[ 498], 99.95th=[ 529],
| 99.99th=[ 709]
bw ( MiB/s): min= 3783, max= 4101, per=99.99%, avg=4034.89, stdev=43.41, samples=50
iops : min=242150, max=262484, avg=258233.02, stdev=2778.43, samples=50
lat (usec) : 100=2.81%, 250=48.78%, 500=48.31%, 750=0.09%, 1000=0.01%
lat (msec) : 2=0.01%, 4=0.01%
cpu : usr=17.61%, sys=59.69%, ctx=1442924, majf=0, minf=274
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued rwts: total=6553600,0,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=64
Run status group 0 (all jobs):
READ: bw=4035MiB/s (4231MB/s), 4035MiB/s-4035MiB/s (4231MB/s-4231MB/s), io=100GiB (107GB), run=25375-25375msec
Disk stats (read/write):
nvme2n1: ios=3245464/0, merge=0/0, ticks=801718/0, in_queue=801719, util=99.67%
nvme3n1: ios=3245472/0, merge=0/0, ticks=762968/0, in_queue=762969, util=99.67%
In the iostat output, the r/s on each device is ~129k, and the avgqu-sz is again split between the two devices (~32 each), summing to about the fio iodepth of 64 rather than being 64 per device.
$ iostat -ktdx 5 | egrep "Device|nvme2n1|nvme3n1"
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
nvme2n1 0.00 0.00 128705.40 0.00 2059286.40 0.00 32.00 31.86 0.25 0.25 0.00 0.01 99.98
nvme3n1 0.00 0.00 128703.00 0.00 2059248.00 0.00 32.00 30.47 0.24 0.24 0.00 0.01 100.00
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
nvme2n1 0.00 0.00 129154.60 0.00 2066473.60 0.00 32.00 31.82 0.25 0.25 0.00 0.01 100.02
nvme3n1 0.00 0.00 129157.80 0.00 2066524.80 0.00 32.00 30.53 0.24 0.24 0.00 0.01 100.00
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
nvme2n1 0.00 0.00 129703.80 0.00 2075260.80 0.00 32.00 31.93 0.25 0.25 0.00 0.01 100.00
nvme3n1 0.00 0.00 129702.40 0.00 2075238.40 0.00 32.00 30.42 0.23 0.23 0.00 0.01 100.00
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
nvme2n1 0.00 0.00 129521.60 0.00 2072345.60 0.00 32.00 32.04 0.25 0.25 0.00 0.01 100.04
nvme3n1 0.00 0.00 129523.60 0.00 2072377.60 0.00 32.00 30.32 0.23 0.23 0.00 0.01 100.02
Using two cloned jobs (numjobs=2), each reading both devices:
$ fio --ioengine=libaio --direct=1 --readwrite=read --blocksize=16k --filesize=50G --end_fsync=1 --iodepth=64 --numjobs=2 --group_reporting --name=job1 --filename=/dev/nvme2n1:/dev/nvme3n1
job1: (g=0): rw=read, bs=(R) 16.0KiB-16.0KiB, (W) 16.0KiB-16.0KiB, (T) 16.0KiB-16.0KiB, ioengine=libaio, iodepth=64
...
fio-3.7
Starting 2 processes
Jobs: 2 (f=4): [R(2)][100.0%][r=5606MiB/s,w=0KiB/s][r=359k,w=0 IOPS][eta 00m:00s]
job1: (groupid=0, jobs=2): err= 0: pid=33809: Fri Apr 22 23:18:37 2022
read: IOPS=358k, BW=5597MiB/s (5869MB/s)(200GiB/36593msec)
slat (nsec): min=1260, max=52904, avg=1967.07, stdev=877.86
clat (usec): min=63, max=9900, avg=355.00, stdev=150.12
lat (usec): min=65, max=9901, avg=357.03, stdev=150.12
clat percentiles (usec):
| 1.00th=[ 165], 5.00th=[ 198], 10.00th=[ 219], 20.00th=[ 245],
| 30.00th=[ 269], 40.00th=[ 297], 50.00th=[ 334], 60.00th=[ 371],
| 70.00th=[ 412], 80.00th=[ 453], 90.00th=[ 510], 95.00th=[ 562],
| 99.00th=[ 685], 99.50th=[ 766], 99.90th=[ 2212], 99.95th=[ 2704],
| 99.99th=[ 3195]
bw ( MiB/s): min= 2725, max= 2811, per=50.00%, avg=2798.56, stdev= 8.75, samples=146
iops : min=174406, max=179932, avg=179107.78, stdev=559.92, samples=146
lat (usec) : 100=0.01%, 250=22.38%, 500=66.31%, 750=10.74%, 1000=0.32%
lat (msec) : 2=0.13%, 4=0.12%, 10=0.01%
cpu : usr=14.17%, sys=46.00%, ctx=5209795, majf=0, minf=550
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued rwts: total=13107200,0,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=64
Run status group 0 (all jobs):
READ: bw=5597MiB/s (5869MB/s), 5597MiB/s-5597MiB/s (5869MB/s-5869MB/s), io=200GiB (215GB), run=36593-36593msec
Disk stats (read/write):
nvme2n1: ios=6516709/0, merge=0/0, ticks=2254035/0, in_queue=2254035, util=99.79%
nvme3n1: ios=6516726/0, merge=0/0, ticks=2345468/0, in_queue=2345468, util=99.82%
In the iostat output, the r/s on each device is ~179k and the total is ~358k, matching the fio IOPS. The avgqu-sz values (~62 and ~65) sum to ~127, close to the combined iodepth of the two jobs (2x64=128); here the reads happen to be roughly balanced across the devices, but as the write case above shows, that balance is not guaranteed.
$ iostat -ktdx 5 | egrep "Device|nvme2n1|nvme3n1"
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
nvme2n1 0.00 0.00 179103.00 0.00 2865648.00 0.00 32.00 61.69 0.34 0.34 0.00 0.01 100.00
nvme3n1 0.00 0.00 179101.20 0.00 2865619.20 0.00 32.00 64.77 0.36 0.36 0.00 0.01 100.04
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
nvme2n1 0.00 0.00 179355.00 0.00 2869680.00 0.00 32.00 61.83 0.34 0.34 0.00 0.01 100.08
nvme3n1 0.00 0.00 179356.40 0.00 2869702.40 0.00 32.00 64.64 0.36 0.36 0.00 0.01 100.08
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
nvme2n1 0.00 0.00 179112.80 0.00 2865804.80 0.00 32.00 61.77 0.34 0.34 0.00 0.01 100.00
nvme3n1 0.00 0.00 179112.40 0.00 2865798.40 0.00 32.00 64.69 0.36 0.36 0.00 0.01 100.04
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
nvme2n1 0.00 0.00 179088.00 0.00 2865408.00 0.00 32.00 61.85 0.35 0.35 0.00 0.01 100.02
nvme3n1 0.00 0.00 179087.20 0.00 2865395.20 0.00 32.00 64.61 0.36 0.36 0.00 0.01 100.04
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
nvme2n1 0.00 0.00 179096.20 0.00 2865539.20 0.00 32.00 62.14 0.35 0.35 0.00 0.01 99.98
nvme3n1 0.00 0.00 179095.00 0.00 2865520.00 0.00 32.00 64.33 0.36 0.36 0.00 0.01 100.00
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
nvme2n1 0.00 0.00 179127.40 0.00 2866038.40 0.00 32.00 62.21 0.35 0.35 0.00 0.01 100.02
nvme3n1 0.00 0.00 179128.80 0.00 2866060.80 0.00 32.00 64.26 0.36 0.36 0.00 0.01 100.00
The ps and lsof output below shows that each cloned fio job opens both devices for reading.
$ ps -ef |grep fio
root 33827 30166 63 23:22 pts/0 00:00:07 fio --ioengine=libaio --direct=1 --readwrite=read --blocksize=16k --filesize=50G --end_fsync=1 --iodepth=64 --numjobs=2 --group_reporting --name=job1 --filename=/dev/nvme2n1:/dev/nvme3n1
root 33925 33827 57 23:22 ? 00:00:06 fio --ioengine=libaio --direct=1 --readwrite=read --blocksize=16k --filesize=50G --end_fsync=1 --iodepth=64 --numjobs=2 --group_reporting --name=job1 --filename=/dev/nvme2n1:/dev/nvme3n1
root 33926 33827 57 23:22 ? 00:00:06 fio --ioengine=libaio --direct=1 --readwrite=read --blocksize=16k --filesize=50G --end_fsync=1 --iodepth=64 --numjobs=2 --group_reporting --name=job1 --filename=/dev/nvme2n1:/dev/nvme3n1
root 33945 30233 0 23:22 pts/1 00:00:00 grep --color=auto fio
$ lsof | grep nvme | grep fio
fio 33925 root 3r BLK 259,0 0t0 33809 /dev/nvme2n1
fio 33925 root 4r BLK 259,11 0t0 33820 /dev/nvme3n1
fio 33925 root 5r BLK 259,11 0t0 33820 /dev/nvme3n1
fio 33926 root 3r BLK 259,0 0t0 33809 /dev/nvme2n1
fio 33926 root 4r BLK 259,0 0t0 33809 /dev/nvme2n1
fio 33926 root 5r BLK 259,11 0t0 33820 /dev/nvme3n1
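Finally, to extend the recommended pattern (one job per device, each with its own iodepth) beyond two devices, the fio command line can be generated with a small shell loop. This is a sketch; the device list is a placeholder to be replaced with your own devices:

#!/bin/bash
# Build one fio job per device so each device gets its own iodepth=64 queue.
devices="/dev/nvme2n1 /dev/nvme3n1"   # placeholder: list the devices to test

args="--ioengine=libaio --direct=1 --readwrite=write --blocksize=16k"
args="$args --filesize=50G --end_fsync=1 --iodepth=64 --group_reporting"
i=1
for dev in $devices; do
    args="$args --name=job$i --filename=$dev"
    i=$((i+1))
done
fio $args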