blktrace - A block layer IO tracing utility

What can blktrace do?

Let’s dive into the block device layer and see how the I/O are handled in disk queues.

The following stack shows the I/O paths including the block device layer. The application can issue I/O requests directly to block device or through file systems. In the following sections, let’s dig into the block device layer with blktrace in order to understand the I/O pattern and disk queue activities.

        Applications
        |      |   |  
        V      |   | 
  File systems |   |
        |      |   |
        V      |   |
 Page Cache <--|   | 
        |          | 
        V          V
Block I/O Layer: Request Queues
        |
        V 
  SCSI Drivers
        |
        V
 Physical Devices               

Don’t forget iostat

iostat is always the first place to understand the I/O characteristics before we turn to other advanced utilities like blktrace.

It provides the following information for the disk IOs.

  • Number of read/write merges per second
  • Number of reads/writes per second
  • Average I/O request size(in sectors)
  • Average request queue size
  • Average I/O wait time

If any of the above metrics indicates disk I/O performance concerns and it’s not sufficient to help us explain the performance issue, we can turn to blktrace or other tracing utilities for more insights.

blktrace and blkparse

To trace the target block device:

$ blktrace -d /dev/<sd-device-name> -D <trace-raw-data-save-dir> -w <trace-time-in-seconds>

To parse the blktrace data:

$ blkparse -i <sd-device-name> -D <trace-raw-data-save-dir> -o blkparse.<sd-device-name>.out -d blktrace.bin

blkparse output snippet:

  8,0    7        3     0.992335623  4180  A  WS 680911952 + 8 <- (8,2) 679885904
  8,0    7        4     0.992336407  4180  Q  WS 680911952 + 8 [jbd2/dm-7-8]
  8,0    7        5     0.992338784  4180  G  WS 680911952 + 8 [jbd2/dm-7-8]
  8,0    7        6     0.992339977  4180  I  WS 680911952 + 8 [jbd2/dm-7-8]
  8,0    7        7     0.992341444  4180  D  WS 680911952 + 8 [jbd2/dm-7-8]
  8,0   56        1     0.992499505     0  C  WS 680911952 + 8 [0]
  8,0   47        7     0.991930131  4180  A  WS 680911920 + 8 <- (8,2) 679885872
  8,0   47        8     0.991930522  4180  Q  WS 680911920 + 8 [jbd2/dm-7-8]
  8,0   47        9     0.991932697  4180  M  WS 680911920 + 8 [jbd2/dm-7-8]

The columns are Dev major,minor, CPU id, Sequence number, Timestamp, PID, Event, Operation, Start block + number of blocks(offset), Process name.

In the above example, The first IO starts at block 680911952 with the offset of 8 blocks. It is handled in the following sequence.

  • Remapped to a different device(8,2)
  • Handled by the request queue code
  • Get the request
  • Inserted to the request queue
  • Dispatch to device driver
  • Completion

The second IO starts at block 680911920 with the offset of 8 blocks. It’s handled in the following sequence.

  • Remapped to a different device(8,2)
  • Handled by the request queue code
  • Back merged with request on queue

blkparse output also includes a summary to explain the number of I/Os in each queuing phase. In the following example, there are 86 writes handled by request queue. 19 out of 86 writes are merged with request on queue. Thus, only 67 writes are issued to device to complete the front end requests.

Total (sda):
 Reads Queued:           0,        0KiB  Writes Queued:          86,      628KiB
 Read Dispatches:        0,        0KiB  Write Dispatches:       67,      628KiB
 Reads Requeued:         0               Writes Requeued:         0
 Reads Completed:        0,        0KiB  Writes Completed:       67,      628KiB
 Read Merges:            0,        0KiB  Write Merges:           19,       88KiB
 IO unplugs:            21               Timer unplugs:           0

btt

btt is a post-processing tool for blktrace. blktrace is capable of producing tremendous amounts of output in the form of multiple individual traces per IO executed during the traced run. It is also capable of producing some general statistics concerning IO rates and the like. btt goes further and produces a variety of overall statistics about each of the individual handling of IOs, and provides data we believe is useful to plot to provide visual comparisons for evaluation.

btt processes the binary file produced by blkparse.The major areas of output measured by btt include:

  • Q2Q : Queue-to-Queue time
  • Q2G : Queue-to-GetRequest time
  • S2G : Sleep-to-GetRequest time
  • G2I : GetRequest-to-Insert time
  • Q2M : Queue-to-Merge time
  • I2D : Insert-to-Issue time
  • M2D : Merge-to-Issue time
  • D2C : Issue-to-Complete time
  • Q2C : Queue-to-Complete time

For the D2C, it includes the driver and device time. It’s the average time from when the actual IO was issued to the driver until is completed (completion trace) back to the block IO layer. The D2C time should be greater than the actual physcial disk I/O latency which is usually measured in disk(array) side.

In the following exampl, 98.9265% of time is spent on D2C which is expected. The average IO time serviced by the disk is 1.65ms. The max IO time is 5.10ms. We may measure the I2D metric for different I/O scheduler, like noop in SSD case.

$ btt -i blktrace.bin -B offset -o btt.out

$ ls btt.*.out*
btt.sda.30s.out.avg  btt.sda.30s.out.dat  btt.sda.30s.out_dhist.dat  btt.sda.30s.out.msg  btt.sda.30s.out_qhist.dat

$ cat btt.out.avg
==================== All Devices ====================

ALL               MIN           AVG           MAX                  N
--------------- ------------- ------------- ------------- -----------

Q2Q               0.000001053   0.304967428   5.071043004          85
Q2G               0.000000338   0.000001928   0.000008143          67
G2I               0.000000161   0.000008231   0.000126276          67
Q2M               0.000000178   0.000000664   0.000002175          19
I2D               0.000000223   0.000003979   0.000024517          67
M2D               0.000002771   0.000030058   0.000116136          19
D2C               0.000089761   0.001640496   0.005096214          86
Q2C               0.000093453   0.001658297   0.005098018          86

==================== Device Overhead ====================

       DEV |       Q2G       G2I       Q2M       I2D       D2C
---------- | --------- --------- --------- --------- ---------
 (  8,  0) |   0.0906%   0.3867%   0.0088%   0.1869%  98.9265%
---------- | --------- --------- --------- --------- ---------
   Overall |   0.0906%   0.3867%   0.0088%   0.1869%  98.9265%
[..]