What is high availability?

High availability refers to a system or component that is operational without interruption for long periods of time.

High availability is measured as a percentage, with 100% indicating a service that experiences zero downtime. A system that never fails is rare among complex systems; most services fall somewhere between 99% and 100% uptime. Most cloud vendors offer some type of Service Level Agreement (SLA) around availability. Amazon, Google, and Microsoft set their cloud SLAs at 99.9%, which the industry generally recognizes as very reliable uptime. A step above, 99.99%, or “four nines,” is considered excellent uptime.

But four nines uptime is still 52 minutes of downtime per year. Consider how many people rely on web tools to run their lives and businesses. A lot can go wrong in 52 minutes.
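
As a quick sanity check on that number, the yearly downtime implied by an availability target can be computed directly. A minimal sketch using awk (any calculator works; 365.25 days per year assumed):

$ awk 'BEGIN { printf "%.1f minutes of downtime per year\n", (1 - 0.9999) * 365.25 * 24 * 60 }'
52.6 minutes of downtime per year

At “five nines” (99.999%), the same arithmetic gives roughly 5.3 minutes per year.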

So what is it that makes four nines so hard? What are the best practices for high availability engineering? And why is 100% uptime so difficult?

Availability and downtime

As shown in the table below, the number of nines (availability percentage) correlates with the amount of system downtime.

Availability %            Downtime per year
99% (“two nines”)         3.65 days
99.9% (“three nines”)     8.77 hours
99.99% (“four nines”)     52.6 minutes
99.999% (“five nines”)    5.26 minutes
99.9999% (“six nines”)    31.6 seconds

Reference

Latency Comparison Numbers (~2012)

L1 cache reference                          0.5 ns
Branch mispredict                             5 ns
L2 cache reference                            7 ns
Mutex lock/unlock                            25 ns
Main memory reference                       100 ns
Compress 1 KB with Zippy                  3,000 ns =   3 µs
Send 1 KB over 1 Gbps network            10,000 ns =  10 µs
Read 4 KB randomly from SSD             150,000 ns = 150 µs
Read 1 MB sequentially from memory      250,000 ns = 250 µs
Round trip within same datacenter       500,000 ns = 500 µs
Read 1 MB sequentially from SSD       1,000,000 ns =   1 ms
Disk seek                            10,000,000 ns =  10 ms
Read 1 MB sequentially from disk     20,000,000 ns =  20 ms
Send packet CA -> Netherlands -> CA 150,000,000 ns = 150 ms

Conclusion

  • Disk is much slower than memory.
  • Avoid disk seeks if possible.
  • Compressing data is worth considering before sending it over the network (see the rough estimate after this list).
  • Sending data between data centers in different regions takes significant time.
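
As a rough illustration of the compression point, take the table’s figures of about 3 µs to compress 1 KB and 10 µs to send 1 KB over a 1 Gbps link, and assume (hypothetically) a 2:1 compression ratio:

$ awk 'BEGIN {
    send_1k_us = 10; compress_1k_us = 3; ratio = 0.5   # assumed 2:1 compression ratio
    printf "send 1 MB uncompressed:  %d us\n", 1000 * send_1k_us
    printf "compress, then send:     %d us\n", 1000 * compress_1k_us + 1000 * send_1k_us * ratio
}'
send 1 MB uncompressed:  10000 us
compress, then send:     8000 us

This ignores decompression and pipelining, but it shows why compressing compressible data usually pays off when the network link is the bottleneck.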

Reference

Data volumes

The volume of data in a single file or file system can be described by a unit called a byte. However, data volumes can become very large when dealing with Earth satellite data. Below is a list explaining data volume units (Credit: Roy Williams, Center for Advanced Computing Research at the California Institute of Technology); a quick command-line conversion example follows the list.

  • Kilo- means 1,000; a Kilobyte is one thousand bytes.
  • Mega- means 1,000,000; a Megabyte is a million bytes.
  • Giga- means 1,000,000,000; a Gigabyte is a billion bytes.
  • Tera- means 1,000,000,000,000; a Terabyte is a trillion bytes.
  • Peta- means 1,000,000,000,000,000; a Petabyte is 1,000 Terabytes.
  • Exa- means 1,000,000,000,000,000,000; an Exabyte is 1,000 Petabytes.
  • Zetta- means 1,000,000,000,000,000,000,000; a Zettabyte is 1,000 Exabytes.
  • Yotta- means 1,000,000,000,000,000,000,000,000; a Yottabyte is 1,000 Zettabytes.
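
If GNU coreutils is available, the numfmt tool can convert between raw byte counts and these decimal (SI) prefixes; a small sketch:

$ numfmt --to=si 1000
1.0K
$ numfmt --to=si 1000000000
1.0G
$ numfmt --from=si 2.5M
2500000

The --to=iec option does the same with powers of 1,024 instead of powers of 1,000.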


Reference


Thin provisioning volume

Logical volumes can be thinly provisioned, which allows a storage administrator to overcommit the physical storage. In other words, it’s possible to create a logical volume that is larger than the available physical extents.

Create a thin provisioning volume

In the following example, we create a 500GiB thin pool and a 100GiB thin volume.

$ vgcreate vg1 /dev/nvme0n1
  Physical volume "/dev/nvme0n1" successfully created.
  Volume group "vg1" successfully created

$ vgs
  VG     #PV #LV #SN Attr   VSize   VFree
  centos   1   3   0 wz--n- 893.05g      0
  vg1      1   0   0 wz--n- 931.51g 931.51g

$ lvcreate -L 500G --thinpool thinpool1 vg1
  Thin pool volume with chunk size 256.00 KiB can address at most 63.25 TiB of data.
  Logical volume "thinpool1" created.

$ lvs
  LV        VG     Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  home      centos -wi-ao---- 839.05g
  root      centos -wi-ao----  50.00g
  swap      centos -wi-ao----   4.00g
  thinpool1 vg1    twi-a-tz-- 500.00g             0.00   10.41

$ lvs -ao name,size,stripesize,chunksize,metadata_percent
  LV                LSize   Stripe Chunk   Meta%
  home              839.05g     0       0
  root               50.00g     0       0
  swap                4.00g     0       0
  [lvol0_pmspare]   128.00m     0       0
  thinpool1         500.00g     0  256.00k 10.41
  [thinpool1_tdata] 500.00g     0       0
  [thinpool1_tmeta] 128.00m     0       
  
$ lvcreate -V 100G --thin -n thinvol1 vg1/thinpool1
  Logical volume "thinvol1" created.

$ lvs
  LV        VG     Attr       LSize   Pool      Origin Data%  Meta%  Move Log Cpy%Sync Convert
  home      centos -wi-ao---- 839.05g
  root      centos -wi-ao----  50.00g
  swap      centos -wi-ao----   4.00g
  thinpool1 vg1    twi-aotz-- 500.00g                  0.00   10.42
  thinvol1  vg1    Vwi-a-tz-- 100.00g thinpool1        0.00

Thin pool volume chunk size

By default, lvm2 starts with a 64KiB chunk size and increases it when the resulting size of the thin pool metadata device grows above 128MiB.

In the previous example, the 500GiB thin pool results in a 256KiB chunk size. In the following example, a 100MiB thin pool results in a 64KiB chunk size.

$ lvcreate  -L 100M --thinpool thinpool2 vg1
  Thin pool volume with chunk size 64.00 KiB can address at most 15.81 TiB of data.
  Logical volume "thinpool2" created.

$ lvs -ao name,size,stripesize,chunksize,metadata_percent
  LV                LSize   Stripe Chunk   Meta%
  home              839.05g     0       0
  root               50.00g     0       0
  swap                4.00g     0       0
  [lvol0_pmspare]   128.00m     0       0
  thinpool1         500.00g     0  256.00k 10.42
  [thinpool1_tdata] 500.00g     0       0
  [thinpool1_tmeta] 128.00m     0       0
  thinpool2         100.00m     0   64.00k 10.84
  [thinpool2_tdata] 100.00m     0       0
  [thinpool2_tmeta]   4.00m     0       0
  thinvol1          100.00g     0       0  

The “-c” option can be used to specify the desired chunk size if needed.

$ lvcreate -c 128k -L 100M --thinpool thinpool3 vg1
  Thin pool volume with chunk size 128.00 KiB can address at most 31.62 TiB of data.
  Logical volume "thinpool3" created.

$ lvs -ao name,size,stripesize,chunksize,metadata_percent
  LV                LSize   Stripe Chunk   Meta%
  home              839.05g     0       0
  root               50.00g     0       0
  swap                4.00g     0       0
  [lvol0_pmspare]   128.00m     0       0
  thinpool1         500.00g     0  256.00k 10.42
  [thinpool1_tdata] 500.00g     0       0
  [thinpool1_tmeta] 128.00m     0       0
  thinpool2         100.00m     0   64.00k 10.84
  [thinpool2_tdata] 100.00m     0       0
  [thinpool2_tmeta]   4.00m     0       0
  thinpool3         100.00m     0  128.00k 10.84
  [thinpool3_tdata] 100.00m     0       0
  [thinpool3_tmeta]   4.00m     0       0
  thinvol1          100.00g     0       0

Use the following criteria when choosing the chunk size (a related example follows the list):

  • A smaller chunk size requires more metadata and hinders performance, but provides better space utilization with snapshots.
  • A bigger chunk size requires less metadata manipulation, but makes the snapshot less space efficient.
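
If a small chunk size is needed, the thin pool metadata volume can also be sized explicitly at creation time with the --poolmetadatasize option. A sketch (the pool name thinpool4 and the sizes are only illustrative):

$ # small 64KiB chunks with an explicitly larger metadata LV
$ lvcreate -L 500G -c 64k --poolmetadatasize 1G --thinpool thinpool4 vg1

Giving the metadata LV extra room up front reduces the risk of the pool running out of metadata space later.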

Normal snapshot volume

An LVM snapshot provides the ability to create a virtual image of a device at a point in time without service interruption.

When an original data block is overwritten after the snapshot is taken, the original data is first copied to the snapshot volume. This introduces copy-on-write overhead every time an original block is overwritten. The state of the original data can later be reconstructed from the snapshot.
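
For completeness, a classic (non-thin) snapshot is created by giving it its own copy-on-write space with -L, and the origin can later be rolled back to the snapshot state with lvconvert --merge. A minimal sketch, assuming a normal logical volume vg1/lv1 exists (the names here are hypothetical):

$ # reserve 10GiB of copy-on-write space for the snapshot
$ lvcreate -s -L 10G -n lv1-snap vg1/lv1
$ # roll the origin back to the snapshot state (the merge is deferred
$ # until the next activation if the origin is currently in use)
$ lvconvert --merge vg1/lv1-snap

The snapshot must be large enough to hold every block that changes on the origin while it exists; if it fills up, it becomes invalid.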

Thinly-provisioned snapshot volume

Unlike a normal snapshot volume, a thin snapshot is all about metadata. When a thin volume is snapshotted, its metadata is copied for the thin snapshot volume to use. When the origin volume is later modified, the new data is written to new blocks and only the origin’s metadata is updated; the snapshot’s metadata still addresses the original data blocks. In other words, overwrites go to new blocks, so the original data blocks remain addressable through the snapshot metadata after the data changes.
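
For reference, a thin snapshot that shares the pool with its origin is created by omitting the size; specifying -L as in the next section creates an old-style copy-on-write snapshot of the thin volume instead. A minimal sketch (the snapshot name is hypothetical):

$ # no -L: the snapshot becomes another thin volume in thinpool1
$ lvcreate -s -n thinvol1-thinsnap vg1/thinvol1
$ # thin snapshots carry the activation-skip flag by default;
$ # activate with -K (or clear the flag with lvchange -kn)
$ lvchange -ay -K vg1/thinvol1-thinsnap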

Create the snapshot volume

$ lvcreate -s -L 100G -n thinvol1-snap /dev/vg1/thinvol1
  Logical volume "thinvol1-snap" created.

$ ls /dev/vg1
thinpool2  thinpool3   thinvol1  thinvol1-snap

$ lvs
  LV            VG     Attr       LSize   Pool      Origin   Data%  Meta%  Move Log Cpy%Sync Convert
  home          centos -wi-ao---- 839.05g
  root          centos -wi-ao----  50.00g
  swap          centos -wi-ao----   4.00g
  thinpool1     vg1    twi-aotz-- 500.00g                    0.00   10.42
  thinpool2     vg1    twi-a-tz-- 100.00m                    0.00   10.84
  thinpool3     vg1    twi-a-tz-- 100.00g                    0.00   10.43
  thinvol1      vg1    owi-a-tz-- 100.00g thinpool1          0.00
  thinvol1-snap vg1    swi-a-s--- 100.00g           thinvol1 0.00

$ lvs -ao name,size,stripesize,chunksize,metadata_percent
  LV                LSize   Stripe Chunk   Meta%
  home              839.05g     0       0
  root               50.00g     0       0
  swap                4.00g     0       0
  [lvol0_pmspare]   128.00m     0       0
  thinpool1         500.00g     0  256.00k 10.42
  [thinpool1_tdata] 500.00g     0       0
  [thinpool1_tmeta] 128.00m     0       0
  thinpool2         100.00m     0   64.00k 10.84
  [thinpool2_tdata] 100.00m     0       0
  [thinpool2_tmeta]   4.00m     0       0
  thinpool3         100.00m     0  128.00k 10.84
  [thinpool3_tdata] 100.00m     0       0
  [thinpool3_tmeta]   4.00m     0       0
  thinvol1          100.00g     0       0
  thinvol1-snap     100.00g     0    4.00k

The chunk size of the snapshot volume can be specified with the “-c” option.

$ lvcreate -s -c 128k -L 100G -n thinvol1-snap2 /dev/vg1/thinvol1
  Logical volume "thinvol1-snap2" created.

$ lvs -ao name,size,stripesize,chunksize,metadata_percent
  LV                LSize   Stripe Chunk   Meta%
  home              839.05g     0       0
  root               50.00g     0       0
  swap                4.00g     0       0
  [lvol0_pmspare]   128.00m     0       0
  thinpool1         500.00g     0  256.00k 10.42
  [thinpool1_tdata] 500.00g     0       0
  [thinpool1_tmeta] 128.00m     0       0
  thinpool2         100.00m     0   64.00k 10.84
  [thinpool2_tdata] 100.00m     0       0
  [thinpool2_tmeta]   4.00m     0       0
  thinpool3         100.00m     0  128.00k 10.84
  [thinpool3_tdata] 100.00m     0       0
  [thinpool3_tmeta]   4.00m     0       0
  thinvol1          100.00g     0       0
  thinvol1-snap     100.00g     0    4.00k
  thinvol1-snap2    100.00g     0  128.00k

Remove the snapshot volume

$ lvremove /dev/vg1/thinvol1-snap
Do you really want to remove active logical volume vg1/thinvol1-snap? [y/n]: y
  Logical volume "thinvol1-snap" successfully removed
$ lvremove /dev/vg1/thinvol1-snap2
Do you really want to remove active logical volume vg1/thinvol1-snap2? [y/n]: y
  Logical volume "thinvol1-snap2" successfully removed

$ lvs
  LV        VG     Attr       LSize   Pool      Origin Data%  Meta%  Move Log Cpy%Sync Convert
  home      centos -wi-ao---- 839.05g
  root      centos -wi-ao----  50.00g
  swap      centos -wi-ao----   4.00g
  thinpool1 vg1    twi-aotz-- 500.00g                  0.00   10.42
  thinpool2 vg1    twi-a-tz-- 100.00m                  0.00   10.84
  thinpool3 vg1    twi-a-tz-- 100.00g                  0.00   10.43
  thinvol1  vg1    Vwi-a-tz-- 100.00g thinpool1        0.00

Remove the volume and pool

$ lvremove /dev/vg1/thinvol1 -f
  Logical volume "thinvol1" successfully removed

$ lvremove /dev/vg1/thinpool1
Do you really want to remove active logical volume vg1/thinpool1? [y/n]: y
  Logical volume "thinpool1" successfully removed  

Reference

On RHEL 8, systemd timers are used instead of cron jobs to manage the SAR data collection service.

Run the following command to check whether SAR data collection is started.

[root@h04-11 ~]# cat /etc/redhat-release
Red Hat Enterprise Linux release 8.5 (Ootpa)

[root@h04-11 ~]# systemctl status sysstat-collect.timer
● sysstat-collect.timer - Run system activity accounting tool every 10 minutes
   Loaded: loaded (/usr/lib/systemd/system/sysstat-collect.timer; enabled; vendor preset: disabled)
   Active: inactive (dead)
  Trigger: n/a

If it’s not started, run the following command to start it.

[root@h04-11 ~]# systemctl start sysstat-collect.timer
[root@h04-11 ~]# systemctl status sysstat-collect.timer
● sysstat-collect.timer - Run system activity accounting tool every 10 minutes
   Loaded: loaded (/usr/lib/systemd/system/sysstat-collect.timer; enabled; vendor preset: disabled)
   Active: active (waiting) since Tue 2022-01-04 19:49:54 UTC; 1s ago
  Trigger: Tue 2022-01-04 19:50:00 UTC; 4s left

Jan 04 19:49:54 h04-11 systemd[1]: Started Run system activity accounting tool every 10 minutes.

Check that the sar file exists after a few minutes, as shown below.

[root@h04-11 ~]# ls -ltr /var/log/sa
total 0
[root@h04-11 ~]# date
Tue Jan  4 19:50:25 UTC 2022
[root@h04-11 ~]# ls -ltr /var/log/sa
total 12
-rw-r--r--. 1 root root 11632 Jan  4 19:50 sa04

Check the default SAR data collection interval as shown below.

[root@h04-11 ~]# systemctl cat sysstat-collect.timer
# /usr/lib/systemd/system/sysstat-collect.timer
# /usr/lib/systemd/system/sysstat-collect.timer
# (C) 2014 Tomasz Torcz <tomek@pipebreaker.pl>
#
# sysstat-11.7.3 systemd unit file:
#        Activates activity collector every 10 minutes

[Unit]
Description=Run system activity accounting tool every 10 minutes

[Timer]
OnCalendar=*:00/10

[Install]
WantedBy=sysstat.service

To change the SAR data collection interval, create a drop-in override for the systemd timer unit as shown below.

[root@h04-11 ~]# export SYSTEMD_EDITOR=/usr/bin/vi
[root@h04-11 ~]# systemctl edit sysstat-collect.timer

Add the following to set the desired interval. In this example, we change it from 10 minutes to 1 minute. The blank “OnCalendar=” directive clears the original setting.

[Unit]
Description=Run system activity accounting tool every 1 minute

[Timer]
OnCalendar=
OnCalendar=*:00/1

Reload the systemd manager configuration to apply the change.

[root@h04-11 ~]# systemctl daemon-reload

[root@h04-11 ~]# systemctl cat sysstat-collect.timer
# /usr/lib/systemd/system/sysstat-collect.timer
# /usr/lib/systemd/system/sysstat-collect.timer
# (C) 2014 Tomasz Torcz <tomek@pipebreaker.pl>
#
# sysstat-11.7.3 systemd unit file:
#        Activates activity collector every 10 minutes

[Unit]
Description=Run system activity accounting tool every 10 minutes

[Timer]
OnCalendar=*:00/10

[Install]
WantedBy=sysstat.service

# /etc/systemd/system/sysstat-collect.timer.d/override.conf
[Unit]
Description=Run system activity accounting tool every 1 minute

[Timer]
OnCalendar=
OnCalendar=*:00/1

[root@h04-11 ~]# cat /etc/systemd/system/sysstat-collect.timer.d/override.conf
[Unit]
Description=Run system activity accounting tool every 1 minute

[Timer]
OnCalendar=
OnCalendar=*:00/1

[root@h04-11 ~]# systemctl status sysstat-collect.timer
● sysstat-collect.timer - Run system activity accounting tool every 1 minute
   Loaded: loaded (/usr/lib/systemd/system/sysstat-collect.timer; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/sysstat-collect.timer.d
           └─override.conf
   Active: active (running) since Tue 2022-01-04 19:49:54 UTC; 27min ago
  Trigger: n/a

Jan 04 19:49:54 h04-11 systemd[1]: Started Run system activity accounting tool every 10 minutes.

To verify that the SAR data collection interval was modified successfully, do the following.

[root@h04-11 ~]# sar -u -f /var/log/sa/sa04
Linux 4.18.0-348.2.1.el8_5.x86_64 (h04-11) 	01/04/2022 	_x86_64_	(32 CPU)

07:50:18 PM     CPU     %user     %nice   %system   %iowait    %steal     %idle
08:00:18 PM     all      0.11      0.00      0.13      0.00      0.00     99.76
08:10:08 PM     all      0.11      0.00      0.13      0.00      0.00     99.75
08:14:08 PM     all      0.12      0.00      0.13      0.00      0.00     99.74
08:15:35 PM     all      0.12      0.00      0.13      0.01      0.00     99.73
08:16:18 PM     all      0.12      0.00      0.14      0.02      0.00     99.72
08:17:07 PM     all      0.10      0.00      0.13      0.00      0.00     99.77
08:18:18 PM     all      0.12      0.00      0.13      0.00      0.00     99.75
Average:        all      0.11      0.00      0.13      0.00      0.00     99.75

Reference

Install the leetcode extension in VS Code

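If you prefer the command line, the extension can also be installed with the code CLI; a sketch, assuming the marketplace ID is LeetCode.vscode-leetcode:

$ code --install-extension LeetCode.vscode-leetcode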

  1. Log in to leetcode from Google Chrome.
  2. In Chrome, open Inspect -> Network -> Fetch/XHR.
  3. Click any button on the leetcode page. In the inspector, under the “Name” column, find and select the bottom “graphql” entry. Under the Headers tab, in the “Request Headers” section, select and copy the entire cookie string, starting from “__cfduid” and ending with “_gat=1”.
  4. Paste the cookie string into the VS Code leetcode login prompt.


Enjoy coding in VS Code


Reference

Issue Description

$ sudo fio --blocksize=64k --directory=/mnt/bench1 --filename=testfile --ioengine=libaio --readwrite=randread --size=10G --name=test --numjobs=512 --group_reporting --direct=1 --iodepth=128 --randrepeat=1 --disable_lat=0 --gtod_reduce=0

test: (g=0): rw=randread, bs=(R) 64.0KiB-64.0KiB, (W) 64.0KiB-64.0KiB, (T) 64.0KiB-64.0KiB, ioengine=libaio, iodepth=128
...
fio-3.7
Starting 512 processes
fio: pid=38868, err=11/file:engines/libaio.c:354, func=io_queue_init, error=Resource temporarily unavailable
...
fio: check /proc/sys/fs/aio-max-nr
fio: io engine libaio init failed. Perhaps try reducing io depth?

Resolution

The Linux kernel provides an asynchronous non-blocking I/O (AIO) feature that allows a process to initiate multiple I/O operations without waiting for any of them to complete. This helps boost performance for applications that can overlap processing and I/O.

The limit can be tuned using the /proc/sys/fs/aio-max-nr virtual file in the proc file system. The aio-max-nr parameter determines the maximum number of allowable concurrent AIO requests. In the fio run above, 512 jobs each requesting an I/O depth of 128 need 512 × 128 = 65,536 AIO slots, which runs into the default limit of 65536 shown below.
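
To see how close the system is to the limit before raising it, compare the number of currently allocated AIO requests against the maximum:

$ # AIO requests currently allocated, system-wide
$ cat /proc/sys/fs/aio-nr
$ # system-wide limit
$ cat /proc/sys/fs/aio-max-nr

If fs.aio-nr is close to fs.aio-max-nr, new io_setup() calls such as fio’s io_queue_init will fail with EAGAIN (“Resource temporarily unavailable”), as seen above.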

To set the aio-max-nr value, add the following line to the /etc/sysctl.d/99-sysctl.conf file:

$  cat  /proc/sys/fs/aio-max-nr
65536
$ echo "fs.aio-max-nr = 1048576" >> /etc/sysctl.d/99-sysctl.conf

To activate the new setting, run the following command:

$ sysctl -p /etc/sysctl.d/99-sysctl.conf
fs.aio-max-nr = 1048576

Reference
