Terminology

  • Master and Slave: Vdbench runs as two or more Java Virtual Machines (JVMs). The JVM that you start is the master. The master parses all the parameters, determines which workloads should run, and does all the reporting. The actual workload is executed by one or more Slaves. A Slave can run on the host where the Master was started, or on any remote host defined in the parameter file.

  • Raw I/O workload parameters describe the storage configuration to be used and the workload to be generated. The parameters include General, Host Definition (HD), Replay Group (RG), Storage Definition (SD), Workload Definition (WD) and Run Definition (RD) and must always be entered in the order in which they are listed here. A Run is the execution of one workload requested by a Run Definition. Multiple Runs can be requested within one Run Definition.

  • File system Workload parameters describe the file system configuration to be used and the workload to be generated. The parameters include General, Host Definition (HD), File System Definition (FSD), File system Workload Definition (FWD) and Run Definition (RD), and must always be entered in the order in which they are listed here (a minimal example follows this list). A Run is the execution of one workload requested by a Run Definition. Multiple Runs can be requested within one Run Definition.
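
For example, a minimal file system parameter file following that order could look like the sketch below. The anchor directory, file counts, sizes and thread count are placeholders for illustration only, not a tested workload.

fsd=fsd1,anchor=/mnt/fstest,depth=2,width=2,files=100,size=1m
fwd=fwd1,fsd=fsd1,operation=read,xfersize=4k,fileio=random,fileselect=random,threads=2
rd=rd1,fwd=fwd1,fwdrate=max,format=yes,elapsed=60,interval=10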

Install java

Java is required by Vdbench on both the master and slave hosts.

$ apt install default-jre

$ java -version
openjdk version "11.0.11" 2021-04-20
OpenJDK Runtime Environment (build 11.0.11+9-Ubuntu-0ubuntu2.18.04)
OpenJDK 64-Bit Server VM (build 11.0.11+9-Ubuntu-0ubuntu2.18.04, mixed mode, sharing)

Install vdbench

Vdbench is packaged as a zip file. Unzip the file and you’re ready to go.

$ ls -la vdbench50407.zip
-rw-r--r--. 1 root root 3073219 Aug 26 21:23 vdbench50407.zip

$ unzip vdbench50407.zip

$ file vdbench
vdbench: Bourne-Again shell script, ASCII text executable

Vdbench job file

The following is an example job file to run random read I/O for a minute on the raw block device of the remote host.

$ cat jobfile/dryrun.job
hd=default,vdbench=/home/tester/vdbench,shell=ssh,user=root
hd=host1,system=<slave-host-ip>
sd=sd1,host=host1,lun=/dev/sdd,hitarea=10m,openflag=o_direct,size=20000m
wd=wd_random_rd1,sd=sd1,seekpct=100

# 4KB random read
rd=rd_4KB_randread,wd=wd_random_rd1,iorate=max,rdpct=100,xfersize=4K,elapsed=60,interval=10,th=1

Benchmark run and result

You can run the benchmark job with the following command.

$ ./vdbench -f jobfile/dryrun.job
Copyright (c) 2000, 2018, Oracle and/or its affiliates. All rights reserved.
Vdbench distribution: vdbench50407 Tue June 05  9:49:29 MDT 2018
For documentation, see 'vdbench.pdf'.

22:06:52.177 input argument scanned: '-fjobfile/dryrun.job'
22:06:52.353 Starting slave: ssh <slave-host-ip> -l root /home/tester/vdbench/vdbench SlaveJvm -m <master-host-ip> -n <slave-host-ip>-10-210826-22.06.52.140 -l host1-0 -p 5570
22:06:53.123 Clock synchronization warning: slave host1-0 is 41 seconds out of sync. This can lead to heartbeat issues.
22:06:53.140 All slaves are now connected
22:06:54.002 Starting RD=rd_4KB_randread; I/O rate: Uncontrolled MAX; elapsed=60; For loops: rdpct=100 xfersize=4k threads=1

Aug 26, 2021    interval        i/o   MB/sec   bytes   read     resp     read    write     read    write     resp  queue  cpu%  cpu%
                               rate  1024**2     i/o    pct     time     resp     resp      max      max   stddev  depth sys+u   sys
22:07:04.055           1     5563.1    21.73    4096 100.00    0.157    0.157    0.000    10.89     0.00    0.342    0.9  16.5   7.6
22:07:14.011           2     7022.2    27.43    4096 100.00    0.127    0.127    0.000     7.54     0.00    0.108    0.9  10.7   7.5
22:07:24.009           3     7018.5    27.42    4096 100.00    0.128    0.128    0.000     9.67     0.00    0.087    0.9  10.9   8.2
22:07:34.009           4     7026.8    27.45    4096 100.00    0.127    0.127    0.000     6.99     0.00    0.105    0.9  10.7   7.9
22:07:44.008           5     7264.4    28.38    4096 100.00    0.123    0.123    0.000     7.75     0.00    0.082    0.9  10.7   7.7
22:07:54.014           6     7311.4    28.56    4096 100.00    0.122    0.122    0.000     7.21     0.00    0.076    0.9  10.5   7.4
22:07:54.024     avg_2-6     7128.7    27.85    4096 100.00    0.126    0.126    0.000     9.67     0.00    0.092    0.9  10.7   7.7
22:07:54.445 Vdbench execution completed successfully. Output directory: /data/vdbench_test/output

The results are saved in the output directory by default. You can also check the result summary by opening “summary.html” in a browser.

$ ls output/
config.html    flatfile.html   host1-0.html         host1.html               logfile.html   parmscan.html       sd1.html   status.html   swat_mon_total.txt  totals.html
errorlog.html  histogram.html  host1-0.stdout.html  host1.var_adm_msgs.html  parmfile.html  sd1.histogram.html  skew.html  summary.html  swat_mon.txt

$ cat output/summary.html
<title>Vdbench output/summary.html</title><pre>
Copyright (c) 2000, 2018, Oracle and/or its affiliates. All rights reserved.
Vdbench summary report, created 22:06:52 Aug 26 2021 UTC (22:06:52 Aug 26 2021 UTC)

Link to logfile:                 <A HREF="logfile.html">logfile</A>
Run totals:                      <A HREF="totals.html">totals</A>
Vdbench status:                  <A HREF="status.html">status</A>
Copy of input parameter files:   <A HREF="parmfile.html">parmfile</A>
Copy of parameter scan detail:   <A HREF="parmscan.html">parmscan</A>
Link to errorlog:                <A HREF="errorlog.html">errorlog</A>
Link to flatfile:                <A HREF="flatfile.html">flatfile</A>

Link to HOST reports:            <A HREF="host1.html">host1</A>
Link to response time histogram: <A HREF="histogram.html">histogram</A>
Link to workload skew report:    <A HREF="skew.html">skew</A>
Link to SD reports:              <A HREF="sd1.html">sd1</A>

Link to Run Definitions:         <A HREF="#_463345942">rd_4KB_randread For loops: rdpct=100 xfersize=4k threads=1</A>

Link to config output:           <A HREF="config.html">config</A>

<a name="_463345942"></a><i><b>22:06:54.002 Starting RD=rd_4KB_randread; I/O rate: Uncontrolled MAX; elapsed=60; For loops: rdpct=100 xfersize=4k threads=1</b></i>


Aug 26, 2021    interval        i/o   MB/sec   bytes   read     resp     read    write     read    write     resp  queue  cpu%  cpu%
                               rate  1024**2     i/o    pct     time     resp     resp      max      max   stddev  depth sys+u   sys
22:07:04.052           1     5563.1    21.73    4096 100.00    0.157    0.157    0.000    10.89     0.00    0.342    0.9  16.5   7.6
22:07:14.010           2     7022.2    27.43    4096 100.00    0.127    0.127    0.000     7.54     0.00    0.108    0.9  10.7   7.5
22:07:24.008           3     7018.5    27.42    4096 100.00    0.128    0.128    0.000     9.67     0.00    0.087    0.9  10.9   8.2
22:07:34.008           4     7026.8    27.45    4096 100.00    0.127    0.127    0.000     6.99     0.00    0.105    0.9  10.7   7.9
22:07:44.008           5     7264.4    28.38    4096 100.00    0.123    0.123    0.000     7.75     0.00    0.082    0.9  10.7   7.7
22:07:54.014           6     7311.4    28.56    4096 100.00    0.122    0.122    0.000     7.21     0.00    0.076    0.9  10.5   7.4
22:07:54.023     avg_2-6     7128.7    27.85    4096 100.00    0.126    0.126    0.000     9.67     0.00    0.092    0.9  10.7   7.7
22:07:54.445 Vdbench execution completed successfully

Reference

In this article, we learn how to deploy a Ceph cluster on Ubuntu 18.04. Three nodes are used for this study.

We target the most recent Ceph release, Pacific. With this release, we can use cephadm to create a Ceph cluster by bootstrapping on a single host and then expanding the cluster to additional hosts.

Intro to ceph

Whether you want to provide Ceph Object Storage and/or Ceph Block Device services to Cloud Platforms, deploy a Ceph Filesystem or use Ceph for another purpose, all Ceph Storage Cluster deployments begin with setting up each Ceph Node, your network, and the Ceph Storage Cluster. A Ceph Storage Cluster requires at least one Ceph Monitor, Ceph Manager, and Ceph OSD (Object Storage Daemon). The Ceph Metadata Server is also required when running Ceph Filesystem clients.

  • Monitors: A Ceph Monitor (ceph-mon) maintains maps of the cluster state, including the monitor map, manager map, the OSD map, and the CRUSH map. These maps are critical cluster state required for Ceph daemons to coordinate with each other. Monitors are also responsible for managing authentication between daemons and clients. At least three monitors are normally required for redundancy and high availability.

  • Managers: A Ceph Manager daemon (ceph-mgr) is responsible for keeping track of runtime metrics and the current state of the Ceph cluster, including storage utilization, current performance metrics, and system load. The Ceph Manager daemons also host python-based plugins to manage and expose Ceph cluster information, including a web-based dashboard and REST API. At least two managers are normally required for high availability.

  • Ceph OSDs: A Ceph OSD (object storage daemon, ceph-osd) stores data, handles data replication, recovery, rebalancing, and provides some monitoring information to Ceph Monitors and Managers by checking other Ceph OSD Daemons for a heartbeat. At least 3 Ceph OSDs are normally required for redundancy and high availability.

  • MDSs: A Ceph Metadata Server (MDS, ceph-mds) stores metadata on behalf of the Ceph Filesystem (i.e., Ceph Block Devices and Ceph Object Storage do not use MDS). Ceph Metadata Servers allow POSIX file system users to execute basic commands (like ls, find, etc.) without placing an enormous burden on the Ceph Storage Cluster.

Ceph stores data as objects within logical storage pools. Using the CRUSH algorithm, Ceph calculates which placement group should contain the object, and further calculates which Ceph OSD Daemon should store that placement group. The CRUSH algorithm enables the Ceph Storage Cluster to scale, rebalance, and recover dynamically.
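
Once the cluster is running, you can see this mapping for yourself: the ceph osd map command prints the placement group and the set of OSDs that CRUSH selects for a given pool and object name. The pool and object names below are only examples.

root@host1:~# ceph osd map datapool testobj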

Deploy a ceph storage cluster

Prepare Ubuntu Linux and packages

From the Ceph installation guide, the following system requirements must be met before deployment.

  • Python 3

  • Systemd

  • Podman or Docker for running containers

  • Time synchronization (such as chrony or NTP)

  • LVM2 for provisioning storage devices

    root@host1:~# cat /etc/*release
    DISTRIB_ID=Ubuntu
    DISTRIB_RELEASE=16.04

Upgrade to Ubuntu 18.04

root@host1:~# apt install update-manager-core

root@host1:~# do-release-upgrade -c
Checking for a new Ubuntu release
New release '18.04.5 LTS' available.
Run 'do-release-upgrade' to upgrade to it.

root@host1:~# do-release-upgrade

root@host1:~# cat /etc/*release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=18.04

Install python3

root@host1:~# apt-get install python3

Install docker

Refer to the official Docker installation guide.

Install ntp

root@host1:~# apt-get install ntp
root@host1:~# service ntp start
root@host1:~# timedatectl set-timezone UTC

Install lvm2

root@host1:~# apt-get install lvm2

Check and disable firewall status

root@host1:~# ufw status
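
If the firewall is reported as active, it can be disabled (or, alternatively, the ports required by Ceph can be opened). A minimal sketch:

root@host1:~# ufw disable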

Add cluster nodes to /etc/hosts
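
For example, each node’s /etc/hosts could contain entries like the following, where the IP addresses are placeholders for your environment:

<host1-ip>  host1
<host2-ip>  host2
<host3-ip>  host3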

Configure passwordless ssh from primary host to the others
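
One way to do this is to generate an SSH key on host1 and copy it to the other nodes. A minimal sketch, assuming root login is permitted on host2 and host3:

root@host1:~# ssh-keygen -t rsa
root@host1:~# ssh-copy-id root@host2
root@host1:~# ssh-copy-id root@host3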

Install cephadm

The cephadm command can

  • bootstrap a new cluster

  • launch a containerized shell with a working Ceph CLI

  • aid in debugging containerized Ceph daemons

    root@host1:~# curl --silent --remote-name --location https://github.com/ceph/ceph/raw/pacific/src/cephadm/cephadm
    root@host1:~# ls
    cephadm
    root@host1:~# chmod +x cephadm

    root@host1:~# ./cephadm add-repo --release pacific
    root@host1:~# ./cephadm install
    root@host1:~# which cephadm
    /usr/sbin/cephadm

Bootstrap a new cluster

The first step in creating a new Ceph cluster is running the cephadm bootstrap command on the Ceph cluster’s first host. Running this command creates the cluster’s first monitor daemon, and that monitor daemon needs an IP address, so you must pass the IP address of the first host to cephadm bootstrap.

root@host1:~# cephadm bootstrap --mon-ip <host1-ip> --allow-fqdn-hostname
Ceph Dashboard is now available at:

         URL: https://host1:8443/
        User: admin
    Password: btauef87vj

Enabling client.admin keyring and conf on hosts with "admin" label
You can access the Ceph CLI with:

    sudo /usr/sbin/cephadm shell --fsid ad30a6fc-068f-11ec-8323-000c29bf98ea -c /etc/ceph/ceph.conf -k /etc/ceph/ceph.client.admin.keyring

Please consider enabling telemetry to help improve Ceph:

    ceph telemetry on

For more information see:

    https://docs.ceph.com/docs/pacific/mgr/telemetry/

Bootstrap complete.

root@host1:~# docker ps
CONTAINER ID   IMAGE                        COMMAND                  CREATED         STATUS         PORTS     NAMES
a946ae868dbc   prom/alertmanager:v0.20.0    "/bin/alertmanager -…"   6 minutes ago   Up 6 minutes             ceph-ad30a6fc-068f-11ec-8323-000c29bf98ea-alertmanager.host1
504d9271b24c   ceph/ceph-grafana:6.7.4      "/bin/sh -c 'grafana…"   6 minutes ago   Up 6 minutes             ceph-ad30a6fc-068f-11ec-8323-000c29bf98ea-grafana.host1
622a5e234406   prom/prometheus:v2.18.1      "/bin/prometheus --c…"   6 minutes ago   Up 6 minutes             ceph-ad30a6fc-068f-11ec-8323-000c29bf98ea-prometheus.host1
6c2b0440d4c1   prom/node-exporter:v0.18.1   "/bin/node_exporter …"   6 minutes ago   Up 6 minutes             ceph-ad30a6fc-068f-11ec-8323-000c29bf98ea-node-exporter.host1
8bc618e9ffa3   ceph/ceph                    "/usr/bin/ceph-crash…"   6 minutes ago   Up 6 minutes             ceph-ad30a6fc-068f-11ec-8323-000c29bf98ea-crash.host1
b57a021238ba   ceph/ceph:v16                "/usr/bin/ceph-mgr -…"   7 minutes ago   Up 7 minutes             ceph-ad30a6fc-068f-11ec-8323-000c29bf98ea-mgr.host1.ltfphc
e812853ef17d   ceph/ceph:v16                "/usr/bin/ceph-mon -…"   7 minutes ago   Up 7 minutes             ceph-ad30a6fc-068f-11ec-8323-000c29bf98ea-mon.host1

Enable ceph CLI

To execute Ceph commands without installing any packages on the host, you can run them through the cephadm shell like this:

root@host1:~# cephadm shell -- ceph -s
Inferring fsid ad30a6fc-068f-11ec-8323-000c29bf98ea
Inferring config /var/lib/ceph/ad30a6fc-068f-11ec-8323-000c29bf98ea/mon.host1/config
Using recent ceph image ceph/ceph@sha256:829ebf54704f2d827de00913b171e5da741aad9b53c1f35ad59251524790eceb
  cluster:
    id:     ad30a6fc-068f-11ec-8323-000c29bf98ea
    health: HEALTH_WARN
            OSD count 0 < osd_pool_default_size 3

  services:
    mon: 1 daemons, quorum host1 (age 9m)
    mgr: host1.ltfphc(active, since 10m)
    osd: 0 osds: 0 up, 0 in

  data:
    pools:   0 pools, 0 pgs
    objects: 0 objects, 0 B
    usage:   0 B used, 0 B / 0 B avail
    pgs:
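
If you prefer not to install any Ceph packages on the host, a shell alias makes the containerized CLI less verbose. This is only a convenience sketch, not something cephadm sets up for you:

root@host1:~# alias ceph='cephadm shell -- ceph'
root@host1:~# ceph -s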

Cephadm does not require any Ceph packages to be installed on the host. However, enabling easy access to the ceph command is recommended.

You can install the ceph-common package, which contains all of the ceph commands, including ceph, rbd, mount.ceph (for mounting CephFS file systems), etc.:

root@host1:~# cephadm add-repo --release pacific
Installing repo GPG key from https://download.ceph.com/keys/release.gpg...
Installing repo file at /etc/apt/sources.list.d/ceph.list...
Updating package list...
Completed adding repo.
root@host1:~# cephadm install ceph-common
Installing packages ['ceph-common']...

root@host1:~# ceph -v
ceph version 16.2.5 (0883bdea7337b95e4b611c768c0279868462204a) pacific (stable)

root@host1:~# ceph status
  cluster:
    id:     ad30a6fc-068f-11ec-8323-000c29bf98ea
    health: HEALTH_WARN
            OSD count 0 < osd_pool_default_size 3

  services:
    mon: 1 daemons, quorum host1 (age 11m)
    mgr: host1.ltfphc(active, since 12m)
    osd: 0 osds: 0 up, 0 in

  data:
    pools:   0 pools, 0 pgs
    objects: 0 objects, 0 B
    usage:   0 B used, 0 B / 0 B avail
    pgs:

Adding additional hosts to the cluster

To add each new host to the cluster, perform two steps:

  1. Install the cluster’s public SSH key in the new host’s root user’s authorized_keys file:

    root@host1:~# ssh-copy-id -f -i /etc/ceph/ceph.pub root@host2
    root@host1:~# ssh-copy-id -f -i /etc/ceph/ceph.pub root@host3

  2. Tell Ceph that the new node is part of the cluster:

    root@host1:~# ceph orch host add host2 --labels _admin
    root@host1:~# ceph orch host add host3 --labels _admin

Wait for a while until the monitors detect the new hosts. Verify the newly added hosts as shown below.

root@host1:~# cat /etc/ceph/ceph.conf
# minimal ceph.conf for ad30a6fc-068f-11ec-8323-000c29bf98ea
[global]
    fsid = ad30a6fc-068f-11ec-8323-000c29bf98ea
    mon_host = [v2:<host2-ip>:3300/0,v1:<host2-ip>:6789/0] [v2:<host3-ip>:3300/0,v1:<host3-ip>:6789/0] [v2:<host1-ip>:3300/0,v1:<host1-ip>:6789/0]

root@host1:~# ceph status
  cluster:
    id:     ad30a6fc-068f-11ec-8323-000c29bf98ea
    health: HEALTH_WARN
            clock skew detected on mon.host2, mon.host3
            OSD count 0 < osd_pool_default_size 3

  services:
    mon: 3 daemons, quorum host1,host2,host3 (age 115s)
    mgr: host1.ltfphc(active, since 30m), standbys: host2.dqlsnk
    osd: 0 osds: 0 up, 0 in

  data:
    pools:   0 pools, 0 pgs
    objects: 0 objects, 0 B
    usage:   0 B used, 0 B / 0 B avail
    pgs:

Adding storage

To add storage to the cluster, either tell Ceph to consume any available and unused device:

ceph orch apply osd --all-available-devices

or deploy OSDs on specific storage devices, as described in the following sections.

Listing storage devices

In order to deploy an OSD, there must be a storage device that is available on which the OSD will be deployed.

Run this command to display an inventory of storage devices on all cluster hosts:

root@host1:~# ceph orch device ls
Hostname  Path      Type  Serial  Size   Health   Ident  Fault  Available
host2     /dev/sdb  hdd           85.8G  Unknown  N/A    N/A    Yes
host3     /dev/sdb  hdd           85.8G  Unknown  N/A    N/A    Yes
host1     /dev/sdb  hdd           85.8G  Unknown  N/A    N/A    Yes

A storage device is considered available if all of the following conditions are met:

  • The device must have no partitions.
  • The device must not have any LVM state.
  • The device must not be mounted.
  • The device must not contain a file system.
  • The device must not contain a Ceph BlueStore OSD.
  • The device must be larger than 5 GB.

Ceph will not provision an OSD on a device that is not available.
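
If a device is not listed as available because it has leftover partitions or LVM metadata from a previous installation, the orchestrator can wipe it. A minimal sketch, where the host name and device path are placeholders and any data on the device is destroyed:

root@host1:~# ceph orch device zap host2 /dev/sdb --force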

Creating new OSDs

There are a few ways to create new OSDs:

  • Tell Ceph to consume any available and unused storage device:

    ceph orch apply osd --all-available-devices

After running the above command:

  • If you add new disks to the cluster, they will automatically be used to create new OSDs.
  • If you remove an OSD and clean the LVM physical volume, a new OSD will be created automatically.

If you want to avoid this behavior (disable automatic creation of OSD on available devices), use the unmanaged parameter:

ceph orch apply osd --all-available-devices --unmanaged=true

  • Create an OSD from a specific device on a specific host:

    ceph orch daemon add osd <host>:<device-path>

For example:

ceph orch daemon add osd host1:/dev/sdb 

In our case, we use the following commands to create OSDs on the three nodes. We only need to run the commands from host1.

root@host1:~#   ceph orch daemon add osd host1:/dev/sdb
Created osd(s) 0 on host 'host1'
root@host1:~# ceph orch daemon add osd host2:/dev/sdb
Created osd(s) 1 on host 'host2'
root@host1:~# ceph orch daemon add osd host3:/dev/sdb
Created osd(s) 2 on host 'host3'

root@host1:~# ceph status
  cluster:
    id:     ad30a6fc-068f-11ec-8323-000c29bf98ea
    health: HEALTH_WARN
            clock skew detected on mon.host2, mon.host3
            59 slow ops, oldest one blocked for 130 sec, mon.host2 has slow ops

  services:
    mon: 3 daemons, quorum host1,host2,host3 (age 2m)
    mgr: host1.ltfphc(active, since 102s), standbys: host2.dqlsnk
    osd: 3 osds: 3 up (since 7m), 3 in (since 7m)

  data:
    pools:   1 pools, 1 pgs
    objects: 0 objects, 0 B
    usage:   15 MiB used, 240 GiB / 240 GiB avail
    pgs:     1 active+clean

Dry Run

The --dry-run flag causes the orchestrator to present a preview of what will happen without actually creating the OSDs.

For example:

ceph orch apply osd --all-available-devices --dry-run

Create a pool

Pools are logical partitions for storing objects. When you first deploy a cluster without creating a pool, Ceph uses the default pools for storing data.

By default, Ceph makes 3 replicas of RADOS objects. Ensure you have a realistic number of placement groups: Ceph recommends approximately 100 per OSD, rounded to the nearest power of 2.
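
As a rough illustration of that guideline for this three-OSD cluster with 3-way replication (a back-of-the-envelope estimate, not a tuning recommendation):

    target PGs ≈ (3 OSDs x 100) / 3 replicas = 100, rounded to the nearest power of 2 = 128

which is the pg_num used for datapool below.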

root@host1:~# ceph osd lspools
1 device_health_metrics
root@host1:~# ceph osd pool create datapool 128 128
pool 'datapool' created
root@host1:~# ceph osd lspools
1 device_health_metrics
2 datapool


root@host1:~# ceph osd pool ls detail
pool 1 'device_health_metrics' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 22 flags hashpspool stripe_width 0 pg_num_min 1 application mgr_devicehealth
pool 2 'datapool' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 128 pgp_num 128 autoscale_mode on last_change 39 flags hashpspool stripe_width 0

root@host1:~# ceph osd pool get datapool all
size: 3
min_size: 2
pg_num: 128
pgp_num: 128
crush_rule: replicated_rule
hashpspool: true
nodelete: false
nopgchange: false
nosizechange: false
write_fadvise_dontneed: false
noscrub: false
nodeep-scrub: false
use_gmt_hitset: 1
fast_read: 0
pg_autoscale_mode: on

On the admin node, use the rbd tool to initialize the pool for use by RBD:

[ceph: root@host1 /]# rbd pool init datapool

Create rbd volume and map to a block device on the host

The rbd command enables you to create, list, introspect and remove block device images. You can also use it to clone images, create snapshots, rollback an image to a snapshot, view a snapshot, etc.

root@host1:~# rbd create --size 512000 datapool/rbdvol1
root@host1:~# rbd map datapool/rbdvol1
rbd: sysfs write failed
RBD image feature set mismatch. You can disable features unsupported by the kernel with "rbd feature disable datapool/rbdvol1 object-map fast-diff deep-flatten".
In some cases useful info is found in syslog - try "dmesg | tail".
rbd: map failed: (6) No such device or address

root@host1:~# dmesg | tail
[50268.015821] cgroup: cgroup: disabling cgroup2 socket matching due to net_prio or net_cls activation
[59168.019848] Key type ceph registered
[59168.020080] libceph: loaded (mon/osd proto 15/24)
[59168.023667] rbd: loaded (major 252)
[59168.028478] libceph: mon2 <host1-ip>:6789 session established
[59168.028571] libceph: mon2 <host1-ip>:6789 socket closed (con state OPEN)
[59168.028594] libceph: mon2 <host1-ip>:6789 session lost, hunting for new mon
[59175.101037] libceph: mon0 <host1-ip>:6789 session established
[59175.101413] libceph: client14535 fsid ad30a6fc-068f-11ec-8323-000c29bf98ea
[59175.105601] rbd: image rbdvol1: image uses unsupported features: 0x38

root@host1:~# rbd feature disable datapool/rbdvol1 object-map fast-diff deep-flatten

root@host1:~# rbd map datapool/rbdvol1
/dev/rbd0

root@host1:~# rbd showmapped
id  pool      namespace  image    snap  device
0   datapool             rbdvol1  -     /dev/rbd0

root@host1:~# lsblk
NAME                                                                                                  MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda                                                                                                     8:0    0  128G  0 disk
├─sda1                                                                                                  8:1    0  127G  0 part /
├─sda2                                                                                                  8:2    0    1K  0 part
└─sda5                                                                                                  8:5    0  975M  0 part
sdb                                                                                                     8:16   0   80G  0 disk
└─ceph--bc7eff08--2ac6--44a5--b941--5444c4a8600a-osd--block--b4dfb938--05af--413d--a327--18d26fc75b8d 253:0    0   80G  0 lvm
rbd0                                                                                                  252:0    0  100G  0 disk

root@host1:~# ls -la /dev/rbd/datapool/rbdvol1
lrwxrwxrwx 1 root root 10 Aug 26 19:35 /dev/rbd/datapool/rbdvol1 -> ../../rbd0

root@host1:~# ls -la /dev/rbd0
brw-rw---- 1 root disk 252, 0 Aug 26 19:35 /dev/rbd0

root@host1:~# rbd status datapool/rbdvol1
Watchers:
    watcher=<host1-ip>:0/2778790200 client.14556 cookie=18446462598732840967

root@host1:~# rbd info datapool/rbdvol1
rbd image 'rbdvol1':
    size 100 GiB in 25600 objects
    order 22 (4 MiB objects)
    snapshot_count: 0
    id: 38bebe718b2f
    block_name_prefix: rbd_data.38bebe718b2f
    format: 2
    features: layering, exclusive-lock
    op_features:
    flags:
    create_timestamp: Thu Aug 26 19:31:29 2021
    access_timestamp: Thu Aug 26 19:31:29 2021
    modify_timestamp: Thu Aug 26 19:31:29 2021

Create filesystem and mount rbd volume

You can use standard Linux commands to create a filesystem on the volume and mount it for different purposes.
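
A minimal sketch, assuming the mapped device is /dev/rbd0 and /mnt/rbdvol1 is an unused mount point:

root@host1:~# mkfs.ext4 /dev/rbd0
root@host1:~# mkdir -p /mnt/rbdvol1
root@host1:~# mount /dev/rbd0 /mnt/rbdvol1
root@host1:~# df -h /mnt/rbdvol1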

Troubleshooting

  1. Ceph does not support Pacific or later on CentOS 7.8

If you are installing the Pacific release of Ceph on CentOS 7.8, you may see the following issue.

$ cat /etc/*release
CentOS Linux release 7.8.2003 (Core)
NAME="CentOS Linux"

$ uname -r
5.7.12-1.el7.elrepo.x86_64

# ./cephadm add-repo --release pacific
ERROR: Ceph does not support pacific or later for this version of this linux distro and therefore cannot add a repo for it

You can install the Octopus release of Ceph instead.

$ ./cephadm add-repo --release octopus
Writing repo to /etc/yum.repos.d/ceph.repo...
Enabling EPEL...
Completed adding repo

Note: cephadm is new in Ceph release v15.2.0 (Octopus) and does not support older versions of Ceph.

  2. Invalid GPG Key

    $ ./cephadm install
    Installing packages ['cephadm']...
    Non-zero exit code 1 from yum install -y cephadm
    yum: stdout Loaded plugins: fastestmirror, langpacks, priorities
    yum: stdout Loading mirror speeds from cached hostfile
    yum: stdout * base: pxe.dev.purestorage.com
    yum: stdout * centosplus: pxe.dev.purestorage.com
    yum: stdout * epel: mirror.lax.genesisadaptive.com
    yum: stdout * extras: pxe.dev.purestorage.com
    yum: stdout * updates: pxe.dev.purestorage.com
    yum: stdout 279 packages excluded due to repository priority protections
    yum: stdout Resolving Dependencies
    yum: stdout --> Running transaction check
    yum: stdout ---> Package cephadm.noarch 2:15.2.14-0.el7 will be installed
    yum: stdout --> Finished Dependency Resolution
    yum: stdout
    yum: stdout Dependencies Resolved
    yum: stdout
    yum: stdout ================================================================================
    yum: stdout Package Arch Version Repository Size
    yum: stdout ================================================================================
    yum: stdout Installing:
    yum: stdout cephadm noarch 2:15.2.14-0.el7 Ceph-noarch 55 k
    yum: stdout
    yum: stdout Transaction Summary
    yum: stdout ================================================================================
    yum: stdout Install 1 Package
    yum: stdout
    yum: stdout Total download size: 55 k
    yum: stdout Installed size: 223 k
    yum: stdout Downloading packages:
    yum: stdout Public key for cephadm-15.2.14-0.el7.noarch.rpm is not installed
    yum: stdout Retrieving key from https://download.ceph.com/keys/release.gpg
    yum: stderr warning: /var/cache/yum/x86_64/7/Ceph-noarch/packages/cephadm-15.2.14-0.el7.noarch.rpm: Header V4 RSA/SHA256 Signature, key ID 460f3994: NOKEY
    yum: stderr
    yum: stderr
    yum: stderr Invalid GPG Key from https://download.ceph.com/keys/release.gpg: No key found in given key data
    Traceback (most recent call last):
    File "./cephadm", line 8432, in <module>
    main()
    File "./cephadm", line 8420, in main
    r = ctx.func(ctx)
    File "./cephadm", line 6384, in command_install
    pkg.install(ctx.packages)
    File "./cephadm", line 6231, in install
    call_throws(self.ctx, [self.tool, 'install', '-y'] + ls)
    File "./cephadm", line 1461, in call_throws
    raise RuntimeError('Failed command: %s' % ' '.join(command))
    RuntimeError: Failed command: yum install -y cephadm

Based on the Ceph documentation, execute the following command to install the release.asc key.

$ rpm --import 'https://download.ceph.com/keys/release.asc'

Install the cephadm package again and it succeeds.

$ ./cephadm install
Installing packages ['cephadm']...

$ which cephadm
/usr/sbin/cephadm
  3. Failed to add host during bootstrap

    $ cephadm bootstrap --mon-ip 192.168.1.183
    Adding host host1...
    Non-zero exit code 22 from /usr/bin/docker run --rm --ipc=host --net=host --entrypoint /usr/bin/ceph -e CONTAINER_IMAGE=docker.io/ceph/ceph:v15 -e NODE_NAME=host1 -v /var/log/ceph/ccc938de-0c30-11ec-8c3f-ac1f6bc8d268:/var/log/ceph:z -v /tmp/ceph-tmpdqxjp0ly:/etc/ceph/ceph.client.admin.keyring:z -v /tmp/ceph-tmpmt5hrjo9:/etc/ceph/ceph.conf:z docker.io/ceph/ceph:v15 orch host add host1
    /usr/bin/ceph: stderr Error EINVAL: Failed to connect to host1 (host1).
    /usr/bin/ceph: stderr Please make sure that the host is reachable and accepts connections using the cephadm SSH key
    /usr/bin/ceph: stderr
    /usr/bin/ceph: stderr To add the cephadm SSH key to the host:
    /usr/bin/ceph: stderr > ceph cephadm get-pub-key > ~/ceph.pub
    /usr/bin/ceph: stderr > ssh-copy-id -f -i ~/ceph.pub root@host1
    /usr/bin/ceph: stderr
    /usr/bin/ceph: stderr To check that the host is reachable:
    /usr/bin/ceph: stderr > ceph cephadm get-ssh-config > ssh_config
    /usr/bin/ceph: stderr > ceph config-key get mgr/cephadm/ssh_identity_key > ~/cephadm_private_key
    /usr/bin/ceph: stderr > chmod 0600 ~/cephadm_private_key
    /usr/bin/ceph: stderr > ssh -F ssh_config -i ~/cephadm_private_key root@host1
    ERROR: Failed to add host : Failed command: /usr/bin/docker run --rm --ipc=host --net=host --entrypoint /usr/bin/ceph -e CONTAINER_IMAGE=docker.io/ceph/ceph:v15 -e NODE_NAME=host1 -v /var/log/ceph/ccc938de-0c30-11ec-8c3f-ac1f6bc8d268:/var/log/ceph:z -v /tmp/ceph-tmpdqxjp0ly:/etc/ceph/ceph.client.admin.keyring:z -v /tmp/ceph-tmpmt5hrjo9:/etc/ceph/ceph.conf:z docker.io/ceph/ceph:v15 orch host add host1

Note: If there are multiple networks and interfaces, be sure to choose one that will be accessible by any host accessing the Ceph cluster.

Make sure passwordless ssh is configured on each host.

  4. Remove ceph cluster

    $ cephadm rm-cluster --fsid ccc938de-0c30-11ec-8c3f-ac1f6bc8d268 --force

  5. ceph-common installation failure

    $ cephadm install ceph-common
    Installing packages ['ceph-common']...
    Non-zero exit code 1 from yum install -y ceph-common
    yum: stdout Loaded plugins: fastestmirror, langpacks, priorities
    yum: stdout Loading mirror speeds from cached hostfile
    yum: stdout * base: pxe.dev.purestorage.com
    yum: stdout * centosplus: pxe.dev.purestorage.com
    yum: stdout * epel: mirror.lax.genesisadaptive.com
    yum: stdout * extras: pxe.dev.purestorage.com
    yum: stdout * updates: pxe.dev.purestorage.com
    yum: stdout 279 packages excluded due to repository priority protections
    yum: stdout Resolving Dependencies
    yum: stdout --> Running transaction check
    yum: stdout ---> Package ceph-common.x86_64 1:10.2.5-4.el7 will be installed
    yum: stdout --> Processing Dependency: python-rbd = 1:10.2.5-4.el7 for package: 1:ceph-common-10.2.5-4.el7.x86_64
    yum: stdout Package python-rbd is obsoleted by python3-rbd, but obsoleting package does not provide for requirements
    yum: stdout --> Processing Dependency: python-rados = 1:10.2.5-4.el7 for package: 1:ceph-common-10.2.5-4.el7.x86_64
    yum: stdout Package python-rados is obsoleted by python3-rados, but obsoleting package does not provide for requirements
    yum: stdout --> Processing Dependency: hdparm for package: 1:ceph-common-10.2.5-4.el7.x86_64
    yum: stdout --> Processing Dependency: gdisk for package: 1:ceph-common-10.2.5-4.el7.x86_64
    yum: stdout --> Processing Dependency: libboost_regex-mt.so.1.53.0()(64bit) for package: 1:ceph-common-10.2.5-4.el7.x86_64
    yum: stdout --> Processing Dependency: libboost_program_options-mt.so.1.53.0()(64bit) for package: 1:ceph-common-10.2.5-4.el7.x86_64
    yum: stdout --> Running transaction check
    yum: stdout ---> Package boost-program-options.x86_64 0:1.53.0-28.el7 will be installed
    yum: stdout ---> Package boost-regex.x86_64 0:1.53.0-28.el7 will be installed
    yum: stdout --> Processing Dependency: libicuuc.so.50()(64bit) for package: boost-regex-1.53.0-28.el7.x86_64
    yum: stdout --> Processing Dependency: libicui18n.so.50()(64bit) for package: boost-regex-1.53.0-28.el7.x86_64
    yum: stdout --> Processing Dependency: libicudata.so.50()(64bit) for package: boost-regex-1.53.0-28.el7.x86_64
    yum: stdout ---> Package ceph-common.x86_64 1:10.2.5-4.el7 will be installed
    yum: stdout --> Processing Dependency: python-rbd = 1:10.2.5-4.el7 for package: 1:ceph-common-10.2.5-4.el7.x86_64
    yum: stdout Package python-rbd is obsoleted by python3-rbd, but obsoleting package does not provide for requirements
    yum: stdout --> Processing Dependency: python-rados = 1:10.2.5-4.el7 for package: 1:ceph-common-10.2.5-4.el7.x86_64
    yum: stdout Package python-rados is obsoleted by python3-rados, but obsoleting package does not provide for requirements
    yum: stdout ---> Package gdisk.x86_64 0:0.8.10-3.el7 will be installed
    yum: stdout ---> Package hdparm.x86_64 0:9.43-5.el7 will be installed
    yum: stdout --> Running transaction check
    yum: stdout ---> Package ceph-common.x86_64 1:10.2.5-4.el7 will be installed
    yum: stdout --> Processing Dependency: python-rbd = 1:10.2.5-4.el7 for package: 1:ceph-common-10.2.5-4.el7.x86_64
    yum: stdout Package python-rbd is obsoleted by python3-rbd, but obsoleting package does not provide for requirements
    yum: stdout --> Processing Dependency: python-rados = 1:10.2.5-4.el7 for package: 1:ceph-common-10.2.5-4.el7.x86_64
    yum: stdout Package python-rados is obsoleted by python3-rados, but obsoleting package does not provide for requirements
    yum: stdout ---> Package libicu.x86_64 0:50.2-4.el7_7 will be installed
    yum: stdout --> Finished Dependency Resolution
    yum: stdout You could try using --skip-broken to work around the problem
    yum: stdout You could try running: rpm -Va --nofiles --nodigest
    yum: stderr Error: Package: 1:ceph-common-10.2.5-4.el7.x86_64 (base)
    yum: stderr Requires: python-rbd = 1:10.2.5-4.el7
    yum: stderr Available: 1:python-rbd-10.2.5-4.el7.x86_64 (base)
    yum: stderr python-rbd = 1:10.2.5-4.el7
    yum: stderr Available: 2:python3-rbd-15.2.14-0.el7.x86_64 (Ceph)
    yum: stderr python-rbd = 2:15.2.14-0.el7
    yum: stderr Error: Package: 1:ceph-common-10.2.5-4.el7.x86_64 (base)
    yum: stderr Requires: python-rados = 1:10.2.5-4.el7
    yum: stderr Available: 1:python-rados-10.2.5-4.el7.x86_64 (base)
    yum: stderr python-rados = 1:10.2.5-4.el7
    yum: stderr Available: 2:python3-rados-15.2.14-0.el7.x86_64 (Ceph)
    yum: stderr python-rados = 2:15.2.14-0.el7
    Traceback (most recent call last):
    File "/usr/sbin/cephadm", line 6242, in <module>
    r = args.func()
    File "/usr/sbin/cephadm", line 5073, in command_install
    pkg.install(args.packages)
    File "/usr/sbin/cephadm", line 4931, in install
    call_throws([self.tool, 'install', '-y'] + ls)
    File "/usr/sbin/cephadm", line 1112, in call_throws
    raise RuntimeError('Failed command: %s' % ' '.join(command))
    RuntimeError: Failed command: yum install -y ceph-common

  6. cephadm log

/var/log/ceph/cephadm.log

  7. rbd image map failed

    [ceph: root@host1 /]# rbd map datapool/rbdvol1
    modinfo: ERROR: Module alias rbd not found.
    modprobe: FATAL: Module rbd not found in directory /lib/modules/5.7.12-1.el7.elrepo.x86_64
    rbd: failed to load rbd kernel module (1)
    rbd: sysfs write failed
    In some cases useful info is found in syslog - try "dmesg | tail".
    rbd: map failed: (2) No such file or directory

    [root@host1 ~]# modprobe rbd
    [root@host1 ~]# lsmod | grep rbd
    rbd 106496 0
    libceph 331776 1 rbd

  8. rbd image map failed on other cluster nodes

    [ceph: root@host2 /]# rbd map datapool/rbdvol5 --id admin
    2021-09-21T19:49:49.384+0000 7f91ea781500 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,: (2) No such file or directory
    rbd: sysfs write failed
    2021-09-21T19:49:49.387+0000 7f91ea781500 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,: (2) No such file or directory
    2021-09-21T19:49:49.387+0000 7f91ea781500 -1 AuthRegistry(0x5633b09431e0) no keyring found at /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,, disabling cephx
    2021-09-21T19:49:49.388+0000 7f91ea781500 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,: (2) No such file or directory
    2021-09-21T19:49:49.388+0000 7f91ea781500 -1 AuthRegistry(0x7fffd357c350) no keyring found at /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,, disabling cephx
    2021-09-21T19:49:49.389+0000 7f91d9b68700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [1]
    2021-09-21T19:49:49.389+0000 7f91da369700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [1]
    2021-09-21T19:49:49.389+0000 7f91dab6a700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [1]
    2021-09-21T19:49:49.389+0000 7f91ea781500 -1 monclient: authenticate NOTE: no keyring found; disabled cephx authentication
    rbd: couldn't connect to the cluster!
    In some cases useful info is found in syslog - try "dmesg | tail".
    rbd: map failed: (22) Invalid argument

    [ceph: root@host2 /]# ls /etc/ceph
    ceph.conf rbdmap

Copy /etc/ceph/ceph.keyring from the admin node host1 to host2.
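
One way to do this from the host shell, assuming the host’s /etc/ceph directory is what the cephadm shell mounts:

root@host1:~# scp /etc/ceph/ceph.keyring root@host2:/etc/ceph/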

[ceph: root@host2 /]# ls /etc/ceph
ceph.conf  ceph.keyring  rbdmap

[ceph: root@host2 /]# rbd map datapool/rbdvol5
[ceph: root@host2 /]# rbd device list
id  pool      namespace  image    snap  device
0   datapool             rbdvol5  -     /dev/rbd0

Reference

Capture packets with tcpdump

In this example, we only capture 1000 packets (-c1000) and print numeric IP addresses and ports instead of resolving names (-nn) for easier analysis. The raw packets are written to the file “tcpdump.1000” for further analysis.

$ tcpdump -i eth5 -c1000 -nn -w tcpdump.1000

View the packets data

We can use tcpdump to view the packets directly.

$ tcpdump -nn -r tcpdump.1000 | more

reading from file tcpdump.1000, link-type EN10MB (Ethernet)

22:42:10.018758 IP 192.168.1.18.980 > 192.168.1.16.2049: Flags [P.], seq 1:16545, ack 72, win 501, options [nop,nop,TS val 4292822374 ecr 2230894482], length 16544: NFS request xid 2912663971 16540 getattr fh 0,0/22
22:42:10.018895 IP 192.168.1.16.2049 > 192.168.1.18.980: Flags [.], ack 16545, win 9508, options [nop,nop,TS val 2230894483 ecr 4292822374], length 0
22:42:10.021125 IP 192.168.1.16.2049 > 192.168.1.18.980: Flags [P.], seq 72:144, ack 16545, win 9596, options [nop,nop,TS val 2230894485 ecr 4292822374], length 72: NFS reply xid 781957539 reply ok 68


22:42:10.021400 IP 192.168.1.18.980 > 192.168.1.16.2049: Flags [P.], seq 16545:33089, ack 144, win 501, options [nop,nop,TS val 4292822377 ecr 2230894485], length 16544: NFS request xid 2929441187 16540 getattr fh 0,0/22
22:42:10.021536 IP 192.168.1.16.2049 > 192.168.1.18.980: Flags [.], ack 33089, win 9508, options [nop,nop,TS val 2230894485 ecr 4292822377], length 0
22:42:10.023219 ARP, Request who-has 70.0.69.235 tell 70.0.193.122, length 46
22:42:10.023558 IP 192.168.1.16.2049 > 192.168.1.18.980: Flags [P.], seq 144:216, ack 33089, win 9596, options [nop,nop,TS val 2230894487 ecr 4292822377], length 72: NFS reply xid 798734755 reply ok 68


22:42:10.023844 IP 192.168.1.18.980 > 192.168.1.16.2049: Flags [P.], seq 33089:49633, ack 216, win 501, options [nop,nop,TS val 4292822379 ecr 2230894487], length 16544: NFS request xid 2946218403 16540 getattr fh 0,0/22
22:42:10.023962 IP 192.168.1.16.2049 > 192.168.1.18.980: Flags [.], ack 49633, win 9508, options [nop,nop,TS val 2230894488 ecr 4292822379], length 0
22:42:10.025962 IP 192.168.1.16.2049 > 192.168.1.18.980: Flags [P.], seq 216:288, ack 49633, win 9596, options [nop,nop,TS val 2230894490 ecr 4292822379], length 72: NFS reply xid 815511971 reply ok 68

<omitted...>

We can also view the data with the Wireshark GUI, which can be installed on a desktop and used to open the captured packet file. We can use a display filter to show only the data we are interested in; in this case, we only display the packets related to the IP address “192.168.1.18”.
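
The same filtering can also be done from the command line with tshark (the terminal companion of Wireshark), assuming it is installed:

$ tshark -r tcpdump.1000 -Y 'ip.addr == 192.168.1.18'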

[Image: Wireshark view of the captured packets, filtered on the IP address 192.168.1.18]

Understand the tcpdump output

  • The first field, 22:42:10.021125, represents the timestamp of the captured packet.

  • The next field, IP, represents the network layer protocol, in this case, IPv4.

  • The next field, 192.168.1.16.2049 > 192.168.1.18.980, is the source and destination IP address and port.

  • The next field, Flags [P.], represents the TCP flags. The typical values for this field include the following.

[Image: table of typical TCP flag values, e.g. S (SYN), F (FIN), P (PUSH), R (RST), . (ACK)]

  • The next field, seq 72:144, is the sequence number of the data contained in the packet. It means the packet contains bytes 72 to 144.

  • The next field, ack 16545, is the Ack number. For the side of receiving data, this field represents the next expected byte of data. In this case, the Ack number for the next expected packet would be 16545.

  • The next field, win 9596, is the window size. It represents the number of bytes available in the receiving buffer.

  • Followed by a field, length 72, which represents the length in bytes of the data.

In the above example, the data is being sent from 192.168.1.18 to 192.168.1.16. The average request size is about 16 KB, sent over NFSv4.

ping

Ping is one of the most basic commands in network management. It verifies network connectivity by measuring the round-trip time of ICMP packets sent to a target host.

ping - send ICMP ECHO_REQUEST to network hosts

  • -c count

Stop after sending count ECHO_REQUEST packets. With deadline option, ping waits for count ECHO_REPLY packets, until the timeout expires.

  • -i interval

Wait interval seconds between sending each packet. The default is to wait for one second between each packet normally, or not to wait in flood mode. Only the super-user may set the interval to values less than 0.2 seconds.

$ ping 10.10.1.17 -c 1000 -i 0.010
PING 10.10.1.17 (10.10.1.17) 56(84) bytes of data.
64 bytes from 10.10.1.17: icmp_seq=1 ttl=64 time=0.176 ms
64 bytes from 10.10.1.17: icmp_seq=2 ttl=64 time=0.173 ms
<omitted...>
64 bytes from 10.10.1.17: icmp_seq=999 ttl=64 time=0.197 ms
64 bytes from 10.10.1.17: icmp_seq=1000 ttl=64 time=0.195 ms

--- 10.10.1.17 ping statistics ---
1000 packets transmitted, 1000 received, 0% packet loss, time 10992ms
rtt min/avg/max/mdev = 0.096/0.173/0.210/0.025 ms

Round-trip time (RTT) is the duration, measured in milliseconds, from when the source server sends a request to when it receives a response from a target server. It’s a key performance metric to measure network latency.

Actual round trip time can be influenced by:

  • Distance – The length a signal has to travel correlates with the time taken for a request to reach a server.
  • Transmission medium – The medium used to route a signal (e.g., copper wire, fiber optic cables) can impact how quickly a request is received by a server and routed back to a user.
  • Number of network hops – Intermediate routers or servers take time to process a signal, increasing RTT. The more hops a signal has to travel through, the higher the RTT.
  • Traffic levels – RTT typically increases when a network is congested with high levels of traffic. Conversely, low traffic times can result in decreased RTT.
  • Server response time – The time taken for a target server to respond to a request depends on its processing capacity, the number of requests being handled and the nature of the request (i.e., how much server-side work is required). A longer server response time increases RTT.

traceroute

A traceroute displays the path that a packet takes as it travels across the network to the destination, along with the response time at each stop along the route. If there is a connection problem or latency when connecting to a site, it will show up in these times, and you will be able to identify which of the stops (also called ‘hops’) along the route is the culprit.

$ for i in `seq 1 5`; do traceroute 10.10.1.17;sleep 3; done
traceroute to 10.10.1.17 (10.10.1.17), 30 hops max, 60 byte packets
 1  10.10.1.17 (10.10.1.17)  0.181 ms  0.086 ms  0.084 ms
traceroute to 10.10.1.17 (10.10.1.17), 30 hops max, 60 byte packets
 1  10.10.1.17 (10.10.1.17)  0.179 ms  0.087 ms  0.081 ms
traceroute to 10.10.1.17 (10.10.1.17), 30 hops max, 60 byte packets
 1  10.10.1.17 (10.10.1.17)  0.175 ms  0.087 ms  0.081 ms
traceroute to 10.10.1.17 (10.10.1.17), 30 hops max, 60 byte packets
 1  10.10.1.17 (10.10.1.17)  0.183 ms  0.073 ms  0.081 ms
traceroute to 10.10.1.17 (10.10.1.17), 30 hops max, 60 byte packets
 1  10.10.1.17 (10.10.1.17)  0.177 ms  0.080 ms  0.081 ms

  • Hop Number – the first column is simply the number of the hop along the route.

  • RTT Columns – the last three columns display the round trip time (RTT) for the packet to reach that point and return. It is listed in milliseconds. There are three columns because traceroute sends three separate signal packets. This is to display consistency in the route.

netperf

Netperf is a benchmark that can be used to measure the performance of many different types of networking. It provides tests for both unidirectional throughput and end-to-end latency. The environments currently measurable by netperf include:

  • TCP and UDP via BSD Sockets for both IPv4 and IPv6

  • DLPI

  • Unix Domain Sockets

  • SCTP for both IPv4 and IPv6

    netperf -h

    Usage: netperf [global options] -- [test options]

    Global options:
    -a send,recv Set the local send,recv buffer alignment
    -A send,recv Set the remote send,recv buffer alignment
    -B brandstr Specify a string to be emitted with brief output
    -c [cpu_rate] Report local CPU usage
    -C [cpu_rate] Report remote CPU usage
    -d Increase debugging output
    -D [secs,units] * Display interim results at least every secs seconds
    using units as the initial guess for units per second
    -f G|M|K|g|m|k Set the output units
    -F fill_file Pre-fill buffers with data from fill_file
    -h Display this text
    -H name|ip,fam * Specify the target machine and/or local ip and family
    -i max,min Specify the max and min number of iterations (15,1)
    -I lvl[,intvl] Specify confidence level (95 or 99) (99)
    and confidence interval in percentage (10)
    -j Keep additional timing statistics
    -l testlen Specify test duration (>0 secs) (<0 bytes|trans)
    -L name|ip,fam * Specify the local ip|name and address family
    -o send,recv Set the local send,recv buffer offsets
    -O send,recv Set the remote send,recv buffer offset
    -n numcpu Set the number of processors for CPU util
    -N Establish no control connection, do 'send' side only
    -p port,lport* Specify netserver port number and/or local port
    -P 0|1 Donot/Do display test headers
    -r Allow confidence to be hit on result only
    -s seconds Wait seconds between test setup and test start
    -S Set SO_KEEPALIVE on the data connection
    -t testname Specify test to perform
    -T lcpu,rcpu Request netperf/netserver be bound to local/remote cpu
    -v verbosity Specify the verbosity level
    -W send,recv Set the number of send,recv buffers
    -v level Set the verbosity level (default 1, min 0)
    -V Display the netperf version and exit
    For those options taking two parms, at least one must be specified;
    specifying one value without a comma will set both parms to that
    value, specifying a value with a leading comma will set just the second
    parm, a value with a trailing comma will set just the first. To set
    each parm to unique values, specify both and separate them with a
    comma.

    • For these options taking two parms, specifying one value with no comma
      will only set the first parms and will leave the second at the default
      value. To set the second value it must be preceded with a comma or be a
      comma-separated pair. This is to retain previous netperf behaviour.

    $ wget -O netperf-2.5.0.tar.gz -c https://codeload.github.com/HewlettPackard/netperf/tar.gz/netperf-2.5.0
    $ tar xf netperf-2.5.0.tar.gz && cd netperf-netperf-2.5.0
    $ ./configure && make && make install

    [root@10.0.0.17]$ netserver -D
    [root@10.0.0.16]$ netperf -H 10.10.1.17 -l -1000000 -t TCP_RR -w 10ms -b 1 -v 2 -- -O min_latency,mean_latency,max_latency,stddev_latency,transaction_rate
    Packet rate control is not compiled in.
    Packet burst size is not compiled in.
    MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.10.1.17 (10.10.1.17) port 0 AF_INET : first burst 0
    Minimum       Mean          Maximum       Stddev        Transaction
    Latency       Latency       Latency       Latency       Rate
    Microseconds  Microseconds  Microseconds  Microseconds  Tran/s

    63            84.92         2980          7.86          11740.092

iperf

iPerf3 is a tool for active measurements of the maximum achievable bandwidth on IP networks. It supports tuning of various parameters related to timing, buffers and protocols (TCP, UDP, SCTP with IPv4 and IPv6). For each test it reports the bandwidth, loss, and other parameters.
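
A minimal sketch of a TCP throughput test between two hosts, reusing 10.10.1.17 only as an example server address:

# On the server side
$ iperf3 -s

# On the client side: run a 10-second TCP test against the server
$ iperf3 -c 10.10.1.17 -t 10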

lldp

LLDP (Link Layer Discovery Protocol) can be essential when working with complex network and server infrastructure. It is extremely helpful when there is no direct access to the setup but we need to determine which switch ports our servers’ NIC cards are connected to.

The example below shows how to install and enable the LLDP daemon on CentOS and check which neighbor ports are connected to the server’s network cards.

$ yum install lldpd
$ systemctl --now enable lldpd
  
$ lldpcli show neighbors
-------------------------------------------------------------------------------
LLDP neighbors:
-------------------------------------------------------------------------------
Interface:    enp6s0f1, via: LLDP, RID: 1, Time: 0 day, 00:01:29
  Chassis:
    ChassisID:    mac 00:1c:73:82:07:ee
    SysName:      xx-ay-01.06.09
    SysDescr:     Arista Networks EOS version 4.16.6M running on an Arista Networks Lab-71x-28
    MgmtIP:       10.0.254.9
    Capability:   Bridge, on
    Capability:   Router, on
  Port:
    PortID:       ifname Ethernet17
    TTL:          120
-------------------------------------------------------------------------------

Reference

The Linux kernel has had asynchronous I/O since version 2.5, but it was seen as difficult to use and inefficient.

io_uring (previously known as aioring) is a Linux kernel system call interface for storage device asynchronous I/O operations addressing performance issues with similar interfaces provided by functions like read()/write() or aio_read()/aio_write() etc. for operations on data accessed by file descriptors.

It was primarily developed by Jens Axboe at Facebook.

Internally it works by creating two buffers, dubbed “queue rings” (circular buffers), for the submission and completion of I/O requests: the submission queue (SQ) and the completion queue (CQ), respectively. Keeping these buffers shared between the kernel and the application helps boost I/O performance by eliminating the need to issue extra and expensive system calls to copy data between the two. According to the io_uring design paper, the SQ buffer is writable only by the consumer application, and the CQ buffer only by the kernel.

The API provided by liburing library for userspace (applications) can be used to interact with the kernel interface more easily.

Both the kernel interface and the library were adopted in Linux kernel version 5.1.

[Image: a visual representation of the io_uring submission and completion queues. Credit: Donald Hunter]

Reference

Introduction

btrfs is a modern copy on write (CoW) filesystem for Linux aimed at implementing advanced features while also focusing on fault tolerance, repair and easy administration. Its main features and benefits are:

  • Snapshots which do not make the full copy of files
  • RAID - support for software-based RAID 0, RAID 1, RAID 10
  • Self-healing - checksums for data and metadata, automatic detection of silent data corruptions

Development of Btrfs started in 2007. Since then, Btrfs has become part of the Linux kernel and is under active development.

Copy on Write (CoW)

  • The CoW operation is used on all writes to the filesystem (unless turned off, see below).

  • This makes it much easier to implement lazy copies, where the copy is initially just a reference to the original, but as the copy (or the original) is changed, the two versions diverge from each other in the expected way.

  • If you just write a file that didn’t exist before, then the data is written to empty space, and some of the metadata blocks that make up the filesystem are CoWed. In a “normal” filesystem, if you then go back and overwrite a piece of that file, then the piece you’re writing is put directly over the data it is replacing. In a CoW filesystem, the new data is written to a piece of free space on the disk, and only then is the file’s metadata changed to refer to the new data. At that point, the old data that was replaced can be freed up because nothing points to it any more.

  • If you make a snapshot (or a cp --reflink=always; see the sketch after this list) of a piece of data, you end up with two files that both reference the same data. If you modify one of those files, the CoW operation described above still happens: the new data is written elsewhere, and the file’s metadata is updated to point at it, but the original data is kept, because it’s still referenced by the other file.

  • This leads to fragmentation in heavily updated-in-place files like VM images and database stores.

  • Note that this happens even if the data is not shared, because data is stored in segments, and only the newly updated part of a segment is subject to CoW.

  • If you mount the filesystem with nodatacow, or use chattr +C on the file, then it only does the CoW operation for data if there’s more than one copy referenced.

  • Some people insist that Btrfs does “Redirect-on-write” rather than “Copy-on-write” because Btrfs is based on a scheme for redirect-based updates of B-trees by Ohad Rodeh, and because understanding the code is easier with that mindset.
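
The reflink copy and nodatacow behaviour mentioned above can be tried directly from the shell. A minimal sketch, assuming a Btrfs filesystem is mounted at /mnt/pool1:

# Reflink copy: clone.img initially shares all of its data extents with original.img
$ cp --reflink=always /mnt/pool1/original.img /mnt/pool1/clone.img

# Disable CoW for a file: the +C attribute must be set while the file is still empty
$ touch /mnt/pool1/vm.img
$ chattr +C /mnt/pool1/vm.img
$ lsattr /mnt/pool1/vm.img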

Filesystem creation

A Btrfs filesystem can be created on top of many devices, and more devices can be added after the FS has been created.

By default, metadata will be mirrored across two devices and data will be striped across all of the devices present. This is equivalent to mkfs.btrfs -m raid1 -d raid0.

If only one device is present, metadata will be duplicated on that one device: for an HDD the default is mkfs.btrfs -m dup -d single, and for an SSD (or other non-rotational device) mkfs.btrfs -m single -d single.

mkfs.btrfs will accept more than one device on the command line. It has options to control the RAID configuration for data (-d) and metadata (-m). Valid choices are raid0, raid1, raid10, raid5, raid6, single and dup. The option -m single means that no duplication is done, which may be desired when using hardware RAID.

# Create a filesystem across four drives (metadata mirrored, linear data allocation)
$ mkfs.btrfs -d single /dev/sdb /dev/sdc /dev/sdd /dev/sde

# Stripe the data without mirroring, metadata are mirrored
$ mkfs.btrfs -d raid0 /dev/sdb /dev/sdc

# Use raid10 for both data and metadata
$ mkfs.btrfs -m raid10 -d raid10 /dev/sdb /dev/sdc /dev/sdd /dev/sde

# Don't duplicate metadata on a single drive (default on single SSDs)
$ mkfs.btrfs -m single /dev/sdb

Once you create a multi-device filesystem, you can use any of its devices in the mount command. The Btrfs filesystem size is the total size of the devices used to create it.

$ mkfs.btrfs /dev/sdb /dev/sdc /dev/sde
$ mount /dev/sde /mnt
$ df -h

The following commands can be used to check filesystem usage.

$ btrfs filesystem show
$ btrfs filesystem df -h /mnt
$ btrfs filesystem usage /mnt
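As noted above, more devices can be added after the filesystem has been created. A minimal sketch, assuming the filesystem is mounted at /mnt and /dev/sdf is the new device:

# Add a device to a mounted filesystem
$ btrfs device add /dev/sdf /mnt

# Optionally rebalance so existing data and metadata are spread across all devices
$ btrfs balance start /mnt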

Filesystem deletion

$ umount -f /mnt
$ wipefs --all -t btrfs /dev/sdb /dev/sdc /dev/sde
$ btrfs filesystem show

Subvolumes and snapshots

Creating subvolumes and snapshots is one of the most common operations on Btrfs.

Create btrfs filesystem with two disks:

$ mkfs.btrfs -d raid0 /dev/sdd /dev/sdf
btrfs-progs v4.9.1
See http://btrfs.wiki.kernel.org for more information.

Label:              (null)
UUID:               6a154e4e-61d4-474e-9839-50d1fcd50bbb
Node size:          16384
Sector size:        4096
Filesystem size:    1.75TiB
Block group profiles:
  Data:             RAID0             2.00GiB
  Metadata:         RAID1             1.00GiB
  System:           RAID1             8.00MiB
SSD detected:       yes
Incompat features:  extref, skinny-metadata
Number of devices:  2
Devices:
   ID        SIZE  PATH
    1   894.25GiB  /dev/sdd
    2   894.25GiB  /dev/sdf

$ mkdir /mnt/pool1
$ mount -t btrfs /dev/sdd /mnt/pool1
$ df -h  | egrep "Filesystem|pool1"
Filesystem                 Size  Used Avail Use% Mounted on
/dev/sdd                   1.8T  4.3M  1.8T   1% /mnt/pool1
$ ls -la /mnt/pool1
total 16
drwxr-xr-x  1 root root  0 Mar 22 21:19 .
drwxr-xr-x. 7 root root 69 Mar 22 21:20 ..

Create a subvolume:

$ btrfs subvolume create /mnt/pool1/subvol1
Create subvolume '/mnt/pool1/subvol1'

$ mkdir /mnt/testbtrfs
$ mount -t btrfs -o subvol=subvol1 /dev/sdd /mnt/testbtrfs

$ ls -la /mnt/pool1
total 16
drwxr-xr-x  1 root root 14 Mar 22 21:19 .
drwxr-xr-x. 8 root root 86 Mar 22 21:22 ..
drwxr-xr-x  1 root root  0 Mar 22 21:21 subvol1

$ ls -la /mnt/pool1/subvol1/
total 16
drwxr-xr-x 1 root root  0 Mar 22 21:21 .
drwxr-xr-x 1 root root 14 Mar 22 21:19 ..

$ ls -la /mnt/testbtrfs
total 0
drwxr-xr-x  1 root root  0 Mar 22 21:21 .
drwxr-xr-x. 8 root root 86 Mar 22 21:22 ..

Create snapshot of the subvolume:

$ btrfs subvolume snapshot
btrfs subvolume snapshot: too few arguments
usage: btrfs subvolume snapshot [-r] [-i <qgroupid>] <source> <dest>|[<dest>/]<name>

    Create a snapshot of the subvolume

    Create a writable/readonly snapshot of the subvolume <source> with
    the name <name> in the <dest> directory.  If only <dest> is given,
    the subvolume will be named the basename of <source>.

    -r             create a readonly snapshot
    -i <qgroupid>  add the newly created snapshot to a qgroup. This
                   option can be given multiple times.

$ btrfs subvolume snapshot /mnt/pool1/subvol1 /mnt/pool1/subvol1/subvol1-snap
Create a snapshot of '/mnt/pool1/subvol1' in '/mnt/pool1/subvol1/subvol1-snap'

$ ls -la /mnt/pool1/subvol1/
total 16
drwxr-xr-x 1 root root 24 Mar 22 21:26 .
drwxr-xr-x 1 root root 14 Mar 22 21:19 ..
drwxr-xr-x 1 root root  0 Mar 22 21:21 subvol1-snap

$ ls -la /mnt/pool1/subvol1/subvol1-snap/
total 0
drwxr-xr-x 1 root root  0 Mar 22 21:21 .
drwxr-xr-x 1 root root 24 Mar 22 21:26 ..

$ ls -la /mnt/testbtrfs/
total 0
drwxr-xr-x  1 root root 24 Mar 22 21:26 .
drwxr-xr-x. 8 root root 86 Mar 22 21:22 ..
drwxr-xr-x  1 root root  0 Mar 22 21:21 subvol1-snap

$ ls -la /mnt/testbtrfs/subvol1-snap/
total 0
drwxr-xr-x 1 root root  0 Mar 22 21:21 .
drwxr-xr-x 1 root root 24 Mar 22 21:26 ..

List the subvolumes and snapshots:

$  btrfs subvolume list /mnt/pool1
ID 258 gen 10 top level 5 path subvol1
ID 259 gen 9 top level 258 path subvol1/subvol1-snap
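Subvolumes and snapshots that are no longer needed can be removed with btrfs subvolume delete. A minimal sketch using the names created above (a nested snapshot has to be deleted before its parent subvolume, and any mounts of the subvolume should be unmounted first):

$ umount /mnt/testbtrfs
$ btrfs subvolume delete /mnt/pool1/subvol1/subvol1-snap
$ btrfs subvolume delete /mnt/pool1/subvol1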

Setting up mdraid array

$ mdadm --create /dev/md0 --name=mdvol --level=raid0 --raid-devices=4 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1 /dev/nvme4n1
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md0 started.

$ cat /proc/mdstat
Personalities : [raid0] [linear]
md0 : active raid0 nvme4n1[3] nvme3n1[2] nvme2n1[1] nvme1n1[0]
      15002423296 blocks super 1.2 512k chunks

unused devices: <none>

$ mdadm --detail /dev/md0
/dev/md0:
           Version : 1.2
     Creation Time : Tue Feb  1 19:07:32 2022
        Raid Level : raid0
        Array Size : 15002423296 (13.97 TiB 15.36 TB)
      Raid Devices : 4
     Total Devices : 4
       Persistence : Superblock is persistent

       Update Time : Tue Feb  1 19:07:32 2022
             State : clean
    Active Devices : 4
   Working Devices : 4
    Failed Devices : 0
     Spare Devices : 0

        Chunk Size : 512K

Consistency Policy : none

              Name : mdvol
              UUID : aa6fe868:3aa2391d:37a21dc8:d0f4c5f1
            Events : 0

    Number   Major   Minor   RaidDevice State
       0     259        8        0      active sync   /dev/nvme1n1
       1     259        9        1      active sync   /dev/nvme2n1
       2     259       10        2      active sync   /dev/nvme3n1
       3     259       11        3      active sync   /dev/nvme4n1
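The array can be reassembled from the member superblocks alone, but to make sure it comes up with the same name and device node at boot, its definition can also be recorded in the mdadm configuration file. A minimal sketch; the path is /etc/mdadm.conf on RHEL/CentOS and /etc/mdadm/mdadm.conf on Debian/Ubuntu:

# Append the array definition so it is assembled consistently at boot
$ mdadm --detail --scan >> /etc/mdadm.conf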

Set up LVM volumes on the mdraid array

$ pvcreate /dev/md0
$ pvs | egrep "PV|md"
  PV             VG     Fmt  Attr PSize  PFree
  /dev/md0              lvm2 ---  13.97t 13.97t
$ vgcreate testvg /dev/md0  

$ vgs | egrep "VG|testvg"
  VG     #PV #LV #SN Attr   VSize  VFree
  testvg   1   0   0 wz--n- 13.97t 13.97t

$ lvcreate -n testlv11 -L 500G testvg -Wy --yes

$ lvs | egrep "LV|testvg"
  LV       VG     Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  testlv11 testvg -wi-a----- 500.00g  

$ lvs -ao name,size,stripesize,chunksize,metadata_percent | egrep "LV|testlv"
  LV       LSize   Stripe Chunk Meta%
  testlv11 500.00g     0     0  

$ mkfs.ext4 /dev/testvg/testlv11

$ mkdir -p /mnt/testmnt11

$ mount /dev/testvg/testlv11 /mnt/testmnt11

$ lsblk
NAME                MAJ:MIN RM  SIZE RO TYPE  MOUNTPOINT
<omitted...>
nvme1n1             259:8    0  3.5T  0 disk
└─md0                 9:0    0   14T  0 raid0
  └─testvg-testlv11 253:3    0  500G  0 lvm   /mnt/testmnt11
nvme2n1             259:11   0  3.5T  0 disk
└─md0                 9:0    0   14T  0 raid0
  └─testvg-testlv11 253:3    0  500G  0 lvm   /mnt/testmnt11
nvme3n1             259:10   0  3.5T  0 disk
└─md0                 9:0    0   14T  0 raid0
  └─testvg-testlv11 253:3    0  500G  0 lvm   /mnt/testmnt11
nvme4n1             259:9    0  3.5T  0 disk
└─md0                 9:0    0   14T  0 raid0
  └─testvg-testlv11 253:3    0  500G  0 lvm   /mnt/testmnt11  
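To make the mount persistent across reboots, an entry can be added to /etc/fstab. A minimal sketch; the mount options are an assumption (nofail avoids blocking boot if the array is unavailable):

# /etc/fstab
/dev/testvg/testlv11  /mnt/testmnt11  ext4  defaults,nofail  0  2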

What is RAID?

RAID stands for Redundant Array of Independent Disks (originally, Redundant Array of Inexpensive Disks). The intention of RAID is to spread your data across several disks so that a single disk failure does not lose that data (RAID 0 being the exception, since it provides no redundancy).

The current RAID drivers in Linux support the following levels:

  • Linear Mode : JBOD
  • RAID0/Stripe : Two or more disks. No redundancy.
  • RAID1/Mirror : Two or more disks. Redundancy.
  • RAID-4 : Three or more disks. Redundancy. Not used very often.
  • RAID-5 : Three or more disks. Redundancy. Tolerates one disk failure.
  • RAID-6 : Four or more disks. Redundancy. Tolerates two disk failures.
  • RAID-10 : Four or more disks. Combination of RAID-1 and RAID-0.

Linux Software RAID (often called mdraid or MD/RAID) makes the use of RAID possible without a hardware RAID controller.

mdadm utility

The mdadm utility can be used to create and manage storage arrays using Linux’s software RAID capabilities.
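If it is not already present, mdadm is available as a standard package on most distributions:

# RHEL/CentOS
$ yum install mdadm

# Debian/Ubuntu
$ apt install mdadm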

Creating a RAID Array

The following example shows the creation of a RAID 0 array with eight NVMe disks.

$ mdadm --create /dev/md0 --name=mdvol --level=raid0 --raid-devices=8 /dev/nvme2n1 /dev/nvme3n1 /dev/nvme4n1 /dev/nvme5n1 /dev/nvme6n1 /dev/nvme7n1 /dev/nvme8n1 /dev/nvme9n1

Check RAID array status

$ cat /proc/mdstat
    Personalities : [raid0]
    md0 : active raid0 nvme9n1[7] nvme8n1[6] nvme7n1[5] nvme6n1[4] nvme5n1[3] nvme4n1[2] nvme3n1[1] nvme2n1[0]
      30004846592 blocks super 1.2 512k chunks

$  mdadm --detail /dev/md0
    /dev/md0:
               Version : 1.2
         Creation Time : Mon Sep 20 17:39:47 2021
            Raid Level : raid0
            Array Size : 30004846592 (27.94 TiB 30.72 TB)
          Raid Devices : 8
         Total Devices : 8
           Persistence : Superblock is persistent
    
           Update Time : Mon Sep 20 17:39:47 2021
                 State : clean
        Active Devices : 8
       Working Devices : 8
        Failed Devices : 0
         Spare Devices : 0
    
            Chunk Size : 512K
    
    Consistency Policy : none
    
                  Name : host1:mdvol
                  UUID : 5908fc3f:8c8b8851:c0875278:ea274fec
                Events : 0
    
        Number   Major   Minor   RaidDevice State
           0     259        2        0      active sync   /dev/nvme2n1
           1     259        6        1      active sync   /dev/nvme3n1
           2     259        7        2      active sync   /dev/nvme4n1
           3     259        8        3      active sync   /dev/nvme5n1
           4     259       11        4      active sync   /dev/nvme6n1
           5     259       12        5      active sync   /dev/nvme7n1
           6     259       10        6      active sync   /dev/nvme8n1
           7     259        9        7      active sync   /dev/nvme9n1 

Deleting a RAID Array

If a RAID array is no longer required, it can be deactivated with the following command:

$ mdadm --stop /dev/md0
mdadm: stopped /dev/md0

A Linux software RAID array stores all of the necessary information about the array in a superblock on each member device. These superblocks can be deleted with the following command, after which the disks can be reused for new RAID arrays.

$ mdadm --zero-superblock /dev/nvme2n1 /dev/nvme3n1 /dev/nvme4n1 /dev/nvme5n1 /dev/nvme6n1 /dev/nvme7n1 /dev/nvme8n1 /dev/nvme9n1
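Whether a device still carries an md superblock can be checked with mdadm --examine; after zeroing, it should report something like the following (any of the member devices can be used):

$ mdadm --examine /dev/nvme2n1
mdadm: No md superblock detected on /dev/nvme2n1.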

RAID stands for Redundant Array of Inexpensive (Independent) Disks.

In most situations you will be using one of the following four RAID levels.

  • RAID 0
  • RAID 1
  • RAID 5(6)
  • RAID 10 (also known as RAID 1+0)

RAID 0

Following are the key points to remember for RAID level 0.

  • Minimum 2 disks.
  • Excellent performance (as blocks are striped).
  • No redundancy (no mirror, no parity).
  • Don’t use this for any critical system.

RAID 1

Following are the key points to remember for RAID level 1.

  • Minimum 2 disks.
  • Good performance (no striping, no parity).
  • Excellent redundancy (as blocks are mirrored).

RAID 5(6)

Following are the key points to remember for RAID level 5.

  • Minimum 3 disks.
  • Good performance (as blocks are striped).
  • Good redundancy (distributed parity).
  • The most cost-effective option providing both performance and redundancy. Use this for databases that are heavily read-oriented; write operations will be slower because of the parity calculation.

RAID 6 tolerates two disk failures, while RAID 5 tolerates only one.
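For reference, RAID 5 and RAID 6 arrays can be created with mdadm in the same way as the RAID 0 example earlier. A minimal sketch; the device names are assumptions:

# RAID 5: three or more disks, tolerates one disk failure
$ mdadm --create /dev/md1 --level=raid5 --raid-devices=3 /dev/sdb /dev/sdc /dev/sdd

# RAID 6: four or more disks, tolerates two disk failures
$ mdadm --create /dev/md2 --level=raid6 --raid-devices=4 /dev/sde /dev/sdf /dev/sdg /dev/sdh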

RAID 10

Following are the key points to remember for RAID level 10.

  • Minimum 4 disks.
  • This is also called a “stripe of mirrors”.
  • Excellent redundancy (as blocks are mirrored).
  • Excellent performance (as blocks are striped).
  • If you can afford the cost, this is the best option for mission-critical applications, especially databases. (A creation example with mdadm follows this list.)
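A RAID 10 array can likewise be created with mdadm. A minimal sketch; the device names are assumptions:

# RAID 10: minimum four disks, mirrored pairs striped together
$ mdadm --create /dev/md3 --level=raid10 --raid-devices=4 /dev/sdb /dev/sdc /dev/sdd /dev/sde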

Before upgrade

$ cat /etc/centos-release
CentOS Linux release 7.4.1708 (Core)

$ uname -r
3.10.0-693.el7.x86_64

$ cat /boot/grub2/grubenv
saved_entry=CentOS Linux (3.10.0-693.el7.x86_64) 7 (Core)

Install the newer kernel packages

$ rpm -ivh kernel-ml-5.7.12-1.el7.elrepo.x86_64.rpm
warning: kernel-ml-5.7.12-1.el7.elrepo.x86_64.rpm: Header V4 DSA/SHA1 Signature, key ID baadae52: NOKEY
Preparing...                          ################################# [100%]
Updating / installing...
   1:kernel-ml-5.7.12-1.el7.elrepo    ################################# [100%]

$ rpm -ivh kernel-ml-devel-5.7.12-1.el7.elrepo.x86_64.rpm
warning: kernel-ml-devel-5.7.12-1.el7.elrepo.x86_64.rpm: Header V4 DSA/SHA1 Signature, key ID baadae52: NOKEY
Preparing...                          ################################# [100%]
Updating / installing...
   1:kernel-ml-devel-5.7.12-1.el7.elre################################# [100%]
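The NOKEY warnings above indicate that the ELRepo signing key has not been imported, so rpm cannot verify the package signatures. If desired, the key can be imported (ideally before installing); the URL below is the commonly documented ELRepo key location and should be verified independently:

$ rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org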

$  rpm -qa | grep kernel
kernel-ml-devel-5.7.12-1.el7.elrepo.x86_64
kernel-3.10.0-693.el7.x86_64
kernel-tools-3.10.0-693.el7.x86_64
kernel-ml-5.7.12-1.el7.elrepo.x86_64
kernel-tools-libs-3.10.0-693.el7.x86_64

Set the default boot kernel entry

Note: if /etc/grub2.cfg does not exist, perform the next step first to rebuild the GRUB2 configuration, then come back to this step.

$ awk -F\' '$1=="menuentry " {print $2}' /etc/grub2.cfg
CentOS Linux (5.7.12-1.el7.elrepo.x86_64) 7 (Core)
CentOS Linux (3.10.0-693.el7.x86_64) 7 (Core)
CentOS Linux (0-rescue-8ef36acf9f544b90bf0621450fe05f75) 7 (Core)

$ grub2-set-default 0 
$ grep saved /boot/grub2/grubenv
saved_entry=0

Rebuild the GRUB2 configuration

$ grub2-mkconfig -o /boot/grub2/grub.cfg
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-5.7.12-1.el7.elrepo.x86_64
Found initrd image: /boot/initramfs-5.7.12-1.el7.elrepo.x86_64.img
Found linux image: /boot/vmlinuz-3.10.0-693.el7.x86_64
Found initrd image: /boot/initramfs-3.10.0-693.el7.x86_64.img
Found linux image: /boot/vmlinuz-0-rescue-8ef36acf9f544b90bf0621450fe05f75
Found initrd image: /boot/initramfs-0-rescue-8ef36acf9f544b90bf0621450fe05f75.img
done
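On a UEFI system the GRUB2 configuration usually lives on the EFI system partition instead; a sketch for CentOS, assuming the default EFI layout:

$ grub2-mkconfig -o /boot/efi/EFI/centos/grub.cfg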

Reboot the system

$ reboot

Verify the new kernel version

$ uname -r
5.7.12-1.el7.elrepo.x86_64
$ cat /etc/centos-release
CentOS Linux release 7.4.1708 (Core)
