In a competitive fio write-performance benchmark between a 3rd-party storage solution and a cloud-native disk, we noticed that the write performance of the 3rd-party storage solution was 10x faster than the cloud-native disk. This usually seems impossible, since we would argue that no one can beat raw disk performance. The following case shows an interesting write-performance optimization in the storage solution under test.

Cloud native raw disk performance (fio, 4k, write)

The benchmark result shows that the IOPS limit of the cloud drive is ~7500. This aligns with the spec of the cloud storage in use.


10x faster IOPS is observed!!!

In the fio output, the IOPS (>75k) is 10x higher than the result from the cloud-native raw disk (~7500). This can be verified from the iostat output: the IOPS at the logical-volume layer aligns with the fio output.


I/O size matters!!!

When the write requests reach the physical disk from the logical volume, the I/O size has changed from 4k to 400k. This indicates that the smaller writes were merged into larger writes.


Conclusion

With limited IOPS on the cloud drive, a larger I/O size really helps improve write performance by reducing the number of write requests sent to the disk. With proper write-performance optimization at the logical-volume layer, the write performance can be boosted 10x, with no surprise!!!
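The math behind this can be sketched quickly. With an IOPS cap, throughput scales with I/O size; the ~7500 cap and the 4 KiB/400 KiB request sizes come from the runs above, and the assumption that the IOPS cap is the only bottleneck is a simplification:

```shell
# Back-of-the-envelope throughput, assuming the drive's ~7500 IOPS cap
# is the only bottleneck: throughput = IOPS x I/O size.
echo "$((7500 * 4 / 1024)) MiB/s"     # 4 KiB writes sent straight to the disk
echo "$((7500 * 400 / 1024)) MiB/s"   # 400 KiB writes after merging at the logical-volume layer
```

Roughly 29 MiB/s versus almost 3 GiB/s at the same request rate, which is why merging small writes pays off so dramatically under an IOPS limit.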

When running fio write with a small blocksize (e.g. 1k), the following error is seen.

$ fio --blocksize=1k --ioengine=libaio --readwrite=randwrite --filesize=2G --group_reporting --direct=1 --iodepth=128 --randrepeat=1 --end_fsync=1 --name=job1 --numjobs=1 --filename=/mnt/fiomnt/fio.dat
job1: (g=0): rw=randwrite, bs=(R) 1024B-1024B, (W) 1024B-1024B, (T) 1024B-1024B, ioengine=libaio, iodepth=128
fio-3.7
Starting 1 process
fio: io_u error on file /mnt/fiomnt/fio.dat: Invalid argument: write offset=129521664, buflen=1024
fio: io_u error on file /mnt/fiomnt/fio.dat: Invalid argument: write offset=1589760000, buflen=1024
fio: pid=93922, err=22/file:io_u.c:1747, func=io_u error, error=Invalid argument
job1: (groupid=0, jobs=1): err=22 (file:io_u.c:1747, func=io_u error, error=Invalid argument): pid=93922: Wed Jan 25 00:42:00 2023
cpu : usr=0.00%, sys=0.00%, ctx=1, majf=0, minf=14
IO depths : 1=0.8%, 2=1.6%, 4=3.1%, 8=6.2%, 16=12.5%, 32=25.0%, >=64=50.8%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=0,128,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=128

Cause:

For direct I/O, the I/O size has to be a multiple of the filesystem/block-device blocksize. In this case, the filesystem blocksize is 4k, which cannot be aligned with the requested I/O size (1k). To fix this, the filesystem blocksize should be less than or equal to 1k.
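As a sketch of the fix (the path and device name below are examples, not taken from the setup above): check the filesystem blocksize first, then recreate the filesystem with a 1 KiB blocksize if needed.

```shell
# Print the blocksize of the filesystem under a path ("." here;
# use the fio mount point such as /mnt/fiomnt in practice).
stat -f -c %s .

# If this prints 4096, 1 KiB direct I/O cannot be aligned. Recreating the
# filesystem with a 1 KiB blocksize fixes it (DESTRUCTIVE; example device):
#   mkfs.ext4 -b 1024 /dev/sdb1
```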

Goal

Using weekly data from 2020 to 2022, we want to get to know the waves and peaks of the COVID pandemic in these years.

Download the data

We will continue to use NCHS(National Center for Health Statistics) as our data source.

Visit https://data.cdc.gov/browse?category=NCHS&sortBy=last_modified and search "Provisional COVID-19 Death Counts by Week"; we will find the data we are interested in.

https://data.cdc.gov/NCHS/Provisional-COVID-19-Death-Counts-by-Week-Ending-D/r8kw-7aab: on this page, we can export the data into a csv file.

With that, we may get the data source csv, Provisional_COVID-19_Death_Counts_by_Week_Ending_Date_and_State.csv.

Load the data

library("dplyr")
library("janitor")
library("tidyr")
library("readr")
df <- readr::read_csv(file.path(getwd(), "Provisional_COVID-19_Death_Counts_by_Week_Ending_Date_and_State.csv"), col_names = TRUE)
df <- clean_names(df)
tmp_start_date <- strptime(df$start_date, "%m/%d/%Y")
df$start_date <- format(tmp_start_date, "%Y-%m-%d")
> glimpse(df)
Rows: 10,800
Columns: 17
$ data_as_of <chr> "01/09/2023", "01/09/2023", "01…
$ start_date <chr> "2019-12-29", "2020-01-05", "20…
$ end_date <chr> "01/04/2020", "01/11/2020", "01…
$ group <chr> "By Week", "By Week", "By Week"…
$ year <chr> "2019/2020", "2020", "2020", "2…
$ month <dbl> NA, NA, NA, NA, NA, NA, NA, NA,…
$ mmwr_week <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, …
$ week_ending_date <chr> "01/04/2020", "01/11/2020", "01…
$ state <chr> "United States", "United States…
$ covid_19_deaths <dbl> 0, 1, 2, 3, 0, 4, 6, 6, 9, 38, …
$ total_deaths <dbl> 60176, 60734, 59362, 59162, 588…
$ percent_of_expected_deaths <dbl> 98, 97, 98, 99, 99, 100, 100, 1…
$ pneumonia_deaths <dbl> 4111, 4153, 4066, 3915, 3818, 3…
$ pneumonia_and_covid_19_deaths <dbl> 0, 1, 2, 0, 0, 1, 1, 3, 5, 19, …
$ influenza_deaths <dbl> 434, 475, 468, 500, 481, 520, 5…
$ pneumonia_influenza_or_covid_19_deaths <dbl> 4545, 4628, 4534, 4418, 4299, 4…
$ footnote <chr> NA, NA, NA, NA, NA, NA, NA, NA,…
> df
# A tibble: 10,800 × 17
data_as_of start_date end_d…¹ group year month mmwr_…² week_…³ state covid…⁴
<chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <chr> <chr> <dbl>
1 01/09/2023 2019-12-29 01/04/… By W… 2019… NA 1 01/04/… Unit… 0
2 01/09/2023 2020-01-05 01/11/… By W… 2020 NA 2 01/11/… Unit… 1
3 01/09/2023 2020-01-12 01/18/… By W… 2020 NA 3 01/18/… Unit… 2
4 01/09/2023 2020-01-19 01/25/… By W… 2020 NA 4 01/25/… Unit… 3
5 01/09/2023 2020-01-26 02/01/… By W… 2020 NA 5 02/01/… Unit… 0
6 01/09/2023 2020-02-02 02/08/… By W… 2020 NA 6 02/08/… Unit… 4
7 01/09/2023 2020-02-09 02/15/… By W… 2020 NA 7 02/15/… Unit… 6
8 01/09/2023 2020-02-16 02/22/… By W… 2020 NA 8 02/22/… Unit… 6
9 01/09/2023 2020-02-23 02/29/… By W… 2020 NA 9 02/29/… Unit… 9
10 01/09/2023 2020-03-01 03/07/… By W… 2020 NA 10 03/07/… Unit… 38
# … with 10,790 more rows, 7 more variables: total_deaths <dbl>,
# percent_of_expected_deaths <dbl>, pneumonia_deaths <dbl>,
# pneumonia_and_covid_19_deaths <dbl>, influenza_deaths <dbl>,
# pneumonia_influenza_or_covid_19_deaths <dbl>, footnote <chr>, and
# abbreviated variable names ¹ end_date, ² mmwr_week, ³ week_ending_date,
# ⁴ covid_19_deaths
# ℹ Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names

Identify the data we want to focus on

As we can see, there are 4 different groups, and the data covers both the whole United States and each state.

> unique(df$group)
[1] "By Week" "By Month" "By Year" "By Total"
>

We only want the weekly data, so we can `filter` with `group == "By Week"` and `state == "United States"`.

In the meantime, we may want to `select` only 2 columns:

- start_date
- covid_19_deaths

df1 <- df %>%
    filter(state == "United States" & group == "By Week") %>%
    select(start_date, covid_19_deaths)
print(df1, n=20)

# A tibble: 158 × 2
   start_date covid_19_deaths
    1 2019-12-29 0
    2 2020-01-05 1
    3 2020-01-12 2
    4 2020-01-19 3
    5 2020-01-26 0
    6 2020-02-02 4
    7 2020-02-09 6
    8 2020-02-16 6
    9 2020-02-23 9
    10 2020-03-01 38
    11 2020-03-08 60
    12 2020-03-15 588
    13 2020-03-22 3226
    14 2020-03-29 10141
    15 2020-04-05 16347
    16 2020-04-12 17221
    17 2020-04-19 15557
    18 2020-04-26 13223
    19 2020-05-03 11243
    20 2020-05-10 9239


## Draw the graph to see the wave

library("ggplot2")
library("sjPlot")
p = ggplot(df1, aes(x=start_date, y=covid_19_deaths, group=1)) +
    geom_line(color="blue") +
    theme(axis.text.x=element_text(angle=45,hjust=1,size=5))
save_plot("covid_plot_weekly_wave.svg", fig = p, width=60, height=20)

![Image](/images/covid_plot_weekly_wave.svg)

## Find the peak by R and mark it in the graph

From the above graph, we can easily figure out the waves and peaks, but we can also let R do it for us; this is pretty useful when we have to deal with a lot of data and many graphs.

To achieve this, we first call `findpeaks` from the `pracma` library to find the peaks.

library("pracma")
> peaks = findpeaks(df1$covid_19_deaths, npeaks=5, sortstr=TRUE)
> peaks
      [,1] [,2] [,3] [,4]
[1,] 26027   54   40   66
[2,] 21364  108   98  121
[3,] 17221   16    8   26
[4,] 15536   88   79   98
[5,]  8308   31   26   38


The 2nd column is the row index of the peak. In this case, we can tell that the 54th row has the top peak COVID death number, `26027`.

It's not very obvious which week (start_date) hits the peak, so we can do something like this.

is_peak <- vector("logical", length(df1$covid_19_deaths))
df1$is_peak = is_peak

for (x in peaks[,2]) {
    df1$is_peak[x] = TRUE
}


As you can see, we added a new column `is_peak`, so we can use it to filter out the non-peak data and sort the peak data points.

> df2 = df1 %>% filter(is_peak == TRUE)
> df2[order(-df2$covid_19_deaths),]
# A tibble: 5 × 3
  start_date covid_19_deaths is_peak
1 2021-01-03           26027 TRUE
2 2022-01-16           21364 TRUE
3 2020-04-12           17221 TRUE
4 2021-08-29           15536 TRUE
5 2020-07-26            8308 TRUE


## Highlight the peak points

p = ggplot(df1, aes(x=start_date, y=covid_19_deaths, group=1)) +
    geom_line(color="blue") +
    geom_point(data = . %>% filter(is_peak == TRUE), stat="identity", size = 4, color = "red") +
    scale_y_continuous(breaks=seq(0,30000,4000)) +
    theme(axis.text.x=element_text(angle=45,hjust=1,size=5))

save_plot("covid_plot_weekly_peak.svg", fig = p, width=60, height=20)


![Image](/images/covid_plot_weekly_peak.svg)

## Other findings

> sum(df1$covid_19_deaths)
[1] 1089714 ===> the total COVID-19 death number from 2020 to 2022

> summary(df1$covid_19_deaths)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
      0    2223    4428    6897    9862   26027

> df3 <- df %>%
    filter(state == "United States" & group == "By Week") %>%
    select(start_date, total_deaths)
> sum(df3$total_deaths)
[1] 10077273 ===> the total death number from 2020 to 2022

> summary(df3$total_deaths)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   7100   58522   60451   63780   68610   87415


Goal

Based on the overall death data and COVID-related death data since 2019, we want to study the trend of COVID's impact on overall US population deaths in these years.

Download the data

We will use NCHS(National Center for Health Statistics) as our data source.

Visit https://data.cdc.gov/browse?category=NCHS&sortBy=last_modified and search "VSRR Quarterly"; we will find the data we are interested in.

https://data.cdc.gov/NCHS/NCHS-VSRR-Quarterly-provisional-estimates-for-sele/489q-934x

On this page, we can export the data into a csv file as NCHS_-_VSRR_Quarterly_provisional_estimates_for_selected_indicators_of_mortality.csv

Take a quick look at the data

To load the data:

# If "readr" not installed, run install.packages("readr") to install it
library("readr")
df <- readr::read_csv(file.path(getwd(), "NCHS_-_VSRR_Quarterly_provisional_estimates_for_selected_indicators_of_mortality.csv"), col_names = TRUE)   

To check the first few lines:

> head(df)
# A tibble: 6 × 69
  Year a…¹ Time …² Cause…³ Rate …⁴ Unit  Overa…⁵ Rate …⁶ Rate …⁷ Rate …⁸ Rate …⁹
  <chr>    <chr>   <chr>   <chr>   <chr>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
1 2019 Q1  12 mon… All ca… Age-ad… Deat…   712.    600.    844.       NA      NA
2 2019 Q1  12 mon… Alzhei… Age-ad… Deat…    29.6    33.1    23.8      NA      NA
3 2019 Q1  12 mon… COVID-… Age-ad… Deat…    NA      NA      NA        NA      NA
4 2019 Q1  12 mon… Cancer  Age-ad… Deat…   148.    128.    175.       NA      NA
5 2019 Q1  12 mon… Chroni… Age-ad… Deat…    11       7.7    14.7      NA      NA
6 2019 Q1  12 mon… Chroni… Age-ad… Deat…    38.5    35.7    42.4      NA      NA
# … with 59 more variables: `Rate Age 15-24` <dbl>, `Rate Age 25-34` <dbl>,
#   `Rate Age 35-44` <dbl>, `Rate Age 45-54` <dbl>, `Rate Age 55-64` <dbl>,
#   `Rate 65-74` <dbl>, `Rate Age 75-84` <dbl>, `Rate Age 85 plus` <dbl>,
#   `Rate Alaska` <dbl>, `Rate Alabama` <dbl>, `Rate Arkansas` <dbl>,
#   `Rate Arizona` <dbl>, `Rate California` <dbl>, `Rate Colorado` <dbl>,
#   `Rate Connecticut` <dbl>, `Rate District of Columbia` <dbl>,
#   `Rate Delaware` <dbl>, `Rate Florida` <dbl>, `Rate Georgia` <dbl>, …
# ℹ Use `colnames()` to see all variable names   

To get a summary:

summary(df)
 Year and Quarter   Time Period        Cause of Death      Rate Type        
 Length:1232        Length:1232        Length:1232        Length:1232       
 Class :character   Class :character   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character   Mode  :character  

     Unit            Overall Rate     Rate Sex Female   Rate Sex Male    
 Length:1232        Min.   :   1.20   Min.   :   0.60   Min.   :   1.90  
 Class :character   1st Qu.:  11.40   1st Qu.:   6.95   1st Qu.:  13.20  
 Mode  :character   Median :  17.00   Median :  14.80   Median :  23.50  
                    Mean   :  80.51   Mean   :  70.56   Mean   :  91.63  
                    3rd Qu.:  50.60   3rd Qu.:  49.92   3rd Qu.:  65.42  
                    Max.   :1142.30   Max.   :1067.00   Max.   :1219.90  
                    NA's   :44        NA's   :44        NA's   :44       
  ...

To get a glimpse from a column-wise point of view:

> library("dplyr")
> glimpse(df)
Rows: 1,232
Columns: 69
$ `Year and Quarter`          <chr> "2019 Q1", "2019 Q1", "2019 Q1", "2019 Q1"…
$ `Time Period`               <chr> "12 months ending with quarter", "12 month…
$ `Cause of Death`            <chr> "All causes", "Alzheimer disease", "COVID-…
$ `Rate Type`                 <chr> "Age-adjusted", "Age-adjusted", "Age-adjus…
$ Unit                        <chr> "Deaths per 100,000", "Deaths per 100,000"…
$ `Overall Rate`              <dbl> 712.2, 29.6, NA, 148.1, 11.0, 38.5, 21.3, …
$ `Rate Sex Female`           <dbl> 600.3, 33.1, NA, 127.9, 7.7, 35.7, 16.8, 1…
$ `Rate Sex Male`             <dbl> 843.7, 23.8, NA, 175.4, 14.7, 42.4, 26.9, …
$ `Rate Age 1-4`              <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ `Rate Age 5-14`             <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ `Rate Age 15-24`            <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ `Rate Age 25-34`            <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ `Rate Age 35-44`            <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ `Rate Age 45-54`            <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ `Rate Age 55-64`            <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ `Rate 65-74`                <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ `Rate Age 75-84`            <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
...

Clean the column names

As we can see, the column names have spaces, which may cause trouble when referring to them in R; it's a common suggestion to convert spaces to "_" before doing any R operations.

A good thing is, there is an R package that can help us with this.

> library("janitor")
> df <- clean_names(df)
> glimpse(df)
 Rows: 1,232
 Columns: 69
 $ year_and_quarter          <chr> "2019 Q1", "2019 Q1", "2019 Q1", "2019 Q1", …
 $ time_period               <chr> "12 months ending with quarter", "12 months …
 $ cause_of_death            <chr> "All causes", "Alzheimer disease", "COVID-19…
 $ rate_type                 <chr> "Age-adjusted", "Age-adjusted", "Age-adjuste…
 $ unit                      <chr> "Deaths per 100,000", "Deaths per 100,000", …
 $ overall_rate              <dbl> 712.2, 29.6, NA, 148.1, 11.0, 38.5, 21.3, 20…
 $ rate_sex_female           <dbl> 600.3, 33.1, NA, 127.9, 7.7, 35.7, 16.8, 13.…
 $ rate_sex_male             <dbl> 843.7, 23.8, NA, 175.4, 14.7, 42.4, 26.9, 27…
 $ rate_age_1_4              <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …

As you can see, when we run glimpse(df), the column names have been changed, e.g. from "Year and Quarter" to "year_and_quarter".

Filter and select

The raw data has many columns and rows, but we just want to focus on the columns/rows we are really interested in.

We can use filter (applied to rows) and select (applied to columns) in this situation.

> df1 <- df %>%
      filter(time_period == "3-month period" & rate_type == "Crude" & cause_of_death %in% c("All causes", "COVID-19")) %>%
      select(year_and_quarter, cause_of_death, overall_rate)
> print(df1,n=50)
 # A tibble: 28 × 3
    year_and_quarter cause_of_death overall_rate
    <chr>            <chr>                 <dbl>
  1 2019 Q1          All causes            910
  2 2019 Q1          COVID-19               NA
  3 2019 Q2          All causes            851.
  4 2019 Q2          COVID-19               NA
  5 2019 Q3          All causes            827.
  6 2019 Q3          COVID-19               NA
  7 2019 Q4          All causes            891.
  8 2019 Q4          COVID-19               NA
  9 2020 Q1          All causes            945.
 10 2020 Q1          COVID-19                8.2
 11 2020 Q2          All causes           1035.
 12 2020 Q2          COVID-19              137.
 13 2020 Q3          All causes            985.
 14 2020 Q3          COVID-19               87.3
 15 2020 Q4          All causes           1142.
 16 2020 Q4          COVID-19              193.
 17 2021 Q1          All causes           1116
 18 2021 Q1          COVID-19              191.
 19 2021 Q2          All causes            915.
 20 2021 Q2          COVID-19               42.4
 21 2021 Q3          All causes           1051.
 22 2021 Q3          COVID-19              138.
 23 2021 Q4          All causes           1092.
 24 2021 Q4          COVID-19              131.
 25 2022 Q1          All causes           1116
 26 2022 Q1          COVID-19              150.
 27 2022 Q2          All causes            899.
 28 2022 Q2          COVID-19               17.6

%>% may look strange; it's just like | (pipe) in a Linux shell command.

df1 <- df %>%
    filter(time_period == "3-month period" & rate_type == "Crude" & cause_of_death %in% c("All causes", "COVID-19")) %>%
    select(year_and_quarter, cause_of_death, overall_rate)

It means: only keep the rows that meet the filter conditions, and only the columns named in select.

Deal with NA values

As you can see, the "overall_rate" column has a few "NA" values; in this context they mean 0, so we may want to convert them to 0 for future processing.

We can do it as below.

 > df1 <- df1 %>%
     mutate_at(c("overall_rate"), ~coalesce(.,0))

It means we want to convert all NAs to 0 in the "overall_rate" column.

Now, let's check df1 again; we can see all NAs have been changed to 0.

> df1
 # A tibble: 28 × 3
    year_and_quarter cause_of_death overall_rate
    <chr>            <chr>                 <dbl>
  1 2019 Q1          All causes            910
  2 2019 Q1          COVID-19                0
  3 2019 Q2          All causes            851.
  4 2019 Q2          COVID-19                0
  5 2019 Q3          All causes            827.
  6 2019 Q3          COVID-19                0
  7 2019 Q4          All causes            891.
  8 2019 Q4          COVID-19                0
  9 2020 Q1          All causes            945.
 10 2020 Q1          COVID-19                8.2
 # … with 18 more rows

Draw diagram for the whole US data

To draw a diagram directly:

> ggplot(df1, aes(fill=cause_of_death, x=year_and_quarter, y=overall_rate)) +
    geom_bar(position="stack", stat="identity") +
    geom_smooth(aes(group=cause_of_death)) +
    scale_y_continuous(breaks=seq(0,1500,100)) +
    theme_bw()

To save the diagram in a file:

library(sjPlot)
p = ggplot(df1, aes(fill=cause_of_death, x=year_and_quarter, y=overall_rate)) +
    geom_bar(position="stack", stat="identity") +
    geom_smooth(aes(group=cause_of_death)) +
    scale_y_continuous(breaks=seq(0,1500,100)) +
    theme_bw()

save_plot("covid_plot.svg", fig = p, width=30, height=20)

The diagram looks like below.

Draw diagram for California data

Do you want to try it by yourself?

Create/Calculate a new column for covid ratio

Next, we want to get to know the trend of the COVID ratio:

covid_ratio = overall_rate_of_covid / overall_rate_of_all_causes

covid_death_rate <- df1 %>%
    filter(cause_of_death == "COVID-19") %>%
    select("overall_rate")
all_causes_rate <- df1 %>%
    filter(cause_of_death == "All causes") %>%
    select(overall_rate)
covid_ratio <- covid_death_rate / all_causes_rate

df_ratio <- df1 %>%
    filter(cause_of_death == "All causes") %>%
    select(year_and_quarter)
df_ratio["covid_ratio"] = covid_ratio

> print(df_ratio)
# A tibble: 14 × 2
   year_and_quarter covid_ratio
 1 2019 Q1              0
 2 2019 Q2              0
 3 2019 Q3              0
 4 2019 Q4              0
 5 2020 Q1              0.00868
 6 2020 Q2              0.132
 7 2020 Q3              0.0886
 8 2020 Q4              0.169
 9 2021 Q1              0.171
10 2021 Q2              0.0463
11 2021 Q3              0.131
12 2021 Q4              0.120
13 2022 Q1              0.134
14 2022 Q2              0.0196

Draw diagram for covid ratio

Do you want to try it by yourself?

Installation

The simplest way is to use Homebrew:

$ brew install r

Another way is to download the installation package from https://cloud.r-project.org/

“Hello world” of R

Run it from R console

The command R starts an R console, and you can run R code inside it.

(base) ➜  benchling git:(b_test_pr) ✗ R

R version 4.2.2 (2022-10-31) -- "Innocent and Trusting"
Copyright (C) 2022 The R Foundation for Statistical Computing
Platform: aarch64-apple-darwin20 (64-bit)

...
> print("hello,world")
[1] "hello,world"

Run it from terminal

Rscript is a binary front-end to R for use in scripting applications; see https://linux.die.net/man/1/rscript for more details.

(base) ➜  R git:(b_test_pr) ✗ cat hello.R
print("hello,world")
(base) ➜  R git:(b_test_pr) ✗ Rscript hello.R
[1] "hello,world"

Install commonly used packages

The R installation comes with a lot of useful packages; besides those, many more useful packages are available from CRAN.

Here are some of the most commonly used packages in R for data science.

  • ggplot2
  • data.table
  • dplyr
  • tidyr
  • Shiny
  • plotly
  • knitr
  • mlr3

To install those packages from CRAN, we can simply follow the steps below.

  • Start R console
  • Call install.packages("<package name>")

Here is an example:

> install.packages("mlr3")
--- Please select a CRAN mirror for use in this session ---
Secure CRAN mirrors

 1: 0-Cloud [https]
 2: Australia (Canberra) [https]
 3: Australia (Melbourne 1) [https]
 ....
 Selection: 1
also installing the dependencies ‘globals’, ‘listenv’, ‘PRROC’, ‘future’, ‘future.apply’, ‘lgr’, ‘mlbench’, ‘mlr3measures’, ‘mlr3misc’, ‘parallelly’, ‘palmerpenguins’, ‘paradox’

trying URL 'https://cloud.r-project.org/bin/macosx/big-sur-arm64/contrib/4.2/globals_0.16.2.tgz'
...
> library(mlr3)
> ?mlr3

As above, after the installation completes, we can run library(<package name>) to verify it, and run ?<package name> to see its documentation.


When users access a non-existent page in a Ghost blog, they may see a page showing the message "404 page not found".

We can redirect the page to another page by creating an error-404.hbs file under the theme folder.

In the following example, we redirect the users to the website home page when they access a non-existent page.

<!DOCTYPE html>
<html>
  <head>
    <meta http-equiv="refresh" content="0; url='http://localhost:2368/'" />
  </head>
  <body>
  </body>
</html>

Install GNOME desktop via the groups option of yum:

$ yum update
$ yum -y groups install "GNOME Desktop"

Tell the startx command which desktop environment to run:

$ echo "exec gnome-session" >> ~/.xinitrc

Manually start the GUI desktop:

$ startx

Automatically start the GUI desktop after reboot:

$ systemctl set-default graphical.target
$ reboot