Benchmarking an Elasticsearch cluster with Rally

Posted on 2022-10-26, edited on 2023-10-22, in Tech, Benchmarking

Install Rally on each node

Prerequisites

$ yum update
$ yum install openssl-devel bzip2-devel libffi-devel
$ yum groupinstall "Development Tools"

Install Python 3.8+

$ wget https://www.python.org/ftp/python/3.8.15/Python-3.8.15.tar.xz
$ tar xf Python-3.8.15.tar.xz
$ vim Python-3.8.15/Modules/Setup
SSL=/usr/local/ssl
_ssl _ssl.c \
        -DUSE_SSL -I$(SSL)/include -I$(SSL)/include/openssl \
        -L$(SSL)/lib -lssl -lcrypto
$ mv Python-3.8.15 /usr/src
$ cd /usr/src/Python-3.8.15/
$ ./configure --enable-optimizations
$ make altinstall
$ python3.8 -m ssl
$ pip3.8 install --upgrade pip
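The heading above asks for Python 3.8 or later, so it can be worth checking the interpreter version before installing esrally. The following is a minimal sketch; the helper name `python_version_ok` is made up for illustration and it only compares the major and minor numbers.

```shell
# Hypothetical helper: succeed if a version string such as "3.8.15" is 3.8 or newer.
python_version_ok() {
    major=${1%%.*}          # text before the first dot
    rest=${1#*.}            # text after the first dot
    minor=${rest%%.*}       # text up to the next dot
    [ "$major" -gt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -ge 8 ]; }
}

# Example with the version built above. On a real node the argument could be
# "$(python3.8 -c 'import platform; print(platform.python_version())')" instead.
if python_version_ok "3.8.15"; then
    echo "Python version is new enough"
else
    echo "Python version is too old" >&2
fi
```

The check is string-based on purpose, so it can be run against any reported version without touching the interpreter.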

Install Git (Not required for load generator node)

$ yum install libcurl-devel
$ wget https://mirrors.edge.kernel.org/pub/software/scm/git/git-2.38.1.tar.xz
$ tar xvf git-2.38.1.tar.xz
$ cd git-2.38.1/
$ make configure
$ ./configure --prefix=/usr/local
$ make install
$ git --version
git version 2.38.1

Install JDK (Not required for load generator node)

$ yum install java
$ java -version
openjdk version "1.8.0_352"

Install esrally

$ pip3.8 install esrally
$ esrally --version
esrally 2.6.0

Elasticsearch cannot be launched as root. Create a non-root user on each node.

$ groupadd es
$ useradd es -g es
$ passwd es
$ cd /home/es
$ su - es

Set JAVA_HOME path

$ vim .bash_profile
JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.352.b08-2.el7_9.x86_64/jre
export JAVA_HOME
$ source .bash_profile
$ echo $JAVA_HOME
/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.352.b08-2.el7_9.x86_64/jre
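The JAVA_HOME value above is the resolved path of the java binary with the trailing /bin/java removed. A small sketch of that derivation follows; `java_home_from_binary` is a hypothetical helper name, and the path is the one used in this post.

```shell
# Hypothetical helper: strip the trailing /bin/java from a resolved java path.
java_home_from_binary() {
    printf '%s\n' "${1%/bin/java}"
}

# Example with the path from this post. On a real node the argument could
# come from: readlink -f "$(command -v java)"
java_home_from_binary "/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.352.b08-2.el7_9.x86_64/jre/bin/java"
```

The printed value can then be copied into .bash_profile as shown above.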

Benchmarking a Single Node

Install Elasticsearch

$ esrally install --distribution-version=7.17.0 --node-name="rally-node-0" --network-host="127.0.0.1" --http-port=39200 --master-nodes="rally-node-0" --seed-hosts="127.0.0.1:39300"

    ____        ____
   / __ \____ _/ / /_  __
  / /_/ / __ `/ / / / / /
 / _, _/ /_/ / / / /_/ /
/_/ |_|\__,_/_/_/\__, /
                /____/

[INFO] Downloading Elasticsearch 7.17.0 (297.0 MB total size)                       [100%]
{
  "installation-id": "10735bfa-f1b8-44c4-8e7f-8932c8daa201"
}

--------------------------------
[INFO] SUCCESS (took 10 seconds)
--------------------------------
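The installation id printed in the JSON snippet above is needed by the start and stop commands. Instead of copying it by hand, it can be pulled out of the command output. This is a sketch only: `extract_installation_id` is a hypothetical helper, and it assumes the output keeps the `"installation-id": "..."` form shown above.

```shell
# Hypothetical helper: read `esrally install` output on stdin and print the
# value of the "installation-id" field.
extract_installation_id() {
    sed -n 's/.*"installation-id"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Example using the output shown above. On a real node the input would be
# the output of the `esrally install ...` command itself.
sample_output='[INFO] Downloading Elasticsearch 7.17.0 (297.0 MB total size)
{
  "installation-id": "10735bfa-f1b8-44c4-8e7f-8932c8daa201"
}'
INSTALLATION_ID=$(printf '%s\n' "$sample_output" | extract_installation_id)
echo "$INSTALLATION_ID"
```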

Start the Elasticsearch node

$ export INSTALLATION_ID=10735bfa-f1b8-44c4-8e7f-8932c8daa201
$ export RACE_ID=$(uuidgen)
$ esrally start --installation-id="${INSTALLATION_ID}" --race-id="${RACE_ID}"

    ____        ____
   / __ \____ _/ / /_  __
  / /_/ / __ `/ / / / / /
 / _, _/ /_/ / / / /_/ /
/_/ |_|\__,_/_/_/\__, /
                /____/


-------------------------------
[INFO] SUCCESS (took 3 seconds)
-------------------------------

Run a benchmark

$ esrally race --pipeline=benchmark-only --target-hosts=127.0.0.1:39200 --track=geonames --challenge=append-no-conflicts-index-only --on-error=abort --race-id=${RACE_ID}

    ____        ____
   / __ \____ _/ / /_  __
  / /_/ / __ `/ / / / / /
 / _, _/ /_/ / / / /_/ /
/_/ |_|\__,_/_/_/\__, /
                /____/

[INFO] Race id is [4e93324a-9326-49e8-be72-9f77cc837657]
[INFO] Downloading track data (252.9 MB total size)                               [100.0%]
[INFO] Decompressing track data from [/home/es/.rally/benchmarks/data/geonames/documents-2.json.bz2] to [/home/es/.rally/benchmarks/data/geonames/documents-2.json] (resulting size: [3.30] GB) ... [OK]
[INFO] Preparing file offset table for [/home/es/.rally/benchmarks/data/geonames/documents-2.json] ... [OK]
[INFO] Racing on track [geonames], challenge [append-no-conflicts-index-only] and car ['external'] with version [7.17.0].

Running delete-index                                                           [100% done]
Running create-index                                                           [100% done]
Running check-cluster-health                                                   [100% done]
Running index-append                                                           [100% done]
Running force-merge                                                            [100% done]
Running wait-until-merges-finish                                               [100% done]

------------------------------------------------------
    _______             __   _____
   / ____(_)___  ____ _/ /  / ___/_________  ________
  / /_  / / __ \/ __ `/ /   \__ \/ ___/ __ \/ ___/ _ \
 / __/ / / / / / /_/ / /   ___/ / /__/ /_/ / /  /  __/
/_/   /_/_/ /_/\__,_/_/   /____/\___/\____/_/   \___/
------------------------------------------------------

|                                                         Metric |         Task |           Value |   Unit |
|---------------------------------------------------------------:|-------------:|----------------:|-------:|
|                     Cumulative indexing time of primary shards |              |    12.5856      |    min |
|             Min cumulative indexing time across primary shards |              |     0.0190333   |    min |
|          Median cumulative indexing time across primary shards |              |     2.49065     |    min |
|             Max cumulative indexing time across primary shards |              |     2.61288     |    min |
|            Cumulative indexing throttle time of primary shards |              |     0.0081      |    min |
|    Min cumulative indexing throttle time across primary shards |              |     0           |    min |
| Median cumulative indexing throttle time across primary shards |              |     0           |    min |
|    Max cumulative indexing throttle time across primary shards |              |     0.0061      |    min |
|                        Cumulative merge time of primary shards |              |     6.75938     |    min |
|                       Cumulative merge count of primary shards |              |    55           |        |
|                Min cumulative merge time across primary shards |              |     0           |    min |
|             Median cumulative merge time across primary shards |              |     1.32735     |    min |
|                Max cumulative merge time across primary shards |              |     1.47518     |    min |
|               Cumulative merge throttle time of primary shards |              |     1.72393     |    min |
|       Min cumulative merge throttle time across primary shards |              |     0           |    min |
|    Median cumulative merge throttle time across primary shards |              |     0.323533    |    min |
|       Max cumulative merge throttle time across primary shards |              |     0.42085     |    min |
|                      Cumulative refresh time of primary shards |              |     2.97073     |    min |
|                     Cumulative refresh count of primary shards |              |   246           |        |
|              Min cumulative refresh time across primary shards |              |     0.00185     |    min |
|           Median cumulative refresh time across primary shards |              |     0.5943      |    min |
|              Max cumulative refresh time across primary shards |              |     0.599217    |    min |
|                        Cumulative flush time of primary shards |              |     0.237767    |    min |
|                       Cumulative flush count of primary shards |              |     8           |        |
|                Min cumulative flush time across primary shards |              |     0.00241667  |    min |
|             Median cumulative flush time across primary shards |              |     0.0473      |    min |
|                Max cumulative flush time across primary shards |              |     0.0492167   |    min |
|                                        Total Young Gen GC time |              |    14.009       |      s |
|                                       Total Young Gen GC count |              |  1159           |        |
|                                          Total Old Gen GC time |              |     3.491       |      s |
|                                         Total Old Gen GC count |              |    66           |        |
|                                                     Store size |              |     3.20482     |     GB |
|                                                  Translog size |              |     3.07336e-07 |     GB |
|                                         Heap used for segments |              |     1.00685     |     MB |
|                                       Heap used for doc values |              |     0.0602913   |     MB |
|                                            Heap used for terms |              |     0.77124     |     MB |
|                                            Heap used for norms |              |     0.104492    |     MB |
|                                           Heap used for points |              |     0           |     MB |
|                                    Heap used for stored fields |              |     0.0708237   |     MB |
|                                                  Segment count |              |   141           |        |
|                                    Total Ingest Pipeline count |              |     0           |        |
|                                     Total Ingest Pipeline time |              |     0           |      s |
|                                   Total Ingest Pipeline failed |              |     0           |        |
|                                                 Min Throughput | index-append | 86654           | docs/s |
|                                                Mean Throughput | index-append | 87091.9         | docs/s |
|                                              Median Throughput | index-append | 87152.4         | docs/s |
|                                                 Max Throughput | index-append | 87276.3         | docs/s |
|                                        50th percentile latency | index-append |   315.247       |     ms |
|                                        90th percentile latency | index-append |   573.935       |     ms |
|                                        99th percentile latency | index-append |  1139.89        |     ms |
|                                       100th percentile latency | index-append |  1153.22        |     ms |
|                                   50th percentile service time | index-append |   315.247       |     ms |
|                                   90th percentile service time | index-append |   573.935       |     ms |
|                                   99th percentile service time | index-append |  1139.89        |     ms |
|                                  100th percentile service time | index-append |  1153.22        |     ms |
|                                                     error rate | index-append |     0           |      % |


---------------------------------
[INFO] SUCCESS (took 255 seconds)
---------------------------------
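To compare runs, it helps to pull single values out of the summary table rather than reading it by eye. The sketch below assumes the pipe-separated layout printed above (Metric, Task, Value, Unit); `rally_metric` is a hypothetical helper, not part of Rally.

```shell
# Hypothetical helper: print the Value column for a given metric and task
# from a Rally summary table read on stdin.
rally_metric() {
    awk -F'|' -v metric="$1" -v task="$2" '
        {
            m = $2; t = $3; v = $4
            # Trim the padding around each cell before comparing.
            gsub(/^[ \t]+|[ \t]+$/, "", m)
            gsub(/^[ \t]+|[ \t]+$/, "", t)
            gsub(/^[ \t]+|[ \t]+$/, "", v)
            if (m == metric && t == task) print v
        }'
}

# Two rows copied from the table above.
sample_rows='|                                                Mean Throughput | index-append | 87091.9         | docs/s |
|                                        99th percentile latency | index-append |  1139.89        |     ms |'
printf '%s\n' "$sample_rows" | rally_metric "Mean Throughput" "index-append"
```

Metrics without a task, such as Store size, can be looked up by passing an empty string as the second argument.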

Stop the Elasticsearch node

$ esrally stop --installation-id="${INSTALLATION_ID}"

If you only want to shut down the node without deleting the installation and its data, also pass --preserve-install.

Benchmarking a Cluster

Install and start Elasticsearch on each cluster node

$ esrally install --distribution-version=7.17.0 --node-name="rally-node-0" --network-host="10.10.10.2" --http-port=39200 --master-nodes="rally-node-0,rally-node-1,rally-node-2" --seed-hosts="10.10.10.2:39300,10.10.10.3:39300,10.10.10.4:39300"
[INFO] Downloading Elasticsearch 7.17.0 (297.0 MB total size)  [100%]
{
  "installation-id": "aa826112-d371-4f09-9b68-f9084e7c9e0b"
}
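The seed host list in the install command and the host list used later for the benchmark are the same IP addresses with different ports (39300 for transport, 39200 for HTTP). A sketch for building both from one list of nodes follows; `join_hosts` is a hypothetical helper and the addresses are the ones used in this post.

```shell
# Hypothetical helper: join_hosts PORT HOST... prints "HOST:PORT,HOST:PORT,...".
join_hosts() {
    port=$1
    shift
    result=""
    for host in "$@"; do
        # Prepend a comma only when the list already has an entry.
        result="${result:+$result,}$host:$port"
    done
    printf '%s\n' "$result"
}

SEED_HOSTS=$(join_hosts 39300 10.10.10.2 10.10.10.3 10.10.10.4)
HTTP_HOSTS=$(join_hosts 39200 10.10.10.2 10.10.10.3 10.10.10.4)
echo "seed hosts: $SEED_HOSTS"
echo "http hosts: $HTTP_HOSTS"
```

Keeping the node list in one place avoids a mismatch between the hosts the cluster was formed with and the hosts the load generator targets.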

Generate a race id on one of the nodes

$ uuidgen
734bb4b3-8b7a-4c0b-9fa6-aaeb4659569f

Note: Use the same race id on all nodes, including the node that generates the load.

Start the cluster by running the following command on each node

$ export INSTALLATION_ID=aa826112-d371-4f09-9b68-f9084e7c9e0b
$ export RACE_ID=734bb4b3-8b7a-4c0b-9fa6-aaeb4659569f
$ esrally start --installation-id="${INSTALLATION_ID}" --race-id="${RACE_ID}"

Note: INSTALLATION_ID is specific to each node, while RACE_ID is identical on all nodes.
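Because the race id is pasted by hand on every node, a truncated or mistyped value is an easy mistake to make. The sketch below checks the format before starting; it assumes the id was generated with uuidgen as above, and `is_uuid` is a hypothetical helper.

```shell
# Hypothetical helper: succeed if the argument looks like a UUID as printed by uuidgen.
is_uuid() {
    printf '%s\n' "$1" | grep -Eq '^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$'
}

RACE_ID=734bb4b3-8b7a-4c0b-9fa6-aaeb4659569f
if is_uuid "$RACE_ID"; then
    echo "race id looks valid"
else
    echo "race id is malformed" >&2
fi
```

This only validates the shape of the value. It cannot tell whether every node was given the same id.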

Once the cluster is started, check the cluster status with the _cat/health API

[es@node1 ~]$ curl http://10.10.10.2:39200/_cat/health\?v
epoch      timestamp cluster         status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1667015754 03:55:54  rally-benchmark green           3         3      6   3    0    0        0             0                  -                100.0%
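When scripting the setup, the status column can be extracted from the health output so that the benchmark only starts on a green cluster. A minimal sketch, assuming the header line produced by the v parameter is present; `cluster_status` is a hypothetical helper.

```shell
# Hypothetical helper: read `_cat/health?v` output on stdin and print the status column.
cluster_status() {
    awk 'NR == 1 { for (i = 1; i <= NF; i++) if ($i == "status") col = i; next }
         NR == 2 && col { print $col }'
}

# Sample output copied from above. On a real node the input could come from:
#   curl -s 'http://10.10.10.2:39200/_cat/health?v'
sample='epoch      timestamp cluster         status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1667015754 03:55:54  rally-benchmark green           3         3      6   3    0    0        0             0                  -                100.0%'
printf '%s\n' "$sample" | cluster_status
```

Looking the column up by its header name keeps the helper working if the column order differs between Elasticsearch versions.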

On each cluster node, check the Elasticsearch process and port

$ ps -ef | egrep -i "rally|elastic" | grep -v grep
es        2258     1 91 20:52 ?        00:58:39 /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.352.b08-2.el7_9.x86_64/jre/bin/java -Xshare:auto -Des.networkaddress.cache.ttl=60 -Des.networkaddress.cache.negative.ttl=10 -XX:+AlwaysPreTouch -Xss1m -Djava.awt.headless=true -Dfile.encoding=UTF-8 -Djna.nosys=true -XX:-OmitStackTraceInFastThrow -Dio.netty.noUnsafe=true -Dio.netty.noKeySetOptimization=true -Dio.netty.recycler.maxCapacityPerThread=0 -Dio.netty.allocator.numDirectArenas=0 -Dlog4j.shutdownHookEnabled=false -Dlog4j2.disable.jmx=true -Dlog4j2.formatMsgNoLookups=true -Djava.locale.providers=SPI,JRE -Xms1g -Xmx1g -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -Des.networkaddress.cache.ttl=60 -Des.networkaddress.cache.negative.ttl=10 -XX:+AlwaysPreTouch -Xss1m -Djava.awt.headless=true -Dfile.encoding=UTF-8 -Djna.nosys=true -XX:-OmitStackTraceInFastThrow -Dio.netty.noUnsafe=true -Dio.netty.noKeySetOptimization=true -Dio.netty.recycler.maxCapacityPerThread=0 -Dlog4j.shutdownHookEnabled=false -Dlog4j2.disable.jmx=true -Djava.io.tmpdir=/tmp/elasticsearch-1322059221495755520 -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/home/es/.rally/benchmarks/races/aa826112-d371-4f09-9b68-f9084e7c9e0b/rally-node-0/heapdump -XX:ErrorFile=/home/es/.rally/benchmarks/races/aa826112-d371-4f09-9b68-f9084e7c9e0b/rally-node-0/logs/server/hs_err_pid%p.log -XX:+ExitOnOutOfMemoryError -XX:MaxDirectMemorySize=536870912 -Des.path.home=/home/es/.rally/benchmarks/races/aa826112-d371-4f09-9b68-f9084e7c9e0b/rally-node-0/install/elasticsearch-7.17.0 -Des.path.conf=/home/es/.rally/benchmarks/races/aa826112-d371-4f09-9b68-f9084e7c9e0b/rally-node-0/install/elasticsearch-7.17.0/config -Des.distribution.flavor=default -Des.distribution.type=tar -Des.bundled_jdk=true -cp /home/es/.rally/benchmarks/races/aa826112-d371-4f09-9b68-f9084e7c9e0b/rally-node-0/install/elasticsearch-7.17.0/lib/* org.elasticsearch.bootstrap.Elasticsearch -d -p ./pid
es        2285  2258  0 20:52 ?        00:00:00 /home/es/.rally/benchmarks/races/aa826112-d371-4f09-9b68-f9084e7c9e0b/rally-node-0/install/elasticsearch-7.17.0/modules/x-pack-ml/platform/linux-x86_64/bin/controller

$ netstat -anop | grep 39200
tcp6       0      0 10.10.10.2:39200       :::*                    LISTEN      2258/java            off (0.00/0/0)

Start the benchmark on the load generator node (remember to set the race id there)

[es@node1 ~]$ export RACE_ID=734bb4b3-8b7a-4c0b-9fa6-aaeb4659569f
[es@node1 ~]$ esrally race --pipeline=benchmark-only --target-hosts=10.10.10.2:39200,10.10.10.3:39200,10.10.10.4:39200 --track=geonames --challenge=append-no-conflicts --on-error=abort --race-id=${RACE_ID}
    ____        ____
   / __ \____ _/ / /_  __
  / /_/ / __ `/ / / / / /
 / _, _/ /_/ / / / /_/ /
/_/ |_|\__,_/_/_/\__, /
                /____/

[INFO] Race id is [734bb4b3-8b7a-4c0b-9fa6-aaeb4659569f]
[INFO] Racing on track [geonames], challenge [append-no-conflicts] and car ['external'] with version [7.17.0].

[WARNING] merges_total_time is 420149 ms indicating that the cluster is not in a defined clean state. Recorded index time metrics may be misleading.
[WARNING] merges_total_throttled_time is 81765 ms indicating that the cluster is not in a defined clean state. Recorded index time metrics may be misleading.
[WARNING] indexing_total_time is 825388 ms indicating that the cluster is not in a defined clean state. Recorded index time metrics may be misleading.
[WARNING] refresh_total_time is 76340 ms indicating that the cluster is not in a defined clean state. Recorded index time metrics may be misleading.
[WARNING] flush_total_time is 10787 ms indicating that the cluster is not in a defined clean state. Recorded index time metrics may be misleading.
Running delete-index                                                           [100% done]
Running create-index                                                           [100% done]
Running check-cluster-health                                                   [100% done]
Running index-append                                                           [100% done]
Running refresh-after-index                                                    [100% done]
Running force-merge                                                            [100% done]
Running refresh-after-force-merge                                              [100% done]
Running wait-until-merges-finish                                               [100% done]
Running index-stats                                                            [100% done]
Running node-stats                                                             [100% done]
Running default                                                                [100% done]
Running term                                                                   [100% done]
Running phrase                                                                 [100% done]
Running country_agg_uncached                                                   [100% done]
Running country_agg_cached                                                     [100% done]
Running scroll                                                                 [100% done]
Running expression                                                             [100% done]
Running painless_static                                                        [100% done]
Running painless_dynamic                                                       [100% done]
Running decay_geo_gauss_function_score                                         [100% done]
Running decay_geo_gauss_script_score                                           [100% done]
Running field_value_function_score                                             [100% done]
Running field_value_script_score                                               [100% done]
Running large_terms                                                            [100% done]
Running large_filtered_terms                                                   [100% done]
Running large_prohibited_terms                                                 [100% done]
Running desc_sort_population                                                   [100% done]
Running asc_sort_population                                                    [100% done]
Running asc_sort_with_after_population                                         [100% done]
Running desc_sort_geonameid                                                    [100% done]
Running desc_sort_with_after_geonameid                                         [100% done]
Running asc_sort_geonameid                                                     [100% done]
Running asc_sort_with_after_geonameid                                          [100% done]

------------------------------------------------------
    _______             __   _____
   / ____(_)___  ____ _/ /  / ___/_________  ________
  / /_  / / __ \/ __ `/ /   \__ \/ ___/ __ \/ ___/ _ \
 / __/ / / / / / /_/ / /   ___/ / /__/ /_/ / /  /  __/
/_/   /_/_/ /_/\__,_/_/   /____/\___/\____/_/   \___/
------------------------------------------------------

|                                                         Metric |                           Task |          Value |    Unit |
|---------------------------------------------------------------:|-------------------------------:|---------------:|--------:|
|                     Cumulative indexing time of primary shards |                                |   13.3055      |     min |
|             Min cumulative indexing time across primary shards |                                |    0           |     min |
|          Median cumulative indexing time across primary shards |                                |    2.68572     |     min |
|             Max cumulative indexing time across primary shards |                                |    2.74885     |     min |
|            Cumulative indexing throttle time of primary shards |                                |    0           |     min |
|    Min cumulative indexing throttle time across primary shards |                                |    0           |     min |
| Median cumulative indexing throttle time across primary shards |                                |    0           |     min |
|    Max cumulative indexing throttle time across primary shards |                                |    0           |     min |
|                        Cumulative merge time of primary shards |                                |    4.82182     |     min |
|                       Cumulative merge count of primary shards |                                |   57           |         |
|                Min cumulative merge time across primary shards |                                |    0           |     min |
|             Median cumulative merge time across primary shards |                                |    0.984917    |     min |
|                Max cumulative merge time across primary shards |                                |    1.06472     |     min |
|               Cumulative merge throttle time of primary shards |                                |    0.978367    |     min |
|       Min cumulative merge throttle time across primary shards |                                |    0           |     min |
|    Median cumulative merge throttle time across primary shards |                                |    0.195508    |     min |
|       Max cumulative merge throttle time across primary shards |                                |    0.265933    |     min |
|                      Cumulative refresh time of primary shards |                                |    1.20573     |     min |
|                     Cumulative refresh count of primary shards |                                |  148           |         |
|              Min cumulative refresh time across primary shards |                                |    3.33333e-05 |     min |
|           Median cumulative refresh time across primary shards |                                |    0.258775    |     min |
|              Max cumulative refresh time across primary shards |                                |    0.283433    |     min |
|                        Cumulative flush time of primary shards |                                |    0.172783    |     min |
|                       Cumulative flush count of primary shards |                                |   11           |         |
|                Min cumulative flush time across primary shards |                                |    1.66667e-05 |     min |
|             Median cumulative flush time across primary shards |                                |    0.0345      |     min |
|                Max cumulative flush time across primary shards |                                |    0.0385167   |     min |
|                                        Total Young Gen GC time |                                |   16.263       |       s |
|                                       Total Young Gen GC count |                                | 2821           |         |
|                                          Total Old Gen GC time |                                |    2.312       |       s |
|                                         Total Old Gen GC count |                                |   41           |         |
|                                                     Store size |                                |    3.03867     |      GB |
|                                                  Translog size |                                |    3.58559e-07 |      GB |
|                                         Heap used for segments |                                |    0.701981    |      MB |
|                                       Heap used for doc values |                                |    0.0314178   |      MB |
|                                            Heap used for terms |                                |    0.54541     |      MB |
|                                            Heap used for norms |                                |    0.0736694   |      MB |
|                                           Heap used for points |                                |    0           |      MB |
|                                    Heap used for stored fields |                                |    0.0514832   |      MB |
|                                                  Segment count |                                |  102           |         |
|                                    Total Ingest Pipeline count |                                |    0           |         |
|                                     Total Ingest Pipeline time |                                |    0           |       s |
|                                   Total Ingest Pipeline failed |                                |    0           |         |
|                                                     error rate |                   index-append |    0           |       % |
|                                                 Min Throughput |                    index-stats |   90.01        |   ops/s |
|                                                Mean Throughput |                    index-stats |   90.02        |   ops/s |
|                                              Median Throughput |                    index-stats |   90.02        |   ops/s |
|                                                 Max Throughput |                    index-stats |   90.04        |   ops/s |
|                                        50th percentile latency |                    index-stats |    5.16153     |      ms |
|                                        90th percentile latency |                    index-stats |    6.00114     |      ms |
|                                        99th percentile latency |                    index-stats |    6.61081     |      ms |
|                                      99.9th percentile latency |                    index-stats |   10.4064      |      ms |
|                                       100th percentile latency |                    index-stats |   10.8105      |      ms |
|                                   50th percentile service time |                    index-stats |    4.00402     |      ms |
|                                   90th percentile service time |                    index-stats |    4.6339      |      ms |
|                                   99th percentile service time |                    index-stats |    5.10083     |      ms |
|                                 99.9th percentile service time |                    index-stats |    9.17415     |      ms |
|                                  100th percentile service time |                    index-stats |    9.22474     |      ms |
[..]

[WARNING] No throughput metrics available for [index-append]. Likely cause: The benchmark ended already during warmup.

----------------------------------
[INFO] SUCCESS (took 4008 seconds)
----------------------------------

Shut down the cluster by running the following command on each node

$ esrally stop --installation-id="${INSTALLATION_ID}"

Note: If you only want to shut down the node without deleting the installation and its data, also add the option --preserve-install.

Troubleshooting

Elasticsearch fails to start because vm.max_map_count is too low

$ cat /home/es/.rally/benchmarks/races/aa826112-d371-4f09-9b68-f9084e7c9e0b/rally-node-0/logs/server/rally-benchmark.log
[..]
[2022-10-28T16:58:41,041][ERROR][o.e.b.Bootstrap          ] [rally-node-0] node validation exception
[1] bootstrap checks failed. You must address the points described in the following [1] lines before starting Elasticsearch.
bootstrap check failure [1] of [1]: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]
[..]

$ sysctl -a | grep max_map_count
vm.max_map_count = 65530

To fix this issue, change the kernel parameter

$ vim /etc/sysctl.conf
vm.max_map_count=1048576
$ sysctl -p
vm.max_map_count = 1048576
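The bootstrap check message above states a minimum of 262144, so this condition can be tested on every node before starting the cluster. A minimal sketch; `max_map_count_ok` is a hypothetical helper and the two sample values are the ones seen in this post.

```shell
# Hypothetical helper: succeed if the given vm.max_map_count value meets the
# minimum of 262144 quoted in the bootstrap check message.
max_map_count_ok() {
    [ "$1" -ge 262144 ]
}

for value in 65530 1048576; do
    if max_map_count_ok "$value"; then
        echo "$value: ok"
    else
        echo "$value: too low"
    fi
done
# On a real node the current value could be read with: sysctl -n vm.max_map_count
```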

Restart Elasticsearch and verify the process and port

$ esrally stop --installation-id="${INSTALLATION_ID}" --preserve-install
$ esrally start --installation-id="${INSTALLATION_ID}" --race-id="${RACE_ID}"

$ netstat -anop | grep 39200
tcp6       0      0 10.10.10.2:39200       :::*                    LISTEN      23726/java           off (0.00/0/0)

$ ps -ef | egrep -i "rally|elastic" | grep -v grep
es       23726     1 18 18:01 ?        00:00:42 /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.352.b08-2.el7_9.x86_64/jre/bin/java -Xshare:auto -Des.networkaddress.cache.ttl=60 -Des.networkaddress.cache.negative.ttl=10 -XX:+AlwaysPreTouch -Xss1m -Djava.awt.headless=true -Dfile.encoding=UTF-8 -Djna.nosys=true -XX:-OmitStackTraceInFastThrow -Dio.netty.noUnsafe=true -Dio.netty.noKeySetOptimization=true -Dio.netty.recycler.maxCapacityPerThread=0 -Dio.netty.allocator.numDirectArenas=0 -Dlog4j.shutdownHookEnabled=false -Dlog4j2.disable.jmx=true -Dlog4j2.formatMsgNoLookups=true -Djava.locale.providers=SPI,JRE -Xms1g -Xmx1g -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -Des.networkaddress.cache.ttl=60 -Des.networkaddress.cache.negative.ttl=10 -XX:+AlwaysPreTouch -Xss1m -Djava.awt.headless=true -Dfile.encoding=UTF-8 -Djna.nosys=true -XX:-OmitStackTraceInFastThrow -Dio.netty.noUnsafe=true -Dio.netty.noKeySetOptimization=true -Dio.netty.recycler.maxCapacityPerThread=0 -Dlog4j.shutdownHookEnabled=false -Dlog4j2.disable.jmx=true -Djava.io.tmpdir=/tmp/elasticsearch-7969870787666953814 -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/home/es/.rally/benchmarks/races/aa826112-d371-4f09-9b68-f9084e7c9e0b/rally-node-0/heapdump -XX:ErrorFile=/home/es/.rally/benchmarks/races/aa826112-d371-4f09-9b68-f9084e7c9e0b/rally-node-0/logs/server/hs_err_pid%p.log -XX:+ExitOnOutOfMemoryError -XX:MaxDirectMemorySize=536870912 -Des.path.home=/home/es/.rally/benchmarks/races/aa826112-d371-4f09-9b68-f9084e7c9e0b/rally-node-0/install/elasticsearch-7.17.0 -Des.path.conf=/home/es/.rally/benchmarks/races/aa826112-d371-4f09-9b68-f9084e7c9e0b/rally-node-0/install/elasticsearch-7.17.0/config -Des.distribution.flavor=default -Des.distribution.type=tar -Des.bundled_jdk=true -cp /home/es/.rally/benchmarks/races/aa826112-d371-4f09-9b68-f9084e7c9e0b/rally-node-0/install/elasticsearch-7.17.0/lib/* org.elasticsearch.bootstrap.Elasticsearch -d -p ./pid
es       23752 23726  0 18:01 ?        00:00:00 /home/es/.rally/benchmarks/races/aa826112-d371-4f09-9b68-f9084e7c9e0b/rally-node-0/install/elasticsearch-7.17.0/modules/x-pack-ml/platform/linux-x86_64/bin/controller

Elasticsearch fails to start because the max file descriptors limit is too low

$ cat /home/es/.rally/benchmarks/races/aa826112-d371-4f09-9b68-f9084e7c9e0b/rally-node-0/logs/server/rally-benchmark.log
[..]
[2022-10-28T18:02:09,107][ERROR][o.e.b.Bootstrap          ] [rally-node-1] node validation exception
[1] bootstrap checks failed. You must address the points described in the following [1] lines before starting Elasticsearch.
bootstrap check failure [1] of [1]: max file descriptors [4096] for elasticsearch process is too low, increase to at least [65535]
[..]

To fix this issue, change the max open files value in /etc/security/limits.conf

$ vim /etc/security/limits.conf
*              soft     nofile          1048576
*              hard     nofile          1048576

Log out and log back in for the new limit to take effect, then verify it

$ ulimit -a | grep "open files"
open files                      (-n) 1048576
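Both failures above show up in the server log as lines starting with "bootstrap check failure". When a node does not come up, filtering the log for those lines is a quick first step. The sketch below assumes that line format; `bootstrap_failures` is a hypothetical helper and the sample is copied from the log excerpt above.

```shell
# Hypothetical helper: print only the bootstrap check failure messages from
# a server log read on stdin, without the "[n] of [m]" prefix.
bootstrap_failures() {
    grep 'bootstrap check failure' |
        sed 's/^bootstrap check failure \[[0-9]*\] of \[[0-9]*\]: //'
}

sample_log='[2022-10-28T18:02:09,107][ERROR][o.e.b.Bootstrap          ] [rally-node-1] node validation exception
[1] bootstrap checks failed. You must address the points described in the following [1] lines before starting Elasticsearch.
bootstrap check failure [1] of [1]: max file descriptors [4096] for elasticsearch process is too low, increase to at least [65535]'
printf '%s\n' "$sample_log" | bootstrap_failures
```

On a real node the input would be the rally-benchmark.log file under the node's logs/server directory, as in the cat commands above.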
