Each CockroachDB node is an equally suitable SQL gateway to your cluster, but to ensure client performance and reliability, it’s important to use load balancing.
Performance: Load balancers spread client traffic across nodes. This prevents any one node from being overwhelmed by requests and improves overall cluster performance (queries per second).
Reliability: Load balancers decouple client health from the health of a single CockroachDB node. In cases where a node fails, the load balancer redirects client traffic to available nodes.
HAProxy is one of the most popular open-source TCP load balancers, and CockroachDB includes a built-in command for generating a configuration file that is preset to work with your running cluster.
With a single load balancer, client connections are resilient to node failure, but the load balancer itself is a point of failure. It’s therefore best to make load balancing resilient as well by using multiple load balancing instances, with a mechanism like floating IPs or DNS to select load balancers for clients.
For performance and availability reasons, it is not recommended to run the HAproxy on the same node as the cockroachDB.
This post shows a detailed steps how to deploy a HAProxy load balancer for CockroachDB cluster.
Deploy the CockroachDB cluster
[root@host1 ~]# ps -ef | grep cockroach | grep -v grep
root 4783 1 99 18:59 ? 2-17:35:59 cockroach start --log-dir=/var/log/cockroachdb_logs/1 --store=/mnt/cockroachdb_mnt1 --insecure --listen-addr=host1:26257 --http-addr=host1:8080 --join=host1:26257,host1:26258,host1:26259,host1:26260,host1:26261,host1:26262,host1:26263,host1:26264,host2:26257,host2:26258,host2:26259,host2:26260,host2:26261,host2:26262,host2:26263,host2:26264,host3:26257,host3:26258,host3:26259,host3:26260,host3:26261,host3:26262,host3:26263,host3:26264 --cache=2GiB --max-sql-memory=2GiB
root 4876 1 95 18:59 ? 03:33:42 cockroach start --log-dir=/var/log/cockroachdb_logs/2 --store=/mnt/cockroachdb_mnt2 --insecure --listen-addr=host1:26258 --http-addr=host1:8081 --join=host1:26257,host1:26258,host1:26259,host1:26260,host1:26261,host1:26262,host1:26263,host1:26264,host2:26257,host2:26258,host2:26259,host2:26260,host2:26261,host2:26262,host2:26263,host2:26264,host3:26257,host3:26258,host3:26259,host3:26260,host3:26261,host3:26262,host3:26263,host3:26264 --cache=2GiB --max-sql-memory=2GiB
root 4932 1 77 18:59 ? 02:53:13 cockroach start --log-dir=/var/log/cockroachdb_logs/3 --store=/mnt/cockroachdb_mnt3 --insecure --listen-addr=host1:26259 --http-addr=host1:8082 --join=host1:26257,host1:26258,host1:26259,host1:26260,host1:26261,host1:26262,host1:26263,host1:26264,host2:26257,host2:26258,host2:26259,host2:26260,host2:26261,host2:26262,host2:26263,host2:26264,host3:26257,host3:26258,host3:26259,host3:26260,host3:26261,host3:26262,host3:26263,host3:26264 --cache=2GiB --max-sql-memory=2GiB
root 5049 1 99 18:59 ? 06:02:17 cockroach start --log-dir=/var/log/cockroachdb_logs/4 --store=/mnt/cockroachdb_mnt4 --insecure --listen-addr=host1:26260 --http-addr=host1:8083 --join=host1:26257,host1:26258,host1:26259,host1:26260,host1:26261,host1:26262,host1:26263,host1:26264,host2:26257,host2:26258,host2:26259,host2:26260,host2:26261,host2:26262,host2:26263,host2:26264,host3:26257,host3:26258,host3:26259,host3:26260,host3:26261,host3:26262,host3:26263,host3:26264 --cache=2GiB --max-sql-memory=2GiB
root 5131 1 94 18:59 ? 03:31:51 cockroach start --log-dir=/var/log/cockroachdb_logs/5 --store=/mnt/cockroachdb_mnt5 --insecure --listen-addr=host1:26261 --http-addr=host1:8084 --join=host1:26257,host1:26258,host1:26259,host1:26260,host1:26261,host1:26262,host1:26263,host1:26264,host2:26257,host2:26258,host2:26259,host2:26260,host2:26261,host2:26262,host2:26263,host2:26264,host3:26257,host3:26258,host3:26259,host3:26260,host3:26261,host3:26262,host3:26263,host3:26264 --cache=2GiB --max-sql-memory=2GiB
root 5220 1 72 18:59 ? 02:42:39 cockroach start --log-dir=/var/log/cockroachdb_logs/6 --store=/mnt/cockroachdb_mnt6 --insecure --listen-addr=host1:26262 --http-addr=host1:8085 --join=host1:26257,host1:26258,host1:26259,host1:26260,host1:26261,host1:26262,host1:26263,host1:26264,host2:26257,host2:26258,host2:26259,host2:26260,host2:26261,host2:26262,host2:26263,host2:26264,host3:26257,host3:26258,host3:26259,host3:26260,host3:26261,host3:26262,host3:26263,host3:26264 --cache=2GiB --max-sql-memory=2GiB
root 5309 1 94 18:59 ? 03:30:22 cockroach start --log-dir=/var/log/cockroachdb_logs/7 --store=/mnt/cockroachdb_mnt7 --insecure --listen-addr=host1:26263 --http-addr=host1:8086 --join=host1:26257,host1:26258,host1:26259,host1:26260,host1:26261,host1:26262,host1:26263,host1:26264,host2:26257,host2:26258,host2:26259,host2:26260,host2:26261,host2:26262,host2:26263,host2:26264,host3:26257,host3:26258,host3:26259,host3:26260,host3:26261,host3:26262,host3:26263,host3:26264 --cache=2GiB --max-sql-memory=2GiB
root 5358 1 58 18:59 ? 02:11:20 cockroach start --log-dir=/var/log/cockroachdb_logs/8 --store=/mnt/cockroachdb_mnt8 --insecure --listen-addr=host1:26264 --http-addr=host1:8087 --join=host1:26257,host1:26258,host1:26259,host1:26260,host1:26261,host1:26262,host1:26263,host1:26264,host2:26257,host2:26258,host2:26259,host2:26260,host2:26261,host2:26262,host2:26263,host2:26264,host3:26257,host3:26258,host3:26259,host3:26260,host3:26261,host3:26262,host3:26263,host3:26264 --cache=2GiB --max-sql-memory=2GiB
[root@host1 ~]# cockroach node status --insecure --host=host1:26257
id | address | sql_address | build | started_at | updated_at | locality | is_available | is_live
-----+---------------------+---------------------+---------+----------------------------+----------------------------+----------+--------------+----------
1 | host1:26257 | host1:26257 | v22.1.6 | 2022-09-17 18:59:37.989384 | 2022-09-17 22:43:39.507057 | | true | true
2 | host1:26258 | host1:26258 | v22.1.6 | 2022-09-17 18:59:39.128398 | 2022-09-17 22:43:40.640431 | | true | true
3 | host1:26259 | host1:26259 | v22.1.6 | 2022-09-17 18:59:39.374127 | 2022-09-17 22:43:40.885678 | | true | true
4 | host1:26260 | host1:26260 | v22.1.6 | 2022-09-17 18:59:39.726667 | 2022-09-17 22:43:36.73931 | | true | true
5 | host1:26261 | host1:26261 | v22.1.6 | 2022-09-17 18:59:39.971549 | 2022-09-17 22:43:36.983538 | | true | true
6 | host1:26262 | host1:26262 | v22.1.6 | 2022-09-17 18:59:40.225358 | 2022-09-17 22:43:37.235768 | | true | true
7 | host1:26263 | host1:26263 | v22.1.6 | 2022-09-17 18:59:40.447688 | 2022-09-17 22:43:37.459962 | | true | true
8 | host1:26264 | host1:26264 | v22.1.6 | 2022-09-17 18:59:40.677217 | 2022-09-17 22:43:37.690883 | | true | true
9 | host2:26257 | host2:26257 | v22.1.6 | 2022-09-17 18:59:41.604348 | 2022-09-17 22:43:38.613662 | | true | true
10 | host2:26258 | host2:26258 | v22.1.6 | 2022-09-17 18:59:41.904071 | 2022-09-17 22:43:38.912773 | | true | true
11 | host2:26259 | host2:26259 | v22.1.6 | 2022-09-17 18:59:42.194446 | 2022-09-17 22:43:39.202688 | | true | true
12 | host2:26260 | host2:26260 | v22.1.6 | 2022-09-17 18:59:42.512201 | 2022-09-17 22:43:39.519727 | | true | true
13 | host2:26261 | host2:26261 | v22.1.6 | 2022-09-17 18:59:42.850974 | 2022-09-17 22:43:39.860063 | | true | true
14 | host2:26262 | host2:26262 | v22.1.6 | 2022-09-17 18:59:43.177162 | 2022-09-17 22:43:40.191082 | | true | true
15 | host2:26263 | host2:26263 | v22.1.6 | 2022-09-17 18:59:43.543973 | 2022-09-17 22:43:40.553146 | | true | true
16 | host2:26264 | host2:26264 | v22.1.6 | 2022-09-17 18:59:43.858993 | 2022-09-17 22:43:40.86705 | | true | true
17 | host3:26257 | host3:26257 | v22.1.6 | 2022-09-17 18:59:44.807641 | 2022-09-17 22:43:37.315927 | | true | true
18 | host3:26258 | host3:26258 | v22.1.6 | 2022-09-17 18:59:45.119533 | 2022-09-17 22:43:37.628497 | | true | true
19 | host3:26259 | host3:26259 | v22.1.6 | 2022-09-17 18:59:45.508535 | 2022-09-17 22:43:38.015714 | | true | true
20 | host3:26260 | host3:26260 | v22.1.6 | 2022-09-17 18:59:45.901631 | 2022-09-17 22:43:38.409443 | | true | true
21 | host3:26261 | host3:26261 | v22.1.6 | 2022-09-17 18:59:46.191724 | 2022-09-17 22:43:38.700531 | | true | true
22 | host3:26262 | host3:26262 | v22.1.6 | 2022-09-17 18:59:46.540504 | 2022-09-17 22:43:39.048942 | | true | true
23 | host3:26263 | host3:26263 | v22.1.6 | 2022-09-17 18:59:46.944353 | 2022-09-17 22:43:39.453175 | | true | true
24 | host3:26264 | host3:26264 | v22.1.6 | 2022-09-17 18:59:47.274458 | 2022-09-17 22:43:39.782378 | | true | true
(24 rows)
Generate HAProxy configuration file
Install HAProxy on one of the CockroachDB nodes:
[root@host1 ~]# yum install haproxy
[root@host1 ~]# haproxy -v
HA-Proxy version 1.5.18 2016/05/10
Copyright 2000-2016 Willy Tarreau <willy@haproxy.org>
Generate the HAProxy configuration file:
[root@host1 ~]# cockroach gen haproxy --host=host1:26257 --insecure
[root@host1 ~]# cat haproxy.cfg
global
maxconn 4096
defaults
mode tcp
# Timeout values should be configured for your specific use.
# See: https://cbonte.github.io/haproxy-dconv/1.8/configuration.html#4-timeout%20connect
# With the timeout connect 5 secs,
# if the backend server is not responding, haproxy will make a total
# of 3 connection attempts waiting 5s each time before giving up on the server,
# for a total of 15 seconds.
retries 2
timeout connect 5s
# timeout client and server govern the maximum amount of time of TCP inactivity.
# The server node may idle on a TCP connection either because it takes time to
# execute a query before the first result set record is emitted, or in case of
# some trouble on the server. So these timeout settings should be larger than the
# time to execute the longest (most complex, under substantial concurrent workload)
# query, yet not too large so truly failed connections are lingering too long
# (resources associated with failed connections should be freed reasonably promptly).
timeout client 10m
timeout server 10m
# TCP keep-alive on client side. Server already enables them.
option clitcpka
listen psql
bind :26257
mode tcp
balance roundrobin
option httpchk GET /health?ready=1
server cockroach1 host1:26257 check port 8080
server cockroach2 host1:26258 check port 8081
server cockroach3 host1:26259 check port 8082
server cockroach4 host1:26260 check port 8083
server cockroach5 host1:26261 check port 8084
server cockroach6 host1:26262 check port 8085
server cockroach7 host1:26263 check port 8086
server cockroach8 host1:26264 check port 8087
server cockroach9 host2:26257 check port 8080
server cockroach10 host2:26258 check port 8081
server cockroach11 host2:26259 check port 8082
server cockroach12 host2:26260 check port 8083
server cockroach13 host2:26261 check port 8084
server cockroach14 host2:26262 check port 8085
server cockroach15 host2:26263 check port 8086
server cockroach16 host2:26264 check port 8087
server cockroach17 host3:26257 check port 8080
server cockroach18 host3:26258 check port 8081
server cockroach19 host3:26259 check port 8082
server cockroach20 host3:26260 check port 8083
server cockroach21 host3:26261 check port 8084
server cockroach22 host3:26262 check port 8085
server cockroach23 host3:26263 check port 8086
server cockroach24 host3:26264 check port 8087
Start HAProxy
It’s not recommended to deploy HAProxy on the same node as CockroachDB for performance and availability reasons. Thus, we are going to start the HAProxy on a dedicated node.
Connect to the dedicated HAProxy node and install the proxy on it:
[root@host4 ~]# yum install haproxy
Start HAproxy with the generated configuration file:
[root@host4 ~]# scp host1:/root/haproxy.cfg ./
[root@host4 ~]# haproxy -f haproxy.cfg
Verify the HAProxy
The Load Balancer distributes the requests sent to the cluster equally between the different nodes.
From primary node, we can check which node is being connected as below:
[root@host1 ~]# cockroach sql --host=host4 --insecure
root@host4:26257/defaultdb> SHOW node_id;
node_id
-----------
1
root@host4:26257/defaultdb> exit
[root@host1 ~]# cockroach sql --host=host4 --insecure
root@host4:26257/defaultdb> SHOW node_id;
node_id
-----------
2
[root@host1 ~]# cockroach sql --host=host4 --insecure
root@host4:26257/defaultdb> SHOW node_id;
node_id
-----------
3
Run TPCC through the HAProxy load balancer
[root@host4 cockroachdb]# cockroach workload run tpcc --warehouses=5000 --split --scatter --ramp=30s --duration=5m 'postgresql://root@host4:26257/tpcc?sslmode=disable'
Reference