In this article, we will build a docker image based on the Python Alpine Linux base image. Alpine Linux is much smaller than most distribution base images (~5MB), which leads to much slimmer images overall.

On top of the Python Alpine Linux image, we can add the additional packages needed by our python script “perfbench.py”. The following is the Dockerfile we will use to build the docker image.

$ cat Dockerfile
FROM python:3-alpine

RUN apk add --no-cache \
    bash \
    sudo \
    lsblk \
    util-linux \
    procps \
    fio==3.28-r1

COPY perfbench/perfbench.py /

Now that we have Dockerfile defined, we can build the docker image as below.

$ docker build -t perfbench .

$ docker images
REPOSITORY                           TAG                 IMAGE ID            CREATED             SIZE
perfbench                            latest              fb9441429ea1        21 minutes ago      61.2MB
python                               3-alpine            08d07b62c1c9        2 days ago          48.6MB

Detached mode

To start a container in detached mode, we can use the -d option.

$ docker run -t -d --privileged --name myperfbench -v /data:/data perfbench

$ docker ps -a
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS                         NAMES
3473aa8332cb        perfbench            "python3"                4 seconds ago       Up 3 seconds                                      myperfbench

$ docker exec -it myperfbench bash
bash-5.1# cat /etc/alpine-release
3.15.0
bash-5.1# python --version
Python 3.10.2

Notes:

1. Adding the “-t” flag allocates a pseudo-tty, which prevents the container from exiting when running in the background. Without it, the container’s default python3 interpreter exits immediately, as shown below.

$ docker run -d --name myperfbench perfbench

$ docker ps -a
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS                     PORTS                         NAMES
ed997c56e9e9        perfbench            "python3"                6 seconds ago       Exited (0) 4 seconds ago                                 myperfbench

2. The “--privileged” flag gives extended privileges to the container. For example, it allows us to drop the page cache from inside the container.

Foreground mode

For interactive processes (like a shell), you must use -i -t together in order to allocate a tty for the container process.

$ docker run -it --rm --privileged --name myperfbench -v /data:/data perfbench bash

$ docker run --rm --privileged --name myperfbench -v /data:/data perfbench bash -c "python perfbench.py --dir /data --logdir /data/result"

Note:

  • By default a container’s file system persists even after the container exits. This makes debugging a lot easier (since you can inspect the final state) and you retain all your data by default. But if you are running short-term foreground processes, these container file systems can really pile up. If instead you’d like Docker to automatically clean up the container and remove the file system when the container exits, you can add the --rm flag.

Push docker image to docker repository

$ docker tag perfbench:latest noname/perfbench:latest
$ docker push noname/perfbench:latest

Why Docker-based fio benchmarking

fio is a flexible I/O tester which generates I/O and measures I/O performance on the target storage system. If we want to run fio workloads on cloud deployments, we can containerize fio. We can also encapsulate the necessary packages in the docker image so that it can be easily deployed without package dependency issues.

There are ready-to-use fio docker images online if you search with Google. In this article, we discuss how to create a docker image which consumes a python script to run the fio workload.

Build docker image with Dockerfile

Docker can build images automatically by reading the instructions from a Dockerfile. A Dockerfile is a text document that contains all the commands a user could call on the command line to assemble an image. Using docker build users can create an automated build that executes several command-line instructions in succession.

The following is the Dockerfile we are going to use to build the docker image.

$ cat Dockerfile
FROM python:3-alpine

RUN apk add --no-cache \
    fio==3.28-r1 \
    sudo \
    lsblk \
    util-linux \
    procps

COPY perfbench/perfbench.py /
COPY perfbench/run.sh /

ENTRYPOINT [ "/run.sh" ]

We use Alpine Linux, which is a security-oriented, lightweight Linux distribution based on musl libc and BusyBox. Since we need python support, we leverage the official python:3-alpine image, which is based on Alpine Linux.

We install the latest supported fio-3.28 into the docker image, along with packages like sudo, lsblk, util-linux and procps which are needed by the python script. We copy the python script and the wrapper shell script to the root directory. The run.sh script will run once the container starts, in order to launch the fio benchmark.

The following is the run.sh script.

$ cat perfbench/run.sh
#!/bin/sh

[ -z "$FIO_DATA_DIR" ] && echo "FIO_DATA_DIR variable is required." && exit 1;
[ -z "$FIO_LOG_DIR" ] && echo "FIO_LOG_DIR variable is required." && exit 1;
[ ! -d "$FIO_DATA_DIR" ] && echo "The data directory $FIO_DATA_DIR does not exist." && exit 1;
[ ! -d "$FIO_LOG_DIR" ] && echo "The result directory $FIO_LOG_DIR does not exist." && exit 1;
echo "Running fio benchmark on directory $FIO_DATA_DIR"
python perfbench.py --dir "$FIO_DATA_DIR" --logdir "$FIO_LOG_DIR"

Now, we can build the docker image with the Dockerfile.

$ docker build -t perfbench .
Sending build context to Docker daemon  19.46kB
Step 1/5 : FROM python:3-alpine
Step 2/5 : RUN apk add --no-cache     fio==3.28-r1     sudo     lsblk     util-linux     procps
Step 3/5 : COPY perfbench/perfbench.py /
Step 4/5 : COPY perfbench/run.sh /
Step 5/5 : ENTRYPOINT /run.sh
Successfully built 9c0957911607
Successfully tagged perfbench:latest

$ docker image list
REPOSITORY                           TAG                 IMAGE ID            CREATED             SIZE
perfbench                            latest              f63e13d57991        32 minutes ago      60MB
python                               3-alpine            c7100ae3ac4d        2 weeks ago         48.7MB

Run fio benchmark with docker

Use the following command to run the fio benchmark. Note that the fio benchmark logic is defined in the customized python script consumed by the container.

$ docker run --rm --privileged -v /data:/data -e FIO_DATA_DIR=/data -e FIO_LOG_DIR=/data/result perfbench

Note that the “--privileged” option allows the python script in the container to drop the page cache. The same goal can also be achieved with the following approach; we can then do “echo 3 > /drop_caches” in the container to drop the cache.

$ docker run --rm -v /proc/sys/vm/drop_caches:/drop_caches -v /data:/data -e FIO_DATA_DIR=/data -e FIO_LOG_DIR=/data/result perfbench

Data structure

Bloom filter

A Bloom filter is a space-efficient probabilistic data structure that is used to test whether an element is a member of a set. For example, checking the availability of a username is a set membership problem, where the set is the list of all registered usernames. The price we pay for this efficiency is that the structure is probabilistic in nature, which means there might be some false positive results. A false positive means it might report that a given username is already taken when actually it is not.
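
The idea can be sketched in a few lines of Python. This is only a minimal illustration: the bit-array size m and hash count k below are arbitrary, and real implementations use cheaper hash functions than SHA-256.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter sketch: an m-bit array and k hash functions."""

    def __init__(self, m=1024, k=3):
        self.m, self.k, self.bits = m, k, 0

    def _positions(self, item):
        # Derive k bit positions from salted SHA-256 digests.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(h, 16) % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        # False means definitely absent; True means only "possibly present".
        return all(self.bits >> pos & 1 for pos in self._positions(item))

bf = BloomFilter()
bf.add("alice")
print(bf.might_contain("alice"))   # True
print(bf.might_contain("bob"))     # almost certainly False here, but false
                                   # positives are possible in general
```

Note that the filter never answers "definitely present" — only deletion-free membership with one-sided error.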

Source1
Source2

Merkle tree

In cryptography and computer science, a hash tree or Merkle tree is a tree in which every “leaf” (node) is labelled with the cryptographic hash of a data block, and every node that is not a leaf (called a branch, inner node, or inode) is labelled with the cryptographic hash of the labels of its child nodes. A hash tree allows efficient and secure verification of the contents of a large data structure. A hash tree is a generalization of a hash list and a hash chain.

Demonstrating that a leaf node is a part of a given binary hash tree requires computing a number of hashes proportional to the logarithm of the number of leaf nodes in the tree. Conversely, in a hash list, the number is proportional to the number of leaf nodes itself. A Merkle tree is therefore an efficient example of a cryptographic commitment scheme, in which the root of the tree is seen as a commitment and leaf nodes may be revealed and proven to be part of the original commitment.

Source

Vector Clock

A vector clock is a data structure used for determining the partial ordering of events in a distributed system and detecting causality violations. Just as in Lamport timestamps, inter-process messages contain the state of the sending process’s logical clock. A vector clock of a system of N processes is an array/vector of N logical clocks, one clock per process; a local “largest possible values” copy of the global clock-array is kept in each process.
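
The update rules can be sketched as plain Python functions over lists of logical clocks (the three-process scenario below is invented for illustration):

```python
def tick(clock, pid):
    """Local event on process pid: increment its own component."""
    clock = clock.copy()
    clock[pid] += 1
    return clock

def merge(local, received, pid):
    """On message receipt: take the element-wise max, then tick the receiver."""
    merged = [max(a, b) for a, b in zip(local, received)]
    return tick(merged, pid)

def happened_before(a, b):
    """a -> b iff every component of a <= b and at least one is strictly less."""
    return all(x <= y for x, y in zip(a, b)) and a != b

# Three processes; all clocks start at [0, 0, 0].
p0, p1 = [0, 0, 0], [0, 0, 0]
p0 = tick(p0, 0)          # local event on P0          -> [1, 0, 0]
p1 = merge(p1, p0, 1)     # P0's message arrives at P1 -> [1, 1, 0]
print(happened_before([1, 0, 0], p1))   # True: the send precedes the receive
```

Two events whose clocks are incomparable (e.g. [1, 0, 0] and [0, 1, 0]) are concurrent, which is exactly the causality information Lamport timestamps alone cannot recover.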

Source

Linux

Daemon

In multitasking computer operating systems, a daemon is a computer program that runs as a background process, rather than being under the direct control of an interactive user. Traditionally, the process name of a daemon ends with the letter d, to clarify that the process is in fact a daemon and to differentiate it from a normal computer program. For example, syslogd is a daemon that implements the system logging facility, and sshd is a daemon that serves incoming SSH connections.

In a Unix environment, the parent process of a daemon is often, but not always, the init process. A daemon is usually created either by a process forking a child process and then immediately exiting, thus causing init to adopt the child process, or by the init process directly launching the daemon. In addition, a daemon launched by forking and exiting typically must perform other operations, such as dissociating the process from any controlling terminal (tty). Such procedures are often implemented in various convenience routines such as daemon(3) in Unix.

Systems often start daemons at boot time that will respond to network requests, hardware activity, or other programs by performing some task. Daemons such as cron may also perform defined tasks at scheduled times.
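
The double-fork procedure described above can be sketched in Python. This is a simplified stand-in for daemon(3): error handling and the usual redirection of stdin/stdout/stderr are omitted.

```python
import os, tempfile, time

def daemonize():
    """Classic double-fork, in the spirit of daemon(3)."""
    if os.fork() > 0:
        os._exit(0)     # first parent exits; the child is adopted by init
    os.setsid()         # start a new session, dropping the controlling tty
    if os.fork() > 0:
        os._exit(0)     # session leader exits, so the grandchild can never
                        # reacquire a controlling terminal
    os.chdir("/")       # avoid pinning whatever directory we started in
    # A real daemon would also redirect stdin/stdout/stderr to /dev/null.

# Demonstration: the daemonized grandchild records its pid in a file.
marker = os.path.join(tempfile.mkdtemp(), "daemon.pid")
child = os.fork()
if child == 0:
    daemonize()
    with open(marker, "w") as f:
        f.write(str(os.getpid()))
    os._exit(0)

os.waitpid(child, 0)                  # only the intermediate child is reaped
for _ in range(500):                  # the daemon itself runs unsupervised
    if os.path.exists(marker) and open(marker).read():
        break
    time.sleep(0.01)
daemon_pid = int(open(marker).read())
print(daemon_pid != os.getpid())      # True: the work ran in another process
```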

Source

SSH

The Secure Shell Protocol (SSH) is a cryptographic network protocol for operating network services securely over an unsecured network.[2] Its most notable applications are remote login and command-line execution.

SSH applications are based on a client–server architecture, connecting an SSH client instance with an SSH server.[3] SSH operates as a layered protocol suite comprising three principal hierarchical components: the transport layer provides server authentication, confidentiality, and integrity; the user authentication protocol validates the user to the server; and the connection protocol multiplexes the encrypted tunnel into multiple logical communication channels.

SSH was designed on Unix-like operating systems, as a replacement for Telnet and for unsecured remote Unix shell protocols, such as the Berkeley Remote Shell (rsh) and the related rlogin and rexec protocols, which all use insecure, plaintext transmission of authentication tokens.

Source

Telnet

Telnet is an application protocol used on the Internet or local area network to provide a bidirectional interactive text-oriented communication facility using a virtual terminal connection. User data is interspersed in-band with Telnet control information in an 8-bit byte oriented data connection over the Transmission Control Protocol (TCP).

Source

MISC

Chaos Engineering

Chaos engineering is the discipline of experimenting on a software system in production in order to build confidence in the system’s capability to withstand turbulent and unexpected conditions.

Source

Serialization

In computing, serialization is the process of translating a data structure or object state into a format that can be stored (for example, in a file or memory data buffer) or transmitted (for example, over a computer network) and reconstructed later (possibly in a different computer environment). When the resulting series of bits is reread according to the serialization format, it can be used to create a semantically identical clone of the original object. For many complex objects, such as those that make extensive use of references, this process is not straightforward. Serialization of object-oriented objects does not include any of their associated methods with which they were previously linked.

This process of serializing an object is also called marshalling an object in some situations. The opposite operation, extracting a data structure from a series of bytes, is deserialization (also called unserialization or unmarshalling).
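
A quick Python illustration of both points, using the standard json and pickle modules — note how pickle preserves shared references, which a naive text format cannot express:

```python
import json, pickle

record = {"user": "alice", "scores": [97, 88], "active": True}

# JSON: a portable, text-based serialization format.
as_json = json.dumps(record)
print(json.loads(as_json) == record)   # True: a semantically identical clone

# pickle: Python-specific binary serialization that keeps reference structure.
shared = [1, 2]
obj = {"a": shared, "b": shared}       # two references to the same list
clone = pickle.loads(pickle.dumps(obj))
clone["a"].append(3)
print(clone["b"])                      # [1, 2, 3]: the sharing survived
```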

Source

Callback functions

In computer programming, a callback, also known as a “call-after” function, is any reference to executable code that is passed as an argument to other code; that other code is expected to call back (execute) the code at a given time. This execution may be immediate as in a synchronous callback, or it might happen at a later point in time as in an asynchronous callback. Programming languages support callbacks in different ways, often implementing them with subroutines, lambda expressions, blocks, or function pointers.
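
Both flavors can be shown in a short Python sketch. The fetch and fetch_async names are made up for this example, and the "fetch" does no real I/O:

```python
import threading

def fetch(url, on_done):
    """Pretend to fetch a resource, then hand the result to the callback."""
    result = f"contents of {url}"        # stand-in for real I/O
    on_done(result)                      # synchronous: runs before fetch returns

def fetch_async(url, on_done):
    """Same idea, but the callback fires later, on another thread."""
    timer = threading.Timer(0.01, lambda: on_done(f"contents of {url}"))
    timer.start()
    return timer

received = []
fetch("http://example.com", received.append)
print(received)                          # ['contents of http://example.com']

timer = fetch_async("http://example.com", received.append)
timer.join()                             # wait for the asynchronous callback
print(len(received))                     # 2
```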

Source

Consistent Hashing

In computer science, consistent hashing is a special kind of hashing technique such that when a hash table is resized, only n/m keys need to be remapped on average where n is the number of keys and m is the number of slots. In contrast, in most traditional hash tables, a change in the number of array slots causes nearly all keys to be remapped because the mapping between the keys and the slots is defined by a modular operation.
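
The n/m claim can be checked with a toy hash ring in Python. MD5 is an arbitrary choice for the ring hash here, and a modest number of virtual nodes is used to smooth the distribution:

```python
import bisect, hashlib

class HashRing:
    """Toy consistent-hash ring with virtual nodes."""

    def __init__(self, nodes, vnodes=100):
        self.ring = sorted(
            (self._hash(f"{node}#{v}"), node) for node in nodes for v in range(vnodes)
        )
        self._hashes = [h for h, _ in self.ring]

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def node_for(self, key):
        # Walk clockwise to the first virtual node at or after the key's hash.
        i = bisect.bisect(self._hashes, self._hash(key)) % len(self.ring)
        return self.ring[i][1]

keys = [f"user:{i}" for i in range(1000)]
before = HashRing(["node-a", "node-b", "node-c"])
after = HashRing(["node-a", "node-b", "node-c", "node-d"])   # one slot added
moved = sum(before.node_for(k) != after.node_for(k) for k in keys)
print(moved)   # roughly 1000/4 keys move — not nearly all of them
```

With a modular hash table, adding a fourth slot would instead remap about three quarters of the keys.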

Source

Object Storage

Object storage, also known as object-based storage, is a strategy that manages and manipulates data storage as distinct units, called objects. These objects are kept in a single storehouse and are not ingrained in files inside other folders. Instead, object storage combines the pieces of data that make up a file, adds all its relevant metadata to that file, and attaches a custom identifier.

Object storage adds comprehensive metadata to the file, eliminating the tiered file structure used in file storage, and places everything into a flat address space, called a storage pool. This metadata is key to the success of object storage in that it provides deep analysis of the use and function of data in the storage pool.

Object storage vs. file storage vs. block storage

Object storage takes each piece of data and designates it as an object. Data is kept in separate storehouses versus files in folders and is bundled with associated metadata and a unique identifier to form a storage pool.

File storage stores data as a single piece of information in a folder to help organize it among other data. This is also called hierarchical storage, imitating the way that paper files are stored. When you need access to data, your computer system needs to know the path to find it.

Block storage takes a file apart into singular blocks of data and then stores these blocks as separate pieces of data. Each piece of data has a different address, so they don’t need to be stored in a file structure.

Benefits of object storage

Now that we’ve described what object storage is, what are its benefits?

  • Greater data analytics. Object storage is driven by metadata, and with this level of classification for every piece of data, the opportunity for analysis is far greater.
  • Infinite scalability. Keep adding data, forever. There’s no limit.
  • Faster data retrieval. Due to the categorization structure of object storage, and the lack of folder hierarchy, you can retrieve your data much faster.
  • Reduction in cost. Due to the scale-out nature of object storage, it’s less costly to store all your data.
  • Optimization of resources. Because object storage does not have a filing hierarchy, and the metadata is completely customizable, there are far fewer limitations than with file or block storage.

Object storage use cases

There are multiple use cases for object storage. For example, it can assist you in the following ways:

  • Deliver rich media. Define workflows by leveraging industry-leading solutions for managing unstructured data. Reduce your costs for globally distributed rich media.
  • Manage distributed content. Optimize the value of your data throughout its lifecycle and deliver competitive storage services.
  • Embrace the Internet of Things (IoT). Manage machine-to-machine data efficiently, support artificial intelligence and analytics, and compress the cost and time of the design process.

Source

Redis

Redis is an open source (BSD licensed), in-memory data structure store, used as a database, cache, and message broker. Redis provides data structures such as strings, hashes, lists, sets, sorted sets with range queries, bitmaps, hyperloglogs, geospatial indexes, and streams. Redis has built-in replication, Lua scripting, LRU eviction, transactions, and different levels of on-disk persistence, and provides high availability via Redis Sentinel and automatic partitioning with Redis Cluster.

Source

IP address

Networking is about one computer sending a message to another computer. This message is called a packet in the IP world. It is just like a postcard in the postal service: the postal service can take a postcard addressed to someone and deliver it. Similarly, a computer needs to provide a network address when it asks the network to deliver a packet to another computer. This network address is called an IP address.

An Internet Protocol address is a numerical label such as 192.0.2.1 that is connected to a computer network that uses the Internet Protocol for communication. An IP address serves two main functions: network interface identification and location addressing. Internet Protocol version 4 (IPv4) defines an IP address as a 32-bit number. However, because of the growth of the Internet and the depletion of available IPv4 addresses, a new version of IP (IPv6) uses 128 bits for the IP address. Source

Network Routing

In order for a source computer to deliver an addressed packet to the destination computer, the network needs to understand the IP address, similar to how the postal service understands a mailing address. For example, if the destination address is in the same postal code as the sender’s, the mail probably never leaves the neighbourhood. But if the destination mailing address is in a different postal code, the postal service has to leverage some mechanism to get it to the destination. Network routing is very similar. If a packet is sent from 192.168.0.2 to 192.168.0.5, the two computers are likely close to each other and the routing is simple, probably just a single hop to get to the target computer. However, if a packet is sent from 192.168.1.2 to 7.7.0.5, it probably requires many hops across a complicated network.

Network Port

On a postcard, the name of the recipient is usually written, because there might be more than one person living at a particular mailing address. In the computer world, there may be many processes on a computer using the network. If all we have is a network address, the packet can only be delivered to the target computer; there is no way to decide which process should receive it. This is solved by the use of a network port, usually represented in the form ip:port, for example, 7.7.0.5:80.

Transmission Control Protocol(TCP)

TCP is a layer on top of the Internet Protocol(IP) but they are often used together as “TCP/IP”.

Similar to the postal service, networking protocols also set an upper limit on how many bytes can be sent in a single IP packet. TCP can establish a connection to a particular port on the target computer, and the desired data can be sent with multiple packets, each subject to that upper limit. TCP ensures all the data arrives at the destination in the proper order by chopping it up into individually numbered IP packets. Packets lost in transit can be resent. TCP also has mechanisms to notice if packets are not getting through for an extended period of time and to notify you of a “broken” connection.

TCP provides reliable, ordered, and error-checked delivery of a stream of octets (bytes) between applications running on hosts communicating via an IP network. Major internet applications such as the World Wide Web, email, remote administration, and file transfer rely on TCP, which is part of the Transport Layer of the TCP/IP suite. SSL/TLS often runs on top of TCP. TCP is connection-oriented, and a connection between client and server is established before data can be sent. The server must be listening (passive open) for connection requests from clients before a connection is established. Three-way handshake (active open), retransmission, and error detection add to reliability but lengthen latency. Applications that do not require reliable data stream service may use the User Datagram Protocol (UDP), which provides a connectionless datagram service that prioritizes time over reliability. TCP employs network congestion avoidance. However, there are vulnerabilities to TCP, including denial of service, connection hijacking, TCP veto, and reset attack. Source
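
The connection-oriented model is easy to demonstrate in Python with the standard socket module: the server must be listening (passive open) before the client's connect triggers the three-way handshake. This is a minimal loopback sketch, not a production server:

```python
import socket, threading

def echo_server(srv):
    conn, _ = srv.accept()             # passive open: wait for a client
    with conn:
        while True:
            data = conn.recv(1024)
            if not data:               # empty read: the peer closed
                break
            conn.sendall(data)

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))             # port 0: let the OS pick a free port
srv.listen(1)
port = srv.getsockname()[1]
threading.Thread(target=echo_server, args=(srv,), daemon=True).start()

cli = socket.create_connection(("127.0.0.1", port))   # three-way handshake
msg = b"hello over tcp"
cli.sendall(msg)
echoed = b""
while len(echoed) < len(msg):          # TCP is a byte stream: read until done
    echoed += cli.recv(1024)
cli.close()
print(echoed)                          # b'hello over tcp'
```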

UDP

In computer networking, the User Datagram Protocol (UDP) is one of the core members of the Internet protocol suite. With UDP, computer applications can send messages, in this case referred to as datagrams, to other hosts on an Internet Protocol (IP) network. Prior communications are not required in order to set up communication channels or data paths. UDP uses a simple connectionless communication model with a minimum of protocol mechanisms. UDP provides checksums for data integrity, and port numbers for addressing different functions at the source and destination of the datagram. It has no handshaking dialogues, and thus exposes the user’s program to any unreliability of the underlying network; there is no guarantee of delivery, ordering, or duplicate protection. If error-correction facilities are needed at the network interface level, an application may instead use Transmission Control Protocol (TCP) or Stream Control Transmission Protocol (SCTP) which are designed for this purpose. UDP is suitable for purposes where error checking and correction are either not necessary or are performed in the application; UDP avoids the overhead of such processing in the protocol stack. Time-sensitive applications often use UDP because dropping packets is preferable to waiting for packets delayed due to retransmission, which may not be an option in a real-time system. The protocol was designed by David P. Reed in 1980 and formally defined in RFC 768. Source
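
The connectionless model looks like this in Python. On the loopback interface the datagrams will in practice arrive intact and in order, but UDP itself guarantees neither:

```python
import socket

receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))        # no listen()/accept(): connectionless
addr = receiver.getsockname()

sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"datagram one", addr)   # no handshake before sending
sender.sendto(b"datagram two", addr)

# Each recvfrom() returns one whole datagram (message boundaries are kept),
# unlike TCP's byte stream — but delivery and ordering are not guaranteed.
msg1, _ = receiver.recvfrom(1024)
msg2, _ = receiver.recvfrom(1024)
msgs = sorted([msg1, msg2])
print(msgs)
```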

Domain Name System(DNS)

The destination computer can be addressed with an IP address on the network. However, it is hard for a user to remember IP addresses directly. Moreover, as computers and networks are upgraded over time, a computer is likely to be assigned a different IP address, and the same destination would become unreachable at the old address. Thus, a “phone book” called the Domain Name System(DNS) is used to solve these issues.

The dig utility can be used to query DNS. In the following example, the domain name google.com is eventually translated to the IP address 142.250.191.46.

$ dig google.com
[..]
;; QUESTION SECTION:
;google.com.			IN	A

;; ANSWER SECTION:
google.com.		222	IN	A	142.250.191.46
[..]
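
Programs usually delegate the same lookup to the system resolver. In Python, for example (here we resolve localhost, which is answered locally, e.g. from /etc/hosts, without a network round trip):

```python
import socket

# Ask the system resolver to translate a name into an IPv4 address.
ip = socket.gethostbyname("localhost")
octets = ip.split(".")
print(ip)              # typically 127.0.0.1
print(len(octets))     # 4: an IPv4 address is four octets
```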

Hypertext Transfer Protocol(HTTP)

The Hypertext Transfer Protocol is an application layer protocol in the Internet protocol suite model for distributed, collaborative, hypermedia information systems. HTTP is the foundation of data communication for the World Wide Web, where hypertext documents include hyperlinks to other resources that the user can easily access, for example by a mouse click or by tapping the screen in a web browser. Source

When we type a web address, such as http://example.com/hello, into a web browser, the browser first consults DNS to translate the domain name into an IP address. Then, the web browser establishes a TCP connection to the web server at the translated IP address on port 80, which is the default “well known” port for HTTP. Once a connection is established, the web browser sends a GET message to the server to ask for a particular resource. The server replies with an OK message and the requested content.

curl is a commonly used CLI tool that you can use to issue GET requests.

$ curl -v http://7.7.0.2/hello
* About to connect() to 7.7.0.2 port 80 (#0)
*   Trying 7.7.0.2...
* Connected to 7.7.0.2 (7.7.0.2) port 80 (#0)
> GET /hello HTTP/1.1
> User-Agent: curl/7.29.0
> Host: 7.7.0.2
> Accept: */*
> 
< HTTP/1.1 200 OK
< Server: nginx/1.17.10
< Date: Sun, 13 Mar 2022 01:50:19 GMT
< Content-Type: text/html; charset=utf-8
< Content-Length: 15
< Connection: keep-alive
< 
* Connection #0 to host 7.7.0.2 left intact
Hello world

Here we use the curl utility to issue a simple GET request. In this case, we don’t have DNS set up, so we issue the GET request to the target server with its IP address directly. curl establishes a TCP connection on the default port 80 of that address, then sends a GET message for the resource /hello. Finally, a response of 200 OK is received, along with the content Hello world.

In the example output, there are additional lines in both the request and response which look like Name: Value. These are called headers, which convey additional information about the request and response. For example, the request contains the header Accept: */*, which means the client can accept the response in any format. In the response, we see the header Content-Type: text/html; charset=utf-8, which is the server telling the client that the body of the response is HTML text.
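
The whole exchange can be reproduced in Python with the standard library alone: a tiny server answering GET requests and a client issuing one. The handler below is a made-up stand-in for the nginx server in the curl output above:

```python
import http.server, threading, urllib.request

class HelloHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"Hello world"
        self.send_response(200)                    # the "200 OK" status line
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):                  # keep the demo quiet
        pass

srv = http.server.HTTPServer(("127.0.0.1", 0), HelloHandler)
threading.Thread(target=srv.serve_forever, daemon=True).start()

# As with curl above, no DNS: we connect straight to the IP and port.
with urllib.request.urlopen(f"http://127.0.0.1:{srv.server_port}/hello") as resp:
    status = resp.status
    ctype = resp.headers["Content-Type"]
    body = resp.read().decode()
srv.shutdown()
print(status, ctype)    # 200 text/html; charset=utf-8
print(body)             # Hello world
```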

OSI layer architecture

Image

Source

SQL vs. NoSQL

Image

Source

Benefits of NoSQL databases

NoSQL databases offer many benefits over relational databases. NoSQL databases have flexible data models, scale horizontally, have incredibly fast queries, and are easy for developers to work with.

  • Flexible data models

NoSQL databases typically have very flexible schemas. A flexible schema allows you to easily make changes to your database as requirements change. You can iterate quickly and continuously integrate new application features to provide value to your users faster.

  • Horizontal scaling

Most SQL databases require you to scale-up vertically (migrate to a larger, more expensive server) when you exceed the capacity requirements of your current server. Conversely, most NoSQL databases allow you to scale-out horizontally, meaning you can add cheaper, commodity servers whenever you need to.

  • Fast queries

Queries in NoSQL databases can be faster than SQL databases. Why? Data in SQL databases is typically normalized, so queries for a single object or entity require you to join data from multiple tables. As your tables grow in size, the joins can become expensive. However, data in NoSQL databases is typically stored in a way that is optimized for queries. The rule of thumb when you use MongoDB is “Data that is accessed together should be stored together.” Queries typically do not require joins, so the queries are very fast.

  • Easy for developers

Some NoSQL databases like MongoDB map their data structures to those of popular programming languages. This mapping allows developers to store their data in the same way that they use it in their application code. While it may seem like a trivial advantage, this mapping can allow developers to write less code, leading to faster development time and fewer bugs.

Source

Apache Cassandra

Cassandra is a NoSQL distributed database. By design, NoSQL databases are lightweight, open-source, non-relational, and largely distributed. Counted among their strengths are horizontal scalability, distributed architectures, and a flexible approach to schema definition.

NoSQL databases enable rapid, ad-hoc organization and analysis of extremely high-volume, disparate data types. That’s become more important in recent years, with the advent of Big Data and the need to rapidly scale databases in the cloud. Cassandra is among the NoSQL databases that have addressed the constraints of previous data management technologies, such as SQL databases.

Image

Source

ACID and CAP theorem

A clear explanation can be referenced here

Database Index

A database index is a data structure that improves the speed of data retrieval operations on a database table at the cost of additional writes and storage space to maintain the index data structure. Indexes are used to quickly locate data without having to search every row in a database table every time a database table is accessed. Indexes can be created using one or more columns of a database table, providing the basis for both rapid random lookups and efficient access of ordered records.

An index is a copy of selected columns of data, from a table, that is designed to enable very efficient search. An index normally includes a “key” or direct link to the original row of data from which it was copied, to allow the complete row to be retrieved efficiently. Some databases extend the power of indexing by letting developers create indexes on column values that have been transformed by functions or expressions. For example, an index could be created on upper(last_name), which would only store the upper-case versions of the last_name field in the index. Another option sometimes supported is the use of partial indices, where index entries are created only for those records that satisfy some conditional expression. A further aspect of flexibility is to permit indexing on user-defined functions, as well as expressions formed from an assortment of built-in functions.

Source

How are indexes created?

In a database, data is stored in rows which are organized into tables. Each row has a unique key which distinguishes it from all other rows and those keys are stored in an index for quick retrieval.

Since keys are stored in indexes, each time a new row with a unique key is added, the index is automatically updated. However, sometimes we need to be able to quickly look up data that is not stored as a key. For example, we may need to quickly look up customers by telephone number. A unique constraint would not be a good fit here because multiple customers can share the same phone number. In these cases, we can create our own indexes.

Source
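The phone-number example above can be sketched in a few lines of Python. This is a toy illustration, not a real database engine: a non-unique secondary index maps an indexed column value to the keys of the rows containing it, so lookups avoid a full table scan.

```python
from collections import defaultdict

# The "table": primary key -> row.
customers = {
    1: {"name": "Alice", "phone": "555-0100"},
    2: {"name": "Bob", "phone": "555-0101"},
    3: {"name": "Carol", "phone": "555-0100"},  # duplicate phone is allowed
}

# Build a secondary index on the phone column.
phone_index = defaultdict(list)
for key, row in customers.items():
    phone_index[row["phone"]].append(key)

def lookup_by_phone(phone):
    # O(1) index probe instead of scanning every row.
    return [customers[k] for k in phone_index.get(phone, [])]

print(lookup_by_phone("555-0100"))  # two customers share this number
```

Note that the index must be updated on every write, which is exactly the write/storage cost the definition above mentions.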

Sharding

Sharding is a method of splitting and storing a single logical dataset in multiple databases. By distributing the data among multiple machines, a cluster of database systems can store a larger dataset and handle additional requests. Sharding is necessary if a dataset is too large to be stored in a single database. Moreover, many sharding strategies allow additional machines to be added. Sharding allows a database cluster to scale along with its data and traffic growth.

Sharding is also referred to as horizontal partitioning. The distinction of horizontal vs vertical comes from the traditional tabular view of a database. A database can be split vertically — storing different tables & columns in a separate database, or horizontally — storing rows of the same table in multiple database nodes.

Source1
Source2
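One common sharding strategy is hash-based routing, which can be sketched as below. The node names and shard count are made up for illustration; real systems typically use consistent hashing so that adding a node does not remap most keys.

```python
import hashlib

SHARDS = ["db-node-0", "db-node-1", "db-node-2"]

def shard_for(key: str) -> str:
    # Stable hash so the same key always routes to the same shard.
    digest = hashlib.sha256(key.encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

print(shard_for("user:42"))  # always the same node for this key
```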

Amazon DynamoDB

Amazon DynamoDB is a fully managed, serverless, key-value NoSQL database designed to run high-performance applications at any scale. DynamoDB offers built-in security, continuous backups, automated multi-region replication, in-memory caching, and data export tools.

Source

SSTable

Sorted Strings Table (SSTable) is a persistent file format used by Scylla, Apache Cassandra, and other NoSQL databases to take the in-memory data stored in memtables, order it for fast access, and store it on disk in a persistent, ordered, immutable set of files. Immutable means SSTables are never modified. They are later merged into new SSTables or deleted as data is updated.

Image

Source1
Source2
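The memtable-to-SSTable flush described above can be sketched as follows. This is a rough illustration only; real SSTables add block indexes, bloom filters, and compaction. The point shown here is just the ordering and immutability.

```python
import os
import tempfile

memtable = {"banana": "2", "apple": "1", "cherry": "3"}  # in-memory writes

def flush_to_sstable(memtable, path):
    with open(path, "w") as f:
        for key in sorted(memtable):           # keys stored in sorted order
            f.write(f"{key}\t{memtable[key]}\n")
    os.chmod(path, 0o444)                       # never modified after write

path = os.path.join(tempfile.mkdtemp(), "sstable-0001.txt")
flush_to_sstable(memtable, path)
with open(path) as f:
    print(f.read())                             # apple, banana, cherry in order
```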

This post contains basic explanations for concepts you should know related to distributed systems.

API and REST API

Application Programming Interface, abbreviated as API, enables connection between computers or computer programs. It is a Software Interface that offers services to other software to enhance the required functionalities.

REST API is an API that follows a set of rules for an application and services to communicate with each other. As it is constrained to REST architecture, REST API is referred to as RESTful API. REST APIs provide a way of accessing web services in a flexible way without massive processing capabilities.

Image

Source

Below are the underlying rules of REST API:

  • Statelessness - Systems aligning with the REST paradigm must be stateless. For client-server communication, the stateless constraint requires servers to remain unaware of the client state and vice versa. The constraint is applied by using resources instead of commands; resources are the nouns of the web, describing any object, document, or thing to store or send to other resources.
  • Cacheable - Cache helps servers to mitigate some constraints of statelessness. It is a critical factor that has improved the performance of modern web applications. Caching not only enhances the performance on the client-side but also scales significant results on the server-side. A well-established cache mechanism would drastically reduce the average response time of your server.
  • Decoupled - REST is a distributed approach, where client and server applications are decoupled from each other. Irrespective of where the requests are initiated, the only information the client application knows is the Uniform Resource Identifier (URI) of the requested resource. A server application should pass requested data via HTTP but should not try modifying the client application.
  • Layered - A Layered system makes a REST architecture scalable. With RESTful architecture, Client and Server applications are decoupled, so the calls and responses of REST APIs go through different layers. As REST API is layered, it should be designed such that neither Client nor Server identifies its communication with end applications or an intermediary.

Source
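The statelessness rule can be illustrated with a toy request handler: every request carries the full resource URI, and the server keeps no per-client session state. The resource names below are invented for the example.

```python
# A stand-in for a resource store; in a real service this would be a database.
RESOURCES = {
    "/users/1": {"name": "Alice"},
    "/users/2": {"name": "Bob"},
}

def handle(method: str, uri: str):
    # The server needs nothing beyond the request itself: no sessions,
    # no memory of previous calls from this client.
    if method == "GET":
        body = RESOURCES.get(uri)
        return (200, body) if body is not None else (404, None)
    return (405, None)  # other methods not allowed in this sketch

print(handle("GET", "/users/1"))  # (200, {'name': 'Alice'})
print(handle("GET", "/users/9"))  # (404, None)
```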

Concurrency

Concurrency allows different parts of a program to run at the same time without affecting the outcome. For example, suppose two people try to withdraw $1000 from the same bank account, which holds only $1500. Concurrency control ensures the two people can't overdraw the account.

There are three common tactics to ensure safe concurrency.

  • Locking - A mechanism where a process has the right to update or write data. When a process acquires a lock, other processes can’t update or write.
  • Atomicity - An atomic action is an action whose intermediate state can’t be seen by other processes or threads.
  • Transaction - A sequence of atomic operations.
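The bank-account example above can be sketched with a lock: the lock makes the check-then-withdraw sequence atomic, so two concurrent withdrawals of $1000 from a $1500 account cannot both succeed.

```python
import threading

balance = 1500
lock = threading.Lock()

def withdraw(amount):
    global balance
    with lock:                      # only one thread inside at a time
        if balance >= amount:
            balance -= amount
            return True
        return False

results = []
threads = [threading.Thread(target=lambda: results.append(withdraw(1000)))
           for _ in range(2)]
for t in threads: t.start()
for t in threads: t.join()
print(results.count(True), balance)  # exactly one succeeds; balance is 500
```

Without the lock, both threads could pass the balance check before either subtracts, overdrawing the account.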

Message Queues

Queues are a component of service-based architectures. They accept client messages for delivery to a service, then hold each message until the service requests delivery. Once a queue has accepted a message, it provides a strong guarantee that the message will eventually be read and processed. Messages remain in the queue, available for delivery, until the service confirms that it has finished with the message and deletes it.
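The accept/hold/confirm cycle can be shown in-process with the standard library's queue module. This is only a single-process illustration of the pattern; a real message queue (e.g. RabbitMQ or SQS) adds durability and network delivery.

```python
import queue

q = queue.Queue()
q.put({"id": 1, "body": "hello"})   # producer enqueues a message

msg = q.get()                        # consumer requests delivery
# ... process the message ...
q.task_done()                        # confirm completion; message is gone

print(q.empty())  # True
```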

Microservice Architecture

A microservice architecture – a variant of the service-oriented architecture (SOA) structural style – arranges an application as a collection of loosely coupled services. In a microservices architecture, services are fine-grained and the protocols are lightweight. The goal is that teams can bring their services to life independently of others. Loose coupling reduces dependencies and the complexity around them: service developers do not need to care about the users of the service, and they do not force their changes onto users of the service. This allows organizations developing software to grow fast and big, and to use off-the-shelf services more easily, with reduced communication requirements. But it comes at the cost of maintaining the decoupling: interfaces need to be designed carefully and treated as a public API, using techniques such as having multiple interfaces on the same service, or multiple versions of the same service, so as not to break existing users' code.

Source

Proxy vs. Reverse Proxy

A proxy server, sometimes referred to as a forward proxy, is a server that routes traffic between client(s) and another system, usually external to the network. By doing so, it can regulate traffic according to preset policies, convert and mask client IP addresses, enforce security protocols, and block unknown traffic.

A reverse proxy is a type of proxy server. Unlike a traditional proxy server, which is used to protect clients, a reverse proxy is used to protect servers. A reverse proxy is a server that accepts a request from a client, forwards the request to another one of many other servers, and returns the results from the server that actually processed the request to the client as if the proxy server had processed the request itself. The client only communicates directly with the reverse proxy server and it does not know that some other server actually processed its request.

Source

Horizontal vs. Vertical Scaling

Horizontal scaling (aka scaling out) refers to adding additional nodes or machines to your infrastructure to cope with new demands. If you are hosting an application on a server and find that it no longer has the capacity or capabilities to handle traffic, adding a server may be your solution.

While horizontal scaling refers to adding additional nodes, vertical scaling describes adding more power to your current machines. For instance, if your server requires more processing power, vertical scaling would mean upgrading the CPUs. You can also vertically scale the memory, storage, or network speed.

Source

Distributed Cache

Caching means saving frequently accessed data in-memory, that is in RAM instead of the hard drive. Accessing data from RAM is always faster than accessing it from the hard drive.

Image

Caching serves the below-stated purposes in web applications.

  1. It reduces application latency by notches. Simply put, since all the frequently accessed data is stored in RAM, the application doesn't have to talk to the hard drive when the user requests the data. This makes application response times faster.
  2. It intercepts all the user data requests before they go to the database. This averts the database bottleneck issue: the database is hit with a comparatively smaller number of requests, making the application as a whole more performant.
  3. Caching often comes in really handy in bringing application running costs down.

Caching can be leveraged at every layer of the web application architecture, be it the database, CDN, DNS, etc.
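Points 1 and 2 describe what is commonly called the cache-aside pattern, which can be sketched as follows. The "database" here is just a dict standing in for real disk-backed storage.

```python
db = {"user:1": {"name": "Alice"}}   # pretend this is the slow, disk-backed DB
cache = {}
db_hits = 0

def get(key):
    global db_hits
    if key in cache:                 # fast path: served from RAM
        return cache[key]
    db_hits += 1                     # slow path: one trip to the database
    value = db.get(key)
    cache[key] = value               # populate the cache for next time
    return value

get("user:1"); get("user:1"); get("user:1")
print(db_hits)  # 1 -- the database was hit only once for three reads
```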

A distributed cache is a cache whose data is spread across several nodes in a cluster, and often across several clusters in several data centres located around the globe.

Image

Being deployed on multiple nodes helps with horizontal scalability: instances can be added on the fly as demand grows.

Distributed caching is the primary form of caching used in the industry today, thanks to its potential to scale on demand and to remain highly available.

Scalability, high availability, and fault tolerance are crucial to the large-scale services running online today. Businesses cannot afford to have their services go offline. Think about health services, stock markets, or the military: they have no scope for going down, so they are distributed across multiple nodes with a solid amount of redundancy.

Distributed caches, and distributed systems in general, are the preferred choice for cloud computing, largely due to their ability to scale and remain available.

Google Cloud uses Memcache for caching data on its public cloud platform. Redis is used by internet giants for caching, NoSQL datastore & several other use cases.

Source1
Source2
Source3

Content Delivery Network(CDN)

A content delivery network (CDN) refers to a geographically distributed group of servers which work together to provide fast delivery of Internet content.

A CDN allows for the quick transfer of assets needed for loading Internet content including HTML pages, javascript files, stylesheets, images, and videos. The popularity of CDN services continues to grow, and today the majority of web traffic is served through CDNs, including traffic from major sites like Facebook, Netflix, and Amazon.

Source

Hadoop Distributed File System(HDFS)

The Apache™ Hadoop® project develops open-source software for reliable, scalable, distributed computing.

The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.

Source

Map Reduce

MapReduce is a programming model and an associated implementation for processing and generating big data sets with a parallel, distributed algorithm on a cluster.

A MapReduce program is composed of a map procedure, which performs filtering and sorting (such as sorting students by first name into queues, one queue for each name), and a reduce method, which performs a summary operation (such as counting the number of students in each queue, yielding name frequencies). The “MapReduce System” (also called “infrastructure” or “framework”) orchestrates the processing by marshalling the distributed servers, running the various tasks in parallel, managing all communications and data transfers between the various parts of the system, and providing for redundancy and fault tolerance.

Source
Example
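The student-counting description above is the classic word-count shape, which can be sketched single-process: map emits (key, 1) pairs, a shuffle groups pairs by key, and reduce sums each group. A real MapReduce framework runs these phases across many machines.

```python
from collections import defaultdict

def map_phase(document):
    # Emit a (word, 1) pair for every word in the document.
    return [(word, 1) for word in document.split()]

def shuffle(pairs):
    # Group all values by key, as the framework's shuffle step would.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Summarize each group; here, count occurrences per word.
    return {key: sum(values) for key, values in groups.items()}

docs = ["the quick brown fox", "the lazy dog", "the fox"]
pairs = [p for d in docs for p in map_phase(d)]
counts = reduce_phase(shuffle(pairs))
print(counts)  # e.g. 'the' -> 3, 'fox' -> 2
```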

Apache Zookeeper

ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. All of these kinds of services are used in some form or another by distributed applications. Each time they are implemented there is a lot of work that goes into fixing the bugs and race conditions that are inevitable. Because of the difficulty of implementing these kinds of services, applications initially usually skimp on them, which makes them brittle in the presence of change and difficult to manage. Even when done correctly, different implementations of these services lead to management complexity when the applications are deployed.

ZooKeeper aims at distilling the essence of these different services into a very simple interface to a centralized coordination service. The service itself is distributed and highly reliable. Consensus, group management, and presence protocols are implemented by the service so that applications do not need to implement them on their own. Application-specific uses consist of a mixture of specific components of ZooKeeper and application-specific conventions. ZooKeeper Recipes shows how this simple service can be used to build much more powerful abstractions.

Source

Apache Kafka

Apache Kafka is an open-source distributed event streaming platform used by thousands of companies for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications.

Source

Read-write quorum

Read-write quorums define two configurable parameters, R and W. R is the minimum number of nodes that must participate in a read operation, and W the minimum number of nodes that must participate in a write operation.

Read-Write Quorum Systems Made Practical - Michael Whittaker, Aleksey Charapko, Joseph M. Hellerstein, Heidi Howard, Ion Stoica

This paper reviews some concepts of the quorum systems, and it presents a concrete tool named “Quoracle” that explores the trade-offs of the read-write quorum systems.

Quoracle provides an alternative to the majority quorum systems that are widely adopted in distributed systems. A majority quorum is any strict majority of the nodes, of size

\lfloor \frac{n}{2} \rfloor + 1 where n = number of nodes

In the case of a read-write quorum system, the majority is represented in a similar way:

r = w = \lfloor \frac{n}{2} \rfloor + 1

where r and w are the read and write quorums.

Source
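A quick check of the quorum arithmetic above: with r = w = n // 2 + 1, any read quorum and any write quorum must overlap (r + w > n), so a read always contacts at least one node that took the latest write.

```python
def majority_quorum(n: int) -> int:
    # Smallest strict majority of n nodes.
    return n // 2 + 1

for n in (3, 5, 7):
    r = w = majority_quorum(n)
    assert r + w > n   # the intersection property that makes reads safe
    print(n, r, w)
```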

Gossip protocol

Gossip protocol is a communication protocol that allows state sharing in distributed systems. Most modern systems use this peer-to-peer protocol to disseminate information to all the members in a network or cluster.

This protocol is used in a decentralized system that does not have any central node to keep track of all nodes and know if a node is down or not.

Gathering state information by multicasting

So, how does a node know every other node’s current state in a decentralized distributed system?

The simplest way to do this is to have every node maintain heartbeats with every other node. A heartbeat is a periodic message sent to a central monitoring server, or to the other servers in the system, to show that a node is alive and functioning. When a node goes down, it stops sending out heartbeats, and everyone else finds out immediately. But then O(N^2) messages get sent at every tick (N being the number of nodes), which is an expensive operation in any sizable cluster.

How the protocol works

The gossip protocol is used to address the problems caused by multicasting. Multicast is a type of communication where a piece of information (the gossip, in this scenario) is sent from one or more nodes to a set of other nodes in a network. This is useful when a group of clients in the network require the same data at the same time. But multicasting has problems: when there are many nodes at the recipient end, latency (the average time for a receiver to receive the multicast) increases.

To get this multicast message (the gossip) across the desired targets in the group, the gossip protocol periodically sends the gossip to random nodes in the network. Once a random node receives the gossip, it is said to be infected. That node then does the same thing as the sender, sending copies of the gossip to its own random targets, and this process continues until the target nodes receive the multicast. After a node has sent the gossip out to its random targets, it becomes uninfected again.

Source1
Source2
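The spreading process can be shown with a toy round-based simulation. The cluster size and fanout below are arbitrary; the point is that with each infected node forwarding to a few random peers, the gossip typically reaches every node in a number of rounds that grows only logarithmically with cluster size.

```python
import random

random.seed(7)            # fixed seed so the run is repeatable
NODES = list(range(20))
FANOUT = 3                # each infected node gossips to 3 random peers

infected = {0}            # node 0 hears the gossip first
rounds = 0
while len(infected) < len(NODES):
    rounds += 1
    for node in list(infected):
        for peer in random.sample(NODES, FANOUT):
            infected.add(peer)

print(rounds, len(infected))  # everyone is infected after a few rounds
```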

Fan Out

Let’s try to understand how the fan-out approach works, based on the system design of Twitter here.

Fanout simply means spreading data from a single point. Let’s see how it is used. Whenever a user (the followee) makes a tweet, the system does a lot of preprocessing and distributes the tweet into the home timelines of the different users who follow them. With this approach, reading a timeline requires no database queries: you just look up the home-timeline data in Redis by user_id. Because the list of tweets comes from an in-memory store, this process is much faster and easier. Here is the complete flow of this approach.

  • User X is followed by three people, and each user has a cache called the user timeline. X tweets something.
  • The tweet flows through the load balancer into the back-end servers.
  • A server node saves the tweet in the DB/cache.
  • The server node fetches all the users that follow User X from the cache.
  • The server node injects this tweet into the in-memory timelines of X's followers (fanout).
  • All followers of User X see the tweet of User X in their timelines. The timeline is refreshed and updated every time a user visits it.

Image

Source
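The fanout-on-write steps above can be sketched as follows. All the data here (users, follower lists) is made up for illustration; the point is that reading a home timeline needs no database query because the write already pushed the tweet into each follower's in-memory timeline.

```python
followers = {"X": ["A", "B", "C"]}          # who follows whom
home_timelines = {"A": [], "B": [], "C": []}  # per-user in-memory timelines

def post_tweet(user, tweet):
    # 1) persist the tweet in the DB/cache (omitted here), then
    # 2) fan the tweet out into each follower's home timeline.
    for follower in followers.get(user, []):
        home_timelines[follower].insert(0, (user, tweet))

post_tweet("X", "hello world")
print(home_timelines["A"])  # [('X', 'hello world')]
```

The trade-off is extra work at write time (one insert per follower), which is why real systems handle very-high-follower accounts differently.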

GUID and UUID

GUID (aka UUID) is an acronym for ‘Globally Unique Identifier’ (or ‘Universally Unique Identifier’). It is a 128-bit integer number used to identify resources. The term GUID is generally used by developers working with Microsoft technologies, while UUID is used everywhere else.

Source1
Source2
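Generating one is a one-liner with Python's standard library. uuid4 produces a random 128-bit identifier, so the printed value differs every run.

```python
import uuid

u = uuid.uuid4()
print(u)                 # e.g. 9f1b2a4c-... (random, differs every run)
print(u.version)         # 4
print(len(u.bytes) * 8)  # 128 bits
```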

It is always a good idea to estimate the scale of the system, as it helps verify whether the designed system can fulfill the requirements. The estimates might include:

  • Number of users
  • Number of active users (NAU)
  • Requests per second (RPS)
  • Logins per second
  • Transactions per second (TPS) for e-commerce
  • Likes/dislikes per second, shares per second, comments per second for social media sites
  • Searches per second for sites with a search feature
  • Storage needed
  • Servers needed
  • Network bandwidth needed

To estimate hardware resource needed, we need to understand that there are four major resources in a computer system.

  • CPU
  • Memory
  • Storage
  • Network

Estimate servers needed

The modern computer system is a multi-processor system, varying from a single CPU core to many. The following is a system with 32 CPU threads.

$ lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                32
On-line CPU(s) list:   0-31
Thread(s) per core:    2
Core(s) per socket:    8
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 63
Model name:            Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz

In order to estimate how many servers are needed, we can approach in the following manner.

  1. How much work can a single CPU core do?
  2. How much work can a single server do?
  3. How many servers are needed?

Let’s have an example to go through this approach.

Let’s say it takes 100ms for a single-core CPU system to handle a single client request. That means the system can handle 10 requests per second, so we can extrapolate that a 32-core system can handle 320 requests per second. Now suppose we have to handle 320,000 requests per second (RPS); that means 1000 servers are needed.

Notice that this is a rough calculation considering CPU needs only. In a real system, there might be other performance bottlenecks before a single server reaches 320 requests per second; for example, the system might be I/O bound before it runs out of CPU. But this method still gives us a high-level estimate.
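The back-of-envelope arithmetic from the example above can be written out directly:

```python
latency_s = 0.100                  # 100 ms to handle one request on one core
rps_per_core = 1 / latency_s       # 10 requests/second per core
cores_per_server = 32
rps_per_server = rps_per_core * cores_per_server  # 320 RPS per server

target_rps = 320_000
servers = target_rps / rps_per_server
print(servers)  # 1000.0
```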

Estimate storage needed

To estimate storage needed, we can approach as below.

  1. Identify the different data types
  2. Estimate the space needed for each data type
  3. Get the total space needed

Let’s take YouTube as an example to understand this approach.

  • Data types: videos, thumbnail images and comments.
  • Let’s assume there are roughly 2B users and 5% of users (100M users) upload videos consistently. On average, each user uploads weekly (~50 videos per year). Roughly 13M videos (100M × 50 / 365) are uploaded daily. Let’s assume a video is 10 minutes long on average and takes 50MB of storage space after compression. Let’s say each video has a thumbnail image of 20KB, and that each video has about 5 comments of 200 bytes each. In total, the space needed for each video is 50MB + 20KB + 1KB, roughly 50MB. Multiplying by 13M videos, it takes about 619TB of storage per day.
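The estimate above can be written out in code. All the inputs are the article's assumptions, and TB is taken here as 2^40 bytes; with these exact inputs the result lands around 620TB, in the same ballpark as the article's 619TB.

```python
MB, KB, TB = 10**6, 10**3, 2**40

users = 2_000_000_000
uploaders = users * 0.05                 # 5% of users upload consistently
videos_per_day = uploaders * 50 / 365    # ~13.7M videos uploaded daily

# video + thumbnail + 5 comments of 200 bytes each
per_video_bytes = 50 * MB + 20 * KB + 5 * 200

daily_storage_tb = videos_per_day * per_video_bytes / TB
print(round(daily_storage_tb))           # roughly 620 TB per day
```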

Estimate network bandwidth needed

Determine the incoming and outgoing data for network bandwidth estimation

  • We already know there would be ~619TB of data uploaded to YouTube in a day. Dividing this by the number of seconds in a day (619TB / 86,400 seconds), the incoming data rate would be about 7.3GB/s.
  • Let’s say 10% of YouTube users are daily active users. With approximately 200M daily users, let’s assume a user watches 10 videos a day. Then YouTube would have 2B views in a day, resulting in ~93PB of outgoing data per day. Dividing this by the number of seconds in a day (93PB / 86,400 seconds), the outgoing rate would be about 1128GB/s.
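These two divisions can be checked in a few lines. Using binary units (TB = 2^40 bytes, PB = 2^50 bytes, GB = 2^30 bytes) reproduces the figures above:

```python
TB, PB, GB = 2**40, 2**50, 2**30
SECONDS_PER_DAY = 86_400

incoming_gbps = 619 * TB / SECONDS_PER_DAY / GB
print(incoming_gbps)   # ~7.3 GB/s incoming

outgoing_gbps = 93 * PB / SECONDS_PER_DAY / GB
print(outgoing_gbps)   # ~1128 GB/s outgoing
```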