Redis is a popular open-source, in-memory database and cache written entirely in C. Many people refer to Redis as a data structure server because its main data types, such as lists, strings, and sets, are quite comparable to those found in programming languages.

In this article, you will learn about Redis, its key features, Data Replication, and Redis Data Replication Process.

What is Redis?

Redis is a fast in-memory database and cache that is built in C and tuned for speed. Redis stands for “Remote Dictionary Server” and it is open source under the BSD license.

Its main data types are similar to those used in programming languages, such as strings, lists, dictionaries (or hashes), sets, and sorted sets.

Hence Redis is often referred to as a data structure server.

Many other data structures and capabilities for approximate counting, geolocation, and stream processing are also available.

Unlike the data models of many other NoSQL databases, Redis’ data structures closely resemble the native data structures programmers typically use inside their applications and algorithms.

Because these fundamental data structures can be shared easily between processes and services, Redis is well suited for rapid development.
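As a small illustration of the data structure server idea, the following redis-cli commands (the key names are made up for this example) work directly with a string, a list, and a set:

redis-cli SET user:1:name "Ada"                        # store a plain string
redis-cli LPUSH jobs:pending "job-42"                  # push an item onto a list
redis-cli SADD article:10:tags "redis" "replication"   # add members to a set
redis-cli GET user:1:name                              # returns "Ada"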

Key Features of Redis

  • Performance: Redis stores data in memory for ultra-fast access, handling millions of operations per second with read/write operations taking less than a millisecond.
  • Simplicity and Usability: Simplifies code with fewer lines, utilizing a straightforward command structure instead of complex query languages.
  • Persistence and Replication: Uses primary-replica architecture for asynchronous data replication, allowing for faster recovery and point-in-time backups to disk.
  • Scalability and High Availability: Supports single or clustered architectures for high availability, with flexible scaling options to meet evolving needs.
  • Open Source: Redis is open source with a vibrant community, and it avoids vendor lock-in by adhering to open standards and formats.

What is Data Replication?

The process of storing data in multiple sites or nodes is known as Data Replication. It is beneficial in terms of increasing data availability.

It simply involves copying data from a database from one server to another so that all users have access to the same information. The result is a distributed database in which users can access data relevant to their tasks without interfering with the work of others.

Data Replication entails the continuous duplicating of transactions so that the duplicate is always up to date and synced with the source.

In replication, the same data is available in multiple locations; this is in contrast to fragmentation, where each fragment of a relation resides in only one location.

Full replication, in which the entire database is stored at each site, is possible. Partial replication is also possible, in which some frequently used database fragments are replicated while others are not.

Types of Data Replication

  • Transactional Replication: Ensures real-time, consistent data replication from publisher to subscriber, commonly used for server-to-server contexts.
  • Snapshot Replication: Delivers data exactly as it appears at a specific point in time, suitable for infrequent data changes and initial synchronization.
  • Merge Replication: Allows independent changes from both publisher and subscriber, commonly used for server-to-client contexts to synchronize multiple databases.

Features of Redis Data Replication

Here are some features of Redis Data Replication:

  • Redis uses asynchronous replication, with replicas asynchronously acknowledging to the master the amount of data they have processed.
  • A master can have multiple replicas.
  • Replicas can accept connections from other replicas. Besides attaching several replicas to the same master, replicas can also be connected to one another in a cascading structure. Since Redis 4.0, all sub-replicas receive exactly the same replication stream from the master.
  • Redis data replication is non-blocking on the master side. The master continues to handle queries while one or more replicas perform an initial synchronization or a partial resynchronization.
  • Redis data replication can be used for scalability, for example by directing read-only queries (including slow O(N) operations) to several replicas, or simply to improve data safety and availability.
  • You can also use replication to avoid the cost of having the master write the entire dataset to disk: a common strategy is to configure the master’s redis.conf to avoid persisting at all, and then connect a replica configured to save periodically or with AOF enabled (see the sketch after this list).
  • This setup, however, must be treated with caution, because a restarted master will start with an empty dataset, and if the replica attempts to sync with it, the replica will be emptied as well.
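Here is a minimal sketch of that persistence split, assuming one master and one replica whose redis.conf files you control (save, appendonly, and replicaof are standard Redis directives; the address is illustrative):

# master redis.conf: no RDB snapshots, no AOF
save ""
appendonly no

# replica redis.conf: follow the master and keep AOF persistence
replicaof 192.168.1.1 6379
appendonly yes

If you rely on this pattern, also make sure the master cannot restart automatically with an empty dataset; otherwise the caveat above applies and the replica can be wiped on resync.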

Redis Data Replication Process

How Does Redis Data Replication Work?

Every Redis master has a Redis data replication ID: a large pseudo-random string that identifies a given history of the dataset.

The master also keeps an offset that increments for every byte of replication stream it produces and sends to replicas, which is used to update the replicas’ state with the new modifications to the dataset.

The Redis data replication offset is incremented even if no replica is connected, so basically every pair of:

Replication ID, offset

identifies an exact version of a master’s dataset.

When replicas connect to masters, they send their previous master replication ID and the offsets they’ve processed so far using the PSYNC command.

This allows the master to deliver only the incremental parts that are required.

If the backlog in the master’s buffers is insufficient, or if the replica refers to a history (replication ID) the master no longer knows about, a full resynchronization occurs: the replica receives a fresh copy of the entire dataset.
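As a rough sketch of this handshake (the replication ID and offsets below are made-up values, and the master’s replies are shown in simplified form), the exchange between a reconnecting replica and its master looks like this:

PSYNC 8371b4fb1155b71f4a04d3e1bc3e18c4a990aeeb 1023   # replica: I know this history up to offset 1023
+CONTINUE                                             # master: backlog available, only the missing part is streamed

PSYNC ? -1                                            # a brand-new replica with no history
+FULLRESYNC <new replication ID> <offset>             # master: a complete copy of the dataset follows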

What is Redis Data Replication ID?

As stated in the previous section, two instances with the same replication ID and replication offset have identical data.

However, it is important to understand what the replication ID is and why instances have two replication IDs, the primary and secondary.

  • A replication ID identifies a given history of the data set. Every time an instance restarts from scratch as a master, or a replica is promoted to master, a new replication ID is generated for that instance.
  • Replicas connected to a master inherit its replication ID after the handshake. Two instances with the same ID are therefore related: they hold the same data set, though possibly at different points in time.
  • For a given history (replication ID), the offset works as a logical clock that tells which instance holds the most up-to-date data set.

For example, if two instances A and B both have the same replication ID, but one has offset 1000 and the other has offset 1023, the first is missing some commands applied to the data set.

It also implies that A can achieve the same state as B with just a few commands.
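One way to inspect these values on a live instance is INFO replication; the field names below are real INFO fields, while the values shown are only illustrative:

redis-cli INFO replication
# role:master
# connected_slaves:1
# master_replid:8371b4fb1155b71f4a04d3e1bc3e18c4a990aeeb
# master_repl_offset:1023

Comparing master_replid and master_repl_offset across two instances tells you whether they share a history and which one is further ahead.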

Diskless Redis Data Replication

In Redis Data Replication, a full resynchronization normally entails generating an RDB (Redis Database) file on disk, and then reading that file back from disk to send the data to the replicas.

This can be a highly stressful procedure for the master if the disks are slow. The first version of Redis to offer diskless replication was version 2.8.18.

In this configuration, the child process delivers the RDB straight over the wire to replicas, bypassing the disk.

Redis Data Replication Configuration

Basic Redis Data Replication is simple to set up. Simply add this line to the replica configuration file:

replicaof 192.168.1.1 6379

You just need to substitute 192.168.1.1 6379 with your master’s IP address (or hostname) and port. Alternatively, you can run the REPLICAOF command on an instance at runtime to make it start a synchronization with the specified master.
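For example, assuming the same master address as above, the runtime equivalent looks like this (REPLICAOF NO ONE is the standard way to turn a replica back into a stand-alone master):

redis-cli REPLICAOF 192.168.1.1 6379   # start replicating from this master at runtime
redis-cli REPLICAOF NO ONE             # stop replicating and promote this instance to master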

A few settings are also available for customizing the replication backlog held in memory by the master during partial resynchronization.

For further details, see the sample redis.conf included with the Redis distribution.

The repl-diskless-sync configuration setting enables diskless replication.

The repl-diskless-sync-delay parameter controls how long the master waits after the first replica arrives before starting the transfer, so that additional replicas can be served by the same transfer.
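A minimal redis.conf sketch for turning this on (both directives are standard settings; the 5-second delay is just an example value):

repl-diskless-sync yes
repl-diskless-sync-delay 5   # wait a few seconds for more replicas before starting the transfer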

Read-only Replica

In Redis Data Replication, replicas run in read-only mode by default, and have done so since Redis 2.6.

The replica-read-only option in the redis.conf file controls this behavior, which may be enabled and disabled at runtime with CONFIG SET.
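Both ways of controlling this are sketched below (the directive and the command are standard; whether you ever want writable replicas is a separate decision):

replica-read-only yes                        # redis.conf: the default, the replica rejects writes

redis-cli CONFIG SET replica-read-only no    # at runtime: temporarily allow writes on this replica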

All write commands in Redis Data Replication will be rejected by read-only replicas, making it impossible to write to a replica by accident.

However, because administrative commands such as DEBUG and CONFIG are still available, read-only mode is not meant to make it safe to expose a replica instance to the internet or, more broadly, to a network with untrusted clients.

Setting a Replica to Authenticate a Master

If the master requires a password (set through requirepass), it is simple to configure the replica to use that password in all synchronization operations.

To do it on a running instance, use redis-cli and type the following:

config set masterauth <password>

Add the following to your config file to make it permanent:

masterauth <password>

Allow Writes only with N Attached Replicas

Starting with Redis 2.8, you can configure a Redis master to accept write queries only if at least N replicas are currently connected to it.

However, because Redis uses asynchronous replication, there is no way to know whether a replica received a given write, therefore data loss is always a possibility.

How it works is explained below:

  • In Redis Data Replication, replicas ping the master every second, acknowledging the amount of replication stream they have processed.
  • The master remembers the last time it received a ping from each replica.
  • The user can configure a minimum number of replicas that must have a lag of no more than a given number of seconds.
  • A write is accepted only if there are at least N replicas with a lag of less than M seconds (see the configuration sketch after this list).
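A minimal redis.conf sketch of this safety check, assuming you want at least 3 connected replicas lagging no more than 10 seconds (both directives are standard settings; the numbers are example values):

min-replicas-to-write 3
min-replicas-max-lag 10

If the condition is not met, the master answers write commands with an error instead of accepting the write.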

Dealing with Expires on Keys

In Redis Data Replication, Redis can expire keys, giving them a finite lifespan (TTL: Time To Live). This capability depends on an instance’s ability to count time, yet Redis replicas correctly replicate keys with expires, even when those keys are modified using Lua scripts.

To implement such a feature, Redis cannot rely on the master and replica clocks being in sync, as this is an unsolvable problem that would result in race conditions and diverging data sets. Redis therefore uses three main strategies to make expired-key replication work:

  • Replicas do not expire keys on their own; instead, they wait for the master to do so. When a master expires a key (or evicts it due to LRU, Least Recently Used, eviction), it synthesizes a DEL command that is sent to all replicas.
  • In Redis Data Replication, due to master-driven expiration, replicas may have keys in memory that have already logically expired because the master was unable to issue the DEL command in time.
  • To deal with this, the replica uses its logical clock to report that a key does not exist for reading operations that do not break the data set’s consistency (as new commands from the master will arrive). This prevents replicas from reporting keys that have logically expired.
  • In practice, a replica used as an HTML fragment cache will avoid returning items that are already older than their desired time to live.
  • During the execution of Lua scripts, no key expirations are done. When a Lua script starts, the time in the master is theoretically frozen, so a given key will either exist or not exist for the duration of the script. This avoids keys from expiring in the middle of a script, and it’s required to send the same script to the replica in a way that ensures the data set’s effects are the same.

Once a replica is promoted to a master, it will begin to expire keys on its own, without the assistance of its previous master.
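To illustrate the master-driven expiration described above (the hostnames, key name, and TTL are hypothetical), you could set a volatile key on the master and then read it from a replica after it has logically expired:

redis-cli -h master.example.internal SET session:42 "payload"
redis-cli -h master.example.internal EXPIRE session:42 60       # the key lives for 60 seconds
# More than 60 seconds later, the replica reports the key as gone,
# even if the master’s DEL has not arrived yet:
redis-cli -h replica.example.internal GET session:42            # (nil)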

INFO and ROLE Command

  • There are two Redis commands that provide a wealth of information about the master and replica instances’ current replication parameters: INFO and ROLE.
  • When the command is run with the replication argument, as INFO replication, only information relevant to replication is shown.
  • ROLE is a more computer-friendly command that displays the replication status of masters and replicas, together with their replication offsets, the list of connected replicas, and so on (see the example after this list).
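For instance, ROLE returns output along these lines on a master and on a replica (the addresses and offsets are illustrative):

redis-cli ROLE
# On a master:
# 1) "master"
# 2) (integer) 1023
# 3) 1) 1) "192.168.1.5"
#       2) "6379"
#       3) "1023"
#
# On a replica:
# 1) "slave"
# 2) "192.168.1.1"
# 3) (integer) 6379
# 4) "connected"
# 5) (integer) 1023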

Partial Sync After Restarts and Failover

  • Since Redis 4.0, when an instance is promoted to master following a failover, it can still perform a partial resynchronization with the replicas of the old master.
  • To do so, the promoted replica remembers the old replication ID and offset of its previous master, so it can serve part of the backlog to connecting replicas even when they ask for the old replication ID.
  • However, the promoted replica’s new Redis Data Replication ID will be different, because it constitutes a different history of the data set.
  • For example, the old master might come back online and keep accepting writes for a while; reusing the same Redis data replication ID in the promoted replica would then violate the rule that a replication ID and offset pair identifies only one data set.

Read more about Best Open Source Data Replication.

Conclusion

In this article, you learned about the Redis Data Replication process and all its important features.

However, as a Developer, extracting complex data from a diverse set of data sources like Databases, CRMs, Project Management Tools, Streaming Services, and Marketing Platforms into your MariaDB or MongoDB Database can seem quite challenging. If you are from a non-technical background or are new to the game of data warehousing and analytics, Hevo Data can help!

Want to take Hevo for a spin? Sign up for a 14-day free trial and experience the feature-rich Hevo suite firsthand. Also check out our unbeatable pricing to choose the best plan for your organization.

Sharon Rithika
Content Writer, Hevo Data

Sharon is a data science enthusiast with a hands-on approach to data integration and infrastructure. She leverages her technical background in computer science and her experience as a Marketing Content Analyst at Hevo Data to create informative content that bridges the gap between technical concepts and practical applications. Sharon's passion lies in using data to solve real-world problems and empower others with data literacy.
