Redis Data Replication Simplified 101

on Data Replication, Redis • May 16th, 2022

Redis is a popular open-source, in-memory database and cache written entirely in C. Many people refer to Redis as a data structure server because its main data types, such as lists, strings, and sets, are quite comparable to those found in programming languages.

In this article, you will learn about Redis, its key features, Data Replication, and Redis Data Replication Process.

What is Redis?

Redis is a fast in-memory database and cache that is written in C and tuned for speed. Redis stands for “Remote Dictionary Server”, and versions up to 7.2 are open source under the BSD license.

Its main data types are similar to those used in programming languages, such as strings, lists, dictionaries (or hashes), sets, and sorted sets. Hence Redis is often referred to as a data structure server. Many other data structures and capabilities for approximate counting, geolocation, and stream processing are also available.

Redis’ numerous data structures closely resemble the native data structures that programmers typically use inside applications and algorithms. Because these essential data structures are easily shared between processes and services, Redis is well suited to rapid application development.

Key Features of Redis

  • Performance: All Redis data is stored in memory, allowing for fast data access with low latency. Unlike traditional databases, in-memory data stores do not require a trip to disk, which reduces engine latency to microseconds. As a result, they can handle orders of magnitude more operations and provide significantly faster response times. The result is blazing-fast performance, with average read and write operations taking less than a millisecond and support for millions of operations per second.
  • Simplicity and Usability: Redis lets you store, read, and use data in your applications with fewer and simpler lines of code than would otherwise be needed. The difference is that Redis developers use a simple command structure instead of the query languages of traditional databases.
  • Persistence and Replication: Redis uses a primary-replica architecture that allows for asynchronous replication of data across many replica servers. This improves read performance (since queries are distributed among the servers) and allows for faster recovery when the primary server goes down. Redis allows point-in-time backups for persistence (copying the Redis data set to disk).
  • Scalability and High Availability: Redis supports either a single-node primary or a clustered primary-replica architecture. This enables you to create highly available solutions with predictable performance and dependability. When you need to change the size of your cluster, you have the option to scale up, scale in, or scale out, so the cluster can grow or shrink to meet your needs.
  • Source Code: Redis is an open-source project with a thriving community, which includes AWS. Because Redis is built on open standards, supports open data formats, and has a diverse collection of clients, there is no vendor or technological lock-in.

What is Data Replication?

The process of storing data in multiple sites or nodes is known as Data Replication. It is beneficial in terms of increasing data availability. It simply involves copying data from a database from one server to another so that all users have access to the same information. The result is a distributed database in which users can access data relevant to their tasks without interfering with the work of others.

Data Replication entails continuously duplicating transactions so that the replica is always up to date and in sync with the source.

By contrast, in a distributed database without replication, data is spread across multiple locations, but any given relation resides at only one of them.

Full replication, in which the entire database is stored at each site, is possible. Partial replication is also possible, in which some frequently used database fragments are replicated while others are not.

Types of Data Replication

  • Transactional Replication: Transactional replication provides users with full initial copies of the database, as well as updates when data changes. Transactional consistency is assured because data is replicated in real-time from the publisher to the receiving database (subscriber) in the same order as it occurs with the publisher. In server-to-server contexts, transactional replication is commonly utilized. It doesn’t just duplicate the data changes; it repeats each one consistently and properly.
  • Snapshot Replication: Snapshot replication delivers data precisely as it appears at a given point in time, without checking for updates. The whole snapshot is generated and sent to users. Snapshot replication is commonly used when data changes are rare. It is a little slower than transactional replication, since it moves many records from one end to the other on each attempt. Snapshot replication is a useful approach to synchronize the publisher and subscriber for the first time.
  • Merge Replication: Merge replication combines data from multiple databases into a single database. It is the most complex type, since it allows both the publisher and the subscriber to make changes to the database independently. Merge replication is commonly employed in server-to-client contexts. It allows one publisher to send modifications to several subscribers.

Replicate Data in Minutes Using Hevo’s No-Code Data Pipeline

Hevo Data, a Fully-managed Data Pipeline solution, can help you automate, simplify & enrich your aggregation process in a few clicks. With Hevo’s out-of-the-box connectors and blazing-fast Data Pipelines, you can extract & aggregate data from 100+ Data Sources straight into your Data Warehouse, Database, or any destination. To further streamline and prepare your data for analysis, you can process and enrich Raw Granular Data using Hevo’s robust & built-in Transformation Layer without writing a single line of code!

Hevo is the fastest, easiest, and most reliable data replication platform that will save your engineering bandwidth and time multifold.

Experience an entirely automated hassle-free Data Aggregation. Try our 14-day free trial today!

Features of Redis Data Replication

Here are some features of Redis Data Replication:

  • Redis uses asynchronous replication, with asynchronous replica-to-master acknowledgments of the amount of data processed.
  • Multiple replicas are possible for a master.
  • Connections from other replicas can be accepted by replicas. Aside from linking several replicas to the same master, replicas can also be linked to each other in a cascading pattern. Since Redis 4.0, the master sends the identical Redis data replication stream to all sub-replicas.
  • On the master side, Redis data replication is non-blocking. When one or more replicas execute the initial synchronization or partial resynchronization, the master will continue to handle queries.
  • Redis Data Replication can be used for scalability, such as having several replicas for read-only queries (slow O(N) operations can be offloaded to replicas), or simply to improve data safety and availability.
  • You can use Redis data replication to avoid the cost of the master writing the entire dataset to disk: a typical technique involves configuring your master's redis.conf to avoid persisting to disk at all, then connecting a replica configured to save periodically or with AOF enabled. This setup, however, must be handled with care, because a restarted master will start with an empty dataset, and if the replica then tries to sync with it, the replica will be emptied as well.

Redis Data Replication Process

The Redis Data Replication Process will be explained below:

How Does Redis Data Replication Work?

Every Redis master has a replication ID: a long pseudo-random string that identifies a given history of the dataset. Each master also keeps an offset that increments for every byte of replication stream it produces to send to replicas, so that the replicas can be updated with the new changes modifying the dataset. The replication offset is incremented even if no replica is connected, so basically every pair of:

Replication ID, Offset

identifies an exact version of a master’s dataset.

When replicas connect to masters, they send their previous master replication ID and the offsets they’ve processed so far using the PSYNC command. This allows the master to deliver only the incremental parts that are required. If there is an insufficient backlog in the master buffers, or if the replica is referring to a history (replication ID) that is no longer valid, a full resynchronization occurs: the replica will receive a fresh copy of the dataset.
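
To make the decision described above more concrete, here is a minimal Python sketch of how a master might choose between a partial and a full resynchronization. It is a simplified model and not Redis's actual implementation: the MasterState class, its backlog_size field, and the psync_decision function are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class MasterState:
    """Hypothetical, simplified view of a master's replication state."""
    replid: str        # current replication ID
    offset: int        # offset of the last byte of stream produced
    backlog_size: int  # how many recent bytes are kept in the backlog


def psync_decision(master: MasterState, replica_replid: str,
                   replica_offset: int) -> str:
    """Return "partial" or "full" for a reconnecting replica.

    replica_offset is the last offset the replica has processed.
    """
    # A different replication ID means a different history: start over.
    if replica_replid != master.replid:
        return "full"
    # Oldest offset from which the backlog can still resume the stream.
    backlog_start = max(0, master.offset - master.backlog_size)
    if backlog_start <= replica_offset <= master.offset:
        return "partial"
    # The missing bytes are no longer (or not yet) available.
    return "full"


master = MasterState(replid="abc", offset=1023, backlog_size=100)
print(psync_decision(master, "abc", 1000))  # within the backlog
print(psync_decision(master, "abc", 500))   # too far behind
print(psync_decision(master, "xyz", 1000))  # unknown history
```

In this toy model, only the first call can be served incrementally; the other two would require a fresh copy of the dataset.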

What is Redis Data Replication ID?

As stated in the previous section, two instances with the same replication ID and replication offset have exactly the same data. However, it is useful to understand what exactly the replication ID is, and why instances actually have two replication IDs: the main ID and the secondary ID.

A replication ID marks a given history of the data set. Every time an instance restarts from scratch as a master, or a replica is promoted to master, a new replication ID is generated for that instance. After the handshake, the replicas connected to a master inherit its replication ID. So two instances with the same ID are related by the fact that they hold the same data, but potentially at a different time. For a given history (replication ID), the offset works as a logical time that tells you which instance holds the most up-to-date data set.

For example, if two instances A and B have the same replication ID, but A has offset 1000 and B has offset 1023, then A is missing some commands that were applied to the data set. It also means that A can reach exactly the same state as B by applying just a few commands.

Diskless Redis Data Replication

In Redis Data Replication, a full resynchronization normally means creating an RDB (Redis Database) file on disk and then reloading that RDB from disk to feed the replicas with data.

This can be a highly stressful procedure for the master if the disks are slow. The first version of Redis to offer diskless replication was version 2.8.18. In this configuration, the child process sends the RDB straight over the wire to replicas, without using the disk as intermediate storage.

Redis Data Replication Configuration

Basic Redis Data Replication is simple to set up. Simply add this line to the replica configuration file:

replicaof 192.168.1.1 6379

Replace 192.168.1.1 6379 with your master's IP address (or hostname) and port. Alternatively, you can call the REPLICAOF command at runtime, and the master host will start a sync with the replica.

A few settings are also available for tuning the replication backlog that the master keeps in memory to perform partial resynchronization. For further details, see the sample redis.conf included with the Redis distribution.

The repl-diskless-sync configuration setting enables diskless replication. The repl-diskless-sync-delay parameter controls how long the master waits before starting the transfer, so that more replicas can arrive after the first one.
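
If you generate configuration files from a script, the directives above can be rendered with a few lines of Python. This is only a sketch: the helper names replica_conf and diskless_conf are hypothetical, and the delay of 5 seconds in the usage lines is an arbitrary example value. The diskless settings are assumed here to belong in the master's configuration, since the master is the side that sends the RDB.

```python
def replica_conf(master_host: str, master_port: int = 6379) -> str:
    """Render the redis.conf line that turns an instance into a replica."""
    if not 0 < master_port < 65536:
        raise ValueError(f"invalid port: {master_port}")
    return f"replicaof {master_host} {master_port}\n"


def diskless_conf(enabled: bool, delay_seconds: int) -> str:
    """Render the diskless replication settings for the master's redis.conf."""
    if delay_seconds < 0:
        raise ValueError("delay must not be negative")
    flag = "yes" if enabled else "no"
    return (f"repl-diskless-sync {flag}\n"
            f"repl-diskless-sync-delay {delay_seconds}\n")


# Fragment for the replica's configuration file.
print(replica_conf("192.168.1.1", 6379), end="")
# Fragment for the master's configuration file.
print(diskless_conf(True, 5), end="")
```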

Read-only Replica

Since Redis 2.6, replicas are read-only by default. The replica-read-only option in the redis.conf file controls this behavior, and it can be enabled and disabled at runtime with CONFIG SET.

Read-only replicas reject all write commands, so it is not possible to write to a replica by mistake. This does not mean that the feature is intended to expose a replica instance to the internet or, more generally, to a network with untrusted clients, because administrative commands like DEBUG and CONFIG are still available.

Setting a Replica to Authenticate a Master

If your master has a password set via requirepass, it’s simple to configure the replica to use that password in all sync operations.

Use redis-cli and type the below to do it on a running instance:

config set masterauth <password>

Add the following to your config file to make it permanent:

masterauth <password>

Allow Writes only with N Attached Replicas

Starting with Redis 2.8, you can configure a Redis master to accept write queries only if at least N replicas are currently connected to it.

However, because Redis uses asynchronous replication, it cannot guarantee that a replica actually received a given write, so there is always a window for data loss.

How it works is explained below:

  • Redis replicas ping the master every second, acknowledging the amount of replication stream processed.
  • The master remembers the last time it received a ping from each replica.
  • The user can configure a minimum number of replicas that must have a lag of no more than a maximum number of seconds.
  • If there are at least N replicas with a lag of less than M seconds, the write is accepted.
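
The check described in the list above can be sketched in a few lines of Python. This is an illustrative model only: the write_allowed function and the last_ack dictionary are hypothetical, and treating a lag exactly equal to the limit as acceptable is an assumption of this sketch. In recent Redis versions, the corresponding redis.conf directives are min-replicas-to-write and min-replicas-max-lag.

```python
def write_allowed(last_ack: dict, now: float,
                  min_replicas: int, max_lag: float) -> bool:
    """Decide whether a master should accept a write.

    last_ack maps a replica name to the time (in seconds) at which the
    master last received an acknowledgment from that replica.
    """
    # Count replicas whose last acknowledgment is recent enough.
    good = sum(1 for t in last_ack.values() if now - t <= max_lag)
    return good >= min_replicas


# Three replicas with lags of 1, 6, and 21 seconds at time 101.
last_ack = {"replica-1": 100.0, "replica-2": 95.0, "replica-3": 80.0}
print(write_allowed(last_ack, now=101.0, min_replicas=2, max_lag=10))
print(write_allowed(last_ack, now=101.0, min_replicas=3, max_lag=10))
```

With these sample values only two replicas are within the 10-second limit, so the first check passes and the second one fails.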

Dealing with Expires on Keys

Redis expires allow keys to have a limited time to live (TTL). Such a feature depends on an instance’s ability to count time; even so, Redis replicas correctly replicate keys with expires, even when those keys are altered using Lua scripts.

To implement this, Redis cannot rely on the master and replica clocks being in sync, since that problem cannot be solved and would lead to race conditions and diverging data sets. Instead, Redis uses three main techniques to make the replication of expired keys work:

  • Replicas do not expire keys; instead, they wait for the master to expire them. When a master expires a key (or evicts it because of LRU (Least Recently Used) eviction), it synthesizes a DEL command that is sent to all replicas.
  • Because expiration is master-driven, replicas may still hold keys in memory that have already logically expired, because the master has not yet delivered the DEL command. To deal with this, the replica uses its logical clock to report that a key does not exist, but only for read operations that do not violate the consistency of the data set (as new commands from the master will arrive). This prevents replicas from returning keys that have logically expired. In practice, an HTML fragments cache that uses replicas to scale will avoid returning items that are already older than the desired time to live.
  • During the execution of Lua scripts, no key expirations are performed. When a Lua script runs, time on the master is conceptually frozen, so a given key will either exist or not exist for the whole duration of the script. This prevents keys from expiring in the middle of a script, and it is needed in order to send the same script to the replica in a way that is guaranteed to have the same effects on the data set.

Once a replica is promoted to a master, it will begin to expire keys on its own, without the assistance of its previous master.
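
The first two techniques can be illustrated with a small Python simulation. It is a toy model under stated assumptions, not how Redis is implemented: the Master and Replica classes, the tuple-based command format, and the explicit now argument (standing in for each instance's clock) are all hypothetical.

```python
class Master:
    def __init__(self):
        self.data = {}  # key -> (value, expire_at or None)

    def set(self, key, value, expire_at=None):
        self.data[key] = (value, expire_at)
        return ("SET", key, value, expire_at)  # command to replicate

    def expire_cycle(self, now):
        """Delete expired keys and return one DEL command per key."""
        expired = [k for k, (_, exp) in self.data.items()
                   if exp is not None and exp <= now]
        for k in expired:
            del self.data[k]
        return [("DEL", k) for k in expired]


class Replica:
    def __init__(self):
        self.data = {}

    def apply(self, cmd):
        """Apply a command received from the master."""
        if cmd[0] == "SET":
            _, key, value, expire_at = cmd
            self.data[key] = (value, expire_at)
        elif cmd[0] == "DEL":
            self.data.pop(cmd[1], None)

    def get(self, key, now):
        """Read a key, hiding it if it has logically expired."""
        entry = self.data.get(key)
        if entry is None:
            return None
        value, expire_at = entry
        if expire_at is not None and expire_at <= now:
            return None  # still in memory, but reported as missing
        return value


master, replica = Master(), Replica()
replica.apply(master.set("page", "<div>", expire_at=10))
print(replica.get("page", now=5))    # still valid
print(replica.get("page", now=12))   # hidden, DEL not received yet
for cmd in master.expire_cycle(now=12):
    replica.apply(cmd)               # the DEL removes it for real
```

Note that the replica never deletes the key on its own; it only hides it on reads until the master's DEL arrives.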

INFO and ROLE Command

There are two Redis commands that provide a lot of information about the current replication parameters of master and replica instances. The first is INFO. When the command is called with the replication argument, as INFO replication, only information relevant to replication is displayed. ROLE is a more computer-friendly command that reports the replication status of masters and replicas, together with their replication offsets, the list of connected replicas, and so on.
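
As an illustration, the text returned by INFO replication is a list of key:value lines that is easy to parse. The Python sketch below works on a hard-coded sample rather than a live server; the sample values (addresses, IDs, offsets) are made up, and the parse_info and parse_replica_line helpers are hypothetical names.

```python
SAMPLE = """\
# Replication
role:master
connected_slaves:1
slave0:ip=192.168.1.2,port=6379,state=online,offset=1000,lag=0
master_replid:8371b4fb1155b71f4a04d3e1bc3e18c4a990aeeb
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:1023
second_repl_offset:-1
"""


def parse_info(text: str) -> dict:
    """Turn INFO output into a dict, skipping blank and comment lines."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        info[key] = value
    return info


def parse_replica_line(value: str) -> dict:
    """Split a line such as 'ip=...,port=...,offset=...' into fields."""
    return dict(item.split("=", 1) for item in value.split(","))


info = parse_info(SAMPLE)
replica = parse_replica_line(info["slave0"])
# Bytes of replication stream the replica has not acknowledged yet.
behind = int(info["master_repl_offset"]) - int(replica["offset"])
print(info["role"], replica["state"], behind)
```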

Partial Sync After Restarts and Failover

Since Redis 4.0, when an instance is promoted to master after a failover, it can still perform a partial resynchronization with the replicas of the old master. To do so, the promoted replica remembers the old replication ID and offset of its former master, so it can provide part of the backlog to connecting replicas even if they ask for the old replication ID.

However, the new replication ID of the promoted replica will be different, because it constitutes a different history of the data set. For example, the old master may come back online and continue accepting writes for some time, so using the same replication ID in the promoted replica would violate the rule that a replication ID and offset pair identifies only a single data set.
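
The role of the two replication IDs after a failover can be sketched as follows. This is a simplified Python model, not Redis's implementation: the PromotedReplica class and its fields are hypothetical, and switch_offset is assumed to be the last offset that the promoted replica shared with its former master.

```python
from dataclasses import dataclass


@dataclass
class PromotedReplica:
    """Hypothetical state of a replica that was promoted to master."""
    replid: str         # new ID generated at promotion
    old_replid: str     # ID inherited from the former master
    switch_offset: int  # last offset shared with the former master
    offset: int         # current replication offset
    backlog_start: int  # oldest offset still available in the backlog


def can_partial_resync(node: PromotedReplica, req_replid: str,
                       req_offset: int) -> bool:
    """req_offset is the last offset processed by the connecting replica."""
    if req_replid == node.replid:
        pass  # same history as the new master
    elif req_replid == node.old_replid and req_offset <= node.switch_offset:
        pass  # old history, but only up to the point of promotion
    else:
        return False
    # The missing part of the stream must still be in the backlog.
    return node.backlog_start <= req_offset <= node.offset


node = PromotedReplica(replid="new", old_replid="old",
                       switch_offset=1000, offset=1050, backlog_start=900)
print(can_partial_resync(node, "old", 950))   # former sibling replica
print(can_partial_resync(node, "old", 1020))  # followed the old master too far
```

A replica that kept following the old master past the promotion point has data from a different history, so in this model it is refused a partial resynchronization and would need a full one.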

Conclusion

In this article, you learned about the Redis Data Replication process and all the important features.

However, as a Developer, extracting complex data from a diverse set of data sources like Databases, CRMs, Project Management Tools, Streaming Services, and Marketing Platforms to your MariaDB or MongoDB Database can be quite challenging. If you are from a non-technical background or are new to data warehousing and analytics, Hevo Data can help!

Hevo Data will automate your data transfer process, allowing you to focus on other aspects of your business like Analytics, Customer Management, etc. This platform allows you to transfer data from 100+ sources to Cloud-based Data Warehouses like Snowflake, Google BigQuery, Amazon Redshift, etc. It will provide you with a hassle-free experience and make your work life much easier.

Want to take Hevo for a spin? Sign Up for a 14-day free trial and experience the feature-rich Hevo suite first hand.

You can also have a look at our unbeatable pricing that will help you choose the right plan for your business needs!