Postgres Replication Types: 3 Comprehensive Methods

By: Published: June 22, 2020

Postgresql Replication Types - FI

The advent of the internet coincided with the rise of distributed systems. More services now have to support a massive number of users. For many such services, it is no longer feasible to store data on a single server no matter how large the server. Data now has to be distributed not only to cater to the increased workload but to make sure that production systems do not have a single point of failure. It is within this narrative that the need to have a replication strategy for databases becomes paramount.

In this blog post, you are not only going to learn the Postgres replication types but also about the theoretical underpinnings of replication, the reasons for replication and you will also be exposed to hands-on replication examples using the PostgreSQL database engine.  This article assumes that the reader is familiar with the concept of a database and how transactions are applied. The reader will also need access to a simple text editor as you will be editing some configuration files.

This article covers the following aspects:

What is PostgreSQL?

PostgreSQL is an open-source relational database management system (RDBMS) that prides itself on extensibility and strict SQL compliance. It is widely used in production systems and it has been under active development since its initial release in 1996. PostgreSQL can be used as the database layer for web applications. It can also be used as a data warehouse or for analytics/business intelligence applications.

What is Replication?

The generic definition of replication is the act of copying or reproducing an entity such that there exist multiple copies. The concept of replication in databases is closely related to the standard definition. Database replication is the process of sharing information between two or more databases.  In layman’s terms, it is the act of copying data from one database instance to another.

Replication could occur between databases hosted on the same physical machine or across the network. The database which acts as the active node on which changes initially occur is known as the main/master/primary/publisher while the other databases that receive the changes subsequently (in effect, copies the data), are known as the standby/slave/secondary/subscriber. Throughout the rest of this article, for the purpose of consistency, the Main-Standby nomenclature will be used.

There are multiple ways to classify replication depending on perspective, so in the next few sections, you will go through each of those.

Simplify PostgreSQL Replication Using Hevo’s No-Code Data Pipeline

Postgres Replication Types: PostgreSQL Streaming Replication

In Postgres Replication Types: Streaming replication allows the standby databases to be as close as possible to the main database because it uses the Write-Ahead Logs (WAL) records which stream changes to the standbys in binary format as it is generated instead of waiting for the WAL file to be filled.

Postgres Replication Types: PostgreSQL Replication Using Hevo Data

The second method in Postgres Replication Types is Hevo Data. Hevo Data, an Automated No-code Data Pipeline provides you with a hassle-free solution to perform the PostgreSQL Replication with an easy-to-use no-code interface. Hevo is fully managed and completely automates the process of not only replicating data from PostgreSQL but also enriching the data and transforming it into an analysis-ready form without having to write a single line of code.

Hevo’s fault-tolerant Data Pipeline offers a faster way to move your data from PostgreSQL and 100+ other data sources(including 40+ free data sources) like PostgreSQL into Data Warehouses, Databases, BI Tools, or PostgreSQL, or any other destination of your choice. Hevo will take full charge of the data replication process, allowing you to focus on key business activities.

Postgres Replication Types: PostgreSQL Logical Replication

Postgres Replication Types: Logical replication in PostgreSQL is achieved by the construction of a stream of logical data modifications from the WAL file. It doesn’t require a particular database to be designated as the main database but supports multi-directional data flow.

GET STARTED WITH HEVO FOR FREE

Synchronous Versus Asynchronous Replication

In synchronous replication under Postgres Replication Types, when the main node receives a write transaction, it propagates that change to all standby nodes and receives a confirmation before the transaction is committed or marked as completed. This way the data is identical across all nodes and is said to be consistent. Whereas, in asynchronous replication, when the main node receives a transaction, it goes ahead to commit the transaction before propagating that change to standby nodes. It does so with an expectation that the changes propagated to standby nodes will be committed but it does not wait for a confirmation.

This means that there will always be a lag between the main node and standbys. It is also important to note that there is a brief window within which data loss could occur if the main node fails before it can propagate changes to the standbys unless a special failure mechanism is implemented to guarantee eventual consistency, however, the very nature of asynchronous replication means that the possibility of data loss in such a scenario cannot be completely ruled out.

Theoretically, synchronous replication has the advantage of being consistent but in practice, it is usually expensive to implement and in many cases outright infeasible. It is for this reason that most database replication processes adopt the asynchronous approach.

Synchronous/Asynchronous Setup

On the primary server, synchronous standby names in postgresql.conf is used to set up the synchronous setup. You can specify the servers to be synced and the order of priority for COMMIT if there are several standby servers. Asynchronous standby servers will be used if this parameter is not supplied.

‘synchronous standby names = FIRST 2 (s1, s2, s3)’, for example.

You can optionally define how synchronous standby servers are selected from a list of servers, such as ‘FIRST n (list)’ or ‘ANY n (list)’. The FIRST keyword indicates that the first n standby servers in the list will be synchronous, and that transaction commitments will be delayed until the first n standby servers’ WAL records are replicated. The ANY keyword indicates that transaction commitments should be delayed until their WAL records have been replicated to at least n standby servers in the list.

Setting synchronous standby names to ‘FIRST 2 (s1, s2, s3)’ in an environment with standby servers s1, s2, s3, and s4 will setup s1 and s2 for Postgres Replication Types: synchronous replication and s3 and s4 for asynchronous replication. Before committing, the primary server will wait for s1 and s2 to finish processing. In the event that s1 or s2 fail, s3 switches to synchronous replication.

Make that the name set in synchronous standby names on the standby server matches the name set in application name of primary conninfo in postgresql.conf. This parameter will be addressed in greater detail later in this text.

Setting the Synchronization Level

Set synchronous commit in postgresql.conf on the primary server to set the synchronisation level of the standby server. The values that can be set and their overview are described in this section. The numbers in the table’s ‘Guaranteed range’ column correspond to the numbers in the diagram just below it.

Sync levelSet valueOverviewGuaranteed range
Full synchronousremote_applyCommits are not made until the WAL has been replicated to the standby servers and the new data has been made available to read-only queries. It is highly suited for load balancing of read-only workloads that demand data freshness because the data is guaranteed to be in sync.1 to 9
Synchronouson (default)Commits wait until after the WAL is replicated to the standby servers. This setting achieves the best balance between performance and reliability.1 to 6
Semi-synchronousremote_writeCommits aren’t made until the WAL has been replicated to the backup servers. This configuration gives the optimal performance and reliability balance.1 to 5
AsynchronouslocalCommits aren’t made until the primary server’s WAL write is finished.1 to 2
AsynchronousoffCommits do not wait for the primary server’s WAL write to finish. This is not a good setting to use.1

Differences Between Synchronous and Asynchronous

The distinction between Postgres Replication Types: synchronous and asynchronous replication is whether or not the primary server waits for a response from the standby server before proceeding with the processing. Choose the configuration that best suits your needs, since it has an impact on SQL processing response time and high availability.

Postgres Replication Types: Synchronous Replication
  • Before completing a procedure, the primary server waits for a response from the standby server.
  • As a result, the log shipping time is included in the overall response time.
  • Because WAL shipping to the standby server is not delayed, the standby server’s data freshness (reliability) is improved.
  • Suitable for read-only load balancing and failover procedures.
Postgres Replication Types: Asynchronous Replication (default)
  • The primary server completes a task without waiting for the standby server to respond.
  • As a result, the overall response time is similar to when no streaming replication is performed.
  • Due to the asynchronous nature of WAL shipping and its application (data update) on the standby server, the updated result on the primary server may not be available immediately on the standby server.
  • Data loss may occur depending on the time of the failover.
  • Suitable for disaster recovery replication to remote locations.

Single Master Versus Multi-Master Replication

Another classification of Postgres Replication Types is the single master versus multi-master replication. In a single master/main setup, there is only one active node that can receive write and read transactions. All other nodes can only perform read operations. The main is the only node on which write operations can be carried out, it then propagates the transaction to other nodes.

On the other hand, in a multi-master replication under Postgres Replication Types, more than one node can receive write/read transactions. Even though this means writing can be carried out faster, it introduces a level of complexity to the setup. Conflicts between different database nodes must be handled gracefully because writes can occur anywhere and therefore the state of the data needs to be synchronized.

Logical Versus Physical Replication

The core idea of physical replication in Postgres Replication Types is that it tries to preserve the state of the physical system as much as possible to encode changes that take place at the database level. Therefore, physical replication deals with binary format while logical replication is only concerned about copying the data to make sure that it is identical to the change that occurred, without being biased about how that is achieved. Logical replication is preferred when moving between database versions, even if it is slightly more complicated to set up when compared to physical replication.

Why is Replication Important?

There are three main reasons for which databases are replicated, namely – high availability, load balancing, and disaster recovery. High availability refers to the ability to have an up-to-date copy of your database at all times. This means that in the event of a failure of your main database, the standby copy can be promoted to main and you can start receiving traffic.

Load balancing refers to the practice of distributing incoming requests to your application in a way that is balanced so that no particular database experiences an uneven workload. With replication, this is possible since multiple copies of the data exist at any point in time.

The final reason for replication is the need for effective disaster recovery in the event of a systemic failure.

What are the different Postgres Replication Types?

PostgreSQL supports multiple Postgres Replication Types out of the box. It is also extensible with third-party solutions that are geared towards specific use cases as there isn’t a one size fits all solution that can serve as a magic bullet for all use cases. There will always be tradeoffs that will depend on the business need you are trying to solve. For the rest of this article, you will be introduced to two such techniques for Postgres Replication Types: streaming replication and logical replication.

Postgres Replication Types: PostgreSQL Streaming Replication

The first type in Postgres Replication Types is Streaming Replication. Streaming replication is especially useful in setting up a high availability system as failover can easily be performed because the database clusters are identical. Below is a walkthrough of the procedure required to set it up.

You must create a user in the main database with a replication role that will be used by standby databases to access the main database. To do this, launch the psql terminal-based frontend for PostgreSQL  and issue the following command:

CREATE USER your_user REPLICATION;

Next, add the user you just created to the pg_hba.conf as shown below:

# TYPE  DATABASE        USER         ADDRESS               METHOD
  host  replication     your_user    <ip address range>    trust

You then have to take a backup of the standby PostgreSQL cluster. To do that use the PostgreSQL utility pg_basebackup and issue the following command from your regular terminal:

pg_basebackup -h <primary-ip> -U your_user -D ~/standby

Go over to the postgresql.conf configuration file and make changes as  shown below:

wal_level=replica
max_wal_senders = 5

The wal_level should be set to replica and max_wal_senders should be the number of standby connections to allow, in our example it is 5.

Finally, on each standby, create a recovery.conf and populate it with the following entries. On starting the standbys nodes after making the changes below, they will connect to the main database and begin replicating its data via streaming:

standby_mode=on
primary_conninfo='user=your_user host=<primary-ip> port=<primary-port>'

Postgres Replication Types: PostgreSQL Replication Using Hevo Data

Postgres Replication Types: hevo method | Hevo Data
Image source

The second type in Postgres Replication types is Hevo Data. Hevo Data, an Automated No Code Data Pipeline, helps you replicate data from PostgreSQL to Data Warehouses, Business Intelligence Tools, or any other destination of your choice in a completely hassle-free & automated manner. Hevo supports data ingestion replication from PostgreSQL servers via Write Ahead Logs (WALs) set at the logical level.

Use Hevo’s no-code data pipeline to replicate data from PostgreSQL to a destination of your choice in a seamless and automated way. Try our 14-day full feature access free trial.

Get Started with Hevo for Free

Hevo also provides you with an all-in-one solution for replicating data from Amazon Aurora PostgreSQL, Amazon RDS PostgreSQL, Azure PostgreSQL, and various other PostgreSQL sources. This way you can focus more on your Data Analysis and let Hevo manage the PostgreSQL Data Replication.

To learn more, check out Hevo’s documentation for PostgreSQL Replication.

Check out what makes Hevo amazing:

  • Auto Schema Mapping: Hevo takes away the tedious task of schema management & automatically detects the schema of incoming data from PostgreSQL and replicates it to the destination schema.
  • Secure: Hevo has a fault-tolerant architecture that ensures that the data is handled in a secure, consistent manner with zero data loss.
  • Quick Setup: Hevo with its automated features, can be set up in minimal time. Moreover, with its simple and interactive UI, it is extremely easy for new customers to work on and perform operations.
  • Transformations: Hevo provides preload transformations through Python code. It also allows you to run transformation code for each event in the Data Pipelines you set up. Y
  • Hevo Is Built To Scale: As the number of sources and the volume of your data grows, Hevo scales horizontally, handling millions of records per minute with very little latency.
  • Incremental Data Load: Hevo allows the transfer of data that has been modified in real-time. This ensures efficient utilization of bandwidth on both ends.
  • Live Support: The Hevo team is available round the clock to extend exceptional support to its customers through chat, email, and support calls.

Postgres Replication Types: PostgreSQL Logical Replication

The third type in Postgres Replication types is Logical replication. Logical replication in PostgreSQL can be used to copy individual tables instead of an entire database. Do note, however, that because it supports data flow in multiple directions, conflicts could arise and there should be a procedure in place to handle those occurrences. Below is a full example of implementing Postgres Replication Types: Logical replication in PostgreSQL.

The first step is to create a database user in the main database with a replication role like so:

CREATE USER your_user REPLICATION;

Add the newly created user to pg_hba.conf file to enable client authentication:

# TYPE    DATABASE        USER         ADDRESS                 METHOD
  host    replication     your_user    <ip address range>      trust

In the postgresql.conf configuration file of either the main or standby, set wal_level to logical:

wal_level=logical

On the main database, create a publication for the tables you want to replicate:

CREATE PUBLICATION your_publication FOR TABLE table1, table2;

Then on the standby database recreate the exact tables you want to replicate, maintaining the table structures, in this example that will be table1 and table2. You should then create a subscription on the standby database for the publication you created earlier on the main database, using the command below:

CREATE SUBSCRIPTION your_subscription CONNECTION 'host=<primary-ip> port=<primary-port> user=your_user' PUBLICATION your_publication;

That is all that is needed to set up PostgreSQL Replication Types: Logical replication on PostgreSQL.

Hevo also supports Logical Replication in Postgres Replication Types. Hevo Data, a No-code Data Pipeline, helps load data from any data source such as Databases, SaaS applications, Cloud Storage, SDKs, and Streaming Services and simplifies the ETL process. It supports 100+ Data Sources such as Postgres,  including 40+ Free Sources. It is a 3-step process by just selecting the data source, providing valid credentials, and choosing the destination.

Conclusion

In this article, you have been introduced to the concept of replication and Postgres Replication Types and its various classification schemes. A discussion on some of the reasons for database replication was also touched upon. Furthermore, you were guided through streaming and logical replication in Postgres Replication Types and how to implement the same. You should now be well equipped to use database replication as a tool in your arsenal to design highly available and fault-tolerant systems. There is more to learn, do check out the official PostgreSQL documentation on these topics to get a deeper dive or consider exploring third-party solutions that offer a managed experience.

While Datadogs Webhooks Integration is insightful, it is a hectic task to Set Up the proper environment. Moreover, you will need to implement it manually, which will consume your time & resources and is error-prone.

Hevo Data provides an Automated No-code Data Pipeline that empowers you to overcome the above-mentioned limitations. Hevo caters to 100+ data sources (including 40+ free sources) and can seamlessly perform PostgreSQL Replication in real-time. Hevo’s fault-tolerant architecture ensures a consistent and secure replication of your PostgreSQL data. It will make your life easier and make Data Replication hassle-free.

Visit our Website to Explore Hevo

Want to take Hevo for a spin? Sign Up for a 14-day free trial and experience the feature-rich Hevo suite first hand.

Share your experience of learning about Postgres Replication types in the comments section below.

Ofem Eteng
Freelance Technical Content Writer, Hevo Data

Ofem is a freelance writer specializing in data-related topics, who has expertise in translating complex concepts. With a focus on data science, analytics, and emerging technologies.