In addition to being open-source, PostgreSQL is highly extensible: it lets you define your own data types, develop custom functions, and write code in different programming languages without recompiling your database. PostgreSQL also adheres to SQL standards more closely than other databases like MySQL. One of the most important features that PostgreSQL provides is streaming replication, which can be used to set up PostgreSQL sync replication.

With the help of this feature, you can set up replicated cluster configurations in different ways for different uses. In this article, you will learn how Postgres can sync two databases, why replication is important, and how the 2 different types of PostgreSQL replication (asynchronous and synchronous) work.

Methods to Set Up PostgreSQL Sync Replication

Data replication, needless to say, is one of the most significant processes in an RDBMS. You can replicate data between two PostgreSQL databases using the following 3 methods:

  • Method 1: PostgreSQL Sync Replication Using Hevo Data
  • Method 2: PostgreSQL Asynchronous Replication
  • Method 3: PostgreSQL Synchronous Replication

Let’s dive in and read about all the above methods of PostgreSQL sync replication!

Method 1: PostgreSQL Sync Replication Using Hevo Data

Hevo Data, a no-code data pipeline, helps you replicate data from PostgreSQL to data warehouses, business intelligence tools, or any other destination of your choice in a completely hassle-free & automated manner. Hevo supports data ingestion from PostgreSQL servers via Write Ahead Logs (WALs) set at the logical level. A WAL is a collection of log files that record information about data modifications and data object modifications made on your PostgreSQL server instance.

To learn more, check out Hevo’s documentation for PostgreSQL replication.

For further information about data replication, read our comprehensive guide.

Check out what makes Hevo amazing:

  • Auto Schema Mapping: Hevo takes away the tedious task of schema management & automatically detects the schema of incoming data from PostgreSQL and replicates it to the destination schema.
  • Transformations: Hevo provides preload transformations through Python code. It also allows you to run transformation code for each event in the Data Pipelines you set up. To carry out a transformation, you edit the properties of the event object received as a parameter by the transform method. Hevo also offers drag-and-drop transformations such as Date and Control Functions, JSON, and Event Manipulation, to name a few. These can be configured and tested before being put to use for aggregation.
  • Hevo Is Built To Scale: As the number of sources and the volume of your data grows, Hevo scales horizontally, handling millions of records per minute with very little latency.
  • Incremental Data Load: Hevo allows the transfer of data that has been modified in real-time. This ensures efficient utilization of bandwidth on both ends.
Solve your data replication problems with Hevo’s reliable, no-code, automated pipelines with 150+ connectors.
Get your free trial right away!

Method 2: PostgreSQL Asynchronous Replication

In the case of asynchronous replication, WAL records are shipped after the transaction is committed. In other words, changes reach the standby only after they have been committed on the primary server, which can result in some data loss if the primary server suffers a sudden failure.

Because the process is asynchronous, there is a small delay, which can cause minor data loss in case of a server crash. The delay is, however, usually very small, typically under one second, assuming the standby is powerful enough to keep up with the load. With file-based log shipping, archive_timeout is needed to bound data loss, but with streaming replication there is no need for this setting.
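On PostgreSQL 10 and later, you can observe this replication lag directly on the primary. A minimal check from psql (the lag columns were added in version 10):

postgres=# SELECT application_name, state, write_lag, flush_lag, replay_lag FROM pg_stat_replication;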

Let’s discuss how to set up asynchronous replication in detail. 

First of all, on the primary node, a few changes need to be made in the postgresql.conf & pg_hba.conf files present in the data directory. You can confirm the path of the data directory using the following command.

postgres=# SELECT setting FROM pg_settings WHERE name = 'data_directory';

Now you need to make the following changes in the postgresql.conf file.   

You can specify the IP addresses the server should listen on; * represents all the IP addresses on the server. You can also set the server port, which is 5432 by default.

listen_addresses = '*'
port = 5432

Next, you need to set wal_level to hot_standby, which maps to the replica level.

wal_level = hot_standby

Next, you need to set the maximum number of wal_sender processes that manage connections with the standby server, and the minimum number of WAL files to keep in the pg_wal directory.

max_wal_senders = 10       # example value; set to suit your setup
wal_keep_segments = 64     # example value; set to suit your setup

After setting these values, you need to restart the database service. 

$ systemctl restart postgresql-11

Now you need to make changes in the second file, i.e. pg_hba.conf.

Before editing it, create a role with the REPLICATION privilege (this is done in psql, not in pg_hba.conf):

postgres=# CREATE ROLE replication_user WITH LOGIN PASSWORD 'PASSWORD' REPLICATION;
CREATE ROLE

The main parameters in this file look like this. 

# TYPE  DATABASE     USER              ADDRESS              METHOD
host    replication  replication_user  IP_STANDBY_NODE/32   md5
host    replication  replication_user  IP_PRIMARY_NODE/32   md5
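After editing pg_hba.conf, reload the server configuration so the new authentication rules take effect. A minimal way to do this from psql (as a superuser), without a full restart:

postgres=# SELECT pg_reload_conf();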

Now let’s talk about changes that need to be made in the secondary node. 

You need to copy the current primary data directory over to the standby by taking a base backup.

Before that, move the existing data directory out of the way.

Go to the path /var/lib/pgsql/11/:

$ cd /var/lib/pgsql/11/
$ mv data data.bk

Now pull the current data directory from the primary:

$ pg_basebackup -h 192.168.100.145 -D /var/lib/pgsql/11/data/ -P -U replication_user --wal-method=stream
$ chown -R postgres:postgres data

Now, you need to make some changes in the recovery.conf and postgresql.conf files on the standby. First, in recovery.conf:

standby_mode = 'on'
primary_conninfo = 'host=IP_PRIMARY_NODE port=5432 user=replication_user password=PASSWORD' 
recovery_target_timeline = 'latest'
trigger_file = '/tmp/failover_5432.trigger'
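Note that trigger_file, configured above, gives you a second promotion mechanism besides pg_ctl promote: creating that file on the standby makes it exit recovery and become a primary. A sketch using the path from the configuration above:

$ touch /tmp/failover_5432.trigger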

Then, in postgresql.conf, like earlier, set wal_level to hot_standby (the replica level) and enable hot_standby so the standby can serve read-only queries.

wal_level = hot_standby
hot_standby = on

After making these changes, the configuration setup is complete. 

Now start the database service on the standby node.

$ systemctl start postgresql-11
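Once both nodes are running, you can confirm on the primary that the standby is attached and that replication is asynchronous; sync_state should read async. A quick check from psql:

postgres=# SELECT client_addr, state, sync_state FROM pg_stat_replication;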

Performance of Asynchronous Replication

Let’s evaluate asynchronous PostgreSQL replication. There are a few standard parameters used to evaluate any sort of replication.

  • Write Performance: Asynchronous replication usually performs better for inserts and writes. The primary only waits for its own WAL write; it does not have to wait for an acknowledgment from the standby over the network.
  • Reading Consistency: Because replication is asynchronous, reads can be slightly inconsistent between the primary node & the secondary node; there may be some lag until changes are streamed to the replicas & applied to their databases.
  • Data Loss: This is one of the drawbacks that can happen particularly if the primary server crashes all of a sudden before the WAL is streamed to replicas & applied to their databases. However, if the primary server crashes & recovers, replicas will also resume streaming from where they left off & eventually catch up with the primary server.    
  • Distance Proximity: Asynchronous replication usually works over long distances much better than synchronous replication as long as a network connection between data centers is available. 

Method 3: PostgreSQL Synchronous Replication

Setting up Postgres synchronous replication mostly follows the same steps as the asynchronous replication setup discussed above.

For synchronous replication between two Postgres databases, one of the most important parameters is application_name.

There is a strong relationship between application_name & synchronous replication. If the application_name value is part of synchronous_standby_names on the primary, the secondary is treated as synchronous. There are further conditions for a standby to be synchronous: it has to be connected, and it must be streaming data in real-time, i.e. not fetching old WAL records.
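For reference, the tie-in on the primary side looks like this in postgresql.conf (a minimal sketch; article_writing is the application_name this article uses for its standby):

synchronous_standby_names = 'article_writing'   # standbys listed here become synchronous
synchronous_commit = on                         # commits wait for the synchronous standby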

Moving on, let’s repeat the process of making changes in the postgresql.conf file on the master node. 

wal_level = hot_standby
hot_standby = on
max_wal_senders = 10       # example value; set to suit your setup
wal_keep_segments = 64     # example value; set to suit your setup

The purpose of wal_keep_segments is to retain more transaction logs, which makes the entire setup more robust.

Moving on, you can work on the recovery.conf file. 

primary_conninfo = 'host=localhost port=5432 application_name=article_writing'
standby_mode = on

The only addition here is the application_name parameter; the rest of the file is the same as in the asynchronous setup.

You can now test the setup by starting the system on the master & looking at pg_stat_replication.

postgres=# \x
Expanded display is on.
postgres=# SELECT * FROM pg_stat_replication;
-[ RECORD 1 ]----+------------------------------
pid              | 62345
usesysid         | 14
usename          | fz
application_name | article_writing
client_addr      | ::1
client_hostname  |
client_port      | 45678
backend_start    | 2020-06-28 13:03:14
state            | streaming
sent_location    | 0/30001E8
write_location   | 0/30001E8
flush_location   | 0/30001E8
replay_location  | 0/30001E8
sync_priority    | 1
sync_state       | sync

The sync_state field at the end shows that the setup is working in synchronous mode.

Performance of Synchronous Replication

Let’s evaluate synchronous replication performance on the same parameters as we used for asynchronous replication. 

  • Firstly, let’s talk about write performance. It is slower than asynchronous replication: a commit on the primary has to wait until the standby confirms that it has received and flushed the WAL, so the process takes relatively more time (see the synchronous_commit sketch after this list).
  • Secondly, reading consistency is comparatively better than in the asynchronous case, but some issues still persist.
  • The chances of data loss are very low & the data recovery rate is very good, as data is written to two different nodes.
  • As far as distance is concerned, synchronous replication is suitable for short distances, i.e. nodes within close proximity.
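The write-performance penalty can be tuned via synchronous_commit on the primary, which controls how long a commit waits for the standby (a sketch of the documented settings; remote_apply requires PostgreSQL 9.6 or later):

synchronous_commit = on              # wait until the standby has flushed the WAL to disk
# synchronous_commit = remote_write  # wait only until the standby has received the WAL
# synchronous_commit = remote_apply  # wait until the standby has also replayed the WAL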

Automatic Failover

Failover is a way of recovering service when the primary server fails for whatever reason. As long as you’ve set up PostgreSQL to manage your physical streaming replication, you and your users will be protected from downtime due to a primary server failure.

It is worth noting that the failover procedure can take some time to set up and initiate. PostgreSQL does not have any built-in tools for monitoring and scoping server failures, so you’ll have to get creative.

Fortunately, you do not have to rely on PostgreSQL for failover. Some specific tools enable automatic failover and transition to backup, reducing database downtime.

Setting up failover replication ensures high availability by making sure that backups are available if the primary server fails.

PostgreSQL Manual Failover Steps

Here are the steps for manual failover in PostgreSQL:

  • Crash the main server.
  • Activate the secondary server by using the following command:
./pg_ctl promote -D ../sb_data/
server promoting
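You can also confirm that the promotion took effect; pg_is_in_recovery() returns false once a server has left standby mode:

edb=# SELECT pg_is_in_recovery();
 pg_is_in_recovery
-------------------
 f
(1 row)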

Establish a connection to the activated secondary server and add a row:

-bash-4.2$ ./edb-psql -p 5432 edb
Password:
psql.bin (10.7)
Type "help" for help.

edb=# insert into abc values(4,'Four');

If the row was added successfully, then the secondary, previously a read-only server, has been promoted to be the new primary server.

Automating Failover in PostgreSQL

To configure an automatic failover, you will require the EDB PostgreSQL failover manager (EFM). After downloading and installing EFM on each primary and standby node, you can set up an EFM Cluster comprising one or more standby nodes, a primary node, and an optional Witness node to confirm claims in the event of failure.

EFM continuously tracks system health and generates email alerts depending on system occurrences. In the event of a failure, it automatically switches to the most recent standby and resets all other standby servers to recognize the new primary node.

It also reconfigures load balancers (such as pgPool) and avoids “split-brain” scenarios (in which two nodes each believe they are the primary).

Before wrapping up, let’s cover some basics.

PostgreSQL Sync Replication and its Importance


At its core, PostgreSQL replication is the ability to ship & apply Write Ahead Logs, or WALs. WALs are important in all modern RDBMSs, as they are what make transactions durable. Simply put, every transaction carried out in an RDBMS is first written to a WAL file; only after that are the changes applied to the actual on-disk table data files. Also, check out PostgreSQL Multi-Master Replication.

One property of WAL files is that they are strictly sequential, which makes the WAL file sequence a “replay log” of changes. This implies that if you need to replicate all the changes happening on one database server, all you need to do is copy the WAL files as they are created & apply them in sequence on another server or node. This direct movement of WAL files or records from one server (master) to another server (slave) is known as log shipping, and by default this log shipping or replication is asynchronous in nature (as discussed in Method 2 above).
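For context, classic file-based log shipping is configured on the primary via archive_mode and archive_command in postgresql.conf (a minimal sketch; the archive destination path is illustrative):

archive_mode = on
archive_command = 'cp %p /mnt/wal_archive/%f'   # %p = path of the WAL segment, %f = its file name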

To learn more about PostgreSQL, visit here.

Download the Ultimate Guide on Database Replication
Learn the 3 ways to replicate databases & which one you should prefer.

Conclusion

This article described 3 methods of performing PostgreSQL replication. While you can set up PostgreSQL synchronous replication (or async replication) as described in this post, it is quite effort-intensive and requires in-depth technical expertise.

Hevo is the only real-time ELT No-code Data Pipeline platform that cost-effectively automates data pipelines that are flexible to your needs. Hevo caters to 150+ data sources (including 40+ free sources) and can seamlessly perform PostgreSQL sync replication in real-time. Hevo’s fault-tolerant architecture ensures a consistent and secure replication for your PostgreSQL data.

Learn more about Hevo

Want to take Hevo for a spin? Sign up for a 14-day free trial and experience the feature-rich Hevo suite firsthand.

We would love to hear from you. Please share your thoughts and experience of setting up PostgreSQL Sync Replication in the comments section.

Muhammad Faraz
Freelance Technical Content Writer, Hevo Data

In his role as a freelance writer, Muhammad loves to use his analytical mindset and problem-solving ability to help businesses solve problems by offering extensively researched content.