In addition to being open source, PostgreSQL is highly extensible. It allows you to define your own data types, build custom functions, and write code in different programming languages without even recompiling your database.

PostgreSQL also adheres to SQL standards more closely than other databases like MySQL. One of the most essential features PostgreSQL provides is streaming replication, which can be used to set up PostgreSQL sync replication.

With the help of this feature, you can set up replicated cluster configurations in different ways for different use cases. In this article, we will answer questions like how Postgres can sync two databases and how to sync a PostgreSQL database, explain why this matters, and walk through two different types of PostgreSQL sync replication.

What is PostgreSQL Replication?

The process of copying data from the PostgreSQL database to any other destination is called PostgreSQL replication. The source database is the ‘primary’ or ‘master’ server, and the database receiving the copy of data is known as the ‘replica’ server. 

Replication is crucial in maintaining a highly available PostgreSQL database; keeping it online is likely critical for successful analysis and data access. 


Methods to Set Up PostgreSQL Sync Replication

Data replication is one of the most significant processes in an RDBMS. You can set up PostgreSQL sync replication between two databases using the following three methods:

  • Method 1: PostgreSQL Sync Replication Using Hevo Data
  • Method 2: PostgreSQL Asynchronous Replication
  • Method 3: PostgreSQL Synchronous Replication

Let’s dive in and read about each of these methods of PostgreSQL sync replication!

Method 1: PostgreSQL Sync Replication Using Hevo Data

Hevo Data, a no-code data pipeline, helps you replicate data from PostgreSQL to data warehouses, business intelligence tools, or any other destination of your choice in a completely hassle-free & automated manner.

Hevo supports data ingestion and replication from PostgreSQL servers via Write-Ahead Logs (WALs) set at the logical level. A WAL is a collection of log files that record information about data modifications and data object modifications made on your PostgreSQL server instance.
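As a rough sketch (check Hevo’s PostgreSQL documentation for the exact requirements), exposing WALs at the logical level typically involves settings like these in postgresql.conf; the values shown are illustrative:

```conf
wal_level = logical          # record row-level changes for logical decoding
max_replication_slots = 4    # leave room for the pipeline's replication slot
max_wal_senders = 4          # allow replication connections
```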

To learn more, check out Hevo’s documentation for PostgreSQL replication.

For further information about data replication, read our comprehensive guide.

Check out what makes Hevo amazing:

  • Auto Schema Mapping: Hevo eliminates the tedious task of schema management. It automatically detects the schema of incoming data from PostgreSQL and replicates it to the destination schema.
  • Transformations: Hevo provides preload transformations through Python code. It also allows you to run transformation code for each event in the Data Pipelines you set up. To carry out a transformation, you edit the properties of the event object received as a parameter of the transform method. Hevo also offers drag-and-drop transformations such as Date and Control Functions, JSON, and Event Manipulation, to name a few. These can be configured and tested before being put to use.
  • Hevo Is Built To Scale: As the number of sources and your data volume grows, Hevo scales horizontally, handling millions of records per minute with very little latency.
  • Incremental Data Load: Hevo allows the transfer of modified data in real-time, ensuring efficient bandwidth utilization on both ends.
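As an illustration of the event-level transform idea, here is a hypothetical sketch in plain Python; the transform name, event shape, and field names are stand-ins, not Hevo’s actual API:

```python
# Hypothetical sketch of a per-event transform (Hevo's real transform
# signature and event API may differ; `event` here is a plain dict stand-in).
def transform(event):
    """Edit the event's properties in place and return the event."""
    properties = event.get("properties", {})
    if "email" in properties:
        # Normalize the email before it lands in the destination.
        properties["email"] = properties["email"].strip().lower()
    # Drop a sensitive column so it is never loaded downstream.
    properties.pop("ssn", None)
    return event

event = {"properties": {"email": "  User@Example.COM ", "ssn": "123-45-6789"}}
print(transform(event))  # {'properties': {'email': 'user@example.com'}}
```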
Simplify PostgreSQL Replication using Hevo

Hevo’s integration platform simplifies setting up PostgreSQL sync replication by seamlessly managing data streaming and ensuring SQL standards compliance. This ensures real-time data consistency and reliability across your PostgreSQL instances.


Method 2: PostgreSQL Asynchronous Replication

In the case of asynchronous replication, WAL records are shipped after the transaction is committed. In other words, the changes reach the standby only after they have been committed on the primary server, which might result in some data loss if the primary server suffers a sudden failure.

As the process is asynchronous, there is a minor delay, which can cause a small amount of data loss in case of a server crash. The delay is, however, very short, typically under one second, assuming the standby is powerful enough to keep up with the load. With file-based log shipping, archive_timeout was needed to keep the data-loss window small, but with streaming replication, there is no need for this setting.
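For context, in a file-based log-shipping setup, archive_timeout caps how long a committed transaction can sit in an unshipped WAL segment; the value below is illustrative:

```conf
# postgresql.conf (file-based log shipping only; not needed with streaming)
archive_timeout = 60   # force a WAL segment switch at least every 60 seconds
```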

Let’s discuss how to set up asynchronous replication in detail. 

First of all, in the primary node, a few changes need to be made in postgresql.conf & pg_hba.conf files present in the data directory. You can confirm the path using the following command.

postgres=# SELECT setting FROM pg_settings WHERE name = 'data_directory';
        setting
------------------------
 /var/lib/pgsql/11/data
(1 row)

Now you need to make the following changes in the postgresql.conf file.   

You can specify the IP addresses you want to listen on; * represents all IP addresses on the server. You can also set the server port, which by default is 5432.

listen_addresses = '*'
port = 5432

Next, you need to set wal_level to hot_standby (called replica in PostgreSQL 9.6 and later), which produces enough WAL to support a replica.

wal_level = hot_standby

Next, you need to set the maximum number of wal_sender processes that manage the connections with the standby server, and also set the minimum number of WAL files to be kept in the pg_wal directory.

max_wal_senders = 10       # set a value suitable for your setup
wal_keep_segments = 64     # set a value suitable for your setup

After setting these values, you need to restart the database service. 

$ systemctl restart postgresql-11

Now you need to make changes in the second file i.e. pg_hba.conf.

Here, you first need to create a role or user and assign the replication privilege to it.

postgres=# CREATE ROLE replication_user WITH LOGIN PASSWORD 'PASSWORD' REPLICATION;
CREATE ROLE

The main parameters in this file look like this. 

# TYPE  DATABASE     USER              ADDRESS              METHOD
host    replication  replication_user  IP_STANDBY_NODE/32   md5
host    replication  replication_user  IP_PRIMARY_NODE/32   md5

Now let’s talk about changes that need to be made in the secondary node. 

You need to take a base backup of the current primary data directory and use it to initialize the standby’s PostgreSQL data directory.

Before that, you need to move the existing data directory out of the way.

Go to the path /var/lib/pgsql/11/:

$ cd /var/lib/pgsql/11/
$ mv data data.bk

Now take a base backup of the current primary data directory.

$ pg_basebackup -h 192.168.100.145 -D /var/lib/pgsql/11/data/ -P -U replication_user --wal-method=stream
$ chown -R postgres:postgres data

Now, you need to make some changes in recovery.conf and postgresql.conf  files.

standby_mode = 'on'
primary_conninfo = 'host=IP_PRIMARY_NODE port=5432 user=replication_user password=PASSWORD' 
recovery_target_timeline = 'latest'
trigger_file = '/tmp/failover_5432.trigger'

As on the primary, we need to set wal_level to hot_standby; we also enable hot_standby so the replica accepts read-only queries.

wal_level = hot_standby
hot_standby = on

After making these changes, the configuration setup is complete. 

Finally, start the database service on the standby using the following command. You can then confirm from the primary node that the replication status is asynchronous.

$  systemctl start postgresql-11
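From the primary, you can verify the mode by querying the built-in pg_stat_replication view; with the setup above, its sync_state column will read async:

```sql
-- Run on the primary node
SELECT application_name, state, sync_state FROM pg_stat_replication;
```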

Performance of Asynchronous Replication

Let’s evaluate asynchronous PostgreSQL replication. Usually, a few important parameters are considered when evaluating any sort of replication.

  • Write Performance: Asynchronous replication usually performs better when it comes to inserts or writes. A commit only waits for the primary to write the WAL; it does not have to wait for an acknowledgment from the secondary over the network.
  • Reading Consistency: As this is asynchronous replication, reads of the replicated data can be somewhat inconsistent between the primary node and the secondary node; there may be some lag until the changes have been streamed to the replicas and applied to their databases.
  • Data Loss: This is one of the drawbacks that can happen particularly if the primary server crashes all of a sudden before the WAL is streamed to replicas & applied to their databases. However, if the primary server crashes & recovers, replicas will also resume streaming from where they left off & eventually catch up with the primary server.    
  • Distance Proximity: Asynchronous replication usually works over long distances much better than synchronous replication as long as a network connection between data centers is available. 
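To quantify this lag, you can compare WAL positions on the primary; pg_current_wal_lsn and pg_wal_lsn_diff exist as of PostgreSQL 10 (older releases use the *_location spellings):

```sql
-- On the primary: bytes of WAL each standby still has to replay
SELECT application_name,
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag_bytes
FROM pg_stat_replication;
```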

Method 3: PostgreSQL Synchronous Replication

Setting up Postgres synchronous replication mainly follows steps similar to those discussed for asynchronous replication.

One of the most important parameters for the ‘Postgres sync two databases’ replication, when we talk about synchronous replication, is the application_name parameter. 

There is a strong relationship between the application_name variable and synchronous replication. If the application_name value is part of synchronous_standby_names, the slave or secondary node will be synchronous.

Furthermore, in order to be a synchronous standby, other conditions matter as well: the standby has to be connected, and it must be streaming data in real time, i.e., not fetching old WAL records.
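On the primary, this linkage is expressed through synchronous_standby_names in postgresql.conf; the name below matches the application_name used later in this article:

```conf
# postgresql.conf on the primary
synchronous_standby_names = 'article_writing'  # standbys listed here commit synchronously
```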

Moving on, let’s repeat the process of making changes in the postgresql.conf file on the master node. 

wal_level = hot_standby
hot_standby = on
max_wal_senders = 10       # set a value suitable for your setup
wal_keep_segments = 64     # set a value suitable for your setup

The purpose of wal_keep_segments is to keep more transaction logs around, which helps make the entire setup more robust.

Moving on, you can work on the recovery.conf file. 

primary_conninfo = 'host=localhost port=5432 application_name=article_writing'
standby_mode = on

Note that you have only added the application_name parameter; the rest of the file is the same.

You can now test the setup by starting the primary and having a look at pg_stat_replication.

postgres=# \x
Expanded display is on.
postgres=# SELECT * FROM pg_stat_replication;
-[ RECORD 1 ]----+------------------------------
pid              | 62345
usesysid         | 14
usename          | fz
application_name | article_writing
client_addr      | ::1
client_hostname  |
client_port      | 45678
backend_start    | 2020-06-28 13:03:14
state            | streaming
sent_location    | 0/30001E8
write_location   | 0/30001E8
flush_location   | 0/30001E8
replay_location  | 0/30001E8
sync_priority    | 1
sync_state       | sync

The sync_state at the end shows that the setup is working in synchronous mode.

Performance of Synchronous Replication

Let’s evaluate synchronous replication performance on the same parameters as we used for asynchronous replication. 

  • Firstly, let’s talk about write performance. It is slower compared to asynchronous replication: each commit on the primary must also wait for the standby to confirm it has received and written the WAL, so it is a relatively time-consuming process.
  • Secondly, reading consistency is better than in the asynchronous case, but some issues still persist.
  • Data loss is very rare, and recovery is very good, as data is written to two different nodes.
  • As far as distance is concerned, synchronous replication is suitable for short distances, i.e., servers within close proximity.
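This write-performance cost can be tuned with PostgreSQL’s synchronous_commit setting; the trade-off between its real values is sketched below (remote_apply requires PostgreSQL 9.6+):

```conf
# postgresql.conf on the primary
synchronous_commit = on              # wait for the standby to flush the WAL (full durability)
#synchronous_commit = remote_write   # wait only for the standby OS write (faster, small risk)
#synchronous_commit = remote_apply   # wait until the standby has applied the changes
```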

Automatic Failover

Failover is a way of retrieving data when the primary server fails for whatever reason. As long as you’ve set up PostgreSQL to manage your physical streaming replication, you and your users will be safe from downtime due to a primary server failure.

It is worth noting that the failover procedure can take some time to set up and initiate. PostgreSQL does not have any built-in tools for monitoring and scoping server failures, so you’ll have to get creative.

Fortunately, you do not have to rely on PostgreSQL for failover. Some specific tools enable automatic failover and transition to backup, reducing database downtime.

Setting up failover replication delivers high availability by ensuring that backups are available if the primary server fails.

PostgreSQL Manual Failover Steps

Here are the steps for manual failover in PostgreSQL:

  • Stop (or crash) the primary server.
  • Activate the secondary server by using the following command:
./pg_ctl promote -D ../sb_data/
server promoting

Establish a connection to the activated secondary server and add a row:

-bash-4.2$ ./edb-psql -p 5432 edb
Password:
psql.bin (10.7)
Type "help" for help.

edb=# insert into abc values(4,'Four');

If the row was added successfully, the secondary, formerly a read-only server, has been promoted to be the new primary server.

Automating Failover in PostgreSQL

To configure an automatic failover, you will require the EDB PostgreSQL failover manager (EFM). After downloading and installing EFM on each primary and standby node, you can set up an EFM Cluster comprising one or more standby nodes, a primary node, and an optional Witness node to confirm claims in the event of failure.

EFM continuously tracks system health and generates email alerts depending on system occurrences. In the event of a failure, it automatically switches to the most recent standby and resets all other standby servers to recognize the new primary node.

It also reconfigures load balancers (such as Pgpool) and avoids “split-brain” (in which two nodes each believe they are the primary).

Before wrapping up, let’s cover some basics.

Synchronous vs Asynchronous Streaming Replication

| Synchronous Streaming Replication | Asynchronous Streaming Replication |
| --- | --- |
| Allows data to be written to both the primary and secondary servers simultaneously. | Ensures that the data is first written to the host and then copied to the secondary server. |
| Transactions on the primary database are considered complete only after they have been replicated to all replicas. | Transactions on the primary server can be declared complete as soon as the changes have been made on the primary server. |
| The replica servers must always be available for transactions to be completed on the primary. | The replica servers can remain out-of-sync for a certain duration, called the replication lag. |
| Used in high-end transactional environments with immediate failover requirements. | Used where a small amount of data loss is acceptable, such as in read-scaling configurations. |

PostgreSQL Sync Replication and its Importance


PostgreSQL sync replication is built on the ability to ship and apply Write-Ahead Logs, or WALs. WALs are important in all modern RDBMSs, as they provide durable, crash-safe transactions. Simply put, every transaction carried out in an RDBMS is first written out to a WAL file; only then are the changes applied to the actual on-disk table data files. Also, check out PostgreSQL Multi-Master Replication.

One property of WAL files is that they are strictly sequential, which makes the WAL file sequence a “replay log” of changes. This implies that if you need to replicate all the changes happening on one database server, all you need to do is copy the WAL files as they are created and apply them in sequence on another server or node.
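The replay idea can be sketched in a few lines of Python; this is an illustration of sequential log replay, not PostgreSQL internals:

```python
# Illustrative sketch only: replaying a strictly sequential change log
# reproduces the primary's state on another node.
def apply_wal(state, records):
    """Apply WAL-like (op, key, value) records, in order, to a dict of rows."""
    for op, key, value in records:
        if op in ("insert", "update"):
            state[key] = value
        elif op == "delete":
            state.pop(key, None)
    return state

# The same log, applied in the same order, yields the same state anywhere.
primary_log = [
    ("insert", 1, "alice"),
    ("insert", 2, "bob"),
    ("update", 1, "alicia"),
    ("delete", 2, None),
]
replica = apply_wal({}, primary_log)
print(replica)  # {1: 'alicia'}
```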

This direct movement of WAL files or records from one server (the master) to another (the slave) is known as log shipping, and by default, this log shipping or replication is asynchronous in nature (as discussed above).

Why Convert Asynchronous PostgreSQL Replication to Synchronous Replication?

The shift from asynchronous PostgreSQL replication to synchronous replication is worthwhile for several reasons:

  • Guaranteed Data Consistency: Synchronous replication guarantees real-time consistency between the primary and replica databases by waiting for confirmation from the replica before committing transactions. This keeps both copies up to date, which in turn reduces data inconsistency.
  • Fault Tolerance: If the primary server fails, the standby server can take over as the primary because both servers contain the same data.


Download the Ultimate Guide on Database Replication
Learn the 3 ways to replicate databases & which one you should prefer.

Conclusion

This article described three easy methods of performing PostgreSQL replication. While you can set up PostgreSQL synchronous replication (or asynchronous replication) as described in this post, doing it by hand is quite effort-intensive and requires in-depth technical expertise.

For an even more streamlined and automated approach, sign up for a 14-day free trial with Hevo Data to enjoy hassle-free data replication from multiple sources to a centralized destination of your choice.

FAQ on PostgreSQL Synchronous Replication

What is synchronous replication in PostgreSQL?

Synchronous replication ensures that changes on the primary database are committed only after confirmation that the changes have been applied to the standby server, providing high data consistency.

Does PostgreSQL support replication?

Yes, PostgreSQL supports various types of replication, including streaming, logical, and physical replication.

How to sync data in PostgreSQL?

Use pg_basebackup for initial data synchronization and configure streaming replication for continuous syncing. Logical replication can also be set up for more granular control.

What are the best practices for replication in PostgreSQL?

Best practices include using synchronous replication for critical data, regular monitoring, ensuring network stability, and having a proper backup strategy.

What is data synchronization vs replication?

Data synchronization ensures data consistency between systems in real-time or near real-time, while data replication involves copying data from one location to another, which may not be immediate.


Muhammad Faraz
Technical Content Writer, Hevo Data

Muhammad Faraz is an AI/ML and MLOps expert with extensive experience in cloud platforms and new technologies. With a Master's degree in Data Science, he excels in data science, machine learning, DevOps, and tech management. As an AI/ML and tech project manager, he leads projects in machine learning and IoT, contributing extensively researched technical content to solve complex problems.