In addition to being open-source, PostgreSQL is highly extensible: it lets you define your own data types, develop custom functions, and write code in different programming languages without recompiling your database. PostgreSQL also adheres to SQL standards more closely than other databases like MySQL. One of the most important features that PostgreSQL provides is streaming replication, which can be used to set up PostgreSQL sync replication.

With the help of this feature, you can set up replicated cluster configurations in different ways for different uses. In this article, you will learn how Postgres can sync two databases, why replication is important, and how the 2 different types of PostgreSQL replication (asynchronous and synchronous) work.

Methods to Set Up PostgreSQL Sync Replication

Data replication, needless to say, is one of the most significant processes in an RDBMS. You can replicate data between two PostgreSQL databases using the following 3 methods:

  • Method 1: PostgreSQL Sync Replication Using Hevo Data
  • Method 2: PostgreSQL Asynchronous Replication
  • Method 3: PostgreSQL Synchronous Replication

Let’s dive in and read about all the above methods of PostgreSQL sync replication!

Method 1: PostgreSQL Sync Replication Using Hevo Data

Hevo Data, a no-code data pipeline, helps you replicate data from PostgreSQL to data warehouses, business intelligence tools, or any other destination of your choice in a completely hassle-free & automated manner. Hevo supports data ingestion from PostgreSQL servers via Write Ahead Logs (WALs) set at the logical level. A WAL is a collection of log files that record information about data modifications and data object modifications made on your PostgreSQL server instance.

To learn more, check out Hevo’s documentation for PostgreSQL replication.

For further information about data replication, read our comprehensive guide.

Check out what makes Hevo amazing:

  • Auto Schema Mapping: Hevo takes away the tedious task of schema management & automatically detects the schema of incoming data from PostgreSQL and replicates it to the destination schema.
  • Transformations: Hevo provides preload transformations through Python code. It also allows you to run transformation code for each event in the Data Pipelines you set up. To carry out a transformation, you edit the properties of the event object received as a parameter by the transform method. Hevo also offers drag-and-drop transformations such as Date and Control Functions, JSON, and Event Manipulation, to name a few. These can be configured and tested before being put to use for aggregation.
  • Hevo Is Built To Scale: As the number of sources and the volume of your data grows, Hevo scales horizontally, handling millions of records per minute with very little latency.
  • Incremental Data Load: Hevo allows the transfer of data that has been modified in real-time. This ensures efficient utilization of bandwidth on both ends.
Solve your data replication problems with Hevo’s reliable, no-code, automated pipelines with 150+ connectors.
Get your free trial right away!

Method 2: PostgreSQL Asynchronous Replication

In the case of asynchronous replication, WAL records are shipped after the transaction is committed. In other words, changes reach the standby only after they have been committed on the primary server, which can result in some data loss if the primary server suffers a sudden failure.

Because the process is asynchronous, there is a small delay, which can cause minor data loss in case of a server crash. The delay is, however, usually very small, typically under one second, assuming the standby is powerful enough to keep up with the load. With file-based log shipping, archive_timeout is needed to bound data loss, but with streaming replication there is no need for this setting.
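On PostgreSQL 10 and later, you can observe this replication lag directly on the primary. A minimal check from psql (the lag columns were added in version 10):

postgres=# SELECT application_name, state, write_lag, flush_lag, replay_lag FROM pg_stat_replication;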

Let’s discuss how to set up asynchronous replication in detail. 

First of all, on the primary node, a few changes need to be made in the postgresql.conf & pg_hba.conf files present in the data directory. You can confirm the path of the data directory using the following command.

postgres=# SELECT setting FROM pg_settings WHERE name = 'data_directory';

Now you need to make the following changes in the postgresql.conf file.   

You can specify the IP addresses the server should listen on; * represents all the IP addresses on the server. You can also set the server port, which is 5432 by default.

listen_addresses = '*'
port = 5432

Next, you need to set wal_level to hot_standby, which maps to the replica level.

wal_level = hot_standby

Next, you need to set the maximum number of wal_sender processes that manage connections with the standby server, and the minimum number of WAL files to keep in the pg_wal directory.

max_wal_senders = 10       # example value; set to suit your setup
wal_keep_segments = 64     # example value; set to suit your setup

After setting these values, you need to restart the database service. 

$ systemctl restart postgresql-11

Now you need to make changes in the second file, i.e. pg_hba.conf.

Before editing it, create a role with the REPLICATION privilege (this is done in psql, not in pg_hba.conf):

postgres=# CREATE ROLE replication_user WITH LOGIN PASSWORD 'PASSWORD' REPLICATION;
CREATE ROLE

The main parameters in this file look like this. 

# TYPE  DATABASE     USER              ADDRESS              METHOD
host    replication  replication_user  IP_STANDBY_NODE/32   md5
host    replication  replication_user  IP_PRIMARY_NODE/32   md5
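After editing pg_hba.conf, reload the server configuration so the new authentication rules take effect. A minimal way to do this from psql (as a superuser), without a full restart:

postgres=# SELECT pg_reload_conf();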

Now let’s talk about changes that need to be made in the secondary node. 

You need to copy the current primary data directory over to the standby by taking a base backup.

Before that, move the existing data directory out of the way.

Go to the path /var/lib/pgsql/11/:

$ cd /var/lib/pgsql/11/
$ mv data data.bk

Now pull the current data directory from the primary:

$ pg_basebackup -h 192.168.100.145 -D /var/lib/pgsql/11/data/ -P -U replication_user --wal-method=stream
$ chown -R postgres:postgres data

Now, you need to make some changes in the recovery.conf and postgresql.conf files on the standby. First, in recovery.conf:

standby_mode = 'on'
primary_conninfo = 'host=IP_PRIMARY_NODE port=5432 user=replication_user password=PASSWORD' 
recovery_target_timeline = 'latest'
trigger_file = '/tmp/failover_5432.trigger'
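Note that trigger_file, configured above, gives you a second promotion mechanism besides pg_ctl promote: creating that file on the standby makes it exit recovery and become a primary. A sketch using the path from the configuration above:

$ touch /tmp/failover_5432.trigger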

Then, in postgresql.conf, like earlier, set wal_level to hot_standby (the replica level) and enable hot_standby so the standby can serve read-only queries.

wal_level = hot_standby
hot_standby = on

After making these changes, the configuration setup is complete. 

Now start the database service on the standby node.

$ systemctl start postgresql-11
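Once both nodes are running, you can confirm on the primary that the standby is attached and that replication is asynchronous; sync_state should read async. A quick check from psql:

postgres=# SELECT client_addr, state, sync_state FROM pg_stat_replication;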

Performance of Asynchronous Replication

Let’s evaluate asynchronous PostgreSQL replication. There are a few standard parameters used to evaluate any sort of replication.

  • Write Performance: Asynchronous replication usually performs better for inserts and writes. The primary only waits for its own WAL write; it does not have to wait for an acknowledgment from the standby over the network.
  • Reading Consistency: Because replication is asynchronous, reads can be slightly inconsistent between the primary node & the secondary node; there may be some lag until changes are streamed to the replicas & applied to their databases.
  • Data Loss: This is one of the drawbacks that can happen particularly if the primary server crashes all of a sudden before the WAL is streamed to replicas & applied to their databases. However, if the primary server crashes & recovers, replicas will also resume streaming from where they left off & eventually catch up with the primary server.    
  • Distance Proximity: Asynchronous replication usually works over long distances much better than synchronous replication as long as a network connection between data centers is available. 

Method 3: PostgreSQL Synchronous Replication

Setting up Postgres synchronous replication mostly follows the same steps as the asynchronous replication setup discussed above.

For synchronous replication between two Postgres databases, one of the most important parameters is application_name.

There is a strong relationship between application_name & synchronous replication. If the application_name value is part of synchronous_standby_names on the primary, the secondary is treated as synchronous. There are further conditions for a standby to be synchronous: it has to be connected, and it must be streaming data in real-time, i.e. not fetching old WAL records.
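For reference, the tie-in on the primary side looks like this in postgresql.conf (a minimal sketch; article_writing is the application_name this article uses for its standby):

synchronous_standby_names = 'article_writing'   # standbys listed here become synchronous
synchronous_commit = on                         # commits wait for the synchronous standby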

Moving on, let’s repeat the process of making changes in the postgresql.conf file on the master node. 

wal_level = hot_standby
hot_standby = on
max_wal_senders = 10       # example value; set to suit your setup
wal_keep_segments = 64     # example value; set to suit your setup

The purpose of wal_keep_segments is to retain more transaction logs, which makes the entire setup more robust.

Moving on, you can work on the recovery.conf file. 

primary_conninfo = 'host=localhost port=5432 application_name=article_writing'
standby_mode = on

The only addition here is the application_name parameter; the rest of the file is the same as in the asynchronous setup.

You can now test the setup by starting the system on the master & looking at pg_stat_replication.

postgres=# \x
Expanded display is on.
postgres=# SELECT * FROM pg_stat_replication;
-[ RECORD 1 ]----+------------------------------
pid              | 62345
usesysid         | 14
usename          | fz
application_name | article_writing
client_addr      | ::1
client_hostname  |
client_port      | 45678
backend_start    | 2020-06-28 13:03:14
state            | streaming
sent_location    | 0/30001E8
write_location   | 0/30001E8
flush_location   | 0/30001E8
replay_location  | 0/30001E8
sync_priority    | 1
sync_state       | sync

The sync_state field at the end shows that the setup is working in synchronous mode.

Performance of Synchronous Replication

Let’s evaluate synchronous replication performance on the same parameters as we used for asynchronous replication. 

  • Firstly, let’s talk about write performance. It is slower than asynchronous replication: a commit on the primary has to wait until the standby confirms that it has received and flushed the WAL, so the process takes relatively more time (see the synchronous_commit sketch after this list).
  • Secondly, reading consistency is comparatively better than in the asynchronous case, but some issues still persist.
  • The chances of data loss are very low & the data recovery rate is very good, as data is written to two different nodes.
  • As far as distance is concerned, synchronous replication is suitable for short distances, i.e. nodes within close proximity.
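The write-performance penalty can be tuned via synchronous_commit on the primary, which controls how long a commit waits for the standby (a sketch of the documented settings; remote_apply requires PostgreSQL 9.6 or later):

synchronous_commit = on              # wait until the standby has flushed the WAL to disk
# synchronous_commit = remote_write  # wait only until the standby has received the WAL
# synchronous_commit = remote_apply  # wait until the standby has also replayed the WAL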

Automatic Failover

Failover is a way of recovering service when the primary server fails for whatever reason. As long as you’ve set up PostgreSQL to manage your physical streaming replication, you and your users will be protected from downtime due to a primary server failure.

It is worth noting that the failover procedure can take some time to set up and initiate. PostgreSQL does not have any built-in tools for monitoring and scoping server failures, so you’ll have to get creative.

Fortunately, you do not have to rely on PostgreSQL for failover. Some specific tools enable automatic failover and transition to backup, reducing database downtime.

Setting up failover replication ensures high availability by making sure that backups are available if the primary server fails.

PostgreSQL Manual Failover Steps

Here are the steps for manual failover in PostgreSQL:

  • Crash the main server.
  • Activate the secondary server by using the following command:
./pg_ctl promote -D ../sb_data/
server promoting
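You can also confirm that the promotion took effect; pg_is_in_recovery() returns false once a server has left standby mode:

edb=# SELECT pg_is_in_recovery();
 pg_is_in_recovery
-------------------
 f
(1 row)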

Establish a connection to the activated secondary server and add a row:

-bash-4.2$ ./edb-psql -p 5432 edb
Password:
psql.bin (10.7)
Type "help" for help.

edb=# insert into abc values(4,'Four');

If the row was added successfully, then the secondary, previously a read-only server, has been promoted to be the new primary server.

Automating Failover in PostgreSQL

To configure an automatic failover, you will require the EDB PostgreSQL failover manager (EFM). After downloading and installing EFM on each primary and standby node, you can set up an EFM Cluster comprising one or more standby nodes, a primary node, and an optional Witness node to confirm claims in the event of failure.

EFM continuously tracks system health and generates email alerts depending on system occurrences. In the event of a failure, it automatically switches to the most recent standby and resets all other standby servers to recognize the new primary node.

It also reconfigures load balancers (such as pgPool) and avoids “split-brain” scenarios (in which two nodes each believe they are the primary).

Before wrapping up, let’s cover some basics.

PostgreSQL Sync Replication and its Importance


At its core, PostgreSQL replication is the ability to ship & apply Write Ahead Logs, or WALs. WALs are important in all modern RDBMSs, as they are what make transactions durable. Simply put, every transaction carried out in an RDBMS is first written to a WAL file; only after that are the changes applied to the actual on-disk table data files. Also, check out PostgreSQL Multi-Master Replication.

One property of WAL files is that they are strictly sequential, which makes the WAL file sequence a “replay log” of changes. This implies that if you need to replicate all the changes happening on one database server, all you need to do is copy the WAL files as they are created & apply them in sequence on another server or node. This direct movement of WAL files or records from one server (master) to another server (slave) is known as log shipping, and by default this log shipping or replication is asynchronous in nature (as discussed in Method 2 above).
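For context, classic file-based log shipping is configured on the primary via archive_mode and archive_command in postgresql.conf (a minimal sketch; the archive destination path is illustrative):

archive_mode = on
archive_command = 'cp %p /mnt/wal_archive/%f'   # %p = path of the WAL segment, %f = its file name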

To learn more about PostgreSQL, visit here.

Download the Ultimate Guide on Database Replication
Learn the 3 ways to replicate databases & which one you should prefer.

Conclusion

This article described 3 methods of performing PostgreSQL replication. While you can set up PostgreSQL synchronous replication (or async replication) as described in this post, it is quite effort-intensive and requires in-depth technical expertise.

Hevo is the only real-time ELT No-code Data Pipeline platform that cost-effectively automates data pipelines that are flexible to your needs. Hevo caters to 150+ data sources (including 40+ free sources) and can seamlessly perform PostgreSQL sync replication in real-time. Hevo’s fault-tolerant architecture ensures a consistent and secure replication for your PostgreSQL data.

Learn more about Hevo

Want to take Hevo for a spin? Sign up for a 14-day free trial and experience the feature-rich Hevo suite firsthand.

We would love to hear from you. Please share your thoughts and experience of setting up PostgreSQL Sync Replication in the comments section.

Muhammad Faraz
Freelance Technical Content Writer, Hevo Data

In his role as a freelance writer, Muhammad loves to use his analytical mindset and problem-solving ability to help businesses solve problems by offering extensively researched content.