PostgreSQL is one of the most widely used open-source relational databases. In addition to being open-source, it is highly extensible as it allows you to define your data types, develop your own custom functions & you can write code in different programming languages without even recompiling your database. PostgreSQL also adheres to SQL Standard more closely as compared to other databases like MySQL.
One of the most important features that PostgreSQL provides is the streaming replication feature which can be used to set up PostgreSQL Sync Replication. You can set up different replicated cluster configurations in different ways for different uses. In this article, you will learn about PostgreSQL Sync Replication, its importance, and more about 2 different types of PostgreSQL Sync Replication.
Table of Contents
- PostgreSQL Sync Replication and its Importance
- Methods to Set Up PostgreSQL Sync Replication
PostgreSQL Sync Replication and its Importance
PostgreSQL Sync Replication is actually the ability to shift & apply to Write Ahead Logs or WALs. WALs are important, particularly in all modern RDBMS, as they provide durable & easy transactions. Simply put, every transaction carried in RDBMS is first written out as a WAL file. After it’s written on a WAL file, changes are applied to the actual on-disk table data files. Also, check out PostgreSQL Multi-Master Replication.
One property of the WAL files is that they are highly sequential which result makes the WAL file sequence a “replay log” of changes. This implies that if you need to replicate all the changes happening in one database server, what you will need is to do is copy all WAL files as they are being created & apply them in a sequence to another server or node. This direct movement of WAL files or records from one server (master) to another server (slave) is known as log shipping and by default, this log shipping or replication is asynchronous in nature (we will discuss this in detail later).
To learn more about PostgreSQL, visit here.
PostgreSQL Sync Replication : 3 Methods
PostgreSQL Asynchronous Replication is a methodology in which it is possible to create read-only replicas of primary or master servers so that these might be used in an emergency situation or achieve the benefits/objectives mentioned above.
As the name suggests that data is replicated to the secondary servers at the same time as it is written on the primary servers. The main idea of PostgreSQL Synchronous Replication lies in the fact that changes must be on at least a couple of servers before the primary returns a positive response or success to the secondary or a slave node.
Hevo Data, an Automated Data Pipeline, provides you with a hassle-free solution to perform the PostgreSQL Sync Replication with an easy-to-use no-code interface. Hevo is fully managed and completely automates the process of not only replicating data from PostgreSQL but also enriching the data and transforming it into an analysis-ready form without having to write a single line of code.
Hevo’s fault-tolerant Data Pipeline offers a faster way to move your data from PostgreSQL and 100+ other data sources (including 40+ free data sources) into Data Warehouses, Databases, BI Tools, or any other destination of your choice. Hevo will take full charge of the data replication process, allowing you to focus on key business activities
Methods to Set Up PostgreSQL Sync Replication
Data replication, needless to say, is one of the most significant processes in RDBMS. You can perform the PostgreSQL Sync Replication using the following 3 methods:
- Method 1: PostgreSQL Asynchronous Replication
- Method 2: PostgreSQL Synchronous Replication
- Method 3: PostgreSQL Sync Replication Using Hevo Data
Let’s dive in and read about all the above methods of PostgreSQL Sync Replication!
Method 1: PostgreSQL Asynchronous Replication
In the case of Asynchronous Replication, WAL records are shipped after the transaction is committed. In other words, data might be hit by a STANDBY after the changes have been completed on the database on the primary server which might result in some data loss if the primary server suffers a sudden failure.
As the process is asynchronous, there is a minor delay which can usually cause a minor data loss in case of a server crash. The delay is however very minute, typically under one second assuming the standby is strong enough to bear the load. To make sure that the data loss is minimum, archive_timeout is required but with streaming replication, there is no need to call up this function.
Let’s discuss how to set up asynchronous replication in detail.
First of all, in the primary node, a few changes need to be made in postgresql.conf & pg_hba.conf files present in the data directory. You can confirm the path using the following command.
postgres= # SELECT setting FROM pg_settings WHERE name = 'data_directory';setting
Now you need to make the following changes in the postgresql.conf file.
You can check the IP addresses where you want to listen on. * represents all the IP addresses on the server. Also, you can set the server port which by default is 5432.
listen_addresses = '*' port = 5432
Next, you need to set up wal_level to stand_by as it is mapped to the replica.
wal_level = hot_standby
Next, you need to set the maximum number of wal_sender processes that manage the connection with the standby server & also set a minimum amount of WAL files to be kept in pg_wal directory.
Max_wal_senders = ‘’ (set a value) Wal_keep_segments = ‘’ (set a value)
After setting these values, you need to restart the database service.
$ systemctl restart postgresql-11
Now you need to make changes in the second file i.e. pg_hba.conf.
Here you need to create a role or user access and assign replication privilege to it.
postgres=# CREATE ROLE replication_user WITH LOGIN PASSWORD 'PASSWORD' REPLICATION; CREATE ROLE
The main parameters in this file look like this.
Now let’s talk about changes that need to be made in the secondary node.
You need to get the current primary data directory & assign the updated database i.e. PostgreSQL.
Before that, you need to remove the existing data directory.
Go to the path i.e. /var/lib/pgsql/11/
$ cd /var/lib/pgsql/11/ $ mv data data.bk
Now get the current data directory.
$ pg_basebackup -h 192.168.100.145 -D /var/lib/pgsql/11/data/ -P -U replication_user --wal-method=stream $ chown -R postgres.postgres data
Now, you need to make some changes in recovery.conf and postgresql.conf files.
standby_mode = 'on' primary_conninfo = 'host=IP_PRIMARY_NODE port=5432 user=replication_user password=PASSWORD' recovery_target_timeline = 'latest' trigger_file = '/tmp/failover_5432.trigger'
Like earlier, we need to set wal_level to hot_standby which refers to the replica.
wal_level = hot_standby hot_standby = on
After making these changes, the configuration setup is complete.
By using the following command, you can confirm that the replication status in the primary node is asynchronous.
$ systemctl start postgresql-11
Performance of Asynchronous Replication
Let’s evaluate Asynchronous PostgreSQL Sync Replication. Usually, there are some important parameters considered to evaluate any sort of replication.
- Write Performance: Asynchronous Replication usually performs better when it comes to the insert or write function. There is only a wait for primary to write the WAL. It does not have to wait for network permission from the secondary location.
- Reading Consistency: As this is Asynchronous PostgreSQL Sync Replication, reading the replicated data can be a bit inconsistent between the primary node & secondary node and there might be some time lag until the changes have been streamed to the replicas & applied to their databases.
- Data Loss: This is one of the drawbacks that can happen particularly if the primary server crashes all of a sudden before the WAL is streamed to replicas & applied to their databases. However, if the primary server crashes & recovers, replicas will also resume streaming from where they left off & eventually catch up with the primary server.
- Distance Proximity: Asynchronous Replication usually works over long distances much better than synchronous replication as long as a network connection between data centers is available.
Method 2: PostgreSQL Synchronous Replication
Setting up PostgreSQL sync replication mainly follows the steps similar to those discussed in asynchronous replication.
One of the most important parameters, when we talk about synchronous replication, is the application_name parameter.
There is a strong relationship between the application_name variable & synchronous replication. If the application_name value is the part of synchronous_standby_names, the slave or secondary would be synchronous. Furthermore, in order to be synchronous standby, there are other conditions that matter as well like it has to be connected & it must be streaming data in real-time i.e. it is not fetching the old WAL records.
Moving on, let’s repeat the process of making changes in the postgresql.conf file on the master node.
wal_level = hot_standby hot_standby = on Max_wal_senders = ‘’ # a value Wal_keep_segments = ‘’ # a value
The purpose of Wal_keep_segments is to keep more transactional logs. It helps in making the entire setup very robust.
Moving on, you can work on the recovery.conf file.
primary_conninfo = 'host=localhost application_name=article_writing port=5432' standby_mode = on
Well, you only added the application_name variable in the file. Rest of the file is the same.
You can now test the setup by running the system at master & have a look at pg_stat_replication.
postgres=# x Expanded display is on. postgres=# SELECT * FROM pg_stat_replication; -[ RECORD 1 ]----+------------------------------ pid | 62345 usesysid | 14 usename | fz application_name | article_writing client_addr | ::1 client_hostname | client_port | 45678 backend_start | 2020-06-28 13:03:14 state | streaming sent_location | 0/30001E8 write_location | 0/30001E8 flush_location | 0/30001E8 replay_location | 0/30001E8 sync_priority | 1 sync_state | sync
The sync_state, in the end, is showing that the setup is working in a synchronization mode.
Performance of Synchronous Replication
Let’s evaluate synchronous replication performance on the same parameters as we used for asynchronous replication.
- Firstly, let’s talk about writing performance. It is slow as compared to asynchronous replication. As two things are happening at the same time i.e. reading WALs & then writing those on secondary or slave nodes so it is a relatively time taking process.
- Secondly, reading consistency is comparatively better as compared to its counterpart but still, there are some issues that persist.
- Data loss chances are very rare & the data recovery rate is very good as data is being written on two different nodes.
- As far as distance proximity is concerned, synchronous replication is suitable for short distances i.e. within the proximity.
Method 3: PostgreSQL Sync Replication Using Hevo Data
Hevo Data, a No-code Data Pipeline, helps you replicate data from PostgreSQL to Data Warehouses, Business Intelligence Tools, or any other destination of your choice in a completely hassle-free & automated manner. Hevo supports data ingestion replication from PostgreSQL servers via Write Ahead Logs (WALs) set at the logical level. A WAL is a collection of log files that record information about data modifications and data object modifications made on your PostgreSQL server instance.
To learn more, check out Hevo’s documentation for PostgreSQL replication.
Check out what makes Hevo amazing:
- Secure: Hevo has a fault-tolerant architecture that ensures that the data is handled in a secure, consistent manner with zero data loss.
- Auto Schema Mapping: Hevo takes away the tedious task of schema management & automatically detects the schema of incoming data from PostgreSQL and replicates it to the destination schema.
- Quick Setup: Hevo with its automated features, can be set up in minimal time. Moreover, with its simple and interactive UI, it is extremely easy for new customers to work on and perform operations.
- All in One Solution: Hevo enables you to replicate data from all variations of PostgreSQL sources including Amazon RDS PostgreSQL, Azure PostgreSQL, Google Cloud PostgreSQL, etc.
- Transformations: Hevo provides preload transformations through Python code. It also allows you to run transformation code for each event in the Data Pipelines you set up. You need to edit the event object’s properties received in the transform method as a parameter to carry out the transformation. Hevo also offers drag and drop transformations like Date and Control Functions, JSON, and Event Manipulation to name a few. These can be configured and tested before putting them to use for aggregation.
- Hevo Is Built To Scale: As the number of sources and the volume of your data grows, Hevo scales horizontally, handling millions of records per minute with very little latency.
- Incremental Data Load: Hevo allows the transfer of data that has been modified in real-time. This ensures efficient utilization of bandwidth on both ends.
- Live Support: The Hevo team is available round the clock to extend exceptional support to its customers through chat, email, and support calls.
With continuous real-time data movement, Hevo allows you to replicate your PostgreSQL data along with your other data sources and seamlessly load it to the destination of your choice with a no-code, easy-to-setup interface. Try our 14-day full-feature access free trial!Get Started with Hevo for Free
This article described 3 easy methods of performing the PostgreSQL Replication. While you can set up PostgreSQL sync replication (or async replication) as described in this post, it is quite effort-intensive and requires in-depth technical expertise.
Hevo Data provides an Automated No-code Data Pipeline that empowers you to overcome the above-mentioned limitations. Hevo caters to 100+ data sources (including 40+ free sources) and can seamlessly perform PostgreSQL Sync Replication in real-time. Hevo’s fault-tolerant architecture ensures a consistent and secure replication for your PostgreSQL data. It will make your life easier and make data replication hassle-free.Learn more about Hevo
Want to take Hevo for a spin? Signup for a 14-day free trial and experience the feature-rich Hevo suite firsthand.
We would love to hear from you. Please share your thoughts and experience of setting up PostgreSQL Sync Replication in the comments section.