Easy Steps to Set up PostgreSQL Logical Replication

on Tutorials • June 2nd, 2020 • Write for Hevo

Introduction

One of the major goals of storing data in a database is to ensure data availability. You would want to access your data at any time that you want. Having a single copy of the database can be dangerous. In case of an unprecedented event, you won’t be able to access it. That’s why it’s a good idea for you to have multiple copies of your database. 

PostgreSQL database management system makes this possible. It supports the concept of replication that you can use to create multiple copies of your database. 

Table of Contents

Prerequisites

Make sure that your computer is installed with the following:

  • PostgreSQL version 10.0 and above. 

What is Logical Replication?

When deploying applications in a production environment, it is always good for you to have multiple copies of the database. After making the copies, you must ensure that they are all in sync. This process of keeping database copies in sync is known as replication. 

There are two types of replication in PostgreSQL: 

  1. Physical replication
  2. Logical replication

Physical replication was introduced in PostgreSQL 9.0. In this mechanism, every change in the master is streamed via WAL (write-ahead logging) then applied in the destination/standby server. However, physical replication is associated with a number of limitations. These include:

  • It cannot replicate a part of the database (selective replication). 
  • It cannot write to the standby server. 
  • Replication cannot be done between 2 different major versions. 
  • Replication cannot be done between different platforms, for example, Windows and Linux. 

To take care of the above limitations, PostgreSQL 10.0 introduced the concept of logical replication. PostgreSQL logical replication uses WAL to record the logical changes independent from specific architecture or version. The subscriber first receives a copy of a replicated database from the publisher. Normally, changes will be published on the publisher database. These changes should be replicated in the subscriber database to ensure that the two databases remain in sync. The subscriber node pulls the changes as they happen on the publisher database in real-time. This way, the subscriber and the publisher databases remain in sync. 

Read more about the PostgreSQL logical replication here.

The Architecture of PostgreSQL Logical Replication

The PostgreSQL logical replication follows the publish/subscribe model. The publisher node creates a publication, which is also called a change set or a replication set. The publication is simply a set of changes from one or more tables. The subscriber node creates a subscription, with the capability to subscribe to either one or more publications. Now that you know the basics of logical replication, how is the logical replication in PostgreSQL done? That’s what you will learn in the next section. 

PostgreSQL Logical Replication Process

The logical replication in PostgreSQL follows the steps given below:

Step 1: A snapshot of the publishing database is copied to the subscriber. This step is also called the table synchronization phase. To reduce the amount of time spent in this phase, you can spawn multiple table synchronization workers. However, you can only have one synchronization worker for each table. 

 Step 2: After the copy is completed, subsequent changes made in the publisher node will be sent to the subscriber node. The changes will be sent as they happen in real-time. The changes will be applied in a commit order to ensure there is transactional consistency. 

Methods to setup PostgreSQL logical replication

Method 1: PostgreSQL logical replication using manual method

Logical replication in PostgreSQL is achieved by the construction of a stream of logical data modifications from the WAL file. It involves setting up the wal_level = logical for PostgreSQL to convert binary changes into logical ones.

Method 2: PostgreSQL logical replication using Hevo’s automated data pipeline

Hevo Data, an Automated No-code Data Pipeline provides you with a hassle-free solution to perform the PostgreSQL Replication with an easy-to-use no-code interface. Hevo is fully managed and completely automates the process of not only replicating data from PostgreSQL but also enriching the data and transforming it into an analysis-ready form without having to write a single line of code.

Hevo’s fault-tolerant Data Pipeline offers a faster way to move your data from PostgreSQL and 100+ other data sources(including 40+ free data sources) into Data Warehouses, Databases, BI Tools, or any other destination of your choice. Hevo will take full charge of the data replication process, allowing you to focus on key business activities.

Method 1: PostgreSQL logical replication using manual method

Basic Syntax

To perform logical replication in PostgreSQL, you’ve to change the wal_level = logical parameter. The parameter tells the server to store additional information in WAL for converting binary changes into logical ones. To change the value of this parameter, you have to open the postgresql.conf file. 

On Linux, this file can be found in the following directory:

/etc/postgresql/version/main

On Windows, the file can be found in the following directory:

C:Program FilesPostgreSQLversiondata

After opening the file, go to the WRITE-AHEAD LOG settings. 

Uncomment the parameter wal_level and set it to the following:

wal_level = logical

You can then restart the PostgreSQL service. 

On Windows, you can open the Services dialog and restart the service from there. 

On Linux, you can stop and then start the service by running the following sequence of commands:

sudo service postgresql stop

sudo service postgresql start

To check whether the change has taken effect, run the following command on PostgreSQL terminal:

SHOW wal_level;

It should return the following:

PostgreSQL Logical Replication 1

Once you get this, logical replication has been turned on. 

After that, you can create a publication by running the following command:

CREATE PUBLICATION my_publication FOR ALL TABLES;

If you don’t want to publish all tables, you can specify the tables that you need to publish. You can also choose to limit the changes that will be published. 

However, for DELETE and UPDATE operations to be replicated, the “published” table should have a REPLICA IDENTITY, which can be a primary key. This makes it possible for the modified rows to be identified in the subscriber. INSERT operations can be done without a replica identity. After the publications have been created, you can go ahead to create subscriptions in the subscriber node. 

You can use the following command for this:

CREATE SUBSCRIPTION my_subscription CONNECTION '... 
<connection string> ...' PUBLICATION my_publication;

The command will add a subscription named my_subscription for the current database that will begin to receive logical changes from the publication named my_publication. Once you run the command, a logical replication worker will be spawned, and this will receive logical changes coming from the publisher. On the side of the publisher, a walsender process will be spawned, which will be reading the WAL step-by-step, decode changes, and send them to the correct subscriber. 

With all the above knowledge, you are set to get into the practical stuff. Now, you will see a real example of how to set up logical replication in PostgreSQL. So, let’s start.

Download the Ultimate Guide on Database Replication
Download the Ultimate Guide on Database Replication
Download the Ultimate Guide on Database Replication
Learn the 3 ways to replicate databases & which one you should prefer.

A Real-life Example

You will be setting up logical replication on a table. You will employ a very simple trick for this. Remember that to perform logical replication, you should have the publisher and the subscriber. In this case, you will create both in the same host. However, you will create them in different ports. 

First, you should create the publication. Let’s create it on the port 5432. 

Here are the steps to do this:

Step 1: First, create a table then insert a row into it. Run the following command:

create table table1(x int primary key, y int); 

The above command creates a table named table1 with two integer columns, x and y. Column x is the primary key for the table. The table is ready. 

Step 2: Let’s insert a row into it by running this command:

insert into table1 values(10, 11);  

We have inserted a single row into the table named table1. 10 will be inserted into column x while 11 will be inserted into column y. 

Step 3: It’s now time to set up publication on the table. Here is the command for this:

create publication my_publication for table table1;   

The command will create a publication on table1 and give it the name my_publication. The publication is now ready. It’s time for you to create the subscription. You will do it on the port 5431. Following are the steps for this:

Step 1: Recreate the table on the subscriber node. Just run the following command:

create table table1(x int primary key, y int); 

Step 2: Next, we can create the subscription and give it the name my_subscription. 

Just run the following command:

CREATE SUBSCRIPTION my_subscription 
CONNECTION 'host=localhost port=5432 dbname=postgres' PUBLICATION my_publication;

Step 3: Since we had inserted one row into the table on the publisher side, we can check whether it has been replicated on the subscriber. Run the following command:

SELECT * FROM table1;  

It returns the following:

PostgreSQL Logical Replication 2

So, the row has been replicated in the subscriber. 

Step 4: We should now check whether the changes are being replicated or not. We will simply insert a new row into the publisher and see whether this will be updated in the subscriber. So, run the following command on the publisher:

insert into table1 values(20, 25);  

Now, run the following command on the subscriber:

SELECT * FROM table1; 

You should get the following:

PostgreSQL logical replication 3

So, the changes were replicated!

Method 2: PostgreSQL logical replication using Hevo’s automated data pipeline

Postgresql logical replication
Image source

Hevo Data, an Automated No Code Data Pipeline, helps you replicate data from PostgreSQL to Data Warehouses, Business Intelligence Tools, or any other destination of your choice in a completely hassle-free & automated manner. Hevo supports data ingestion replication from PostgreSQL servers via Write Ahead Logs (WALs) set at the logical level. This mode is particularly useful when there is a need to replicate the entire database as it is.

Apart from logical replication, Hevo also supports PostgreSQL replication using Custom SQL (fetches data from Postgres source using a custom SQL query at a fixed frequency) and table mode (fetches data from multiple tables while maintaining control over data ingestion from each table individually)

Hevo provides you with an all-in-one solution for replicating data from Amazon Aurora PostgreSQL, Amazon RDS PostgreSQL, Azure PostgreSQL, and various other PostgreSQL sources. This way you can focus more on your Data Analysis and let Hevo manage the PostgreSQL Data Replication.

Check out the salient features of Hevo:

  • Secure: Hevo has a fault-tolerant architecture that ensures that the data is handled in a secure, consistent manner with zero data loss.
  • Quick Setup: Hevo with its automated features, can be set up within minutes in a simple 3 step process. Moreover, with its simple and interactive UI, it is extremely easy for new customers to work on and perform operations.
  • Auto Schema Mapping: Hevo takes away the tedious task of schema management & automatically detects the schema of incoming data from PostgreSQL and replicates it to the destination schema.
  • Incremental Data Load: Hevo allows the transfer of data that has been modified in real-time. This ensures efficient utilization of bandwidth on both ends.
  • Live Support: The Hevo team is available round the clock to extend exceptional support to its customers through chat, email, and support calls.

Limitations in PostgreSQL Logical Replication

Let’s discuss some limitations in the process of logical replication in PostgreSQL:

  1. It doesn’t replicate the schema or DDL.
  2. It doesn’t replicate sequences.
  3. Tables must have a primary key or unique key to participate in the PostgreSQL logical replication process.
  4. It doesn’t replicate truncate.
  5. Bi-directional replication is not supported.
  6. Large objects are not replicated using PostgreSQL logical replication.

Know more about the limitations of PostgreSQL logical replication here.

Conclusion

In this article, you have done the following:

  • You configured PostgreSQL to perform logical replication. 
  • You built the publisher and subscriber nodes. 
  • You demonstrated how to implement the publisher and the subscriber nodes on the same host. 
  • You demonstrated how to keep the publisher and the subscriber nodes in sync.

If you want to set up PostgreSQL logical replication without having to write code as described above, you can use an automated data integration solution such as Hevo.

Hevo Data is a No-code Data Pipeline that lets you extract, transform, and load data from a plethora of sources such as databases (like PostgreSQL), cloud applications, SDKs, etc. to your data warehouse. Hevo supports pre-built data integration from 100+ data sources.

visit our website to explore hevo

sign up for a 14 day free trial for a seamless data integration experience.

Share your experience of setting up PostgreSQL logical replication in the comment section below.