How to Setup Replication in PostgreSQL using pg_logical?

on Data Replication, PostgreSQL • April 5th, 2022 • Write for Hevo

pg_logical_FI

PostgreSQL is one of the most popular Object-Relational Database Management Systems that allow you to store, process, and analyze data using SQL. It is capable of highly promoting data integrity by maintaining Data Accuracy and Consistency throughout the data life cycle, achieving ACID compliance.

PostgreSQL not only has a Data Integrity feature but also has fault tolerance capability, which allows the database system to run continuously without any interruption when one or more of its components or instances fail due to unprecedented events. One such fault-tolerant function of PostgreSQL is pg_logical, which allows you to implement logical streaming replication for replicating or copying data from one database instance to another, thereby ensuring data availability across servers. 

In this article, you will learn about PostgreSQL, pg_logical extension, and how to implement logical replication in PostgreSQL using pg_logical extension.

Table of Contents

Prerequisites

Fundamental knowledge of SQL queries

What is PostgreSQL?

pg_logical: PostgreSQL logo
Image Source

Introduced in the late 1980s, PostgreSQL is an Open-Source Object-Relational Database Management System with more than 30 years of active development history. PostgreSQL is a Relational Database System that allows you to store, manage, process, and fetch data present in the database servers or instances. Even though PostgreSQL has some object-oriented features for handling unstructured and non-relational data, it is extensively used as a relational database. 

Since PostgreSQL is an open-source DBMS, it is free and easy for everyone to download and run queries to perform data-related tasks like data filtering, wrangling, and processing. In addition, PostgreSQL is highly compatible with operating systems, including Windows, MACOS, and Linux, and supports multiple programming languages such as C, C++, Java, and Python.

Replicate Data in Minutes Using Hevo’s No-Code Data Pipeline

Hevo Data, a Fully-managed Data Pipeline platform, can help you automate, simplify & enrich your data replication process in a few clicks. With Hevo’s wide variety of connectors and blazing-fast Data Pipelines, you can extract & load data from 100+ Data Sources straight into your Data Warehouse or any Databases. To further streamline and prepare your data for analysis, you can process and enrich raw granular data using Hevo’s robust & built-in Transformation Layer without writing a single line of code!

GET STARTED WITH HEVO FOR FREE

Hevo is the fastest, easiest, and most reliable data replication platform that will save your engineering bandwidth and time multifold. Try our 14-day full access free trial today to experience an entirely automated hassle-free Data Replication!

What is pg_logical?

pg_logical logo
Image Source

pg_logical is one of the extensions available in PostgreSQL, which allows you to implement a logical replication mechanism for replicating data between two database servers. The logical replication mechanism is also known as Transactional Replication, in which the subscriber (destination database server) receives a copy of the replicated database object from the publisher (source database server) whenever any real-time change or update is made on the source server. 

The pg_logical extension should be installed in both Source and Destination database instances to seamlessly implement the replication process. You have to install PostgreSQL version 10 or higher to implement the logical replication technique.

How to Setup Replication in PostgreSQL using pg_logical?

In the further steps, you will use Google Cloud SQL to set up replication in PostgreSQL using pg_logical. 

Prerequisites

For setting up logical replication in PostgreSQL using pg_logical, you have to satisfy certain prerequisites. You should already have 

  • an account with Google Cloud SQL, 
  • a fully-managed database service to set up, organize, and 
  • manage databases on the Google Cloud Platform. 

Google Cloud SQL is highly compatible with the most popular Database systems like PostgreSQL, MySQL, and SQL Server. For this tutorial, you will use PostgreSQL in Cloud SQL to set up replication using pg_logical. 

What Makes Hevo’s ETL Process Best-In-Class

Providing a high-quality ETL solution can be a difficult task if you have a large volume of data. Hevo’s automated, No-code platform empowers you with everything you need to have for a smooth data replication experience.

Check out what makes Hevo amazing:

  • Fully Managed: Hevo requires no management and maintenance as it is a fully automated platform.
  • Data Transformation: Hevo provides a simple interface to perfect, modify, and enrich the data you want to transfer.
  • Faster Insight Generation: Hevo offers near real-time data replication so you have access to real-time insight generation and faster decision making. 
  • Schema Management: Hevo can automatically detect the schema of the incoming data and map it to the destination schema.
  • Scalable Infrastructure: Hevo has in-built integrations for 100+ sources (with 40+ free sources) that can help you scale your data infrastructure as required.
  • Live Support: Hevo team is available round the clock to extend exceptional support to its customers through chat, email, and support calls.
Sign up here for a 14-Day Free Trial!

Step 1: Configuring the PostgreSQL instance

The physical replication feature was introduced in PostgreSQL 9.0, and logical replication was introduced in PostgreSQL 10.0. In this logical replication mechanism, every modification in the Source/Master Server is streamed through WAL (write-ahead logging) and then applied to the destination/standby server.

In other words, when logical replication is implemented in PostgreSQL, the destination server extracts or fetches changes from the WAL logs using logical decoding. The changes that are decoded remain unaffected by the physical storage format. Furthermore, the updates on a replica server only reflect the modification made on data at the SQL level queries, such as INSERTs, UPDATEs, and DELETEs. The pg_logical extension allows you to implement logical replication in PostgreSQL, replicating data from the source server to the destination server. 

The Cloud SQL also allows you to implement Logical replication using the pg_logical extension available in all PostgreSQL 10 versions. Various extensions or plugins can be used to customize the format in which data changes are transmitted between database servers. Such extensions enable Change Data Capture (CDC) structures to be more adaptable.

For example, you can use the “wal2json” extension to stream all database changes in the source server to a consumer server in JSON format. Cloud SQL now supports both of the wal2json variants: format-version 1 encodes the entire transaction as a single JSON object, whereas format-version 2 generates one JSON object per command.

  • Before starting to implement the logical replication process, you have to configure some settings in the Cloud SQL workspace. To enable logical replication in Cloud SQL, you have to set the “cloudsql.logical_decoding flag“ to “on.”
  • Then, you have to add the pg_logical extension to the shared preload libraries section. You have to also set the “cloudsql.enable_pg_logical” to “on.”
  • Now, you are all set to enable pg_logical extension on both primary (source) and replica (destination) instances.
  • Initially, you have to ensure whether the primary instance accepts connection from the replica instance. To provide the access permissions, you have to add the replica’s IP address to the primary instance’s authorized networks. You can refer to this link to discover how to add an IP address to the authorized networks.
  • In the next step, open the Cloud SQL Instances page and go to the Cloud Shell. To implement logical replication in Cloud SQL, you have to create a new user account with replication privileges. Execute the following command to create a new replication user.
CREATE USER replication_user WITH REPLICATION
IN ROLE cloudsqlsuperuser LOGIN PASSWORD 'secret';

Step 2: Installing pg_logical Extension

  • After creating a new replication user account, you are now ready to install the pg_logical extension. Execute the below-given command to install the pg_logical extension.
CREATE EXTENSION pg_logical;
  • After installing, you have to enable the pg_logical extension on both primary and replica instances. 
  • Now, you have to run the following command to enable the pg_logical extension on both the primary and replica instances to register them as separate nodes.
source-instance$ SELECT pg_logical.create_node(
       node_name := 'primary',
       dsn := ‘host=<primary-ip>=5432 dbname=postgres
       user=replication_user password=secret’
);
 
dest-instance$ SELECT pg_logical.create_node(
       node_name := 'replica',
       dsn := ‘host=<replica-ip>=5432 dbname=postgres
       user=replication_user password-secret’
);
  • Then, you have to create a new table with sample data to implement logical replication. Execute the following code to create a table named replica_test in the primary instance and insert some sample values into the table.
CREATE TABLE replica_test (id SERIAL PRIMARY KEY, data text);
INSERT INTO replica_test (data) VALUES ('apple'), ('banana'), ('cherry');
  • Now, you have to create the same table named replica_test in the replica instance so that it can accurately fetch changes from the primary instance and store them in appropriate columns.
  • Then, you have to create a pg_logical subscription on the destination or replica instance to provide connection details of the primary instance.
  • Execute the following command to create a subscription.
SELECT pg_logical.create_subscription(
    subscription_name := 'test_sub',
    provider_dsn := 'host=<primary-ip> port=5432 dbname=postgres user=replication_user password=replicapassword'
);
SELECT * FROM pg_logical.show_subscription_status('test_sub');
  • After executing the above command, if the status appears as “replicating” in the status bar, you can confirm that the replication setup was successful. To check that data has been replicated, query the replica test table with select all statements. You can find that all data present in the primary instance is replicated successfully to the replica instance. You can also add, modify, or delete records on the primary instance and check whether the modifications are reflected on the replica instance.
  • After the logical replication process is successful, you can drop the subscription on the replica using the “pg_logical.drop_subscription(‘test_sub’)” command. Make sure that the replication slot on the main has been removed as well. If you do not drop the subscription, the WAL segments continue to collect on the replica instance unnecessarily.

Following the above-mentioned steps, you successfully implemented logical replication in PostgreSQL using the pg_logical extension.

Limitations of pg_logical Extension

  • Both the publisher (Source) and subscriber (Destination) instances must run on PostgreSQL 9.4+ versions to implement the logical replication process.
  • Tables present on both the Publisher and Subscriber instances should have the same name and schema so that future additions or modifications map to the same table in the Subscriber instance.
  • The provider and subscriber tables must have the same number of columns and data types.
  • The subscriber instance’s CHECK constraints and NOT NULL constraints should be the same as the provider’s.
  • The PRIMARY KEYs in all tables of both publisher and subscriber instances must be the same. Besides the PRIMARY KEY, it is not recommended to create any extra UNIQUE constraints.

Conclusion

In this article, you learned about PostgreSQL, a pg_logical extension, and how to implement logical replication in PostgreSQL using the pg_logical extension. Although this article mainly focused on setting up replication in PostgreSQL using the logical replication process, PostgreSQL also supports streaming replication or physical replication, which you can explore later. In case you want to export data from a source of your choice into your desired Database/destination like PostgreSQL, then Hevo Data is the right choice for you! 

Visit our Website to Explore Hevo

Hevo Data, a No-code Data Pipeline provides you with a consistent and reliable solution to manage Data transfer between a variety of sources and destinations with a few clicks. Hevo with its strong integration with 100+ sources allows you to not only export Data from your desired Data sources & load it to the destination of your choice, but also transform & enrich your Data to make it analysis-ready so that you can focus on your key business needs and perform insightful analysis. 

While PostgreSQL is a good solution for Data Integration, It requires a lot of Engineering Bandwidth & Expertise. This can be challenging, resource-intensive & costly in the long run. Hevo offers a much simpler, scalable, and economical solution that allows people to create Data Pipeline without any code in minutes & without depending on Engineering teams.

Want to take Hevo for a spin? Sign Up for a 14-day free trial. You may also have a look at the amazing price, which will assist you in selecting the best plan for your requirements.

Share your experience of learning about how to pg_logical extension! Let us know in the comments section below!

No-code Data Pipeline for PostgreSQL