Data replication ensures the duplication of data on an ongoing basis so that the replicate is in a consistently updated state and is synchronized with the source. This ensures high data availability and safety of data against undesired events such as crashes, system errors, etc. PostgreSQL streaming replication truly meets this requirement. This feature is achieved through a master-slave configuration.
This blog is aimed at providing a detailed explanation of PostgreSQL replication.
Table Of Contents
What is PostgreSQL?
Image Source
PostgreSQL is a powerful, enterprise-class, open-source relational database management system that uses standard SQL to query the relational data and JSON to query the non-relational data residing in the database. PostgreSQL has excellent support for all of the operating systems. It supports advanced data types and optimization operations which are usually found in commercial databases such as Oracle, SQL Server, etc.
Key features of PostgreSQL:
- It has extensive support for complex queries.
- It provides excellent support for geographic objects & hence it can be used for geographic information systems & location-based services.
- It provides full support for client-server network architecture.
- Its write-ahead-logging (WAL) feature makes it fault-tolerant.
For further information on PostgreSQL Analytics, you can check out our other article here.
What is Replication?
Replication refers to the process of copying modifications in data from the publisher database to the subscriber database. In layman’s terms, it creates a replica of one database in another. Both these databases are usually located on different physical servers and help in distributing various types of database queries. This helps to form a load balancing framework.
Replication is implemented in PostgreSQL using a master-slave configuration. The master is known as the primary instance and handles the primary database and its operations. The slave acts as the secondary instance and implements all modifications made to the primary database on itself, thus making itself an identical copy. The master is the read/write server whereas the slave is a read-only server.
For more information on replication in Postgres, take a look at this resource.
Method 1: PostgreSQL Streaming Replication
PostgreSQL streaming replication or WAL replication can be set up quite easily upon installing PostgreSQL on all servers. It is based on transferring the WAL files from the primary to the target database & is implemented using a master-slave configuration.
Method 2: Hevo Data, The Easier Approach
Hevo Data provides a hassle-free & fully managed solution using its no-code Data Pipelines. It helps you replicate the data effortlessly from PostgreSQL without any intervention. Hevo’s pre-built integration with PostgreSQL (among 150+ Sources) will take full charge of the data replication process, allowing you to focus on key business activities. It is the ideal alternative to PostgreSQL streaming replication & logical replication methods.
Sign up here for a 14-day free trial!
Prerequisites
- Working knowledge of PostgreSQL.
- PostgreSQL installed on the host workstation.
Replication Methods In PostgreSQL
Method 1: PostgreSQL Streaming Replication
Image Source
PostgreSQL streaming replication or WAL replication can be set up quite easily upon installing PostgreSQL on all servers. PostgreSQL streaming replication is based on transferring the WAL files from the primary to the target database.
PostgreSQL streaming replication is implemented using a master-slave configuration. The master is known as the primary instance and handles the primary database and its operations. The slave acts as the secondary instance and implements all modifications made to the primary database on itself, thus making itself an identical copy. The master is the read/write server whereas the slave is a read-only server.
Here you need to configure both the master node & the standby node.
Hevo offers a faster way to move data from Databases or SaaS Applications, like PostgreSQL, into your Data Warehouse to be visualized in a BI tool. Hevo is fully automated and hence does not require you to code.
Get Started with Hevo for Free
Check out some of the cool features of Hevo:
- Completely Automated: The Hevo platform can be set up in just a few minutes and requires minimal maintenance.
- Real-time Data Transfer: Hevo provides real-time data migration, so you can have analysis-ready data always.
- 100% Complete & Accurate Data Transfer: Hevo’s robust infrastructure ensures reliable data transfer with zero data loss.
- Scalable Infrastructure: Hevo has in-built integrations for 150+ sources (including 40+ free sources) that can help you scale your data infrastructure as required.
- 24/7 Live Support: The Hevo team is available round the clock to extend exceptional support to you through chat, email, and support calls.
- Schema Management: Hevo takes away the tedious task of schema management & automatically detects the schema of incoming data and maps it to the destination schema.
- Live Monitoring: Hevo allows you to monitor the data flow so you can check where your data is at a particular point in time.
Sign up here for a 14-Day Free Trial!
Configuring Master Node
The master node can be configured using the following steps:
Step 1: Database Initialization
To initialize the database, make use of the initidb utility command.
Step 2: Creating A New User
Create a new user with replication privileges using the following command:
CREATE USER <user_name> REPLICATION LOGIN ENCRYPTED PASSWORD ’<password>’;
The user must specify a username and password for the given query. The replication keyword is used to provide the user with the necessary privileges.
Example query:
CREATE USER rep_user REPLICATION LOGIN ENCRYPTED PASSWORD 'rep_pass'
Step 3: Configuring PostgreSQL Streaming Replication Properties
You can configure the streaming properties using the PostgreSQL configuration file (postgresql.conf). Make the following changes in the file:
wal_level = logical
wal_log_hints = on
max_wal_senders = 8
max_wal_size = 1GB
hot_standby = on
The parameters used are:
- wal_level: This is used to enable PostgreSQL streaming replication. The possible values here are replica, minimal, and logical.
- wal_log_hints: It is required for the pg_rewind capability, this helps when the standby server is out of sync with the master server.
- max_wal_senders: It is used to specify the maximum number of concurrent connections that can be established with the standby servers.
- max_wal_size: It is used to specify the size of WAL files that can be retained within log files.
- hot_standby: This parameter enables a read-only connection with the slave when it is set to ON.
Step 4: Replication Entry In pg_hba.conf File
This is used to allow the servers to establish a connection with each other for replication. The file is usually found in the data directory of PostgreSQL.
Use the following lines of code:
host replication rep_user IPaddress md5
The master server now allows a user named rep_user to connect & act as the slave server using the specified IP for replication.
Example query:
host replication rep_user 192.168.0.22/32 md5
This is how the master node can be set up for PostgreSQL streaming replication.
Download the Ultimate Guide on Database Replication
Learn the 3 ways to replicate databases & which one you should prefer.
Configuring Standby Node
Once the master node is configured, the next step for PostgreSQL replication is to configure the standby node. You can configure the standby node in two simple steps:
Step 1: Create A Backup Of The Master Node
Use the pg_basebackup utility to create a backup of the master node. This will be used as the starting point for the slave mode.
The utility must be used with the following syntax:
pg_basebackp -D <data_directory> -h <master_host> -X stream -c fast -U rep_user -W
The parameters used are:
- -D: This indicates the directory you are working on.
- -h: This is used to specify the master host.
- -X: This is used to include the necessary transactional log files.
- -C: This is used to set the checkpoints.
- -W: This is used to prompt the user for a password before connecting to the database.
Step 2: Creating The Replication Configuration File
Check if the replication configuration file exists. If it doesn’t, create the replication configuration file as recovery.conf. The file should be created in the data directory of the PostgreSQL installation.
You can create the file automatically making use of the -R option in the pg_basebackup utility.
The recovery.conf file should have the following commands:
standby_mode = ‘on’
primary_conninfo = ‘host=<master_host> port=<postgres_port> user=<replication_user> password=<password> application_name=”host_name”’
recovery_target_timeline = ‘latest
The parameters used are:
- standby_mode: When this parameter is set to ON, it causes the server to start as the slave.
- primary_conninfo: This parameter is used to establish a connection between the slave & the master server with the help of a connection string.
- recovery_target_timeline: It is used to set the recovery time.
You need to give the IP address, username & password of the master server as values for the primary_conninfo parameter to establish a connection.
Example query:
primary_conninfo = 'host=192.168.0.26 port=5432 user=rep_user password=rep_pass'
Now restart the slave server to complete the process.
This is how a standby node can be set up for PostgreSQL streaming replication.
Method 2: Hevo Data, The Easier Approach
Hevo Data, a no-code Data Pipeline can help you replicate data from PostgreSQL (among 150+ sources) swiftly to a database/data warehouse of your choice. Hevo is fully-managed and automates the process of monitoring and replicating the changes on the secondary database rather than making the user write the code repeatedly. Its fault-tolerant architecture ensures that the data is handled in a secure, consistent manner with zero data loss. Hevo provides you with a truly efficient and fully-automated solution to replicate your data in real-time.
Visit our Website to Explore Hevo
Check out what makes Hevo amazing:
- Minimal Learning: Hevo with its simple and interactive UI, is extremely simple for new customers to work on and perform operations.
- Completely Managed Platform: Hevo is fully-managed. You need not invest time and effort to maintain or monitor the infrastructure involved in executing codes.
- Data Transformation: It provides a simple interface to perfect, modify, and enrich the data you want to export.
- Secure: Hevo has a fault-tolerant architecture that ensures that the data is handled in a secure, consistent manner with zero data loss.
- Hevo Is Built To Scale: As the number of sources and the volume of your data grows, Hevo scales horizontally, handling millions of records per minute with very little latency.
- Live Support: The Hevo team is available round the clock to extend exceptional support to its customers through chat, email, and support calls.
Conclusion
This article enriches you with in-depth knowledge about the various concepts behind the replication process in PostgreSQL. It provides a step-by-step comprehensive guide to help you perform PostgreSQL streaming replication in the smoothest way possible. PostgreSQL streaming replication method can, however, prove to be challenging especially for beginners & this is where Hevo can help you out. Hevo Data, a no-code Data Pipeline can help you replicate data in real-time without having to write any code. Hevo being a fully-managed system provides a highly secure automated solution to help perform replication in just a few clicks using its interactive UI.
Want to take Hevo for a spin? Sign Up here for a 14-day free trial and experience the power of managing your data replication needs in just a few clicks.
We would love to hear about your experience of performing PostgreSQL streaming replication! Share your thoughts in the comments section below.