Data replication ensures the duplication of data on an ongoing basis so that replicate is in a consistently updated state and is synchronized with the source. This ensures high data availability and safety of data against undesired events such as crashes, system errors, etc. Postgres WAL replication meets this requirement through a feature called streaming replication. This feature is achieved through a master-slave configuration.
This blog is aimed at providing a detailed explanation of how to perform Postgres WAL replication (Write-Ahead logging).
Table Of Contents
- Introduction To Replication
- Introduction To WAL
- Postgres WAL Replication
- Setting Up Streaming Replication In PostgreSQL
- Making Streaming Replication Synchronous
Introduction To Replication
Replication refers to the process of copying modifications in data from the publisher database to the subscriber database. In layman’s terms, it creates a replica of one database in another. Both these databases are usually located on different physical servers and help in distributing various types of database queries. This helps to form a load balancing framework.
Replication is implemented in PostgreSQL using a master-slave configuration. The master acts as the primary instance and is responsible for handling the primary database and its operations. The slave acts as the secondary instance and implements all modifications made to the primary database on itself, thus making itself an identical copy. The master is the read/write server whereas the slave is a read-only server.
For further information on replication in PostgreSQL, you can check the documentation manual.
Introduction To WAL
WAL stands for Write-Ahead Logging. It is the standard protocol being used to ensure that all the changes made to the database are being logged properly in their order of occurrence. It helps maintain data integrity by facilitating a recovery process in the case of a database crash.
WAL is enabled in PostgreSQL by default and has a default size of 16MB. These files are found in pg_xlog or pg_wal (PostgreSQL 10) directory. The log files are written sequentially and hence the cost of scanning through these files is quite less. These files have a unique name in the format of :
“00000001 00000000 00000000”
One major advantage of WAL is that it supports online-backup and point-in-time recovery. It also reduces the number of disk writes as only the log file needs to be flushed to the disk to guarantee that a transaction has occurred.
For further information on WAL, you can check the documentation manual.
Simplify Data Analysis with Hevo’s No-code Data Pipeline
Hevo Data, a No-code Data Pipeline helps to load data from any data source such as Whatsapp, Databases, SaaS applications, Cloud Storage, SDKs, and Streaming Services and simplifies the ETL process. It supports 100+ data sources (including 30+ free data sources) like Asana and is a 3-step process by just selecting the data source, providing valid credentials, and choosing the destination. Hevo not only loads the data onto the desired Data Warehouse/destination but also enriches the data and transforms it into an analysis-ready form without having to write a single line of code.GET STARTED WITH HEVO FOR FREE
Its completely automated pipeline offers data to be delivered in real-time without any loss from source to destination. Its fault-tolerant and scalable architecture ensure that the data is handled in a secure, consistent manner with zero data loss and supports different forms of data. The solutions provided are consistent and work with different BI tools as well.
Check out why Hevo is the Best:
- Secure: Hevo has a fault-tolerant architecture that ensures that the data is handled in a secure, consistent manner with zero data loss.
- Schema Management: Hevo takes away the tedious task of schema management & automatically detects the schema of incoming data and maps it to the destination schema.
- Minimal Learning: Hevo, with its simple and interactive UI, is extremely simple for new customers to work on and perform operations.
- Hevo Is Built To Scale: As the number of sources and the volume of your data grows, Hevo scales horizontally, handling millions of records per minute with very little latency.
- Incremental Data Load: Hevo allows the transfer of data that has been modified in real-time. This ensures efficient utilization of bandwidth on both ends.
- Live Support: The Hevo team is available round the clock to extend exceptional support to its customers through chat, email, and support calls.
- Live Monitoring: Hevo allows you to monitor the data flow and check where your data is at a particular point in time.
- Working knowledge of PostgresSQL.
- PostgresSQL installed on the host workstation.
Postgres WAL Replication
WAL helps to replicate data between the database servers in two ways:
File-Based Log Shipping
To always keep the data in sync, the master directly sends the WAL logs to the slave. The master can either copy the logs to the slave servers storage or simply share storage with them.
All WAL logs have a maximum storage capacity of 16MBs and are shipped only after they reach the threshold value. This can cause a delay in the replication process and increase the chances of a data loss due to a possible master crash.
Streaming WAL Records
The database servers stream WAL records in chunks to ensure that the data is always in sync. The slave servers receive the WAL chunks by establishing a connection with the master server. The advantage of streaming WAL records is that it doesn’t wait for the capacity to be full, these are streamed immediately. This helps in keeping the standby server up-to-date.
Streaming replication is asynchronous by default however, it supports synchronous replication mode as well.
Setting Up Streaming Replication In PostgreSQL
Postgres WAL replication or streaming replication can be set up quite easily upon installing the PostgreSQL on all servers. Here you need to configure both the master node & the standby node.
Configuring Master Node
The master node can be configured using the following steps:
- Step 1: Database Initialization
- Step 2: Creating A New User
- Step 3: Configuring Streaming Properties
- Step 4: Replication Entry In pg_hba.conf File
Step 1: Database Initialization
To initialize the database, make use of the initidb utility command.
Step 2: Creating A New User
Create a new user with replication privileges using the following command:
CREATE USER <user_name> REPLICATION LOGIN ENCRYPTED PASSWORD ’<password>’;
The user must specify a username and password for the given query. The replication keyword is used to provide the user with the necessary privileges.
CREATE USER sample_user REPLICATION LOGIN ENCRYPTED PASSWORD 'sample_pass'
Step 3: Configuring Streaming Properties
You can configure the streaming properties using the PostgreSQL configuration file (postgresql.conf). Make the following changes in the file:
wal_level = replica wal_log_hints = on max_wal_senders = 3 wal_keep_segments = 8 hot_standby = on
The parameters used are:
- wal_level: This is used to enable Postgres WAL replication/ streaming replication. The possible values here are replica, minimal, and logical.
- wal_log_hints: It is required for the pg_rewind capability, this helps when the standby server is out of sync with the master server.
- max_wal_senders: It is used to specify the maximum numbers of concurrent connections that can be established with the standby servers.
- wal_keep_segments: It is used to specify a minimum number of WAL logs segments so that they are not deleted before standby consumes the entire capacity.
- hot_standby: This parameter enables a read-only connection with the slave when it is set to ON.
Step 4: Replication Entry In pg_hba.conf File
This is used to allow the servers to establish a connection between each other for replication. The file is usually found in the data directory of PostgreSQL.
Use the following lines of code:
host replication rep_user IPaddress md5
The master server now allows a user named rep_user to connect & act as the slave server using the specified IP for replication.
host replication rep_user 192.168.0.22/32 md5
This is how master node can be set up for Postgres WAL replication.
Configuring Standby Node
Once the master node is configured, the next step for Postgres WAL replication is to configure the standby node. You can configure the standby node in two simple steps:
Step 1: Create A Backup Of The Master Node
Use the pg_basebackup utility to create a backup of the master node. This will be used as the starting point for the slave mode.
The utility must be used with the following syntax:
pg_basebackp -D <data_directory> -h <master_host> -X stream -c fast -U rep_user -W
The parameters used are:
- -D: This indicates the directory you are working on.
- -h: This is used to specify the master host.
- -X: This is used to include the necessary transactional log files.
- -C: This is used to set the checkpoints.
- -W: This is used to prompt the user for a password before connecting to the database.
Step 2: Creating The Replication Configuration File
Check if the replication configuration file exists. If it doesn’t, create the replication configuration file as recovery.conf. The file should be created in the data directory of the PostgreSQL installation.
You can create the file automatically making use of the -R option in pg_basebackup utility.
The recovery.conf file should have the following commands:
standby_mode = ‘on’ primary_conninfo = ‘host=<master_host> port=<postgres_port> user=<replication_user> password=<password> application_name=”host_name”’ recovery_target_timeline = ‘latest’
The parameters used are:
- standby_mode: When this parameter is set to ON, it causes the server to start as the slave.
- primary_conninfo: This parameter is used to establish a connection between the slave & the master server with the help of a connection string.
- recovery_target_timeline: It is used to set the recovery time.
You need to give the IP address, username & password of the master server as values for the primary_conninfo parameter to establish a connection.
primary_conninfo = 'host=192.168.0.26 port=5432 user=rep_user password=rep_password'
Now restart the slave server to complete the process.
This is how the slave server is configured for Postgres WAL replication.
Making Streaming Replication Synchronous
Postgres WAL replication is asynchronous by default however, it is possible to make streaming replication synchronous by using the following command:
synchronous_standby_names = ‘num_sync ( standby_name [, ...] )’
The parameters used are:
- num_sync: It denotes the number of synchronous slave servers from where the transactions should wait for a reply.
- standby_name: This parameter should have the same value as that of the application_name parameter in recovery.conf file.
If you want all slave servers to be considered synchronous, set the value of standby_name to “*”, otherwise you can mention specific names by separating them with a comma.
This article teaches you how to perform Postgres WAL replication with ease and answers all your queries regarding it. It provides a brief introduction of various concepts related to it & helps the users understand them better and use them to perform data replication & recovery in the most efficient way possible. This helps maintain data integrity & availability. It also introduces an alternative method that is Hevo Data, a No-code Data Pipeline to help users replicate data in an effortless and highly secure manner without having to write any code.visit our website to explore hevo
Hevo can help you Integrate your data from numerous sources like Postgres and load them into a destination to Analyze real-time data with a BI tool such as Tableau. It will make your life easier and data migration hassle-free. It is user-friendly, reliable, and secure.
SIGN UP for a 14-day free trial and see the difference!
Share your experience of learning about Postgres WAL Replication in the comments section below.