Multi-master replication is a method that allows multiple nodes to accept write requests. A common challenge that you might face when looking to scale your backend to a new server cluster in a new region is replicating the databases.
To ensure smooth replication, master-slave replication might not be enough. You’ll need to turn to PostgreSQL master master replication which can take care of the job in real-time.
In this article, you will see how to set up PostgreSQL multi-master replication using BDR.
What is Replication?
Replication is the copying of data from one database server to another. The source server is generally called the primary server or master server. The receiving database server is also called the replica or slave.
Replication is usually done to improve performance, create backup and scalability, and reduce the load on the primary server. The changes made to the master carry forward to the slave servers.
These changes can be replicated both simultaneously and in batches. If it happens at the same time in both servers, it is called synchronous replication. If it happens in batches, it is called asynchronous.
Replication lets you perform analyses without affecting the operations of the master. You can use replication to scale out, especially when you have a low number of writes and a high number of reads.
Single Master Replication
In a single master multiple slave scenario, only one server (the master) is allowed to make changes to the data. The other servers do not accept write queries from anywhere except the master.
What is Multi-master Replication?
Consider a situation where several connected database servers can update the data. In this case, the changes made by one member of the group ripple into all the other members. A PostgreSQL multi-master replication system can do this. So the multi-master replication adds an element of bi-directionality to the process.
PostgreSQL Multi-master replication lets multiple nodes accept write queries and all of the nodes involved contain the same data. The main purpose of multi-master replication, of course, is high availability.
When the primary server fails, it is not necessary to wait for a physical standby to be promoted.
It is recommended that users not complicate the design by going for master-master replication PostgreSQL where single-master replication in PostgreSQL is enough.
This is because Postgres multi-master replication makes the system complicated and messy. For example, if your application has incremental fields and the two nodes do not integrate this factor, it could lead to conflicts.
PostgreSQL Multi-Master Replication
While there is single-master replication built into PostgreSQL, multi-master replication is not. Small companies and communities manage some PostgreSQL forks.
There is one popular product commercially available at the moment that supports Postgres master master replication. It is called Bidirectional replication created by 2ndQuadrant.
The earlier versions of BDR were open-source but not the latest ones.
Note that in the trade-off between transaction speed and preventing data conflict, BDR prefers low latency allowing some conflicts to arise, if unavoidable, and resolving them later.
Hevo Data is a fully managed No-code Data Pipeline, which supports integrations with over 150+ different sources. You can move your data from PostgreSQL to any destination with ease. Check out how Hevo might be perfect for your needs. Here are just a few of Hevo’s many awesome features:
Get Started with Hevo for Free
- Fully Managed: It requires no management and maintenance as Hevo is a fully automated platform.
- Data Transformation: It provides a simple interface to perfect, modify, and enrich the data you want to transfer.
- Real-Time: Hevo offers real-time data migration. So, your data is always ready for analysis.
- Schema Management: Hevo can automatically detect the schema of the incoming data and map it to the destination schema.
- Live Monitoring: Advanced monitoring gives you a one-stop view to watch all the activities that occur within pipelines.
- Live Support: Hevo team is available round the clock to extend exceptional support to its customers through chat, email, and support calls.
You can experience how easy it is to work with Hevo!
Sign up here for a 14-Day Free Trial!
Steps to Set-up PostgreSQL Multi-master Replication Using BDR
Part 1: Initial Configurations
In this PostgreSQL multi-master replication step, after installation of BDR along with the pglogical plugin (which acts as a logical replication solution), you modify the PostgreSQL.conf and pg_hba.conf files and then restart the service.
- Step 1: Install BDR and pglogical plugin.
- Step 2: Configure postgresql.conf to these values.
wal_level = logical
shared_preload_libraries= ‘pg_logical, bdr’
track_commit_timestamp= ‘on’ # This is necessary for conflict resolution.
- Step 3: Create a user with superuser privileges to manage BDR connectivity.
CREATE USER my_user WITH SUPERUSER REPLICATION PASSWORD ‘my_password’;
- Step 4: Change the pg_hba.conf file by adding these lines.
host all bdr 10.20.30.40/24 md5
host replication bdr 10.20.30.40/24 md5
- Step 5: Add the user to the .pgpass file.
- Step 6: Restart Postgresql.
Download the Ultimate Guide on Database Replication
Learn the 3 ways to replicate databases & which one you should prefer.
Part 2: Create a Single BDR Node
Activate BDR on my_db on the host my_host1 as my_user.
- Step 1: Create the extension.
CREATE EXTENSION bdr CASCADE; # CASCADE also creates the pglogical extension.
- Step 2: Initialize the current node using the bdr.create_node function.
SELECT bdr.create_node( node_name:= ‘initail_node’, local_dsn:= ‘dbname=my_db host= my_host1 user=my_user’);
- Step 3: Create BDR cluster definition using the bdr.create_node_group function.
- Step 4: Use the bdr.wait_for_join_completion function to wait for it all to complete.
Part 3: Create Another BDR Node
- Step 1: Create the BDR extension.
CREATE EXTENSION bdr CASCADE;
- Step 2: Initialize the current node using the bdr.create_node function.
SELECT bdr.create_node( node_name:= ‘next_node’, local_dsn:= ‘dbname=my_db host= my_host2 user=my_user’);
- Step 3: Create the BDR cluster definition using the bdr.join_node_group function.
SELECT bdr.join_node_group(join_target_dsn:= ‘dbname=my_db host= my_host1 user=my_user’, wait_for_completion:=True);
Part 4: Configure HAProxy
HAProxy is an open-source proxy software that provides high availability. It automatically directs traffic to any online node. You need to first install HAProxy for this.
- Step 1: Modify the global section of the haproxy.cfg file.
state socket /var/run/haproxy/sock level admin # This allows us to get information and send commands to HAProxy.
- Step 2: Modify the bk_db section of the haproxy.cfg file.
stick-table type ip size 1
stick on dst
server bdr_initial_node my_host1: 5432 check
server bdr_next_node my_host2: 5432 backup check
- Step 3: Reload the configuration file by triggering HAProxy.
sudo systemctl reload haproxy
Part 5: Combine PgBouncer with HAProxy
PgBouncer is a connection pooler for PostgreSQL. With the help of PgBouncer, PostgreSQL can interact with a much larger number of clients. It is essential here for transaction management. While HAProxy redirects traffic, PgBouncer completes transactions. PgBouncer needs to be installed for the below steps.
- Step 1: Modify the haproxy.cfg file at the ft_postgresql section.
bind *: 5433
- Step 2: Alter the databases section in the pgbouncer.ini file to contain the below line.
* = host= proxy_server port= 5433
Modify the pgbouncer section.
listen_port = 5432
- Step 3: Restart HAProxy and then PgBouncer.
Part 6: Node Switchover
We are moving the connections away from the initial node. First, we disable bdr_initial_node using HAProxy, so that new connections are not sent to it.
We pass the RECONNECT command to pgbouncer to ensure reconnection after the current transaction.
Now, HAProxy will cause the reconnection to the second server. Then after making changes, the initial server is enabled again.
- Step 1: Disable initial_node within HAProxy.
echo “disable server bk_db/bdr_initial_node” | socat /var/run/haproxy/sock –
- Step 2: After transactions are complete, PgBouncer should reconnect and wait till the connections are re-established.
psql –h proxy_server –U pgbouncer pgbouncer –c “RECONNECT”
psql –h proxy_server –U pgbouncer pgbouncer –c “WAIT_CLOSE”
- Step 3: Work on the initial node before you re-enable it.
- Step 4: Re-enable the initial node again.
echo “enable server bk_db/bdr_initial_node” | socat /var/run/haproxy/sock –
Pros and Cons of PostgreSQL Multi-master Replication
- If one master fails, then the other master can do the work of updating and inserting.
- The chances of failure of all the masters are very minimal as Master nodes are located in several different locations.
- It is possible to do data updates on multiple servers.
- The application enables to route the traffic to other masters rather than only a single master.
- Complexity is the major con of multi-master replication.
- As simultaneous writes on multiple nodes are possible conflict resolution becomes difficult.
- Sometimes, conflict requires manual intervention.
- Probability of data inconsistency.
Alternative Multi-Master Replication Solutions
There are multiple categories of open-source, closed-source, free, and paid solutions that can be used for multi-master Postgres replication.
Let’s have a brief look at these:
xDB is developed by EnterpriseDB. It is a bi-directional replication solution and is based on the protocol on EnterpriseDB. No design information is known to the world as it is a closed-source solution
- Developed in Java.
- Close-sourced source code.
- There are multiple executable programs in the xDB Replication Server.
- People complained about its performance as it is developed in Java.
- PostgreSQL XC/XC2
Developed by EnterpriseDB and NTT, PostgreSQL-XC is a synchronous replication solution. This open-source project offers a write-scalable, symmetric, and synchronous PostgreSQL cluster solution. There has not been much development in PostgreSQL-XC for a long time. Currently, Huawei is working on it. Performance has improved little bit for OLAP, but it is not effective for TPS.
- PostgreSQL XL
A fork of PostgreSQL-XC, PostgreSQL XL is currently supported by 2ndQuadrant. It is based on PostgreSQL 10.6. It is very good for OLAP, but not for High TPS.
It is an asynchronous master-master replication and claims the easiest configuration and setup. It is developed by Arndt Lehmann. It works across various platforms, including Windows. It operates on two servers, referred to as “left” and “right”. So it is a “2-master” setup rather than a “multi-master”.
- It constantly replicates changes between the right and left databases.
- Automatically sets up necessary triggers, log tables, etc., discovers newly added tables, and synchronizes the table content
- Automatically reconfigures sequences to avoid duplicate key conflicts
- Tracks any changes to the primary key column. It can also implement both master-master and master-slave replication
- It has Pre Built conflict resolution methods: left/right wins; earlier / later change wins
- Ruby code snippets can be used for custom conflict resolution specifiable via
- Rubyrep event log table can be used for Replication decisions to be logged in.
Developed by Jon Jensen and Greg Sabino Mullane of End Point Corporation, it is a trigger-based replication solution in which a Perl daemon listens to NOTIFY requests and acts on them. The daemon is notified of Changes on tables that are recorded in a table (bucardo_delta). It uses standard or custom conflict handlers to sort out conflicts.
This article provides a comprehensive guide on the PostgreSQL replication master-master or PostgreSQL multi-master replication. The above steps are by no means complete. There are several complications you will run into when dealing with sequences, installations, upgrades between versions while online, and testing the failover process.
You can find a complete and comprehensive guide in PostgreSQL 12 High Availability Cookbook. This book goes into great detail not just on PostgreSQL multi-master replication but on several other ways to design highly available servers.
In any case, PostgreSQL multi-master replication requires a lot of technical understanding and learning. But Hevo can help you out with these challenges. Hevo Data, a No-code Data Pipeline can help you replicate data in real-time without having to write any code.
Hevo being a fully managed system provides a highly secure automated solution to help perform replication in just a few clicks using its interactive UI.
Visit our Website to Explore Hevo
Want to take Hevo for a spin? Sign Up here for a 14-day free trial and experience the power of managing your data replication needs in just a few clicks.
Share your thoughts on PostgreSQL multi-master replication in the comments!