Multi-master replication is a method that allows multiple nodes to accept write requests. A common challenge that you might face when looking to scale your backend to a new server cluster in a new region is replicating the databases.
To ensure smooth replication, master-slave replication might not be enough. You’ll need to turn to PostgreSQL master master replication which can take care of the job in real-time.
In this article, you will see how to set up PostgreSQL multi-master replication using BDR.
What is Replication?
Replication is the copying of data from one database server to another. The source server is generally called the primary server or master server. The receiving database server is also called the replica or slave.
Replication is usually done to improve performance, create backup and scalability, and reduce the load on the primary server. The changes made to the master carry forward to the slave servers.
These changes can be replicated both simultaneously and in batches. If it happens at the same time in both servers, it is called synchronous replication. If it happens in batches, it is called asynchronous.
Replication lets you perform analyses without affecting the operations of the master. You can use replication to scale out, especially when you have a low number of writes and a high number of reads.
Single Master Replication
In a single master multiple slave scenario, only one server (the master) is allowed to make changes to the data. The other servers do not accept write queries from anywhere except the master.
Seamlessly Sync Your Databases
Effortlessly transfer data between PostgreSQL databases with Hevo Data. Our platform ensures smooth, reliable integration and synchronization of your databases, eliminating manual data handling. Start syncing now and experience seamless database management!
Why choose Hevo?
Discover why BeepKart chose Hevo to expand their pipeline creation by 80%, reducing the process from 30 minutes to under 5 minutes. Try out the 14-day free trial today to experience an entirely automated, hassle-free Data Replication!
Get Started with Hevo for Free
What is Multi-master Replication?
Consider a situation where several connected database servers can update the data. In this case, the changes made by one member of the group ripple into all the other members. A PostgreSQL multi-master replication system can do this. So the multi-master replication adds an element of bi-directionality to the process.
PostgreSQL Multi-master replication lets multiple nodes accept write queries and all of the nodes involved contain the same data. The main purpose of multi-master replication, of course, is high availability.
When the primary server fails, it is not necessary to wait for a physical standby to be promoted.
It is recommended that users not complicate the design by going for master-master replication PostgreSQL where single-master replication in PostgreSQL is enough.
This is because Postgres multi-master replication makes the system complicated and messy. For example, if your application has incremental fields and the two nodes do not integrate this factor, it could lead to conflicts.
PostgreSQL Multi-Master Replication
While there is single-master replication built into PostgreSQL, multi-master replication is not. Small companies and communities manage some PostgreSQL forks.
There is one popular product commercially available at the moment that supports Postgres master master replication. It is called Bidirectional replication created by 2ndQuadrant.
The earlier versions of BDR were open-source but not the latest ones.
Note that in the trade-off between transaction speed and preventing data conflict, BDR prefers low latency allowing some conflicts to arise, if unavoidable, and resolving them later.
Steps to Set-up PostgreSQL Multi-master Replication Using BDR
Part 1: Initial Configurations
In this PostgreSQL multi-master replication step, after installation of BDR along with the pglogical plugin (which acts as a logical replication solution), you modify the PostgreSQL.conf and pg_hba.conf files and then restart the service.
- Step 1: Install BDR and pglogical plugin.
- Step 2: Configure postgresql.conf to these values.
wal_level = logical
shared_preload_libraries= ‘pg_logical, bdr’
track_commit_timestamp= ‘on’ # This is necessary for conflict resolution.
- wal_level = logical: Configures PostgreSQL to log changes in a way that supports replication, especially logical replication used for selectively replicating data.
- shared_preload_libraries = ‘pg_logical, bdr’: Preloads extensions required for logical replication (
pg_logical
) and Bi-Directional Replication (bdr
), enabling advanced replication features.
- track_commit_timestamp = ‘on’: Keeps track of when transactions are committed, which is necessary to handle conflicts during replication and to ensure data consistency.
- Step 3: Create a user with superuser privileges to manage BDR connectivity.
CREATE USER my_user WITH SUPERUSER REPLICATION PASSWORD ‘my_password’;
- CREATE USER my_user: Creates a new PostgreSQL user named
my_user
.
- WITH SUPERUSER: Grants
my_user
superuser privileges, allowing unrestricted access to the database.
- REPLICATION: Enables replication privileges, allowing
my_user
replication streams to be set up.
- PASSWORD ‘my_password’: Sets the password for
my_user
to my_password
.
- Step 4: Change the pg_hba.conf file by adding these lines.
host all bdr 10.20.30.40/24 md5
host replication bdr 10.20.30.40/24 md5
- host all bdr 10.20.30.40/24 md5:
- Grants the
bdr
user access to all databases from the IP range 10.20.30.40/24
.
md5
specifies that the connection requires password authentication.
- host replication bdr 10.20.30.40/24 md5:
- Grants the
bdr
user replication access from the IP range 10.20.30.40/24
.
md5
also enforces password authentication for replication access.
- Step 5: Add the user to the .pgpass file.
hostname:port:database:my_user: my_password
- hostname: The server address or IP where the database is hosted.
- port: The port number used by the database server (e.g.,
5432
for PostgreSQL).
- database: The specific database name to connect to.
- my_user: The username with which to authenticate.
- my_password: The password for
my_user
.
- Step 6: Restart Postgresql.
Load Data from PostgreSQL to BigQuery
Load Data from PostgreSQL to Snowflake
Load Data from PostgreSQL to Databricks
Part 2: Create a Single BDR Node
Activate BDR on my_db on the host my_host1 as my_user.
- Step 1: Create the extension.
CREATE EXTENSION bdr CASCADE; # CASCADE also creates the pglogical extension.
- CREATE EXTENSION bdr CASCADE;: This command installs the
bdr
(Bi-Directional Replication) extension in PostgreSQL.
- CASCADE: Ensures that any dependencies (
pglogical
, in this case) are also created automatically.
- pglogical extension: Needed for logical replication, enabling features necessary for BDR.
- Step 2: Initialize the current node using the bdr.create_node function.
SELECT bdr.create_node( node_name:= ‘initail_node’, local_dsn:= ‘dbname=my_db host= my_host1 user=my_user’);
- bdr.create_node: This function initializes a new node in a BDR (Bi-Directional Replication) cluster.
- node_name: Assigns a name to this node, here specified as
initial_node
.
- local_dsn: Provides the connection details for this node. Here,
local_dsn
specifies:
- dbname: Database name (
my_db
).
- host: Hostname or IP address (
my_host1
).
- user: Username to connect to the database (
my_user
)
- Step 3: Create BDR cluster definition using the bdr.create_node_group function.
SELECT bdr.create_node_group(node_group_name:='the_node_group');
- Step 4: Use the bdr.wait_for_join_completion function to wait for it all to complete.
SELECT bdr.wait_for_join_completion();
Part 3: Create Another BDR Node
- Step 1: Create the BDR extension.
CREATE EXTENSION bdr CASCADE;
- Step 2: Initialize the current node using the bdr.create_node function.
SELECT bdr.create_node( node_name:= ‘next_node’, local_dsn:= ‘dbname=my_db host= my_host2 user=my_user’);
- bdr.create_node: Adds a new node to the existing BDR (Bi-Directional Replication) cluster.
- node_name: Specifies the name of the new node, here labeled as
next_node
.
- local_dsn: Provides the connection details for this node:
- dbname: Database name (
my_db
).
- host: Host for the new node (
my_host2
).
- user: Database user (
my_user
).
- Step 3: Create the BDR cluster definition using the bdr.join_node_group function.
SELECT bdr.join_node_group(join_target_dsn:= ‘dbname=my_db host= my_host1 user=my_user’, wait_for_completion:=True);
Part 4: Configure HAProxy
HAProxy is an open-source proxy software that provides high availability. It automatically directs traffic to any online node. You need to first install HAProxy for this.
- Step 1: Modify the global section of the haproxy.cfg file.
global
state socket /var/run/haproxy/sock level admin # This allows us to get information and send commands to HAProxy.
- global: Starts the global configuration block for HAProxy settings.
- state socket: Specifies the location of the HAProxy UNIX socket file, here at
/var/run/haproxy/sock
.
- level admin: Sets the socket access level to
admin
, allowing administrative commands and status information retrieval for HAProxy.
- Step 2: Modify the bk_db section of the haproxy.cfg file.
stick-table type ip size 1
stick on dst
server bdr_initial_node my_host1: 5432 check
server bdr_next_node my_host2: 5432 backup check
- stick-table type ip size 1: Creates a stick table with
type ip
, used to track a single IP address (size 1).
- stick on dst: Directs HAProxy to “stick” or route traffic based on the destination IP (
dst
).
- server bdr_initial_node my_host1:5432 check: Defines a primary server named
bdr_initial_node
at my_host1
on port 5432
with health checks enabled.
- server bdr_next_node my_host2:5432 backup check: Defines a backup server named
bdr_next_node
at my_host2
on port 5432
, also with health checks.
- Step 3: Reload the configuration file by triggering HAProxy.
sudo systemctl reload haproxy
Part 5: Combine PgBouncer with HAProxy
PgBouncer is a connection pooler for PostgreSQL. With the help of PgBouncer, PostgreSQL can interact with a much larger number of clients. It is essential here for transaction management. While HAProxy redirects traffic, PgBouncer completes transactions. PgBouncer needs to be installed for the below steps.
- Step 1: Modify the haproxy.cfg file at the ft_postgresql section.
frontend ft_postgresql
bind *: 5433
default_backend bk_db
- frontend ft_postgresql: Defines a frontend named
ft_postgresql
for handling incoming connections.
- *bind :5433: Binds this frontend to listen on all available IP addresses (
*
) at port 5433
.
- default_backend bk_db: Specifies
bk_db
as the backend to route all incoming traffic from this frontend.
- Step 2: Alter the databases section in the pgbouncer.ini file to contain the below line.
* = host= proxy_server port= 5433
Modify the pgbouncer section.
listen_port = 5432
- Step 3: Restart HAProxy and then PgBouncer.
Part 6: Node Switchover
We are moving the connections away from the initial node. First, we disable bdr_initial_node using HAProxy, so that new connections are not sent to it.
We pass the RECONNECT command to pgbouncer to ensure reconnection after the current transaction.
Now, HAProxy will cause the reconnection to the second server. Then after making changes, the initial server is enabled again.
- Step 1: Disable initial_node within HAProxy.
echo “disable server bk_db/bdr_initial_node” | socat /var/run/haproxy/sock –
- echo “disable server bk_db/bdr_initial_node”: This command sends a message to HAProxy to disable the server named
bdr_initial_node
within the backend bk_db
.
- | socat /var/run/haproxy/sock –: The pipe (
|
) connects the output of the echo
command to socat
, which communicates with the HAProxy socket located at /var/run/haproxy/sock
. This allows the command to be executed on the HAProxy instance.
- Step 2: After transactions are complete, PgBouncer should reconnect and wait till the connections are re-established.
psql –h proxy_server –U pgbouncer pgbouncer –c “RECONNECT”
psql –h proxy_server –U pgbouncer pgbouncer –c “WAIT_CLOSE”
- Step 3: Work on the initial node before you re-enable it.
- Step 4: Re-enable the initial node again.
echo “enable server bk_db/bdr_initial_node” | socat /var/run/haproxy/sock –
Pros and Cons of PostgreSQL Multi-master Replication
Pros:
- If one master fails, then the other master can do the work of updating and inserting.
- The chances of failure of all the masters are very minimal as Master nodes are located in several different locations.
- It is possible to do data updates on multiple servers.
- The application enables to route the traffic to other masters rather than only a single master.
Cons:
- Complexity is the major con of multi-master replication.
- As simultaneous writes on multiple nodes are possible conflict resolution becomes difficult.
- Sometimes, conflict requires manual intervention.
- Probability of data inconsistency.
Alternative Multi-Master Replication Solutions
There are multiple categories of open-source, closed-source, free, and paid solutions that can be used for multi-master Postgres replication.
Let’s have a brief look at these:
- xDB
xDB is developed by EnterpriseDB. It is a bi-directional replication solution and is based on the protocol on EnterpriseDB. No design information is known to the world as it is a closed-source solution
- Developed in Java.
- Close-sourced source code.
- There are multiple executable programs in the xDB Replication Server.
- People complained about its performance as it is developed in Java.
- PostgreSQL XC/XC2
Developed by EnterpriseDB and NTT, PostgreSQL-XC is a synchronous replication solution. This open-source project offers a write-scalable, symmetric, and synchronous PostgreSQL cluster solution. There has not been much development in PostgreSQL-XC for a long time. Currently, Huawei is working on it. Performance has improved little bit for OLAP, but it is not effective for TPS.
- PostgreSQL XL
A fork of PostgreSQL-XC, PostgreSQL XL is currently supported by 2ndQuadrant. It is based on PostgreSQL 10.6. It is very good for OLAP, but not for High TPS.
- Rubyrep
It is an asynchronous master-master replication and claims the easiest configuration and setup. It is developed by Arndt Lehmann. It works across various platforms, including Windows. It operates on two servers, referred to as “left” and “right”. So it is a “2-master” setup rather than a “multi-master”.
- It constantly replicates changes between the right and left databases.
- Automatically sets up necessary triggers, log tables, etc., discovers newly added tables, and synchronizes the table content
- Automatically reconfigures sequences to avoid duplicate key conflicts
- Tracks any changes to the primary key column. It can also implement both master-master and master-slave replication
- It has Pre Built conflict resolution methods: left/right wins; earlier / later change wins
- Ruby code snippets can be used for custom conflict resolution specifiable via
- Rubyrep event log table can be used for Replication decisions to be logged in.
- Bucardo
Developed by Jon Jensen and Greg Sabino Mullane of End Point Corporation, it is a trigger-based replication solution in which a Perl daemon listens to NOTIFY requests and acts on them. The daemon is notified of Changes on tables that are recorded in a table (bucardo_delta). It uses standard or custom conflict handlers to sort out conflicts.
Replicate Your PostgreSQL Server using Hevo
No credit card required
Conclusion
This article provides a comprehensive guide on the PostgreSQL replication master-master or PostgreSQL multi-master replication. The above steps are by no means complete. There are several complications you will run into when dealing with sequences, installations, upgrades between versions while online, and testing the failover process.
You can find a complete and comprehensive guide in PostgreSQL 12 High Availability Cookbook. This book goes into great detail not just on PostgreSQL multi-master replication but on several other ways to design highly available servers.
In any case, PostgreSQL multi-master replication requires a lot of technical understanding and learning. But Hevo can help you out with these challenges. Hevo Data, a No-code Data Pipeline can help you replicate data in real-time without having to write any code.
Try a 14-day free trial and experience the feature-rich Hevo suite firsthand. Also, check out our unbeatable pricing to choose the best plan for your organization.
Frequently Asked Questions
1. Does postgres support master-master replication?
PostgreSQL does not natively support master-master replication out of the box, but it can be achieved using third-party tools like BDR (Bi-Directional Replication), Citus, or Pglogical.
2. What is master-master replication?
Master-master replication is a replication setup where two or more database nodes can accept write operations. Changes made on one node are replicated to others, ensuring that all nodes remain synchronized.
3. How to setup multi-master PostgreSQL?
You can set up multi-master PostgreSQL using tools like BDR or Citus.
Nikhil is an accomplished technical content writer with extensive expertise in the data industry. With six years of professional experience, he adeptly creates informative and engaging content that delves into the intricacies of data science. Nikhil's skill lies in merging his problem-solving prowess with a profound grasp of data analytics, enabling him to produce compelling narratives that resonate deeply with his audience.