A replication slot is a feature in PostgreSQL that ensures that the master server will retain the WAL logs that are needed by the replicas even when they are disconnected from the master.
When streaming replication is utilized between a master and some hot or archiving standbys, a replication slot is needed to keep the WAL files alive even when the replica is offline or disconnected. If the standby goes down, the master can keep track of how much the standby lags and preserve the WAL files it requires until the standby reconnects. The WAL files are then decoded and played back on the duplicate.
This article will introduce you to WAL Files and discuss the importance of PostgreSQL Slots. Moreover, it will elaborate on the types of Postgres Replication slots and provide you with the commands to create, delete and monitor them. Read long to learn more about PostgreSQL Replication Slots and try them out today!
Table of Contents
What are WAL Files?
Write-Ahead Log (WAL) is a record of the changes made to the data. It ensures that when there is a crash in the system or loss of connection, the database can be recovered. When you make changes to the database, WAL files keep building up. WAL logs are stored in the pg_wal directory as a set of segment files.
The purpose of keeping the WAL files is you can recreate the database from scratch by replaying all the changes recorded in the WAL files. Hevo’s end-to-end Data Management supports data ingestion from PostgreSQL servers via Write Ahead Logs (WALs) set at the logical level and replicates your data in real-time.
Because just the log file needs to be flushed to the disc to ensure that a transaction is committed, rather than every data file modified by the transaction, using WAL leads to a greatly decreased amount of disc writes. Because the log file is written sequentially, synchronizing the log is substantially less expensive than flushing the data pages. This is especially true for servers that handle a large number of tiny transactions that interact with various areas of the data store. Furthermore, if the server is executing a large number of minor concurrent transactions, a single “fsync” of the log file may be enough to commit a large number of them.
For extended reading on WAL Files, you can refer to this article.
Now you will see why PostgreSQL replication slots are needed.
What are Replication slots?
A replication slot is a PostgreSQL feature that ensures the master server keeps the WAL logs required by replicas even when they are disconnected from the master.
When streaming replication is utilized between a master and some hot or archiving standbys, a replication slot is needed to keep the WAL files alive even when the replica is offline or disconnected.
If the standby goes down, the master can keep track of how much the standby lags and preserve the WAL files it requires until the standby reconnects. The WAL files are then decoded and played back on the duplicate.
Download the Ultimate Guide on Database Replication
Learn the 3 ways to replicate databases & which one you should prefer.
Advantages and Disadvantages of Replication Slots
Once activated, replication slots are quite valuable. “Replication Slots” are disabled by default and must be enabled explicitly. The following are some of the benefits of employing Replication Slots:
- Ensures that the master has adequate WAL segments for all clones.
- Prevents the master from removing rows that can trigger a replica recovery conflict.
- Only when all clones have devoured the transaction log may the master recycle it. The benefit is that a slave can never fall so far behind that a re-sync is required.
There are certain disadvantages to replication slots too:
- Slave nodes that are tied to a replication slot and are placed under long maintenance (such as days or weeks) will have unbounded disc growth due to piled-up WAL files from the master.
- Slave nodes that are tied to a replication slot will have unbounded disc growth due to piled-up WAL files from the master.
Hevo can be a good choice if you’re looking to replicate data from 100+ Data Sources (including 40+ Free Data Sources) like PostgreSQL into Google BigQuery, Amazon Redshift, Snowflake, and many other databases and warehouse systems. Hevo also supports PostgreSQL as a destination including native connectors for various variants of PostgreSQL like Cloud, on-premise, etc.
To further streamline and prepare your data for analysis, you can process and enrich Raw Granular Data using Hevo’s robust & built-in Transformation Layer without writing a single line of code! In addition, Hevo’s native integration with BI & Analytics Tools will empower you to mine your replicated data to get actionable insights. To learn more, check out Hevo’s documentation for PostgreSQL Replication.
Try our 14-day full access free trial today!
Start your PostgreSQL Replication with Hevo for Free
Why PostgreSQL Replication Slots are Necessary?
If you have a master and some hot or archiving standbys, and you are running streaming replication between them, a replication slot is used to retain the WAL files even when the replica is offline or disconnected.
The ability to connect to the server and conduct read-only queries while it is in archive recovery or standby mode is referred to as “hot standby.” This is important for replication as well as restoring a backup to a specific state with exceptional accuracy. The phrase “Hot Standby” also refers to the server’s capacity to transition from recovery to normal operation while users continue to run queries and/or maintain open connections.
This feature was introduced in PostgreSQL 9.4. Prior to that, if a standby went offline until it came back online, a set of WAL files had to be kept by the master. This is done by setting the wal_keep_segment correctly or high enough to keep the replica from falling behind too much.
If this is not done, then the standby can’t keep up with the master after the master deletes the WAL files that the standby is yet to replay and you will get an error.
ERROR: requested WAL segment 000000490000067B00000073 has already been removed
The problem with setting the correct wal_keep_segment is, it is difficult to know exactly how many WAL files need to be kept. This is guesswork and if it is set too high, the master will exhaust the space in the pg_wal directory (WAL files are stored in pg_wal directory. Prior to version 10, they were stored in the pg_xlog directory).
Alternatives to the above method are either WAL archiving which means you have to write a script to move the WAL files to another long term location or create the standby that has fallen behind again from scratch.
After replication slots came in with PostgreSQL 9.4, you don’t have to worry about wal_keep_segments because replication slots make sure that the WAL files are kept forever. If it so happens that the standby goes offline, the master can keep track of how much the standby lags and retain the WAL it needs files until the standby reconnects again. Then the WAL files can be decoded and replayed to the replica.
Types of PostgreSQL Replication Slots
PostgreSQL Replication slots are of two types:
- Physical PostgreSQL Replication Slots
- Logical PostgreSQL Replication slots
Physical PostgreSQL Replication Slots
The changes that take place on the main server via streaming replication are recorded in the WAL segments. These WAL files are sent to the standby server and then replayed.
So a physical replication slot can be created on the primary server and the location up to where the transactions have been sent to the standby is stored. Now when the standby loses connection, the primary server will keep those WAL files.
Logical PostgreSQL Replication Slots
Logical replication was introduced in PostgreSQL 10. Logical replication brings over only the SQL-like changes. It does not work without replication slots. Logical replication data has to be decoded using a plugin.
Hevo takes care of setting up WAL for Logical Replication for your pipeline. The steps for the same can be found in this documentation.
How to Create PostgreSQL Replication Slots?
The function pg_create_physical_replication_slot is used to create a physical replication slot. This command has to be run in the master node.
Now, you can create a replication slot by the name ‘ocean’.
postgres=# select pg_create_physical_replication_slot(‘ocean’);
Replicating data can be a tiresome task without the right set of tools. Hevo’s Data Replication & Integration platform empowers you with everything you need to have a smooth Data Collection, Processing, and Replication experience. Our platform has the following in store for you!
- Exceptional Security: A Fault-tolerant Architecture that ensures Zero Data Loss.
- Built to Scale: Exceptional Horizontal Scalability with Minimal Latency for Modern-data Needs.
- Built-in Connectors: Support for 100+ Data Sources, including PostgreSQL, Databases, Webhooks, REST APIs, SaaS Platforms, Files & More.
- Data Transformations: Best-in-class & flexible Native Support for Complex code and no-code Data Transformation at fingertips.
- Smooth Schema Mapping: Fully-managed Automated Schema Management for incoming data with the desired destination.
- Blazing-fast Setup: Straightforward interface for new customers to work on, with minimal setup time.
Start your PostgreSQL Replication with Hevo for Free
How to Monitor PostgreSQL Replication Slots?
The below command displays all the replication slots that exist on the database cluster.
postgres=# select * from pg_replication_slots;
These are the columns that you will see in the pg_replication_slots view:
- slot_name: This is a unique identifier of the replication slot which can contain lower-case letters, underscore characters, and numbers.
- plugin: For physical slots, it will be null.
- slot_type: Slot_type is a text indicating whether the slot is physical or logical.
- datoid: Physical slots have no associated databases and hence is null. For logical slots, it will be the OID of the database this slot is associated with.
- active: This is a Boolean value. It is True if the slot is currently active and is False if it is inactive.
- xmin: This represents the oldest transaction this slot requires the database to keep.
- catalog_xmin: This is the oldest transaction needed by the database to retain that affects the system catalogs.
- restart_lsn: Log Sequence Number (LSN) is a unique identifier in the transaction log. Restart_lsn is the oldest WAL which might be needed by this slot.
How to Drop PostgreSQL Replication Slots?
You already know that replication slots have to be created and deleted manually. Do not let the inactive slots stay because the master will retain the WAL files needed by the inactive slot indefinitely and will fill up space in the disk.
This is the command used to delete a replication slot:
postgres=# select pg_drop_replication_slot(‘ocean’);
Limitations of PostgreSQL Replication Slots
Here are a few limitations of PostgreSQL Replication Slots:
- Orphaned Replication Slot: The WAL files are retained by the master when the replica disconnects. This also means that the pg_wal directory may run out of space. Imagine a scenario where the replica fails forever and cannot be recovered (an orphaned replication slot), or when a replica cannot replay the WAL segments fast enough.
- Manually Monitoring pg_wal Directory: The WAL files will just pile up. So you need to monitor the slots and manually drop them. Only when you do that will the master delete anything from the pg_wal directory. You will see how you can monitor and drop replication slots later. So while WAL retention is taken care of without manual settings, pg_wal directory space needs to be manually monitored.
Conclusion
The article introduced you to PostgreSQL streaming replication slots, their purpose, and their limitations. However, when it comes to working with replication, you need to be an expert at PostgreSQL to set up servers from scratch and manually configure several details. Moreover, most of the time, the data is not available in the right format and you will need data engineering and PostgreSQL administration skills to transform the data.
Hevo Data provides an Automated No-code Data Pipeline that empowers you to overcome the above-mentioned limitations. Hevo caters to 100+ data sources (including 40+ free sources) and can seamlessly perform PostgreSQL Replication in real-time. Furthermore, Hevo’s fault-tolerant architecture ensures a consistent and secure replication of your PostgreSQL data. Using Hevo will make your life easier and make Data Replication hassle-free.
Visit our Website to Explore Hevo
Want to take Hevo for a spin? Sign Up for a 14-day free trial and experience the feature-rich Hevo suite first hand.
Share your thoughts on PostgreSQL Replications Slots in the comments below!