Making a Replica of your Database for easy access and backing up your files to prevent data loss are essentials in running any data operation at scale. Data is now a critical component of most technology systems that not having a detailed Replication strategy is negligent at best.
In this blog post, you will be introduced to Amazon Relational Database Service (Amazon RDS) and PostgreSQL. You will further be shown how to set up an RDS Postgress Replication of your Database by combining these platforms as well as discussing best practices on how to avoid replication lag to gain a smoother operating performance.
Table of Contents
What is Amazon RDS?
Amazon Relational Database Service (Amazon RDS) is an SQL Database service provided by Amazon Web Service (AWS). It is used to set up, operate, store, and organize your Relational Database. It also provides automated Database administration such as migration, hardware provisioning, backup, recovery, and patching.
Amazon RDS is available on several Database (DB) instances and supports a wide range of Database engines including Amazon Aurora, MySQL, PostgreSQL, Oracle Database, MariaDB, and SQL Server. Also, an existing database can be migrated to Amazon RDS using AWS Database Migration Server.
Key Features of Amazon RDS
Some main features of Amazon RDS are listed below:
- Automatic Software Patching: Amazon RDS make sure that the software managing your Database stays up-to-date with the latest patches.
- Event Notifications: Amazon RDS uses Amazon SNS to deliver notifications of Database events via E-Mail or SMS.
- Reserved Instances: Amazon RDS Reserved Instances allows users to reserve a Database instance for a one or three year term.
To learn more about Amazon RDS, visit here.
What is PostgreSQL?
PostgreSQL, also known as Postgres is a powerful open-source Relational Database Management System (RDBMS). Postgres supports both SQL (relational) and JSON (non-relational) query. It has been in existence for over 30 years backed by years of development by the open-source community.
It is the backbone for Databases of many applications on the web as well as analytical applications on mobile. Postgres is commonly used for the following purposes:
- As a back-end Database in the LAPP stack to power dynamic websites and web applications. Lapp stack includes Linux, Apache, PostgreSQL, and PHP (python and Perl).
- For transactional Database purposes by large corporations and startups to support their products.
- It also supports a geospatial Database that is used for geographic information systems (GIS) with the PostGIS extension found on PostgreSQL.
Key Features of PostgreSQL
Some of the main features of PostgreSQL are listed below:
- Data Recovery: To ensure user data remain safe at a time of disaster or system failure, PostgreSQL offers data replication and point-in-time recovery features.
- High Performance: PostgreSQL parallelly reads queries and build B-tree indexes to deliver fast query speed and high performance.
- Security: PostgreSQL offers multiple authentication options such as SCRAM-SHA-256, Certificate, GSSAPI, SSPI, LDAP, etc, with the access control system.
To learn more about PostgreSQL, visit here.
Hevo Data offers a faster way to move data from databases or SaaS applications into your data warehouse to be visualized in a BI tool. Hevo is fully automated and hence does not require you to code.
Get started with hevo for free
Check out some of the cool features of Hevo:
Sign up here for a 14-day free trial!
- Completely Automated: The Hevo platform can be set up in just a few minutes and requires minimal maintenance.
- Real-time Data Transfer: Hevo provides real-time data migration, so you can have analysis-ready data always.
- 100% Complete & Accurate Data Transfer: Hevo’s robust infrastructure ensures reliable data transfer with zero data loss.
- Scalable Infrastructure: Hevo has in-built integrations for 150+ sources that can help you scale your data infrastructure as required.
- 24/7 Live Support: The Hevo team is available round the clock to extend exceptional support to you through chat, email, and support calls.
- Schema Management: Hevo takes away the tedious task of schema management & automatically detects the schema of incoming data and maps it to the destination schema.
- Security: Hevo is SOC II, GDPR, and HIPPA compliant. Hevo also enables top-grade security with end-to-end encryption, two-factor authentication, and more.
- Live Monitoring: Hevo allows you to monitor the data flow so you can check where your data is at a particular point in time.
Using PostgreSQL on Amazon RDS
As has been stated in the previous section, PostgreSQL is one of the Database engines available on Amazon RDS. Amazon RDS supports Database (DB) instances running several versions of PostgreSQL.
On RDS you can create DB instances and DB snapshots, point-in-time restores, and backups. Standard client applications such as pgAdmin, an open-source administration, and development tool, or psql, a command-line utility are used in running commands for PostgreSQL.
Postgres Replication using AWS RDS
Amazon RDS Postgres Replication enables easy configuration of Replicas of your source PostgreSQL. Setting up an RDS Postgres Replication of your source instance clears your read load and helps create disaster recovery (DR) resources.
The Read Replicas can be configured within the same Region as the source or in a different Region. This has to be done using the latest version of the data source instance.
In the following sections, the 3 types of RDS Postgres Replication will be discussed and some of the best practices for properly configuring Read Replicas will be explained. The types of RDS Postgres Replication include:
- Intra-region RDS Postgres Replication
- Cross-region RDS Postgres Replication
- Logical RDS Postgres Replication
Intra-region RDS Postgres Replication
In creating a Read Replica in the same AWS Region as the source instance, RDS PostgreSQL uses Postgres native Streaming Replication. Streaming Replication affects the data changes made at the source instance to the Read Replica.
To ensure optimal operations of the RDS Postgres Replication process between a source and Replica within the same region, the following common factors must be handled:
1) The Proper Value of wal_keep_segments
In PostgreSQL, wal_keep_segments specifies the maximum number of past WAL log file segments kept in the pg_wal directory for streaming replication.
When this number is exceeded, Postgres archives any WAL segment parameter to Amazon S3 buckets. That means if a Read Replica does not find a WAL segment in the pg_location it automatically downloads it from the S3 buckets then restores and applies it.
Generally, the restoration of WAL segments from archives is slower than streaming replication so the higher the value of the WAL segments you keep on the source instance will lead to a faster replication process.
By default, RDS PostgreSQL sets wal_keep_segments at 32 but this can be adjusted using the RDS parameter group. Before setting up a new replica instance, modify the value of wal_keep_segments to a high parameter to prevent WAL files from archiving when the streaming replication starts. But this manual process is complex and consumes a lot of time. Instead, you can take help with automated tools to replicate your PostgreSQL data. Hevo supports data ingestion replication from PostgreSQL servers via Write Ahead Logs (WALs) set at the logical level. This mode is particularly useful when there is a need to replicate the entire database as it is.
2) Heavy Write Activity at the Source Instance
Another best practice is to avoid heavy write activity at the source instance because this can create a high influx of WAL files. An increase of WAL files and replaying of these files on Read Replica leads to a slow down in the replication process. TransactionLogsGeneration metric found in CloudWatch metrics is used in tracking the rate of WAL file creation, it shows the size of transaction logs generated per second and gives you an indication of how high write activity is at the source affects replication lag. This can be avoided by controlling your write activity by breaking them into small task bundles, then distributing them across multiple transactions.
3) An Exclusive Lock on Source Instance Table
Using an exclusive lock on source instance tables encourages replication lag. ACCESS EXCLUSIVE lock is a restrictive lock mode that prevents transactions from accessing tables for the lock’s hold duration until a transaction ends. To prevent such situations, you should periodically query the pg_locks and pg_stat_activity catalog tables.
4) Setting Parameters at Read Replica
Setting up some parameters at the Replica instance can impact RDS Postgres Replication positively. Parameters like hot_stanby_feeback which specifies whether the feedback is sent from the replica instance about queries currently executing at the replica instance to the source when enabled can speed up replication.
Other parameters like max_standby_archive_delay or max_standby_streaming_delay when enabled on the replica instance allow completion of long-running read queries thereby promoting quick replication.
5) Improper Read Replica Instance Configuration
A wrong configuration of replica instance can hurt RDS Postgres Replication. For example, using a lower instance class replica than the source instance. It is better to set Read Replica at the same or higher instance class because they handle similar write load as source instance as well as additional read queries.
Cross-region RDS Postgres Replication
Cross-region RDS Postgres Replication accounts for a disaster recovery solution, allows for scalability of read Database workload, and does cross-region migration. Cross-region Replica can be created using the Amazon RDS console, AWS CLI, or an API create-db-instance-read-replica.
Do the following when setting up the Cross-region Amazon RDS Postgres Replication:
- On the RDS console, choose Amazon RDS for PostgreSQL source instance.
- Click on Create Read Replica from the Actions drop-down menu.
- Select the appropriate target Region as you require.
Cross-region RDS Postgres Replication can also be created using AWS CLI. In doing this, use create-db-instance-read-replica to create a Cross-region Read Replica.
Unlike Streaming Physical Replication used in Replicas for the same region, slots are combined for Cross-region Replication. The Replication slots provide an automated way to make sure the WAL segments are not removed by the master until the Cross-region Replica receives them.
This enables WAL files to be accumulated at the source instance and not archived when any network issue arises between the regions, therefore, reducing Replication lag which hampers performance.
As with Intra-region RDS Postgres Replication, issues can arise when Replication is not set up properly. To avoid this make sure you set up your Replica instance class and storage types to match your source instance class and size, the writing activity should also be in check by first logging your transactions in the WAL files and then writing those changes to physical storage blocks.
Exclusive locks on source instances should also be queried often and finally, parameter settings at the cross-region replica should be monitored.
To learn more about the steps of Cross-Region Replication, visit here.
Logical RDS Postgres Replication
You can set up Logical RDS Postgres Replication using a publish and subscribe model. The publisher may have more than one subscriber linked to the publisher node. Logical replication usually starts with taking a snapshot of the data on the publisher Database and copying it to the subscriber.
Real-time changes on the publisher are indicated to the subscriber as they occur. Unlike traditional Physical Replication, which Replicates the entire instance along with all the databases, logical replication allows the Replication of a subset of the Database such as tables, etc.
Typical uses of Logical RDS Postgres Replication include sending incremental changes from the publisher to the subscriber from subsets as they happen, Replicating between different versions of PostgreSQL, and sending individual changes as they take place on the subscriber.
Limitations of Logical Replication
Logical Replication comes with some limitations. Issues may arise concerning the following:
- Schema Changes: The subscriber should first commit the schema changes and then the producer.
- Truncation of Large Objects: A limitation of TRUNCATE could be the DELETE operation. You can REVOKE TRUNCATE privileges from tables to avoid any accidental TRUNCATE operations.
- Sequence Data: Logical Replication replicates the data in serial or identity columns. You must update the sequences to the new values in case of any switchover or failover to the subscriber database.
- Partition Tables: Partition tables are treated as regular tables in Logical Replication. You should replicate partitioned tables on a one-to-one basis.
- Foreign tables: Logical replication does not replicate foreign tables.
Automated Replication tools such as Hevo Data can overcome some of these concerning issues and set up your Logical Replication in a hassle free manner. Hevo replicates your PostgreSQL data using Postgres Write Ahead Log (WAL) set at a logical level without writing a single line of code.
RDS Postgres Replication Using Hevo Data
Hevo Data, a No-code Data Pipeline, helps you directly transfer data from AWS and 100+ other data sources to Databases such as SQL Server, Data Warehouses, BI tools, or a destination of your choice in a completely hassle-free & automated manner.
Hevo is fully managed and completely automates the process of not only loading data from your desired source but also enriching the data and transforming it into an analysis-ready form without having to write a single line of code. Its fault-tolerant architecture ensures that the data is handled in a secure, consistent manner with zero data loss.
Hevo Data takes care of all your data preprocessing needs and lets you focus on key business activities and draw a much powerful insight on how to generate more leads, retain customers, and take your business to new heights of profitability. It provides a consistent & reliable solution to manage data in real-time and always have analysis-ready data in your desired destination.
Hevo simplifies the RDS Postgres Replication process and only the following few steps are required to configure your Amazon RDS PostgreSQL Source:
- Set up Log-based Incremental Replication: Hevo supports data ingestion from PostgreSQL servers via Write Ahead Logs set at the logical level (available on PostgreSQL version 9.4). So log in to your Amazone RDS console and create a parameter group as shown in the below image.
- Configure Parameter: Click Edit parameters in the top right and configure the parameters according to your requirements.
- Apply the Parameter: Apply the parameter group to your PostgreSQL Database as shown in the image below.
- Whitelist Hevo IP Address: You need to open your AWS Console to whitelist the Hevo IP addresses for your region to enable Hevo to connect to your PostgreSQL database.
- Grant Privileges to the User: Log into your Amazon RDS PostgreSQL Database as a user with grant privilege. Enter the following commands to give accesses to the database user:
GRANT CONNECT ON DATABASE <database_name> TO <database_username>;
GRANT USAGE ON SCHEMA <schema_name> TO <database_username>;
GRANT SELECT ON DATABASE <database_name> TO <database_username>;
- Alter the schema’s default privileges to grant SELECT privileges on tables to the database user:
ALTER DEFAULT PRIVILEGES IN SCHEMA <schema_name>;
GRANT SELECT ON TABLES TO <database_username>;
To learn more about these steps, visit here.
This article talked about Database Replication in Amazon RDS PostgreSQL. Best practices on how to avoid Replication lag were also discussed across the types of RDS Postgres Replication which include Intra-region, Cross-region, and Logical Replication. Before that, you were introduced to Amazon RDS and PostgreSQL and how the two platforms combine for effective usage.
visit our website to explore hevo
Though the output gotten from the combination is efficient, the processes involved in achieving Replication of your Database can be cumbersome hence the need for a veritable platform that offers you a one-stop solution. Hevo Data, an all-in-one data warehouse that supports Amazon RDS, and PostgreSQL can help your Replicate your data smoothly. This platform allows you to transfer data from multiple sources to Cloud-based Data Warehouses like Snowflake, Google BigQuery, Amazon Redshift, etc. It will provide you with a hassle-free experience and make your work life much easier.
sign up for a 14-day free trial!
Share your experience of the article in the comments section!