Making a Replica of your Database for easy access and backing up your files to prevent data loss are essentials in running any data operation at scale. Data replication processes, like RDS Postgres Replication, have become commonplace. Data is now a critical component of most technology systems that not having a detailed Replication strategy is negligent at best.
In this blog post, you will be introduced to Amazon Relational Database Service (Amazon RDS) and PostgreSQL. You will further be shown how to migrate Amazon RDS to Postgres by combining these platforms as well as discussing best practices on how to avoid replication lag to gain a smoother operating performance.
What is Amazon RDS?
Amazon Relational Database Service (Amazon RDS) is an SQL Database service provided by Amazon Web Service (AWS). It is used to set up, operate, store, and organize your Relational Database. It also provides automated Database administration such as migration, hardware provisioning, backup, recovery, and patching.
Amazon RDS is available on several Database (DB) instances and supports a wide range of Database engines including Amazon Aurora, MySQL, PostgreSQL, Oracle Database, MariaDB, and SQL Server. Also, an existing database can be migrated to Amazon RDS using AWS Database Migration Server.
Key Features of Amazon RDS
Some main features of Amazon RDS are listed below:
- Automatic Software Patching: Amazon RDS make sure that the software managing your Database stays up-to-date with the latest patches.
- Event Notifications: Amazon RDS uses Amazon SNS to deliver notifications of Database events via E-Mail or SMS.
- Reserved Instances: Amazon RDS Reserved Instances allows users to reserve a Database instance for a one or three year term.
To learn more about Amazon RDS, visit here.
What is PostgreSQL?
PostgreSQL, also known as Postgres is a powerful open-source Relational Database Management System (RDBMS). Postgres supports both SQL (relational) and JSON (non-relational) query. It has been in existence for over 30 years backed by years of development by the open-source community.
It is the backbone for Databases of many applications on the web as well as analytical applications on mobile. Postgres is commonly used for the following purposes:
- As a back-end Database in the LAPP stack to power dynamic websites and web applications. Lapp stack includes Linux, Apache, PostgreSQL, and PHP (python and Perl).
- For transactional Database purposes by large corporations and startups to support their products.
- It also supports a geospatial Database that is used for geographic information systems (GIS) with the PostGIS extension found on PostgreSQL.
Key Features of PostgreSQL
Some of the main features of PostgreSQL are listed below:
- Data Recovery: To ensure user data remain safe at a time of disaster or system failure, PostgreSQL offers data replication and point-in-time recovery features.
- High Performance: PostgreSQL parallelly reads queries and build B-tree indexes to deliver fast query speed and high performance.
- Security: PostgreSQL offers multiple authentication options such as SCRAM-SHA-256, Certificate, GSSAPI, SSPI, LDAP, etc, with the access control system.
To learn more about PostgreSQL, visit here.
Migrating your data from PostgreSQL on Amazon RDS doesn’t have to be complex. Relax and go for a seamless migration using Hevo’s no-code platform. With Hevo, you can:
- Effortlessly extract data from Amazon RDS PostgreSQL and other 150+ connectors.
- Tailor your data to your destination’s needs with features like drag-and-drop and custom Python scripts.
- Achieve lightning-fast data loading into your destination, making your data analysis ready.
Try to see why customers like Slice and Harmoney have upgraded to a powerful data and analytics stack by incorporating Hevo!
Get Started with Hevo for Free
Using PostgreSQL on Amazon RDS
As has been stated in the previous section, PostgreSQL is one of the Database engines available on Amazon RDS. Amazon RDS supports Database (DB) instances running several versions of PostgreSQL.
On RDS you can create DB instances and DB snapshots, point-in-time restores, and backups. Standard client applications such as pgAdmin, an open-source administration, and development tool, or psql, a command-line utility are used in running commands for PostgreSQL.
What is Logical RDS Postgres Replication?
You can set up Logical RDS Postgres Replication using a publish and subscribe model. The publisher may have more than one subscriber linked to the publisher node. Logical replication usually starts with taking a snapshot of the data on the publisher Database and copying it to the subscriber.
Real-time changes on the publisher are indicated to the subscriber as they occur. Unlike traditional Physical Replication, which Replicates the entire instance along with all the databases, logical replication allows the Replication of a subset of the Database such as tables, etc.
Typical uses of Logical RDS Postgres Replication include sending incremental changes from the publisher to the subscriber from subsets as they happen, Replicating between different versions of PostgreSQL, and sending individual changes as they take place on the subscriber.
How Can You Replicate Tables in RDS for PostgreSQL Using Logical Replication?
In this method, two source tables are replicated to two target tables using logical replication in RDS for PostgreSQL. Logical replication first performs the initial load of data already present in the source tables and then continues to replicate ongoing changes.
Integrate PostgreSQL on Amazon RDS to BigQuery
Integrate PostgreSQL on Amazon RDS to Databricks
Step 1: Turn on logical replication
- Modify a custom parameter group to set rds.logical_replication to 1 and attach rds.logical_replication to the DB instance. When the DB instance reboots, the wal_level parameter is set to logical.
- Verify the values for
wal_level
and rds.logical_replication
:
postgres=> SELECT name,setting FROM pg_settings WHERE name IN ('wal_level','rds.logical_replication');
name | setting
-------------------------+---------
rds.logical_replication | on
wal_level | logical
(2 rows)
Step 2: Connect to source database in source DB instance
- Create a source table by using the command
source=> CREATE TABLE reptab1 (slno int primary key);
CREATE TABLE
source=> CREATE TABLE reptab2 (name varchar(20));
CREATE TABLE
- Insert data using command:
source=> INSERT INTO reptab1 VALUES (generate_series(1,1000));
INSERT 0 1000
source=> INSERT INTO reptab2 SELECT SUBSTR ('ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789',((random()*(36-1)+1)::integer),1) FROM generate_series(1,50);
INSERT 0 50
Step 3: Create a publication for source tables
- Create a publication and use a SELECT query to verify the details of the publication that was created:
source=> CREATE PUBLICATION testpub FOR TABLE reptab1,reptab2;
CREATE PUBLICATION
source=> SELECT * FROM pg_publication;
oid | pubname | pubowner | puballtables | pubinsert | pubupdate | pubdelete | pubtruncate | pubviaroot
--------+---------+----------+--------------+-----------+-----------+-----------+-------------+------------
115069 | testpub | 16395 | f | t | t | t | t | f
(1 row)
- Verify that the publication has the source tables:
source=> SELECT * FROM pg_publication_tables;
pubname | schemaname | tablename
---------+------------+-----------
testpub | public | reptab1
testpub | public | reptab2
(2 rows)
Step 4: Connect to the Target table
- Create a target table just like you did for the source and ensure that they are empty.
- Now create and verify subscription in target database.
target=> CREATE SUBSCRIPTION testsub CONNECTION 'host=<source RDS/host endpoint> port=5432 dbname=<source_db_name> user=<user> password=<password>' PUBLICATION testpub;
NOTICE: Created replication slot "testsub" on publisher
CREATE SUBSCRIPTION
target=> SELECT oid,subname,subenabled,subslotname,subpublications FROM pg_subscription;
oid | subname | subenabled | subslotname | subpublications
-------+---------+------------+-------------+-----------------
16434 | testsub | t | testsub | {testpub}
(1 row)
- When the subscription is created, the subscription loads all the data present in the source tables to the target tables. Run a SELECT query on the target tables to verify that the initial data loads:
target=> SELECT count(*) FROM reptab1;
count
-------
1000
(1 row)
target=> SELECT count(*) FROM reptab2;
count
-------
50
(1 row)
Step 5: Verify and Test replication slot in source database
- Verify the replication slot details by running the following SELECT query on the source database:
source=> SELECT * FROM pg_replication_slots;
slot_name | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn | wal_status | safe_wal_size
----------+----------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------+------------+---------------
testsub | pgoutput | logical | 115048 | source | f | t | 846 | | 6945 | 58/B4000568 | 58/B40005A0 | reserved |
(1 row)
- Test whether data changes in the source tables are being replicated to the target tables by inserting rows into the source tables:
source=> INSERT INTO reptab1 VALUES(generate_series(1001,2000));
INSERT 0 1000
source=> INSERT INTO reptab2 SELECT SUBSTR('ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789',((random()*(36-1)+1)::integer),1) FROM generate_series(1,50);
INSERT 0 50
source=> SELECT count(*) FROM reptab1;
count
-------
2000
(1 row)
source=> SELECT count(*) FROM reptab2; count
-------
100
And You are done!
Limitations of Logical Replication
Logical Replication comes with some limitations. Issues may arise concerning the following:
- Schema Changes: The subscriber should first commit the schema changes and then the producer.
- Truncation of Large Objects: A limitation of TRUNCATE could be the DELETE operation. You can REVOKE TRUNCATE privileges from tables to avoid any accidental TRUNCATE operations.
- Sequence Data: Logical Replication replicates the data in serial or identity columns. You must update the sequences to the new values in case of any switchover or failover to the subscriber database.
- Partition Tables: Partition tables are treated as regular tables in Logical Replication. You should replicate partitioned tables on a one-to-one basis.
- Foreign tables: Logical replication does not replicate foreign tables.
Automated Replication tools such as Hevo Data can overcome some of these concerning issues and set up your Logical Replication in a hassle free manner. Hevo replicates your PostgreSQL data using Postgres Write Ahead Log (WAL) set at a logical level without writing a single line of code.
Migrate Data seamlessly Within Minutes!
No credit card required
RDS Postgres Replication Using Hevo Data
Hevo provides AWS RDS PostgreSQL as one of its plug-and-play connectors. You can directly configure it to replicate data from your RDS Postgres database.
Best practices for Amazon RDS PostgreSQL replication
- To avoid accidental TRUNCATE operations, you can REVOKE TRUNCATE privileges from tables.
- Set parameter rds.logical_replication: If this parameter enables the use of the RDS PostgreSQL instance as the DMS source, monitor DMS tasks.
- As a best practice, schema changes should be committed first by the subscriber, then by the publisher.
- In case of switchover or failover to the subscriber database, you must update the sequences to the latest values.
Conclusion
This article talked about Database Replication in Amazon RDS PostgreSQL. Best practices on how to avoid Replication lag were also discussed across the types of RDS Postgres Replication which include Intra-region, Cross-region, and Logical Replication. Before that, you were introduced to Amazon RDS and PostgreSQL and how the two platforms combine for effective usage.
visit our website to explore hevo
Though the output gotten from the combination is efficient, the processes involved in achieving Replication of your Database can be cumbersome hence the need for a veritable platform that offers you a one-stop solution. Hevo Data, an all-in-one data warehouse that supports Amazon RDS, and PostgreSQL can help your Replicate your data smoothly. This platform allows you to transfer data from multiple sources to Cloud-based Data Warehouses like Snowflake, Google BigQuery, Amazon Redshift, etc. It will provide you with a hassle-free experience and make your work life much easier.
Frequently Asked Questions
1. How does RDS Postgres replication work?
RDS PostgreSQL replication works using Streaming Replication, where data changes are sent from the primary instance to one or more read replicas. It maintains a near real-time copy of the primary database, allowing for load balancing and failover.
2. Does AWS RDS support replication?
Yes, AWS RDS supports replication, including read replicas for PostgreSQL, which allow you to scale read operations. RDS also supports cross-region replication for disaster recovery and improved availability.
3. Is RDS automatically replicated?
RDS provides automated backups, which allow for point-in-time recovery, but replication (like read replicas) must be set up manually. Automated backups and snapshots are separate from replication features.
Ofem Eteng is a seasoned technical content writer with over 12 years of experience. He has held pivotal roles such as System Analyst (DevOps) at Dagbs Nigeria Limited and Full-Stack Developer at Pedoquasphere International Limited. He specializes in data science, data analytics and cutting-edge technologies, making him an expert in the data industry.