Based on the requirements of different applications, organizations store data in several databases. These databases not only vary in terms of types but also can be hosted in either cloud or on-premise. As a result, organizations often need to move data from one data store to another.

To simplify the migration workflows, organizations are using AWS DMS. It makes it extremely convenient to migrate data from one source endpoint to another. It primarily runs the replication tasks to move your data.

This blog walks you through AWS DMS and how to use PostgreSQL as a source endpoint for migrating data via AWS DMS.

What is AWS DMS?

Logo of AWS DMS for using AWS DMS Postgres source

Database Migration Service (DMS) on the Amazon Web Services (AWS) helps you migrate databases to AWS. It can migrate data from relational databases (SQL-based), data warehouses, non-relational databases (NoSQL), and other data stores. With AWS DMS, the source database remains functional during migration, reducing the downtime for applications.

The DMS supports both homogenous (PostgreSQL to PostgreSQL) and heterogeneous (PostgreSQL to Oracle) migrations. You can also replicate data and ongoing changes (Change Data Capture or CDC) with low latency from any supported source endpoint to a supported target endpoint. 

Benefits of using AWS DMS

There are many upsides to using the AWS data migration service, some of which are:

1) Ease of use

AWS DMS is pretty simple and straightforward to use. You do not need any specific drivers or applications to be able to use the migration service. Most of the time, the service would migrate your data without making any changes to your source database. Once the migration has started, DMS manages all the process tasks, including replicating data changes in the source database while the migration is in progress. 

2) Supports the majority of databases

AWS DMS is compatible with the most widely used open-source and commercial databases. You can use an AWS DMS Postgres source and even an on-premises database source outside of AWS.

3) Low Cost

AWS DMS is a low-cost migration service. You only need to pay for the computing resources and any additional log storage. It will cost you as low as the US $3 to migrate a terabyte-sized database. 

4) Ongoing Replication

AWS DMS allows you to migrate your database via one-time migration or ongoing replication. The latter method of data replication keeps your source and target databases in sync. It continuously synchronizes the alterations to the target database. 

Prerequisites for using AWS DMS Postgres source

  • Installation of pglogical extension on the source and target database.
  • Enable logical replication for CDC on your PostgreSQL source database.

Configuring a PostgreSQL database as an AWS DMS source endpoint

You need to make a few configuration changes to use PostgreSQL database as an AWS DMS source endpoint. To configure, you can follow the steps given below:

  • Create a PostgreSQL user to provide AWS DMS access to your source PostgreSQL database with the required permissions.
  • Create a PostgreSQL source endpoint for your selected database configuration.
  • Create a task to migrate your data.

Note: To create a fully-load-only task, no further inputs are required. If you wish to create a task for CDC (Change Data Capture) only, you need to make a few more configurations to enable CDC. These configurations depend on the ownership of the database and have been elaborated on in the section above.

Additional configurations

You can also undertake other configurations while working with AWS DMS Postgres source to make the migration process more specific in the following ways:

  • You can add values to the ‘extra connection attributes’ when you create the source endpoint. These values capture DDL (Data Definition Language) events and specify the schema for the operational DDL database artifacts.
  • After some additional configuration, you can override connection string parameters. You can do this in two ways:
    • By specifying AWS DMS parameters.
    • By specifying pass-through parameters for the client database.
  • You can also control the information passed on to Write-Ahead Logs (WALs) using the table-level parameter REPLICATE IDENTITY in version 9.4. The parameter identifies updated or deleted rows. 
  • You can also record old values via the REPLICATE IDENTITY FULL parameter as FULL creates additional WALs that might be unnecessary.

AWS DMS Postgres Source: Use Cases

You can migrate your data from one or many PostgreSQL databases homogeneously and heterogeneously. AWS DMS is compatible with a PostgreSQL version of 9.4 or higher, 10.x, 11.x, 12.x, 13.x, and 14.0 database as a source endpoint. You can use the AWS DMS Postgres source for the following types of databases:

  • On-premises databases.
  • Databases on an Amazon EC2 instance.
  • Databases on an Amazon RDS DB instance.
  • Databases on a DB instance based on Amazon Aurora Postgres-compatible edition.

1) Self-Managed PostgreSQL Sources

You can migrate data to any target database supported by AWS DMS or another PostgreSQL database. The source database can be a self-managed engine running on Amazon’s EC2 instance or an on-premises database. 

Prerequisites

  • PostgreSQL database version should be 9.4 or later.
  • Grant superuser permissions for the user account for full-load CDC or CDC-only tasks.
  • For full-load-only tasks, give SELECT permissions on tables for migration.
  • Add the IP address of the replication server to pg_hba.conf (HBA implies host-based authentication) configuration file to enable replication. The configuration file controls authentication.

Configuring a self-managed PostgreSQL database as a source for logical replication via enabling CDC

The CDC feature is leveraged via logical replication when using AWS DMS. To enable logical replication of a self-managed PostgreSQL source endpoint, set the following parameters and values in the postgresql.conf configuration file:

  • Set wal_level = logical.
  • Set max_replication_slots to a value greater than 1. The value should correspond to the number of tasks you wish to run.
  • Set max_wal_senders to a value greater than 1. This parameter specifies the number of concurrent tasks.
  • To end replication connections that are inactive longer than the specified duration, use the wal_sender_timeout parameter. The default duration is 60 seconds, while the minimum value set by AWS DMS is 10 seconds.

Please note that some parameters are static and can be configured only at the start of the server. You will need to restart the server to accommodate changes made in the configuration file or to the DB parameters.

2) AWS-managed PostgreSQL databases

DMS allows you to utilize an AWS-managed PostgreSQL DB instance as a source for AWS DMS. 

Prerequisites

  • Utilize the AWS master user account for the source endpoint, as it can allow you to set up CDC. Any other account would require the rds_superuser role and the rds_replication role to grant replication permissions to manage and stream data via logical slots. 
  • If your source database is in a Virtual Private Cloud (VPC), choose the VPC security method that allows access to the DB instance. Only then the DMS replication instance can connect to it successfully.

Configuring an AWS-managed PostgreSQL database as a source for logical replication via enabling CDC

The DMS on AWS cloud supports CDC on Amazon RDS PostgreSQL databases after configuring the DB instance for logical replication. However, you cannot use it to read replicas for CDC during ongoing replication.

AWS DMS provides full load and CDC support for Aurora PostgreSQL version 2.2 compatible with PostgreSQL 10.6 or higher. The DMS only provides full load support for Aurora PostgreSQL version 2.1 with PostgreSQL 10.5 or lower.

To enable logical replication:

  • Utilize the AWS master user account for the source endpoint, as it can allow you to set up CDC. Any other account would require the rds_superuser role and the rds_replication role to grant replication permissions to manage and stream data via logical slots. 
  • AWS DMS will set the wal_level, max_wal_senders, max_replication_slots, and, max_connections parameters. This can increase WAL generation. Hence the parameter rds.logical_replication has to be used. Set the rds.logical_replication parameter in your DB CLUSTER to 1. For this parameter to be in effect, you will need to reboot.
  • To end replication connections that are inactive longer than the specified duration, use the wal_sender_timeout parameter. The default duration is 60 seconds, while the minimum value set by AWS DMS is 10 seconds.
  • Check that the max_worker_processes parameter is equal to or higher than the total of max_logical_replication_workers, autovacuum_max_workers, and max_parallel_workers combined. 

3) Migrating from PostgreSQL to PostgreSQL

You can also migrate data from a PostgreSQL database to another target database based on PostgreSQL. The source could be an on-premises database, while the target could be Amazon RDS for PostgreSQL or Aurora PostgreSQL instance.

However, ensure that the data type is supported by both the source and target databases. Failing to check the data type might lead to migration failure. AWS DMS transmits some data types as strings if the data type is unknown.

The DMS can migrate smaller files for some data types like XML and JSON, but not if they are large documents.

4) Migrating PostGIS spatial data types

Spatial data comprises information about an object’s geometry or location in space. Since PostgreSQL is an object-relational database, it is used to store PostGIS data types. 

Prerequisites

  • Enable the PostGIS plugin globally so AWS DMS can make the exact spatial data columns for the target database. 

AWS DMS supports geodetic coordinates such as:

  • POINT
  • LINESTRING
  • POLYGON
  • MULTIPOLYGON
  • GEOMETRYCOLLECTION
  • MULTIPOINT

Note: AWS DMS can undertake only homogenous data migrations for PostGIS data types.

Limitations on using PostgreSQL databases as AWS DMS sources

There are some limitations associated with using AWS DMS Postgres source. these are as follows:

  • AWS DMS ignores attempts to update primary key segments. Such an update is not added to the rows in the target database. This is an implication because the result of updating a primary key in PostgreSQL is unpredictable.
  • The AWS DMS Postgres source does not support the Start Process Changes from Timestamp run option.
  • Replicating multiple tables with the same name but different cases (table1, Table1) can be problematic. 
  • AWS DMS Postgres source does not support change processing of CREATE, ALTER, and DROP DDL statements if your tables are held in an inner function or nested constructs.
  • The maximum number of alterations stored in memory per transaction is 4 MB when logical replication is enabled in the AWS DMS Postgres source. As a result, you should refrain from long-running transactions when logical replication is enabled.
  • AWS DMS does not support CDC for Aurora PostgreSQL Serverless as a source.

Conclusion 

This blog introduced you to the AWS Data Migration Service. It also walked you through using a PostgreSQL source endpoint for the data migration service. It also gave a comprehensive knowledge about the different use cases of using AWS DMS Postgres sources.

Now, you can move forward and use the AWS DMS Postgres source for migrating different kinds of data based on your use cases.

Want to explore more about the usage and working with PostgreSQL databases? You can go through these articles.

Collecting data from PostgreSQL and transferring it to the desired warehouse can be a time-consuming task if you build your ETL pipeline manually. The situation is far more overwhelming given the amount and resources needed to hire a data engineer or analyst to understand the data.

However, with Hevo in the picture, you won’t have to worry about your PostgreSQL Data Replication demands. It will just take a few minutes to load your data into your chosen data warehouse.

Hevo‘s strong integration with 150+ Data Sources (including 40+ Free Sources) like PostgreSQL, you can export data from your selected data sources and load it to the destination of your choice.

It also assists you with reorganizing and enhancing your data so that it is ready for analysis. Now, you can readily save your time and focus on gaining insights and doing in-depth research on your data using BI solutions.

You can now replicate your PostgreSQL data to any data warehouse of your choice, including Amazon Redshift, Snowflake, Google BigQuery, and Firebolt.

Why don’t you give Hevo a try? Sign Up here for a 14-day full feature access trial and experience the feature-rich Hevo suite first hand. You can also check our unbeatable pricing and make a decision on your best-suited plan. 

Share your thoughts on learning about using the PostgreSQL database as AWS DMS Source in the comments section below. If you have any questions, do let us know. We’d be happy to help.

mm
Freelance Technical Content Writer, Hevo Data

Disha is deeply passionate about data science, and possesses a knack for writing on data, software architecture, and technical content catered to data teams to solve intricate business challenges.

No-code Data Pipeline for PostgreSQL