Based on the requirements of different applications, organizations store data in several databases. These databases not only vary in terms of types but also can be hosted in either cloud or on-premise. As a result, organizations often need to move data from one data store to another.

  • To simplify the migration workflows, organizations are using AWS DMS. It makes it extremely convenient to migrate data from one source endpoint to another. It primarily runs the replication tasks to move your data.
  • This blog walks you through AWS DMS and how to use PostgreSQL as a source endpoint for migrating data via AWS DMS.

What is AWS DMS?

  1. Database Migration Service (DMS) on the Amazon Web Services (AWS) helps you migrate databases to AWS.
  2. It can migrate data from relational databases (SQL-based), data warehouses, non-relational databases (NoSQL), and other data stores. With AWS DMS, the source database remains functional during migration, reducing the downtime for applications.
  3. The DMS supports both homogenous (PostgreSQL to PostgreSQL) and heterogeneous (PostgreSQL to Oracle) migrations. You can also replicate data and ongoing changes (Change Data Capture or CDC) with low latency from any supported source endpoint to a supported target endpoint. 

Benefits of using AWS DMS

1) Ease of use

  • AWS DMS is pretty simple and straightforward to use. You do not need any specific drivers or applications to be able to use the migration service.
  • Most of the time, the service would migrate your data without making any changes to your source database.
  • Once the migration has started, DMS manages all the process tasks, including replicating data changes in the source database while the migration is in progress. 

2) Supports the majority of databases

  • AWS DMS is compatible with the most widely used open-source and commercial databases. You can use an AWS DMS Postgres source and even an on-premises database source outside of AWS.

3) Low Cost

  • AWS DMS is a low-cost migration service. You only need to pay for the computing resources and any additional log storage. It will cost you as low as the US $3 to migrate a terabyte-sized database. 

4) Ongoing Replication

  • AWS DMS allows you to migrate your database via one-time migration or ongoing replication.
  • The latter method of data replication keeps your source and target databases in sync. It continuously synchronizes the alterations to the target database. 

Prerequisites for using AWS DMS Postgres source

  • Installation of pglogical extension on the source and target database.
  • Enable logical replication for CDC on your PostgreSQL source database.

Configuring a PostgreSQL database as an AWS DMS source endpoint

  • Create a PostgreSQL user to provide AWS DMS access to your source PostgreSQL database with the required permissions.
  • Create a PostgreSQL source endpoint for your selected database configuration.
  • Create a task to migrate your data.

Note: To create a fully-load-only task, no further inputs are required. If you wish to create a task for CDC (Change Data Capture) only, you need to make a few more configurations to enable CDC. These configurations depend on the ownership of the database and have been elaborated on in the section above.

Additional configurations

  • You can add values to the ‘extra connection attributes’ when you create the source endpoint. These values capture DDL (Data Definition Language) events and specify the schema for the operational DDL database artifacts.
  • After some additional configuration, you can override connection string parameters. You can do this in two ways:
    • By specifying AWS DMS parameters.
    • By specifying pass-through parameters for the client database.
  • You can also control the information passed on to Write-Ahead Logs (WALs) using the table-level parameter REPLICATE IDENTITY in version 9.4. The parameter identifies updated or deleted rows. 
  • You can also record old values via the REPLICATE IDENTITY FULL parameter as FULL creates additional WALs that might be unnecessary.

AWS DMS Postgres Source: Use Cases

  1. You can migrate your data from one or many PostgreSQL databases homogeneously and heterogeneously.
  2. AWS DMS is compatible with a PostgreSQL version of 9.4 or higher, 10.x, 11.x, 12.x, 13.x, and 14.0 database as a source endpoint. You can use the AWS DMS Postgres source for the following types of databases:
    • On-premises databases.
    • Databases on an Amazon EC2 instance.
    • Databases on an Amazon RDS DB instance.
    • Databases on a DB instance based on Amazon Aurora Postgres-compatible edition.

1) Self-Managed PostgreSQL Sources

Prerequisites

  • PostgreSQL database version should be 9.4 or later.
  • Grant superuser permissions for the user account for full-load CDC or CDC-only tasks.
  • For full-load-only tasks, give SELECT permissions on tables for migration.
  • Add the IP address of the replication server to pg_hba.conf (HBA implies host-based authentication) configuration file to enable replication. The configuration file controls authentication.

Configuring a self-managed PostgreSQL database as a source for logical replication via enabling CDC

The CDC feature is leveraged via logical replication when using AWS DMS. To enable logical replication of a self-managed PostgreSQL source endpoint, set the following parameters and values in the postgresql.conf configuration file:

  • Set wal_level = logical.
  • Set max_replication_slots to a value greater than 1. The value should correspond to the number of tasks you wish to run.
  • Set max_wal_senders to a value greater than 1. This parameter specifies the number of concurrent tasks.
  • To end replication connections that are inactive longer than the specified duration, use the wal_sender_timeout parameter. The default duration is 60 seconds, while the minimum value set by AWS DMS is 10 seconds.

Please note that some parameters are static and can be configured only at the start of the server. You will need to restart the server to accommodate changes made in the configuration file or to the DB parameters.

2) AWS-managed PostgreSQL databases

DMS allows you to utilize an AWS-managed PostgreSQL DB instance as a source for AWS DMS. 

Prerequisites

  • Utilize the AWS master user account for the source endpoint, as it can allow you to set up CDC. Any other account would require the rds_superuser role and the rds_replication role to grant replication permissions to manage and stream data via logical slots. 
  • If your source database is in a Virtual Private Cloud (VPC), choose the VPC security method that allows access to the DB instance. Only then the DMS replication instance can connect to it successfully.

Configuring an AWS-managed PostgreSQL database as a source for logical replication via enabling CDC

The DMS on AWS cloud supports CDC on Amazon RDS PostgreSQL databases after configuring the DB instance for logical replication. However, you cannot use it to read replicas for CDC during ongoing replication.

AWS DMS provides full load and CDC support for Aurora PostgreSQL version 2.2 compatible with PostgreSQL 10.6 or higher. The DMS only provides full load support for Aurora PostgreSQL version 2.1 with PostgreSQL 10.5 or lower.

To enable logical replication:

  • Utilize the AWS master user account for the source endpoint, as it can allow you to set up CDC. Any other account would require the rds_superuser role and the rds_replication role to grant replication permissions to manage and stream data via logical slots. 
  • AWS DMS will set the wal_level, max_wal_senders, max_replication_slots, and, max_connections parameters. This can increase WAL generation. Hence the parameter rds.logical_replication has to be used. Set the rds.logical_replication parameter in your DB CLUSTER to 1. For this parameter to be in effect, you will need to reboot.
  • To end replication connections that are inactive longer than the specified duration, use the wal_sender_timeout parameter. The default duration is 60 seconds, while the minimum value set by AWS DMS is 10 seconds.
  • Check that the max_worker_processes parameter is equal to or higher than the total of max_logical_replication_workers, autovacuum_max_workers, and max_parallel_workers combined. 

3) Migrating from PostgreSQL to PostgreSQL

  • You can also migrate data from a PostgreSQL database to another target database based on PostgreSQL.
  • The source could be an on-premises database, while the target could be Amazon RDS for PostgreSQL or Aurora PostgreSQL instance.
  • However, ensure that the data type is supported by both the source and target databases. Failing to check the data type might lead to migration failure.
  • AWS DMS transmits some data types as strings if the data type is unknown.
  • The DMS can migrate smaller files for some data types like XML and JSON, but not if they are large documents.

4) Migrating PostGIS spatial data types

Spatial data comprises information about an object’s geometry or location in space. Since PostgreSQL is an object-relational database, it is used to store PostGIS data types. 

Prerequisites

  • Enable the PostGIS plugin globally so AWS DMS can make the exact spatial data columns for the target database. 

AWS DMS supports geodetic coordinates such as:

  • POINT
  • LINESTRING
  • POLYGON
  • MULTIPOLYGON
  • GEOMETRYCOLLECTION
  • MULTIPOINT

Note: AWS DMS can undertake only homogenous data migrations for PostGIS data types.

Limitations on using PostgreSQL databases as AWS DMS sources

  • AWS DMS ignores attempts to update primary key segments. Such an update is not added to the rows in the target database. This is an implication because the result of updating a primary key in PostgreSQL is unpredictable.
  • The AWS DMS Postgres source does not support the Start Process Changes from Timestamp run option.
  • Replicating multiple tables with the same name but different cases (table1, Table1) can be problematic. 
  • AWS DMS Postgres source does not support change processing of CREATE, ALTER, and DROP DDL statements if your tables are held in an inner function or nested constructs.
  • The maximum number of alterations stored in memory per transaction is 4 MB when logical replication is enabled in the AWS DMS Postgres source. As a result, you should refrain from long-running transactions when logical replication is enabled.
  • AWS DMS does not support CDC for Aurora PostgreSQL Serverless as a source.

Conclusion 

  • This blog introduced you to the AWS Data Migration Service. It also walked you through using a PostgreSQL source endpoint for the data migration service.
  • It also gave a comprehensive knowledge about the different use cases of using AWS DMS Postgres sources.

Want to explore more about the usage and working with PostgreSQL databases? You can go through these articles.

You can now replicate your PostgreSQL data to any data warehouse of your choice, including Amazon Redshift, Snowflake, Google BigQuery, and Firebolt.

mm
Freelance Technical Content Writer, Hevo Data

Disha is deeply passionate about data science, and possesses a knack for writing on data, software architecture, and technical content catered to data teams to solve intricate business challenges.

No-code Data Pipeline for PostgreSQL