Redshift Cross Region Replication: 4 Easy Steps

on Data Integration, ETL, Tutorials • January 8th, 2021 • Write for Hevo

Most cloud services house robust support for seamless & real-time replication of data, a functionality that most organisations seek to achieve. Amazon Web Services (AWS) is one such cloud service by Amazon that provides users and businesses with robust end-to-end cloud-based solutions & APIs. One of the most popular services that Amazon Web Services provides is the Redshift data warehouse. It allows users to access, store, retrieve and replicate their data on demand & seamlessly across a diverse set of regions.

This article focuses on Amazon Redshift Cross Region Replication & aims at providing you with a comprehensive step-by-step guide to help you set up replication in Amazon Redshift, and seamlessly replicate your data across tables present in a diverse set of regions. Upon a complete walkthrough of the content, you will have in-depth knowledge of data replication in Amazon Redshift, and you will be able to set up Cross Region Replication in Amazon Redshift with ease.

Table of Contents

Introduction to Amazon Redshift

Amazon Redshift Logo.

Amazon Redshift is a fully-managed petabyte-scale cloud-based data warehouse, designed to store large scale data sets and perform insightful analysis on them in real-time.

It is highly column-oriented & designed to connect with SQL-based clients and business intelligence tools, making data available in real-time. Supporting PostgreSQL 8, Amazon Redshift delivers exceptional performance and efficient querying. Each Amazon Redshift data warehouse contains a collection of computing resources (nodes) organized in a cluster, each having an engine of its own and a database.

For further information on Amazon Redshift, you can check the official website here.

Simplify Data Replication using Hevo’s No-code Data Pipelines

Hevo Data, a No-code Data Pipeline, can help you replicate data from Amazon Redshift (among 100+ sources) swiftly to a database/data warehouse of your choice. Hevo is fully-managed and completely automates the process of monitoring and replicating the changes on the secondary database rather than making the user write the code repeatedly. Its fault-tolerant architecture ensures that the data is handled in a secure, consistent manner with zero data loss.

Get Started with Hevo for Free

Hevo provides you with a truly efficient and fully-automated solution to replicate and manage data in real-time and always have analysis-ready data in your desired destination. It allows you to focus on key business needs and perform insightful analysis using BI tools. 

Have a look at the amazing features of Hevo:

  • Secure: Hevo has a fault-tolerant architecture that ensures that the data is handled in a secure, consistent manner with zero data loss.
  • Minimal Learning: Hevo, with its simple and interactive UI, is extremely simple for new customers to work on and perform operations.
  • Live Monitoring: Hevo allows you to monitor the data flow so you can check where your data is at a particular point in time.
  • Data Transformation: It provides a simple interface to perfect, modify, and enrich the data you want to export. 
  • Schema Management: Hevo takes away the tedious task of schema management & automatically detects schema of incoming data and maps it to the destination schema.
  • Completely Managed Platform: Hevo is fully managed. You need not invest time and effort to maintain or monitor the infrastructure involved in executing codes.
Sign up here for a 14-Day Free Trial!

Prerequistes

  • Working knowledge of Amazon Redshift.
  • An Amazon Redshift account.
  • An Amazon Redshift table.
  • A general idea about data replication.

Steps to Set up Amazon Redshift Cross Region Replication

Amazon Redshift allows users to replicate their data across numerous regions by extracting data from their tables using the unload command and then loading the data in the target tables via Amazon S3.

You can implement this using the following steps:

Step 1: Extracting Data from Amazon Redshift

To start replicating your Amazon Redshift data, you first need to extract the data from your Amazon Redshift table in the form of files and load it to intermediate storage such as Amazon S3 buckets. To do this, you can use the unload command to extract data, either in a “parallel off mode”(One File per Table) or “parallel on mode”(One File per Slice).

Parallel Off Mode

In case you want to extract data in the parallel off mode, you can use the unload command to load data from your Amazon Redshift table into the Amazon S3 bucket as follows:

unload ('select * from merchant_details') to 's3://staging-area/extract/redshift_east_region_extract/merchant_details_' credentials 'aws_access_key_id=***********;aws_secret_access_key=***********' parallel off;

A new file “merchant_details_000” will now appear in your Amazon S3 bucket.

Parallel On Mode

In case you’re working with tables that house a large volume of complex data, you should use the unload command in the “parallel on mode” and then create a manifest file, containing information about the data extract files.

Manifest File in Redshift - Amazon Redshift Cross Region Replication

To do this, you can use the unload command as follows:

unload ('select * from merchant_details') to 's3://staging-area/extract/redshift_east_region_extract/merchant_details_' credentials 

'aws_access_key_id=***********;aws_secret_access_key=***********;' manifest;

Amazon Redshift will now create multiple files containing your data as follows:

Files Created by Redshift - Amazon Redshift Cross Region Replication

Each extracted file uses “|” as the delimiter by default. In case you want to modify or change the delimiter, you can use the delimiter option and the escape option (Special Characters).

This is how you can extract data from Amazon Redshift, to start setting up Amazon Redshift Cross Region Replication.

Step 2: Creating an Amazon Redshift Table in the Target Region

Once you’ve extracted the data from your Amazon Redshift table into Amazon S3 buckets, you need to create an Amazon Redshift table in the target region. To do this, you can make use of the table view provided by the AWS team, available in the GitHub repository and then execute a DDL command to create the Amazon Redshift table. You can use the following line of code to select the DDL:

select ddl from v_generate_tbl_ddl where tablename='merchant_details’' and schemaname = ‘finance’ order by seq;

This is how you can create the target table for the Amazon Redshift Cross Region Replication process.

Step 3: Loading Data to Target Amazon Redshift Table

With your target Amazon Redshift table now ready, you can now leverage the Amazon Redshift copy command to load data into the table.

In case you want to load a single file, you can use the following line of code:

copy merchant_details from 's3://staging-area/extract/redshift_east_region_extract/merchant_details_000' credentials 'aaws_access_key_id=**********;aws_secret_access_key=********** region 'us-east-1';

In case you want to load multiple files, you can use the following line of code:

copy merchant_details from 's3://staging-area/extract/redshift_east_region_extract/merchant_details_manifest' credentials 'aaws_access_key_id=**********;aws_secret_access_key=**********; manifest region 'us-east-1';

Ensure that you mention the manifest file instead of the source file, otherwise, Amazon Redshift will start looking for data in the same region. In case you want to learn more about the Amazon Redshift copy command, you can check the official AWS documentation here.

This is how you can set up Amazon Redshift Cross Region Replication via Amazon S3. In case you want to replicate data from multiple tables, you can check the next step.

Step 4: Replicating Data from Multiple Tables (Optional)

To replicate your data from numerous Amazon Redshift tables to any other region, you can modify the unload command as follows:

select 'unload ('''|| table_name ||') to ''s3://staging-area/extract/redshift_east_region_extract/'||table_name||'_'' credentials ''aws_access_key_id=**********;aws_secret_access_key=**********; manifest; region 'us-east-1'' 
from (select distinct table_name from information_schema.columns where table_schema = 'finance') ORDER BY seq;

Once you’ve extracted the data from your Amazon Redshift tables, you now need to fetch the DDL from the table view, available in GitHub to create the target Amazon Redshift tables in the different region. To do this, you can use the following line of code:

select ddl from v_generate_tbl_ddl where schemaname = 'finance' order by seq;

With your Amazon Redshift table now ready, you can use the following lines of code to import the data from your extracted files to your target Amazon Redshift table:

select 'copy '|| table_name ||' from ''s3://staging-area/extract/redshift_east_region_extract/'|| table_name||'_manifest'' credentials ''aws_access_key_id=**********;aws_secret_access_key=**********; manifest region ''us-east-1'';' 
from (select distinct table_name from information_schema.columns where table_schema = 'finance') ORDER BY seq;

This is how you can set up Amazon Redshift Cross Region Replication to replicate data from multiple tables.

Conclusion

This article teaches you how to set up Amazon Redshift Cross Region Replication with ease, and answers all your queries regarding it. It provides in-depth knowledge about the concepts behind every step to help you understand and implement them efficiently. These methods, however, can be challenging especially for a beginner & this is where Hevo saves the day. Hevo Data, a No-code Data Pipeline, can help you replicate data in real-time without writing any code. Hevo being a fully-managed system provides a highly secure automated solution to help perform replication in just a few clicks using its interactive UI.

Visit our Website to Explore Hevo

Want to take Hevo for a spin? Sign Up here for a 14-day free trial and experience the feature-rich Hevo suite first hand. You can also have a look at our unbeatable pricing that will help you choose the right plan for you!

Why don’t you share your experience of setting up Amazon Redshift Cross Region Replication in the comments? We would love to hear from you!

No-code Data Pipeline For Amazon Redshift