Connecting Aurora to Redshift using AWS Glue: 7 Easy Steps

on Tutorial • January 7th, 2019 • Write for Hevo

Are you trying to derive deeper insights from your Aurora Database by moving the data into a larger Database like Amazon Redshift? Well, you have landed on the right article. Now, it has become easier to replicate data from Aurora to Redshift.

This article will give you a comprehensive guide to Amazon Aurora and Amazon Redshift. You will explore how you can utilize AWS Glue to move data from Aurora to Redshift using 7 easy steps. You will also get to know about the advantages and limitations of this method in further sections. Let’s get started.

Table of Contents

Prerequisites

You will have a much easier time understanding the method of connecting Aurora to Redshift if you have gone through the following aspects:

  • An active account in AWS.
  • Working knowledge of Database and Data Warehouse.
  • Basic knowledge of ETL process.

Introduction to Amazon Aurora

Amazon Aurora Logo
Image Source

Aurora is a database engine that aims to provide the same level of performance and speed as high-end commercial databases, but with more convenience and reliability. One of the key benefits of using Amazon Aurora is that it saves DBAs (Database Administrators) time when designing backup storage drives because it backs up data to AWS S3 in real-time without affecting the performance. Moreover, it is MySQL 5.6 compliant and provides five times the throughput of MySQL on similar hardware.

To know more about Amazon Aurora, visit this link.

Introduction to Amazon Redshift

Amazon Redshift Logo
Image Source

Amazon Redshift is a cloud-based Data Warehouse solution that makes it easy to combine and store enormous amounts of data for analysis and manipulation. Large-scale database migrations are also performed using it.

The Redshift architecture is made up of several computing resources known as Nodes, which are then arranged into Clusters. The key benefit of Redshift is its great scalability and quick query processing, which has made it one of the most popular Data Warehouses even today.

To know more about Amazon Redshift, visit this link.

Introduction to AWS Glue

aurora to redshift using glue
Image Source

AWS Glue is a serverless ETL service provided by Amazon. Using AWS Glue, you pay only for the time you run your query. In AWS Glue, you create a metadata repository (data catalog) for all RDS engines including Aurora, Redshift, and S3, and create connection, tables, and bucket details (for S3). You can build your catalog automatically using a crawler or manually. Your ETL internally generates Python/Scala code, which you can customize as well. Since AWS Glue is serverless, you do not have to manage any resources and instances. AWS takes care of it automatically.

To know more about AWS Glue, visit this link.

Simplify ETL using Hevo’s No-code Data Pipeline

Hevo Data helps you directly transfer data from 100+ data sources (including 30+ free sources) to Business Intelligence tools, Data Warehouses, or a destination of your choice in a completely hassle-free & automated manner. Hevo is fully managed and completely automates the process of not only loading data from your desired source but also enriching the data and transforming it into an analysis-ready form without having to write a single line of code. Its fault-tolerant architecture ensures that the data is handled in a secure, consistent manner with zero data loss.

Hevo takes care of all your data preprocessing needs required to set up the integration and lets you focus on key business activities and draw a much powerful insight on how to generate more leads, retain customers, and take your business to new heights of profitability. It provides a consistent & reliable solution to manage data in real-time and always have analysis-ready data in your desired destination.

Get Started with Hevo for Free

Check out what makes Hevo amazing:

  • Real-Time Data Transfer: Hevo with its strong Integration with 100+ Sources (including 30+ Free Sources), allows you to transfer data quickly & efficiently. This ensures efficient utilization of bandwidth on both ends.
  • Data Transformation: It provides a simple interface to perfect, modify, and enrich the data you want to transfer. 
  • Secure: Hevo has a fault-tolerant architecture that ensures that the data is handled in a secure, consistent manner with zero data loss.
  • Tremendous Connector Availability: Hevo houses a large variety of connectors and lets you bring in data from numerous Marketing & SaaS applications, databases, etc. such as HubSpot, Marketo, MongoDB, Oracle, Salesforce, Redshift, etc. in an integrated and analysis-ready form.
  • Simplicity: Using Hevo is easy and intuitive, ensuring that your data is exported in just a few clicks. 
  • Completely Managed Platform: Hevo is fully managed. You need not invest time and effort to maintain or monitor the infrastructure involved in executing codes.
  • Live Support: The Hevo team is available round the clock to extend exceptional support to its customers through chat, email, and support calls.
Sign up here for a 14-Day Free Trial!

Steps to Move Data from Aurora to Redshift using AWS Glue

You can follow the below-mentioned steps to connect Aurora to Redshift using AWS Glue:

Step 1: Select the data from Aurora as shown below.

aurora to redshift data migration
Image Source: self

Step 2: Go to AWS Glue and add connection details for Aurora as shown below.

Setting up Connection Properties in AWS Glue
Image Source: self
Setting up Access to Data Store in AWS Glue
Image Source: self
Adding Connection in AWS Glue
Image Source: self
Connections in AWS Glue
Image Source: self

Similarly, add connection details for Redshift in AWS Glue using a similar approach.

Step 3: Once connection details are created create a data catalog for Aurora and Redshift as shown by the image below.

Crawler Information in Aurora
Image Source: self
Add a Data Store in Aurora
Image Source: self
Add Crawler in Aurora
Image Source: self

Once the crawler is configured, it will look as shown below:

Crawlers in AWS Glue
Image Source: self

Step 4: Similarly, create a data catalog for Redshift, you can choose schema name in the Include path so that the crawler only creates metadata for that schema alone. Check the content of the Include path in the image shown below.

Creating Data Catalog in Redshift
Image Source: self

Step 5: Once both the data catalog and data connections are ready, start creating a job to export data from Aurora to Redshift as shown below.

Configuring Redshift
Image Source: self
Mapping in Redshift
Image Source: self
Job Properties in Redshift
Image Source: self
Data Sources and Data Targets in Redshift
Image Source: self

Step 6: Once the mapping is completed, it generates the following code along with the diagram as shown by the image below.

Mapping in AWS Glue
Image Source: self

Once the execution is completed, you can view the output log as shown below.

Output Log in AWS Glue
Image Source: self

Step 7: Now, check the data in Redshift as shown below.

Imported Data in Redshift
Image Source: Self

Advantages of Moving Data using AWS Glue

AWS Glue has significantly eased the complicated process of moving data from Aurora to Redshift. Some of the advantages of using AWS Glue for moving data from Aurora to Redshift include:

  1. The biggest advantage of using this approach is that it is completely serverless and no resource management is needed.
  2. You pay only for the time of query and based on the data per unit (DPU) rate.
  3. If you moving high volume data, you can leverage Redshift Spectrum and perform Analytical queries using external tables. (Replicate data from Aurora and S3 and hit queries over)
  4. Since AWS Glue is a service provided by AWS itself, this can be easily coupled with other AWS services i.e., Lambda and Cloudwatch, etc to trigger the next job processing or for error handling.

Limitations of Moving Data using AWS Glue

Though AWS Glue is an effective approach to move data from Aurora to Redshift, there are some limitations associated with it. Some of the limitations of using AWS Glue for moving Data from Aurora to Redshift include:

  1. AWS Glue is still a new AWS service and is in the evolving stage. For complex ETL logic, it may not be recommended. Choose this approach based on your Business logic
  2. AWS Glue is still available in the limited region. For more details, kindly refer to AWS documentation.
  3. AWS Glue internally uses Spark environment to process the data hence you will not have any other option to select any other environment if your business/use case demand so.
  4. Invoking dependent job and success/error handling requires knowledge of other AWS data services i.e. Lambda, Cloudwatch, etc.

Conclusion

The approach to use AWS Glue to set up Aurora to Redshift integration is quite handy as this avoids doing instance setup and other maintenance. Since AWS Glue provides data cataloging, if you want to move high volume data, you can move data to S3 and leverage features of Redshift Spectrum from the Redshift client. However, unlike using AWS DMS to move Aurora to Redshift, AWS Glue is still in an early stage.

Job and multi-job handling or error handling requires a good knowledge of other AWS services. On the other hand in DMS, you just need to set up replication instances and tasks, and not much handling is needed. Another limitation with this method is that AWS Glue is still in a few selected regions. So, all these aspects need to be considered in choosing this procedure for migrating data from Aurora to Redshift.

If you are planning to use AWS DMS to move data from Aurora to Redshift then you can check out our article to explore the steps to move Aurora to Redshift using AWS DMS.

Visit our Website to Explore Hevo

Businesses can use automated platforms like Hevo Data to set this integration and handle the ETL process. It helps you directly transfer data from a source of your choice to a Data Warehouse, Business Intelligence tools, or any other desired destination in a fully automated and secure manner without having to write any code and will provide you a hassle-free experience.

Want to take Hevo for a spin? Sign Up for a 14-day free trial and experience the feature-rich Hevo suite first hand. You can also have a look at the unbeatable pricing that will help you choose the right plan for your business needs.

Share your experience of connecting Aurora to Redshift using AWS Glue in the comments section below!

No-code Data Pipeline for Redshift