AWS Glue is a serverless ETL service provided by Amazon. With Glue, you pay only for the time your ETL job runs. In Glue, you create a metadata repository (Data Catalog) for data sources such as Aurora (and other RDS engines), Redshift, and S3, storing connection, table, and bucket details (for S3). You can build your catalog automatically using a crawler or populate it manually. Glue internally generates Python/Scala ETL code, which you can also customize. Since Glue is serverless, you do not have to manage any resources or instances; AWS takes care of that automatically.
Steps to move the data from Aurora to Redshift using AWS Glue:
1. Identify the data in Aurora that you want to move.
2. Go to AWS Glue and add connection details for Aurora.
Similarly, add connection details for Redshift in Glue.
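If you prefer scripting the console steps, the two JDBC connections can also be created with boto3. This is a minimal sketch; the connection names, endpoints, and credentials below are placeholders, not values from this walkthrough.

```python
def connection_input(name, jdbc_url, username, password):
    """Build the ConnectionInput structure expected by glue.create_connection."""
    return {
        "Name": name,
        "ConnectionType": "JDBC",
        "ConnectionProperties": {
            "JDBC_CONNECTION_URL": jdbc_url,
            "USERNAME": username,
            "PASSWORD": password,
        },
    }

def create_connections():
    import boto3  # requires AWS credentials/permissions for Glue
    glue = boto3.client("glue")
    # Hypothetical Aurora (MySQL-compatible) endpoint
    glue.create_connection(ConnectionInput=connection_input(
        "aurora-connection",
        "jdbc:mysql://my-aurora-cluster.cluster-xxxx.us-east-1.rds.amazonaws.com:3306/mydb",
        "admin", "my-password"))
    # Hypothetical Redshift cluster endpoint
    glue.create_connection(ConnectionInput=connection_input(
        "redshift-connection",
        "jdbc:redshift://my-cluster.xxxx.us-east-1.redshift.amazonaws.com:5439/dev",
        "awsuser", "my-password"))
```

Either way, Glue validates the connection against the VPC/subnet/security-group details you supply, so make sure those allow traffic to the database.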
3. Once the connection details are created, create data catalogs for Aurora and Redshift.
Once the crawler is configured, run it; it crawls the source and creates the metadata tables in the catalog.
4. Similarly, create a data catalog for Redshift. You can specify the schema name in the include path so that the crawler creates metadata for that schema alone.
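The crawler setup can likewise be scripted with boto3. In this sketch the crawler names, IAM role, catalog database names, and include paths are all placeholders; the include path for a JDBC target takes the form `database/schema/%` (e.g. `dev/public/%` to restrict a Redshift crawl to one schema).

```python
def jdbc_target(connection_name, include_path):
    """Build the Targets structure for a JDBC crawler; the include path
    limits the crawl to one database/schema."""
    return {"JdbcTargets": [{"ConnectionName": connection_name,
                             "Path": include_path}]}

def create_crawlers():
    import boto3  # requires AWS credentials/permissions for Glue
    glue = boto3.client("glue")
    glue.create_crawler(
        Name="aurora-crawler",
        Role="AWSGlueServiceRoleDefault",   # an IAM role with Glue permissions
        DatabaseName="aurora_catalog_db",   # catalog database to hold the tables
        Targets=jdbc_target("aurora-connection", "mydb/%"),
    )
    glue.create_crawler(
        Name="redshift-crawler",
        Role="AWSGlueServiceRoleDefault",
        DatabaseName="redshift_catalog_db",
        Targets=jdbc_target("redshift-connection", "dev/public/%"),
    )
```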
5. Once both the data catalogs and connections are ready, create a Glue job to export data from Aurora to Redshift.
6. Map the source columns to the target columns. Once the mapping is completed, Glue generates the ETL code along with a diagram of the data flow.
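The generated script typically follows the shape sketched below (the catalog database, table, connection names, and column mappings here are hypothetical; the real script depends on your catalog and the mappings you chose in the console). Note the write to Redshift stages data through a temporary S3 directory.

```python
# Column mappings chosen in the Glue console:
# (source column, source type, target column, target type)
MAPPINGS = [
    ("id", "int", "id", "int"),
    ("name", "string", "name", "string"),
]

def run_job():
    # These imports require the AWS Glue job runtime.
    import sys
    from pyspark.context import SparkContext
    from awsglue.transforms import ApplyMapping
    from awsglue.utils import getResolvedOptions
    from awsglue.context import GlueContext
    from awsglue.job import Job

    args = getResolvedOptions(sys.argv, ["JOB_NAME"])
    glue_context = GlueContext(SparkContext.getOrCreate())
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    # Read the Aurora table registered in the Data Catalog.
    source = glue_context.create_dynamic_frame.from_catalog(
        database="aurora_catalog_db", table_name="mydb_orders")

    # Rename/cast columns per the mapping chosen in the console.
    mapped = ApplyMapping.apply(frame=source, mappings=MAPPINGS)

    # Write to Redshift via the catalog connection; Glue stages the
    # data in a temporary S3 directory and loads it with COPY.
    glue_context.write_dynamic_frame.from_jdbc_conf(
        frame=mapped,
        catalog_connection="redshift-connection",
        connection_options={"dbtable": "orders", "database": "dev"},
        redshift_tmp_dir="s3://my-temp-bucket/glue-tmp/")

    job.commit()
```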
Once the execution is completed, you can view the output log.
7. Now, check the data in Redshift.
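A quick way to spot-check the load from code is a row count against the target table. This sketch assumes the psycopg2 driver is installed (Redshift speaks the PostgreSQL wire protocol); the table and cluster details are placeholders.

```python
ROW_COUNT_SQL = "SELECT COUNT(*) FROM {table};"

def count_rows(table, host, user, password, dbname="dev", port=5439):
    import psycopg2  # PostgreSQL driver, also works against Redshift
    with psycopg2.connect(host=host, port=port, dbname=dbname,
                          user=user, password=password) as conn:
        with conn.cursor() as cur:
            cur.execute(ROW_COUNT_SQL.format(table=table))
            return cur.fetchone()[0]
```

Compare the count (and a few sampled rows) against the source table in Aurora to confirm the job moved everything.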
Pros of moving data using this approach
- The biggest advantage of this approach is that it is completely serverless; no resource management is needed.
- You pay only for the duration of the job run, billed at the Data Processing Unit (DPU) rate.
- If you are moving high-volume data, you can replicate the data from Aurora to S3 and leverage Redshift Spectrum to run analytical queries over external tables.
- Since Glue is a service provided by AWS itself, it can be easily coupled with other AWS services such as Lambda and CloudWatch to trigger the next job in a pipeline or for error handling.
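As an illustration of that coupling, a Lambda function can kick off the Glue job on a CloudWatch Events/EventBridge schedule or in response to an upstream event. A minimal sketch, with a hypothetical job name:

```python
GLUE_JOB_NAME = "aurora-to-redshift-job"  # placeholder job name

def lambda_handler(event, context):
    import boto3  # bundled in the AWS Lambda Python runtime
    glue = boto3.client("glue")
    run = glue.start_job_run(JobName=GLUE_JOB_NAME)
    # Returning the run id lets downstream steps poll get_job_run for status.
    return {"JobName": GLUE_JOB_NAME, "JobRunId": run["JobRunId"]}
```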
Cons of moving data using this approach
- Glue is still a new AWS service and is evolving; it may not be recommended for complex ETL logic. Choose this approach based on your business logic.
- Glue is still available only in a limited set of regions. For more details, refer to the AWS documentation.
- Glue internally uses a Spark environment to process the data, so you cannot select a different processing environment if your business/use case demands one.
- Invoking dependent jobs and handling success/error conditions require knowledge of other AWS services such as Lambda and CloudWatch.
Simpler Way to transfer data from Aurora to Redshift
Using the Hevo Data Integration Platform, you can seamlessly replicate data from Aurora to Redshift in three simple steps.
- Connect and configure your Aurora database.
- Select the replication mode: (a) load selected Aurora tables (b) load data via Custom Query (c) load data through Binlog.
- For each table in Aurora, choose a table name in Redshift where it should be copied.
While you rest, Hevo will take care of retrieving the data and transferring it to your destination warehouse. Unlike AWS Glue, Hevo provides an error-free, fully managed setup to move data in minutes.
The AWS Glue approach is quite handy, as it avoids instance setup and other maintenance. Since Glue provides data cataloging, if you want to move high-volume data, you can move it to S3 and leverage Redshift Spectrum from a Redshift client. However, unlike using AWS DMS to move Aurora to Amazon Redshift, Glue is still at an early stage. Job orchestration, multi-job handling, and error handling require good knowledge of other AWS services, whereas with DMS you just set up a replication instance and a task, and not much other handling is needed. Another limitation of this method is that Glue is available only in selected regions. All these aspects need to be considered when choosing this procedure for migrating data from Aurora to Redshift. If you are planning to use AWS DMS to move data from Aurora to Redshift, check out our article on the steps to move Aurora to Redshift using AWS DMS.