DynamoDB is Amazon’s document-oriented, high performance, NoSQL database. Given it is a NoSQL database, it is hard to run SQL queries to analyze the data. It is essential to move data from DynamoDB to Redshift, convert it into a relational format for seamless analysis.
What is DynamoDB?
Fully managed by Amazon, DynamoDB is a NoSQL database service that provides high-speed and highly scalable performance. DynamoDB can handle around 20 million requests per second. It’s serverless architecture and on-demand scalability make it a solution that is widely preferred.
What is Amazon Redshift?
A widely used Data Warehouse, Amazon Redshift is an enterprise-class RDBMS. Amazon Redshift provides a high-performance MPP, columnar storage set up, highly efficient targeted data compression encoding schemes, making it a natural choice for Data Warehousing and analytical needs.
Amazon Redshift has excellent business intelligence abilities and a robust SQL-based interface. Amazon Redshift allows you to perform complex data analysis queries, complex joins with other tables in your AWS Redshift cluster and queries can be used in any reporting application to create dashboards or reports.
DynamoDB to Redshift Migration:
Broadly, there are two popular methods to move data from DynamoDB to Amazon Redshift.
Method 1: A ready to use Hevo Data Integration Platform (7 Day Free Trial).
Method 2: Building a custom ETL solution using Copy Utility.
This article covers the steps involved in writing custom code to ETL from DynamoDB to Redshift. Towards the end, the blog also covers the limitations of this approach and highlights alternative ways to achieve the same outcome in a simpler manner.
DynamoDB to Redshift Replication Using Custom Code:
Using Copy Command
As a prerequisite, you must have a table created in Amazon Redshift before loading data from DynamoDB table to Redshift. As we are copying data from NoSQL DB to RDBMS, we need to apply some changes/transformations before loading it to the target database. For example, some of the DynamoDB data types do not correspond directly to those of Amazon Redshift. While loading, one should ensure that each column in the Redshift table is mapped to the correct data type and size.
Before you migrate data from DynamoDB to Redshift create a table in Redshift using the following command.
Create a table in DynamoDB by logging into the AWS console.
Add data into DynamoDB Table by clicking on Create Item.
Use COPY command to copy data from DynamoDB to Redshift in the Employee Table.
copy emp.emp from 'dynamodb://Employee' iam_role 'IAM_Role' readratio 10;
Verify that data got copied successfully.
Limitations of Using Custom Code To ETL data from DynamoDB to Redshift
There are a handful of limitations while performing ETL from DynamoDB to Redshift using the Custom Code method. Read the following:
DynamoDB table names can contain up to 255 characters, including ‘.’ (dot) and ‘-‘ (dash) characters, and are case-sensitive. However, Amazon Redshift table names are limited to 127 characters, cannot include dots or dashes and are not case-sensitive. Also, we cannot use Amazon Redshift reserved words.
Unlike SQL databases, DynamoDB does not support NULL. Interpretation of empty or blank attribute values in DynamoDB should be specified to Redshift. In Redshift, these can be treated as either NULLs or empty fields.
Following data parameters are not supported along with COPY from DynamoDB:
Leveraging Distribution Keys for Efficiency
The COPY command leverages Redshift’s massively parallel processing(MPP) architecture to read and stream data in parallel from an Amazon DynamoDB table. By leveraging redshift distribution keys, you can make the best out of Redshift parallel processing architecture.
A Faster And Easier Alternative
A cloud data migration service like Hevo Data Integration Platform (7 Day Free Trial) can constantly serve you even as your business needs evolve with more data volume/data sources coming-up. Using Hevo’s point and click set up, you can perform DynamoDB to Redshift migration in a matter of minutes.
The process of writing custom code to perform DynamoDB to Redshift replication is tedious and needs a whole bunch of precious engineering resources invested in this. As your data grows, the complexities will grow too, making it necessary to invest resources on an ongoing basis for monitoring and maintenance.
Hevo handles all the aforementioned limitations automatically, thereby drastically reducing the effort that you and your team will have to put in.