Modern ETL (Extract, Transform, and Load) architectures are all about exploiting the simplicity of completely managed Cloud Database services and maximizing the value for your ETL spend by paying only for what you use. There are various kinds of completely managed Database services optimized for different use cases like Online Transactional Processes, Online Analytical Processing, etc.
Naturally, when you use different kinds of Databases, there would be a need to migrate data between them frequently. A specific use case that comes up frequently is the transfer of data from your Transactional Database to your Data Warehouse such as transfer/copy data from DynamoDB to Redshift. This article introduces you to AWS DynamoDB and Redshift. It also provides 4 methods (with detailed instructions) that you can use to migrate data from AWS DynamoDB to Redshift.
Table of Contents
- What is AWS DynamoDB?
- What is Amazon Redshift?
- Methods to Copy Data from DynamoDB to Redshift
What is AWS DynamoDB?
AWS DynamoDB is a completely managed NoSQL Database offered as a service by Amazon. It is a Document-Based Database that stores data as records of Keys and Values. Each record is mandated to have a Primary Key that will be used to retrieve the data. It provides an exclusive querying layer that can execute queries based on the Primary Key and the Key-Value pairs of individual records. It is a highly available Database service with built-in mechanisms for point-in-time backups and in-memory caching.
Because of its nature of being a NoSQL Database, DynamoDB does not impose any structure on the data other than for the key attributes. Moreover, pricing for DynamoDB is in terms of the number of requests serviced and occupied data storage. Scaling can be done on an on-demand basis or based on a provisioned upper limit. DynamoDB also offers a mechanism called Streams.
For more information on AWS DynamoDB, click here.
What is Amazon Redshift?
Amazon Redshift is a managed Data Warehouse offered as a service. Amazon Redshift stores data in a structured format and provides a Postgres compatible querying layer. Amazon Redshift can scale up to 2 PB of data and allows the customers to pay only for what they use. Pricing is based on the number of instances required in the use case. Amazon Redshift offers two kinds of instance types: Compute Optimized and Storage Optimized; from which the customers can choose based on their use case.
Amazon Redshift architecture includes a cluster of nodes. One among the nodes is designated as a Leader Node and others as Compute Nodes. The Leader Node is responsible for client communication, query optimization, and assigning tasks to the Compute Node. Amazon Redshift also offers a service called Amazon Redshift spectrum which allows the customers to process data that are located in S3 and not yet added as Amazon Redshift Tables.
For more information on Amazon Redshift, click here.
Loading Data From Dynamo DB To Redshift
This method operates on the Amazon Redshift’s COPY command which can accept a DynamoDB URL as one of the inputs. This way, Redshift can automatically manage the process of copying DynamoDB data on its own. This method is suited for one-time data transfer.
This method uses AWS Data Pipeline which first migrates data from DynamoDB to S3. Afterward, data is transferred from S3 to Redshift using Redshift’s COPY command. However, it can not transfer the data directly from DynamoDb to Redshift.
This method leverages the DynamoDB Streams which provide a time-ordered sequence of records that contains data modified inside a DynamoDB table. This item-level record of DynamoDB’s table activity can be used to recreate a similar item-level table activity in Redshift using some client application that is capable of consuming this stream. This method is better suited for regular real-time data transfer.
Hevo Data, an Automated No Code Data Pipeline can transfer data from DynamoDB to Redshift and provide you with a hassle-free experience. You can easily ingest data from the DynamoDB database using Hevo’s Data Pipelines and replicate it to your Redshift account without writing a single line of code. Hevo’s end-to-end Data Management service automates the process of not only loading data from DynamoDB but also transforming and enriching it into an analysis-ready form when it reaches Redshift.
Hevo supports direct integrations with DynamoDB and 100+ sources (including 40 free sources) and its Data Mapping feature works continuously to replicate your data to Redshift and builds a single source of truth for your business. Hevo takes full charge of the data transfer process, allowing you to focus your resources and time on other key business activities.
Methods to Copy Data from DynamoDB to Redshift
Copying data from DynamoDB to Redshift can be accomplished in 4 ways depending on the use case. Following are the ways to copy data from DynamoDB to Redshift:
- Method 1: DynamoDB to Redshift Using Redshift’s COPY Command
- Method 2: DynamoDB to Redshift Using AWS Data Pipeline
- Method 3: DynamoDB to Redshift Using DynamoDB Streams
- Method 4: DynamoDB to Redshift Using Hevo Data
Each of these 4 methods is suited for the different use cases and involves a varied range of effort. Let’s dive in.
Method 1: DynamoDB to Redshift Using Redshift’s COPY Command
This is by far the simplest way to copy a table from DynamoDB to Redshift. Redshift’s COPY command can accept a DynamoDB URL as one of the inputs and manage the copying process on its own. The syntax for the COPY command is as below.
copy <target_tablename> from 'dynamodb://<source_table_name>' authorization read ratio '<integer>';
For now, let’s assume you need to move product_details_v1 table from DynamoDB to Redshift (to a particular target table) named product_details_tgt. The command to move data will be as follows.
COPY product_details_v1_tgt from dynamodb://product_details_v1 credentials ‘aws_access_key_id = <access_key_id>;aws_secret_access_key=<secret_access_key> readratio 40;
The “readratio” parameter in the above command specifies the amount of provisioned capacity in the DynamoDB instance that can be used for this operation. This operation is usually a performance-intensive one and it is recommended to keep this value below 50% to avoid the source Database getting busy.
Limitations of Using Redshift’s Copy Command to Load Data from DynamoDB to Redshift
The above command may look easy, but in real life, there are multiple problems that a user needs to be careful about while doing this. A list of such critical factors that should be considered is given below.
- DynamoDB and Redshift follow different sets of rules for their table names. While DynamoDB allows for use of up to 255 characters to form the table name, Redshift limits it to 127 characters and prohibits the use of many special characters including dots and dash. Add to that Redshift table names are case insensitive.
- While copying data from DynamoDB to Redshift, Redshift tries to map between DynamoDB attribute names and Redshift column names. If there is no match for a Redshift column name, it is populated as empty or NULL depending on the value of EMPTYASNULL parameter configuration parameter in the COPY command.
- All the attribute names in DynamoDB that cannot be matched to column names in Redshift are discarded.
- At the moment, the COPY command only supports STRING and NUMBER data types in DynamoDB.
- The above method works well when the copying operation is a one-time operation. The COPY command by default always appends the new data to the existing table. Since Redshift lacks the ability to enforce Primary Key constraints, this can be a problem, if the copying operation is a recurring activity. In such cases, a Staging Table will need to be used. The idea is to copy all data to a Redshift Staging Table, use an inner join to find the rows that have the common columns, delete those columns first, and then copy from the Staging Table to the Target Table.
- The COPY command by default uses automatic compression in case the Target Table is empty and no compression encoding is specified. This requires some data from DynamoDB to be sampled. The number of rows to be sampled is specified through the value of COMPROWS parameter. Data transfer required in this operation is also part of the DynamoDB table’s provisioned capacity.
It is completely understandable if an inexperienced user finds all these intricate details a bit too overwhelming. A hassle-free ETL platform like Hevo (covered below) could provide a lifeline in such cases. Hevo provides a very easy-to-use interface that can configure data loads through simple point-and-click.
Method 2: DynamoDB to Redshift Using AWS Data Pipeline
AWS Data Pipeline is Amazon’s own service to execute the migration of data from one point to another point in the AWS Ecosystem. Unfortunately, it does not directly provide us with an option to copy data from DynamoDB to Redshift but gives us an option to export DynamoDB data to S3. From S3, we will need to used a COPY command to recreate the table in S3. Follow the steps below to copy data from DynamoDB to Redshift using AWS Data Pipeline:
- Create an AWS Data pipeline from the AWS Management Console and select the option “Export DynamoDB table to S3” in the source option as shown in the image below. A detailed account of how to use the AWS Data Pipeline can be found in the blog post here.
- Once the Data Pipeline completes the export, use the COPY command with the source path as the JSON file location. The COPY command is intelligent enough to autoload the table using JSON attributes. The following command can be used to accomplish the same.
In the below command, product_details_v1.json is the output of AWS Data Pipeline execution. Alternately instead of the “auto” argument, a JSON file can be specified to map the JSON attribute names to Redshift columns, in case those two are not matching.
COPY product_details_v1_tgt from s3://my_bucket/product_details_v1.json credentials ‘aws_access_key_id = <access_key_id>;aws_secret_access_key=<secret_access_key> Json = ‘auto’
Method 3: DynamoDB to Redshift Using DynamoDB Streams
The above 2 methods are fine if the use case requires only periodic copying of the data from DynamoDB to Redshift. There are specific use cases where real-time syncing from DynamoDB to Redshift is needed. In such cases, DynamoDB’s Streams feature can be exploited to design a streaming copy Data Pipeline.
DynamoDB Stream provides a time-ordered sequence of records that correspond to item level modification in a DynamoDB table. This item-level record of table activity can be used to recreate an item-level table activity in Redshift using a client application that can consume this stream. Amazon has designed the DynamoDB Streams to adhere to the architecture of Kinesis Streams. This means the customer just needs to create a Kinesis Firehose Delivery Stream to exploit the DynamoDB Stream data. The following are the broad set of steps involved in this method:
- Enable DynamoDB Stream in the DynamoDB console dashboard.
- Configure a Kinesis Firehose Delivery Stream to consume the DynamoDB Stream to write this data to S3.
- Implement an AWS Lambda Function to buffer the data from the Firehose Delivery Stream, batch it and apply the required transformations.
- Configure another Kinesis Data Firehose to automatically insert this data to Redshift.
Even though this method requires the user to implement custom functions, it provides unlimited scope for transforming the data before writing to Redshift.
Method 4: DynamoDB to Redshift Using Hevo Data
Hevo Data, an Automated No-code Data Pipeline helps you to directly transfer your DynamoDB data to Redshift in real-time in a completely automated manner. Hevo’s fully managed pipeline uses DynamoDB’s data streams to support Change Data Capture (CDC) for its tables. Hevo also facilitates DynamoDB’s data replication to manage the ingestion information via Amazon DynamoDB Streams & Amazon Kinesis Data Streams.
Hevo’s fault-tolerant architecture will enrich and transform your DynamoDB data in a secure and consistent manner and load it to Redshift without any assistance from your side. You can entrust us with your data transfer process and enjoy a hassle-free experience. This way, you can focus more on Data Analysis, instead of data consolidation.
Here are the 2 simple steps you need to use to move data from DynamoDB to Redshift using Hevo:
- Step 1) Authenticate Source: Connect your DynamoDB account as a source for Hevo by entering a unique name for Hevo Pipeline, AWS Access Key, AWS Secret Key, and AWS Region. This is shown in the below image.
- Step 2) Configure Destination: Configure the Redshift Data Warehouse as the destination for your Hevo Pipeline. You have to provide, Warehouse name, Database Password, Database Schema, Database Port, and Database username. This is shown in the below image.
That is it! Hevo will take care of reliably moving data from DynamoDB to Redshift with no data loss.
Here are more reasons to try Hevo:
- Secure: Hevo has a fault-tolerant architecture that ensures that your DynamoDB data is handled in a secure, consistent manner with zero data loss.
- Schema Management: Hevo takes away the tedious task of schema management & automatically detects the schema of incoming data and maps it to your Redshift schema.
- Quick Setup: Hevo with its automated features, can be set up in minimal time. Moreover, with its simple and interactive UI, it is extremely easy for new customers to work on and perform operations.
- Transformations: Hevo provides preload transformations through Python code. It also allows you to run transformation code for each event in the Data Pipelines you set up. You need to edit the event object’s properties received in the transform method as a parameter to carry out the transformation. Hevo also offers drag and drop transformations like Date and Control Functions, JSON, and Event Manipulation to name a few. These can be configured and tested before putting them to use for aggregation.
- Minimal Learning: Hevo, with its simple and interactive UI, is extremely simple for new customers to work on and perform operations.
- Hevo Is Built To Scale: As the number of sources and the volume of your data grows, Hevo scales horizontally, handling millions of records per minute with very little latency.
- Incremental Data Load: Hevo allows the transfer of data that has been modified in real-time. This ensures efficient utilization of bandwidth on both ends.
- Live Support: The Hevo team is available round the clock to extend exceptional support to its customers through chat, email, and support calls.
- Live Monitoring: Hevo allows you to monitor the data flow and check where your data is at a particular point in time.
With continuous Real-Time data movement, Hevo allows you to combine Amazon DynamoDB data along with your other data sources and seamlessly load it toRedshift with a no-code, easy-to-setup interface. Try our 14-day full-feature access free trial!Get Started with Hevo for Free
The article introduced you to AWS DynamoDB and Amazon Redshift. It also provided you with 4 different methods that you can use to copy data from DynamoDB to Redshift. Since DynamoDB is usually used as a Transactional Database and Redshift as a Data Warehouse, the need to copy data from DynamoDB is very common.
Depending on whether the use case demands a 1-time copy or continuous sync, one of the above methods can be chosen. Method 1 and Method 2 are simple in implementation but come along with multiple limitations. Moreover, they are suitable only for one-time data transfer between DynamoDB and Redshift. The method using DynamoDB Streams is suitable for real-time data transfer, but a large number of configuration parameters and intricate details have to be considered for its successful implementation
Hevo Data provides an Automated No-code Data Pipeline that empowers you to overcome the above-mentioned limitations. You can leverage Hevo to seamlessly transfer data from DynamoDB to Redshift in real-time without writing a single line of code. Hevo’s Data Pipeline enriches your data and manages the transfer process in a fully automated and secure manner. Hevo caters to 100+ data sources (including 40+ free sources) and can directly transfer data to Data Warehouses, Business Intelligence Tools, or any other destination of your choice in a hassle-free manner. It will make your life easier and make data migration hassle-free.Learn more about Hevo
Want to take Hevo for a spin? Sign up for a 14-day free trial and experience the feature-rich Hevo suite firsthand.
Share your experience of copying data from DynamoDB to Redshift in the comment section below!