When you use different kinds of databases, you will often need to migrate data between them. A specific use case that comes up frequently is moving data from your transactional database to your data warehouse, such as copying data from DynamoDB to Redshift. This article introduces you to AWS DynamoDB and Redshift. It also provides 4 methods (with detailed instructions) that you can use to migrate data from AWS DynamoDB to Redshift.

Loading Data From DynamoDB To Redshift

Method 1: DynamoDB to Redshift Using Hevo Data

Hevo Data, an Automated No-Code Data Pipeline, can transfer data from DynamoDB to Redshift and provide you with a hassle-free experience. You can easily ingest data from the DynamoDB database using Hevo’s Data Pipelines and replicate it to your Redshift account without writing a single line of code. Hevo’s end-to-end data management service automates not only loading data from DynamoDB but also transforming and enriching it into an analysis-ready form by the time it reaches Redshift.


Hevo supports direct integrations with DynamoDB and 150+ Data sources (including 40 free sources) and its Data Mapping feature works continuously to replicate your data to Redshift and builds a single source of truth for your business. Hevo takes full charge of the data transfer process, allowing you to focus your resources and time on other key business activities.

Method 2: DynamoDB to Redshift Using Redshift’s COPY Command

This method uses Amazon Redshift’s COPY command, which can accept a DynamoDB table path as one of its inputs. This way, Redshift can manage the process of copying DynamoDB data on its own. This method is suited for a one-time data transfer.

Method 3: DynamoDB to Redshift Using AWS Data Pipeline

This method uses AWS Data Pipeline, which first exports data from DynamoDB to S3. Afterward, data is transferred from S3 to Redshift using Redshift’s COPY command. However, AWS Data Pipeline cannot transfer the data directly from DynamoDB to Redshift.

Method 4: DynamoDB to Redshift Using DynamoDB Streams

This method leverages DynamoDB Streams, which provide a time-ordered sequence of records containing the data modified inside a DynamoDB table. This item-level record of a DynamoDB table’s activity can be used to recreate the same item-level activity in Redshift using a client application that is capable of consuming the stream. This method is better suited for regular, real-time data transfer.

Methods to Copy Data from DynamoDB to Redshift

Copying data from DynamoDB to Redshift can be accomplished in 4 ways, depending on the use case. Following are the ways to copy data from DynamoDB to Redshift:

  • Method 1: DynamoDB to Redshift Using Hevo Data
  • Method 2: DynamoDB to Redshift Using Redshift’s COPY Command
  • Method 3: DynamoDB to Redshift Using AWS Data Pipeline
  • Method 4: DynamoDB to Redshift Using DynamoDB Streams

Each of these 4 methods is suited to a different use case and involves a varied amount of effort. Let’s dive in.

Method 1: DynamoDB to Redshift Using Hevo Data


Hevo Data, an Automated No-code Data Pipeline, helps you directly transfer your AWS DynamoDB data to Redshift in real-time in a completely automated manner. Hevo’s fully managed pipeline uses DynamoDB’s data streams to support Change Data Capture (CDC) for its tables. Hevo also facilitates DynamoDB’s data replication by managing the ingestion information via Amazon DynamoDB Streams & Amazon Kinesis Data Streams.

Here are the 2 simple steps you need to use to move data from DynamoDB to Redshift using Hevo: 

  • Step 1) Authenticate Source: Connect your DynamoDB account as a source for Hevo by entering a unique name for the Hevo Pipeline, your AWS Access Key, AWS Secret Key, and AWS Region, as shown in the image below.
DynamoDB to Redshift: Configure source
  • Step 2) Configure Destination: Configure the Redshift data warehouse as the destination for your Hevo Pipeline. You have to provide the warehouse name, database password, database schema, database port, and database username, as shown in the image below.
DynamoDB to Redshift: Configure Destination

That is it! Hevo will take care of reliably moving data from DynamoDB to Redshift with no data loss. 


Here are more reasons to try Hevo:

  • Schema Management: Hevo takes away the tedious task of schema management & automatically detects the schema of incoming data and maps it to your Redshift schema.
  • Transformations: Hevo provides preload transformations through Python code. It also allows you to run transformation code for each event in the data pipelines you set up. Hevo also offers drag and drop transformations like Date and Control Functions, JSON, and Event Manipulation to name a few.
  • Incremental Data Load: Hevo allows the transfer of data that has been modified in real-time. This ensures efficient utilization of bandwidth on both ends.
  • Live Monitoring: Hevo allows you to monitor the data flow and check where your data is at a particular point in time.

With continuous real-time data movement, Hevo allows you to combine Amazon DynamoDB data along with your other data sources and seamlessly load it to Redshift with a no-code, easy-to-setup interface.

Method 2: DynamoDB to Redshift Using Redshift’s COPY Command

This is by far the simplest way to copy a table from DynamoDB to Redshift. Redshift’s COPY command can accept a DynamoDB table path as one of its inputs and manage the copying process on its own. The syntax for the COPY command is as follows.

copy <target_tablename> from 'dynamodb://<source_table_name>'
authorization
readratio <integer>;

For now, let’s assume you need to move the product_details_v1 table from DynamoDB to a target table in Redshift named product_details_v1_tgt. The command to move the data will be as follows.

COPY product_details_v1_tgt from 'dynamodb://product_details_v1'
credentials 'aws_access_key_id=<access_key_id>;aws_secret_access_key=<secret_access_key>'
readratio 40;

The READRATIO parameter in the above command specifies the percentage of the DynamoDB table’s provisioned read capacity that this operation is allowed to consume. For example, READRATIO 40 against a table provisioned with 100 read capacity units lets the COPY use at most 40 of those units. This operation is usually performance-intensive, so it is recommended to keep this value below 50% to avoid making the source table too busy to serve its regular traffic.
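If you would rather trigger this load programmatically than run it from a SQL client, the snippet below is a minimal sketch that issues the same COPY through the Amazon Redshift Data API using boto3. The cluster identifier, database name, database user, and IAM role ARN are placeholder values (an IAM role is used here instead of the access-key credentials purely for brevity).

import boto3

# Minimal sketch: run the COPY shown above via the Redshift Data API.
redshift_data = boto3.client("redshift-data", region_name="us-east-1")

copy_sql = """
    COPY product_details_v1_tgt
    FROM 'dynamodb://product_details_v1'
    IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftCopyRole'  -- placeholder role
    READRATIO 40;
"""

response = redshift_data.execute_statement(
    ClusterIdentifier="my-redshift-cluster",  # placeholder cluster name
    Database="dev",
    DbUser="awsuser",
    Sql=copy_sql,
)

# The Data API is asynchronous; poll describe_statement to track progress.
print(redshift_data.describe_statement(Id=response["Id"])["Status"])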

Limitations of Using Redshift’s Copy Command to Load Data from DynamoDB to Redshift

The above command may look easy, but in real life, there are multiple problems that a user needs to be careful about while doing this. A list of such critical factors that should be considered is given below.

  • DynamoDB and Redshift follow different sets of rules for their table names. While DynamoDB allows up to 255 characters for a table name, Redshift limits it to 127 characters and prohibits the use of many special characters, including dots and dashes. In addition, Redshift table names are case-insensitive.
  • While copying data from DynamoDB to Redshift, Redshift tries to map DynamoDB attribute names to Redshift column names. If there is no matching attribute for a Redshift column, the column is populated as empty or NULL, depending on the value of the EMPTYASNULL configuration parameter in the COPY command.
  • All the attribute names in DynamoDB that cannot be matched to column names in Redshift are discarded.
  • At the moment, the COPY command only supports STRING and NUMBER data types in DynamoDB.
  • The above method works well when the copying operation is a one-time operation.

Method 3: DynamoDB to Redshift Using AWS Data Pipeline

AWS Data Pipeline is Amazon’s own service for moving data from one point to another within the AWS ecosystem. Unfortunately, it does not directly provide an option to copy data from DynamoDB to Redshift, but it does offer an option to export DynamoDB data to S3. From S3, we will need to use a COPY command to recreate the table in Redshift. Follow the steps below to copy data from DynamoDB to Redshift using AWS Data Pipeline:

  • Create an AWS Data Pipeline from the AWS Management Console and select the “Export DynamoDB table to S3” option as the source, as shown in the image below. A detailed account of how to use AWS Data Pipeline can be found in this blog post.
  • Once the Data Pipeline completes the export, use the COPY command with the source path set to the location of the exported JSON file. The COPY command is intelligent enough to autoload the table using JSON attributes. The following command can be used to accomplish this.
COPY product_details_v1_tgt from 's3://my_bucket/product_details_v1.json'
credentials 'aws_access_key_id=<access_key_id>;aws_secret_access_key=<secret_access_key>'
json 'auto';

In the above command, product_details_v1.json is the output of the AWS Data Pipeline execution. Alternatively, instead of the ‘auto’ argument, a JSONPaths file can be specified to map the JSON attribute names to Redshift columns, in case the two do not match.
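To stitch the two steps together without manual intervention, the sketch below (again a rough illustration using boto3 and the Redshift Data API, with placeholder bucket, prefix, cluster, and role names) first checks that the pipeline’s export files have landed in S3 and then issues the COPY shown above.

import boto3

s3 = boto3.client("s3")
redshift_data = boto3.client("redshift-data")

bucket = "my_bucket"                          # placeholder bucket
prefix = "exports/product_details_v1/"        # placeholder export prefix

# Confirm the Data Pipeline export produced files before loading.
listing = s3.list_objects_v2(Bucket=bucket, Prefix=prefix)
if listing.get("KeyCount", 0) == 0:
    raise RuntimeError("No export files found; has the pipeline run finished?")

copy_sql = f"""
    COPY product_details_v1_tgt
    FROM 's3://{bucket}/{prefix}'
    IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftCopyRole'  -- placeholder role
    JSON 'auto';
"""

redshift_data.execute_statement(
    ClusterIdentifier="my-redshift-cluster",  # placeholder cluster name
    Database="dev",
    DbUser="awsuser",
    Sql=copy_sql,
)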

Method 4: DynamoDB to Redshift Using DynamoDB Streams

The above methods are fine if the use case requires only periodic copying of data from DynamoDB to Redshift. There are, however, specific use cases where real-time syncing from DynamoDB to Redshift is needed. In such cases, DynamoDB’s Streams feature can be leveraged to design a streaming copy pipeline.

DynamoDB Streams provides a time-ordered sequence of records that correspond to item-level modifications in a DynamoDB table. This item-level record of table activity can be used to recreate equivalent item-level activity in Redshift using a client application that can consume the stream. Amazon has designed DynamoDB Streams to adhere to the architecture of Kinesis Streams, which means the customer just needs to create a Kinesis Data Firehose delivery stream to consume the DynamoDB Stream data. The following are the broad steps involved in this method:

  • Enable DynamoDB Stream in the DynamoDB console dashboard.
  • Configure a Kinesis Firehose Delivery Stream to consume the DynamoDB Stream to write this data to S3.
  • Implement an AWS Lambda function to buffer the data from the Firehose Delivery Stream, batch it, and apply the required transformations (a minimal sketch of such a function is shown after this list).
  • Configure another Kinesis Data Firehose to insert this data to Redshift automatically.
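As a rough illustration of the Lambda step above, here is a minimal sketch of a Kinesis Data Firehose transformation handler. It assumes each incoming record wraps a DynamoDB Streams change event and flattens its NewImage into a flat JSON row; the attribute names product_id and price are made-up examples, not fields from this article.

import base64
import json

def lambda_handler(event, context):
    # Kinesis Data Firehose invokes this transformation function with a
    # batch of records; each record's "data" field is base64-encoded.
    output = []
    for record in event["records"]:
        payload = json.loads(base64.b64decode(record["data"]))

        # Assumes the payload follows the DynamoDB Streams record format;
        # flatten NewImage into a flat JSON row for Redshift.
        # product_id and price are illustrative attribute names only.
        new_image = payload.get("dynamodb", {}).get("NewImage", {})
        row = {
            "product_id": new_image.get("product_id", {}).get("S"),
            "price": new_image.get("price", {}).get("N"),
        }

        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": base64.b64encode((json.dumps(row) + "\n").encode()).decode(),
        })

    # Firehose expects the transformed batch back in exactly this shape.
    return {"records": output}

In this setup, Firehose buffers the transformed rows, and the delivery stream that targets Redshift takes care of the final load on your behalf.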

Even though this method requires the user to implement custom functions, it provides unlimited scope for transforming the data before writing to Redshift. 

Conclusion

The article provided you with 4 different methods that you can use to copy data from DynamoDB to Redshift. Since DynamoDB is usually used as a transactional database and Redshift as a data warehouse, the need to copy data from DynamoDB is very common.

If you’re interested in learning about the differences between the two, take a look at the article: Amazon Redshift vs. DynamoDB.

Depending on whether the use case demands a one-time copy or a continuous sync, one of the above methods can be chosen. Method 2 and Method 3 are simple to implement but come with multiple limitations. Moreover, they are suitable only for one-time data transfer between DynamoDB and Redshift. The method using DynamoDB Streams is suitable for real-time data transfer, but a large number of configuration parameters and intricate details have to be considered for its successful implementation.

Hevo Data provides an Automated No-code Data Pipeline that empowers you to overcome the above-mentioned limitations. You can leverage Hevo to seamlessly transfer data from DynamoDB to Redshift in real-time without writing a single line of code.


Want to take Hevo for a spin? Sign up for a 14-day free trial and experience the feature-rich Hevo suite firsthand. Check out the Hevo pricing to choose the best plan for you.

Share your experience of copying data from DynamoDB to Redshift in the comment section below!

Principal Frontend Engineer, Hevo Data

With over a decade of experience, Suraj has played a crucial role in architecting and developing core frontend modules for Hevo. His expertise lies in building scalable UI solutions, collaborating across teams, and contributing to the open-source community, showcasing a deep commitment to innovation in the tech industry.
