DynamoDB Streams: How To Sync Data Conveniently In Real-Time

on Data Integration, Tutorials • September 23rd, 2021

When you want data to be continuously available for use and processing, periodic data capture and movement is not enough; that is when you need a real-time data sync between systems. This can be accomplished with the help of DynamoDB Streams to set up a data streaming pipeline.

This article is all about how you can sync data in real-time using DynamoDB Streams. You will understand the entire infrastructure required to set up the sync between source and destination, and you will go through the process step-by-step with examples.

Here is how this article is structured:

  • What Are DynamoDB Streams?
  • What Is The Lambda Function?
  • What Is CloudWatch?
  • Prerequisites
  • Set-Up For Real-Time Data Sync
  • Benefits Of DynamoDB Streams
  • Conclusion

Let’s take a look at this in detail.

What Are DynamoDB Streams?

The DynamoDB database system originated from the principles of Dynamo, a progenitor of NoSQL, and brings the power of the cloud to the NoSQL database world. It falls under the category of non-relational databases.

Streams are a feature of DynamoDB that emits events when record modifications occur on a DynamoDB table. The events can be of type insert, update, or remove, and they carry the content of the rows being modified. You can customize what the stream captures: for example, you can configure it to include the record both before and after the event so that the change to the data is identifiable. The events appear in the stream in the same sequential order as the modifications happen.

What Is The Lambda Function?

AWS Lambda is a service that provides a way to execute your code without having to manage servers. This service can be hooked up with any type of application or backend.

What Is CloudWatch?

CloudWatch is a monitoring service that provides performance and operational data and actionable insights, in the form of logs and metrics, for AWS, hybrid, and on-premises applications and infrastructure resources.

Simplify Data Analysis with Hevo’s No-code Data Pipeline

Hevo Data, a No-code Data Pipeline, helps you load data from any data source such as Twilio, Databases, SaaS applications, Cloud Storage, SDKs, and Streaming Services, and simplifies the ETL process. It supports 100+ data sources (including 30+ free data sources) like Asana, and setup is a 3-step process: select the data source, provide valid credentials, and choose the destination. Hevo not only loads the data onto the desired Data Warehouse/destination but also enriches the data and transforms it into an analysis-ready form, all without your having to write a single line of code.

GET STARTED WITH HEVO FOR FREE

Its completely automated pipeline delivers data in real-time from source to destination without any loss. Its fault-tolerant and scalable architecture ensures that the data is handled in a secure, consistent manner with zero data loss, and it supports different forms of data. The solutions provided are consistent and work with different BI tools as well.

Check out why Hevo is the Best:

  • Secure: Hevo has a fault-tolerant architecture that ensures that the data is handled in a secure, consistent manner with zero data loss.
  • Schema Management: Hevo takes away the tedious task of schema management & automatically detects the schema of incoming data and maps it to the destination schema.
  • Minimal Learning: Hevo, with its simple and interactive UI, is extremely simple for new customers to work on and perform operations.
  • Hevo Is Built To Scale: As the number of sources and the volume of your data grows, Hevo scales horizontally, handling millions of records per minute with very little latency.
  • Incremental Data Load: Hevo allows the transfer of data that has been modified in real-time. This ensures efficient utilization of bandwidth on both ends.
  • Live Support: The Hevo team is available round the clock to extend exceptional support to its customers through chat, email, and support calls.
  • Live Monitoring: Hevo allows you to monitor the data flow and check where your data is at a particular point in time.
SIGN UP HERE FOR A 14-DAY FREE TRIAL

Prerequisites

It would be helpful to have an idea about the following areas before you learn about real-time data sync using DynamoDB Streams.

  1. DynamoDB as a source system.
  2. AWS Lambda function.
  3. Any other system, such as CloudWatch, Elasticsearch, RDS, MySQL, or DynamoDB itself, as a destination. In this article, you will use CloudWatch as the destination system where the data will be synced.
  4. Knowledge about Node.js.
  5. The AWS CLI, which you will use to run commands.

Set-Up For Real-Time Data Sync

Let’s first take a quick look at how the data will flow and the systems involved.

Whenever a CRUD operation happens on the DynamoDB table, the change is recorded as an event in the DynamoDB Stream. The Lambda function then connects to the other system and sends the data to that destination system accordingly.

Process for Real-Time Data Sync with DynamoDB Streams

Below are the high-level details of the setup, without too many technicalities, so you can understand it at a glance.

  • Enable the DynamoDB Stream in the DynamoDB Console.
  • Read the change events that occur on the table in real-time. Every time an insertion happens, you get an event.
  • Hook up a Lambda function to the DynamoDB Stream. Every time an event occurs, the Lambda function is invoked.
  • The Lambda function’s argument is the content of the change that occurred.
  • Replicate the content in the destination datastore. (The other system could be CloudWatch, DynamoDB, SNS, Kinesis, etc., depending on the use case.)

Now let’s dig into each of these steps, with pieces of code, so that you can understand them in detail.

  1. Create IAM Role
  2. Create AWS Lambda Function
  3. Run Lambda Create-Function
  4. Create DynamoDB Table
  5. Enable DynamoDB Streams
  6. Event Mapping Of Lambda Function
  7. Set CRUD Operations
  8. Sync Data In Real-Time

1. Create IAM Role

To execute the Lambda function, you will need an IAM role. Let’s first create an IAM role for the AWS Lambda entity in the AWS Management Console.

IAM role for the AWS Lambda entity:

Role name: lambda-dynamodb-role

Policy with permission: AWSLambdaDynamoDBExecutionRole
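
If you prefer the CLI over the console, a rough sketch of the equivalent steps would be the following two commands (trust-policy.json is a local file name assumed here for illustration):

# Create the role that Lambda will assume (trust-policy.json is a file you create locally)
aws iam create-role --role-name lambda-dynamodb-role --assume-role-policy-document file://trust-policy.json

# Attach the managed policy that grants DynamoDB Streams and CloudWatch Logs permissions
aws iam attach-role-policy --role-name lambda-dynamodb-role --policy-arn arn:aws:iam::aws:policy/service-role/AWSLambdaDynamoDBExecutionRole

Here, trust-policy.json allows the Lambda service to assume the role:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "lambda.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}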

[Image: DynamoDB Streams: Create IAM Role]

2. Create AWS Lambda Function

Now you will create an AWS Lambda function, using Node.js to write the code. Node.js is one of the languages supported by AWS Lambda.

Please note that you can use any other language supported by Lambda.

Let’s create index.js with the following piece of code and then compress it with any compression utility you have or want to use.

console.log('Loading function');

// Invoked with a batch of change records from the DynamoDB Stream.
exports.handler = function(event, context, callback) {
    // Log the full event payload for inspection in CloudWatch.
    console.log(JSON.stringify(event, null, 2));
    event.Records.forEach(function(record) {
        console.log(record.eventID);   // unique ID of this change event
        console.log(record.eventName); // INSERT, MODIFY, or REMOVE
        console.log('DynamoDB Record: %j', record.dynamodb); // keys and item images
    });
    callback(null, "message"); // report success back to Lambda
};

Here, you will compress it using the Java jar command. This is the command you will run in the CLI tool:

jar -cfM function.zip index.js

If you look at the command closely: c creates a new archive, f specifies the archive file name (function.zip), and M skips creating a manifest file; index.js is the file being compressed.
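
Alternatively, if the zip utility is installed on your machine, the same archive can be created with:

# equivalent to the jar command above
zip function.zip index.js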

[Image: DynamoDB Streams: AWS Lambda Function Process]

3. Run Lambda Create-Function

You will now run the Lambda create-function CLI command with the IAM role ARN:

aws lambda create-function --function-name ProcessDynamoDBRecords --zip-file fileb://function.zip --handler index.handler --runtime nodejs8.10 --role arn:aws:iam::392821317968:role/lambda-dynamodb-role

The role ARN needs to be taken from the role that you created in the AWS console.
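
If you would rather fetch the ARN from the CLI than copy it from the console, something like this should work:

# prints just the role ARN
aws iam get-role --role-name lambda-dynamodb-role --query Role.Arn --output text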

[Image: DynamoDB Streams: Run Lambda Create-Function]

Once you run the command, the Lambda function will be created. You can then test it by invoking the function. Here is the command for the same:

aws lambda invoke --function-name ProcessDynamoDBRecords --payload file://input.txt outputfile.txt

This command takes the sample DynamoDB event data in input.txt, passes it to the Lambda function, and writes the function’s output to outputfile.txt. This step is just for testing, to ensure that your function works as expected; it has no relation to the real-time data sync covered in this blog.
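
As a rough sketch of what input.txt might contain, here is a minimal payload modeled on the DynamoDB Streams event format (the message attribute and all values are purely illustrative):

{
  "Records": [
    {
      "eventID": "1",
      "eventName": "INSERT",
      "eventVersion": "1.0",
      "eventSource": "aws:dynamodb",
      "awsRegion": "ap-south-1",
      "dynamodb": {
        "Keys": { "id": { "N": "101" } },
        "NewImage": { "id": { "N": "101" }, "message": { "S": "New item!" } },
        "SequenceNumber": "111",
        "SizeBytes": 26,
        "StreamViewType": "NEW_AND_OLD_IMAGES"
      }
    }
  ]
}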

4. Create DynamoDB Table

The next step is to create a DynamoDB table with the following details:

Table name: lambda-dynamodb-stream

Primary key: id (number)
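
If you prefer the CLI, a roughly equivalent command would be the one below (on-demand billing is an assumption here; choose whichever billing mode suits your use case):

# create the table with a numeric partition key named id
aws dynamodb create-table \
    --table-name lambda-dynamodb-stream \
    --attribute-definitions AttributeName=id,AttributeType=N \
    --key-schema AttributeName=id,KeyType=HASH \
    --billing-mode PAY_PER_REQUEST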

[Image: DynamoDB Streams: Create DynamoDB Table]

5. Enable DynamoDB Streams

Now enable the DynamoDB Stream as shown below:

[Image: Enabling DynamoDB Streams]

Once the stream is enabled by clicking on the “Manage Stream” button, copy the Latest Stream ARN as shown in the screenshot:

[Image: DynamoDB Streams: Enable DynamoDB Streams]
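
As a hedged CLI alternative to the console steps above, the stream can also be enabled with update-table, and the Latest Stream ARN printed with describe-table (the NEW_AND_OLD_IMAGES view type is an assumption; pick the view type your use case needs):

# enable the stream with both old and new item images
aws dynamodb update-table \
    --table-name lambda-dynamodb-stream \
    --stream-specification StreamEnabled=true,StreamViewType=NEW_AND_OLD_IMAGES

# print the Latest Stream ARN needed for the next step
aws dynamodb describe-table --table-name lambda-dynamodb-stream --query Table.LatestStreamArn --output text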

6. Event Mapping Of Lambda Function

The Latest Stream ARN copied in the previous step will be used to create the event source mapping between the Lambda function and the stream.

aws lambda create-event-source-mapping --function-name ProcessDynamoDBRecords --batch-size 100 --starting-position LATEST --event-source-arn arn:aws:dynamodb:ap-south-1:392821317968:table/lambda-dynamodb-stream/stream/2019-06-29T12:39:47.680

This means that whenever an event happens on the table, the Lambda function is triggered, and its execution can then sync the data to the destination system.

Verify the list of event source mappings by executing the following commands:

aws lambda list-event-source-mappings
aws lambda list-event-source-mappings --function-name ProcessDynamoDBRecords

7. Set CRUD Operations

Now you have everything set up. It’s time to try some CRUD operations on the DynamoDB table.
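
If you want to try this from the CLI rather than the console, here is a rough sketch (the message attribute is purely illustrative):

# insert an item; this should produce an insert event on the stream
aws dynamodb put-item \
    --table-name lambda-dynamodb-stream \
    --item '{"id": {"N": "1"}, "message": {"S": "Hello from DynamoDB Streams"}}'

# delete the same item; this should produce a remove event
aws dynamodb delete-item \
    --table-name lambda-dynamodb-stream \
    --key '{"id": {"N": "1"}}'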

See whether the data logs are getting captured in the AWS CloudWatch dashboard, which is made available for monitoring and insights.

[Image: DynamoDB Streams: Set CRUD Operations]

8. Sync Data In Real-Time

You will see that the data inserted into the DynamoDB table is synced in real-time to the CloudWatch logs, as shown in the screenshot below:

[Image: DynamoDB Streams: Sync Data In Real-Time]
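
If you prefer the CLI for this check as well, Lambda writes its console.log output to a CloudWatch log group named after the function, so a quick, hedged way to pull the recent events is:

# fetch recent log events emitted by the function
aws logs filter-log-events --log-group-name /aws/lambda/ProcessDynamoDBRecords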

Benefits Of DynamoDB Streams

Let’s look at some benefits of DynamoDB Streams:

  1. Well suited for distributed systems infrastructure.
  2. Faster and more efficient processing.
  3. Better availability and scalability.
  4. Enhanced performance.

Conclusion

With the above benefits, DynamoDB Streams stands out for efficiency and performance. However, you need to write custom code to manage the data changes and to keep the data warehousing engines in sync whenever business requirements change. So, if you don’t want to deal with the hassle of setting up code, try Hevo.

VISIT OUR WEBSITE TO EXPLORE HEVO

Hevo is a No-code Data Pipeline. It can migrate the data from DynamoDB to your desired data warehouse in minutes. It offers hassle-free data integration from 100+ data sources.

SIGN UP for a 14-day free trial and see the difference!

Let us know about your experience with DynamoDB Streams in the comment section below.
