Using DynamoDB Streams Lambda Made Easy

Amazon Web Services (AWS) offers multiple services that can solve some of the biggest data problems. DynamoDB is a fully managed NoSQL database service offered by AWS that supports key-value and document data structures. A DynamoDB Stream is a stream of observed data changes to a table, a pattern often known as Change Data Capture (CDC). DynamoDB Streams are most commonly consumed with AWS Lambda, and this article will help you work with DynamoDB Streams and Lambda together.

DynamoDB Streams capture a time-ordered sequence of item-level changes in a DynamoDB table and store the data for up to 24 hours. The combination of DynamoDB Streams and AWS Lambda unlocks many powerful design patterns. This exercise will show you how to use a Lambda function to consume events from an Amazon DynamoDB Stream. But before getting started with DynamoDB Streams Lambda, let’s briefly discuss both services.

What is DynamoDB Stream?

A DynamoDB Stream is a time-ordered sequence of events that reflects the actual sequence of operations on a DynamoDB table in near real time. As in Change Data Capture (CDC), a stream comprises Insert, Update, and Delete events. Each record is assigned a unique sequence number that is used to order the records. Stream records are organized into shards, with each shard storing the information needed to retrieve and iterate through its records.

A DynamoDB Stream stores the events in a log for 24 hours, and this retention period cannot be extended. DynamoDB Streams automatically scale the number of shards based on traffic using the built-in partition auto-scaling functionality. The pricing model for DynamoDB Streams charges you only for the number of read requests. Interestingly, when Lambda processes DynamoDB Stream events, those read requests are free. However, you still need to pay for the table’s read and write throughput units.

What is AWS Lambda?

AWS Lambda is a serverless computing service offered by Amazon that allows users to run applications or execute code without managing servers. Launched in 2014, AWS Lambda reduces the effort, resources, and expense associated with buying, maintaining, and replacing infrastructure and hardware. This, in turn, improves the efficiency of cloud-based operations.

A Lambda function allows users to run application code or virtually any backend service with zero administration. Because Lambda executes code on demand, it can scale automatically from a few requests per day to thousands of requests per second. Lambda functions can also be triggered from over 200 AWS services and Software-as-a-Service (SaaS) applications. The AWS Lambda pricing model allows users to pay only for the compute they use.

Let’s now discuss how to use DynamoDB Streams Lambda.

How to Use DynamoDB Streams with Lambda?

This section will help you create a Lambda function to consume events from an Amazon DynamoDB Stream. Follow the below-mentioned steps to get started with DynamoDB Streams Lambda.

Prerequisites for DynamoDB Streams Lambda

This exercise requires you to have a basic knowledge of Lambda operations and the Lambda console. To get familiar with Lambda, you can create a Lambda function with the console following the given instructions.

Create Execution Role

The first step of DynamoDB Streams Lambda requires you to create the execution role that allows your function to access AWS resources. Follow the given instructions to create an execution role.

  • Go to the “Roles” page in the IAM Console.
  • Select “Create role”.
  • Now create an execution role with the given properties.
    • Trusted entity: Lambda.
    • Permissions: AWSLambdaDynamoDBExecutionRole.
    • Role name: lambda-dynamodb-role.

The AWSLambdaDynamoDBExecutionRole managed policy provides the permissions that your Lambda function needs to read records from the DynamoDB Stream and write logs to CloudWatch Logs.
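
To make the “Trusted entity: Lambda” setting more concrete, the sketch below builds the kind of trust policy such a role typically carries and prints it as JSON. It is an illustration rather than output copied from the console, and the variable names are assumptions introduced here, not part of the walkthrough.

```javascript
// Trust policy that lets the Lambda service assume the execution role.
// Assumption: this mirrors what the console generates for "Trusted entity: Lambda".
const trustPolicy = {
  Version: '2012-10-17',
  Statement: [
    {
      Effect: 'Allow',
      Principal: { Service: 'lambda.amazonaws.com' },
      Action: 'sts:AssumeRole',
    },
  ],
};

// ARN of the AWS managed policy selected under "Permissions".
const managedPolicyArn =
  'arn:aws:iam::aws:policy/service-role/AWSLambdaDynamoDBExecutionRole';

// Print both so they can be compared with what the IAM Console shows.
console.log(JSON.stringify(trustPolicy, null, 2));
console.log('Attached policy:', managedPolicyArn);
```

After creating the role, you can compare this document with the “Trust relationships” tab of lambda-dynamodb-role in the IAM Console.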

Create the Lambda Function

To create the Lambda function, you’ll need a sample code. This sample code receives a DynamoDB event and processes the messages stored in it. The code also writes some of the incoming event data to CloudWatch Logs.

console.log('Loading function');
exports.handler = function(event, context, callback) {
    console.log(JSON.stringify(event, null, 2));
    event.Records.forEach(function(record) {
        console.log(record.eventID);
        console.log(record.eventName);
        console.log('DynamoDB Record: %j', record.dynamodb);
    });
    callback(null, "message");
};

Follow the given steps to create the Lambda function.

  • You need to copy the sample code above into a file named index.js.
  • Next, run the following command to create a deployment package.
zip function.zip index.js
  • Use the create-function command to create a Lambda function. Replace 123456789012 in the role ARN with your own AWS account ID.
aws lambda create-function --function-name ProcessDynamoDBRecords \
--zip-file fileb://function.zip --handler index.handler --runtime nodejs22.x \
--role arn:aws:iam::123456789012:role/lambda-dynamodb-role

Test the Lambda Function

This step requires you to invoke your Lambda function. You can do this manually by using the invoke Lambda CLI command. For this demonstration, save the following sample DynamoDB event in a file named input.txt.

{
   "Records":[
      {
         "eventID":"1",
         "eventName":"INSERT",
         "eventVersion":"1.0",
         "eventSource":"aws:dynamodb",
         "awsRegion":"us-east-1",
         "dynamodb":{
            "Keys":{
               "Id":{
                  "N":"101"
               }
            },
            "NewImage":{
               "Message":{
                  "S":"New item!"
               },
               "Id":{
                  "N":"101"
               }
            },
            "SequenceNumber":"111",
            "SizeBytes":26,
            "StreamViewType":"NEW_AND_OLD_IMAGES"
         },
         "eventSourceARN":"stream-ARN"
      },
      {
         "eventID":"2",
         "eventName":"MODIFY",
         "eventVersion":"1.0",
         "eventSource":"aws:dynamodb",
         "awsRegion":"us-east-1",
         "dynamodb":{
            "Keys":{
               "Id":{
                  "N":"101"
               }
            },
            "NewImage":{
               "Message":{
                  "S":"This item has changed"
               },
               "Id":{
                  "N":"101"
               }
            },
            "OldImage":{
               "Message":{
                  "S":"New item!"
               },
               "Id":{
                  "N":"101"
               }
            },
            "SequenceNumber":"222",
            "SizeBytes":59,
            "StreamViewType":"NEW_AND_OLD_IMAGES"
         },
         "eventSourceARN":"stream-ARN"
      },
      {
         "eventID":"3",
         "eventName":"REMOVE",
         "eventVersion":"1.0",
         "eventSource":"aws:dynamodb",
         "awsRegion":"us-east-1",
         "dynamodb":{
            "Keys":{
               "Id":{
                  "N":"101"
               }
            },
            "OldImage":{
               "Message":{
                  "S":"This item has changed"
               },
               "Id":{
                  "N":"101"
               }
            },
            "SequenceNumber":"333",
            "SizeBytes":38,
            "StreamViewType":"NEW_AND_OLD_IMAGES"
         },
         "eventSourceARN":"stream-ARN"
      }
   ]
}

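Run the invoke command as shown below. If you are using AWS CLI version 2, also add the --cli-binary-format raw-in-base64-out option so that the payload file is read as raw JSON.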

aws lambda invoke --function-name ProcessDynamoDBRecords --payload file://input.txt outputfile.txt

The function returns the string “message” as the response. You can verify the output in the outputfile.txt file.
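
Before deploying, a similar check can be approximated locally with plain Node.js. The sketch below is only a stand-in for `aws lambda invoke`, not a replacement for it. It repeats a trimmed copy of the handler so that the file runs on its own, and `invokeLocally` is a hypothetical helper, not part of any AWS tooling.

```javascript
// Trimmed copy of the handler from index.js, repeated so this file runs alone.
const handler = function (event, context, callback) {
  event.Records.forEach(function (record) {
    console.log(record.eventID, record.eventName);
  });
  callback(null, 'message');
};

// Hypothetical helper: calls a callback-style handler and returns a Promise.
// Assumption: an empty object is enough as the Lambda context for this handler.
function invokeLocally(fn, event) {
  return new Promise(function (resolve, reject) {
    fn(event, {}, function (err, result) {
      if (err) reject(err);
      else resolve(result);
    });
  });
}

// One-record event in the same shape as the first entry of input.txt.
const sampleEvent = {
  Records: [
    {
      eventID: '1',
      eventName: 'INSERT',
      eventSource: 'aws:dynamodb',
      dynamodb: {
        Keys: { Id: { N: '101' } },
        NewImage: { Message: { S: 'New item!' }, Id: { N: '101' } },
        SequenceNumber: '111',
        StreamViewType: 'NEW_AND_OLD_IMAGES',
      },
    },
  ],
};

invokeLocally(handler, sampleEvent).then(function (result) {
  console.log('Response:', result);
});
```

A local run like this only exercises the handler logic. Permissions, the runtime version, and the deployment package are still verified by the CLI invocation above.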

Create a DynamoDB Table with Stream Enabled

Here, you’ll be creating a DynamoDB table with a stream enabled. Follow the given steps to create a DynamoDB table.

  • Go to the DynamoDB Console and click on “Create table”.
  • Now, create a table and configure it as shown below.
    • Table name: lambda-dynamodb-stream
    • Primary key: id (string)
  • Once you’re done with the configuration, click on “Create”.

Follow the given steps to enable streams.

  • Go to the DynamoDB Console and click on “Tables”.
  • Select the “lambda-dynamodb-stream” table.
  • Click on “DynamoDB stream details” located under “Exports and streams”.
  • Click on “Enable”, choose a view type (“New and old images” matches the sample event used earlier), and then select “Enable stream”.
  • Make a note of the Stream ARN as you’ll need this in the next step while associating the stream with your Lambda function.

Add an Event Source in AWS Lambda

This step requires you to add an event source mapping in AWS Lambda. This event source mapping ensures that the DynamoDB Stream is associated with your Lambda function. 

Run the given AWS CLI create-event-source-mapping command to add an event source mapping.

aws lambda create-event-source-mapping --function-name ProcessDynamoDBRecords \
 --batch-size 100 --starting-position LATEST --event-source-arn DynamoDB-stream-arn

Replace DynamoDB-stream-arn with the Stream ARN you noted in the previous step. Make a note of the UUID in the command output, as you will need it to refer to this event source mapping in later commands.

The above command creates a mapping between the specified DynamoDB Stream and the Lambda function created earlier. AWS Lambda will begin polling the stream after you set up this event source mapping.
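
With a batch size of 100, a single invocation can receive up to 100 records, so it helps to decide what happens when one of them fails. The sketch below shows one possible approach and is not part of the setup above. It reports the first failed record using Lambda’s partial batch response format. It assumes the mapping was also created with `--function-response-types ReportBatchItemFailures`, which the command above does not set, and `processRecord` is a hypothetical placeholder for real work.

```javascript
// Hypothetical per-record work. It rejects event names it does not know,
// purely to give the sketch a failure path.
function processRecord(record) {
  const known = ['INSERT', 'MODIFY', 'REMOVE'];
  if (!known.includes(record.eventName)) {
    throw new Error('Unexpected event name: ' + record.eventName);
  }
  console.log('Processed', record.eventName, record.dynamodb.SequenceNumber);
}

// Processes records in order and stops at the first failure, reporting that
// record's sequence number so the batch can be retried from that point.
// Assumption: ReportBatchItemFailures is enabled on the event source mapping.
function processBatch(records) {
  for (const record of records) {
    try {
      processRecord(record);
    } catch (err) {
      console.error('Failed record', record.eventID, err.message);
      return {
        batchItemFailures: [{ itemIdentifier: record.dynamodb.SequenceNumber }],
      };
    }
  }
  // An empty list tells Lambda that the whole batch succeeded.
  return { batchItemFailures: [] };
}

exports.handler = async (event) => processBatch(event.Records);
```

Stopping at the first failure keeps records in order, which matters for a stream of changes to the same item. Without the response type enabled, a thrown error causes the whole batch to be retried instead.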

Run the given command to get the list of event source mappings.

aws lambda list-event-source-mappings

The list returns all of the event source mappings you created. This will also display the LastProcessingResult for each mapping.

Test the Setup

You can perform a set of tasks in order to test the end-to-end experience.

  • You can add, update, and delete items from the table using the DynamoDB Console. DynamoDB will write records of these operations to the stream.
  • AWS Lambda polls the stream and invokes your Lambda function whenever it detects new records in the stream, passing the records to the function as event data.
  • You can also verify the logs reported by your function in the Amazon CloudWatch Console.

Clean Up Resources

The final step of DynamoDB Streams Lambda is the cleaning up of resources that are no longer in use to prevent unnecessary charges to your AWS Account.

Follow the given steps to delete the Lambda function.

  • Go to the “Functions” page in the Lambda Console and choose the function you created.
  • Click on “Actions” and then select “Delete”.

Follow the given steps to delete the execution role.

  • Go to the “Roles” page in the IAM Console.
  • Select the execution role you created and click on “Delete role”.
  • Click on “Yes, delete” to complete the process.

Follow the given steps to delete the DynamoDB Table.

  • Go to the “Tables” page in the DynamoDB Console.
  • Select the table you created and click on “Delete”.
  • Enter “delete” in the text box and again click on “Delete”.

That’s it, you’ve successfully implemented DynamoDB Streams Lambda to consume events from a DynamoDB Stream.

Why Use DynamoDB Streams Lambda Together?

Using DynamoDB Streams and Lambda together unlocks many powerful design patterns and functionality. Lambda polls the DynamoDB Stream and runs your function in response to stream events (an item being inserted, updated, or deleted). It invokes your function as soon as it detects new records and passes in one or more events. You can use this combination for various use cases. For instance, you can replicate items from one DynamoDB table to another by using DynamoDB Streams and Lambda functions.
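
The replication use case can be sketched as a pure transformation from stream records to write requests for a second table. This is a minimal sketch under stated assumptions, not a full replication function. The helper names `keyId`, `toWriteRequests`, and `chunk` are hypothetical, and the stream is assumed to use a view type that includes the new image. Within a batch, only the last change per key is kept, because a single batch write cannot contain two operations on the same item.

```javascript
// Hypothetical helper: stable string form of a record's primary key,
// with attribute names sorted so that key order does not matter.
function keyId(keys) {
  const sorted = Object.entries(keys).sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
  return JSON.stringify(sorted);
}

// Turns stream records into BatchWriteItem-style write requests.
// Later records for the same key replace earlier ones (last change wins).
// Assumption: INSERT and MODIFY records carry a NewImage.
function toWriteRequests(records) {
  const latest = new Map();
  for (const record of records) {
    const { Keys, NewImage } = record.dynamodb;
    const request =
      record.eventName === 'REMOVE'
        ? { DeleteRequest: { Key: Keys } }
        : { PutRequest: { Item: NewImage } };
    latest.set(keyId(Keys), request);
  }
  return Array.from(latest.values());
}

// Splits requests into groups of at most 25, the BatchWriteItem limit.
function chunk(items, size = 25) {
  const groups = [];
  for (let i = 0; i < items.length; i += size) {
    groups.push(items.slice(i, i + size));
  }
  return groups;
}
```

Each group could then be sent to the target table as the `RequestItems` value of a `BatchWriteItem` call. The target table name, retry handling for unprocessed items, and the extra write permissions the execution role would need are left out of this sketch.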

Lambda scales automatically with throughput, and its maximum execution time is 900 seconds (15 minutes) per invocation. It fetches records from the stream and processes them in batches, up to the batch size specified by the user. Additionally, Lambda is a fully managed and highly available service. It doesn’t require maintenance windows or downtime, so it’s always up and running.

Conclusion

DynamoDB is a fully managed NoSQL database service offered by AWS. DynamoDB Streams captures a time-ordered sequence of changes to any DynamoDB table and stores it in a log for up to 24 hours. AWS Lambda polls the stream and invokes your function after detecting new records.

This article introduced you to DynamoDB Streams and AWS Lambda and then walked you through using them together. However, in businesses, extracting complex data from a diverse set of data sources can be a challenging task, and this is where Hevo saves the day!

Hevo Data, with its strong integration with 100+ sources such as DynamoDB and with BI tools, allows you to not only export data from multiple sources and load it to your destinations, but also transform and enrich your data and make it analysis-ready, so that you can focus on your key business needs and perform insightful analysis using BI tools.

Give Hevo Data a try and sign up for a 14-day free trial today. Hevo offers plans and pricing for different use cases and business needs, so check them out!

Share your experience of working with DynamoDB Streams Lambda in the comments section below.

Raj Verma
Business Analyst, Hevo Data

Raj, a data analyst with a knack for storytelling, empowers businesses with actionable insights. His experience, from Research Analyst at Hevo to Senior Executive at Disney+ Hotstar, translates complex marketing data into strategies that drive growth. Raj's Master's degree in Design Engineering fuels his problem-solving approach to data analysis.
