Connecting DynamoDB to S3 using AWS Glue: 2 Easy Steps

Tutorial, Data Integration, Database, ETL • August 18th, 2021

Are you trying to derive deeper insights from your Amazon DynamoDB data by moving it into Amazon S3, a scalable, low-cost object store? Well, you have landed on the right article. AWS Glue makes it straightforward to replicate data from DynamoDB to S3.

This article gives a brief overview of Amazon DynamoDB, Amazon S3, and AWS Glue, and then shows how to connect DynamoDB to S3 using AWS Glue. Later sections discuss the advantages and disadvantages of this method. Read along to decide whether this way of connecting DynamoDB to S3 is right for you.

Prerequisites

You will have a much easier time following the steps to connect DynamoDB to S3 using AWS Glue if you have the following:

  • An active AWS account.
  • Working knowledge of databases.
  • A clear idea of the type of data to be transferred.

Introduction to Amazon DynamoDB

Amazon DynamoDB is a document and key-value database with single-digit millisecond response times. It is a fully managed, multi-region, multi-active, and durable database for internet-scale applications, with built-in security, in-memory caching, and backup and restore. Companies like Airbnb, Toyota, Samsung, Lyft, and Capital One rely on DynamoDB's performance and scalability for mission-critical workloads.

Some of the use cases of DynamoDB include:

  1. DynamoDB is heavily used in e-commerce because it stores data as key-value pairs and serves lookups with low latency.
  2. Due to its low latency, DynamoDB is also used as the backend for serverless web applications.

Introduction to Amazon S3

Amazon Simple Storage Service (Amazon S3) is a web-based cloud object storage service that provides high-speed, scalable storage. It is widely used to back up and archive applications and data on Amazon Web Services.

Amazon S3 is a very useful tool since it allows users to store and retrieve any amount of data from anywhere on the internet at any time. This can be done through the AWS Management Console, which has a user-friendly web interface, as well as through the AWS CLI, SDKs, and REST API. AWS describes S3 as giving developers access to the same highly scalable storage infrastructure that Amazon uses to run its own global network of websites. The popularity of S3 continues to grow.

Some of the use cases of Amazon S3 include:

  1. Because S3 is cost-effective, it can serve as backup storage for both transient/raw and permanent data.
  2. S3 can be used to build a data lake, acting as a central data repository for analytics.
  3. S3 can store datasets for machine learning, data profiling, and similar workloads.

Introduction to AWS Glue

AWS Glue is a fully managed, serverless ETL (extract, transform, load) service. Because it is serverless, you do not have to worry about configuring and managing your resources. Its crawlers can scan a data store such as a DynamoDB table and record the table's schema in the AWS Glue Data Catalog, which ETL jobs can then use as a source.

Simplify Integrations Using Hevo’s No-code Data Pipeline

Hevo Data helps you directly transfer data from Amazon DynamoDB and 100+ data sources (including 30+ free sources) to Business Intelligence tools, Data Warehouses, or a destination of your choice in a completely hassle-free & automated manner. Hevo is fully managed and completely automates the process of not only loading data from your desired source but also enriching the data and transforming it into an analysis-ready form without having to write a single line of code. Its fault-tolerant architecture ensures that the data is handled in a secure, consistent manner with zero data loss.

Hevo takes care of the data preprocessing needed to set up the integration and lets you focus on key business activities and draw more powerful insights on how to generate more leads, retain customers, and take your business to new heights of profitability. It provides a consistent & reliable solution to manage data in real time and always have analysis-ready data in your desired destination.

Check out what makes Hevo amazing:

  • Real-Time Data Transfer: With its strong integration with 100+ sources (including 30+ free sources), Hevo allows you to transfer data quickly & efficiently, making efficient use of bandwidth on both ends.
  • Data Transformation: It provides a simple interface to perfect, modify, and enrich the data you want to transfer. 
  • Secure: Hevo has a fault-tolerant architecture that ensures that the data is handled in a secure, consistent manner with zero data loss.
  • Tremendous Connector Availability: Hevo houses a large variety of connectors and lets you bring in data from numerous Marketing & SaaS applications, databases, etc. such as HubSpot, Marketo, MongoDB, Oracle, Salesforce, Redshift, etc. in an integrated and analysis-ready form.
  • Simplicity: Using Hevo is easy and intuitive, ensuring that your data is exported in just a few clicks. 
  • Completely Managed Platform: Hevo is fully managed. You need not invest time and effort to maintain or monitor the infrastructure involved in executing codes.
  • Live Support: The Hevo team is available round the clock to extend exceptional support to its customers through chat, email, and support calls.

Steps to Connect DynamoDB to S3 using AWS Glue

This blog post details the steps to move data from DynamoDB to S3 using AWS Glue. This method requires you to invest precious engineering time and effort to understand both DynamoDB and S3 and then piece the infrastructure together bit by bit, which can be fairly time-consuming.

Now, let us export data from DynamoDB to S3 using AWS Glue. The export is done in two major steps:

Step 1: Create a Crawler

The first step in connecting DynamoDB to S3 using AWS Glue is to create a crawler. Follow the steps below to create one:

  • Create a database named DynamoDB in the AWS Glue Data Catalog to hold the table metadata.
Creating DynamoDB Database
  • Pick the table CompanyEmployeeList from the tables drop-down list.
Adding Table to DynamoDB
  • Let the crawler create the table metadata. Set up the crawler details, using dynamodb_crawler as the crawler name.
Adding Crawler
  • Add the database name and the DynamoDB table name.
Adding Crawler
  • Give the crawler an IAM role that can access the DynamoDB table. Here, the role is AWSGlueServiceRole-DynamoDB.
Choosing IAM Role
  • You can schedule the crawler. For this illustration, it runs on demand because the export is a one-time activity.
Creating a Schedule for Crawler
  • Configure the crawler's output by selecting the database created earlier.
Configuring Crawler's Output
  • Review the crawler information.
Reviewing Crawler Information
  • Run the crawler.
Running Crawler
  • Once the crawler runs successfully, check the catalog details.
Catalog Details
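The console steps above can also be scripted. Below is a minimal sketch using the AWS SDK for Python (boto3), under these assumptions: the catalog database is named dynamodb (lowercase), the region is us-east-1, and the crawler is polled every 15 seconds. The crawler name, role, and table name come from the walkthrough. The request parameters are built by a separate helper so they can be checked without AWS credentials.

```python
"""Sketch: create and run a Glue crawler over a DynamoDB table with boto3.

Crawler, role, and table names mirror the walkthrough; the database name,
region, and polling interval are assumptions.
"""
import time

DATABASE = "dynamodb"                  # assumed Data Catalog database name
CRAWLER = "dynamodb_crawler"
ROLE = "AWSGlueServiceRole-DynamoDB"
DYNAMODB_TABLE = "CompanyEmployeeList"


def crawler_request(name, role, database, table, schedule=None):
    """Build keyword arguments for glue.create_crawler()."""
    params = {
        "Name": name,
        "Role": role,
        "DatabaseName": database,
        # For DynamoDB targets, "Path" is the DynamoDB table name.
        "Targets": {"DynamoDBTargets": [{"Path": table}]},
    }
    if schedule:
        # Glue schedules use cron syntax, e.g. "cron(0 2 * * ? *)".
        params["Schedule"] = schedule
    return params


def crawl_succeeded(get_crawler_response):
    """Return True/False once the crawler is idle again, or None while it is busy."""
    crawler = get_crawler_response["Crawler"]
    if crawler["State"] != "READY":
        return None
    return crawler.get("LastCrawl", {}).get("Status") == "SUCCEEDED"


def main(region="us-east-1", poll_seconds=15):
    import boto3  # imported here so the helpers above work without boto3 installed

    glue = boto3.client("glue", region_name=region)
    glue.create_database(DatabaseInput={"Name": DATABASE})
    glue.create_crawler(**crawler_request(CRAWLER, ROLE, DATABASE, DYNAMODB_TABLE))
    glue.start_crawler(Name=CRAWLER)

    # Wait until the crawler is READY again, then inspect the last crawl.
    while True:
        time.sleep(poll_seconds)
        result = crawl_succeeded(glue.get_crawler(Name=CRAWLER))
        if result is not None:
            break
    print("Crawl succeeded" if result else "Crawl did not succeed")

    # List the tables the crawler added to the catalog.
    for table in glue.get_tables(DatabaseName=DATABASE)["TableList"]:
        print(table["Name"])
```

With AWS credentials configured, calling main() creates the database and crawler, starts a crawl, waits for it, and prints the catalog tables. create_database raises an AlreadyExistsException if the database already exists, so skip that call if you created the database in the console.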

Step 2: Exporting Data from DynamoDB to S3 using AWS Glue

Now that the crawler has run, let us create a job to copy data from the DynamoDB table to S3. Here, the job is named dynamodb_s3_gluejob. AWS Glue supports either Python or Scala as the ETL language; for the scope of this article, let us use Python.
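For orientation, a Glue Python job that copies a crawled DynamoDB table to S3 generally has the shape sketched below. This is a hand-written approximation, not the script Glue generates in this walkthrough: the catalog table name companyemployeelist, the column list, the output path s3://your-bucket/dynamodb-export/, and the JSON output format are all hypothetical. The awsglue modules exist only inside the Glue job runtime, so they are imported inside run().

```python
"""Sketch of a Glue ETL script that copies a crawled DynamoDB table to S3.

The catalog table name, columns, output path, and format are hypothetical
placeholders, not values taken from the article.
"""
import sys

CATALOG_DATABASE = "dynamodb"                      # assumed catalog database
CATALOG_TABLE = "companyemployeelist"              # hypothetical crawler-created name
OUTPUT_PATH = "s3://your-bucket/dynamodb-export/"  # hypothetical bucket and prefix

# Hypothetical (name, type) pairs for the CompanyEmployeeList table.
COLUMNS = [("employee_id", "string"), ("name", "string"), ("department", "string")]


def build_mappings(columns):
    """Return ApplyMapping tuples that keep every column name and type unchanged."""
    return [(name, dtype, name, dtype) for name, dtype in columns]


def run():
    # These modules are provided by the AWS Glue job runtime.
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from awsglue.transforms import ApplyMapping
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext

    args = getResolvedOptions(sys.argv, ["JOB_NAME"])
    glue_context = GlueContext(SparkContext.getOrCreate())
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    # Read the DynamoDB data through the catalog table the crawler created.
    source = glue_context.create_dynamic_frame.from_catalog(
        database=CATALOG_DATABASE, table_name=CATALOG_TABLE
    )
    mapped = ApplyMapping.apply(frame=source, mappings=build_mappings(COLUMNS))

    # Write the mapped records to S3 as JSON files.
    glue_context.write_dynamic_frame.from_options(
        frame=mapped,
        connection_type="s3",
        connection_options={"path": OUTPUT_PATH},
        format="json",
    )
    job.commit()
```

Used as a job script, the file would end with a call to run(). Keeping the mapping in a plain helper makes the ApplyMapping tuple format explicit: (source name, source type, target name, target type).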

Job Properties
  • Pick your data source: the table the crawler added to the catalog.
Picking the Data Source
  • Pick your data target: an S3 location.
Picking the Target Data
  • Glue then creates a ready-made mapping between the source and target columns.
Mapping the data
  • Once you review the mapping, Glue automatically generates the Python script for the job.
Review your Mappings
  • Run the Python job.
Python Code Execution
  • Once the job completes successfully, review the logs it generates.
Logs
  • Check the files in the S3 bucket and download them.
Checking Files in the Bucket
  • Review the contents of the file.
Reviewing the Contents of File
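Running the job and checking the bucket can also be done from a script. The sketch below, again using boto3, starts the job named dynamodb_s3_gluejob, polls until the run reaches a terminal state, and lists the output objects. The bucket name, prefix, region, and polling interval are hypothetical placeholders.

```python
"""Sketch: run the Glue job with boto3, wait for it, and list the output files.

The job name comes from the walkthrough; the bucket, prefix, region, and
polling interval are hypothetical placeholders.
"""
import time

JOB_NAME = "dynamodb_s3_gluejob"
BUCKET = "your-bucket"        # hypothetical output bucket
PREFIX = "dynamodb-export/"   # hypothetical output prefix

# Job run states handled here as final (the run will not change again).
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "STOPPED", "TIMEOUT", "ERROR"}


def is_finished(get_job_run_response):
    """Return the final state if the run has ended, otherwise None."""
    state = get_job_run_response["JobRun"]["JobRunState"]
    return state if state in TERMINAL_STATES else None


def output_keys(list_objects_response):
    """Extract object keys from one list_objects_v2 response page."""
    return [obj["Key"] for obj in list_objects_response.get("Contents", [])]


def main(region="us-east-1", poll_seconds=30):
    import boto3  # imported here so the helpers above work without boto3 installed

    glue = boto3.client("glue", region_name=region)
    s3 = boto3.client("s3", region_name=region)

    run_id = glue.start_job_run(JobName=JOB_NAME)["JobRunId"]
    while True:
        time.sleep(poll_seconds)
        state = is_finished(glue.get_job_run(JobName=JOB_NAME, RunId=run_id))
        if state:
            break
    print(f"Run {run_id} finished with state {state}")

    # Only the first page of results (up to 1,000 keys) is listed here.
    response = s3.list_objects_v2(Bucket=BUCKET, Prefix=PREFIX)
    for key in output_keys(response):
        print(key)
```

Call main() once AWS credentials are configured. For large exports, an S3 paginator is a better fit than a single list_objects_v2 call, since each call returns at most 1,000 keys.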

Advantages of Connecting DynamoDB to S3 using AWS Glue

Some of the advantages of connecting DynamoDB to S3 using AWS Glue include:

  1. This approach is fully serverless, so you do not have to worry about provisioning and maintaining your resources.
  2. You can run your own custom Python or Scala code for the ETL.
  3. You can push event notifications to CloudWatch (CloudWatch Events, now Amazon EventBridge).
  4. You can trigger a Lambda function for success or failure notifications.
  5. You can manage your job dependencies using AWS Glue.
  6. AWS Glue is the perfect choice if you want to create a data catalog and query your S3 data with Amazon Redshift Spectrum.
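Advantages 3 and 4 can be combined: AWS Glue emits "Glue Job State Change" events to Amazon EventBridge (formerly CloudWatch Events), and a rule can route them to a Lambda function. The sketch below shows an event pattern for the export job and a minimal handler. The rule name and the message format are hypothetical, and wiring the rule to the function (put_targets plus a Lambda invoke permission) is left out.

```python
"""Sketch: notify on Glue job success or failure via EventBridge and Lambda.

The rule name and message format are hypothetical; the matched fields
(source, detail-type, detail.jobName, detail.state) follow Glue's
"Glue Job State Change" events.
"""
import json

JOB_NAME = "dynamodb_s3_gluejob"

# Event pattern matching final state changes of the export job.
EVENT_PATTERN = {
    "source": ["aws.glue"],
    "detail-type": ["Glue Job State Change"],
    "detail": {
        "jobName": [JOB_NAME],
        "state": ["SUCCEEDED", "FAILED", "TIMEOUT", "STOPPED"],
    },
}


def lambda_handler(event, context):
    """Lambda entry point: turn a Glue job event into a short status message."""
    detail = event["detail"]
    success = detail["state"] == "SUCCEEDED"
    message = f"Glue job {detail['jobName']} run {detail.get('jobRunId', '?')} {detail['state']}"
    # Anything a Lambda function prints goes to CloudWatch Logs; an SNS
    # publish or chat webhook could be added here instead.
    print(message)
    return {"success": success, "message": message}


def create_rule(region="us-east-1"):
    import boto3  # imported here so the handler works without boto3 installed

    events = boto3.client("events", region_name=region)
    events.put_rule(
        Name="glue-dynamodb-export-state",  # hypothetical rule name
        EventPattern=json.dumps(EVENT_PATTERN),
        State="ENABLED",
    )
    # Still required (not shown): events.put_targets(...) pointing at the
    # Lambda function, and a permission allowing EventBridge to invoke it.
```

Routing events through EventBridge keeps the Glue job itself free of notification code, so the same rule can notify on failures no matter how the job was started.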

Disadvantages of Connecting DynamoDB to S3 using AWS Glue

Some of the disadvantages of connecting DynamoDB to S3 using AWS Glue include:

  1. AWS Glue jobs that read from DynamoDB are batch-oriented: each run reads the table as it stands rather than streaming changes. If your DynamoDB table is updated at a high rate and you need continuous replication, AWS Glue may not be the right option.
  2. At the time of writing, the AWS Glue service was still at an early stage and not mature enough for complex logic.
  3. AWS Glue still imposes many limits, for example on the number of crawlers and the number of jobs.

Refer to the AWS documentation to learn more about these limitations. You can also check out how to move data from DynamoDB to Amazon S3 using AWS Data Pipeline.

Conclusion

AWS Glue can be used instead of AWS Data Pipeline when you do not want to manage, or need control over, your resources, i.e. EC2 instances, EMR clusters, etc. Thus, connecting DynamoDB to S3 using AWS Glue can help you replicate data with ease.

Businesses can use automated platforms like Hevo Data to set up this integration and handle the ETL process. Hevo helps you directly transfer data from a source of your choice to a Data Warehouse, Business Intelligence tools, or any other desired destination in a fully automated and secure manner, without having to write any code, for a hassle-free experience.

Want to take Hevo for a spin? Sign Up for a 14-day free trial and experience the feature-rich Hevo suite first-hand. You can also have a look at the unbeatable pricing that will help you choose the right plan for your business needs.

Share your experience of setting up DynamoDB to S3 Integration in the comments section below!
