DynamoDB to S3 Using AWS Glue: Steps to Export Data

Tutorial • February 26th, 2019

AWS Glue is a fully managed, serverless ETL service. Since it is serverless, you do not have to worry about configuring and managing the underlying resources. Before going through the steps to export DynamoDB to S3 using AWS Glue, here are some common use cases of DynamoDB and Amazon S3.

DynamoDB Use-cases:

  1. DynamoDB is heavily used in e-commerce since it stores data as key-value pairs with low latency.
  2. Due to its low latency, DynamoDB is often used in serverless web applications.

S3 Use-cases:

  1. Since S3 is cost-effective, it can be used to back up both transient/raw and long-term data.
  2. A data lake can be built on S3 to run analytics and act as a central data repository.
  3. S3 is also widely used for machine learning, data profiling, etc.

This blog post details the steps to move data from DynamoDB to S3 using AWS Glue. This method requires you to commit engineering resources to understand both S3 and DynamoDB and then piece the infrastructure together bit by bit, which is a fairly time-consuming process.

If you are looking to instantly move data from DynamoDB to S3 without writing any code, you should try Hevo.

Hevo – a No-code Data Pipeline to Move Data from DynamoDB to S3

Hevo helps you load data from DynamoDB to S3 in real-time without having to write a single line of code. Hevo is a completely managed platform that can be set up in minutes through a point-and-click interface. Once set up, Hevo takes care of reliably loading data from DynamoDB to S3.

Sign up for a 14-day free trial here to explore a hassle-free data migration.

Now, let us export data from DynamoDB to S3 using AWS Glue. This is done in two major steps:

  1. Creating a crawler
  2. Exporting data from DynamoDB to S3.

A. Steps to Create a Crawler:

  • Create a database in the AWS Glue Data Catalog (here, it is named dynamodb).
  • Pick the table CompanyEmployeeList from the tables drop-down list.
  • Let the table information be created through the crawler. Set up the crawler details in the window below, providing the crawler name as dynamodb_crawler.
  • Add the database name and the DynamoDB table name.
  • Provide the crawler an IAM role that lets it access the DynamoDB table. Here, the IAM role created is AWSGlueServiceRole-dynamodb.
  • You can schedule the crawler. For this illustration, it runs on demand since this is a one-time activity.
  • Review the crawler information.
  • Run the crawler.
  • Check the catalog details once the crawler has executed successfully. (A boto3 sketch of the same crawler setup follows this list.)
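
The same crawler can also be created and started programmatically. Below is a minimal sketch using boto3 (the AWS SDK for Python), assuming the crawler, database, table, and role names from the walkthrough above; the region and the account ID in the role ARN are placeholders.

```python
import boto3

glue = boto3.client("glue", region_name="us-east-1")

# Create a crawler that scans the DynamoDB table and writes its schema
# into the "dynamodb" database of the Glue Data Catalog.
glue.create_crawler(
    Name="dynamodb_crawler",
    Role="arn:aws:iam::123456789012:role/AWSGlueServiceRole-dynamodb",  # placeholder account ID
    DatabaseName="dynamodb",
    Targets={"DynamoDBTargets": [{"Path": "CompanyEmployeeList"}]},
    # No Schedule argument means the crawler runs on demand. To schedule it,
    # pass a cron expression, e.g. Schedule="cron(0 2 * * ? *)".
)

# Run the crawler once, matching the on-demand setup used in this article.
glue.start_crawler(Name="dynamodb_crawler")
```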

B. Export Data from DynamoDB to S3

Now that the crawler has been created and run, let us create a job to copy data from the DynamoDB table to S3. Here, the job name given is dynamodb_s3_gluejob. In AWS Glue, you can use either Python or Scala as the ETL language. For the scope of this article, let us use Python.

  • Pick your data source.
  • Pick your data target.
  • Once completed, Glue will create a ready-made mapping for you.
  • Once you review the mapping, Glue will automatically generate the Python script for the job. (A sketch of what this generated script typically looks like follows this list.)
  • Execute the Python job.
  • Once the job completes successfully, it will generate logs for you to review.
  • Check the files in the bucket and download them.
  • Review the contents of the file.
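
For reference, the script Glue generates for such a job typically follows the template sketched below. The database and table names match this walkthrough; the column mappings and the S3 output path are illustrative placeholders, since the real generated script contains one mapping entry per column discovered by the crawler.

```python
import sys
from awsglue.transforms import ApplyMapping
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job

args = getResolvedOptions(sys.argv, ["JOB_NAME"])

sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args["JOB_NAME"], args)

# Read the crawled DynamoDB table from the Glue Data Catalog
# (crawlers lowercase table names, hence "companyemployeelist").
datasource0 = glueContext.create_dynamic_frame.from_catalog(
    database="dynamodb",
    table_name="companyemployeelist",
    transformation_ctx="datasource0",
)

# Illustrative column mappings; the generated script lists every column.
applymapping1 = ApplyMapping.apply(
    frame=datasource0,
    mappings=[
        ("id", "string", "id", "string"),
        ("name", "string", "name", "string"),
    ],
    transformation_ctx="applymapping1",
)

# Write the result to the target S3 bucket (the path is a placeholder).
datasink2 = glueContext.write_dynamic_frame.from_options(
    frame=applymapping1,
    connection_type="s3",
    connection_options={"path": "s3://your-target-bucket/dynamodb-export/"},
    format="json",
    transformation_ctx="datasink2",
)

job.commit()
```

Once the job has written its output, the exported files can be pulled down with, for example, aws s3 cp s3://your-target-bucket/dynamodb-export/ . --recursive (the bucket path being the same placeholder used above).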

Advantages of exporting DynamoDB to S3 using AWS Glue:

  1. This approach is fully serverless, and you do not have to worry about provisioning or maintaining resources.
  2. You can run your customized Python or Scala code to perform the ETL.
  3. You can push event notifications to Amazon CloudWatch.
  4. You can trigger a Lambda function on success or failure notifications. (A boto3 sketch that starts and monitors the job programmatically follows this list.)
  5. You can manage your job dependencies using AWS Glue.
  6. AWS Glue is the perfect choice if you want to create a data catalog and push your data to Redshift Spectrum.
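
To illustrate running and monitoring the job programmatically, here is a minimal boto3 sketch that starts the job created earlier and polls its state. The region is a placeholder, and in practice you would more likely rely on CloudWatch events or a Lambda trigger than on polling.

```python
import time

import boto3

glue = boto3.client("glue", region_name="us-east-1")

# Start the job created above; start_job_run returns the run's identifier.
run_id = glue.start_job_run(JobName="dynamodb_s3_gluejob")["JobRunId"]

# Simple polling loop (a sketch; prefer CloudWatch/Lambda notifications in practice).
while True:
    job_run = glue.get_job_run(JobName="dynamodb_s3_gluejob", RunId=run_id)
    state = job_run["JobRun"]["JobRunState"]
    if state in ("SUCCEEDED", "FAILED", "STOPPED", "TIMEOUT"):
        print(f"Job finished with state: {state}")
        break
    time.sleep(30)
```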

Disadvantages of exporting DynamoDB to S3 using AWS Glue:

  1. AWS Glue is batch-oriented and does not support streaming data. If your DynamoDB table is populated at a high rate, AWS Glue may not be the right option.
  2. The AWS Glue service is still at an early stage and is not mature enough for complex logic.
  3. AWS Glue still has a lot of limitations, e.g., on the number of crawlers, the number of jobs, etc.

Refer to the AWS documentation to learn more about these limitations. You can also check out how to move data from DynamoDB to Amazon S3 using AWS Data Pipeline.

Faster & Efficient Way to Import DynamoDB to S3

Using the Hevo Data Integration Platform, you can seamlessly export data from DynamoDB to S3 in 2 simple steps:

  • Connect and configure your DynamoDB database.
  • For each table in DynamoDB, choose a table name in Amazon S3 where it should be copied.

AWS Glue can be used over AWS Data Pipeline when you do not want to worry about your resources and do not need to take control over them, i.e., EC2 instances, EMR clusters, etc. However, considering that AWS Glue is at an early stage and has various limitations, it may still not be the perfect choice for copying data from DynamoDB to S3. You can sign up with Hevo (7-day free trial) and set up DynamoDB to S3 in minutes.
