
DynamoDB to S3 Using AWS Glue: Steps to Export Data

AWS Glue is a fully managed, serverless ETL service. Since it is serverless, you do not have to worry about configuring and managing your resources. Before going through the steps to export DynamoDB to S3 using AWS Glue, here are some use cases of DynamoDB and Amazon S3.

DynamoDB Use-cases:

  1. DynamoDB is heavily used in e-commerce, since it stores data as key-value pairs with low latency.
  2. Due to its low latency, DynamoDB is also used in serverless web applications.

S3 Use-cases:

  1. Since S3 is cost-effective, it can be used as a backup store for your transient/raw as well as permanent data.
  2. Using S3, a data lake can be built to perform analytics and serve as a repository of data.
  3. S3 can be used in machine learning, data profiling, etc.

Now, let us export data from DynamoDB to S3 using AWS Glue. This is done in two major steps:

       A. Creating a crawler

       B. Exporting data from DynamoDB to S3.

A. Steps to Create a Crawler:

  • Create a database in AWS Glue for the DynamoDB data.
  • Pick the table CompanyEmployeeList from the tables drop-down list.
  • Let the table information be created through the crawler. Set up the crawler details in the window below, and provide the crawler name dynamodb_crawler.
  • Add the database name and the DynamoDB table name.
  • Provide the crawler an IAM role that allows it to access the DynamoDB table. Here, the IAM role created is AWSGlueServiceRole-dynamodb.
  • You can schedule the crawler. For this illustration, it runs on demand, as the activity is one-time.
  • Review the crawler information.
  • Run the crawler.
  • Check the catalog details once the crawler has executed successfully.
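The crawler created in the console above can also be sketched programmatically with boto3. This is a hedged illustration, not the article's method: the names dynamodb_crawler, CompanyEmployeeList, and AWSGlueServiceRole-dynamodb come from the walkthrough, while the Glue database name dynamodb is an assumption, and the calls that actually touch AWS are left commented out because they require credentials.

```python
def crawler_request(name, role, database, table):
    """Build the boto3 create_crawler request for a crawler that reads a DynamoDB table."""
    return {
        "Name": name,
        "Role": role,              # IAM role with read access to the DynamoDB table
        "DatabaseName": database,  # Glue Data Catalog database to hold the table metadata
        "Targets": {"DynamoDBTargets": [{"Path": table}]},
        # No "Schedule" key: the crawler runs on demand, as in the walkthrough.
    }

request = crawler_request(
    "dynamodb_crawler",
    "AWSGlueServiceRole-dynamodb",  # role name from the walkthrough (account-specific)
    "dynamodb",                     # assumed Glue database name
    "CompanyEmployeeList",
)

# Requires AWS credentials; uncomment to actually create and run the crawler:
# import boto3
# glue = boto3.client("glue")
# glue.create_crawler(**request)
# glue.start_crawler(Name=request["Name"])
```

Once the crawler finishes, the catalog table it creates is what the export job below reads from.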

B. Export Data from DynamoDB to S3

Now that the crawler has been created, let us create a job to copy data from the DynamoDB table to S3. Here, the job is named dynamodb_s3_gluejob. In AWS Glue, you can use either Python or Scala as the ETL language; for the scope of this article, we will use Python.

  • Pick your data source.
  • Pick your data target.
  • Once completed, Glue will create a ready-made mapping for you.
  • Once you review your mapping, Glue will automatically generate the Python code/job for you.
  • Execute the Python job.
  • Once the job completes successfully, it will generate logs for you to review.
  • Check the files in the S3 bucket and download them.
  • Review the contents of the file.
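The job steps above can likewise be sketched with boto3. Again this is an assumption-laden sketch rather than the console's exact output: dynamodb_s3_gluejob is the name from the article, but the script location and output bucket are hypothetical placeholders (Glue stores the generated script in an S3 path you choose), and reusing the crawler's IAM role is an assumption. Calls that reach AWS are commented out.

```python
def job_request(name, role, script_location):
    """Build the boto3 create_job request for a Spark ETL ("glueetl") job written in Python."""
    return {
        "Name": name,
        "Role": role,
        "Command": {
            "Name": "glueetl",                  # Spark-based ETL job type
            "ScriptLocation": script_location,  # S3 path of the generated Python script
            "PythonVersion": "3",
        },
    }

request = job_request(
    "dynamodb_s3_gluejob",
    "AWSGlueServiceRole-dynamodb",                        # assumed: same role as the crawler
    "s3://example-glue-scripts/dynamodb_s3_gluejob.py",   # hypothetical script path
)

# Requires AWS credentials; uncomment to create the job, run it, and list the output files:
# import boto3
# glue = boto3.client("glue")
# glue.create_job(**request)
# run = glue.start_job_run(JobName=request["Name"])
# s3 = boto3.client("s3")
# listing = s3.list_objects_v2(Bucket="example-output-bucket")  # hypothetical bucket
```

Listing the bucket after the run corresponds to the "check files in the bucket" step above.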

Advantages of exporting DynamoDB to S3 using AWS Glue:

  1. This approach is fully serverless; you do not have to worry about provisioning and maintaining your resources.
  2. You can run your customized Python or Scala code to perform the ETL.
  3. You can push event notifications to CloudWatch.
  4. You can trigger a Lambda function on success or failure notifications.
  5. You can manage your job dependencies using AWS Glue.
  6. AWS Glue is a good choice if you want to create a data catalog and push your data to Redshift Spectrum.

Disadvantages of exporting DynamoDB to S3 using AWS Glue:

  1. AWS Glue is batch-oriented and does not support streaming data. If your DynamoDB table is populated at a high rate, AWS Glue may not be the right option.
  2. AWS Glue is still at an early stage and not mature enough for complex logic.
  3. AWS Glue still has limits on the number of crawlers, number of jobs, etc.

Refer to the AWS documentation to learn more about these limits. You can also check out how to move data from DynamoDB to Amazon S3 using AWS Data Pipeline.

A Faster and More Efficient Way to Export DynamoDB to S3

Using the Hevo Data Integration Platform, you can seamlessly export data from DynamoDB to S3 in 2 simple steps:

  • Connect and configure your DynamoDB database.
  • For each table in DynamoDB, choose a table name in Amazon S3 to which it should be copied.

AWS Glue can be used over AWS Data Pipeline when you do not want to worry about your resources and do not need to take control over them, i.e. EC2 instances, EMR clusters, etc. However, considering that AWS Glue is at an early stage and has various limitations, it may still not be the perfect choice for copying data from DynamoDB to S3. You can sign up with Hevo (7-day free trial) and set up DynamoDB to S3 in minutes.

  • san gn

    Hi,
    It was a really simple and helpful blog.
    I have 2 questions:
    1) How do we deal with updates to the DynamoDB table? We might have to run the job at certain intervals and import only the updated data instead of importing the whole table again.
    2) Is it possible to select only the rows that satisfy a certain condition to be imported to S3?


  • Sergio Muñoz

    Hi, any idea why I cannot continue to the next step after choosing the Data Source? It's like the process gets stuck there.