Your organization utilizes AWS DocumentDB as an operational data store to manage customer information. However, AWS DocumentDB lacks the real-time analytics capabilities needed for timely insights. 

To close this gap, your organization can set up a real-time migration from AWS DocumentDB to an advanced analytical platform like Snowflake. Snowflake offers advanced query processing, data warehousing, and analytics solutions to enhance data management and derive valuable insights. Its real-time data streaming lets you make timely decisions to improve business performance. 

Let’s get into the detailed migration process of AWS DocumentDB to Snowflake.   

Why Migrate to Snowflake?

Here are a few reasons why you should transfer data from Amazon DocumentDB to Snowflake:

  • No Vendor Lock-in: Unlike AWS DocumentDB, which ties you to the AWS ecosystem, Snowflake’s data warehouse can be deployed on multiple cloud infrastructures, such as Azure, AWS, or Google Cloud Platform.   
  • Compatibility with Existing Infrastructure: Snowflake allows you to store nested objects either in relational tables or in their native format. Since AWS DocumentDB stores data as JSON documents, you can load that data into Snowflake and query or manipulate it directly using ANSI-standard SQL. 

AWS DocumentDB: A Brief Overview

AWS DocumentDB is a fully managed, NoSQL JSON document database that is compatible with MongoDB. It allows you to store, query, index, and aggregate critical JSON data of any size cost-effectively. AWS DocumentDB is known for its scalability, low-latency global reads, durability, and built-in security practices. 

AWS DocumentDB supports vector search, a machine learning technique that finds similar data points using distance metrics. With millisecond response times, this built-in feature helps you search millions of documents within your document database based on nuanced meaning and context. 
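
To make "similar data points using distance metrics" concrete, here is a minimal Python sketch of a nearest-neighbor lookup with cosine distance. It is a generic illustration, not AWS DocumentDB's implementation or API; the function names, the two-dimensional vectors, and the document labels are all assumptions made for the example.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity; smaller values mean more similar vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def nearest(query, vectors, k=1):
    """Return the k labels whose vectors are closest to the query vector."""
    ranked = sorted(vectors, key=lambda label: cosine_distance(query, vectors[label]))
    return ranked[:k]


# Hypothetical embeddings; real embeddings have hundreds of dimensions.
vectors = {
    "refund policy": [0.9, 0.1],
    "shipping times": [0.1, 0.9],
    "return an item": [0.8, 0.3],
}
query = [1.0, 0.2]

print(nearest(query, vectors, k=2))  # ['refund policy', 'return an item']
```

A production vector index avoids comparing the query against every stored vector, but the ranking idea is the same.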

Additionally, AWS DocumentDB lets you make your applications smarter and more responsive through integrations with generative AI and other machine-learning capabilities.

Snowflake: A Brief Overview

Snowflake is a cloud-based data warehousing platform that powers the Data Cloud, offering the flexibility to load, analyze, and securely distribute data. It allows you to aggregate unstructured, semi-structured, and structured data into a centralized location, providing access to all necessary information without data silos. 

Snowflake helps you effectively manage your workloads with optimized compression and automatic micro-partitioning. As your workload grows, Snowflake’s single elastic performance engine dynamically allocates necessary compute power and storage capacity in near real-time. 

You can connect to Snowflake using Snowsight, the SnowSQL utility, or native connectors such as those for Python and Spark.

Methods for Integrating AWS DocumentDB and Snowflake

To migrate data from AWS DocumentDB to Snowflake, you can use either the CSV Export/Import method or Hevo Data.

Method 1: Migrate Data from AWS DocumentDB to Snowflake Using CSV Export/Import Method

This section answers the question, “How do I ingest data from AWS DocumentDB to Snowflake through a manual CSV Export/Import method?” You first export the data from AWS DocumentDB in CSV format using the mongoexport tool. Then, you load the CSV data into Snowflake using a named internal stage.

Step 1: Export Data From AWS DocumentDB in CSV Format Using mongoexport Tool

Before you begin, ensure the following prerequisites are in place:


Follow the steps to export data from AWS DocumentDB in CSV format:

  1. Once you log into your AWS account and go to the AWS EC2 console, start the newly created AWS EC2 instance. 
  2. Connect to the EC2 instance by clicking the Connect button or executing the following command.
ssh -i /path/my-key-pair.pem my_ec2instance_user_name@my_ec2instance_public_dns_name
  3. At the command prompt that opens, check whether MongoDB is installed by using the following command:
mongod --version
  4. Navigate to the folder where you want to store the exported CSV file by using the following command:
cd path/to/folder
  5. Execute the following mongoexport command using the appropriate parameters. Replace the placeholders with your database, collection, field names, and login credentials. The --type=csv option makes mongoexport write CSV instead of its default JSON output, and CSV output requires the --fields list.
mongoexport --ssl \
    --host="tutorialCluster.node.us-east-1.docdb.amazonaws.com:27017" \
    --collection=<collection_name> \
    --db=<database_name> \
    --type=csv \
    --fields=<field1>,<field2> \
    --out=<outputfilename.csv> \
    --username=<yourUsername> \
    --password=<yourPassword> \
    --sslCAFile global-bundle.pem
  6. Exit the AWS EC2 instance by typing exit in the command prompt.
  7. Run the following command on your local machine to download the CSV file from the EC2 instance:
scp -i /path/my-key-pair.pem my_ec2instance_user_name@my_ec2instance_public_dns_name:/path/to/<filename.csv> C:\Users\yourUsername\yourfolder
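
When mongoexport writes CSV, it needs an explicit field list (its --fields option), and embedded fields are addressed with dotted names. The following minimal Python sketch derives such a list from one representative document. It is not part of the original procedure: the helper name field_paths and the sample document are hypothetical, and it assumes every document in the collection has the same shape as the sample.

```python
from __future__ import annotations

from typing import Any


def field_paths(document: dict[str, Any], prefix: str = "") -> list[str]:
    """Return dotted paths for every leaf field in a nested document."""
    paths: list[str] = []
    for key, value in document.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict) and value:
            # Recurse into embedded documents so each leaf gets its own column.
            paths.extend(field_paths(value, path))
        else:
            # Scalars, arrays, and empty sub-documents stay as a single column.
            paths.append(path)
    return paths


# Hypothetical customer document used only for illustration.
sample = {
    "_id": "c-1001",
    "name": "Ada",
    "address": {"city": "Pune", "zip": "411001"},
    "tags": ["vip", "beta"],
}

fields_argument = ",".join(field_paths(sample))
print(f"--fields={fields_argument}")  # --fields=_id,name,address.city,address.zip,tags
```

Arrays are kept as a single column here; whether that suits your analysis depends on how you plan to query the data in Snowflake.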

Step 2: Load the CSV File into Snowflake Using a Named Internal Stage

Ensure the following prerequisites are ready before you begin the import:

Here are the steps to load the CSV file into Snowflake:

  1. Log in to Snowflake using SnowSQL with your Snowflake account credentials:
snowsql -a <account_id> -u <user_name>
  2. Create a named internal stage using the following SQL command. The field delimiter is a comma because mongoexport writes comma-separated files:
CREATE OR REPLACE STAGE my_internalstage
  file_format = (type = 'CSV' FIELD_DELIMITER = ',' FIELD_OPTIONALLY_ENCLOSED_BY = '"' SKIP_HEADER = 1);

You can also create the named internal stage using Snowsight or the Classic Console.

  3. Upload the CSV file from your local machine to your named internal stage using the PUT command:
PUT file://C:\Users\yourUsername\yourfolder\filename.csv @my_internalstage;
  4. Load your staged CSV data into the Snowflake table using the COPY INTO command:
COPY INTO mydestinationtable from @my_internalstage;
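
COPY INTO loads data into a table that already exists, so the destination table needs one column per field in the CSV header. The Python sketch below generates a starting CREATE TABLE statement from that header. It is an illustration under stated assumptions: every column is typed VARCHAR, characters that are not letters, digits, or underscores are replaced with underscores, and the function name create_table_sql is hypothetical.

```python
import csv
import io
import re


def create_table_sql(csv_text: str, table_name: str) -> str:
    """Build a CREATE TABLE statement with one VARCHAR column per CSV header."""
    header = next(csv.reader(io.StringIO(csv_text)))
    columns = []
    for name in header:
        # Replace characters such as "." (from embedded fields) with "_".
        safe = re.sub(r"\W", "_", name.strip())
        # Unquoted identifiers cannot be empty or start with a digit.
        if not safe or safe[0].isdigit():
            safe = "c_" + safe
        columns.append(f"    {safe} VARCHAR")
    body = ",\n".join(columns)
    return f"CREATE TABLE IF NOT EXISTS {table_name} (\n{body}\n);"


# Hypothetical export content used only for illustration.
exported = "_id,name,address.city\nc-1001,Ada,Pune\n"
print(create_table_sql(exported, "mydestinationtable"))
```

Review the generated statement and adjust the column types to match your data before running it in Snowflake.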

Limitations of AWS DocumentDB to Snowflake Migration Using CSV Export/Import Method

  • Lack of Real-time analytics: If your organization relies on real-time analytics, the CSV Export/Import method would be inadequate because it cannot provide continuous, up-to-date insights into changing data trends. 
  • Limited Scalability: As the dataset size increases, the migration through CSV export/import can become time-consuming, resulting in delayed access to important data.
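
One common way to ease the scalability limitation is to split a large export into smaller files before staging them, since Snowflake can load multiple staged files in parallel. The following is a minimal Python sketch of that idea, not a tuned loader: split_csv and the rows_per_file value are hypothetical, and a real script would stream from disk and write each chunk to its own file instead of working on an in-memory string.

```python
import csv
import io
from typing import Iterator, List


def _render(header: List[str], rows: List[List[str]]) -> str:
    """Serialize a header and its rows back into CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


def split_csv(csv_text: str, rows_per_file: int) -> Iterator[str]:
    """Yield CSV chunks of at most rows_per_file data rows, each with the header."""
    if rows_per_file < 1:
        raise ValueError("rows_per_file must be at least 1")
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    chunk: List[List[str]] = []
    for row in reader:
        chunk.append(row)
        if len(chunk) == rows_per_file:
            yield _render(header, chunk)
            chunk = []
    if chunk:
        # Emit the final, possibly shorter, chunk.
        yield _render(header, chunk)


# Hypothetical export content used only for illustration.
chunks = list(split_csv("id,name\n1,a\n2,b\n3,c\n", rows_per_file=2))
print(len(chunks))  # 2
```

Each chunk repeats the header row, so the stage's SKIP_HEADER = 1 setting still applies to every file.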

Method 2: Sync AWS DocumentDB to Snowflake Using Hevo Data

Hevo Data is a real-time ELT, no-code pipeline platform that helps you cost-effectively automate data pipelines that are flexible to your needs. By integrating with 150+ data sources, Hevo Data lets you export data from multiple sources, load it to destinations, and transform it for detailed analysis.

Here are the key features of Hevo Data:

  • Data Transformation: Analyst-friendly transformation approaches, such as Python-based scripts or drag-and-drop blocks, allow you to clean, prepare, and transform data before importing it to your destination.
  • Incremental Data Load: Hevo Data transfers new and modified data in real time, which optimizes bandwidth usage on both ends of the data pipeline.
  • Auto-Schema Mapping: Hevo Data’s auto-mapping feature automatically recognizes the incoming data structure and replicates it to the destination schema, eliminating manual schema management. You can choose Full or Incremental mappings according to your data replication needs. 
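
The idea behind an incremental load can be sketched in a few lines of Python. This is a generic illustration of watermark-based change selection, not Hevo Data's actual implementation; the updated_at field, the select_changed helper, and the sample documents are all assumptions made for the example.

```python
from datetime import datetime, timezone


def select_changed(documents, last_synced_at):
    """Return documents modified after the watermark, plus the new watermark."""
    changed = [doc for doc in documents if doc["updated_at"] > last_synced_at]
    # Keep the old watermark when nothing changed since the last sync.
    new_watermark = max(
        (doc["updated_at"] for doc in changed), default=last_synced_at
    )
    return changed, new_watermark


# Hypothetical source documents with a last-modified timestamp.
documents = [
    {"_id": 1, "updated_at": datetime(2024, 1, 1, tzinfo=timezone.utc)},
    {"_id": 2, "updated_at": datetime(2024, 1, 5, tzinfo=timezone.utc)},
    {"_id": 3, "updated_at": datetime(2024, 1, 9, tzinfo=timezone.utc)},
]
last_synced_at = datetime(2024, 1, 3, tzinfo=timezone.utc)

changed, new_watermark = select_changed(documents, last_synced_at)
print([doc["_id"] for doc in changed])  # [2, 3]
print(new_watermark.isoformat())  # 2024-01-09T00:00:00+00:00
```

Only the two documents modified after the previous sync are selected, which is why incremental loading moves less data than re-exporting the whole collection.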

Let’s see how to transfer data from AWS DocumentDB to Snowflake using Hevo Data.

Step 1: Configure AWS DocumentDB as Your Source

Before you begin, verify that the following prerequisites are in place:

Here are the steps to configure Amazon DocumentDB as your source in Hevo:

  1. Once you sign into your Hevo account, go to the Navigation Bar and click the PIPELINES option.
  2. In the Pipelines List View, click the + CREATE button.
  3. Choose Amazon DocumentDB as your source type on the Select Source Type page.
  4. Provide the necessary information in the Configure your Amazon DocumentDB Source page.
[Image: Configuring Amazon DocumentDB Source Page]
  5. Click TEST & CONTINUE.

For more information about the source configuration, read the Amazon DocumentDB documentation in Hevo.

Step 2: Configure Snowflake as Your Destination

Before you start the configuration, check that the following prerequisites are in place:

Follow the steps to configure Snowflake as your destination in Hevo:

  1. In the Navigation Bar, choose the option DESTINATIONS.
  2. Go to the Destinations List View page and click the + CREATE button.
  3. Choose Snowflake as your destination type on the Add Destination page.
  4. Enter the required information on the Configure your Snowflake Destination page.
[Image: Configuring your Snowflake Destination Page]
  5. Click TEST CONNECTION, and then click SAVE & CONTINUE to complete the destination configuration.

For more information about destination configuration, read the Snowflake documentation in Hevo. 


Use Cases of AWS DocumentDB to Snowflake Integration

  • Backup and Recovery: Your organization can utilize Snowflake’s Time Travel feature to retain historical versions of large volumes of operational data for up to 90 days, depending on your Snowflake edition. This feature allows you to quickly revert to previous versions of the data set or even recover dropped tables for business continuity.
  • Support for Stored Procedures: Snowflake supports stored procedures to meet the needs of analytical workloads. They let you automate a frequently performed database operation that requires multiple SQL statements.

Conclusion

AWS DocumentDB to Snowflake migration offers several benefits for your organization’s data management and analytics. This article highlights two methods for migrating data from AWS DocumentDB to Snowflake: the CSV Export/Import method and Hevo Data.

The Hevo Data method can transfer historical and current data into Snowflake. The CSV Export/Import method is simpler but can migrate only historical data. Additionally, the manual CSV method is inadequate when dealing with large datasets. Choosing a real-time ELT pipeline platform like Hevo Data would allow your organization to manage large data volumes and keep data synchronized in real time.

To learn more about data synchronization, read the Hevo documentation on types of data synchronization.

Frequently Asked Questions (FAQs)

  1. Why would I choose Snowflake over deploying a database in the cloud and conducting in-house analytics?

A. Some of the key reasons are:

  • Snowflake offers advanced built-in distributed processing that enhances reliability and fault tolerance.
  • You can deploy your Snowflake data lakes or data warehouses on any supported public cloud platform, such as GCP, Azure, or AWS.
  • Snowflake isolates the compute layer from the storage layer. So, you can process analytical queries in the compute layer while storage operations continue independently.
Customer Experience Engineer, Hevo Data

Dimple is an experienced Customer Experience Engineer with four years in the industry, the most recent two spent at Hevo. Her contributions help refine customer experiences within the data integration platform.
