Connecting MongoDB Atlas to Redshift: 2 Easy Methods

Published: June 17, 2022


Companies need to analyze their data and keep it in secure, unified storage. To organize and make better use of that data, they transfer it from SaaS applications, on-premise systems, and databases into Data Warehouses. MongoDB Atlas is a Database as a Service that lets users run and store their MongoDB data on the cloud provider of their choice.

By loading data from MongoDB Atlas into Redshift, a fully managed Data Warehouse, organizations can generate insights from their business data and make better decisions. In this article, you will learn about the methods to connect MongoDB Atlas to Redshift: one automated method using Hevo and one manual method via Amazon S3.


What is MongoDB Atlas?


MongoDB Atlas is a fully managed Database as a Service (DBaaS) that allows companies to set up, deploy, and scale a database without worrying about on-premise hardware or performance tuning. It enables developers to deploy and manage databases while offering the versatile features needed to build resilient and performant applications in the Cloud. It is available on AWS, Azure, and GCP.

Key Features of MongoDB Atlas

Some of the main features of MongoDB Atlas are listed below:

  • Security: MongoDB Atlas secures your data with enterprise-grade features and built-in controls that meet compliance standards and integrate with your existing security protocols.
  • Optimal Performance: MongoDB Atlas can easily scale in any direction and provides real-time visibility into metrics along with performance optimization tools.
  • Reliability: MongoDB Atlas supports mission-critical workloads with automated data recovery and distributed fault tolerance.

To learn more about MongoDB Atlas, click here.

What is Amazon Redshift?


Amazon Redshift is a fully managed Cloud Data Warehouse service provided by AWS (Amazon Web Services). It helps companies store, organize, and analyze their business data. Amazon Redshift can handle concurrent queries over petabytes of data using Massively Parallel Processing (MPP) and Columnar Storage. Amazon Redshift has its own compute engine to perform computations and generate critical insights.

Key Features of Amazon Redshift

Some of the main features of Amazon Redshift are listed below:

  • Massively Parallel Processing: Amazon Redshift applies MPP to distribute the load over several processors using the divide and conquer strategy.
  • Fault-Tolerant: Amazon Redshift continuously monitors the health of a cluster, automatically replicates data from failed drives, and replaces nodes as required to deliver a fault-tolerant architecture.
  • Flexible Querying: Amazon Redshift comes with a query editor that allows users to flexibly query data from the console or connect any other SQL client tools or BI tools.

To learn more about Amazon Redshift, click here.

Explore These Methods to Connect MongoDB Atlas to Redshift

Connecting MongoDB Atlas to Redshift allows users to securely load their MongoDB data into Amazon Redshift, which can solve some of the biggest data problems for businesses. In this article, we have described two methods to achieve this:

Method 1: Simplify MongoDB Atlas to Redshift Connection Using Hevo

Hevo Data, an Automated Data Pipeline, provides you a hassle-free solution to connect MongoDB Atlas to Redshift within minutes with an easy-to-use no-code interface. Hevo is fully managed and completely automates the process of not only loading data from MongoDB Atlas but also enriching the data and transforming it into an analysis-ready form without having to write a single line of code.

Get Started with Hevo for Free

Method 2: Manually Connecting MongoDB Atlas to Redshift

This method would be time-consuming and somewhat tedious to implement. Users need to manually load data from MongoDB Atlas to Redshift via the Amazon S3 bucket.

Both the methods are explained below.

Methods to Connect MongoDB Atlas to Redshift

Now that you understand what MongoDB Atlas and Amazon Redshift are, this section walks you through the steps to connect MongoDB Atlas to Redshift. Since you cannot transfer data from MongoDB Atlas to Redshift directly, you first load it into Amazon S3 and then into Amazon Redshift. The methods to load data from MongoDB Atlas to Redshift are listed below:

Method 1: Simplify MongoDB Atlas to Redshift Connection Using Hevo


Hevo Data helps you connect MongoDB Atlas to Redshift in a completely hassle-free & automated manner. Hevo supports MongoDB Atlas as a source and loads its data into any Data Warehouse in minutes. 

Hevo is fully managed and completely automates the process of not only loading data from your desired source but also enriching the data and transforming it into an analysis-ready form without having to write a single line of code. Hevo takes care of all your data preprocessing needs required to connect MongoDB Atlas to Redshift and lets you focus on key business activities.

Advantages of using Hevo Data Platform:

  • Minimal Setup – You will require minimal setup and bandwidth to load data from MongoDB Atlas using the Hevo platform. 
  • No Data Loss – Hevo's architecture is fault-tolerant and allows easy, reliable, and seamless transfer of data from MongoDB Atlas to any Data Warehouse without data loss. 
  • 100s of Out-of-the-Box Integrations – The Hevo platform brings data from other sources such as SDKs, Cloud Applications, Databases, and so on into Data Warehouses and Databases. So, Hevo is the right partner for all your growing data needs.
  • Automatic Schema Detection and Mapping – The schema of incoming data is scanned automatically. If changes are detected, they are handled seamlessly and incorporated into the Database or Data Warehouse. 
  • Exceptional Support – Hevo has 24×7 Technical support through emails, calls, and chat.
Sign up here for a 14-Day Free Trial!

Method 2: Manually Connecting MongoDB Atlas to Redshift

The manual process to connect MongoDB Atlas to Redshift requires you to load data to Amazon S3 and then load that data to Amazon Redshift. The following steps to manually connect MongoDB Atlas to Redshift are listed below:

Step 1: Creating MongoDB Atlas Data Lake

  • First, you have to create a Data Lake by navigating to the Data Lake option in the left-side navigation menu.
  • Then, click on the Create Data Lake or Configure a New Data Lake button.
  • Now, add a Data Store by clicking on Amazon S3 to get started. 
  • Now, you have to authorize your AWS credentials.
  • If you have already created an IAM role that authorizes MongoDB Atlas with read and write permissions to your Amazon S3 bucket, you can reuse it here.
  • If you want to create a new role, create it from the Data Federation option.
  • Then, enter the information for Amazon S3. Here, provide the Amazon S3 bucket name, such as mongodb-atlas-to-redshift-demo.
  • Then, choose the read and write permissions so that MongoDB Atlas can read and write Parquet files to the Amazon S3 bucket.
  • Now, you have to assign an access policy to your AWS IAM role, following the instructions shown in the MongoDB Atlas UI. Your read access policy should look similar to the one shown below; a sample write-permission statement is sketched after this list:
{
   "Version": "2012-10-17",
   "Statement": [
      {
            "Effect": "Allow",
            "Action": [
               "s3:ListBucket",
               "s3:GetObject",
               "s3:GetObjectVersion",
               "s3:GetBucketLocation"
            ],
            "Resource": [
               <role arn>
            ]
      }
   ]
}
  • Now, define the path structure for your files in the Amazon S3 bucket and then click on the Next button.
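The policy above only grants read access. Because you chose read and write permissions so MongoDB Atlas can write Parquet files into the bucket, the role also needs a write statement. The following is a minimal sketch of such a statement (added inside the "Statement" array above); the exact actions MongoDB Atlas asks you to grant may differ, so follow what the Atlas UI displays, and replace the bucket ARN placeholder:

{
   "Effect": "Allow",
   "Action": [
      "s3:PutObject",
      "s3:DeleteObject"
   ],
   "Resource": [
      "<bucket arn>/*"
   ]
}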

Step 2: Connecting MongoDB Database to your Data Lake

  • In this tutorial, the source data lives in a MongoDB Atlas cluster, and we will connect that cluster to the Data Lake. 
  • Now add a Data Store and select MongoDB Atlas Cluster. 
  • Then provide the name of the cluster, fill out the other essential information, and configure the Data Lake. Under the hood, this produces a storage configuration similar to the sketch below.
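The Atlas UI generates this mapping for you, but purely as an illustration, a Data Federation storage configuration that maps an Atlas cluster and an S3 bucket into one federated database might look roughly like the sketch below. The store names are invented placeholders, and the exact fields the wizard produces may differ:

{
   "stores": [
      {
         "name": "atlas-cluster-store",
         "provider": "atlas",
         "clusterName": "NAME_OF_YOUR_CLUSTER",
         "projectId": "<your atlas project id>"
      },
      {
         "name": "s3-archive-store",
         "provider": "s3",
         "bucket": "mongodb-atlas-to-redshift-demo",
         "region": "us-east-1",
         "delimiter": "/"
      }
   ],
   "databases": [
      {
         "name": "NAME_OF_YOUR_DATA_LAKE_DATABASE",
         "collections": [
            {
               "name": "NAME_OF_YOUR_DATA_LAKE_COLLECTION",
               "dataSources": [
                  {
                     "storeName": "atlas-cluster-store",
                     "database": "NAME_OF_YOUR_DATABASE",
                     "collection": "NAME_OF_YOUR_COLLECTION"
                  }
               ]
            }
         ]
      }
   ]
}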

Step 3: Creating MongoDB Atlas Trigger to Create New Document Every Minute

  • Now we have to set up a MongoDB Database Trigger that automatically generates a new document every minute, so there is continuously changing data to replicate. Triggers allow you to execute server-side logic in response to Database events or on a schedule. 
  • We will create a Scheduled trigger for this; in the next step, another Scheduled trigger will ensure the documents are automatically archived in the Amazon S3 bucket.
  • Now, navigate to the MongoDB Atlas tab from the top of the screen and click on the Triggers option. 
  • Now navigate to the Overview tab on the Triggers page. Then, click on Add Trigger to open the trigger configuration page. 
  • Enter the configuration values for your trigger: name it, set the trigger type to Scheduled with a one-minute schedule, and select or define the function it should run. A JSON version of this configuration is sketched at the end of this step.
  • The trigger function will look like the one shown below:
exports = async function () {

   // Connect to the Atlas cluster through its linked service name.
   const mongodb = context.services.get("NAME_OF_YOUR_ATLAS_SERVICE");
   const db = mongodb.db("NAME_OF_YOUR_DATABASE");
   const events = db.collection("NAME_OF_YOUR_COLLECTION");

   // Insert one sample document per run (i.e., every minute).
   const event = await events.insertOne(
      {
            time: new Date(),
            aNumber: Math.random() * 100,
            type: "event"
      }
   );

   return JSON.stringify(event);

};
  • You can now run the trigger and check whether the collection receives a new document every 60 seconds.
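If you manage your Atlas App Services configuration as code, the same Scheduled trigger can also be described as a JSON configuration file. The sketch below is only an assumed illustration of the exported trigger format; the trigger name and function name are placeholders, and your export may contain additional fields:

{
   "name": "generate_event_every_minute",
   "type": "SCHEDULED",
   "disabled": false,
   "config": {
      "schedule": "*/1 * * * *"
   },
   "function_name": "generateEventFunction"
}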

Step 4: Creating MongoDB Atlas Trigger to Copy Data to S3

  • You can utilize MongoDB Data Lake’s $out to Amazon S3 aggregation pipeline. 
  • Create a new trigger with a similar configuration: a Scheduled trigger that runs every minute, this time executing the $out function described below.
  • Let's break down the trigger function. First, you have to connect to your MongoDB Atlas Data Lake, making sure to pass the Data Lake service name to context.services.get.
  • You must go through the Data Lake connection, because $out to Amazon S3 is only available there. 
  • Next, query the documents written during the most recent window; the example below matches everything from the previous 60 minutes. This is done with an aggregation pipeline.
  • Then, use the $out aggregation stage to write the data from the previous stage into Amazon S3. 
  • To transfer your data from MongoDB Atlas to Redshift via Amazon S3, specify parquet as the format and set maxFileSize and maxRowGroupSize.
  • Finally, provide the Amazon S3 bucket, region, and filename so the output lands in the location you configured earlier.
exports = function () {

   // Connect through the Data Lake (federated database) service,
   // since $out to S3 is only available there.
   const datalake = context.services.get("NAME_OF_YOUR_DATA_LAKE_SERVICE");
   const db = datalake.db("NAME_OF_YOUR_DATA_LAKE_DATABASE");
   const events = db.collection("NAME_OF_YOUR_DATA_LAKE_COLLECTION");

   const pipeline = [
      {
            // Match the documents written during the previous 60 minutes.
            $match: {
               "time": {
                  $gt: new Date(Date.now() - 60 * 60 * 1000),
                  $lt: new Date(Date.now())
               }
            }
      }, {
            // Write the matched documents to the bucket configured in Step 1
            // as Parquet files.
            "$out": {
               "s3": {
                  "bucket": "mongodb-atlas-to-redshift-demo",
                  "region": "us-east-1",
                  "filename": "events",
                  "format": {
                        "name": "parquet",
                        "maxFileSize": "10GB",
                        "maxRowGroupSize": "100MB"
                  }
               }
            }
      }
   ];

   return events.aggregate(pipeline);
};
  • Now you can see the new Parquet document in your Amazon S3 bucket.

Step 5: Using AWS Data Pipeline to Connect Amazon S3 to Redshift

  • We will use the RedshiftCopyActivity here. This activity supports Amazon S3 as an input source.
  • Different insert modes are possible in RedshiftCopyActivity: KEEP_EXISTING, OVERWRITE_EXISTING, TRUNCATE, and APPEND. 
  • KEEP_EXISTING and OVERWRITE_EXISTING let users define whether rows with the same primary key are overwritten or kept as they are. A minimal pipeline definition is sketched below.
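As a rough starting point only, an AWS Data Pipeline definition using RedshiftCopyActivity could look something like the sketch below. The object IDs, S3 path, table name, and connection details are placeholders; you will still need to add a Schedule object, IAM roles, and any COPY options your file format requires (the activity's built-in data formats are delimited text formats such as CSV/TSV, so the Parquet output from the previous step may need extra handling or an intermediate conversion):

{
   "objects": [
      {
         "id": "S3InputDataNode",
         "type": "S3DataNode",
         "directoryPath": "s3://mongodb-atlas-to-redshift-demo/events/"
      },
      {
         "id": "RedshiftCluster",
         "type": "RedshiftDatabase",
         "clusterId": "<your redshift cluster id>",
         "databaseName": "<your database name>",
         "username": "<your username>",
         "*password": "<your password>"
      },
      {
         "id": "RedshiftOutputDataNode",
         "type": "RedshiftDataNode",
         "database": { "ref": "RedshiftCluster" },
         "tableName": "events"
      },
      {
         "id": "Ec2ResourceForCopy",
         "type": "Ec2Resource",
         "instanceType": "t2.micro",
         "terminateAfter": "30 Minutes"
      },
      {
         "id": "CopyS3ToRedshift",
         "type": "RedshiftCopyActivity",
         "input": { "ref": "S3InputDataNode" },
         "output": { "ref": "RedshiftOutputDataNode" },
         "insertMode": "APPEND",
         "runsOn": { "ref": "Ec2ResourceForCopy" }
      }
   ]
}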

That’s it! You have successfully connected MongoDB Atlas to Redshift.

Conclusion 

In this article, you learned about Amazon Redshift and MongoDB Atlas. You went through two methods to connect MongoDB Atlas to Redshift. The manual process to load data from MongoDB Atlas to Redshift involves many steps, which makes it time-consuming and infeasible for real-time data transfer. 

Visit our Website to Explore Hevo

Companies store valuable data from multiple data sources such as MongoDB Atlas and other Data Warehouses such as Amazon Redshift. The manual process to transfer data from source to destination is a tedious task. Hevo Data is a No-code Data Pipeline that can help you transfer data from MongoDB Atlas to Redshift. It fully automates the process to load and transform data from 100+ data sources to a destination of your choice without writing a single line of code. 

Want to take Hevo for a spin? Sign Up here for a 14-day free trial and experience the feature-rich Hevo suite firsthand.

Share your experience of learning about Connecting MongoDB Atlas to Redshift in the comments section below!

Former Research Analyst, Hevo Data

Aditya has a keen interest in data science and is passionate about data, software architecture, and writing technical content. He has experience writing around 100 articles on data science.
