Key Takeaways

Loading data from MongoDB to Redshift can be done in two simple ways:

Method 1: Using Hevo
Fast and easy. Changes in MongoDB update automatically in Redshift, keeping data current. No coding needed, ideal for growing teams or complex data.

Method 2: Using Custom Scripts
Slower and hands-on, but it works for small projects or one-time transfers.

Moving data from MongoDB to Redshift can be challenging, especially when working with unstructured or rapidly changing data. Differences in data formats, schemas, and query logic often make the process more complex.

However, Redshift offers strong benefits once the migration is complete. Its columnar storage, fast query performance, and analytics-ready design provide both reliability and flexibility for handling large-scale data.

This guide explores two reliable methods for moving your data. Whether you prefer manual scripts for full control or a tool like Hevo to automate the process, there is an approach to suit every need.

What Are the Methods to Move Data from MongoDB to Redshift?

These are the methods we can use to move data from MongoDB to Redshift in a seamless fashion:

Method 1: Using an Automated Data Pipeline Platform to Move Data from MongoDB to Redshift

Method 2: Using Custom Scripts to Move Data from MongoDB to Redshift

Method 1: Using Hevo to Move MongoDB to Amazon Redshift

Check out our 1-minute demo below to see the seamless integration in action!

Step 1: Set up the MongoDB Source in Hevo

Configure MongoDB as source
  1. In Hevo, click Create Pipeline and choose MongoDB as the source.
  2. Pick your deployment type. Atlas, Replica Set, or Standalone.
  3. Enter connection details. Use either the full Connection URI or fields like Host, Port, Database, Username, and Password.
  4. Enable SSL/TLS if your cluster requires it. Atlas usually does.
  5. If your MongoDB is private, set up Whitelist IPs, VPC peering, or SSH tunnel as offered by Hevo.
  6. Choose Collections to sync. You can start with a few and expand later.
  7. Choose Load Type.
    • Full Load for a one-time snapshot.
    • Incremental using Change Streams/Oplog for ongoing updates.
  8. Define Primary Keys if you want upserts in Redshift. _id is usually the key.
  9. Decide how to handle nested JSON.
    • Flatten to columns for relational modeling.
    • Or map nested fields into Redshift SUPER if available on your cluster.
  10. Click Test Connection, then Save.

Step 2: Add Amazon Redshift as the Destination

Configure Redshift as destination
  1. Choose Amazon Redshift as the destination.
  2. Enter Cluster Endpoint, Port, Database, Username, Password, and select the Schema.
  3. Configure Staging on S3. Provide the S3 Bucket and the IAM Role ARN that has permissions to read objects in that bucket.
  4. Attach that IAM Role to your Redshift cluster and ensure the trust relationship allows redshift.amazonaws.com.
  5. Click Test Connection, then Save.
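
Before saving, it is worth confirming that the Redshift user you hand to Hevo can actually create and load tables in the chosen schema. A minimal sketch, assuming a hypothetical schema mongodb_data and user hevo_user (names and the password are placeholders; adjust to your environment):

-- Create a dedicated schema and grant the pipeline user the privileges it needs.
-- Redshift requires passwords to contain upper case, lower case, and a digit.
CREATE SCHEMA IF NOT EXISTS mongodb_data;
CREATE USER hevo_user WITH PASSWORD 'Replace-Me-123';
GRANT USAGE, CREATE ON SCHEMA mongodb_data TO hevo_user;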

Step 3: Set Destination Behavior

  1. Enable Auto Table Creation so Hevo creates tables that match your collections.
  2. Pick the Write Mode.
  • Append for insert-only.
  • Upsert if you want updates based on your primary key.

  3. Set Naming Conventions for tables and columns. Keep it consistent and lowercase.
  4. Turn on Schema Evolution so Hevo can add columns when MongoDB fields change.

Step 4: Configure Transformations and Data Types

  1. Review type mappings. Map MongoDB dates to TIMESTAMP and numbers to INTEGER/DECIMAL as needed.
  2. Choose Timezone handling for timestamps. UTC is safer for warehouses.
  3. Add Transformations if needed. Clean nulls, split fields, or parse arrays.
  4. For heavy JSON, prefer SUPER or pre-flatten large arrays to avoid row explosions.
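
If you land nested documents in a SUPER column, Redshift's PartiQL navigation lets you query the nested fields directly. A minimal sketch, assuming a hypothetical users table with a SUPER column named address:

-- Query nested JSON stored in a SUPER column using dot-notation navigation.
-- "users" and "address" are illustrative names, not something Hevo creates by default.
SELECT
    u.name,
    u.address.city::VARCHAR  AS city,
    u.address.state::VARCHAR AS state
FROM users AS u
WHERE u.address.state::VARCHAR = 'CA';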

Step 5: Performance and Load Settings

  1. Use Batching and Parallelism settings recommended by Hevo for your data size.
  2. If allowed, set Redshift SORTKEY and DISTKEY or automatic table optimization (see the DDL sketch after this list).
  3. Keep files in S3 under a few hundred MB per part for fast parallel COPY. Hevo usually manages this.
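
If you manage table DDL yourself rather than relying on automatic table optimization, distribution and sort keys are declared in the CREATE TABLE statement. A minimal sketch, assuming a hypothetical users table that is joined and upserted on id and queried by time range:

-- Illustrative DDL: distribute on the upsert/join key, sort by the timestamp used in queries.
CREATE TABLE users (
    id         INT,
    name       VARCHAR(100),
    email      VARCHAR(200),
    created_at TIMESTAMP
)
DISTSTYLE KEY
DISTKEY (id)          -- co-locates rows with the same id for joins and upserts
SORTKEY (created_at); -- speeds up time-range scans typical of analytics queries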

Step 6: Run the Initial Load

  1. Start the pipeline to run the Full Load.
  2. Monitor Throughput, Lag, and Errors in Hevo’s dashboard.
  3. Validate in Redshift. Check row counts and sample a few records.
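
A couple of quick validation queries for the step above, assuming a hypothetical users table created by the pipeline:

-- Compare against the source count on the MongoDB side (for example, db.users.countDocuments()).
SELECT COUNT(*) AS total_rows FROM users;

-- Spot-check the most recently loaded records.
SELECT * FROM users ORDER BY created_at DESC LIMIT 10;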

Step 7: Enable Incremental Sync (CDC)

  1. Ensure the MongoDB source is a Replica Set or Atlas with Change Streams enabled.
  2. Turn on Incremental in the pipeline so new inserts, updates, and deletes flow in.
  3. Confirm Oplog/Change Stream retention is adequate during long loads, so no gaps occur.

Step 8: Handle Deletes and Updates

  1. Choose how to represent deletes (see the view sketch after this list).
    • Soft delete with a boolean flag.
    • Or Hard delete if your use case requires physical removal.
  2. For updates, confirm Upsert is enabled and primary keys are correct.
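
If you go with soft deletes, downstream queries only need to filter the flag out, and a view keeps that logic in one place. A minimal sketch, assuming the pipeline writes a hypothetical boolean column is_deleted to a users table:

-- Hide soft-deleted rows behind a view so analysts never query them by accident.
CREATE VIEW active_users AS
SELECT *
FROM users
WHERE COALESCE(is_deleted, FALSE) = FALSE;  -- treats NULL (never deleted) the same as FALSE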

Step 9: Monitoring, Alerts, and Retries

  1. Set Alerts for failed events, schema drifts, and connection errors.
  2. Review Retry behavior for transient network issues.
  3. Track Lag to ensure near real-time sync once the initial load completes.

Step 10: Security and Compliance

  1. Store credentials in Hevo’s secure secrets. Rotate them on a schedule.
  2. Scope the IAM role to the specific S3 bucket prefix Hevo uses.
  3. Mask or hash PII fields in transformations if needed.
  4. Keep an audit trail of pipeline changes.

For more detailed information on how you can use the Hevo connectors for MongoDB to Redshift ETL, check out Hevo’s documentation.

Simplify Your MongoDB to Redshift Migration with Hevo

Hevo’s no-code data pipeline platform makes it effortless to move and transform your MongoDB data into Redshift—fast, reliable, and without writing a single line of code.

• Effortless Integrations: Integrate data from 150+ sources effortlessly, eliminating manual data transfers.
• Analytics-Ready in Redshift: Migrate and transform data seamlessly to make it analytics-ready.
• No-Code Automation: Automate real-time data loading with a smooth, reliable, and user-friendly platform.

Join 2,000+ happy customers who trust Hevo—like Meru, who cut costs by 70% and accessed insights 4x faster with Hevo Data.

Simplify Your Data Migration Now

Method 2: Using Custom Scripts to Move Data from MongoDB to Redshift

If you’re looking to move MongoDB data into Redshift without a third-party tool, you can do it using mongoexport, Amazon S3, and the COPY command. Here’s a step-by-step process:

Step 1: Export MongoDB Data

Use the mongoexport command to export data from your MongoDB collection. You can export in either CSV or JSON format.

# Export as CSV
mongoexport --db=db_name --collection=collection_name --type=csv --out=outputfile.csv --fields=field1,field2,field3

# Export as JSON
mongoexport --db=db_name --collection=collection_name --out=outputfile.json

Step 2: Upload the File to Amazon S3

Before loading data into Redshift, you need to store the exported file in S3.

  1. Create a Redshift-compatible table schema first. MongoDB allows flexible schemas, so you’ll need to design your Redshift table to match the exported file.
  2. Upload the exported file to S3. If you’ve installed the AWS CLI, run:

aws s3 cp outputfile.csv s3://your-bucket-name/outputfile.csv

Step 3: Create the Table in Redshift

Log in to Redshift and create a table that matches your CSV/JSON structure. For example:

CREATE TABLE users (
  id INT,
  name VARCHAR(100),
  email VARCHAR(200),
  created_at TIMESTAMP
);

Step 4: Load Data from S3 to Redshift

Use the COPY command to import data from S3 into Redshift.

• If your file is in CSV format:

COPY users
FROM 's3://your-bucket-name/outputfile.csv'
IAM_ROLE 'arn:aws:iam::<aws-account-id>:role/<role-name>'
FORMAT AS CSV
IGNOREHEADER 1;

• If your file is in JSON format:

COPY users
FROM 's3://your-bucket-name/outputfile.json'
IAM_ROLE 'arn:aws:iam::<aws-account-id>:role/<role-name>'
FORMAT AS JSON 'auto';

At this point, your MongoDB data is successfully loaded into Amazon Redshift.
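
If a COPY rejects rows, for example because of type mismatches or over-length strings, Redshift records the details in the STL_LOAD_ERRORS system table. A quick way to inspect recent failures:

-- Show the most recent COPY failures: offending file, line, column, and reason.
SELECT starttime, filename, line_number, colname, err_reason
FROM stl_load_errors
ORDER BY starttime DESC
LIMIT 10;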

Limitations of Using Custom Scripts to Move Data from MongoDB to Redshift

Here is a list of limitations of the manual method of moving data from MongoDB to Redshift:

• Schema Detection Cannot be Done Upfront: Unlike a relational database, a MongoDB collection doesn’t have a predefined schema. Hence, it is impossible to look at a collection and create a compatible table in Redshift upfront.
• Different Documents in a Single Collection: Documents in the same collection can have different sets of fields, as in the example below.
      {
        "name": "John Doe",
        "age": 32,
        "gender": "Male"
      }
      {
        "first_name": "John",
        "last_name": "Doe",
        "age": 32,
        "gender": "Male"
      }

      Different documents in a single collection can have incompatible field data types. Hence, the schema of the collection cannot be determined by reading one or a few documents.


      Two documents in a single MongoDB collection can have fields with values of different types.

      {<br>"name": "John Doe",<br>"age": 32,<br>"gender": "Male",<br>"mobile": 4242266998<br>}

      {<br> "name": "John Doe",<br> "age": 32,<br> "gender": "Male"<br> "mobile": "(424) 226-6998"<br>}

      The field mobile is a string and a number in the above documents, respectively. It is a completely valid state in MongoDB. In Redshift, however, both these values will have to be converted to a string or a number before being persisted.

• New Fields Can be Added to a Document at Any Point in Time: It is possible to add fields to a document in MongoDB by running a simple update on the document. In Redshift, however, the process is harder, as you have to construct and run ALTER statements each time a new field is detected (see the SQL sketch at the end of this list).
• Character Lengths of String Columns: MongoDB doesn’t put a limit on the length of string fields; it only enforces a 16 MB limit on the size of the entire document. In Redshift, however, it is common practice to restrict string columns to a certain maximum length for better space utilization. Hence, each time you encounter a longer value than expected, you will have to resize the column (also shown in the sketch at the end of this list).
• Nested Objects and Arrays in a Document: A document can have nested objects and arrays with a dynamic structure. Handling them is the most complex of MongoDB ETL problems.
{
  "name": "John Doe",
  "age": 32,
  "gender": "Male",
  "address": {
    "street": "1390 Market St",
    "city": "San Francisco",
    "state": "CA"
  },
  "groups": ["Sports", "Technology"]
}

      MongoDB allows nesting objects and arrays to several levels. In a complex real-life scenario, it may become a nightmare trying to flatten such documents into rows for a Redshift table.

      • Data Type Incompatibility between MongoDB and Redshift: Not all data types of MongoDB are compatible with Redshift. ObjectId, Regular Expression, and JavaScript are not supported by Redshift. While building an ETL solution to migrate data from MongoDB to Redshift from scratch, you will have to write custom code to handle these data types.
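
For the two maintenance chores called out above, adding a column when a new MongoDB field appears and widening a VARCHAR when a longer value shows up, the corresponding Redshift statements look roughly like this (table and column names are illustrative):

-- A new MongoDB field appears: add a matching column to the Redshift table.
ALTER TABLE users ADD COLUMN phone VARCHAR(32);

-- A value arrives that is longer than expected: widen the VARCHAR column.
-- Redshift allows increasing a VARCHAR column's size in place, with some restrictions.
ALTER TABLE users ALTER COLUMN email TYPE VARCHAR(500);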

      What is MongoDB?


MongoDB distinguishes itself as a NoSQL database program. It stores data as JSON-like documents with optional schemas. Written in C++, MongoDB lets you address a diverse set of data sets, accelerate development, and adapt quickly to change with key functionalities like horizontal scaling and automatic failover.

MongoDB is a strong choice when you have a huge volume of structured and unstructured data. Its features, which include data integration, load balancing, ad-hoc queries, sharding, and indexing, make scaling and flexibility smooth.

Another advantage is that MongoDB supports all common operating systems (Linux, macOS, and Windows) and provides drivers for languages such as C, C++, Go, Node.js, Python, and PHP.

      What is Amazon Redshift?


Amazon Redshift is a fully managed, petabyte-scale data warehouse that stores data across easily accessible “clusters” that you can query in parallel. Because every Amazon Redshift data warehouse is fully managed, administrative tasks like maintenance, backups, configuration, and security are completely automated.

Suppose you are a data practitioner who wants to use Amazon Redshift to work with big data. Its modular node design makes your workloads easy to scale, and because Amazon Redshift clusters can be further divided into slices, you can gain more granular insight into your datasets. Its multi-layered architecture also allows multiple queries to be processed simultaneously, cutting down on waiting times. Apart from these, there are a few more benefits of Amazon Redshift you can unlock with the best practices in place.

      Features of Amazon Redshift

• When you submit a query, Redshift checks the result cache for a valid, cached copy of the result. When it finds a match, the query is not executed; the cached result is returned instead, reducing the query’s runtime (see the query sketch below).
• Its Massively Parallel Processing (MPP) architecture distributes query execution across multiple nodes, so even the most complicated queries over large volumes of data run quickly.
• Data is stored in columnar format in Redshift tables, which reduces the number of disk I/O requests and optimizes analytical query performance.
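
You can see the result cache at work yourself: when a repeated query is answered from the cache, the source_query column of the SVL_QLOG system view points back to the original query. A quick check along these lines:

-- Rows where source_query is not null were served from the result cache.
SELECT query, starttime, elapsed, source_query
FROM svl_qlog
WHERE userid > 1          -- skip internal system queries
ORDER BY query DESC
LIMIT 20;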

      Why perform MongoDB to Redshift ETL?

To perform analytical queries, it is necessary to bring MongoDB’s data into a relational-format data warehouse like Amazon Redshift. A real-time data pipeline makes it simple and cost-effective to analyze all of your data efficiently. MongoDB is document-oriented and uses JSON-like documents to store data.

Because MongoDB doesn’t enforce schema restrictions while storing data, application developers can quickly change the schema, add new fields, and forget about older ones that are no longer used, without worrying about tedious schema migrations. Owing to this schema-less nature of a MongoDB collection, converting the data into a relational format is a non-trivial problem.

      In my experience in helping customers set up their modern data stack, I have seen MongoDB be a particularly tricky database to run analytics on. Hence, I have also suggested an easier / alternative approach that can help make your journey simpler.

      In this blog, I will talk about the two different methods you can use to set up a connection from MongoDB to Redshift in a seamless fashion: Using Custom ETL Scripts and with the help of a third-party tool, Hevo.

What Are the Common Challenges When Migrating from MongoDB to Redshift?

Are you thinking about moving your data from MongoDB to Amazon Redshift? You’re not alone; many teams are shifting to Redshift for its powerful analytics capabilities. But the journey isn’t always smooth. MongoDB and Redshift are built on very different foundations, and that can create some friction during migration.

      Here are a few common issues you might face and how to work around them.

      1. Schema Differences Can Get Tricky

      MongoDB offers flexibility and lacks a predefined schema, making it ideal for development, whereas Redshift requires structured and well-defined schemas. If your data contains nested documents or varying fields, the process can become complicated during the transfer.

Solution: Flatten your MongoDB data before importing it into Redshift. Use ETL tools such as Hevo or custom scripts to normalize the schema and convert nested JSON into flat tables compatible with Redshift.

      2. Data Type Mismatches

      MongoDB has its own data types (like ObjectId, embedded arrays, or timestamps) that don’t always play nicely with Redshift. Trying to insert these directly can cause errors or unexpected behavior.

Solution: Map and convert data types during transformation. Use an ETL tool like Hevo, which intelligently detects and converts data types during the transfer process. Other tools like Talend or Apache NiFi can also help ensure your data types are properly aligned.

      3. Performance Issues with Large Datasets

      Bulk migrating a massive MongoDB collection can lead to slow transfers or timeouts, especially if you’re inserting rows directly into Redshift.

      Solution: A more efficient approach is to initially export your data to Amazon S3 (in either CSV or Parquet format) and then utilize Redshift’s COPY command for bulk data loading. It’s quicker, more dependable, and designed for large data transfers.

      4. Inconsistent or Missing Data

      Since MongoDB doesn’t enforce a strict schema, it’s common to have missing or inconsistent fields across documents. Redshift, however, needs clean, well-structured data.

      Solution: Perform a data validation step before the migration. Fill in missing values with sensible defaults, and make sure each field aligns with your target schema in Redshift.
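
After the load, a quick completeness check in Redshift helps confirm that defaults landed as intended. A minimal sketch against the users table used earlier in this guide:

-- Count how many loaded rows are still missing key fields.
SELECT
    COUNT(*)                                       AS total_rows,
    SUM(CASE WHEN email IS NULL THEN 1 ELSE 0 END) AS missing_email,
    SUM(CASE WHEN name  IS NULL THEN 1 ELSE 0 END) AS missing_name
FROM users;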


      Conclusion

      In this blog, I have talked about the two different methods you can use to set up a connection from MongoDB to Redshift in a seamless fashion: Using Custom ETL Scripts and with the help of a third-party tool, Hevo.

Beyond the benefits discussed above, you can use Hevo to migrate data from an array of different sources – databases, cloud applications, SDKs, and more. This gives you the flexibility to instantly replicate data from any source, like MongoDB, to Redshift.

      More related reads:

      1. Creating a table in Redshift
      2. Redshift functions

You can additionally model your data and build complex aggregates and joins to create materialized views for faster query execution on Redshift. With Hevo’s Workflows, you can define the interdependencies between various models through a drag-and-drop interface as you convert MongoDB data to Redshift.

      FAQs

      1. How to migrate MongoDB to Redshift?

      There are two ways to migrate MongoDB to Redshift:
      Custom Scripts: Manually extract, transform, and load data.
      Hevo: Use an automated pipeline like Hevo for a no-code, real-time data migration with minimal effort and automatic sync.

      2. How can I deploy MongoDB on AWS?

      You can deploy MongoDB on AWS using two options:
      Manual Setup: Launch an EC2 instance, install MongoDB, and configure security settings.
      Amazon DocumentDB: Use Amazon’s managed service, compatible with MongoDB, for easier setup and maintenance.

      3. How do I transfer data to Redshift?

      You can transfer data to Redshift using:
      COPY Command: Load data from S3, DynamoDB, or an external source.
      ETL Tools: Use services like AWS Glue or Hevo for automated, real-time data transfer.

      Sourabh Agarwal
      Founder and CTO, Hevo Data

      Sourabh is a seasoned tech entrepreneur with over a decade of experience in scalable real-time analytics. As the Co-Founder and CTO of Hevo Data, he has been instrumental in shaping a leading no-code data pipeline platform used by thousands globally. Previously, he co-founded SpoonJoy, a mass-market cloud kitchen platform acquired by Grofers. His technical acumen spans MySQL, Cassandra, Elastic Search, Redis, Java, and more, driving innovation and excellence in every venture he undertakes.