I’m guessing you landed on this page because you’re looking for a quick and easy way to set up this connection. Maybe the data scientist or data analyst on your team tapped you on the shoulder directly. Or maybe your team lead told you they needed this setup ‘stat.’

Whatever the scenario, we’ve got you covered. In this article, we’ll go over two methods you can use to replicate data from MongoDB Atlas to BigQuery: using Google Dataflow pipelines and using a no-code data replication tool.

Let’s dive in!

Replicating Data from MongoDB Atlas to BigQuery

Using Google Dataflow Pipelines to Connect MongoDB Atlas to BigQuery

Google Cloud Dataflow is a unified batch and stream processing system that’s fast, serverless, and cost-effective. 

Because Dataflow is serverless, teams can focus on writing pipelines instead of managing server clusters, which removes much of the operational overhead from data engineering workloads.

Previously, you had to write custom code using Apache Beam libraries and deploy it on the Dataflow runtime to move and transform data from MongoDB Atlas to BigQuery. 
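
For context, here’s roughly what that hand-rolled approach looks like with the Beam Python SDK’s built-in MongoDB and BigQuery connectors. This is a minimal sketch, not a production pipeline: the connection URI, project, dataset, table, and schema below are placeholder assumptions you’d replace with your own values.

```python
# Minimal sketch of a custom Beam pipeline: MongoDB -> BigQuery.
# All IDs, URIs, and names are placeholders.
import apache_beam as beam
from apache_beam.io.mongodbio import ReadFromMongoDB
from apache_beam.options.pipeline_options import PipelineOptions


def run():
    options = PipelineOptions(
        runner="DataflowRunner",             # use "DirectRunner" for local testing
        project="my-gcp-project",            # placeholder project ID
        region="us-central1",
        temp_location="gs://my-bucket/tmp",  # placeholder staging bucket
    )

    with beam.Pipeline(options=options) as p:
        (
            p
            | "ReadFromMongoDB" >> ReadFromMongoDB(
                uri="mongodb+srv://user:password@cluster0.example.mongodb.net",
                db="sample_db",
                coll="orders",
            )
            # Map each BSON document to a row matching the BigQuery schema.
            | "ToTableRows" >> beam.Map(
                lambda doc: {"id": str(doc.get("_id")), "payload": str(doc)}
            )
            | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
                table="my-gcp-project:analytics.orders_raw",
                schema="id:STRING,payload:STRING",
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
                create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
            )
        )


if __name__ == "__main__":
    run()
```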

But to simplify the process, Google Cloud and MongoDB have come up with templates that you can use to load data from MongoDB Atlas to BigQuery. We’ll be discussing two templates in this section:

  • MongoDB to BigQuery Template (for batch processing)
  • MongoDB to BigQuery CDC Template

MongoDB to BigQuery Template


This template is a batch pipeline that reads documents from MongoDB and writes them to BigQuery in the format specified by the userOption parameter.

For this pipeline to work, you need to remember the following things:

  • The source MongoDB instance must be accessible from the Dataflow worker machines.
  • The target BigQuery dataset must already exist.

Here are the steps to get the MongoDB Atlas to BigQuery template for batch processing up and running:

  • Open up the Google Cloud console and go to the Dataflow Create job from template page.
  • Provide a unique job name in the Job name field.
  • Next, from the Dataflow template drop-down menu, choose the MongoDB to BigQuery template.
  • Enter your parameter values in the provided parameter fields, then click Run job to launch the pipeline. (If you’d rather launch the template from code, see the sketch below.)
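
The console flow above maps to Dataflow’s Flex Template launch API, so you can also kick off the same job programmatically. Below is a hedged sketch using google-api-python-client; the template GCS path and parameter names (mongoDbUri, database, collection, userOption, outputTableSpec) follow Google’s published MongoDB to BigQuery template, but verify them against the current template documentation, and treat every project, bucket, and connection value as a placeholder.

```python
# Hedged sketch: launching the MongoDB to BigQuery Flex Template via the
# Dataflow REST API. Requires Application Default Credentials.
# All IDs, URIs, and table names below are placeholders.
from googleapiclient.discovery import build

PROJECT_ID = "my-gcp-project"   # placeholder
REGION = "us-central1"          # placeholder

dataflow = build("dataflow", "v1b3")

request = dataflow.projects().locations().flexTemplates().launch(
    projectId=PROJECT_ID,
    location=REGION,
    body={
        "launchParameter": {
            "jobName": "mongodb-to-bigquery-batch",
            # Template path as published by Google; double-check against the docs.
            "containerSpecGcsPath": (
                f"gs://dataflow-templates-{REGION}/latest/flex/MongoDB_to_BigQuery"
            ),
            "parameters": {
                "mongoDbUri": "mongodb+srv://user:password@cluster0.example.mongodb.net",
                "database": "sample_db",
                "collection": "orders",
                "userOption": "FLATTEN",
                "outputTableSpec": f"{PROJECT_ID}:analytics.orders_raw",
            },
        }
    },
)

response = request.execute()
print(response)
```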

MongoDB to BigQuery CDC Template


Here, the pipeline reads the JSON records that a MongoDB change stream pushes to Pub/Sub and writes them to BigQuery.
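
One piece you have to supply yourself is the process that pushes change-stream events into Pub/Sub. A minimal sketch of such a publisher, assuming pymongo and the google-cloud-pubsub client, with placeholder connection and topic names, could look like this:

```python
# Minimal sketch: watch a MongoDB Atlas change stream and publish each event
# to the Pub/Sub topic the CDC template reads from. All names are placeholders.
import json

from bson import json_util
from google.cloud import pubsub_v1
from pymongo import MongoClient

MONGO_URI = "mongodb+srv://user:password@cluster0.example.mongodb.net"  # placeholder
PROJECT_ID = "my-gcp-project"                                           # placeholder
TOPIC_ID = "mongodb-cdc"                                                # placeholder

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path(PROJECT_ID, TOPIC_ID)

client = MongoClient(MONGO_URI)
collection = client["sample_db"]["orders"]

# watch() opens a change stream; each event describes an insert, update,
# or delete on the collection.
with collection.watch(full_document="updateLookup") as stream:
    for change in stream:
        payload = json.dumps(change, default=json_util.default).encode("utf-8")
        future = publisher.publish(topic_path, data=payload)
        future.result()  # block until Pub/Sub acknowledges the message
```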

Here are the steps to get the MongoDB Atlas to BigQuery CDC template for stream processing up and running:

  • Open up the Google Cloud console and go to the Dataflow Create job from template page.
  • Provide a unique name in the Job name field.
  • Next, from the Dataflow template drop-down menu, choose the MongoDB to BigQuery (CDC) template.
  • Enter your parameter values in the provided parameter fields, then click Run job to launch the pipeline.

Using Google Dataflow pipelines to set up MongoDB Atlas to BigQuery replication is a good fit when you’re moving data from GCP-hosted or on-premises databases.

Google Cloud Dataflow doesn’t support SaaS sources and databases that aren’t hosted on GCP. So, if these sources are part of your stack, you can’t use Google Dataflow alone to replicate their data to BigQuery.

For these scenarios:

  • You can create custom scripts for replication. However, with more data sources, the tedious process of creating custom scripts for connectors, transforming and processing the data, tracking the flow, and fixing issues can quickly become a major burden. 
  • You can opt for a data replication tool that supplements Google Cloud Dataflow and takes care of replicating data from just these sources.

Or you can opt for a single tool that handles replication from both MongoDB Atlas and your SaaS sources to BigQuery.

Let’s go over this next!

Use a No-Code Data Replication Tool to Connect MongoDB Atlas to BigQuery

You can streamline the MongoDB Atlas to BigQuery integration process by opting for an automated tool that lets you:

  • Focus on pressing engineering goals and free up the resources needed for them.
  • Save time spent on data preparation thanks to a user-friendly UI.
  • Enable business teams to quickly create accurate reports without writing code.
  • Access near real-time data without sacrificing accuracy or consistency.
  • Consolidate analytics-ready data for performance measurement and opportunity exploration.

Let’s take a look at the simplicity a cloud-based ELT tool like Hevo provides to your workflow:

Step 1: Configure MongoDB Atlas as a Source

MongoDB Atlas to BigQuery: Configuring Source

Step 2: Configure BigQuery as a Destination

MongoDB Atlas to BigQuery: Configuring Destination

And that’s it! Based on your inputs, Hevo will start replicating data from MongoDB Atlas to BigQuery.

If you’d like to dive deeper into how your pipelines are configured for this use case, you can read the official documentation for configuring MongoDB Atlas as a source and Google BigQuery as a destination.

Note: Hevo doesn’t support a standalone MongoDB instance that isn’t part of a replica set.

What can you hope to achieve by replicating data from MongoDB Atlas to BigQuery?

By replicating data from MongoDB Atlas to BigQuery, you can help your business stakeholders do the following:

  • Aggregate individual product interaction data for any event.
  • Map the customer journey within the product (website or application).
  • Combine transactional data from different functional groups (sales, marketing, product, human resources) to answer questions such as the following (see the query sketch after this list):
    • Which development features were responsible for an app outage in a given period?
    • Which product categories on your website were most profitable?
    • How does the failure rate in individual assembly units affect inventory turnover?
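
To make that last point concrete, here’s a hedged example of querying the replicated data with the official BigQuery Python client. The dataset, table, and column names are hypothetical; adjust them to whatever schema your MongoDB collections map to in BigQuery.

```python
# Hedged example: answering a product-analytics question against the
# replicated data. Project, dataset, table, and columns are placeholders.
from google.cloud import bigquery

client = bigquery.Client(project="my-gcp-project")  # placeholder project

query = """
    SELECT event_name, COUNT(*) AS interactions
    FROM `my-gcp-project.analytics.product_events`
    WHERE event_timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
    GROUP BY event_name
    ORDER BY interactions DESC
"""

# Run the query and print interaction counts per event for the last 7 days.
for row in client.query(query).result():
    print(f"{row.event_name}: {row.interactions}")
```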

Key Takeaways

In this article, we’ve covered two ways to replicate data from MongoDB Atlas to BigQuery: via Google Dataflow pipelines and through a no-code data replication tool. Since Google Dataflow doesn’t support SaaS sources, a no-code tool like Hevo can serve as a one-stop solution for all your data replication needs.

Hevo allows you to replicate data in near real-time from 150+ sources like MongoDB Atlas to the destination of your choice, including BigQuery, Snowflake, Redshift, Databricks, and Firebolt, without writing a single line of code. We’d suggest using it for real-time demands that require you to pull data from SaaS sources. This’ll free up your engineering bandwidth, allowing you to focus on more productive tasks.

For the rare times when things go wrong, Hevo ensures zero data loss. Hevo also lets you monitor your workflows, so you can find the root cause of an issue and address it before it derails the entire pipeline. Add 24/7 customer support to the list, and you get a reliable tool that puts you at the wheel with greater visibility.

If you don’t want SaaS tools with unclear pricing that burn a hole in your pocket, opt for a tool that offers a simple, transparent pricing model. Hevo has three usage-based pricing plans, starting with a free tier that lets you ingest up to 1 million records.

Schedule a demo to see if Hevo would be a good fit for you, today!

Content Marketing Manager, Hevo Data

Amit is a Content Marketing Manager at Hevo Data. He enjoys writing about SaaS products and modern data platforms, having authored over 200 articles on these subjects.
