Moving data from MongoDB to BigQuery enables powerful analytics and seamless reporting on large, diverse datasets.
While MongoDB’s document-based structure offers flexibility and rapid development, it lacks native SQL support and efficient joins, making complex analysis challenging.
In this blog, we walk you through three step-by-step methods to integrate MongoDB data into BigQuery, so you can leverage SQL-based analytics and get actionable insights quickly.
To help you choose the right approach, here’s a breakdown of three effective methods to integrate your MongoDB data into BigQuery:
- Hevo Data: Ideal for real-time replication and minimal setup when you want a fully managed solution that handles nested documents and incremental updates.
- Confluent connectors: Suitable for near real-time streaming with Kafka as an intermediary. Useful when you already have a Kafka ecosystem.
- Dataflow: Recommended for large-scale batch migrations or CDC-based streaming. Works well when you need flexible transformations and GCP-native integrations.
Methods to Migrate Data From MongoDB to BigQuery
Moving data from MongoDB to BigQuery involves bridging a NoSQL document-based storage with a columnar analytics database. Each integration method handles differences in data structure, replication frequency, and transformation capabilities.
Here, we discuss three methods to migrate data from MongoDB to BigQuery:
- Using Hevo’s automated data pipelines
- Using Confluent Kafka connectors
- Using Google Cloud Dataflow
Method 1: Moving data from MongoDB to BigQuery using Hevo
Hevo Data connects to MongoDB using native drivers or Change Streams and writes to BigQuery via the streaming API or batch loads. It maps MongoDB’s semi-structured documents to BigQuery’s schema by flattening nested fields.
Here’s how you can set it up:
Prerequisites:
- MongoDB Atlas cluster: A running MongoDB Atlas instance with collections to replicate. Allow Hevo’s IPs in Network Access so it can connect.
- BigQuery dataset: An existing dataset in your GCP project where data will be stored. Choose the dataset region carefully, as all replicated tables will reside in this location (a quick dataset-creation sketch follows this list).
- GCP service account key: A JSON key with at least BigQuery Data Editor (to create and update tables) and BigQuery Job User roles (to run load/streaming jobs).
- Hevo Data account: An active Hevo account to set up the pipeline.
- Network & security setup: Ensure Hevo can connect to MongoDB.
- For on-prem, open firewall or configure VPN.
- For cloud-hosted (Atlas), whitelist Hevo’s IPs. Enable SSL for secure transfer.
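If you still need to create the dataset, here is a minimal sketch using the bq CLI, assuming a hypothetical project your-project and dataset mongodb_replica:

# Create the destination dataset in an explicit location (placeholder project/dataset names)
bq mk --location=US --dataset your-project:mongodb_replica

# Confirm the location before pointing Hevo at it
bq show --format=prettyjson your-project:mongodb_replica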
Step 1: Connect MongoDB as the source
The workflow:
- Set up MongoDB as the source to start building the pipeline.
- Click CREATE PIPELINE and choose MongoDB.
- Enter connection details:
- Pipeline name: Identifier for your data pipeline
- Database host: MongoDB server address or IP
- Database port: Port used for connection (default: 27017)
- Authentication database name: Database verifying user credentials
- User credentials: Username and password
- Select “TEST & CONTINUE” to validate the details.
- Choose the objects you want to replicate and click “CONTINUE.”
Hevo connects using MongoDB native drivers, authenticates via username/password, and establishes a connection to the database.
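If the connection test fails, it can help to verify the same details outside Hevo first. A minimal sketch with mongosh, using placeholder host and user values:

# Ping the server with the same host, port, and credentials you gave Hevo
mongosh "mongodb://your-mongodb-host:27017" \
  --username hevo_user \
  --authenticationDatabase admin \
  --eval "db.adminCommand({ ping: 1 })"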
Step 2: Configure BigQuery as the destination
The workflow:
- Navigate to Select Destination Type.
- Select “Google BigQuery” and configure your destination.
- Specify the following:
- Destination name: A unique name to identify this destination in Hevo.
- Authorized account: Service account with BigQuery permissions.
- Project ID: Google Cloud project identifier.
- Dataset ID: BigQuery dataset to store data.

- Click “TEST CONNECTION” to confirm the configuration.
- Finally, select “SAVE & CONTINUE.”
Enable “Populate Loaded Stamp” to track ingestion timestamps and “Sanitize Table/Column Names” to automatically clean names for BigQuery compatibility.
Step 3: Confirm the data pipeline and enable final settings
The workflow:
- Go to “Destination Table Prefix.”
- Add a prefix to all BigQuery table names created by Hevo; this makes it easy to identify tables coming from a specific source.
Choose one option based on how you want to query the data downstream:
A. Store JSON fields as JSON strings and arrays as strings.
- Hevo converts JSON objects and arrays into STRING columns.
- Ideal when you plan to use a schema-on-read approach in BigQuery.
B. Preserve JSON structure while converting arrays to strings.
- Hevo maintains the object hierarchy, but arrays are stored as STRING.
- Useful for quick access to nested fields with minimal parsing.
Pro tip:
- If your JSON payloads change frequently or are highly nested, option A lets you defer parsing to BigQuery, doing it only when required.
- Choose option B when you want nested objects directly readable in BigQuery or BI tools. The query sketch below illustrates the difference.
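To show the difference at query time, here is a rough sketch against a hypothetical movies table, where option A stored each document in a STRING column named payload and option B preserved a nested cast_info record (all names are placeholders):

# Option A: the document is a JSON string, so extract fields on read
bq query --use_legacy_sql=false \
  "SELECT JSON_EXTRACT_SCALAR(payload, '$.title') AS title
   FROM \`your-project.your_dataset.movies\` LIMIT 10"

# Option B: nested objects keep their structure, so fields are addressable directly
bq query --use_legacy_sql=false \
  "SELECT title, cast_info.director AS director
   FROM \`your-project.your_dataset.movies\` LIMIT 10"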
Method 2: Integrating MongoDB with BigQuery via Confluent Connectors
Integrating MongoDB with BigQuery via Confluent Connectors leverages Kafka as a streaming intermediary, allowing near real-time replication of data. This approach handles nested documents and offers a reliable pipeline for analytics.
Here’s how it works:
Step 1: Set Up MongoDB Atlas
The workflow:
- Ensure you have a MongoDB Atlas cluster up and running.
- Load sample data into a database and collection to test the pipeline.
- Whitelist the IP addresses of your Kafka Connect cluster in the Network Access settings.
- Obtain the connection string for your MongoDB Atlas cluster; it will be used in the connector configuration (a quick verification sketch follows).
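To sanity-check the connection string and confirm the sample data loaded, a quick sketch with mongosh (placeholder cluster address and user):

# Count documents in the sample collection using the Atlas SRV connection string
mongosh "mongodb+srv://your-cluster.mongodb.net/sample_mflix" \
  --username atlas_user \
  --eval "db.movies.countDocuments()"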
Step 2: Set Up Kafka Connect Cluster
The workflow:
- Set up a Kafka cluster in Confluent Cloud or on-premises.
- Enable Kafka Connect to manage connector configurations and schema evolution.
- Install the MongoDB Source and BigQuery Sink connectors using the Confluent Hub Client (see the install sketch below).
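On a self-managed Connect worker, one way to install both plugins is the Confluent Hub client. The connector coordinates below are as published on Confluent Hub; pin versions as appropriate:

# Install the MongoDB source connector and the WePay BigQuery sink connector
confluent-hub install mongodb/kafka-connect-mongodb:latest
confluent-hub install wepay/kafka-connect-bigquery:latest
# Restart the Kafka Connect worker afterwards so the new plugins are loaded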
Step 3: Deploy MongoDB Source Connector
Create a JSON configuration file (mongo-source-connector.json) with the following content:
{<br> "name": "mongo-source-connector",<br> "config": {<br> "connector.class": "com.mongodb.kafka.connect.MongoSourceConnector",<br> "connection.uri": "mongodb+srv://:@/",<br> "database": "sample_mflix",<br> "collection": "movies",<br> "output.format.value": "json",<br> "topic.prefix": "mongo."<br>}
- connector.class: Specifies the connector implementation.
- connection.uri: MongoDB Atlas connection string with credentials.
- database & collection: MongoDB database and collection to monitor.
- output.format.value: Defines the format of the output messages (JSON in this case).
- topic.prefix: Prefix for Kafka topics created by the connector.
Now, use the Kafka Connect REST API to deploy the connector:
curl -X POST -H "Content-Type: application/json" \
  --data @mongo-source-connector.json \
  http://<connect-host>:8083/connectors
- Replace <connect-host> with your Kafka Connect host address.
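After submitting, you can confirm the connector and its tasks are healthy through the same REST API:

# Both the connector and its task should report state "RUNNING"
curl -s http://<connect-host>:8083/connectors/mongo-source-connector/status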
Step 4: Create BigQuery Dataset
The workflow:
- Navigate to BigQuery in the GCP Console.
- Create a new dataset where the data will be stored.
Note: Ensure the dataset’s location matches the region of your Kafka Connect cluster.
- Create a service account in GCP with roles: BigQuery Data Editor, BigQuery Job User.
- Download the JSON key file for authentication (the gcloud sketch below shows one way to do this).
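If you prefer the command line, here is a sketch of the same setup with gcloud, assuming a hypothetical project your-project and account name bq-sink-connector:

# Create a dedicated service account for the sink connector
gcloud iam service-accounts create bq-sink-connector --project=your-project

# Grant the two BigQuery roles the connector needs
gcloud projects add-iam-policy-binding your-project \
  --member="serviceAccount:bq-sink-connector@your-project.iam.gserviceaccount.com" \
  --role="roles/bigquery.dataEditor"
gcloud projects add-iam-policy-binding your-project \
  --member="serviceAccount:bq-sink-connector@your-project.iam.gserviceaccount.com" \
  --role="roles/bigquery.jobUser"

# Download the JSON key referenced by the connector's "keyfile" setting
gcloud iam service-accounts keys create bq-sink-key.json \
  --iam-account=bq-sink-connector@your-project.iam.gserviceaccount.com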
Step 5: Deploy BigQuery Sink Connector
Create a JSON configuration file (bigquery-sink-connector.json):
{<br> "name": "bigquery-sink-connector",<br> "config": {<br> "connector.class": "com.wepay.kafka.connect.bigquery.BigQuerySinkConnector",<br> "topics": "mongo.sample_mflix.movies",<br> "project": "",<br> "datasets": "default=",<br> "autoCreateTables": "true",<br> "keyfile": ""<br> }<br>}
- connector.class: Specifies the connector implementation.
- topics: Kafka topic to consume data from.
- project: GCP project ID where BigQuery resides.
- datasets: Mapping of Kafka topic to BigQuery dataset.
- autoCreateTables: Automate table creation in BigQuery.
- keyfile: Path to the service account JSON key for authentication.
Use the Kafka Connect REST API to deploy the connector:
curl -X POST -H "Content-Type: application/json" \
  --data @bigquery-sink-connector.json \
  http://<connect-host>:8083/connectors
- Replace <connect-host> with your Kafka Connect host address.
Step 6: Test the Pipeline
The workflow:
- Add test documents to the collection monitored by the MongoDB Source Connector.
- Check if the changes are captured by the source connector.
- Confirm that the inserted documents land correctly in the corresponding BigQuery table (a quick verification sketch follows this list).
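A rough end-to-end check, using placeholder credentials and a guessed destination table name (the actual name depends on how the sink connector maps the topic to a table):

# 1. Insert a test document into the monitored collection
mongosh "mongodb+srv://your-cluster.mongodb.net/sample_mflix" \
  --username atlas_user \
  --eval 'db.movies.insertOne({ title: "Pipeline Test", year: 2024 })'

# 2. After a short delay, confirm rows are landing in BigQuery
#    (adjust the table name to whatever autoCreateTables produced)
bq query --use_legacy_sql=false \
  "SELECT COUNT(*) AS row_count FROM \`your-project.your_dataset.mongo_sample_mflix_movies\`"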
Method 3: Google Cloud Dataflow to Move MongoDB Data to BigQuery
Google Cloud Dataflow provides both batch and streaming pipelines for integrating MongoDB with BigQuery. This method is ideal for large-scale migrations, scheduled batch transfers, or real-time streaming using MongoDB Change Streams.
Here’s how it works:
Step 1: Set up MongoDB
The workflow:
- Ensure your Atlas or replica set instance is running.
- Identify the database and collections you want to migrate.
- Whitelist the Dataflow IP ranges for network access.
Dataflow needs network access to MongoDB to read the documents, and the streaming (CDC) template requires a replica set so Change Streams can capture changes (see the quick check below).
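A quick way to confirm you are connected to a replica set, using a placeholder connection string:

# db.hello() reports a setName only when the deployment is a replica set
mongosh "mongodb+srv://your-cluster.mongodb.net/" \
  --username atlas_user \
  --eval "db.hello().setName"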
Step 2: Set up BigQuery
Dataflow writes to BigQuery via the streaming API (real-time) or batch load jobs:
- Create a BigQuery dataset to store the migrated data.
- Create a GCP service account with the BigQuery Data Editor and BigQuery Job User roles.
- Choose table names and creation mode.
Step 3: Choose the Dataflow template
Build a Dataflow pipeline using the platform’s UI:
- In the “Dataflow template” box, select:
- “MongoDB to BigQuery (CDC)” for the streaming template.
- “MongoDB to BigQuery” for the batch template.
- Other required details include:
- Job name: Unique identifier for pipeline.
- Regional endpoint: GCP region where the job runs.
Step 4: Configure pipeline parameters
Provide MongoDB and BigQuery connection details:
- MongoDB connection URI
- MongoDB database
- MongoDB collection
- BigQuery destination table
- User option: controls whether each document is written as a single JSON string or flattened into columns
Choose how you want to encrypt your data:
A. Google-managed encryption key
- Google automatically manages encryption keys.
- No configuration required.
B. Customer-managed encryption key (CMEK)
- You control the encryption key.
- Requires configuring key access and permissions.
Step 5: Run & monitor the pipeline
The workflow:
- Launch the Dataflow job from the GCP Console or via the gcloud CLI (a sample command follows this list).
- Use Dataflow monitoring tools to check job status.
- Ensure successful ingestion, handle errors, and validate data.
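For reference, here is one way to launch the batch template from the gcloud CLI. The template path and parameter names follow the Google-provided MongoDB to BigQuery Flex Template, but treat this as a sketch with placeholder values and confirm the parameters for your template version and region:

# Run the batch Flex Template; the CDC template is launched the same way with its own template path
gcloud dataflow flex-template run mongodb-to-bq-batch \
  --project=your-project \
  --region=us-central1 \
  --template-file-gcs-location=gs://dataflow-templates-us-central1/latest/flex/MongoDB_to_BigQuery \
  --parameters=mongoDbUri=mongodb+srv://user:pass@your-cluster.mongodb.net,database=sample_mflix,collection=movies,outputTableSpec=your-project:your_dataset.movies,userOption=FLATTEN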
MongoDB to BigQuery: Benefits & Use Cases
- Scalability and Speed: With BigQuery’s serverless, highly scalable architecture, you can manage and analyze large datasets more efficiently, without worrying about infrastructure limitations.
- Enhanced Analytics: BigQuery provides powerful, real-time analytics capabilities that can handle large-scale data with ease, enabling deeper insights and faster decision-making than MongoDB alone.
- Seamless Integration: BigQuery integrates smoothly with Google’s data ecosystem, allowing you to connect with other tools like Google Data Studio, Google Sheets, and Looker for more advanced data visualization and reporting.
Struggling with custom scripts to sync MongoDB and BigQuery? Hevo simplifies the process with a fully managed, no-code data pipeline that gets your data where it needs to be fast and reliably.
With Hevo:
- Connect MongoDB to BigQuery in just a few clicks.
- Handle semi-structured data effortlessly with built-in transformations.
- Automate schema mapping and keep your data analysis-ready.
Trusted by 2000+ data professionals at companies like Postman and ThoughtSpot. Rated 4.4/5 on G2. Try Hevo and make your MongoDB to BigQuery migration seamless!
Conclusion
In this blog, we walked through three step-by-step methods for migrating data from MongoDB to BigQuery. Whichever approach you choose, you can integrate business data from MongoDB into BigQuery with a smooth transition and without data loss or inconsistencies.
Sign up for a 14-day free trial with Hevo Data to streamline your migration process and leverage multiple connectors, such as MongoDB and BigQuery, for real-time analysis!
FAQ on MongoDB To BigQuery
1. What is the difference between BigQuery and MongoDB?
BigQuery is a fully managed data warehouse for large-scale data analytics using SQL. MongoDB is a NoSQL database optimized for storing unstructured data with high flexibility and scalability.
2. How do I transfer data to BigQuery?
Use tools like Google Cloud Dataflow, BigQuery Data Transfer Service, or third-party ETL tools like Hevo Data for a hassle-free process.
3. Is BigQuery SQL or NoSQL?
BigQuery is a SQL-based data warehouse designed to run fast, complex analytical queries on large datasets.
4. What is the difference between MongoDB and Oracle DB?
MongoDB is a NoSQL database optimized for unstructured data and flexibility. Oracle DB is a relational database (RDBMS) designed for structured data, complex transactions, and strong consistency.