MongoDB is a popular NoSQL database that requires data to be modeled in JSON format. If your application’s data model has a natural fit to MongoDB’s recommended data model, it can provide good performance, flexibility, and scalability for transaction types of workloads.
However, due to a few restrictions that you can face while analyzing data, it is highly recommended to stream data from MongoDB to BigQuery or any other data warehouse.
MongoDB doesn’t have proper join, getting data from other systems to MongoDB will be difficult, and it also has no native support for SQL. MongoDB’s aggregation framework is not as easy to draft complex analytics logic as in SQL.
The article provides steps to migrate data from MongoDB to BigQuery. It also talks about Hevo Data, making it easier to replicate data. Therefore, without any further ado, let’s start learning about this MongoDB to BigQuery ETL.
What is MongoDB?
MongoDB is a popular NoSQL database management system known for its flexibility, scalability, and ease of use. It stores data in flexible, JSON-like documents, making it suitable for handling a variety of data types and structures.
MongoDB is commonly used in modern web applications, data analytics, real-time processing, and other scenarios where flexibility and scalability are essential.
What is BigQuery?
BigQuery is a fully managed, serverless data warehouse and analytics platform provided by Google Cloud. It is designed to handle large-scale data analytics workloads and allows users to run SQL-like queries against multi-terabyte datasets in a matter of seconds.
BigQuery supports real-time data streaming for analysis, integrates with other Google Cloud services, and offers advanced features like machine learning integration, data visualization, and data sharing capabilities.
Prerequisites
mongoexport
(for exporting data from MongoDB)
- a BigQuery dataset
- a Google Cloud Platform account
- Hevo free-trial account
Methods to move Data from MongoDB to BigQuery
Method 1: Using Hevo Data for MongoDB to BigQuery Integration
Simplify your MongoDB to BigQuery data transfer with Hevo. Our no-code platform automates the entire process, ensuring seamless and real-time data synchronization between MongoDB and BigQuery.
Why Choose Hevo?
- Automated Data Integration: Effortlessly connect MongoDB to BigQuery.
- Real-Time Updates: Keep your data current and accurate.
- No-Code Platform: Easy setup and management without technical expertise.
Trusted by 2000+ Customers
See why industry leaders like FairMoney rely on Hevo for efficient data management.
Move your MongoDB Data for Free!
Method 2: Manual Steps to Stream Data from MongoDB to BigQuery
This method involves manually exporting data from MongoDB and using custom scripts or tools to load it into BigQuery. It can be time-consuming and error-prone compared to automated solutions.
Method 1: Using Hevo Data to Set up MongoDB to BigQuery
Step 1: Select the Source Type
To select MongoDB as the Source:
- Click PIPELINES in the Asset Palette.
- Click + CREATE in the Pipelines List View.
- In the Select Source Type page, select the MongoDB variant.
Step 2: Select the MongoDB Variant
Select the MongoDB service provider that you use to manage your MongoDB databases:
- Generic Mongo Database: Database management is done at your end, or by a service provider other than MongoDB Atlas.
- MongoDB Atlas: The managed database service from MongoDB.
Step 3: Specify MongoDB Connection Settings
Refer to the following sections based on your MongoDB deployment:
In the Configure your MongoDB Source page, specify the following:
Step 4: Configure BigQuery Connection Settings
Now Select Google BigQuery as your destination and start moving your data.
You can modify only some of the settings you provide here once the Destination is created. Refer to the section Modifying BigQuery Destination Configuration below for more information.
- Click DESTINATIONS in the Asset Palette.
- Click + CREATE in the Destinations List View.
- In the Add Destination page, select Google BigQuery as the Destination type.
- In the Configure your Google BigQuery Account page, select the authentication method for connecting to BigQuery.
- In the Configure your Google BigQuery Warehouse page, specify the following details.
By following the above-mentioned steps, you will have successfully completed MongoDB BigQuery replication.
With continuous Real-Time data movement, Hevo allows you to combine MongoDB data with your other data sources and seamlessly load it to BigQuery with a no-code, easy-to-setup interface. Try our 14-day full-feature access free trial!
Method 2: Manual Steps to Stream Data from MongoDB to BigQuery
For the manual method, you will need some prerequisites, like:
- MongoDB environment: You should have a MongoDB account with a dataset and collection created in it.
- Tools like MongoDB compass and tool kit should be installed on your system.
- You should have access to MongoDB, including the connection string required to establish a connection using the command line.
- Google Cloud Environment
After meeting these requirements, you can manually export your data from MongoDB to BigQuery. Let’s get started!
Step 1: Extract Data from MongoDB
For the first step, you must extract data from your MongoDB account using the command line. To do this, you can use the mongoexport
utility. Remember that mongoexport should be directly run on your system’s command-line window.
An example of a command that you can give is:
mongoexport --uri="mongodb+srv://username:password@cluster-name.gzjfolm.mongodb.net/database_name" --collection=collection_name --out=filename.file_format --fields="field1,field2…"
Note:
- ‘username: password’ is your MongoDB username and password.
- ‘Cluster_name’ is the name of the cluster you created on your MongoDB account. It contains the database name (database_name) that contains the data you want to extract.
- The ‘–collection’ is the name of the table that you want to export.
- ‘–out=Filename.file_format’ is the file’s name and format in which you want to extract the data. For example, Comments.csv, the file with the extracted data, will be stored as a CSV file named comments.
- ‘– fields’ is applicable if you want to extract data in a CSV file format.
After running this command, you will get a message like this displayed on your command prompt window:
Connected to:mongodb+srv://[**REDACTED**]@cluster-name.gzjfolm.mongodb.net/database_name
exported n records
Here, n is just an example. When you run this command, it will display the number of records exported from your MongoDB collection.
Step 2: Optional cleaning and transformations
This is an optional step, depending on the type of data you have exported from MongoDB. When preparing data to be transferred from MongoDB to BigQuery, there are a few fundamental considerations to make in addition to any modifications necessary to satisfy your business logic.
- BigQuery processes UTF-8 CSV data. If your data is encoded in ISO-8859-1 (Latin-1), then you should specify that while loading it to BigQuery.
- BigQuery doesn’t enforce Primary key or Unique key Constraints, and the ETL (Extract, Transform, and Load) process should take care of that.
- Date values should be in the YYYY-MM-DD (Year-month-date) format and separated by dashes.
- Also, both platforms have different column types, which should be transformed for consistent and error-free data transfer. A few data types and their equivalents in BigQuery are as follows:
MongoDB Data Type | BigQuery Data Type |
DOUBLE | FLOAT64 |
STRING | STRING |
BINARY DATA | BYTES |
OBJECTID | STRING |
BOOLEAN | BOOL |
DATE | DATE |
NULL | NULL |
32-BIT INTEGER | INT64 |
TIMESTAMP | TIMESTAMP |
64-BIT INTEGER | INT64 |
DECIMAL128 | NUMERIC |
These are just a few transformations you need to consider. Make the necessary translations before you load data to BigQuery.
Sync your Data from MongoDB to BigQuery
Sync your Data from MongoDB Atlas to BigQuery
Sync your Data from Oracle to BigQuery
Step 3: Uploading data to Google Cloud Storage (GCS)
After transforming your data, you must upload it to Google Cloud storage. The easiest way to do this is through your Google Cloud Web console.
- Login to your Google Cloud account and search for Buckets. Fill in the required fields and click Create.
- After creating the bucket, you will see your bucket listed with the rest. Select your bucket and click on the ‘upload files’ option.
- Select the file you exported from MongoDB in Step 1. Your MongoDB data is now uploaded to Google Cloud Storage.
Step 4: Upload Data Extracted from MongoDB to BigQuery Table from GCS
- Now, from the left panel of Google Cloud, select BigQuery and select the project you are working on. Click on the three dots next to it and click ‘Create Dataset.’
- Fill in all the necessary information and click the ‘Create Dataset’ button at the bottom. You have now created a dataset to store your exported data in.
- Now click on the three dots next to the dataset name you just created. Let’s say I created the dataset called
mongo_to_bq.
Select the ‘Create table’ option.
- Now, select the ‘Google Cloud Storage’ option and click the ‘browse’ option to select the dataset you created(
mongo_to_bq
).
- Fill in the rest of the details and click ‘Create Table’ at the bottom of the page.
- Now, your data has been transferred from MongoDB to BigQuery.
Step 5: Verify Data Integrity
After loading the data to BigQuery, it is essential to verify that the same data from MongoDB has been transferred and that no missing or corrupted data is loaded to BigQuery. To verify the data integrity, run some SQL queries in BigQuery UI and compare the records fetched as their result with your original MongoDB data to ensure correctness and completeness.
Example: To find the locations of all the theaters in a dataset called “Theaters,” we can run the following query.
Learn more about:
Limitations of Manually Moving Data from MongoDB to BigQuery
The following are some possible drawbacks when data is streamed from MongoDB to BigQuery manually:
- Time-Consuming: Compared to automated methods, manually exporting MongoDB data, transferring it to Cloud Storage, and then importing it into BigQuery is inefficient. Every time fresh data enters MongoDB, this laborious procedure must be repeated.
- Potential for human error: There is a chance that data will be wrongly exported, uploaded to the wrong place, badly converted, or loaded to the wrong table or partition if error-prone manual procedures are followed at every stage.
- Data lags behind MongoDB: The data in BigQuery might not be current with the most recent inserts and changes in the MongoDB database due to the manual process’s latency. Recent modifications may be overlooked in important analyses.
- Difficult to incrementally add new data: When opposed to automatic streaming, which manages this effectively, adding just new or modified MongoDB entries manually is difficult.
- Hard to reprocess historical data: It would be necessary to manually export historical data from MongoDB and reload it into BigQuery if any problems were discovered in the datasets that were previously imported.
- No error handling: Without automated procedures to detect, manage, and retry mistakes and incorrect data, problems like network outages, data inaccuracies, or restrictions violations may arise.
- Scaling limitations: MongoDB’s exporting, uploading, and loading processes don’t scale properly and become increasingly difficult as data sizes increase.
The constraints drive the requirement for automated MongoDB to BigQuery replication to create more dependable, scalable, and resilient data pipelines.
MongoDB to BigQuery: Use Cases
Streaming data from MongoDB to BigQuery may be very helpful in the following frequent use cases:
- Business analytics: Analysts may use BigQuery’s quick SQL queries, sophisticated analytics features, and smooth interaction with data visualization tools like Data Studio by streaming MongoDB data into BigQuery. This can lead to greater business insights.
- Data warehousing: By streaming data from MongoDB and merging it with data from other sources, businesses may create a cloud data warehouse on top of BigQuery, enabling corporate reporting and dashboards.
- Log analysis: BigQuery’s columnar storage and massively parallel processing capabilities enable the streaming of server, application, and clickstream logs from MongoDB databases for large-scale analytics.
- Data integration: By streaming to BigQuery as a centralised analytics data centre, businesses using MongoDB for transactional applications may integrate and analyse data from their relational databases, customer relationship management (CRM) systems, and third-party sources.
- Machine Learning: Streaming data from production MongoDB databases may be utilized to train ML models using BigQuery ML’s comprehensive machine learning features.
- Cloud migration: By gradually streaming data, move analytics from on-premises MongoDB to Google Cloud’s analytics and storage services.
Connect MongoDB to BigQuery in two Minutes
No credit card required
Additional Read –
Conclusion
This blog makes migrating from MongoDB to BigQuery an easy everyday task for you! The methods discussed in this blog can be applied so that business data in MongoDB and BigQuery can be integrated without any hassle through a smooth transition, with no data loss or inconsistencies.
Sign up for a 14-day free trial with Hevo Data to streamline your migration process and leverage multiple connectors, such as MongoDB and BigQuery, for real-time analysis!
FAQ on MongoDB To BigQuery
What is the difference between BigQuery and MongoDB?
BigQuery is a fully managed data warehouse for large-scale data analytics using SQL. MongoDB is a NoSQL database optimized for storing unstructured data with high flexibility and scalability.
How do I transfer data to BigQuery?
Use tools like Google Cloud Dataflow, BigQuery Data Transfer Service, or third-party ETL tools like Hevo Data for a hassle-free process.
Is BigQuery SQL or NoSQL?
BigQuery is an SQL database designed to run fast, complex analytical queries on large datasets.
What is the difference between MongoDB and Oracle DB?
MongoDB is a NoSQL database optimized for unstructured data and flexibility. Oracle DB is a relational database (RDBMS) designed for structured data, complex transactions, and strong consistency.
Chirag is a seasoned support engineer with over 7 years of experience, including over 4 years at Hevo Data, where he's been pivotal in crafting core CX components. As a team leader, he has driven innovation through recruitment, training, process optimization, and collaboration with multiple technologies. His expertise in lean solutions and tech exploration has enabled him to tackle complex challenges and build successful services.