Firestore is a flexible and scalable database that helps import or export application data from multiple data sources into Data Warehouses or other BI (Business Intelligence) tools. Google BigQuery or BigQuery is one of the most popular Data Warehouses developed by Google. Loading data from Firestore to BigQuery plays a pivotal role when applications are developed using other Google technologies.
There needs to be communication between them to cater to business requirements. This article provides a comprehensive guide on integrating Firestore and BigQuery. It also provides a general overview of both Firestore and BigQuery to better understand these technologies individually.
Introduction to Firestore
Firestore, a subset of Google Cloud, is a flexible, scalable NoSQL Cloud Database for mobile, web, and application development. Its main role is to keep your data in sync across client apps through real-time listeners. It also offers offline support for both mobile and web applications so that you can build responsive apps regardless of any Internet issues.
Firestore also supports easy integration with Firebase and other Google Cloud products, including Cloud Functions.
How does Firestore store data?
Firestore stores data in a self-descriptive, hierarchical way that involves documents and collections. Documents are small, independent records that have fields and values, while collections are groups of related documents. This flexible schema allows for nested data and supports real-time synchronisation and offline access.
Key Features of Firestore
Firestore houses a wide range of features which makes it a better choice as compared to other web and mobile application development databases. Some of these features are:
- Firestore is highly flexible and supports hierarchical data structures and collections. The documents can contain complex nested objects and subcollections and still be maintained properly.
- It allows you to perform simultaneous querying. Your queries can include multiple, chained filters, combine filtering, and sorting. In addition to this, queries are also indexed by default so that your query performance is proportional to the resulting data set.
- Firestore allows you to perform real-time updates on your data by leveraging data synchronization. It is also designed to fetch one-time queries efficiently.
Skip the hassle of manually migrating your Firebase data to BigQuery and choose Hevo’s No-code Automated Pipeline. With Hevo:
- Seamlessly pull data from HubSpot and over 150+ other sources with ease.
- Utilize drag-and-drop and custom Python script features to transform your data.
- Efficiently migrate data to a data warehouse, ensuring it’s ready for insightful analysis in Tableau.
Experience the simplicity of data integration with Hevo and see how Hevo helped fuel FlexClub’s drive for accurate analytics and unified data.
Get Started with Hevo for Free
Introduction to BigQuery
Google BigQuery is a Data Warehousing platform that is serverless, cost-effective, highly scalable, and has Machine Learning built into it.
It uses the Business Intelligence Engine for its operations. It enables quick SQL queries to be combined with the processing power of Google’s infrastructure to manage business transactions, manage the data across different databases, and also allow access control policies for users to view and query data.
BigQuery has a BI engine which is a fast memory analysis service that enables users to analyze large and complex datasets with high concurrency. It works very well with tools such as Google Data Studio and Looker for analysis.
How does BigQuery store data?
BigQuery stores its data in tabular format within datasets and then into tables. Tables have rows and columns, where each column has a type of data. The SQL language organizes the data and provides the means of querying against the information. BigQuery architecture offers a high performance in querying and analysis across large datasets.
Key Features of Google BigQuery
Google BigQuery houses many features to help all types of users store their data from multiple data sources and gain valuable and actionable insights from its customers. Some of these features include:
- It requires minimal configuration and so anyone can use it without being a Database Administrator.
- It is fully manageable and does not require any resources to be deployed.
- It encrypts your data by default.
- It supports a variety of languages including Java, .NET, and Python.
- It supports a wide variety of 3rd party applications to visualize and analyze data.
Datatype Conversion
BigQuery converts data from each document in Firestore export files to BigQuery data types. The following table describes the conversion between supported data types.
Firestore Data Type | BigQuery Data Type |
Array | RECORD |
Boolean | BOOLEAN |
Reference | RECORD |
Data and Time | TIMESTAMP |
Map | RECORD |
Floating-point number | FLOAT |
Geographical point | RECORD [{“lat”,”FLOAT”}, {“long”,”FLOAT”}] |
Integer | INTEGER |
String | STRING (truncated to 64MB) |
This mapping shows how Firestore data types translate to corresponding BigQuery data types for seamless data integration and querying.
Integrate MariaDB to BigQuery
Integrate MS SQL Server to BigQuery
Integrate PostgreSQL to BigQuery
Prerequisites
I have listed below the required things that should be present with you while performing the integration.
- Have an active Firestore and Google BigQuery account.
- Have access to the “ bigquery.tables.create ”, “ bigquery.tables. updateData “, and the “ bigquery. jobs. create” permissions in Google BigQuery.
- Have access to the “ storage.objects.get ”permissions in Firestore.
Methods to Load Data from Firestore to BigQuery
Streaming data from Firestore to BigQuery may incur costs based on data volume and query complexity and may require flattening Firestore’s hierarchical data to fit BigQuery’s tabular format. Ensure you have the necessary permissions to load jobs and access BigQuery tables or partitions. For Cloud Storage bucket data loads, you must also have export permissions.
Similarly, if you want to load data from a Cloud Storage bucket, you must be granted all the export permissions. There are 2 main methods you can use to set up Firestore export to BigQuery. They are given below:
Method 1: Using the Firestore Cloud Console to Load Data from Firestore to BigQuery
In this method, you can load data from Firestore to BigQuery by using the Firestore Cloud Console. This method can be used by following the below steps:
Step 1: Open the BigQuery Page in the Cloud Console
The first step to load data from Firestore to BigQuery is to access the Cloud Console and open the BigQuery page. The home page looks like the figure shown below.
Step 2: Select the Dataset
The next step is to go to the Explorer Panel and expand your project to select a dataset.
Step 3: Create the Table
Once this is done, you need to go to the Details Panel and click on Create Table.
Step 4: Enter the Cloud Storage URI
The next step to load data from Firestore to BigQuery is to set up the Cloud Storage URI.
To do this, go to the Create Table page, go to the Source Section, and select Cloud Storage. Once you do that, in the Source field, enter the Cloud Storage URI.
An important point to note is that the Cloud Storage Bucket must be in the same location as your dataset. The URI for your Firestore export file should end with KIND_COLLECTION_ID.export_metadata.
Once that is done, you need to choose the File Format. Select DataStore Backup as it is the correct option for Firestore. Firestore and Datastore share an export format.
Once the formats are selected, you need to configure the Destination to load data from Firestore to BigQuery. For the Dataset name, choose the appropriate dataset as shown in the figure below.
In the Table Name field, enter the name of the table you’re creating in BigQuery and also verify that the Table type is set to Native Table.
Step 6: Overviewing the Schema
In the Schema section, no action is necessary. The schema is inferred for a Firestore export.
Step 7: Configure any Advanced Settings
In case you have any other advanced settings to add to your table, go to the Advanced Options section and set Write preference to Overwrite table. This is shown below.
Finally, click on Create table.
That’s it! You have successfully loaded data from Firestore to BigQuery using the Firestore Cloud Console.
Method 2: Using the BigQuery Command Line Interface to Load Data from Firestore to BigQuery
Before getting started with this method to load data from Firestore to BigQuery, it’s essential to note that every document in Firestore export files has its data converted to BigQuery data types by BigQuery. The conversion between the available data types is shown in the following table.
To use this method you need to use the bq load command with source_format set to DATASTORE_BACKUP on the Command Line Interface.
Step 1: The code for implementing the bq load command on the command-line interface is shown below:
bq --location=LOCATION load
--source_format=FORMAT
DATASET.TABLE
PATH_TO_SOURCE
You need to replace the following attributes according to your requirements:
- LOCATION: Replace it with your current location. The –location flag is optional.
- FORMAT: Replace it with DATASTORE_BACKUP. This is because Datastore Backup is the correct option for Firestore. Firestore and Datastore share the same export format.
- DATASET: Replace it with the dataset that contains the table into which you are loading the data.
- TABLE: This is the table into which you are loading the data. If you do not have a table, it will be created automatically.
- PATH_TO_SOURCE: Replace it with the Cloud Storage URI.
A sample code example with all these parameters set is shown below.
bq --location=US load
--source_format=DATASTORE_BACKUP
mydataset.book_data
gs://mybucket/20180228T1256/default_namespace/kind_Book/default_namespace_kind_Book.export_metadata
In this example, you are loading the
gs://mybucket/20180228T1256/default_namespace/kind_Book/default_namespace_kind_Book.export_metadata Firestore export file into a table named book_data. mybucket and mydataset were created in the US multi-region location.
That’s it! You have successfully loaded data from Firestore to BigQuery using the BigQuery Command Line Interface.
Limitations of Loading Data from Firestore to BigQuery
Although both methods for loading Firestore data to BigQuery are easy to understand, users that have a non-technical background might find it difficult to load their data in the first go.
This is because even if the transfer process is successfully deployed, there are a lot of resources you must have before initiating the process.
Some of the limitations of Loading data from Firestore to BigQuery are:
- Your dataset must be in the same regional location as the Cloud Storage Bucket containing your export files.
- You can specify only 1 Cloud Storage URI, and cannot use a URI wildcard.
- You can only create a new table or overwrite an existing table to store the data. You cannot append Firestore export data to an existing table.
- For a Firestore export to load correctly, documents in the export data must share a consistent schema with fewer than 10,000 unique field names.
- The export command must specify a “collection-ids” filter. Without this filter, data cannot be exported into BigQuery.
Use Cases
Firebase is a popular mobile and web application development platform that offers real-time database and authentication services. Integrating data from Firebase to BigQuery can provide numerous benefits. Here are some use cases that explain why syncing Firebase data to BigQuery is essential:
- Advanced Analytics: BigQuery’s powerful data processing capabilities enable you to perform complex queries and data analysis on your Firebase data, extracting insights that wouldn’t be possible within Firebase alone.
- Data Consolidation: If you are utilizing Firebase alongside other sources, you can unify your data and gain a comprehensive understanding of your operations by syncing to BigQuery. Additionally, it allows you to establish a change data capture process, ensuring that your data is always accurate and consistent.
- Historical Data Analysis: Firebase has limitations on historical data. Syncing your data to BigQuery allows for long-term data retention and analysis of historical trends over time.
- Scalability: BigQuery can handle large volumes of data without affecting performance, providing an ideal solution for growing businesses with expanding Firebase data.
- Data Science: With Firebase data in BigQuery, you can apply machine learning models to your data for predictive analytics, customer segmentation, and more.
- Reporting: Firebase offers rudimentary reporting capabilities; however, for more sophisticated business intelligence options, it is recommended to incorporate Tableau, PowerBI, and Looker (Google Data Studio) with BigQuery. This integration allows for advanced data visualization and analytics.
For enhanced analytics and reporting, consider a Firestore to BigQuery integration. To implement this, you’ll need to establish a Firestore to BigQuery data pipeline. Analyzing performance is crucial for optimizing the pipeline, and understanding how triggers work will be pivotal for automating updates and maintaining data synchronization.
Before wrapping up, let’s cover some basics.
Conclusion
This article gave you a step-by-step guide on loading data from Firestore to BigQuery. It also gave you an overview of both Firestore and BigQuery and also highlighted some of the limitations of this process. Overall, loading data from Firestore to BigQuery can be done using any method and plays an important role in extracting valuable insights from customers that use your applications.
Companies store valuable data from multiple data sources in Google BigQuery. However, when it comes to loading with BigQuery, you need to be an expert to set up ETL pipelines from scratch and manually configure several details.
Share your experience of Loading Data from Firestore to BigQuery in the comments section below!
FAQs to integrate Firestore with BigQuery
1. How to load data from Firestore to BigQuery?
To load data from Firestore to BigQuery, use the Firestore export feature to export data to Google Cloud Storage, then load it into BigQuery using the BigQuery Data Transfer Service or bq command-line tool. Ensure data is in a compatible format and permissions are properly configured.
2. What is the difference between BigQuery and Firestore?
BigQuery is a fully-managed, serverless data warehouse optimized for large-scale data analytics and SQL queries, while Firestore is a NoSQL document database designed for real-time synchronization and flexible data storage in applications.
3. How do I extract data from Firebase?
To extract data from Firebase, use the Firebase Realtime Database or Firestore SDKs to read data programmatically or export data through Firebase’s data export tools and Firebase CLI. For larger datasets, consider using Firebase’s integration with Google Cloud Storage for bulk exports.
4. When should you not use BigQuery?
Avoid using BigQuery for low-volume, low-complexity data operations where a traditional database or data processing tool would be more cost-effective. It’s also less suitable for real-time transactional processing or scenarios requiring frequent updates to small datasets.
Aakash is a research enthusiast who was involved with multiple teaming bootcamps including Web Application Pen Testing, Network and OS Forensics, Threat Intelligence, Cyber Range and Malware Analysis/Reverse Engineering. His passion to the field drives him to create in-depth technical articles related to data industry. He holds a Undergraduate Degree from Vellore Institute of Technology in Computer Science & Engineering with a Specialization in Information Security and is keen to help data practitioners with his expertise in the related topics.