Connect Firestore to BigQuery: 2 Easy Methods

Firestore is a flexible and scalable database that helps import or export application data from multiple data sources into Data Warehouses or other BI (Business Intelligence) tools. Google BigQuery or BigQuery is one of the most popular Data Warehouses developed by Google. Loading data from Firestore to BigQuery plays a pivotal role when applications are developed using other Google technologies.

Table of Contents

There needs to be communication between them to cater to business requirements. This article provides a comprehensive guide on integrating Firestore and BigQuery. It also provides a general overview of both Firestore and BigQuery to better understand these technologies individually.

Methods to Load Data from Firestore to BigQuery

Note: It’s essential to consider the potential costs associated with the process to stream Firestore to BigQuery as they can be influenced by factors like data volume and query complexity. Also, when you migrate data from Firestore to BigQuery, the hierarchical structure of Firestore may require flattening to fit BigQuery’s tabular format.

Regardless of the method, you choose to load data from Firestore to BigQuery you need certain permissions. When you load data from Firestore to BigQuery, you need permissions to run a load job and permissions that let you load data into new or existing BigQuery tables and partitions.

Similarly, if you want to load data from a Cloud Storage bucket, you must be granted all the export permissions. There are 2 main methods you can use to set up Firestore export to BigQuery. They are given below:

Method 1: Using the Firestore Cloud Console to Load Data from Firestore to BigQuery
Method 2: Using the BigQuery Command Line Interface to Load Data from Firestore to BigQuery

Method 1: Using the Firestore Cloud Console to Load Data from Firestore to BigQuery

In this method, you can load data from Firestore to BigQuery by using the Firestore Cloud Console. This method can be used by following the below steps:

Step 1: Open the BigQuery Page in the Cloud Console

The first step to load data from Firestore to BigQuery is to access the Cloud Console and open the BigQuery page. The home page looks like the figure shown below.

Firestore to BigQuery: Google BigQuery Home Page — Image Source: Self

Step 2: Select the Dataset

The next step is to go to the Explorer Panel and expand your project to select a dataset.

Step 3: Create the Table

Once this is done, you need to go to the Details Panel and click on Create Table.

Step 4: Enter the Cloud Storage URI

The next step to load data from Firestore to BigQuery is to set up the Cloud Storage URI.

To do this, go to the Create Table page, go to the Source Section, and select Cloud Storage. Once you do that, in the Source field, enter the Cloud Storage URI.

An important point to note is that the Cloud Storage Bucket must be in the same location as your dataset. The URI for your Firestore export file should end with KIND_COLLECTION_ID.export_metadata.

Once that is done, you need to choose the File Format. Select DataStore Backup as it is the correct option for Firestore. Firestore and Datastore share an export format.

Step 5: Configure the Destination

Once the formats are selected, you need to configure the Destination to load data from Firestore to BigQuery. For the Dataset name, choose the appropriate dataset as shown in the figure below.

Firestore to BigQuery: Loading data from Firestore to BigQuery dataset — Image Source

In the Table Name field, enter the name of the table you’re creating in BigQuery and also verify that the Table type is set to Native Table.

Step 6: Overviewing the Schema

In the Schema section, no action is necessary. The schema is inferred for a Firestore export.

Step 7: Configure any Advanced Settings

In case you have any other advanced settings to add to your table, go to the Advanced Options section and set Write preference to Overwrite table. This is shown below.

Firestore to BigQuery: Advance Options — Image Source

Finally, click on Create table.

That’s it! You have successfully loaded data from Firestore to BigQuery using the Firestore Cloud Console.

Method 2: Using the BigQuery Command Line Interface to Load Data from Firestore to BigQuery

Before getting started with this method to load data from Firestore to BigQuery, it’s essential to note that every document in Firestore export files has its data converted to BigQuery data types by BigQuery. The conversion between the available data types is shown in the following table.

Firestore data type	BigQuery data type
Array	RECORD
Boolean	BOOLEAN
Reference	RECORD
Date and time	TIMESTAMP
Map	RECORD
Floating-point number	FLOAT
Geographical point	RECORD[{“lat”,”FLOAT”}, {“long”,”FLOAT”}]
Integer	INTEGER
String	STRING (truncated to 64 KB)

To use this method you need to use the bq load command with source_format set to DATASTORE_BACKUP on the Command Line Interface. The code for implementing the bq load command on the command-line interface is shown below:

bq --location=LOCATION load 
--source_format=FORMAT 
DATASET.TABLE 
PATH_TO_SOURCE

You need to replace the following attributes according to your requirements:

LOCATION: Replace it with your current location. The –location flag is optional.
FORMAT: Replace it with DATASTORE_BACKUP. This is because Datastore Backup is the correct option for Firestore. Firestore and Datastore share the same export format.
DATASET: Replace it with the dataset that contains the table into which you are loading the data.
TABLE: This is the table into which you are loading the data. If you do not have a table, it will be created automatically.
PATH_TO_SOURCE: Replace it with the Cloud Storage URI.

A sample code example with all these parameters set is shown below.

bq --location=US load 
--source_format=DATASTORE_BACKUP 
mydataset.book_data 
gs://mybucket/20180228T1256/default_namespace/kind_Book/default_namespace_kind_Book.export_metadata

In this example, you are loading the

gs://mybucket/20180228T1256/default_namespace/kind_Book/default_namespace_kind_Book.export_metadata Firestore export file into a table named book_data. mybucket and mydataset were created in the US multi-region location.

That’s it! You have successfully loaded data from Firestore to BigQuery using the BigQuery Command Line Interface.

Limitations of Loading Data from Firestore to BigQuery

Although both methods for loading Firestore data to BigQuery are easy to understand, users that have a non-technical background might find it difficult to load their data in the first go.

This is because even if the transfer process is successfully deployed, there are a lot of resources you must have before initiating the process.

Some of the limitations of Loading data from Firestore to BigQuery are:

Your dataset must be in the same regional location as the Cloud Storage Bucket containing your export files.
You can specify only 1 Cloud Storage URI, and cannot use a URI wildcard.
You can only create a new table or overwrite an existing table to store the data. You cannot append Firestore export data to an existing table.
For a Firestore export to load correctly, documents in the export data must share a consistent schema with fewer than 10,000 unique field names.
The export command must specify a “collection-ids” filter. Without this filter, data cannot be exported into BigQuery.

Use Cases

Firebase is a popular mobile and web application development platform that offers real-time database and authentication services. Integrating data from Firebase to BigQuery can provide numerous benefits. Here are some use cases that explain why syncing Firebase data to BigQuery is essential:

Advanced Analytics: BigQuery’s powerful data processing capabilities enable you to perform complex queries and data analysis on your Firebase data, extracting insights that wouldn’t be possible within Firebase alone.

Data Consolidation: If you are utilizing Firebase alongside other sources, you can unify your data and gain a comprehensive understanding of your operations by syncing to BigQuery. Additionally, it allows you to establish a change data capture process, ensuring that your data is always accurate and consistent.

Historical Data Analysis: Firebase has limitations on historical data. Syncing your data to BigQuery allows for long-term data retention and analysis of historical trends over time.

Scalability: BigQuery can handle large volumes of data without affecting performance, providing an ideal solution for growing businesses with expanding Firebase data.

Data Science: With Firebase data in BigQuery, you can apply machine learning models to your data for predictive analytics, customer segmentation, and more.

Reporting: Firebase offers rudimentary reporting capabilities; however, for more sophisticated business intelligence options, it is recommended to incorporate Tableau, PowerBI, and Looker (Google Data Studio) with BigQuery. This integration allows for advanced data visualization and analytics.

For enhanced analytics and reporting, consider a Firestore to BigQuery integration. To implement this, you’ll need to establish a Firestore to BigQuery data pipeline. Analyzing performance is crucial for optimizing the pipeline, and understanding how triggers work will be pivotal for automating updates and maintaining data synchronization.

Download the Cheatsheet on How to Set Up High-performance ETL to BigQuery

Learn the best practices and considerations for setting up high-performance ETL to BigQuery

Before wrapping up, let’s cover some basics.

Prerequisites

Working knowledge of Google BigQuery and Firestore.
Have access to the “ bigquery.tables.create ”, “ bigquery.tables. updateData “, and the “ bigquery. jobs. create” permissions in Google BigQuery.
Have access to the “ storage.objects.get ”permissions in Firestore.
Working knowledge of Databases and Data Warehouses.

Introduction to Firestore

Image Source

Firestore, a subset of Google Cloud, is a flexible, scalable NoSQL Cloud Database for mobile, web, and application development. Its main role is to keep your data in sync across client apps through real-time listeners. It also offers offline support for both mobile and web applications so that you can build responsive apps regardless of any Internet issues.

Firestore also supports easy integration with Firebase and other Google Cloud products, including Cloud Functions.

Key Features of Firestore

Firestore houses a wide range of features which makes it a better choice as compared to other web and mobile application development databases. Some of these features are:

Firestore is highly flexible and supports hierarchical data structures and collections. The documents can contain complex nested objects and subcollections and still be maintained properly.
It allows you to perform simultaneous querying. Your queries can include multiple, chained filters, combine filtering, and sorting. In addition to this, queries are also indexed by default so that your query performance is proportional to the resulting data set.
Firestore allows you to perform real-time updates on your data by leveraging data synchronization. It is also designed to fetch one-time queries efficiently.
It offers offline support as it caches data every time it is being used. So, the application can write, read, listen to, and query data even if the device is offline. Once the device is back online, all local changes are synchronized back to the Cloud.
Firestore brings you the best of Google’s Cloud Platform and is highly scalable to handle the workload of the biggest applications.

By loading data from Firestore to BigQuery, you will be able to transfer your application data into the Data Warehouse by using the powerful Google Cloud Platform every step of the way.

Introduction to Google BigQuery

Image Source

Google BigQuery is a Data Warehousing platform that is serverless, cost-effective, highly scalable, and has Machine Learning built into it.

It uses the Business Intelligence Engine for its operations. It enables quick SQL queries to be combined with the processing power of Google’s infrastructure to manage business transactions, manage the data across different databases, and also allow access control policies for users to view and query data.

BigQuery has a BI engine which is a fast memory analysis service that enables users to analyze large and complex datasets with high concurrency. It works very well with tools such as Google Data Studio and Looker for analysis.

It also uses Machine Learning that helps data scientists and data analysts to build and operate various models using simple SQL commands. The models tested could also be exported for prediction in a Cloud AI Platform or a separate service layer.

BigQuery is used by many companies such as UPS, Twitter, and Dow Jones. UPS uses BigQuery to predict the precise package volume for its products. Twitter helps to modernize advertisements and aggregates millions of data per second using BigQuery.

Dow Jones uses BigQuery to unlock the business value of more than 1.3 million users to uncover the hidden relationships between various attributes.

Key Features of Google BigQuery

Google BigQuery houses many features to help all types of users store their data from multiple data sources and gain valuable and actionable insights from its customers. Some of these features include:

It requires minimal configuration and so anyone can use it without being a Database Administrator.
It is fully manageable and does not require any resources to be deployed.
It encrypts your data by default.
It supports a variety of languages including Java, .NET, and Python.
It supports a wide variety of 3rd party applications to visualize and analyze data.

By loading data from Firestore to BigQuery, you can regularly monitor your application data throughout the loading process.

Conclusion

This article gave you a step-by-step guide on loading data from Firestore to BigQuery. It also gave you an overview of both Firestore and BigQuery and also highlighted some of the limitations of this process. Overall, loading data from Firestore to BigQuery can be done using any method and plays an important role in extracting valuable insights from customers that use your applications.

Companies store valuable data from multiple data sources in Google BigQuery. However, when it comes to loading with BigQuery, you need to be an expert to set up ETL pipelines from scratch and manually configure several details.

Share your experience of Loading Data from Firestore to BigQuery in the comments section below!

Aakash Raman Business Associate, Hevo Data

Aakash is a research enthusiast who was involved with multiple teaming bootcamps including Web Application Pen Testing, Network and OS Forensics, Threat Intelligence, Cyber Range and Malware Analysis/Reverse Engineering. His passion to the field drives him to create in-depth technical articles related to data industry. He holds a Undergraduate Degree from Vellore Institute of Technology in Computer Science & Engineering with a Specialization in Information Security and is keen to help data practitioners with his expertise in the related topics.

No-code Data Pipeline For Google BigQuery

Try for free