Firestore Data Model: An Easy Guide

• April 12th, 2022


Firestore is a NoSQL, document-oriented database, which means there are no tables or rows as in SQL databases. Instead, you store data in documents, which are organized into collections.

This model helps you optimize queries for performance, cost, and complexity. Using the Firestore Data Model, you can build responsive applications that work reliably even with poor internet connectivity or high network latency.

In this blog post, we will talk about the Firestore Data Model and the techniques you can employ to build better, faster data models. Let’s begin.


What is Cloud Firestore?


Firestore is a product under the Google Cloud Platform umbrella and is a flexible, scalable database for mobile, web, and server development.

Like Firebase’s real-time database, Firestore keeps the data in sync across client apps through real-time listeners. It also offers offline support for mobile and web, so you can build responsive apps that work regardless of network latency or Internet connectivity. 

Cloud Firestore also provides excellent, seamless integration with other Google products.

Cloud Firestore is a NoSQL, document-oriented database, which means data is stored in documents rather than rows and columns. Documents, in turn, are organized into collections.

Key Features of Firestore:

Cloud Firestore provides several benefits over the Firebase Realtime Database; some of them are discussed below –

  • Optimized for App development – Cloud Firestore is optimized for app development and helps developers build apps faster.
  • Synchronizes data between devices in real time – Cloud Firestore syncs data across Android, iOS, and web applications in real time, enabling users to collaborate easily.
  • Offline Access – Cloud Firestore lets users access data offline through an on-device database. When the device comes back online, it synchronizes local changes with Cloud Firestore, so data is not lost during network disconnections.
  • Fully Managed – Cloud Firestore is a fully managed product from Google, built from the ground up to scale automatically on demand.
  • High Availability – Cloud Firestore stores data across multiple regions and uses replication to keep data highly available even in the case of unexpected disasters.
  • Server SDKs – Server-side development is a first-class concern: SDKs are available for Java, Go, Python, and Node.js, with more languages planned.
Scale your data integration effortlessly with Hevo’s Fault-Tolerant No Code Data Pipeline

As the ability of businesses to collect data explodes, data teams have a crucial role to play in fueling data-driven decisions. Yet, they struggle to consolidate the data scattered across sources into their warehouse to build a single source of truth. Broken pipelines, data quality issues, bugs and errors, and lack of control and visibility over the data flow make data integration a nightmare.

1000+ data teams rely on Hevo’s Data Pipeline Platform to integrate data from 150+ sources in a matter of minutes. Billions of data events from sources as varied as SaaS apps, Databases, File Storage and Streaming sources can be replicated in near real-time with Hevo’s fault-tolerant architecture. What’s more – Hevo puts complete control in the hands of data teams with intuitive dashboards for pipeline monitoring, auto-schema management, and custom ingestion/loading schedules.

All of this combined with transparent pricing and 24×7 support makes us the most loved data pipeline software on review sites.

Take our 14-day free trial to experience a better way to manage data pipelines.

Firestore Data Model

Cloud Firestore is a cloud-hosted NoSQL database that is directly accessible via SDKs from iOS, Android, and any other web application.

In Cloud Firestore, data is stored in documents, which are then organized into collections. Collections act as containers for documents, which you can use to organize data and build queries.

The Firestore Data Model consists of three different terminologies:

  1. Document
  2. Collection
  3. Sub-Collection

Let’s discuss each of them in detail:


Document

As Cloud Firestore is a NoSQL database, each entry (a row in SQL terms) is called a document. A document is a record that stores information as key-value pairs.

A document supports different data types, from simple strings and numbers to complex nested objects. Below is a simple example of a document representing a user –

//Document 1

first : "Mac"
last : "Anthony"
born : 1992

A complex or nested document can look like the following:

// Document 2

first : "Mac"
last : "Anthony"
born : 1992
House_no : "2/1"
Street : "10 Belford street"
ZIP : 110012

A document looks like, and behaves much like, a JSON object.
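Since a document is essentially a set of key-value pairs, it can be modeled as a plain dictionary and serialized to JSON. Below is a minimal sketch in Python (the `address` grouping is illustrative, not part of the original example):

```python
import json

# A Firestore document modeled as a plain Python dictionary.
# Field names mirror the nested user example above (illustrative only).
user_doc = {
    "first": "Mac",
    "last": "Anthony",
    "born": 1992,
    "address": {  # a nested object, similar to a Firestore map field
        "house_no": "2/1",
        "street": "10 Belford street",
        "zip": 110012,
    },
}

# Serializing the dictionary shows the JSON-like shape of a document.
print(json.dumps(user_doc, indent=2))
```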


Collection

A collection is simply a container for documents; several documents form a collection. For example, you can have a users collection that holds documents containing user information.


// Document 1
first : "Ada"
last : "Lovelace"
born : 1815

// Document 2
first : "Alan"
last : "Turing"
born : 1912

Firestore is a NoSQL database, which means the collections and documents are schemaless. You have complete freedom over what fields to put in a document and what documents you store in a particular collection. However, it’s a good idea to use the same fields and data types across multiple documents to query the documents more easily.
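Because the database enforces no schema, field consistency is only a convention, but it pays off at query time. A small in-memory sketch (plain Python, no Firestore client involved) shows why shared fields make filtering straightforward:

```python
# A collection sketched as a list of document dictionaries that all
# share the same fields, mirroring the users examples in this post.
users = [
    {"first": "Ada", "last": "Lovelace", "born": 1815},
    {"first": "Alan", "last": "Turing", "born": 1912},
]

# Filtering works uniformly because every document has a "born" field;
# with mismatched schemas, each document would need special handling.
born_after_1900 = [u for u in users if u["born"] > 1900]
print(born_after_1900)  # only Alan Turing's document
```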

Now that we have understood what a collection is, keep the following rules in mind while creating a collection and attaching documents to it.

  • A collection can only contain documents. It cannot hold strings, binaries, or anything else directly.
  • Each document within a collection must have a unique name (its document ID).
  • A document cannot directly contain another document; hierarchical data belongs in sub-collections.
  • Collections and documents are created implicitly: adding a document to a non-existent collection creates that collection.
  • A collection ceases to exist when all of its documents are deleted.
For example, a users collection might hold the following documents:

//Document 1
first : "Mac"
last : "Anthony"
born : 1990

//Document 2  
first : "Abdul"
last : "Ahamed"
born : 1987


Sub-Collection

A sub-collection is the way to store hierarchical data. Consider a chat app that needs to store chat rooms along with the messages exchanged in each room. The natural way to store each message under its room is a sub-collection.

//Room document 1
name: "my chat room"
    //"messages" sub-collection
    //Message document 1
    from : "Shubham"
    msg : "Hello Shubham"
    //Message document 2
    from : "Mac"
    msg : "Hello Mac"

//Room document 2
name: "my chat room two"
    //"messages" sub-collection
    //Message document 1
    from : "Sam"
    msg : "Hello Sam"
    //Message document 2
    from : "Mac"
    msg : "Hello Mac"

A sub-collection is a collection associated with a specific document. However, there are certain rules to keep in mind while creating a sub-collection.

  • A sub-collection belongs to a specific document and is referenced through that document. You can store hierarchical data in a sub-collection, making it easier to access.
  • You can also have sub-collections within documents of other sub-collections, allowing data to be nested further (Firestore supports sub-collections up to 100 levels deep).
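The room/message layout above can be sketched as nested dictionaries, where each room document carries a `messages` sub-collection. A path such as `rooms/roomA/messages/message1` then becomes a chain of lookups (the IDs and the helper function are illustrative):

```python
# "rooms" is a collection; each room document holds its fields plus a
# "messages" sub-collection keyed by message document ID.
rooms = {
    "roomA": {
        "fields": {"name": "my chat room"},
        "messages": {  # sub-collection of message documents
            "message1": {"from": "Shubham", "msg": "Hello Shubham"},
            "message2": {"from": "Mac", "msg": "Hello Mac"},
        },
    },
}

def get_document(collection, path):
    """Resolve an alternating document-ID / sub-collection-name path,
    e.g. ["roomA", "messages", "message1"]."""
    node = collection
    for part in path:
        node = node[part]
    return node

msg = get_document(rooms, ["roomA", "messages", "message1"])
print(msg["msg"])  # Hello Shubham
```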

Techniques For Reading & Querying Data in Firestore Data Model

Now that we have a basic understanding of the Firestore Data Model, let us see how to read and query data from collections. For ease of explanation, we will use Python. However, Firestore supports several programming languages, including C++, Java, Kotlin, Node.js, Go, and PHP.

To read and query data from Firestore, you first need to install dependencies and authenticate your application via credentials. Follow the steps below to set up the environment in Python –

Please note that, at the time of writing, the Firestore Python client was broken on Python 3.7 and later; hence we will be using Python 3.6 for this blog post.

  • Install the dependency.
pip install firebase-admin
  • Authenticate using credentials. 
    • Navigate to your Firebase console
    • Click on Project Settings from the Project Overview.
Firestore Data Model | Reading & Querying Data in Firestore Data Model
Image Source
  • Click on Service accounts and then Generate new private key. Download the file and store it on your local machine, naming it accountkey.json (or your preferred name).
  • Add the below code to your Python script.
import firebase_admin
from firebase_admin import credentials, firestore

cred = credentials.Certificate("path/to/accountKey.json")
firebase_admin.initialize_app(cred)
  • Now that the Firebase Admin SDK is initialized, connect to the Firestore client.
db = firestore.client() 

Now we will use the above created Firestore Data Model client to interact with documents and collections.


To add documents in Cloud Firestore: 

cities_collection = db.collection(u'cities')

res = cities_collection.document(u'BJ').set({
    'name' : 'Beijing',
    'country' : 'China',
    'population' : 215000000
})


To query data from a document:

doc = db.collection(u'cities').document(u'BJ').get()
print(doc.to_dict())

Output –

{'name': 'Beijing', 'country': 'China', 'population': 215000000}


To see all the documents in a collection, execute the below Python code –

users_ref = db.collection(u'users')
for doc in users_ref.stream():
    print(f'{doc.id} => {doc.to_dict()}')


To get data from a sub-collection, execute the below Python code:

room_a_ref = db.collection(u'rooms').document(u'roomA')
message_ref = room_a_ref.collection(u'messages').document(u'message1')
print(message_ref.get().to_dict())

Normalization & Denormalization

Normalization is a technique used in Firebase to reduce data redundancy. Data redundancy means the repetition of the same information, and normalization is the process that removes this redundancy.

The below example shows how students and their attendance records are normalized, thereby removing repetitive information:

    "students": {
        "students1": {
            "name": "john thomas"
        }
    },
    "attendance": {
        "students1": {
            "attendance1": {
                "total": "20",
                "absents": {
                    "leave1": "medical emergency",
                    "leave2": "not verified"
                }
            },
            "attendance2": {
                "total": "18",
                "absents": {
                    "leave1": "sports game",
                    "leave2": "verified"
                }
            }
        }
    }

Denormalization is the technique of adding redundancy to the data by repeating information within it. Denormalization is applied when there is a requirement to maintain history, improve query performance (a denormalized form returns all fields in one go), or speed up reporting.
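To see the trade-off concretely, the normalized students/attendance structure above can be flattened into one self-contained record per attendance entry. A sketch in plain Python (field and key names follow the example above):

```python
# Normalized input: student details stored once, referenced by ID.
students = {"students1": {"name": "john thomas"}}
attendance = {
    "students1": {
        "attendance1": {"total": "20"},
        "attendance2": {"total": "18"},
    },
}

# Denormalize: copy the student's name into every attendance record,
# trading redundancy for reads that need no second lookup.
denormalized = [
    {"student_id": sid, "name": students[sid]["name"], **record}
    for sid, records in attendance.items()
    for record in records.values()
]
print(denormalized)
```

The name "john thomas" now appears in every record; that repetition is exactly the redundancy denormalization accepts in exchange for simpler, faster queries.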

All of the capabilities, none of the firefighting:

Using manual scripts and custom code to move data into the warehouse is cumbersome. Frequent breakages, pipeline errors, and lack of data flow monitoring make scaling such a system a nightmare. Hevo’s reliable data pipeline platform enables you to set up zero-code and zero-maintenance data pipelines that just work.

Reliability at Scale – With Hevo, you get a world-class fault-tolerant architecture that scales with zero data loss and low latency. 

Monitoring and Observability – Monitor pipeline health with intuitive dashboards that reveal every stat of pipeline and data flow. Bring real-time visibility into your ELT with Alerts and Activity Logs.

Stay in Total Control – When automation isn’t enough, Hevo offers flexibility – data ingestion modes, ingestion and load frequency, JSON parsing, destination workbench, custom schema management, and much more – for you to have total control.

Auto-Schema Management – Correcting improper schema after the data is loaded into your warehouse is challenging. Hevo automatically maps source schema with destination warehouse so that you don’t face the pain of schema errors.

24×7 Customer Support – With Hevo you get more than just a platform, you get a partner for your pipelines. Discover peace with round-the-clock “Live Chat” within the platform. What’s more, you get 24×7 support even during the 14-day full-featured free trial.

Transparent Pricing – Say goodbye to complex and hidden pricing models. Hevo’s Transparent Pricing brings complete visibility to your ELT spend. Choose a plan based on your business needs. Stay in control with spend alerts and configurable credit limits for unforeseen spikes in the data flow. 

Get Started for Free with Hevo’s 14-day Free Trial.

Security In Firestore

In any cloud system, security is of utmost importance. Firestore Data Model provides various security rules that allow users to control access to documents and collections. 

Security in Cloud Firestore checks every incoming request and validates it against the defined rules; requests that don’t align with those rules are simply rejected.

Security rules in Firestore provide access control and data validation in a simple and expressive format. To apply security rules, you write them in the below format:

service cloud.firestore {
  match /databases/{database}/documents {
    match /{document=**} {
      allow read, write: if true;
    }
  }
}

Let’s understand what these parameters mean – 

  • service cloud.firestore: Defines the service these rules apply to, in this case cloud.firestore.
  • match /databases/{database}/documents: Defines the scope of the rules, matching the documents in the database.
  • match /{document=**}: Creates a rule block that applies to every document in the database; the recursive wildcard ** matches documents in all collections and sub-collections.
  • allow read: Allows public read access.
  • allow write: Allows public write access.
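For instance, the open rule above (`if true`) grants everyone access. A common tightening, sketched in the same rules format, is to require an authenticated user:

service cloud.firestore {
  match /databases/{database}/documents {
    match /{document=**} {
      // Only signed-in users may read or write any document.
      allow read, write: if request.auth != null;
    }
  }
}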


Conclusion

In this blog post, we have discussed Cloud Firestore and the Firestore Data Model in detail. We have also covered Normalization & Denormalization, along with Security in Firestore.

That said, given the complexity involved in manual and cumbersome processes, businesses today are leaning toward automated data modeling practices. These are hassle-free, easy to operate, and do not require a technical background. In such a case, you can also explore Hevo Data for ETL use cases. Hevo Data supports 150+ data sources.

Visit our Website to Explore Hevo

Want to take Hevo for a spin? Sign Up for a 14-day free trial and experience the feature-rich Hevo suite first hand.

Share your experience of working with the Firestore Data Model in the comments section below!
