Understanding MongoDB Data Modeling: A Comprehensive Guide

Data Modeling, Guide, MongoDB • February 10th, 2022

MongoDB is a cross-platform, document-oriented database. Although it is often described as schema-less, MongoDB stores data in JSON-like documents, so a data model still exists. Data modeling has many components and requires active participation from multiple stakeholders, but it is the developers’ responsibility to answer the following four questions:

  1. What information needs to be stored?
  2. Which documents are likely to be accessed together?
  3. How often will the documents be accessed?
  4. How fast will the data grow?

The answers to these questions guide the team in building competent, agile data models that serve organizations which depend on seamless information exchange.

This blog post explains how MongoDB data modeling works, which components are involved in the data modeling process, and how schemas are used in MongoDB. Let’s begin.

Table of Contents

  1. What is Data Modeling?
  2. Embedded & Normalized MongoDB Data Modeling
  3. Defining Relationships in MongoDB Data Modeling
  4. MongoDB Data Modeling Schema
  5. Conclusion

What is Data Modeling?

Data modeling produces the blueprint on which a full-fledged database system is built. The primary function of a data model is to show visually how two or more data points relate to each other. This layout then proves vital in maintaining large data repositories, up to petabyte scale, that store data from across business functions and teams, from sales to marketing and beyond.

Building a data model is a continuous, evolving process. It requires multiple feedback loops and direct contact with stakeholders to incorporate new data models or revise definitions in an existing one.

To develop competent data models, formalized schemas and techniques are employed to ensure a standard, consistent, and predictable way to run business processes and strategize data resources in an organization.

Based on their level of detail, data models for a database system fall into three categories: conceptual data models, logical data models, and physical data models.

Let’s learn about them briefly.

  • Conceptual Data Models: Conceptual data models are rough drawings that offer the big picture. They show where data from across business functions is stored in the database system and how that data is related. A conceptual data model typically contains the entity classes, their characteristics and constraints, the relationships between them, and the security and data integrity requirements.
  • Logical Data Models: Logical data models provide deeper, more detailed information on the relationships between data sets. At this stage, it becomes clear which data types and relations are used. Logical data models are often omitted in agile business environments but are helpful in projects that are data-oriented and require highly procedural implementation.
  • Physical Data Models: A physical data model provides a schema or layout for how data is stored within a database. It is a finalized proposal that can be implemented in a specific database system, such as a relational database.

Embedded & Normalized Data Models in MongoDB Data Modeling

Figure: Embedded versus normalized data models

When data professionals start building data models in MongoDB, they must choose between embedding related information in a document and keeping it in a separate collection of documents. Hence two concepts exist for efficient MongoDB Data Modeling:

  1. The Embedded Data Model, and
  2. The Normalized Data Model.

Embedded Data Model

An embedded data model, also called a denormalized data model, is applied when two data sets are related. It captures the relationship by storing the related data together in a single document structure. You can store the embedded information in an array or in a field, depending on the requirements.

Normalized Data Model

In a normalized data model, object references are used to model relationships between documents. This model reduces duplication of data, so many-to-many relationships can be represented fairly easily without duplicating content. Normalized data models are best suited to large hierarchical data sets and to data that is referenced across collections.
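
The difference between the two models can be sketched with plain JavaScript objects. This is only an illustration under stated assumptions: the publisher and book data, the `publisher_id` field, and the `booksForPublisher` helper are hypothetical, and in-memory arrays stand in for MongoDB collections.

```javascript
// Embedded (denormalized): the related books live inside the publisher document.
const embeddedPublisher = {
  _id: "p1",
  name: "Example Press",
  books: [
    { title: "Book A", pages: 216 },
    { title: "Book B", pages: 68 },
  ],
};

// Normalized: books are separate documents that reference the publisher by _id.
const publishers = [{ _id: "p1", name: "Example Press" }];
const books = [
  { _id: 1, title: "Book A", pages: 216, publisher_id: "p1" },
  { _id: 2, title: "Book B", pages: 68, publisher_id: "p1" },
];

// With references, reading related data needs a second lookup.
function booksForPublisher(publisherId) {
  return books.filter((book) => book.publisher_id === publisherId);
}

console.log(embeddedPublisher.books.map((book) => book.title));
console.log(booksForPublisher("p1").map((book) => book.title));
```

Both shapes hold the same information. The embedded form returns everything in one read, while the normalized form keeps publisher data in one place at the cost of an extra lookup.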

Defining Relationships in MongoDB Data Modeling

Defining the relationships in your schema is the most important consideration in a MongoDB data modeling project, because these relationships determine how your system will use the data. There are three central relationships in MongoDB Data Modeling: one-to-one, one-to-many, and many-to-many.

One-to-one Relationship

A good example of this relationship is your name, because one user can have only one name. One-to-one data can be modeled as key-value pairs in a document. Look at the example below:

{
    "_id": "ObjectId('AAA')",
    "name": "Joe Karlsson",
    "company": "MongoDB",
    "twitter": "@JoeKarlsson1",
    "twitch": "joe_karlsson",
    "tiktok": "joekarlsson",
    "website": "joekarlsson.com"
}

One-to-many Relationship

Imagine you are building a product page for an e-commerce site with a schema that stores product information. In such a system, one product is made up of many parts, so the schema may need to store thousands of subparts and their relationships. The example below shows the same idea on a smaller scale: one user document embeds many addresses.

{
    "_id": "ObjectId('AAA')",
    "name": "Joe Karlsson",
    "company": "MongoDB",
    "twitter": "@JoeKarlsson1",
    "twitch": "joe_karlsson",
    "tiktok": "joekarlsson",
    "website": "joekarlsson.com",
    "addresses": [
        { "street": "123 Sesame St", "city": "Anytown", "cc": "USA" },  
        { "street": "123 Avenue Q",  "city": "New York", "cc": "USA" }
    ]
}

Many-to-many Relationship

To better understand many-to-many relationships, imagine a to-do application. In the application, a user might have many tasks, and a task might have multiple users assigned to it. Hence, to preserve the relationships between users and tasks, references exist from one user to many tasks and from one task to many users. Let’s look at the example below:

Users:

{
    "_id": ObjectID("AAF1"),
    "name": "Kate Monster",
    "tasks": [ObjectID("ADF9"), ObjectID("AE02"), ObjectID("AE73")]
}

Tasks:

{
    "_id": ObjectID("ADF9"),
    "description": "Write blog post about MongoDB schema design",
    "due_date": ISODate("2014-04-01"),
    "owners": [ObjectID("AAF1"), ObjectID("BB3G")]
}
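
The two-way references above can be followed in either direction. The sketch below uses plain arrays in place of the two collections and plain strings in place of ObjectIDs. The second user, the second task, and the helper names `tasksForUser` and `ownersOfTask` are hypothetical additions for illustration.

```javascript
// In-memory stand-ins for the users and tasks collections.
const users = [
  { _id: "AAF1", name: "Kate Monster", tasks: ["ADF9", "AE02"] },
  { _id: "BB3G", name: "Second User", tasks: ["ADF9"] },
];
const tasks = [
  { _id: "ADF9", description: "Write blog post about MongoDB schema design", owners: ["AAF1", "BB3G"] },
  { _id: "AE02", description: "Review the draft", owners: ["AAF1"] },
];

// Follow the references stored on a user to the task documents.
function tasksForUser(userId) {
  const user = users.find((u) => u._id === userId);
  if (!user) return [];
  return tasks.filter((t) => user.tasks.includes(t._id));
}

// Follow the references stored on a task back to the user documents.
function ownersOfTask(taskId) {
  const task = tasks.find((t) => t._id === taskId);
  if (!task) return [];
  return users.filter((u) => task.owners.includes(u._id));
}

console.log(tasksForUser("AAF1").map((t) => t.description));
console.log(ownersOfTask("ADF9").map((u) => u.name));
```

Because each side stores its own list of references, the application has to update both documents whenever a user is assigned to or removed from a task.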

MongoDB Data Modeling Schema

By default, MongoDB has a flexible schema: documents in the same collection do not need to have an identical structure. This is a paradigm shift from the SQL view of data, where every row in a table has the same columns and every column has a fixed data type.

What is a flexible schema?

In a flexible schema model, you do not need to fix the data type of a field, because the same field can hold different types across documents. A flexible schema is advantageous when adding, removing, or changing fields in an existing collection, or even when updating documents to a new structure.

Let’s explain with an example. Below, there are two documents in the same collection:

{ "_id" : ObjectId("5b98bfe7e8b9ab9875e4c80c"),
     "StudentName" : "George  Beckonn",
     "ParentPhone" : 75646344,
     "age" : 10
}
{ "_id" : ObjectId("5b98bfe7e8b9ab98757e8b9a"),
     "StudentName" : "Fredrick  Wesonga",
     "ParentPhone" : false,
}

The first document has the field ‘age’, but the second does not. Furthermore, the value of the field ‘ParentPhone’ is a number in the first document, whereas in the second it is ‘false’, a boolean.
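
One way to see this variation is to list, per field, the value types found across documents. The sketch below is a rough illustration that works on plain objects and uses JavaScript’s `typeof`, which is coarser than MongoDB’s BSON types. The `fieldTypes` helper is a hypothetical name, and the `_id` fields are left out for brevity.

```javascript
// The two student documents from the example, without their _id fields.
const students = [
  { StudentName: "George  Beckonn", ParentPhone: 75646344, age: 10 },
  { StudentName: "Fredrick  Wesonga", ParentPhone: false },
];

// Collect the set of JavaScript value types seen for every field name.
function fieldTypes(docs) {
  const seen = {};
  for (const doc of docs) {
    for (const [field, value] of Object.entries(doc)) {
      if (!seen[field]) seen[field] = new Set();
      seen[field].add(typeof value);
    }
  }
  // Convert each set to a sorted array so the summary is easy to print.
  const summary = {};
  for (const [field, types] of Object.entries(seen)) {
    summary[field] = [...types].sort();
  }
  return summary;
}

console.log(fieldTypes(students));
```

A field that reports more than one type, such as ParentPhone here, is a place where application code must be ready to handle both shapes.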

What is Rigid Schema?

In a rigid schema, all documents in a collection share a similar structure, which makes it easier to set up document validation rules that enhance data integrity during insert and update operations. With Mongoose, for example, a schema declares a type for each field. The available schema types include String, Number, Boolean, Date, Buffer, ObjectId, Array, Mixed, Decimal128, and Map.

The example below shows what a sample schema looks like:

const mongoose = require('mongoose');

// Each field is declared with a fixed type.
var userSchema = new mongoose.Schema({
    userId: Number,
    Email: String,
    Birthday: Date,
    Adult: Boolean,
    Binary: Buffer,
    height: mongoose.Schema.Types.Decimal128,
    units: []
});

An example use case is as follows:

var User = mongoose.model('Users', userSchema);
var newUser = new User();
newUser.userId = 1;
newUser.Email = 'example@gmail.com';
newUser.Birthday = new Date();
newUser.Adult = false;
newUser.Binary = Buffer.alloc(0);
newUser.height = 12.45;
newUser.units = ['Circuit network Theory', 'Algebra', 'Calculus'];
// save() returns a promise when called without a callback.
newUser.save()
    .then(function (doc) { console.log('saved', doc._id); })
    .catch(function (err) { console.error(err); });

What is Schema Validation?

Schema validation is vital for validating data on the server side, and MongoDB provides validation rules for this purpose. The rules are applied to insert and update operations. They can also be added to an existing collection using the ‘collMod’ command. Existing documents are not checked against new rules until they are modified.

A validator can be specified when creating a new collection with the ‘db.createCollection()’ command. MongoDB has supported JSON Schema validation since version 3.6, through the ‘$jsonSchema’ operator.

db.createCollection("students", {
   validator: {$jsonSchema: {
         bsonType: "object",
         required: [ "name", "year", "major", "gpa" ],
         properties: {
            name: {
               bsonType: "string",
               description: "must be a string and is required"
            },
            gender: {
               bsonType: "string",
               description: "must be a string and is not required"
            },
            year: {
               bsonType: "int",
               minimum: 2017,
               maximum: 3017,
               exclusiveMaximum: false,
               description: "must be an integer in [ 2017, 3017 ] and is required"
            },
            major: {
               enum: [ "Math", "English", "Computer Science", "History", null ],
               description: "can only be one of the enum values and is required"
            },
            gpa: {
               bsonType: [ "double" ],
               minimum: 0,
               description: "must be a double and is required"
            }
         }
      }
   }
})

To insert a new document into the collection, follow the example below:

db.students.insert({
   name: "James Karanja",
   year: NumberInt(2016),
   major: "History",
   gpa: NumberInt(3)
})

The insert fails because the document violates the validation rules: the supplied year is below the minimum of 2017, and gpa is an integer rather than a double.

WriteResult({
   "nInserted" : 0,
   "writeError" : {
      "code" : 121,
      "errmsg" : "Document failed validation"
   }
})

Except for the $where, $text, $near, and $nearSphere operators, you can also use query expressions in the validator option.

db.createCollection( "contacts",
   { validator: { $or:
      [
         { phone: { $type: "string" } },
         { email: { $regex: /@mongodb\.com$/ } },
         { status: { $in: [ "Unknown", "Incomplete" ] } }
      ]
   }
} )

Schema Validation Levels in MongoDB Data Modeling

In general, validation is applied to write operations, but the validation level determines how strictly the rules are applied to documents that already exist. There are three levels of validation:

  • Strict: Validation rules are applied to all inserts and updates.
  • Moderate: Validation rules are applied to inserts and to updates of existing documents that already fulfill the validation criteria. Updates to existing documents that do not fulfill the criteria are not checked.
  • Off: Validation is off, so no validation criteria are applied to any document.

For example, let’s insert the data below into a ‘clients’ collection.

db.clients.insert([
{
    "_id" : 1,
    "name" : "Brillian",
    "phone" : "+1 778 574 666",
    "city" : "Beijing",
    "status" : "Married"
},
{
    "_id" : 2,
    "name" : "James",
    "city" : "Peninsula"
}
])

Now apply the moderate validation level using:

db.runCommand( {
   collMod: "clients",
   validator: { $jsonSchema: {
      bsonType: "object",
      required: [ "phone", "name" ],
      properties: {
         phone: {
            bsonType: "string",
            description: "must be a string and is required"
         },
         name: {
            bsonType: "string",
            description: "must be a string and is required"
         }
      }
   } },
   validationLevel: "moderate"
} )

As a result, validation rules are applied only to updates of the document with the ‘_id’ of 1, since it already meets the criteria. The second document does not meet the criteria, so updates to it are not validated.
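
The decision each level makes can be sketched as a small function. This is a simplified model of the behavior described above, not MongoDB’s implementation: `meetsRules` is a hypothetical stand-in for the $jsonSchema validator (name and phone must be strings), and `isChecked` only reports whether a write would be validated.

```javascript
// The two client documents from the example.
const clients = [
  { _id: 1, name: "Brillian", phone: "+1 778 574 666", city: "Beijing", status: "Married" },
  { _id: 2, name: "James", city: "Peninsula" },
];

// Stand-in for the validator: both name and phone must be strings.
function meetsRules(doc) {
  return typeof doc.name === "string" && typeof doc.phone === "string";
}

// Would a write be validated at this level?
// operation is "insert" or "update"; existingDoc is the stored document for updates.
function isChecked(level, operation, existingDoc) {
  if (level === "off") return false;
  if (level === "strict") return true;
  // Moderate: inserts are always checked, updates only if the stored document is already valid.
  if (operation === "insert") return true;
  return meetsRules(existingDoc);
}

console.log(isChecked("moderate", "update", clients[0])); // document 1 is already valid
console.log(isChecked("moderate", "update", clients[1])); // document 2 lacks a phone
```

Under these assumptions, the moderate level lets legacy documents that predate the rules keep being updated, while new and already valid documents stay protected.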

Schema Validation Actions

Schema validation actions determine what happens to documents that violate the validation criteria. MongoDB provides two actions: error and warn.

Error: This action rejects the insert or update if the validation criteria are not met.

Warn: This action records every violation in the MongoDB log but allows the insert or update operation to complete. For example:

db.createCollection("students", {
   validator: {$jsonSchema: {
         bsonType: "object",
         required: [ "name", "gpa" ],
         properties: {
            name: {
               bsonType: "string",
               description: "must be a string and is required"
            },
       
            gpa: {
               bsonType: [ "double" ],
               minimum: 0,
               description: "must be a double and is required"
            }
         }
      }
   },
   validationAction: "warn"
})

If we insert a document like this:

db.students.insert( { name: "Amanda", status: "Updated" } );

The gpa field is missing, but because the validation action is set to ‘warn’, the document is still saved and a warning is recorded in the MongoDB log.
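
The contrast between the two actions can be modeled in a few lines. The sketch below is an in-memory illustration, not MongoDB’s implementation: the `violations` and `write` helpers are hypothetical, an array stands in for the collection, and another array stands in for the server log.

```javascript
// Stand-in for the validator: name must be a string and gpa a number.
function violations(doc) {
  const problems = [];
  if (typeof doc.name !== "string") problems.push("name must be a string");
  if (typeof doc.gpa !== "number") problems.push("gpa must be a double");
  return problems;
}

// Apply a write under the given validation action ("error" or "warn").
function write(collection, log, doc, validationAction) {
  const problems = violations(doc);
  if (problems.length > 0) {
    if (validationAction === "error") {
      return { inserted: false, problems }; // rejected, nothing stored
    }
    log.push({ doc, problems }); // warn: record the violation and continue
  }
  collection.push(doc);
  return { inserted: true, problems };
}

const collection = [];
const log = [];
const amanda = { name: "Amanda", status: "Updated" };

console.log(write(collection, log, amanda, "warn").inserted);
console.log(write([], [], amanda, "error").inserted);
```

With the warn action the invalid document is stored and only the log shows the problem, which is why warn is usually a transitional setting while data is being cleaned up.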

MongoDB Data Modeling Schema Design Patterns

There are 12 schema design patterns in MongoDB Data Modeling. Let’s discuss them briefly.

Main schema design patterns and their use cases:
  • Approximation: Fewer writes and calculations are needed because only approximate values are saved.
  • Attribute: On large documents, index and query only a subset of fields.
  • Bucket: When streaming data or building IoT applications, bucketing values reduces the number of documents, and pre-aggregation simplifies data access.
  • Computed: Performing computations at write time or at regular intervals avoids repeating them on every read.
  • Document Versioning: Document versioning allows different versions of documents to coexist.
  • Extended Reference: Embedding only the frequently accessed fields avoids many joins.
  • Outlier: Data models and queries are designed for typical use cases and are not influenced by outliers.
  • Pre-Allocation: When the document structure is known in advance, pre-allocation reduces memory reallocation and improves performance.
  • Polymorphic: The polymorphic pattern is useful when similar documents don’t have the same structure.
  • Schema Versioning: Schema versioning is useful when the schema evolves during the application’s lifetime. It avoids downtime and technical debt.
  • Subset: The subset pattern is useful when the application uses only some of the data, because a smaller data set fits into RAM and improves performance.
  • Tree: The tree pattern is suited for hierarchical data. The application needs to manage updates to the graph.
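
As one illustration from the list above, the Bucket pattern can be sketched as follows. The sensor readings, the field names, the one-hour bucket size, and the `bucketByHour` helper are all hypothetical choices for this example. The function groups individual readings into one document per sensor per hour and keeps a pre-aggregated count and sum on each bucket.

```javascript
// Individual readings, as they might arrive from a sensor.
const readings = [
  { sensorId: 1, ts: "2022-02-10T10:05:00Z", temp: 20 },
  { sensorId: 1, ts: "2022-02-10T10:35:00Z", temp: 21 },
  { sensorId: 1, ts: "2022-02-10T11:10:00Z", temp: 19 },
];

// Group readings into one bucket document per sensor and hour.
function bucketByHour(items) {
  const buckets = new Map();
  for (const r of items) {
    const hour = r.ts.slice(0, 13); // e.g. "2022-02-10T10"
    const key = r.sensorId + "|" + hour;
    if (!buckets.has(key)) {
      buckets.set(key, { sensorId: r.sensorId, hour, measurements: [], count: 0, sumTemp: 0 });
    }
    const bucket = buckets.get(key);
    bucket.measurements.push({ ts: r.ts, temp: r.temp });
    // Pre-aggregated values make averages cheap to read later.
    bucket.count += 1;
    bucket.sumTemp += r.temp;
  }
  return [...buckets.values()];
}

for (const bucket of bucketByHour(readings)) {
  console.log(bucket.hour, bucket.count, bucket.sumTemp / bucket.count);
}
```

Three readings become two bucket documents here. With real streaming data the reduction is much larger, and the stored count and sum mean an average can be read without scanning every measurement.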

Conclusion

In this blog post, we discussed MongoDB Data Modeling and its components in detail. We started with data modeling in general, then covered the types of data models and the relationships you need to know when working on a MongoDB Data Modeling project. If you want to know more about MongoDB Data Modeling, either of these two articles can help:

  1. MongoDB Schema Design Best Practices
  2. Data Modeling Introduction

Although MongoDB data modeling is well documented, it can still be hard for non-technical users to understand. That is where no-code data pipeline-as-a-service products like Hevo Data can help!

Hevo Data, a No-code Data Pipeline provides you with a consistent and reliable solution to manage data transfer between a variety of sources like MongoDB and the Spring Boot MongoDB Configuration, with a few clicks.

Hevo Data with its strong integration with 100+ sources (including 40+ free sources) allows you to not only export data from your desired data sources & load it to the destination of your choice, but also transform & enrich your data to make it analysis-ready so that you can focus on your key business needs and perform insightful analysis using BI tools.

Want to take Hevo for a spin? Sign Up for a 14-day free trial and experience the feature-rich Hevo suite first hand. You can also have a look at the unbeatable pricing that will help you choose the right plan for your business needs.
