Understanding MongoDB Joins | 5 Critical Aspects

By: Published: February 8, 2022

MongoDB Joins: FI

MongoDB Joins are different from how Joins work on SQL. MongoDB is an open-source NoSQL Database used to store high volumes of data. In traditional databases, we use tables and rows but MongoDB uses Collections and Documents. Documents use key-value pairs which is the basic unit in MongoDB. Collections are a set of documents and functions which is the equivalent of tables in Relational Databases. 

This blog talks about how to use a view to join two collections MongoDB Joins, the key types of Joins and Lookup commands.

Table of Contents

What is MongoDB?

MongoDB is an open-source NoSQL Database used to store high volumes of data and work with it efficiently. It is categorized as NoSQL because the storage and retrieval of data are not in the form of tables and rows. It is in the form of Documents and Collections.

The MongoDB Database was initially released in February 2009 and is managed by MongoDB.inc. It also supports all the popular languages like C, C++, C#, DotNet, Java, Node.js, Ruby etc. There are a lot of companies that use MongoDB like Facebook, Nokia, Google etc. to store their large amounts of data. Although you might think that it is impossible to perform MongoDB Joins because it is a Non Relational Database, there are a few ways to do that. These are discussed in detail later in the blog.

Key Features of MongoDB

  • Every database contains Collections which in turn contain Documents. Each Document is different from the others w.r.t content and number of fields.
  • The Document structure is based on how the developers construct their classes and objects. The classes are more like key-value pairs instead of rows and columns.
  • The Documents do not need to have a fixed schema beforehand, fields can be created instantly.
  • The data model available lets you represent hierarchical relationships that facilitate storing data like arrays and other complex structures.
  • MongoDB is very scalable. Companies around the world have clusters with millions of Documents within the database.

Key Components of MongoDB Architecture

MongoDB Joins: MongoDB Architecture
Image Source
  • _id: This is like documents’ primary key. It is a requirement for every document. It is unique for each document. If you create a table without an _id field then MongoDB will automatically create one.
  • Collection: A collection is a group of documents. It is the equivalent of tables in any other Relational Database Management system like Oracle or MySQL. They exist within a single database and do not have a fixed structure as such.
  • Cursor: It is a pointer to the result. We can iterate through the cursor to get results.
  • Database: It is a container for Collections. Each database gets its own set of files on the system. The MongoDB server can contain multiple databases.
  • Document: A record is called a Document. They contain field names and values.
  • JSON: JavaScript Object Notation. It is a readable format for expressing structured data.
  • Field: It is a name-value pair. Every Document contains zero or more fields. They are similar to columns in RDBMS.

Here’s an example of Key-value Pairs:

{ 
Customer_id : 11
Customer_name: Sharon
Order_id: 20
}
Simplify your MongoDB ETL with Hevo’s No-code Data Pipeline

A fully managed No-code Data Pipeline platform like Hevo helps you integrate data from 150+ data sources (including 40+ Free Data Sources) like MongoDB to a destination of your choice in real-time in an effortless manner. Hevo with its minimal learning curve can be set up in just a few minutes allowing the users to load data without having to compromise performance. Its strong integration with umpteenth sources provides users with the flexibility to bring in data of different kinds, in a smooth fashion without having to code a single line. 

GET STARTED WITH HEVO FOR FREE

Check Out Some of the Cool Features of Hevo:

  • Completely Automated: The Hevo platform can be set up in just a few minutes and requires minimal maintenance.
  • Real-Time Data Transfer: Hevo provides real-time data migration, so you can have analysis-ready data always.
  • Transformations: Hevo provides preload transformations through Python code. It also allows you to run transformation code for each event in the Data Pipelines you set up. You need to edit the event object’s properties received in the transform method as a parameter to carry out the transformation. Hevo also offers drag and drop transformations like Date and Control Functions, JSON, and Event Manipulation to name a few. These can be configured and tested before putting them to use.
  • Connectors: Hevo supports 150+ Integrations to SaaS platforms, files, databases, analytics, and BI tools. It supports various destinations including Amazon Redshift, Firebolt, Snowflake Data Warehouses; Databricks, Amazon S3 Data Lakes; MongoDB, MySQL, SQL Server, TokuDB, DynamoDB, PostgreSQL databases to name a few.  
  • 100% Complete & Accurate Data Transfer: Hevo’s robust infrastructure ensures reliable data transfer with zero data loss.
  • Scalable Infrastructure: Hevo has in-built integrations for 150+ sources, that can help you scale your data infrastructure as required.
  • 24/7 Live Support: The Hevo team is available round the clock to extend exceptional support to you through chat, email, and support calls.
  • Schema Management: Hevo takes away the tedious task of schema management & automatically detects the schema of incoming data and maps it to the destination schema.
  • Live Monitoring: Hevo allows you to monitor the data flow so you can check where your data is at a particular point in time.
GET STARTED WITH HEVO FOR FREE

What is MongoDB Lookup Aggregation?

One of the major differences between SQL and NoSQL Databases is when it comes to the JOIN clause. In Relational Databases, the JOIN clause lets you combine two or more rows from tables using a common field. But in MongoDB, databases are designed to store denormalized data. Fortunately, MongoDB Joins can be performed in MongoDB 3.2 as it introduces a new Lookup operation that can perform Join operations on Collections.

Definition

MongoDB Joins are performed by Lookup. It performs a Left Outer Join to two or more Collections. But Lookup is only permitted in Aggregate operations. This is like a pipeline that performs query, filter and group operations. The output of one is used as input for the next operation.

MongoDB Joins Syntax

The following section talks about the syntax of different conditional clauses for MongoDB Joins:

  • MongoDB Joins: Equality Match with a Single Join Condition
  • MongoDB Joins: Conditions and SubQueries on Joined Conditions
  • MongoDB Joins: Correlated SubQueries using Concise Syntax

MongoDB Joins: Equality Match with a Single Join Condition

This clause performs an equality match between fields from the input documents and the joined collection. Syntax:

{
   $lookup:
     {
       from: <collection to join>,
       localField: <field from the input documents>,
       foreignField: <field from the documents of the "from" collection>,
       as: <output array field>
     }
}

This would correspond to the following SQL code:

SELECT *, <output array field>
FROM collection
WHERE <output array field> IN (
   SELECT *
   FROM <collection to join>
   WHERE <foreignField> = <collection.localField>
);

Examples:

MongoDB Joins: Conditions and Subqueries on Joined Collections

MongoDB 3.6 has added support to:

  • Execute a pipeline on a joined Collection.
  • Perform multiple join collections.
  • Correlated and uncorrelated queries.

To perform other join operations other than Equality Join, you can use this Lookup syntax:

{
   $lookup:
      {
         from: <joined collection>,
         let: { <var_1>: <expression>, …, <var_n>: <expression> },
         pipeline: [ <pipeline to run on joined collection> ],
         as: <output array field>
      }
}

This is equivalent to to the SQL code below:

SELECT *, <output array field>
FROM collection
WHERE <output array field> IN (
   SELECT <documents as determined from the pipeline>
   FROM <collection to join>
   WHERE <pipeline>
);

Examples:

MongoDB Joins: Correlated SubQueries using Concise Syntax

From MongoDB 5.0 onwards, you can use a concise syntax for correlated subqueries. The following syntax removes the requirement for an equality match on the local and foreign fields.

{
   $lookup:
      {
         from: <foreign collection>,
         localField: <field from local collection's documents>,
         foreignField: <field from foreign collection's documents>,
         let: { <var_1>: <expression>, …, <var_n>: <expression> },
         pipeline: [ <pipeline to run> ],
         as: <output array field>
      }
}

The SQL code below is equivalent to the above syntax:

SELECT *, <output array field>
FROM localCollection
WHERE <output array field> IN (
   SELECT <documents as determined from the pipeline>
   FROM <foreignCollection>
   WHERE <foreignCollection.foreignField> = <localCollection.localField>
   AND <pipeline match condition>
);

Example:

MongoDB Join Restrictions

There are a few restrictions or limitations while using MongoDB Joins:

  • You cannot use the $out or $merge stage in the $lookup stage before version 4.2 .
  • The From Collection cannot be sharded in the $lookup stage but the Collection you run $aggregate on can be sharded.

MongoDB Join Examples

Here are a few examples of the different types of MongoDB Joins we have seen above:

Single Equality Join

There are two Collections “orders” and “inventory”. Below you can see the code snippet:

db.orders.insertMany( [
   { "_id" : 1, "item" : "almonds", "price" : 12, "quantity" : 2 },
   { "_id" : 2, "item" : "pecans", "price" : 20, "quantity" : 1 },
   { "_id" : 3  }
] )
db.inventory.insertMany( [
   { "_id" : 1, "sku" : "almonds", "description": "product 1", "instock" : 120 },
   { "_id" : 2, "sku" : "bread", "description": "product 2", "instock" : 80 },
   { "_id" : 3, "sku" : "cashews", "description": "product 3", "instock" : 60 },
   { "_id" : 4, "sku" : "pecans", "description": "product 4", "instock" : 70 },
   { "_id" : 5, "sku": null, "description": "Incomplete" },
   { "_id" : 6 }
] )

The following Aggregation operations perform Join operation on both these Collections and Join them using fields “item” from “orders” and “sku” from “inventory”. The code snippet is below:

db.orders.aggregate( [
   {
     $lookup:
       {
         from: "inventory",
         localField: "item",
         foreignField: "sku",
         as: "inventory_docs"
       }
  }
] )

The result is as follows:

{
   "_id" : 1,
   "item" : "almonds",
   "price" : 12,
   "quantity" : 2,
   "inventory_docs" : [
      { "_id" : 1, "sku" : "almonds", "description" : "product 1", "instock" : 120 }
   ]
}
{
   "_id" : 2,
   "item" : "pecans",
   "price" : 20,
   "quantity" : 1,
   "inventory_docs" : [
      { "_id" : 4, "sku" : "pecans", "description" : "product 4", "instock" : 70 }
   ]
}
{
   "_id" : 3,
   "inventory_docs" : [
      { "_id" : 5, "sku" : null, "description" : "Incomplete" },
      { "_id" : 6 }
   ]
}

Using Lookup on an Array

You can match the array elements with a foreign field. For example, there are two Collections “classes” and “members” with the following Documents:

db.classes.insertMany( [
   { _id: 1, title: "Reading is ...", enrollmentlist: [ "giraffe2", "pandabear", "artie" ], days: ["M", "W", "F"] },
   { _id: 2, title: "But Writing ...", enrollmentlist: [ "giraffe1", "artie" ], days: ["T", "F"] }
] )
db.members.insertMany( [
   { _id: 1, name: "artie", joined: new Date("2016-05-01"), status: "A" },
   { _id: 2, name: "giraffe", joined: new Date("2017-05-01"), status: "D" },
   { _id: 3, name: "giraffe1", joined: new Date("2017-10-01"), status: "A" },
   { _id: 4, name: "panda", joined: new Date("2018-10-11"), status: "A" },
   { _id: 5, name: "pandabear", joined: new Date("2018-12-01"), status: "A" },
   { _id: 6, name: "giraffe2", joined: new Date("2018-12-01"), status: "D" }
] )

The following operation joins “members” and “classes” on members field to name field:

db.classes.aggregate( [
   {
      $lookup:
         {
            from: "members",
            localField: "enrollmentlist",
            foreignField: "name",
            as: "enrollee_info"
        }
   }
] )

The output will be as follows:

{
   "_id" : 1,
   "title" : "Reading is ...",
   "enrollmentlist" : [ "giraffe2", "pandabear", "artie" ],
   "days" : [ "M", "W", "F" ],
   "enrollee_info" : [
      { "_id" : 1, "name" : "artie", "joined" : ISODate("2016-05-01T00:00:00Z"), "status" : "A" },
      { "_id" : 5, "name" : "pandabear", "joined" : ISODate("2018-12-01T00:00:00Z"), "status" : "A" },
      { "_id" : 6, "name" : "giraffe2", "joined" : ISODate("2018-12-01T00:00:00Z"), "status" : "D" }
   ]
}
{
   "_id" : 2,
   "title" : "But Writing ...",
   "enrollmentlist" : [ "giraffe1", "artie" ],
   "days" : [ "T", "F" ],
   "enrollee_info" : [
      { "_id" : 1, "name" : "artie", "joined" : ISODate("2016-05-01T00:00:00Z"), "status" : "A" },
      { "_id" : 3, "name" : "giraffe1", "joined" : ISODate("2017-10-01T00:00:00Z"), "status" : "A" }
   ]
}

Multiple Joins and Correlated SubQuery

Create a Collection “order’ and “warehouses” as follows:

db.orders.insertMany( [
  { "_id" : 1, "item" : "almonds", "price" : 12, "ordered" : 2 },
  { "_id" : 2, "item" : "pecans", "price" : 20, "ordered" : 1 },
  { "_id" : 3, "item" : "cookies", "price" : 10, "ordered" : 60 }
] )
db.warehouses.insertMany( [
  { "_id" : 1, "stock_item" : "almonds", warehouse: "A", "instock" : 120 },
  { "_id" : 2, "stock_item" : "pecans", warehouse: "A", "instock" : 80 },
  { "_id" : 3, "stock_item" : "almonds", warehouse: "B", "instock" : 60 },
  { "_id" : 4, "stock_item" : "cookies", warehouse: "B", "instock" : 40 },
  { "_id" : 5, "stock_item" : "cookies", warehouse: "A", "instock" : 80 }
] )

You can use a Correlated SubQuery that performs join on “orders.item” and “warehouse.stock_item” fields as follows:

db.orders.aggregate( [
   {
      $lookup:
         {
           from: "warehouses",
           let: { order_item: "$item", order_qty: "$ordered" },
           pipeline: [
              { $match:
                 { $expr:
                    { $and:
                       [
                         { $eq: [ "$stock_item",  "$$order_item" ] },
                         { $gte: [ "$instock", "$$order_qty" ] }
                       ]
                    }
                 }
              },
              { $project: { stock_item: 0, _id: 0 } }
           ],
           as: "stockdata"
         }
    }
] )

The output for the aforementioned code snippet would be:

{
  _id: 1,
  item: 'almonds',
  price: 12,
  ordered: 2,
  stockdata: [
    { warehouse: 'A', instock: 120 },
    { warehouse: 'B', instock: 60 }
  ]
},
{
  _id: 2,
  item: 'pecans',
  price: 20,
  ordered: 1,
  stockdata: [ { warehouse: 'A', instock: 80 } ]
},
{
  _id: 3,
  item: 'cookies',
  price: 10,
  ordered: 60,
  stockdata: [ { warehouse: 'A', instock: 80 } ]
}

Concise Correlated SubQuery

Create Collections “restaurants” and “orders” with the following Documents:

db.restaurants.insertMany( [
   {
      _id: 1,
      name: "American Steak House",
      food: [ "filet", "sirloin" ],
      beverages: [ "beer", "wine" ]
   },
   {
      _id: 2,
      name: "Honest John Pizza",
      food: [ "cheese pizza", "pepperoni pizza" ],
      beverages: [ "soda" ]
   }
] )
db.orders.insertMany( [
   {
      _id: 1,
      item: "filet",
      restaurant_name: "American Steak House"
   },
   {
      _id: 2,
      item: "cheese pizza",
      restaurant_name: "Honest John Pizza",
      drink: "lemonade"
   },
   {
      _id: 3,
      item: "cheese pizza",
      restaurant_name: "Honest John Pizza",
      drink: "soda"
   }
] )

Join the above two Collections on orders.restaurant_name and restaurants.name and perform an $in array match between orders.drink and beverages as follows:

db.orders.aggregate( [
   {
      $lookup: {
         from: "restaurants",
         localField: "restaurant_name",
         foreignField: "name",
         let: { orders_drink: "$drink" },
         pipeline: [ {
            $match: {
               $expr: { $in: [ "$$orders_drink", "$beverages" ] }
            }
         } ],
         as: "matches"
      }
   }
] )

The output would be:

{
   "_id" : 1, "item" : "filet",
   "restaurant_name" : "American Steak House",
   "matches" : [ ]
}
{
   "_id" : 2, "item" : "cheese pizza",
   "restaurant_name" : "Honest John Pizza",
   "drink" : "lemonade",
   "matches" : [ ]
}
{
   "_id" : 3, "item" : "cheese pizza",
   "restaurant_name" : "Honest John Pizza",
   "drink" : "soda",
   "matches" : [ {
      "_id" : 2, "name" : "Honest John Pizza",
      "food" : [ "cheese pizza", "pepperoni pizza" ],
      "beverages" : [ "soda" ]
   } ]
}

Conclusion

This blog discusses the features of MongoDB Joins and the types that we can perform on the data. MongoDB $lookup is very useful and powerful but it requires complex aggregate queries.

Extracting complex data from a diverse set of data sources to carry out an insightful analysis can be challenging, and this is where Hevo saves the day! Hevo offers a faster way to move data from Databases or SaaS applications like MongoDB into your Data Warehouse to be visualized in a BI tool. Hevo is fully automated and hence does not require you to code.

GET STARTED WITH HEVO FOR FREE

If you are using MongoDB as a source of data and searching for a no-fuss alternative to Manual Data Integration, then Hevo can effortlessly automate this for you. Hevo, with its strong integration with 150+ Data Sources & BI tools (Including 40+ Free Sources), allows you to not only export & load data but also transform & enrich your data & make it analysis-ready in a jiffy.

Want to take Hevo for a ride? SIGN UP for a 14-day free trial and simplify your Data Integration process. Do check out the Hevo Pricing details to understand which plan fulfills all your business needs.

mm
Former Content Writer, Hevo Data

Sharon is a data science enthusiast with a passion for data, software architecture, and writing technical content. She has experience writing articles on diverse topics related to data integration and infrastructure.

No-Code Data Pipeline for MongoDB