Understanding MongoDB Joins | 5 Critical Aspects

|

MongoDB Joins: FI

MongoDB Joins are different from how Joins work on SQL. MongoDB is an open-source NoSQL Database used to store high volumes of data. In traditional databases, we use tables and rows but MongoDB uses Collections and Documents. Documents use key-value pairs which is the basic unit in MongoDB. Collections are a set of documents and functions which is the equivalent of tables in Relational Databases. 

This blog talks about how to use a view to join two collections MongoDB Joins, the key types of Joins and Lookup commands.

What is MongoDB?

MongoDB is an open-source NoSQL Database used to store high volumes of data and work with it efficiently. It is categorized as NoSQL because the storage and retrieval of data are not in the form of tables and rows. It is in the form of Documents and Collections.

The MongoDB Database was initially released in February 2009 and is managed by MongoDB.inc. It also supports all the popular languages like C, C++, C#, DotNet, Java, Node.js, Ruby etc. There are a lot of companies that use MongoDB like Facebook, Nokia, Google etc. to store their large amounts of data. Although you might think that it is impossible to perform MongoDB Joins because it is a Non Relational Database, there are a few ways to do that. These are discussed in detail later in the blog.

Key Features of MongoDB

  • Every database contains Collections which in turn contain Documents. Each Document is different from the others w.r.t content and number of fields.
  • The Document structure is based on how the developers construct their classes and objects. The classes are more like key-value pairs instead of rows and columns.
  • The Documents do not need to have a fixed schema beforehand, fields can be created instantly.
  • The data model available lets you represent hierarchical relationships that facilitate storing data like arrays and other complex structures.
  • MongoDB is very scalable. Companies around the world have clusters with millions of Documents within the database.

Key Components of MongoDB Architecture

MongoDB Joins: MongoDB Architecture
Image Source
  • _id: This is like documents’ primary key. It is a requirement for every document. It is unique for each document. If you create a table without an _id field then MongoDB will automatically create one.
  • Collection: A collection is a group of documents. It is the equivalent of tables in any other Relational Database Management system like Oracle or MySQL. They exist within a single database and do not have a fixed structure as such.
  • Cursor: It is a pointer to the result. We can iterate through the cursor to get results.
  • Database: It is a container for Collections. Each database gets its own set of files on the system. The MongoDB server can contain multiple databases.
  • Document: A record is called a Document. They contain field names and values.
  • JSON: JavaScript Object Notation. It is a readable format for expressing structured data.
  • Field: It is a name-value pair. Every Document contains zero or more fields. They are similar to columns in RDBMS.

Here’s an example of Key-value Pairs:

{ 
Customer_id : 11
Customer_name: Sharon
Order_id: 20
}
Simplify your MongoDB ETL with Hevo’s No-code Data Pipeline

Hevo is the only real-time ELT No-code Data Pipeline platform that cost-effectively automates data pipelines that are flexible to your needs. With integration with 150+ Data Sources (50+ free sources), we help you not only export data from sources & load data to the destinations but also transform & enrich your data, & make it analysis-ready.

Start for free now!

Sign Up or a 14-day free trial

What is MongoDB Lookup Aggregation?

One of the major differences between SQL and NoSQL Databases is when it comes to the JOIN clause. In Relational Databases, the JOIN clause lets you combine two or more rows from tables using a common field. But in MongoDB, databases are designed to store denormalized data. Fortunately, MongoDB Joins can be performed in MongoDB 3.2 as it introduces a new Lookup operation that can perform Join operations on Collections.

Definition

MongoDB Joins are performed by Lookup. It performs a Left Outer Join to two or more Collections. But Lookup is only permitted in Aggregate operations. This is like a pipeline that performs query, filter and group operations. The output of one is used as input for the next operation.

MongoDB Joins Syntax

The following section talks about the syntax of different conditional clauses for MongoDB Joins:

  • MongoDB Joins: Equality Match with a Single Join Condition
  • MongoDB Joins: Conditions and SubQueries on Joined Conditions
  • MongoDB Joins: Correlated SubQueries using Concise Syntax

MongoDB Joins: Equality Match with a Single Join Condition

This clause performs an equality match between fields from the input documents and the joined collection. Syntax:

{
   $lookup:
     {
       from: <collection to join>,
       localField: <field from the input documents>,
       foreignField: <field from the documents of the "from" collection>,
       as: <output array field>
     }
}

This would correspond to the following SQL code:

SELECT *, <output array field>
FROM collection
WHERE <output array field> IN (
   SELECT *
   FROM <collection to join>
   WHERE <foreignField> = <collection.localField>
);

Examples:

MongoDB Joins: Conditions and Subqueries on Joined Collections

MongoDB 3.6 has added support to:

  • Execute a pipeline on a joined Collection.
  • Perform multiple join collections.
  • Correlated and uncorrelated queries.

To perform other join operations other than Equality Join, you can use this Lookup syntax:

{
   $lookup:
      {
         from: <joined collection>,
         let: { <var_1>: <expression>, …, <var_n>: <expression> },
         pipeline: [ <pipeline to run on joined collection> ],
         as: <output array field>
      }
}

This is equivalent to to the SQL code below:

SELECT *, <output array field>
FROM collection
WHERE <output array field> IN (
   SELECT <documents as determined from the pipeline>
   FROM <collection to join>
   WHERE <pipeline>
);

Examples:

MongoDB Joins: Correlated SubQueries using Concise Syntax

From MongoDB 5.0 onwards, you can use a concise syntax for correlated subqueries. The following syntax removes the requirement for an equality match on the local and foreign fields.

{
   $lookup:
      {
         from: <foreign collection>,
         localField: <field from local collection's documents>,
         foreignField: <field from foreign collection's documents>,
         let: { <var_1>: <expression>, …, <var_n>: <expression> },
         pipeline: [ <pipeline to run> ],
         as: <output array field>
      }
}

The SQL code below is equivalent to the above syntax:

SELECT *, <output array field>
FROM localCollection
WHERE <output array field> IN (
   SELECT <documents as determined from the pipeline>
   FROM <foreignCollection>
   WHERE <foreignCollection.foreignField> = <localCollection.localField>
   AND <pipeline match condition>
);

Example:

MongoDB Join Restrictions

There are a few restrictions or limitations while using MongoDB Joins:

  • You cannot use the $out or $merge stage in the $lookup stage before version 4.2 .
  • The From Collection cannot be sharded in the $lookup stage but the Collection you run $aggregate on can be sharded.

MongoDB Join Examples

Here are a few examples of the different types of MongoDB Joins we have seen above:

Single Equality Join

There are two Collections “orders” and “inventory”. Below you can see the code snippet:

db.orders.insertMany( [
   { "_id" : 1, "item" : "almonds", "price" : 12, "quantity" : 2 },
   { "_id" : 2, "item" : "pecans", "price" : 20, "quantity" : 1 },
   { "_id" : 3  }
] )
db.inventory.insertMany( [
   { "_id" : 1, "sku" : "almonds", "description": "product 1", "instock" : 120 },
   { "_id" : 2, "sku" : "bread", "description": "product 2", "instock" : 80 },
   { "_id" : 3, "sku" : "cashews", "description": "product 3", "instock" : 60 },
   { "_id" : 4, "sku" : "pecans", "description": "product 4", "instock" : 70 },
   { "_id" : 5, "sku": null, "description": "Incomplete" },
   { "_id" : 6 }
] )

The following Aggregation operations perform Join operation on both these Collections and Join them using fields “item” from “orders” and “sku” from “inventory”. The code snippet is below:

db.orders.aggregate( [
   {
     $lookup:
       {
         from: "inventory",
         localField: "item",
         foreignField: "sku",
         as: "inventory_docs"
       }
  }
] )

The result is as follows:

{
   "_id" : 1,
   "item" : "almonds",
   "price" : 12,
   "quantity" : 2,
   "inventory_docs" : [
      { "_id" : 1, "sku" : "almonds", "description" : "product 1", "instock" : 120 }
   ]
}
{
   "_id" : 2,
   "item" : "pecans",
   "price" : 20,
   "quantity" : 1,
   "inventory_docs" : [
      { "_id" : 4, "sku" : "pecans", "description" : "product 4", "instock" : 70 }
   ]
}
{
   "_id" : 3,
   "inventory_docs" : [
      { "_id" : 5, "sku" : null, "description" : "Incomplete" },
      { "_id" : 6 }
   ]
}

Using Lookup on an Array

You can match the array elements with a foreign field. For example, there are two Collections “classes” and “members” with the following Documents:

db.classes.insertMany( [
   { _id: 1, title: "Reading is ...", enrollmentlist: [ "giraffe2", "pandabear", "artie" ], days: ["M", "W", "F"] },
   { _id: 2, title: "But Writing ...", enrollmentlist: [ "giraffe1", "artie" ], days: ["T", "F"] }
] )
db.members.insertMany( [
   { _id: 1, name: "artie", joined: new Date("2016-05-01"), status: "A" },
   { _id: 2, name: "giraffe", joined: new Date("2017-05-01"), status: "D" },
   { _id: 3, name: "giraffe1", joined: new Date("2017-10-01"), status: "A" },
   { _id: 4, name: "panda", joined: new Date("2018-10-11"), status: "A" },
   { _id: 5, name: "pandabear", joined: new Date("2018-12-01"), status: "A" },
   { _id: 6, name: "giraffe2", joined: new Date("2018-12-01"), status: "D" }
] )

The following operation joins “members” and “classes” on members field to name field:

db.classes.aggregate( [
   {
      $lookup:
         {
            from: "members",
            localField: "enrollmentlist",
            foreignField: "name",
            as: "enrollee_info"
        }
   }
] )

The output will be as follows:

{
   "_id" : 1,
   "title" : "Reading is ...",
   "enrollmentlist" : [ "giraffe2", "pandabear", "artie" ],
   "days" : [ "M", "W", "F" ],
   "enrollee_info" : [
      { "_id" : 1, "name" : "artie", "joined" : ISODate("2016-05-01T00:00:00Z"), "status" : "A" },
      { "_id" : 5, "name" : "pandabear", "joined" : ISODate("2018-12-01T00:00:00Z"), "status" : "A" },
      { "_id" : 6, "name" : "giraffe2", "joined" : ISODate("2018-12-01T00:00:00Z"), "status" : "D" }
   ]
}
{
   "_id" : 2,
   "title" : "But Writing ...",
   "enrollmentlist" : [ "giraffe1", "artie" ],
   "days" : [ "T", "F" ],
   "enrollee_info" : [
      { "_id" : 1, "name" : "artie", "joined" : ISODate("2016-05-01T00:00:00Z"), "status" : "A" },
      { "_id" : 3, "name" : "giraffe1", "joined" : ISODate("2017-10-01T00:00:00Z"), "status" : "A" }
   ]
}

Multiple Joins and Correlated SubQuery

Create a Collection “order’ and “warehouses” as follows:

db.orders.insertMany( [
  { "_id" : 1, "item" : "almonds", "price" : 12, "ordered" : 2 },
  { "_id" : 2, "item" : "pecans", "price" : 20, "ordered" : 1 },
  { "_id" : 3, "item" : "cookies", "price" : 10, "ordered" : 60 }
] )
db.warehouses.insertMany( [
  { "_id" : 1, "stock_item" : "almonds", warehouse: "A", "instock" : 120 },
  { "_id" : 2, "stock_item" : "pecans", warehouse: "A", "instock" : 80 },
  { "_id" : 3, "stock_item" : "almonds", warehouse: "B", "instock" : 60 },
  { "_id" : 4, "stock_item" : "cookies", warehouse: "B", "instock" : 40 },
  { "_id" : 5, "stock_item" : "cookies", warehouse: "A", "instock" : 80 }
] )

You can use a Correlated SubQuery that performs join on “orders.item” and “warehouse.stock_item” fields as follows:

db.orders.aggregate( [
   {
      $lookup:
         {
           from: "warehouses",
           let: { order_item: "$item", order_qty: "$ordered" },
           pipeline: [
              { $match:
                 { $expr:
                    { $and:
                       [
                         { $eq: [ "$stock_item",  "$$order_item" ] },
                         { $gte: [ "$instock", "$$order_qty" ] }
                       ]
                    }
                 }
              },
              { $project: { stock_item: 0, _id: 0 } }
           ],
           as: "stockdata"
         }
    }
] )

The output for the aforementioned code snippet would be:

{
  _id: 1,
  item: 'almonds',
  price: 12,
  ordered: 2,
  stockdata: [
    { warehouse: 'A', instock: 120 },
    { warehouse: 'B', instock: 60 }
  ]
},
{
  _id: 2,
  item: 'pecans',
  price: 20,
  ordered: 1,
  stockdata: [ { warehouse: 'A', instock: 80 } ]
},
{
  _id: 3,
  item: 'cookies',
  price: 10,
  ordered: 60,
  stockdata: [ { warehouse: 'A', instock: 80 } ]
}

Concise Correlated SubQuery

Create Collections “restaurants” and “orders” with the following Documents:

db.restaurants.insertMany( [
   {
      _id: 1,
      name: "American Steak House",
      food: [ "filet", "sirloin" ],
      beverages: [ "beer", "wine" ]
   },
   {
      _id: 2,
      name: "Honest John Pizza",
      food: [ "cheese pizza", "pepperoni pizza" ],
      beverages: [ "soda" ]
   }
] )
db.orders.insertMany( [
   {
      _id: 1,
      item: "filet",
      restaurant_name: "American Steak House"
   },
   {
      _id: 2,
      item: "cheese pizza",
      restaurant_name: "Honest John Pizza",
      drink: "lemonade"
   },
   {
      _id: 3,
      item: "cheese pizza",
      restaurant_name: "Honest John Pizza",
      drink: "soda"
   }
] )

Join the above two Collections on orders.restaurant_name and restaurants.name and perform an $in array match between orders.drink and beverages as follows:

db.orders.aggregate( [
   {
      $lookup: {
         from: "restaurants",
         localField: "restaurant_name",
         foreignField: "name",
         let: { orders_drink: "$drink" },
         pipeline: [ {
            $match: {
               $expr: { $in: [ "$$orders_drink", "$beverages" ] }
            }
         } ],
         as: "matches"
      }
   }
] )

The output would be:

{
   "_id" : 1, "item" : "filet",
   "restaurant_name" : "American Steak House",
   "matches" : [ ]
}
{
   "_id" : 2, "item" : "cheese pizza",
   "restaurant_name" : "Honest John Pizza",
   "drink" : "lemonade",
   "matches" : [ ]
}
{
   "_id" : 3, "item" : "cheese pizza",
   "restaurant_name" : "Honest John Pizza",
   "drink" : "soda",
   "matches" : [ {
      "_id" : 2, "name" : "Honest John Pizza",
      "food" : [ "cheese pizza", "pepperoni pizza" ],
      "beverages" : [ "soda" ]
   } ]
}

Additional Resources on Mongodb Joins

Conclusion

  • This blog discusses the features of MongoDB Joins and the types that we can perform on the data.
  • MongoDB $lookup is very useful and powerful but it requires complex aggregate queries.

Extracting complex data from a diverse set of data sources to carry out an insightful analysis can be challenging, and this is where Hevo saves the day!

Sharon Rithika
Content Writer, Hevo Data

Sharon is a data science enthusiast with a hands-on approach to data integration and infrastructure. She leverages her technical background in computer science and her experience as a Marketing Content Analyst at Hevo Data to create informative content that bridges the gap between technical concepts and practical applications. Sharon's passion lies in using data to solve real-world problems and empower others with data literacy.

No-Code Data Pipeline for MongoDB