MongoDB Join Two Collections Simplified: A Comprehensive Guide 101

on Database Management Systems, MongoDB • February 14th, 2022 • Write for Hevo

MongoDB Join Two Collections

There are several NoSQL databases out there, but MongoDB is the most commonly used, and it is available both as a Cloud Service and for Deployment on Self-Managed Systems. Even though MongoDB isn’t an RDBMS (Relational Database Management System) and doesn’t offer extensive join methods, you can use several other operators to join two collections in MongoDB. 

Upon a complete walkthrough of this article, you will learn in detail how MongoDB join two collections.

Table of Contents

What is MongoDB?

MongoDB Logo
Image Source

MongoDB is a non-relational Database Management System that is Open Source and Cross-Platform. MongoDB, which was founded in 2009, employs the Document-Oriented Database Model to organize data into documents and collections rather than tables. This enables it to store various types of data. Furthermore, it is distributed under the Server Side Public License (SSPL), which supports a novel mechanism for storing and retrieving large amounts of data. MongoDB is a straightforward and simple-to-configure database that provides high performance, automatic scalability, and high availability.

How To Join Two Collections In MongoDB?

MongoDB has some issues with linking data from one collection to another unless you use simple Script Query functions. To resolve this problem, we introduce the JOIN concept, which facilitates the relationship between the data.

We can join documents on collections in MongoDB by using the $lookup (Aggregation) function. $lookup(Aggregation) creates an outer left join with another collection and helps to filter data from merged data. If documents are part of a “joined” collection, the $lookup (Aggregation) function will return documents in the form of a subarray of the original collection.

Syntax

In MongoDB’s JOIN operation, the goal is to connect one set of data to another set of data. To join two collections, we use the $lookup operator, whose syntax is defined below:

{
   $lookup:
     {
       from: <collection to join>,
       localField: <field from the input documents>,
       foreignField: <field from the documents of the "from" collection>,
       as: <output array field>
     }
}

The $lookup function accepts a document containing these fields:

  • From: It describes the collection within an identical database that should be used for executing the join, but with sharded collection restrictions.
  • LocalField: This field represents the input field from the documents to the stage of $lookup, which performs an equivalence match between the local-field and the foreign-field from the collection ‘from.’ If an input document does not have a value for a local-field, then this operator will give the field a null value for matching purposes.
  • foreignField: This field contains data from the documents in the ‘from’ collection with which an equivalence match can be made between the foreignField and the localField. When a document in the collection ‘from’ does not have a foreignField value, this operator will set the field to null for further matching purposes.
  • As: It specifies the name of the array field that needs to be added to the input documents. More so, a new array field also includes matching documents from the collection ‘from.’ The prevailing field will be overwritten if the stated name already exists in the input document.

In SQL, when a JOIN operation is performed, the result is always a new row that combines all the fields present from the parent and foreign tables. However, in MongoDB, the output documents are supplemented by an array of native collection documents.

Steps for Joining Two Collections in MongoDB

For performing MongoDB Join two collections, you must use the $lookup operator. It is defined as a stage that executes a left outer join with another collection and aids in filtering data from joined documents.

For example, if a user requires all grades from all students, then the below query can be written:

Students.aggregate([{
$lookup: {
From: ‘Grades’,
LocalField: ‘Student_id’,
foreignField: ‘Stud_id’,
as: ‘Student_grade’
}}]);

From the above query, we can see that the initial parameter named ‘from’ specifies the collection which is to be joined with the present one, and the ‘localField’ specifies the key in the current collection, which is to be matched with the foreign key in another collection by the ‘foreignField’ parameter. For retrieving the data, Student_id from Students and stud_id from Grades are matched and joined. In addition, the end parameter ‘as’ is added as a code-named to the data.

It is worth noting that here, within every record of the output data from the query above, the data from the joined table also appears as the primary element within an array. Similarly, every row from the output data from the query above will display as follows:

{_ID: String, Quantity: Number, Student_id: String, Student_grade: Array, Stud_id: String, Student_name: String}

The data that was joined by the query will be the first element in the grades array.

Example

The following example creates a sample dataset and inserts some documents to perform MongoDB Join two collections:

db.address.insertMany(
    [
        {
            "name": "Bob",
            "blk_no": 22,
            "street" : "dewey street",
            "city" : "United States of America"
        },
        {
            "name": "Jack",
            "blk_no": 25,
            "street" : "gordon street",
            "city" : "New Zealand"
        }
    ]
);
{
        "acknowledged" : true,
        "insertedIds" : [
                ObjectId("613594cdb59313217373673c"),
                ObjectId("613594cdb59313217373673d")
        ]
}

A few documents were inserted into the address collection. Now let’s add a few documents to another collection:

db.userInfo.insertMany(
    [
        {
            "contact_name": "Bob",
            "age": 27,
            "sex" : "male",
            "citizenship" : "Filipino"
        },
        {
            "contact_name": "Jack",
            "age": 22,
            "sex" : "male",
            "citizenship" : "Filipino"
        }
    ]
);
{
        "acknowledged" : true,
        "insertedIds" : [
                ObjectId("613594dbb59313217373673e"),
                ObjectId("613594dbb59313217373673f")
        ]
}

The userInfo collection was populated with some documents.

It is important to note that the name field in the address collection has the same values as the contact_name field in the userInfo collection.

Applying the $lookup function, we find an equality match as follows:

098db.userInfo.aggregate([
    { $lookup:
        {
           from: "address",
           localField: "contact_name",
           foreignField: "name",
           as: "address"
        }
    }
]).pretty();
Output:
{
        "_id" : ObjectId("613594dbb59313217373673e"),
        "contact_name" : "Bob",
        "age" : 27,
        "sex" : "male",
        "citizenship" : "Filipino",
        "address" : [
                {
                        "_id" : ObjectId("613594cdb59313217373673c"),
                        "name" : "Bob",
                        "blk_no" : 22,
                        "street" : "dewey street",
                        "city" : "United States of America"
                }
        ]
}
{
        "_id" : ObjectId("613594dbb59313217373673f"),
        "contact_name" : "Jack",
        "age" : 22,
        "sex" : "male",
        "citizenship" : "Filipino",
        "address" : [
                {
                        "_id" : ObjectId("613594cdb59313217373673d"),
                        "name" : "Jack",
                        "blk_no" : 25,
                        "street" : "gordon street",
                        "city" : "New Zealand"
                }
        ]
}

We matched the name field and the contact_name field of these two collections. Then, you can see the output where we joined the two collections. This is the simplest way to perform MongoDB join two collections from the same database.

Simplify MongoDB ETL & Analysis Using Hevo’s No-code Data Pipeline

Hevo Data is a No-code Data Pipeline that offers a fully managed solution to set up Data Integration for 100+ Data Sources (including 40+ Free sources) and will let you directly load data from sources like MongoDB to a Data Warehouse or the Destination of your choice. It will automate your data flow in minutes without writing any line of code. Its fault-tolerant architecture makes sure that your data is secure and consistent. Hevo provides you with a truly efficient and fully automated solution to manage data in real-time and always have analysis-ready data. 

Get Started with Hevo for Free

Let’s look at some of the salient features of Hevo:

  • Fully Managed: It requires no management and maintenance as Hevo is a fully automated platform.
  • Data Transformation: It provides a simple interface to perfect, modify, and enrich the data you want to transfer. 
  • Real-Time: Hevo offers real-time data migration. So, your data is always ready for analysis.
  • Schema Management: Hevo can automatically detect the schema of the incoming data and maps it to the destination schema.
  • Connectors: Hevo supports 100+ Integrations to SaaS platforms FTP/SFTP, Files, Databases, BI tools, and Native REST API & Webhooks Connectors. It supports various destinations including Google BigQuery, Amazon Redshift, Snowflake, Firebolt, Data Warehouses; Amazon S3 Data Lakes; Databricks; and MySQL, SQL Server, TokuDB, MongoDB, PostgreSQL Databases to name a few.  
  • Secure: Hevo has a fault-tolerant architecture that ensures that the data is handled in a secure, consistent manner with zero data loss.
  • Hevo Is Built To Scale: As the number of sources and the volume of your data grows, Hevo scales horizontally, handling millions of records per minute with very little latency.
  • Live Monitoring: Advanced monitoring gives you a one-stop view to watch all the activities that occur within Data Pipelines.
  • Live Support: Hevo team is available round the clock to extend exceptional support to its customers through chat, email, and support calls.
Sign up here for a 14-Day Free Trial!

How MongoDB Joins Two Collections With Specific Conditions using Pipeline & $Lookup?

$lookup is used with a pipeline to perform MongoDB join two collections based on specific conditions requiring expressions.

For instance, if we want to get all users with orders greater than five

User.aggregate([{
$lookup: {
 from: 'Orders',
 let: {ID: '$ID'},
 pipeline: [{$match: {
  $expr: {
   $eq: [
    '$User_ID', '$$ID'
   ],
   $gt: [
    '$quantity', 5
   ]
  }}}]
  as: 'OrdersData'
}}]);

We can quickly join tables using pipelines within a $lookup.

How Does Mongodb Join Two Collections With The Where Clause?

MongoDB’s aggregate() method joins two collections with a where clause. By doing this, we can correlate two subqueries.

The pipeline operator is also used to apply the condition. Let’s look at the following example:

Example

In this example, we create two collections, and then we apply the where clause to join them using the $lookup aggregation.

Lookup Aggregation
Image Source

Two collections, orders, and warehouses are created, and a few documents are stored. We now combine both collections using an aggregation operation.

MongoDB Join Two Collections
Image Source

As you can see in the where clause condition, we use a pipeline function, assign the stock_item equal to order_item, and then add stock_data if the amount in stock is more significant than the order_qty.

This is the simplest way to apply where clauses to perform MongoDB Join two collections. However, you can learn other ways to do this by practicing more.

Using Compass To Join Two Collections In MongoDB

The MongoDB Compass GUI allows you to query, aggregate, and analyze your MongoDB data using a visual environment. 

The $lookup aggregation function is used to perform MongoDB Join two collections for the same database.

Example

In this example, we’ll learn how to combine two database collections using $lookup aggregation.

In Compass, you can apply the aggregation operation by following these steps:

  • Open MongoDB Compass.
MongoDB Compass
Image Source
  • If desired, create a new collection and database.
Creating Collection & Database
  • We created two collections for genera and movies in the test database, and we also inserted some documents into both collections.
    • We now merge the two collections using the aggregation function.
    • On the Documents page, click Aggregation.
Aggregation
Image Source

There’s a “pipeline stage” after you select Aggregation where there’s a dropdown list and where you have to choose which operation to perform; we choose the $lookup function.

Some fields in the collection need to be provided in this function. First, you write the collection name that you would like to join (movies) into the ‘from’ field, and write the fields of the local collection (genre list) and the fields of the “form collection” (type) in localField and foreignField, respectively, and in as write the output array field (movies).

You can see the output on the right side of the input after Aggregation.

Here is a simple way to apply the $lookup aggregation using MongoDB compass. With more practice, you will better understand the aggregation operations and better grasp the compass environment.

Conclusion

One of the fundamental differences between SQL and NoSQL databases is the JOIN operator. The JOIN operation can be performed in SQL databases between two tables inside the same database or different. On the other hand, JOIN operations in MongoDB can be performed between collections implemented in a similar database, although the case is varied.

This guide introduced you to a few of the most essential operations for performing MongoDB Join Two Collections with examples. 

Visit our Website to Explore Hevo

Hevo Data, a No-code Data Pipeline provides you with a consistent and reliable solution to manage data transfer between a variety of sources like MongoDB and a wide variety of Desired Destinations, with a few clicks. Hevo Data with its strong integration with 100+ sources (including 40+ free sources) allows you to not only export data from your desired data sources & load it to the destination of your choice, but also transform & enrich your data to make it analysis-ready so that you can focus on your key business needs and perform insightful analysis using BI tools.

Want to take Hevo for a spin? Sign Up for a 14-day free trial and experience the feature-rich Hevo suite first hand. You can also have a look at the unbeatable pricing that will help you choose the right plan for your business needs.

Share your experience of learning how to perform MongoDB Join Two Collections in the comments below!

No Code Data Pipeline For MongoDB