MongoDB Inner Join 101: Syntax & Example Simplified

|

Mongodb Inner Join - Featured Image

The most important difference between SQL and NoSQL databases is JOIN. An SQL JOIN clause is used in relational databases to join tables using their common fields. MongoDB, however, is a NoSQL type of database that does not follow relational database management systems (RDBMS) and therefore does not provide extensive JOIN methods. Although MongoDB originally did not have any joining methods, the MongoDB Inner JOIN concept was influenced by SQL databases. Among the many joins in SQL, the inner join is the most commonly used. 

In this article, you will learn how to easily perform a MongoDB Inner Join operation in MongoDB. 

Table of Contents

What is MongoDB?

Mongodb Inner Join - MongoDB Logo
Image Source

MongoDB is a NoSQL open-source document-oriented database developed for storing and processing high volumes of data. Compared to the conventional relational databases, MongoDB makes use of collections and documents instead of tables consisting of rows and columns. The Collections consist of several documents and documents containing the basic units of data in terms of key and value pairs. 

Introduced in February 2009, the MongoDB database is designed, maintained, and managed by MongoDB.Inc under SSPL(Server Side Public License). Organizations such as Facebook, Nokia, eBay, Adobe, Google, etc. prefer it for efficiently handling and storing their exponentially growing data. It offers complete support for programming languages such as C, C++, C#, Go, Java, Node.js, Perl, PHP, Python, Motor, Ruby, Scala, Swift, and Mongoid.

Key Features of MongoDB?

MongoDB Inner Join - MongoDB Features
Image Source

With constant efforts from the online community, MongoDB has evolved over the years. Some of its eye-catching features are:

  • High Data Availability & Stability: MongoDB’s Replication feature provides multiple servers for disaster recovery and backup. Since several servers store the same data or shards of data, MongoDB provides greater data availability & stability. This ensures all-time data access and security in case of server crashes, service interruptions, or even good old hardware failure. 
  • Accelerated Analytics: You may need to consider thousands to millions of variables while running Ad-hoc queries. MongoDB indexes BSON documents and utilizes the MongoDB Query Language (MQL) that allows you to update Ad-hoc queries in real-time. MongoDB provides complete support for field queries, range queries, and regular expression searches along with user-defined functions.
  • Indexing: With a wide range of indices and features with language-specific sort orders that support complex access patterns to datasets, MongoDB provides optimal performance for every query. For the real-time ever-evolving query patterns and application requirements, MongoDB also provisions On-demand Indices Creation.
  • Horizontal Scalability: With the help of Sharding, MongoDB provides horizontal scalability by distributing data on multiple servers using the Shard Key. Each shard in every MongoDB Cluster stores parts of the data, thereby acting as a separate database. This collection of comprehensive databases allows efficient handling of growing volumes of data with zero downtime. The complete Sharding Ecosystem is maintained and managed by Mongos that directs queries to the correct shard based on the Shard Key.
  • Load Balancing: Real-time Replication and Sharding contribute towards large-scale Load Balancing. Ensuring top-notch Concurrency Controls and Locking Protocols, MongoDB can effectively handle multiple concurrent read and write requests for the same data.  
  • Aggregation: Similar to the SQL Group By clause, MongoDB can easily batch process data and present a single result even after executing several other operations on the group data. MongoDB’s Aggregation framework consists of 3 types of aggregations i.e. Aggregation Pipeline, Map-Reduce Function, and Single-Purpose Aggregation methods.
Simplify MongoDB ETL with Hevo’s No-code Data Pipeline

Hevo Data, a No-code Data Pipeline helps to load data from any data source such as Databases, SaaS applications, Cloud Storage, SDK,s, and Streaming Services and simplifies the ETL process. It supports MongoDB & MongoDB Atlas, along with 100+ data sources (Including 40+ Free Data Sources), and is a 3-step process by just selecting the data source, providing valid credentials, and choosing the destination. Hevo not only loads the data onto the desired Data Warehouse but also enriches the data and transforms it into an analysis-ready form without having to write a single line of code.

Its completely automated pipeline offers data to be delivered in real-time without any loss from source to destination. Its fault-tolerant and scalable architecture ensure that the data is handled in a secure, consistent manner with zero data loss and supports different forms of data. The solutions provided are consistent and work with different BI tools as well.

Get Started with Hevo for Free

Check out why Hevo is the Best:

  • Secure: Hevo has a fault-tolerant architecture that ensures that the data is handled in a secure, consistent manner with zero data loss.
  • Schema Management: Hevo takes away the tedious task of schema management & automatically detects the schema of incoming data and maps it to the destination schema.
  • Minimal Learning: Hevo, with its simple and interactive UI, is extremely simple for new customers to work on and perform operations.
  • Connectors: Hevo supports 100+ Integrations to SaaS platforms such as WordPress, FTP/SFTP, Files, Databases, BI tools, and Native REST API & Webhooks Connectors. It supports various destinations including Google BigQuery, Amazon Redshift, Snowflake, Firebolt, Data Warehouses; Amazon S3 Data Lakes; Databricks, MySQL, SQL Server, TokuDB, MongoDB, DynamoDB, PostgreSQL Databases to name a few.  
  • Hevo Is Built To Scale: As the number of sources and the volume of your data grows, Hevo scales horizontally, handling millions of records per minute with very little latency.
  • Incremental Data Load: Hevo allows the transfer of data that has been modified in real-time. This ensures efficient utilization of bandwidth on both ends.
  • Live Support: The Hevo team is available round the clock to extend exceptional support to its customers through chat, email, and support calls.
  • Live Monitoring: Hevo allows you to monitor the data flow and check where your data is at a particular point in time. 
Sign up here for a 14-Day Free Trial!

How to Perform MongoDB Inner Join operation?

There are no official joins in MongoDB, as mentioned earlier. However, there are methods for doing so. The $lookup operator is used for this. The $lookup operator can only be used when aggregating data. You can visualize these as a pipeline of operators querying, filtering, and grouping results. Results from one operator are inputs for the subsequent one.

In general, aggregate queries are more complicated and slower than simpler queries. But they’re powerful and invaluable when it comes to complex search operations.

Mongodb Inner Join - MongoDB lookup operator
Image Source

$lookup returns a document as its value. This document consists of four main fields.

  • from: The name of the collection to be joined.
  • localField and foreignField: To join the collections using $lookup, a field from both collections is required. The localField comes from the collection which invokes the $lookup, whereas the foreignField comes from a second collection.
  • as: The connection between two collections in a result.

To understand the process of performing the MongoDB Inner Join, you can go through the following aspects:

1. Sample Collections for MongoDB Inner Join

For this article, two collections are considered for reference. The name of the first collection is “Chelsea”. (the _id field is omitted for better clarity). This collection features football players of Chelsea, an English football club. The documents contain two fields – playername and country. 

{ "playername" : "Thiago Silva", "country" : "Brazil" }
{ "playername" : "Anthony Rudiger", "country" : "Germany" }
{ "playername" : "Andreas Christensen", "country" : "Denmark" }
{ "playername" : "Reece James", "country" : "England" }
{ "playername" : "Ben Chilwell", "country" : "England" }
{ "playername" : "Ngolo Kante", "country" : "France" }
{ "playername" : "Jorginho", "country" : "Italy" }
{ "playername" : "Mason Mount", "country" : "England" }
{ "playername" : "Hakim Ziyech", "country" : "Morocco" }
{ "playername" : "Edouard Mendy", "country" : "Senegal" }
{ "playername" : "Romelu Lukaku", "country" : "Belgium" }
{ "playername" : "Mateo Kovacic", "country" : "Croatia" }

The second collection is called “worldXI”. 11 documents are included in the worldXI collection. According to FIFA, these are the best world XIs. In each document, three fields appear namely playername, club, and country.

{ "playername" : "Leonel Messi", "club" : "PSG", "country" : "Argentina" }
{ "playername" : "Cristiano Ronaldo", "club" : "Manchester United", "country" : "Portugal" }
{ "playername" : "Robert Lewandowski", "club" : "Bayern Munich", "country" : "Poland" }
{ "playername" : "Ruben Dias", "club" : "Manchester City", "country" : "Portugal" }
{ "playername" : "Erling Haland", "club" : "Borussia Dortmund", "country" : "Norway" }
{ "playername" : "Kevin De Bruyne", "club" : "Manchester City", "country" : "Belgium" }
{ "playername" : "David Alaba", "club" : "Real Madrid", "country" : "Austria" }
{ "playername" : "Leonardo Bonucci", "club" : "Juventus", "country" : "Italy" }
{ "playername" : "Gianluigi Donnarumma", "club" : "PSG", "country" : "Italy" }
{ "playername" : "Ngolo Kante", "club" : "Chelsea", "country" : "France" }
{ "playername" : "Jorginho", "club" : "Chelsea", "country" : "Italy" }

Common fields between the two collections are required when performing a join operation. For this MongoDB Inner Join case, playername and country are the only two fields common to both collections. Thus, you can combine these collections based on playernames.

Consequently, the result will include all the information about Chelsea footballers that are in the world XI team.

2. Aggregating Collections for MongoDB Inner Join

Follow the simple steps to aggregate the 2 collections for MongoDB Inner join:

  • Step 1: Combine the Chelsea and worldXI collections by running the following command:
db.chelsea.aggregate({ $lookup: {from : "worldXI", localField: "playername", foreignField: "playername", as : "chelseaInWorldXI"}})

This query will join the collections Chelsea and worldXI when the field, playername, is the same in both collections. Check the field named “as” : “chelseaInWorldXI”

In other words, the documents that will be linked from worldXI to Chelsea will be stored in a new field, “chelseaInWorldXI”.

  • Step 2: You can check out the MongoDB Inner Join result via the following command:
> db.chelsea.aggregate({ $lookup: {from : "worldXI", localField: "playername", foreignField: "playername", as : "chelseaInWorldXI"}}).pretty()
{
        "_id" : ObjectId("5e1b3b8c2e6cf2f7f45585d7"),
        "playername" : "Thiago Silva",
        "country" : "Brazil",
        "chelseaInWorldXI" : [ ]
}
{
        "_id" : ObjectId("5e1b3ba72e6cf2f7f45585d8"),
        "playername" : "Anthony Rudiger",
        "country" : "Germany",
        "chelseaInWorldXI" : [ ]
             
}
{
        "_id" : ObjectId("5e1b3bd52e6cf2f7f45585d9"),
        "playername" : "Andreas Christensen ",
        "country" : "Denmark",
        "chelseaInWorldXI" : [ ]
}
{
        "_id" : ObjectId("5e1b3be42e6cf2f7f45585da"),
        "playername" : "Reece James",
        "country" : "England",
        "chelseaInWorldXI" : [ ]
}
{
        "_id" : ObjectId("5e1b3c012e6cf2f7f45585db"),
        "playername" : "Ben Chillwell",
        "country" : "England",
        "chelseaInWorldXI" : [ ]
}
{
        "_id" : ObjectId("5e1b3c1d2e6cf2f7f45585dc"),
        "playername" : "Ngolo Kante",
        "country" : "France",
        "chelseaInWorldXI" : [ 
   {
                        "_id" : ObjectId("5e1b3e1b2e6cf2f7f45585e3"),
                        "playername" : "Ngolo Kante",
                        "club" : "Chelsea",
                        "country" : "France"
                }
        ]
}
{
        "_id" : ObjectId("5e1b3c332e6cf2f7f45585dd"),
        "playername" : "Jorginho",
        "country" : "Italy",
        "chelseaInWorldXI" : [ 
 {
                        "_id" : ObjectId("5e1b3d4d2e6cf2f7f45585e0"),
                        "playername" : "Jorginho",
                        "club" : "Chelsea",
                        "country" : "Italy"
                }
        ]

}
{
        "_id" : ObjectId("5e1b3c3d2e6cf2f7f45585de"),
        "playername" : "Mason Mount",
        "country" : "England",
        "chelseaInWorldXI" : [ ]
}
{
        "_id" : ObjectId("5e1b3c502e6cf2f7f45585df"),
        "playername" : "Hakim Ziyech",
        "country" : "Morocco",
        "chelseaInWorldXI" : [ ]
}
{
        "_id" : ObjectId("5e1b3f962e6cf2f7f45585eb"),
        "playername" : "Edouard Mendy",
        "country" : "Senegal",
        "chelseaInWorldXI" : [ ]
}
{
        "_id" : ObjectId("5e1b41412e6cf2f7f45585ec"),
        "playername" : "Romelu Lukaku",
        "country" : "Belgium",
        "chelseaInWorldXI" : [ ]
              
}
{
        "_id" : ObjectId("5e1b51f42e6cf2f7f45585ed"),
        "playername" : "Mateo Kovacic",
        "country" : "Croatia",
        "chelseaInWorldXI" : [ ]
}

Review the MongoDB Join results. Several documents from the Chelsea collection are merged with those from the WorldXI collection, but the result also contains documents from the Chelsea collection that do not match the ones from the WorldXI collection. 

What’s wrong here? The MongoDB Inner Join shouldn’t work this way, right? As a result, only documents that are matched with the worldXI collection are required. 

As a result, you can now use the $match operator to make it a proper MongoDB Inner Join.

3. Match operator to complete MongoDB Inner Join

You can enhance the results by using the $match operator. This tool will be used to remove all documents that do not match the worldXI collection. How can you accomplish this? Those documents that are not matching have no content in the chelseaInWorldXI field. There is nothing in it. As a result, only documents that have a length greater than zero will be displayed.

{ $match : { "chelseaInWorldXI" : { $ne : []}}}

Using the $ne operator you must remove all the records where ChelseaInWorldXI is empty.

Let’s add this to the MongoDB Inner Join query and see what happens.

> db.chelsea.aggregate({ $lookup: {from : "worldXI", localField: "playername", foreignField: "playername", as : "chelseaInWorldXI"}},{ $match : { "chelseaInWorldXI" : { $ne : []}}}  ).pretty()
{
         "_id" : ObjectId("5e1b3c1d2e6cf2f7f45585dc"),
        "playername" : "Ngolo Kante",
        "country" : "France",
        "chelseaInWorldXI" : [ 
   {
                        "_id" : ObjectId("5e1b3e1b2e6cf2f7f45585e3"),
                        "playername" : "Ngolo Kante",
                        "club" : "Chelsea",
                        "country" : "France"
                }
        ]
}
{
        "_id" : ObjectId("5e1b3c332e6cf2f7f45585dd"),
        "playername" : "Jorginho",
        "country" : "Italy",
        "chelseaInWorldXI" : [ 
 {
                        "_id" : ObjectId("5e1b3d4d2e6cf2f7f45585e0"),
                        "playername" : "Jorginho",
                        "club" : "Chelsea",
                        "country" : "Italy"
                }
        ]

}

Conclusion 

In this article, you have learned how to effectively perform the MongoDB Inner Join operation. The MongoDB database is well known for its extensive support for processing data inside the database. Unlike SQL-based databases, it does not provide a dedicated method to join collections. For MongoDB Inner Joins, you can use the $lookup & $match operators that can be used in the aggregation method instead of Join. $lookup is a much-welcome addition to MongoDB 3.2. It can alleviate some of the frustrations relating to the use of small amounts of relational data in a NoSQL database.

To get a complete picture of your business performance and financial health, you need to consolidate data from MongoDB and all the other applications used across your business. To achieve this you need to assign a portion of your Engineering Bandwidth to Integrate data from all sources, Clean & Transform it, and finally, Load it to a Cloud Data Warehouse or a destination of your choice for further Business Analytics. All of these challenges can be comfortably solved by a Cloud-Based ETL tool such as Hevo Data.  

Visit our Website to Explore Hevo

Hevo Data, a No-code Data Pipeline can seamlessly transfer data from a vast sea of 100+ sources such as MongoDB & MongoDB Atlas to a Data Warehouse or a Destination of your choice to be visualized in a BI Tool. It is a reliable, completely automated, and secure service that doesn’t require you to write any code! 

If you are using MongoDB as your NoSQL Database Management System and searching for a no-fuss alternative to Manual Data Integration, then Hevo can effortlessly automate this for you. Hevo, with its strong integration with 100+ sources & BI tools(Including 40+ Free Sources), allows you to not only export & load data but also transform & enrich your data & make it analysis-ready in a jiffy.

Want to take Hevo for a ride? Sign Up for a 14-day free trial and simplify your Data Integration process. Do check out the pricing details to understand which plan fulfills all your business needs.

Tell us about your experience performing the MongoDB Inner Join operation! Share your thoughts with us in the comments section below.

Samuel Salimon
Freelance Technical Content Writer, Hevo Data

Samuel specializes in freelance writing within the data industry, adeptly crafting informative and engaging content centered on data science by merging his problem-solving skills.

No-code Data Pipeline for MongoDB