MongoDB Joins are different from how Joins work on SQL. MongoDB is an open-source NoSQL Database used to store high volumes of data. In traditional databases, we use tables and rows but MongoDB uses Collections and Documents. Documents use key-value pairs which is the basic unit in MongoDB. Collections are a set of documents and functions which is the equivalent of tables in Relational Databases.
This blog talks about how to use a view to join two collections MongoDB Joins, the key types of Joins and Lookup commands.
What is MongoDB?
MongoDB is an open-source NoSQL Database used to store high volumes of data and work with it efficiently. It is categorized as NoSQL because the storage and retrieval of data are not in the form of tables and rows. It is in the form of Documents and Collections.
The MongoDB Database was initially released in February 2009 and is managed by MongoDB.inc. It also supports all the popular languages like C, C++, C#, DotNet, Java, Node.js, Ruby etc. There are a lot of companies that use MongoDB like Facebook, Nokia, Google etc. to store their large amounts of data. Although you might think that it is impossible to perform MongoDB Joins because it is a Non Relational Database, there are a few ways to do that. These are discussed in detail later in the blog.
Key Features of MongoDB
- Every database contains Collections which in turn contain Documents. Each Document is different from the others w.r.t content and number of fields.
- The Document structure is based on how the developers construct their classes and objects. The classes are more like key-value pairs instead of rows and columns.
- The Documents do not need to have a fixed schema beforehand, fields can be created instantly.
- The data model available lets you represent hierarchical relationships that facilitate storing data like arrays and other complex structures.
- MongoDB is very scalable. Companies around the world have clusters with millions of Documents within the database.
Key Components of MongoDB Architecture
- _id: This is like documents’ primary key. It is a requirement for every document. It is unique for each document. If you create a table without an _id field then MongoDB will automatically create one.
- Collection: A collection is a group of documents. It is the equivalent of tables in any other Relational Database Management system like Oracle or MySQL. They exist within a single database and do not have a fixed structure as such.
- Cursor: It is a pointer to the result. We can iterate through the cursor to get results.
- Database: It is a container for Collections. Each database gets its own set of files on the system. The MongoDB server can contain multiple databases.
- Document: A record is called a Document. They contain field names and values.
- JSON: JavaScript Object Notation. It is a readable format for expressing structured data.
- Field: It is a name-value pair. Every Document contains zero or more fields. They are similar to columns in RDBMS.
Here’s an example of Key-value Pairs:
{
Customer_id : 11
Customer_name: Sharon
Order_id: 20
}
Hevo is the only real-time ELT No-code Data Pipeline platform that cost-effectively automates data pipelines that are flexible to your needs. With integration with 150+ Data Sources (50+ free sources), we help you not only export data from sources & load data to the destinations but also transform & enrich your data, & make it analysis-ready.
Start for free now!
Sign Up or a 14-day free trial
What is MongoDB Lookup Aggregation?
One of the major differences between SQL and NoSQL Databases is when it comes to the JOIN clause. In Relational Databases, the JOIN clause lets you combine two or more rows from tables using a common field. But in MongoDB, databases are designed to store denormalized data. Fortunately, MongoDB Joins can be performed in MongoDB 3.2 as it introduces a new Lookup operation that can perform Join operations on Collections.
Definition
MongoDB Joins are performed by Lookup. It performs a Left Outer Join to two or more Collections. But Lookup is only permitted in Aggregate operations. This is like a pipeline that performs query, filter and group operations. The output of one is used as input for the next operation.
MongoDB Joins Syntax
The following section talks about the syntax of different conditional clauses for MongoDB Joins:
- MongoDB Joins: Equality Match with a Single Join Condition
- MongoDB Joins: Conditions and SubQueries on Joined Conditions
- MongoDB Joins: Correlated SubQueries using Concise Syntax
MongoDB Joins: Equality Match with a Single Join Condition
This clause performs an equality match between fields from the input documents and the joined collection. Syntax:
{
$lookup:
{
from: <collection to join>,
localField: <field from the input documents>,
foreignField: <field from the documents of the "from" collection>,
as: <output array field>
}
}
This would correspond to the following SQL code:
SELECT *, <output array field>
FROM collection
WHERE <output array field> IN (
SELECT *
FROM <collection to join>
WHERE <foreignField> = <collection.localField>
);
Examples:
MongoDB Joins: Conditions and Subqueries on Joined Collections
MongoDB 3.6 has added support to:
- Execute a pipeline on a joined Collection.
- Perform multiple join collections.
- Correlated and uncorrelated queries.
To perform other join operations other than Equality Join, you can use this Lookup syntax:
{
$lookup:
{
from: <joined collection>,
let: { <var_1>: <expression>, …, <var_n>: <expression> },
pipeline: [ <pipeline to run on joined collection> ],
as: <output array field>
}
}
This is equivalent to to the SQL code below:
SELECT *, <output array field>
FROM collection
WHERE <output array field> IN (
SELECT <documents as determined from the pipeline>
FROM <collection to join>
WHERE <pipeline>
);
Examples:
MongoDB Joins: Correlated SubQueries using Concise Syntax
From MongoDB 5.0 onwards, you can use a concise syntax for correlated subqueries. The following syntax removes the requirement for an equality match on the local and foreign fields.
{
$lookup:
{
from: <foreign collection>,
localField: <field from local collection's documents>,
foreignField: <field from foreign collection's documents>,
let: { <var_1>: <expression>, …, <var_n>: <expression> },
pipeline: [ <pipeline to run> ],
as: <output array field>
}
}
The SQL code below is equivalent to the above syntax:
SELECT *, <output array field>
FROM localCollection
WHERE <output array field> IN (
SELECT <documents as determined from the pipeline>
FROM <foreignCollection>
WHERE <foreignCollection.foreignField> = <localCollection.localField>
AND <pipeline match condition>
);
Example:
MongoDB Join Restrictions
There are a few restrictions or limitations while using MongoDB Joins:
- You cannot use the $out or $merge stage in the $lookup stage before version 4.2 .
- The From Collection cannot be sharded in the $lookup stage but the Collection you run $aggregate on can be sharded.
MongoDB Join Examples
Here are a few examples of the different types of MongoDB Joins we have seen above:
Single Equality Join
There are two Collections “orders” and “inventory”. Below you can see the code snippet:
db.orders.insertMany( [
{ "_id" : 1, "item" : "almonds", "price" : 12, "quantity" : 2 },
{ "_id" : 2, "item" : "pecans", "price" : 20, "quantity" : 1 },
{ "_id" : 3 }
] )
db.inventory.insertMany( [
{ "_id" : 1, "sku" : "almonds", "description": "product 1", "instock" : 120 },
{ "_id" : 2, "sku" : "bread", "description": "product 2", "instock" : 80 },
{ "_id" : 3, "sku" : "cashews", "description": "product 3", "instock" : 60 },
{ "_id" : 4, "sku" : "pecans", "description": "product 4", "instock" : 70 },
{ "_id" : 5, "sku": null, "description": "Incomplete" },
{ "_id" : 6 }
] )
The following Aggregation operations perform Join operation on both these Collections and Join them using fields “item” from “orders” and “sku” from “inventory”. The code snippet is below:
db.orders.aggregate( [
{
$lookup:
{
from: "inventory",
localField: "item",
foreignField: "sku",
as: "inventory_docs"
}
}
] )
The result is as follows:
{
"_id" : 1,
"item" : "almonds",
"price" : 12,
"quantity" : 2,
"inventory_docs" : [
{ "_id" : 1, "sku" : "almonds", "description" : "product 1", "instock" : 120 }
]
}
{
"_id" : 2,
"item" : "pecans",
"price" : 20,
"quantity" : 1,
"inventory_docs" : [
{ "_id" : 4, "sku" : "pecans", "description" : "product 4", "instock" : 70 }
]
}
{
"_id" : 3,
"inventory_docs" : [
{ "_id" : 5, "sku" : null, "description" : "Incomplete" },
{ "_id" : 6 }
]
}
Using Lookup on an Array
You can match the array elements with a foreign field. For example, there are two Collections “classes” and “members” with the following Documents:
db.classes.insertMany( [
{ _id: 1, title: "Reading is ...", enrollmentlist: [ "giraffe2", "pandabear", "artie" ], days: ["M", "W", "F"] },
{ _id: 2, title: "But Writing ...", enrollmentlist: [ "giraffe1", "artie" ], days: ["T", "F"] }
] )
db.members.insertMany( [
{ _id: 1, name: "artie", joined: new Date("2016-05-01"), status: "A" },
{ _id: 2, name: "giraffe", joined: new Date("2017-05-01"), status: "D" },
{ _id: 3, name: "giraffe1", joined: new Date("2017-10-01"), status: "A" },
{ _id: 4, name: "panda", joined: new Date("2018-10-11"), status: "A" },
{ _id: 5, name: "pandabear", joined: new Date("2018-12-01"), status: "A" },
{ _id: 6, name: "giraffe2", joined: new Date("2018-12-01"), status: "D" }
] )
The following operation joins “members” and “classes” on members field to name field:
db.classes.aggregate( [
{
$lookup:
{
from: "members",
localField: "enrollmentlist",
foreignField: "name",
as: "enrollee_info"
}
}
] )
The output will be as follows:
{
"_id" : 1,
"title" : "Reading is ...",
"enrollmentlist" : [ "giraffe2", "pandabear", "artie" ],
"days" : [ "M", "W", "F" ],
"enrollee_info" : [
{ "_id" : 1, "name" : "artie", "joined" : ISODate("2016-05-01T00:00:00Z"), "status" : "A" },
{ "_id" : 5, "name" : "pandabear", "joined" : ISODate("2018-12-01T00:00:00Z"), "status" : "A" },
{ "_id" : 6, "name" : "giraffe2", "joined" : ISODate("2018-12-01T00:00:00Z"), "status" : "D" }
]
}
{
"_id" : 2,
"title" : "But Writing ...",
"enrollmentlist" : [ "giraffe1", "artie" ],
"days" : [ "T", "F" ],
"enrollee_info" : [
{ "_id" : 1, "name" : "artie", "joined" : ISODate("2016-05-01T00:00:00Z"), "status" : "A" },
{ "_id" : 3, "name" : "giraffe1", "joined" : ISODate("2017-10-01T00:00:00Z"), "status" : "A" }
]
}
Multiple Joins and Correlated SubQuery
Create a Collection “order’ and “warehouses” as follows:
db.orders.insertMany( [
{ "_id" : 1, "item" : "almonds", "price" : 12, "ordered" : 2 },
{ "_id" : 2, "item" : "pecans", "price" : 20, "ordered" : 1 },
{ "_id" : 3, "item" : "cookies", "price" : 10, "ordered" : 60 }
] )
db.warehouses.insertMany( [
{ "_id" : 1, "stock_item" : "almonds", warehouse: "A", "instock" : 120 },
{ "_id" : 2, "stock_item" : "pecans", warehouse: "A", "instock" : 80 },
{ "_id" : 3, "stock_item" : "almonds", warehouse: "B", "instock" : 60 },
{ "_id" : 4, "stock_item" : "cookies", warehouse: "B", "instock" : 40 },
{ "_id" : 5, "stock_item" : "cookies", warehouse: "A", "instock" : 80 }
] )
You can use a Correlated SubQuery that performs join on “orders.item” and “warehouse.stock_item” fields as follows:
db.orders.aggregate( [
{
$lookup:
{
from: "warehouses",
let: { order_item: "$item", order_qty: "$ordered" },
pipeline: [
{ $match:
{ $expr:
{ $and:
[
{ $eq: [ "$stock_item", "$$order_item" ] },
{ $gte: [ "$instock", "$$order_qty" ] }
]
}
}
},
{ $project: { stock_item: 0, _id: 0 } }
],
as: "stockdata"
}
}
] )
The output for the aforementioned code snippet would be:
{
_id: 1,
item: 'almonds',
price: 12,
ordered: 2,
stockdata: [
{ warehouse: 'A', instock: 120 },
{ warehouse: 'B', instock: 60 }
]
},
{
_id: 2,
item: 'pecans',
price: 20,
ordered: 1,
stockdata: [ { warehouse: 'A', instock: 80 } ]
},
{
_id: 3,
item: 'cookies',
price: 10,
ordered: 60,
stockdata: [ { warehouse: 'A', instock: 80 } ]
}
Concise Correlated SubQuery
Create Collections “restaurants” and “orders” with the following Documents:
db.restaurants.insertMany( [
{
_id: 1,
name: "American Steak House",
food: [ "filet", "sirloin" ],
beverages: [ "beer", "wine" ]
},
{
_id: 2,
name: "Honest John Pizza",
food: [ "cheese pizza", "pepperoni pizza" ],
beverages: [ "soda" ]
}
] )
db.orders.insertMany( [
{
_id: 1,
item: "filet",
restaurant_name: "American Steak House"
},
{
_id: 2,
item: "cheese pizza",
restaurant_name: "Honest John Pizza",
drink: "lemonade"
},
{
_id: 3,
item: "cheese pizza",
restaurant_name: "Honest John Pizza",
drink: "soda"
}
] )
Join the above two Collections on orders.restaurant_name and restaurants.name and perform an $in array match between orders.drink and beverages as follows:
db.orders.aggregate( [
{
$lookup: {
from: "restaurants",
localField: "restaurant_name",
foreignField: "name",
let: { orders_drink: "$drink" },
pipeline: [ {
$match: {
$expr: { $in: [ "$$orders_drink", "$beverages" ] }
}
} ],
as: "matches"
}
}
] )
The output would be:
{
"_id" : 1, "item" : "filet",
"restaurant_name" : "American Steak House",
"matches" : [ ]
}
{
"_id" : 2, "item" : "cheese pizza",
"restaurant_name" : "Honest John Pizza",
"drink" : "lemonade",
"matches" : [ ]
}
{
"_id" : 3, "item" : "cheese pizza",
"restaurant_name" : "Honest John Pizza",
"drink" : "soda",
"matches" : [ {
"_id" : 2, "name" : "Honest John Pizza",
"food" : [ "cheese pizza", "pepperoni pizza" ],
"beverages" : [ "soda" ]
} ]
}
Additional Resources on Mongodb Joins
Conclusion
- This blog discusses the features of MongoDB Joins and the types that we can perform on the data.
- MongoDB $lookup is very useful and powerful but it requires complex aggregate queries.
Extracting complex data from a diverse set of data sources to carry out an insightful analysis can be challenging, and this is where Hevo saves the day!
FAQ
Can MongoDB do joins?
Yes, MongoDB supports joins using the $lookup
aggregation stage, which allows combining documents from different collections.
Why are there no joins in NoSQL?
NoSQL databases avoid joins to maintain high performance and scalability by favoring denormalized data models, reducing the need for complex operations across multiple documents.
Are joins expensive in MongoDB?
Yes, joins in MongoDB can be expensive in terms of performance, especially with large datasets, as they require aggregating data across collections, which may lead to slower query execution times.
Sharon is a data science enthusiast with a hands-on approach to data integration and infrastructure. She leverages her technical background in computer science and her experience as a Marketing Content Analyst at Hevo Data to create informative content that bridges the gap between technical concepts and practical applications. Sharon's passion lies in using data to solve real-world problems and empower others with data literacy.