When working on a project, the need might have arrived to merge two documents or some specific fields in the documents. If that is the case, you are at the right place. MongoDB provides its users with the ability to aggregate the data they want, as long as it is present in the same database.

We use the MongoDB Lookup operator to achieve the same.

The MongoDB Lookup operator, by definition, “Performs a left outer join to an unshared collection in the same database to filter in documents from the joined collection for processing.” Simply put, using Lookup in MongoDB makes it possible to merge data from the document you are running a query on and the document you want the data from.

But, before moving forward straight to the how’s of The lookup MongoDB aggregation operator and its usage, we will discuss, in brief, Aggregation in MongoDB. And the working principles of the MongoDB Lookup Operator. We will also discuss, in detail, the syntaxes for each scenario and case, and give some real-world examples, too. Let’s begin.

What is Aggregation in MongoDB?

The process of collecting data to return computed results is known as Aggregation in MongoDB. The process first requires collecting the required data points then grouping them together in accordance with the relevancies to perform operations such as sum, average, minimum, maximum, etc. So, in general, aggregation in MongoDB is similar to that in SQL. 

Three highly seen use cases for aggregation operations are as follows:

  1. To group values from multiple documents
  2. Perform operations return a single result on the grouped data
  3. Analyze trends by examining data values and changes in it over time
Simplify MongoDB ETL Using Hevo’s No-code Data Pipeline!

Hevo Data is a No-code Data Pipeline that offers a fully managed solution to set up Data Integration for 150+ Data Sources (Including 60+ Free sources) and will let you directly load data from sources like MongoDB to a Data Warehouse or the Destination of your choice. Check out what makes Hevo amazing:

  • Easy Integration: Connect and migrate data without any coding.
  • Auto-Schema Mapping: Automatically map schemas to ensure smooth data transfer.
  • In-Built Transformations: Transform your data on the fly with Hevo’s powerful transformation capabilities.
  • Load Events in Batches: Events can be loaded in batches in certain data warehouses.
Get Started with Hevo for Free

Ways to Perform Aggregation in MongoDB

There are three ways to perform aggregation in MongoDB:

  1. Aggregation pipeline: This method is the recommended way to perform aggregation tasks. Learn more about the Aggregation pipeline from the official documentation.
  2. Map-reduce function: The Map-reduce function is used to aggregate results on a large scale. It has two main functions, namely, map and reduce. The map function groups all the documents, and the reduce function perform operations on the grouped data.
  3. Single-purpose aggregation: The single-purpose aggregation method is a simple way of performing aggregation tasks, but it lacks some features when compared to the Aggregation pipeline method. It’s mainly used to perform tasks within a document like counting numbers of distinct values in a document or number of documents etc. 

What is The MongoDB Lookup operator?

The MongoDB lookup operator, by definition, leverages the left outer join method to merge information from one document to another. Using the $lookup operator, you could add an extra field to the existing document to demystify information.

For example, you could map a field that contains the last three digits of your employee ID to the name of your department by adding a DepartmentName field to the output produced. 

But, how does the MongoDB Lookup operator works?

In the $lookup operations, we have two collection inputs required: ‘Input collection‘ and ‘from collection.’

‘Input collection’ collets the database information on which the $lookup operation is performed. On the other hand, ‘from collection’ collects the information from the database we want to merge the primary document with.

Syntaxes Used For The MongoDB lookup Operations

The syntax for The MongoDB Lookup operations varies for three possible scenarios:

  1. Equality Match with a Single Join Condition
  2. Join Condition and Subqueries on a Joined Collection
  3. Correlated Subqueries Using Concise Syntax

1. Equality Match with a Single Join Condition

In the Equality Match to perform merge operations between a field from the input database and the joined database, the syntax looks like this:

{
   $lookup:
     {
       from: <collection to join>,
       localField: <field from the input documents>,
       foreignField: <field from the documents of the "from" collection>,
       as: <output array field>
     }
}

MongoDB documentation explains the constraints used for this MongoDB Lookup with conditions as follows:

FieldDescription
fromSpecifies the collection in the same database to perform the join with. The from collection cannot be sharded. For details, see Sharded Collection Restrictions.
localFieldSpecifies the field from the documents input to the $lookup stage. $lookup performs an equality match on the localField to the foreignField from the documents of the from collection. If an input document does not contain the localField, the $lookup treats the field as having a value of null for matching purposes.
foreignFieldSpecifies the field from the documents in the from collection. $lookup performs an equality match on the foreignField to the localField from the input documents. If a document in the from collection does not contain the foreignField, the $lookup treats the value as null for matching purposes.
asSpecifies the name of the new array field to add to the input documents. The new array field contains the matching documents from the from collection. If the specified name already exists in the input document, the existing field is overwritten.

The pseudo-SQL statement for the above-mentioned operation would look like this:

SELECT *, <output array field>
FROM collection
WHERE <output array field> IN (
   SELECT *
   FROM <collection to join>
   WHERE <foreignField> = <collection.localField>
);

2. Join Condition and Subqueries on a Joined Collection

MongoDB’s subqueries related to correlation, in general, are relatable to the SQL’s. Hence, in a correlated subquery, the inner query references the outer query values. At the same time, the uncorrelated subquery does the opposite. 

In this scenario, to perform correlated and uncorrelated subqueries, the syntax looks like this:

{
   $lookup:
      {
         from: <joined collection>,
         let: { <var_1>: <expression>, …, <var_n>: <expression> },
         pipeline: [ <pipeline to run on joined collection> ],
         as: <output array field>
      }
}

MongoDB documentation explains the used constraints as follows:

FieldDescription
fromSpecifies the collection in the same database to perform the join operation. The joined collection cannot be sharded (see Sharded Collection Restrictions).
letOptional. Specifies variables to use in the pipeline stages. Use the variable expressions to access the fields from the joined collection’s documents that are input to the pipeline.
pipelineSpecifies the pipeline to run on the joined collection. The pipeline determines the resulting documents from the joined collection. To return all documents, specify an empty pipeline [].The pipeline cannot include the $out stage or the $merge stage. The pipeline cannot directly access the joined document fields. Instead, define variables for the joined document fields using the let option and then reference the variables in the pipeline stages.
asSpecifies the name of the new array field to add to the joined documents. The new array field contains the matching documents from the joined collection. If the specified name already exists in the joined document, the existing field is overwritten.

The pseudo-SQL statement for the above-mentioned operation would look like this:

SELECT *, <output array field>
FROM collection
WHERE <output array field> IN (
   SELECT <documents as determined from the pipeline>
   FROM <collection to join>
   WHERE <pipeline>
);

3. Correlated Subqueries Using Concise Syntax

The feature came into existence in MongoDB version 5.0. The feature uses the concise syntax form for a correlated subquery. The new format removes the constraints for running equality match first on the two databases inside an $exp operator.

In this scenario, to perform correlated subqueries in a concise format, the syntax looks like this:

{
   $lookup:
      {
         from: <foreign collection>,
         localField: <field from local collection's documents>,
         foreignField: <field from foreign collection's documents>,
         let: { <var_1>: <expression>, …, <var_n>: <expression> },
         pipeline: [ <pipeline to run> ],
         as: <output array field>
      }
}

MongoDB documentation explains the used constraints as follows:

FieldDescription
fromSpecifies the foreign collection in the same database to join to the local collection. The foreign collection cannot be sharded (see Sharded Collection Restrictions).
localFieldSpecifies the local documents’ localField to perform an equality match with the foreign documents’ foreignField.If a local document does not contain a localField value, the $lookup uses a null value for the match.
foreignFieldSpecifies the foreign documents’ foreignField to perform an equality match with the local documents’ localField.If a foreign document does not contain a foreignField value, the $lookup uses a null value for the match.
letOptional. Specifies the variables to use in the pipeline stages. Use the variable expressions to access the document fields that are input to the pipeline.
pipelineSpecifies the pipeline to run on the foreign collection. The pipeline returns documents from the foreign collection. To return all documents, specify an empty pipeline [].The pipeline cannot include the $out or $merge stages.
asSpecifies the name of the new array field to add to the foreign documents. The new array field contains the matching documents from the foreign collection. If the specified name already exists in the foreign document, the existing field is overwritten.

The pseudo-SQL statement for the above-mentioned operation would look like this:

SELECT *, <output array field>
FROM localCollection
WHERE <output array field> IN (
   SELECT <documents as determined from the pipeline>
   FROM <foreignCollection>
   WHERE <foreignCollection.foreignField> = <localCollection.localField>
   AND <pipeline match condition>
);

Considerations For The MongoDB Lookup Operations

Views and Collation

The views should have a similar collection when performing aggregation tasks with multiple views.

Restrictions

While specifying a pipeline for the joined collection, it’s prohibited to include $out or the $merge stage in $lookup. (changed in version 2.3)

{
  $lookup:
    {
       from: <collection to join>,
       let: { <var_1>: <expression>, …, <var_n>: <expression> },
       pipeline: [ <pipeline to execute on the joined collection> ],  // Cannot include $out or $merge
       as: <output array field>
    }
}

Shared Collections

In MongoDB’s version 5.1, it’s now possible to specify the shared collections in the ‘from’ parameter of the $lookup stages.

db.collection.aggregate([
{ $lookup: { from: "fromCollection", … } }
])

MongoDB Lookup Operator Examples

In this section, we will look at some MongoDB Lookup examples to better understand the MongoDB Lookup Operator in different scenarios. Let’s begin.

1) Perform a Single Equality Join with the MongoDB lookup 

First, create a collection named ‘Orders’ with the following data sets. You can copy and paste the code provided below in your system.

db.orders.insertMany( [
   { "_id" : 1, "item" : "almonds", "price" : 12, "quantity" : 2 },
   { "_id" : 2, "item" : "pecans", "price" : 20, "quantity" : 1 },
   { "_id" : 3  }
] )

Second, create another collection named ‘inventory’ with the following data sets.

db.inventory.insertMany( [
   { "_id" : 1, "sku" : "almonds", "description": "product 1", "instock" : 120 },
   { "_id" : 2, "sku" : "bread", "description": "product 2", "instock" : 80 },
   { "_id" : 3, "sku" : "cashews", "description": "product 3", "instock" : 60 },
   { "_id" : 4, "sku" : "pecans", "description": "product 4", "instock" : 70 },
   { "_id" : 5, "sku": null, "description": "Incomplete" },
   { "_id" : 6 }
] )

The code given below will now merge the information on the document ‘Order’ from the document ‘inventory’ using the field ‘item,’ which will collect the information from the document ‘order’ and ‘SKU’ field from the ‘inventory’ document. The code looks like this:

db.orders.aggregate( [
   {
     $lookup:
       {
         from: "inventory",
         localField: "item",
         foreignField: "sku",
         as: "inventory_docs"
       }
  }
] )

The output looks like this:

{
   "_id" : 1,
   "item" : "almonds",
   "price" : 12,
   "quantity" : 2,
   "inventory_docs" : [
      { "_id" : 1, "sku" : "almonds", "description" : "product 1", "instock" : 120 }
   ]
}
{
   "_id" : 2,
   "item" : "pecans",
   "price" : 20,
   "quantity" : 1,
   "inventory_docs" : [
      { "_id" : 4, "sku" : "pecans", "description" : "product 4", "instock" : 70 }
   ]
}
{
   "_id" : 3,
   "inventory_docs" : [
      { "_id" : 5, "sku" : null, "description" : "Incomplete" },
      { "_id" : 6 }
   ]
}

Below given is the pseudo-SQL statement to which the operation corresponds.

SELECT *, inventory_docs
FROM orders
WHERE inventory_docs IN (
   SELECT *
   FROM inventory
   WHERE sku = orders.item
);

2) Using The MongoDB Lookup with an Array

It is possible to match elements in array format within the scalar format without an $unwind stage. Follow the example below. We will be creating an example collection ‘classes’ with the data sets provided below:

db.classes.insertMany( [
   { _id: 1, title: "Reading is ...", enrollmentlist: [ "giraffe2", "pandabear", "artie" ], days: ["M", "W", "F"] },
   { _id: 2, title: "But Writing ...", enrollmentlist: [ "giraffe1", "artie" ], days: ["T", "F"] }
] )

Now create another document ‘members’ with the data sets provided below:

db.members.insertMany( [
   { _id: 1, name: "artie", joined: new Date("2016-05-01"), status: "A" },
   { _id: 2, name: "giraffe", joined: new Date("2017-05-01"), status: "D" },
   { _id: 3, name: "giraffe1", joined: new Date("2017-10-01"), status: "A" },
   { _id: 4, name: "panda", joined: new Date("2018-10-11"), status: "A" },
   { _id: 5, name: "pandabear", joined: new Date("2018-12-01"), status: "A" },
   { _id: 6, name: "giraffe2", joined: new Date("2018-12-01"), status: "D" }
] )

The below-given aggregation code will merge the document ‘classes’ with the document ‘members.’ This will match the ‘members’ field to the ‘name’ field.

db.classes.aggregate( [
   {
      $lookup:
         {
            from: "members",
            localField: "enrollmentlist",
            foreignField: "name",
            as: "enrollee_info"
        }
   }
] )

The operation will return the following:

{
   "_id" : 1,
   "title" : "Reading is ...",
   "enrollmentlist" : [ "giraffe2", "pandabear", "artie" ],
   "days" : [ "M", "W", "F" ],
   "enrollee_info" : [
      { "_id" : 1, "name" : "artie", "joined" : ISODate("2016-05-01T00:00:00Z"), "status" : "A" },
      { "_id" : 5, "name" : "pandabear", "joined" : ISODate("2018-12-01T00:00:00Z"), "status" : "A" },
      { "_id" : 6, "name" : "giraffe2", "joined" : ISODate("2018-12-01T00:00:00Z"), "status" : "D" }
   ]
}
{
   "_id" : 2,
   "title" : "But Writing ...",
   "enrollmentlist" : [ "giraffe1", "artie" ],
   "days" : [ "T", "F" ],
   "enrollee_info" : [
      { "_id" : 1, "name" : "artie", "joined" : ISODate("2016-05-01T00:00:00Z"), "status" : "A" },
      { "_id" : 3, "name" : "giraffe1", "joined" : ISODate("2017-10-01T00:00:00Z"), "status" : "A" }
   ]
}

3) Using The MongoDB Lookup with $mergerObjects

In general, the $mergerObecjts operator merges multiple documents into one. We will proceed by creating ‘orders’ document first, as given below:

db.orders.insertMany( [
   { "_id" : 1, "item" : "almonds", "price" : 12, "quantity" : 2 },
   { "_id" : 2, "item" : "pecans", "price" : 20, "quantity" : 1 }
] )

Now, create a document named ‘items.’

db.items.insertMany( [
  { "_id" : 1, "item" : "almonds", description: "almond clusters", "instock" : 120 },
  { "_id" : 2, "item" : "bread", description: "raisin and nut bread", "instock" : 80 },
  { "_id" : 3, "item" : "pecans", description: "candied pecans", "instock" : 60 }
] )

The operator will first use the MongoDB Lookup operator to merge the two documents and then use the $mergeObjects operator in the $replaceRoot to merge the information.

db.orders.aggregate( [
   {
      $lookup: {
         from: "items",
         localField: "item",    // field in the orders collection
         foreignField: "item",  // field in the items collection
         as: "fromItems"
      }
   },
   {
      $replaceRoot: { newRoot: { $mergeObjects: [ { $arrayElemAt: [ "$fromItems", 0 ] }, "$$ROOT" ] } }
   },
   { $project: { fromItems: 0 } }
] )

The operator returns the following:

{
  _id: 1,
  item: 'almonds',
  description: 'almond clusters',
  instock: 120,
  price: 12,
  quantity: 2
},
{
  _id: 2,
  item: 'pecans',
  description: 'candied pecans',
  instock: 60,
  price: 20,
  quantity: 1
}
Integrate MongoDB to BigQuery
Integrate MongoDB to PostgreSQL
Integrate MongoDB Atlas to Snowflake

Conclusion

In this tutorial article, we learned our way through the concept of Aggregation in MongoDB, as well as, the concept of MongoDB Lookup Operator. We also discussed the different syntaxes used for different scenarios and examples to better under the subject.

Nevertheless, MongoDB is a cloud-native Document Database-as-a-Service, and it requires technical acumen to understand some code-heavy concepts; hence the need arises for a Data warehouse-as-a-Service platform like Hevo Data.

Sign up for a 14-day free trial and simplify your data integration process. Check out the pricing details to understand which plan fulfills all your business needs.

Frequently Asked Questions

1. What are lookups in MongoDB?

Lookups in MongoDB are used to perform joins between collections. The $lookup operator allows you to combine documents from two collections based on a shared field, similar to SQL joins. This helps in aggregating data from different sources in a single query.

2. What is the difference between lookup and populate in MongoDB?

$lookup: A MongoDB operator used in aggregation to join data from different collections.
– populate: A method used in Mongoose (MongoDB’s object data modeling library for Node.js) to replace object references with actual data from related collections. Populate is a higher-level abstraction built on top of $lookup.

3. How do I check if a value exists in MongoDB?

To check if a value exists in a MongoDB collection, you can use the findOne or find method with a query that matches the value. For example, using db.collection.findOne({ fieldName: value }) will return the document if the value exists, or null if it doesn’t.

Yash Arora
Content Manager, Hevo Data

Yash is a Content Marketing professional with over three years of experience in data-driven marketing campaigns. He has expertise in strategic thinking, integrated marketing, and customer acquisition. Through comprehensive marketing communications and innovative digital strategies, he has driven growth for startups and established brands.