MongoDB Lookup Aggregations: Syntax, Usage & Practical Examples 101

|

MongoDB Lookup Cover

When working on a project, the need might have arrived to merge two documents or some specific fields in the documents. If that is the case, you are at the right place. MongoDB provides its users with the ability to aggregate the data they want, as long as it is present in the same database.

We use the MongoDB Lookup operator to achieve the same.

The MongoDB Lookup operator, by definition, “Performs a left outer join to an unshared collection in the same database to filter in documents from the joined collection for processing.” Simply put, using Lookup in MongoDB makes it possible to merge data from the document you are running a query on and the document you want the data from.

But, before moving forward straight to the hows of The lookup MongoDB aggregation operator and its usage, we will discuss, in brief, Aggregation in MongoDB. And the working principles of the MongoDB Lookup Operator. We will also discuss, in detail, the syntaxes for each scenario and case, and give some real-world examples, too. Let’s begin.

Table of Contents

  1. What is Aggregation in MongoDB?
  2. What is The MongoDB Lookup operator?
  3. Syntaxes Used For The MongoDB lookup Operations
  4. Considerations For The MongoDB Lookup Operations
  5. Few Examples to Better Understand The MongoDB lookup Operator

What is Aggregation in MongoDB?

The process of collecting data to return computed results is known as Aggregation in MongoDB. The process first requires collecting the required data points then grouping them together in accordance with the relevancies to perform operations such as sum, average, minimum, maximum, etc. So, in general, aggregation in MongoDB is similar to that in SQL. 

Three highly seen use cases for aggregation operations are as follows:

  1. To group values from multiple documents
  2. Perform operations return a single result on the grouped data
  3. Analyze trends by examining data values and changes in it over time

There are three ways to perform aggregation in MongoDB:

  1. Aggregation pipeline: This method is the recommended way to perform aggregation tasks. Learn more about the Aggregation pipeline.
  2. Map-reduce function: The Map-reduce function is used to aggregate results on a large scale. It has two main functions, namely, map and reduce. The map function groups all the documents, and the reduce function perform operations on the grouped data.
  3. Single-purpose aggregation: The single-purpose aggregation method is a simple way of performing aggregation tasks, but it lacks some features when compared to the Aggregation pipeline method. It’s mainly used to perform tasks within a document like counting numbers of distinct values in a document or number of documents etc. 

What is The MongoDB Lookup operator?

The MongoDB lookup operator, by definition, leverages the left outer join method to merge information from one document to another. Using the $lookup operator, you could add an extra field to the existing document to demystify information.

For example, you could map a field that contains the last three digits of your employee ID to the name of your department by adding a DepartmentName field to the output produced. 

But, how does the MongoDB Lookup operator works?

In the $lookup operations, we have two collection inputs required: ‘Input collection‘ and ‘from collection.’

‘Input collection’ collets the database information on which the $lookup operation is performed. On the other hand, ‘from collection’ collects the information from the database we want to merge the primary document with.

Simplify MongoDB ETL Using Hevo’s No-code Data Pipeline!

Hevo Data is a No-code Data Pipeline that offers a fully managed solution to set up Data Integration for 150+ Data Sources (Including 40+ Free sources) and will let you directly load data from sources like MongoDB to a Data Warehouse or the Destination of your choice.

Hevo will automate your data flow in minutes without writing any line of code. Its fault-tolerant architecture makes sure that your data is secure and consistent. Hevo provides you with a truly efficient and fully automated solution to manage data in real-time and always have analysis-ready data. 

Get Started with Hevo for Free

Let’s look at some of the salient features of Hevo:

  • Fully Managed: It requires no management and maintenance as Hevo is a fully automated platform.
  • Data Transformation: It provides a simple interface to perfect, modify, and enrich the data you want to transfer. 
  • Real-Time: Hevo offers real-time data migration. So, your data is always ready for analysis.
  • Schema Management: Hevo can automatically detect the schema of the incoming data and map it to the destination schema.
  • Connectors: Hevo supports 100+ Integrations to SaaS platforms such as WordPress, FTP/SFTP, Files, Databases, BI tools, and Native REST API & Webhooks Connectors. It supports various destinations including Google BigQuery, Amazon Redshift, Snowflake, Firebolt, Data Warehouses; Amazon S3 Data Lakes; Databricks, MySQL, SQL Server, TokuDB, DynamoDB, MongoDB PostgreSQL Databases to name a few.  
  • Secure: Hevo has a fault-tolerant architecture that ensures that the data is handled in a secure, consistent manner with zero data loss.
  • Hevo Is Built To Scale: As the number of sources and the volume of your data grows, Hevo scales horizontally, handling millions of records per minute with very little latency.
  • Live Monitoring: Advanced monitoring gives you a one-stop view to watch all the activities that occur within Data Pipelines.
  • Live Support: Hevo team is available round the clock to extend exceptional support to its customers through chat, email, and support calls.
Sign up here for a 14-Day Free Trial!

Syntaxes Used For The MongoDB lookup Operations

The syntax for The MongoDB Lookup operations varies for three possible scenarios:

  1. Equality Match with a Single Join Condition
  2. Join Condition and Subqueries on a Joined Collection
  3. Correlated Subqueries Using Concise Syntax

Equality Match with a Single Join Condition

In the Equality Match to perform merge operations between a field from the input database and the joined database, the syntax looks like this:

{
   $lookup:
     {
       from: <collection to join>,
       localField: <field from the input documents>,
       foreignField: <field from the documents of the "from" collection>,
       as: <output array field>
     }
}

MongoDB documentation explains the constraints used for this MongoDB Lookup with conditions as follows: (The language has been edited for consistency, click here to check out the original version)

FieldDescription
fromSpecifies the collection in the same database to perform the join with. The from collection cannot be sharded. For details, see Sharded Collection Restrictions.
localFieldSpecifies the field from the documents input to the $lookup stage. $lookup performs an equality match on the localField to the foreignField from the documents of the from collection. If an input document does not contain the localField, the $lookup treats the field as having a value of null for matching purposes.
foreignFieldSpecifies the field from the documents in the from collection. $lookup performs an equality match on the foreignField to the localField from the input documents. If a document in the from collection does not contain the foreignField, the $lookup treats the value as null for matching purposes.
asSpecifies the name of the new array field to add to the input documents. The new array field contains the matching documents from the from collection. If the specified name already exists in the input document, the existing field is overwritten.

The pseudo-SQL statement for the above-mentioned operation would look like this:

SELECT *, <output array field>
FROM collection
WHERE <output array field> IN (
   SELECT *
   FROM <collection to join>
   WHERE <foreignField> = <collection.localField>
);

Join Condition and Subqueries on a Joined Collection

MongoDB’s subqueries related to correlation, in general, are relatable to the SQL’s. Hence, in a correlated subquery, the inner query references the outer query values. At the same time, the uncorrelated subquery does the opposite. 

In this scenario, to perform correlated and uncorrelated subqueries, the syntax looks like this:

{
   $lookup:
      {
         from: <joined collection>,
         let: { <var_1>: <expression>, …, <var_n>: <expression> },
         pipeline: [ <pipeline to run on joined collection> ],
         as: <output array field>
      }
}

MongoDB documentation explains the used constraints as follows: (The language has been edited for consistency, click here to check out the original version)

FieldDescription
fromSpecifies the collection in the same database to perform the join operation. The joined collection cannot be sharded (see Sharded Collection Restrictions).
letOptional. Specifies variables to use in the pipeline stages. Use the variable expressions to access the fields from the joined collection’s documents that are input to the pipeline.
pipelineSpecifies the pipeline to run on the joined collection. The pipeline determines the resulting documents from the joined collection. To return all documents, specify an empty pipeline [].The pipeline cannot include the $out stage or the $merge stage. The pipeline cannot directly access the joined document fields. Instead, define variables for the joined document fields using the let option and then reference the variables in the pipeline stages.
asSpecifies the name of the new array field to add to the joined documents. The new array field contains the matching documents from the joined collection. If the specified name already exists in the joined document, the existing field is overwritten.

The pseudo-SQL statement for the above-mentioned operation would look like this:

SELECT *, <output array field>
FROM collection
WHERE <output array field> IN (
   SELECT <documents as determined from the pipeline>
   FROM <collection to join>
   WHERE <pipeline>
);

Correlated Subqueries Using Concise Syntax

The feature came into existence in MongoDB version 5.0. The feature uses the concise syntax form for a correlated subquery. The new format removes the constraints for running equality match first on the two databases inside an $exp operator.

In this scenario, to perform correlated subqueries in a concise format, the syntax looks like this:

{
   $lookup:
      {
         from: <foreign collection>,
         localField: <field from local collection's documents>,
         foreignField: <field from foreign collection's documents>,
         let: { <var_1>: <expression>, …, <var_n>: <expression> },
         pipeline: [ <pipeline to run> ],
         as: <output array field>
      }
}

MongoDB documentation explains the used constraints as follows: (The language has been edited for consistency, click here to check out the original version)

FieldDescription
fromSpecifies the foreign collection in the same database to join to the local collection. The foreign collection cannot be sharded (see Sharded Collection Restrictions).
localFieldSpecifies the local documents’ localField to perform an equality match with the foreign documents’ foreignField.If a local document does not contain a localField value, the $lookup uses a null value for the match.
foreignFieldSpecifies the foreign documents’ foreignField to perform an equality match with the local documents’ localField.If a foreign document does not contain a foreignField value, the $lookup uses a null value for the match.
letOptional. Specifies the variables to use in the pipeline stages. Use the variable expressions to access the document fields that are input to the pipeline.
pipelineSpecifies the pipeline to run on the foreign collection. The pipeline returns documents from the foreign collection. To return all documents, specify an empty pipeline [].The pipeline cannot include the $out or $merge stages.
asSpecifies the name of the new array field to add to the foreign documents. The new array field contains the matching documents from the foreign collection. If the specified name already exists in the foreign document, the existing field is overwritten.

The pseudo-SQL statement for the above-mentioned operation would look like this:

SELECT *, <output array field>
FROM localCollection
WHERE <output array field> IN (
   SELECT <documents as determined from the pipeline>
   FROM <foreignCollection>
   WHERE <foreignCollection.foreignField> = <localCollection.localField>
   AND <pipeline match condition>
);

Considerations For The MongoDB Lookup Operations

Views and Collation

The views should have a similar collection when performing aggregation tasks with multiple views.

Restrictions

While specifying a pipeline for the joined collection, it’s prohibited to include $out or the $merge stage in $lookup. (changed in version 2.3)

{
  $lookup:
    {
       from: <collection to join>,
       let: { <var_1>: <expression>, …, <var_n>: <expression> },
       pipeline: [ <pipeline to execute on the joined collection> ],  // Cannot include $out or $merge
       as: <output array field>
    }
}

Shared Collections

In MongoDB’s version 5.1, it’s now possible to specify the shared collections in the ‘from’ parameter of the $lookup stages.

db.collection.aggregate([
{ $lookup: { from: "fromCollection", … } }
])

Few Examples to Better Understand The MongoDB Lookup Operator

In this section, we will look at some MongoDB Lookup examples to better understand the MongoDB Lookup Operator in different scenarios. Let’s begin.

Perform a Single Equality Join with the MongoDB lookup 

First, create a collection named ‘Orders’ with the following data sets. You can copy and paste the code provided below in your system.

db.orders.insertMany( [
   { "_id" : 1, "item" : "almonds", "price" : 12, "quantity" : 2 },
   { "_id" : 2, "item" : "pecans", "price" : 20, "quantity" : 1 },
   { "_id" : 3  }
] )

Second, create another collection named ‘inventory’ with the following data sets.

db.inventory.insertMany( [
   { "_id" : 1, "sku" : "almonds", "description": "product 1", "instock" : 120 },
   { "_id" : 2, "sku" : "bread", "description": "product 2", "instock" : 80 },
   { "_id" : 3, "sku" : "cashews", "description": "product 3", "instock" : 60 },
   { "_id" : 4, "sku" : "pecans", "description": "product 4", "instock" : 70 },
   { "_id" : 5, "sku": null, "description": "Incomplete" },
   { "_id" : 6 }
] )

The code given below will now merge the information on the document ‘Order’ from the document ‘inventory’ using the field ‘item,’ which will collect the information from the document ‘order’ and ‘SKU’ field from the ‘inventory’ document. The code looks like this:

db.orders.aggregate( [
   {
     $lookup:
       {
         from: "inventory",
         localField: "item",
         foreignField: "sku",
         as: "inventory_docs"
       }
  }
] )

The output looks like this:

{
   "_id" : 1,
   "item" : "almonds",
   "price" : 12,
   "quantity" : 2,
   "inventory_docs" : [
      { "_id" : 1, "sku" : "almonds", "description" : "product 1", "instock" : 120 }
   ]
}
{
   "_id" : 2,
   "item" : "pecans",
   "price" : 20,
   "quantity" : 1,
   "inventory_docs" : [
      { "_id" : 4, "sku" : "pecans", "description" : "product 4", "instock" : 70 }
   ]
}
{
   "_id" : 3,
   "inventory_docs" : [
      { "_id" : 5, "sku" : null, "description" : "Incomplete" },
      { "_id" : 6 }
   ]
}

Below given is the pseudo-SQL statement to which the operation corresponds.

SELECT *, inventory_docs
FROM orders
WHERE inventory_docs IN (
   SELECT *
   FROM inventory
   WHERE sku = orders.item
);

Using The MongoDB Lookup with an Array

It is possible to match elements in array format within the scalar format without an $unwind stage. Follow the example below. We will be creating an example collection ‘classes’ with the data sets provided below:

db.classes.insertMany( [
   { _id: 1, title: "Reading is ...", enrollmentlist: [ "giraffe2", "pandabear", "artie" ], days: ["M", "W", "F"] },
   { _id: 2, title: "But Writing ...", enrollmentlist: [ "giraffe1", "artie" ], days: ["T", "F"] }
] )

Now create another document ‘members’ with the data sets provided below:

db.members.insertMany( [
   { _id: 1, name: "artie", joined: new Date("2016-05-01"), status: "A" },
   { _id: 2, name: "giraffe", joined: new Date("2017-05-01"), status: "D" },
   { _id: 3, name: "giraffe1", joined: new Date("2017-10-01"), status: "A" },
   { _id: 4, name: "panda", joined: new Date("2018-10-11"), status: "A" },
   { _id: 5, name: "pandabear", joined: new Date("2018-12-01"), status: "A" },
   { _id: 6, name: "giraffe2", joined: new Date("2018-12-01"), status: "D" }
] )

The below-given aggregation code will merge the document ‘classes’ with the document ‘members.’ This will match the ‘members’ field to the ‘name’ field.

db.classes.aggregate( [
   {
      $lookup:
         {
            from: "members",
            localField: "enrollmentlist",
            foreignField: "name",
            as: "enrollee_info"
        }
   }
] )

The operation will return the following:

{
   "_id" : 1,
   "title" : "Reading is ...",
   "enrollmentlist" : [ "giraffe2", "pandabear", "artie" ],
   "days" : [ "M", "W", "F" ],
   "enrollee_info" : [
      { "_id" : 1, "name" : "artie", "joined" : ISODate("2016-05-01T00:00:00Z"), "status" : "A" },
      { "_id" : 5, "name" : "pandabear", "joined" : ISODate("2018-12-01T00:00:00Z"), "status" : "A" },
      { "_id" : 6, "name" : "giraffe2", "joined" : ISODate("2018-12-01T00:00:00Z"), "status" : "D" }
   ]
}
{
   "_id" : 2,
   "title" : "But Writing ...",
   "enrollmentlist" : [ "giraffe1", "artie" ],
   "days" : [ "T", "F" ],
   "enrollee_info" : [
      { "_id" : 1, "name" : "artie", "joined" : ISODate("2016-05-01T00:00:00Z"), "status" : "A" },
      { "_id" : 3, "name" : "giraffe1", "joined" : ISODate("2017-10-01T00:00:00Z"), "status" : "A" }
   ]
}

Using The MongoDB Lookup with $mergerObjects

In general, the $mergerObecjts operator merges multiple documents into one. We will proceed by creating ‘orders’ document first, as given below:

db.orders.insertMany( [
   { "_id" : 1, "item" : "almonds", "price" : 12, "quantity" : 2 },
   { "_id" : 2, "item" : "pecans", "price" : 20, "quantity" : 1 }
] )

Now, create a document named ‘items.’

db.items.insertMany( [
  { "_id" : 1, "item" : "almonds", description: "almond clusters", "instock" : 120 },
  { "_id" : 2, "item" : "bread", description: "raisin and nut bread", "instock" : 80 },
  { "_id" : 3, "item" : "pecans", description: "candied pecans", "instock" : 60 }
] )

The operator will first use the MongoDB Lookup operator to merge the two documents and then use the $mergeObjects operator in the $replaceRoot to merge the information.

db.orders.aggregate( [
   {
      $lookup: {
         from: "items",
         localField: "item",    // field in the orders collection
         foreignField: "item",  // field in the items collection
         as: "fromItems"
      }
   },
   {
      $replaceRoot: { newRoot: { $mergeObjects: [ { $arrayElemAt: [ "$fromItems", 0 ] }, "$$ROOT" ] } }
   },
   { $project: { fromItems: 0 } }
] )

The operator returns the following:

{
  _id: 1,
  item: 'almonds',
  description: 'almond clusters',
  instock: 120,
  price: 12,
  quantity: 2
},
{
  _id: 2,
  item: 'pecans',
  description: 'candied pecans',
  instock: 60,
  price: 20,
  quantity: 1
}

Conclusion

In this tutorial article, we learned our way through the concept of Aggregation in MongoDB, as well as, the concept of MongoDB Lookup Operator. We also discussed the different syntaxes used for different scenarios and examples to better under the subject.

Nevertheless, MongoDB is a cloud-native Document Database-as-a-Service, and it requires technical acumen to understand some code-heavy concepts; hence the need arises for a Data warehouse-as-a-Service platform like Hevo Data.

Visit our Website to Explore Hevo

Hevo Data, a No-code Data Pipeline provides you with a consistent and reliable solution to manage data transfer between a variety of sources like MongoDB and a wide variety of Desired Destinations, with a few clicks.

Hevo Data with its strong integration with 150+ Data sources (including 40+ free sources) allows you to not only export data from your desired data sources & load it to the destination of your choice, but also transform & enrich your data to make it analysis-ready so that you can focus on your key business needs and perform insightful analysis using BI tools.

Want to take Hevo for a spin? Sign Up for a 14-day free trial and experience the feature-rich Hevo suite first hand. You can also have a look at the unbeatable Hevo Pricing that will help you choose the right plan for your business needs.

Yash Arora
Former Content Manager, Hevo Data

Yash is a Content Marketing professinal with experience in data-driven marketing campaigns. He has expertise in strategic thinking, integrated marketing, and customer acquisition. She has driven growth for startups and established brands through comprehensive marketing communications, and digital strategies.

No-code Data Pipeline for MongoDB