MongoDB Groupby Aggregation Method Simplified

Aggregation operations in MongoDB process documents and return computed results. An aggregation collects values from multiple documents, groups them, and applies operations such as sum, average, minimum, and maximum to each group to return a computed result. This is similar to using aggregate functions with GROUP BY in SQL.

By the end of this article, you will have a working overview of MongoDB. You will also learn the general syntax of the MongoDB Groupby Aggregation Method and see it applied in an example query.

What is MongoDB?

MongoDB is a well-known Open-Source NoSQL Database written in C++. It is a Document-oriented Database that stores data as JSON-like documents with a Dynamic Schema. This means that you can store your records without fixing the Data Structure, the number of fields, or the types of the field values in advance.

You can change the structure of records (which MongoDB refers to as Documents) by simply adding new fields or deleting existing ones. This feature of MongoDB allows you to easily represent Hierarchical Relationships, Store Arrays, and other complex Data Structures.

Nowadays, many tech giants, including Facebook, eBay, Adobe, and Google, use MongoDB to store their large amounts of data.

Key Features of MongoDB

MongoDB offers a wide range of unique features that make it a better solution in comparison to other conventional databases. Some of these features are discussed below:

  • Schema Less Database: A Schema-Less Database allows various types of Documents to be stored in a single Collection (the equivalent of a table). In other words, a single MongoDB collection can hold multiple Documents, each of which can have a different number of Fields, Content, and Size.

    Unlike rows in a Relational Database table, which must all share the same columns, one document does not need to have the same structure as another. This gives MongoDB users great flexibility.
  • Index-based Document: Any field in a MongoDB Document can be indexed. MongoDB automatically creates a Primary Index on the _id field, and you can create Secondary Indices on other fields, which makes it easier to retrieve information from the pool of data.
  • Scalability: Sharding in MongoDB allows for Horizontal Scalability. Sharding refers to the process of distributing data across multiple Servers.

    A large amount of data is partitioned into data chunks using the Shard Key, and these data chunks are evenly distributed across Shards that reside across many Physical Servers.
  • Replication: MongoDB offers high availability of data by keeping multiple copies of the data on different servers, so that if one server fails, the data can still be retrieved from another Server.
 
Simplify MongoDB ETL Using Hevo’s No-code Data Pipeline

Hevo Data is a No-code Data Pipeline that offers a fully managed solution to set up Data Integration for 100+ Data Sources (including 40+ Free sources) and lets you directly load data from sources like MongoDB to a Data Warehouse or the Destination of your choice. It automates your data flow in minutes without writing a single line of code. Its fault-tolerant architecture makes sure that your data is secure and consistent. Hevo provides you with an efficient and fully automated solution to manage data in real-time and always have analysis-ready data.

Let’s look at some of the salient features of Hevo:

  • Fully Managed: It requires no management and maintenance as Hevo is a fully automated platform.
  • Data Transformation: It provides a simple interface to perfect, modify, and enrich the data you want to transfer. 
  • Real-Time: Hevo offers real-time data migration. So, your data is always ready for analysis.
  • Schema Management: Hevo can automatically detect the schema of the incoming data and map it to the destination schema.
  • Connectors: Hevo supports 100+ Integrations to SaaS platforms, FTP/SFTP, Files, Databases, BI tools, and Native REST API & Webhooks Connectors. It supports various destinations, including Data Warehouses (Google BigQuery, Amazon Redshift, Snowflake, Firebolt), Amazon S3 Data Lakes, Databricks, and Databases such as MySQL, SQL Server, TokuDB, MongoDB, and PostgreSQL, to name a few.
  • Secure: Hevo has a fault-tolerant architecture that ensures that the data is handled in a secure, consistent manner with zero data loss.
  • Hevo Is Built To Scale: As the number of sources and the volume of your data grows, Hevo scales horizontally, handling millions of records per minute with very little latency.
  • Live Monitoring: Advanced monitoring gives you a one-stop view to watch all the activities that occur within Data Pipelines.
  • Live Support: The Hevo team is available round the clock to extend exceptional support to its customers through chat, email, and support calls.

How to use the MongoDB Groupby Aggregation Method?

A) General Syntax of MongoDB Groupby Aggregation Method

{
  $group: {                                        // the $group stage defines the group by
    _id: <expression>,                             // group key
    <field1>: { <accumulator1>: <expression1> },
    <field2>: { <accumulator2>: <expression2> },
    ...
    <fieldN>: { <accumulatorN>: <expressionN> }
  }
}

B) Parameters Involved in MongoDB Groupby Aggregation

The parameters involved in the syntax description of MongoDB Groupby Aggregation Method are as follows:

  • $group: $group outputs a document for each distinct grouping of input documents based on the specified _id expression. This aggregation produces the _id field, which contains the group by key of distinct records. The output documents may also include computed fields containing the values of an accumulator expression.
  • _id: This field is mandatory in the $group stage and holds the group key. To compute accumulated values across all input documents as a single group, set _id to null.
  • Accumulator: Accumulators are operators in MongoDB Groupby that maintain their state (e.g., total, maximum, minimum, and related data) while documents move through the pipeline. Some of the Accumulator operators are listed below:
    • $addToSet: For each group, this operator returns an array of unique expression values.
    • $avg: This operator returns the average of the numeric values in each group. Non-numeric values are ignored by this operator.
    • $first: This operator returns a value from the first document of each group.
    • $last: This operator returns a value from the last document of each group.
    • $max: This operator returns the maximum value from each group.
    • $min: This operator returns the minimum value from each group.
    • $mergeObjects: This operator returns a document that was generated by combining the input documents for each group.
    • $push: This operator returns an array of expression values for each group of documents.
    • $sum: This operator returns the sum of the numeric values in each group. Non-numeric values are ignored by this operator.
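
The accumulator behavior listed above can be illustrated with a small in-memory sketch in plain JavaScript. This is not how MongoDB implements $group, and it does not talk to a database: the groupDocs helper, the accumulators table, and the orders array are hypothetical names and made-up data used only for illustration. Passing a key function that always returns null mirrors _id: null, which folds every document into one group.

```javascript
// Hypothetical in-memory stand-ins for a few $group accumulators.
// Each one receives the array of values collected for a single group.
const accumulators = {
  // Like $sum: non-numeric values are ignored.
  sum: (values) => values.filter((v) => typeof v === "number").reduce((a, b) => a + b, 0),
  // Like $avg: non-numeric values are ignored; null when nothing is numeric.
  avg: (values) => {
    const nums = values.filter((v) => typeof v === "number");
    return nums.length ? nums.reduce((a, b) => a + b, 0) / nums.length : null;
  },
  // Like $push: every value, duplicates included.
  push: (values) => [...values],
  // Like $addToSet: unique values (MongoDB does not guarantee their order).
  addToSet: (values) => [...new Set(values)],
  // Like $first / $last: value from the first / last document of the group.
  first: (values) => values[0],
  last: (values) => values[values.length - 1],
};

// Hypothetical helper: groups docs by keyFn, then applies each accumulator.
// spec maps an output field name to [accumulatorName, valueFn].
function groupDocs(docs, keyFn, spec) {
  const groups = new Map();
  for (const doc of docs) {
    const key = keyFn(doc);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(doc);
  }
  return [...groups].map(([key, members]) => {
    const out = { _id: key };
    for (const [field, [name, valueFn]] of Object.entries(spec)) {
      out[field] = accumulators[name](members.map(valueFn));
    }
    return out;
  });
}

// Made-up sample data (not from the article).
const orders = [
  { item: "abc", qty: 2, tag: "web" },
  { item: "xyz", qty: 10, tag: "store" },
  { item: "abc", qty: 5, tag: "web" },
  { item: "abc", qty: "n/a", tag: "store" },
];

// One output document per distinct item.
const byItem = groupDocs(orders, (d) => d.item, {
  totalQty: ["sum", (d) => d.qty],
  avgQty: ["avg", (d) => d.qty],
  allTags: ["push", (d) => d.tag],
  uniqueTags: ["addToSet", (d) => d.tag],
  firstQty: ["first", (d) => d.qty],
});

// A constant null key puts every document in one group, like _id: null.
const overall = groupDocs(orders, () => null, { totalQty: ["sum", (d) => d.qty] });

console.log(byItem);
console.log(overall); // [ { _id: null, totalQty: 17 } ]
```

In this sketch the "abc" group reports a totalQty of 7 and an avgQty of 3.5 because the "n/a" value is skipped, while allTags keeps the duplicate "web" entry that uniqueTags drops.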

C) Conceptual Example: Using MongoDB Groupby Aggregation

To understand how the MongoDB Groupby Aggregation method works, create a sample collection named sales with the following data:


db.sales.insertMany([
  { "_id" : 1, "item" : "abc", "price" : NumberDecimal("10"), "quantity" : NumberInt("2"), "date" : ISODate("2014-03-01T08:00:00Z") },
  { "_id" : 2, "item" : "jkl", "price" : NumberDecimal("20"), "quantity" : NumberInt("1"), "date" : ISODate("2014-03-01T09:00:00Z") },
  { "_id" : 3, "item" : "xyz", "price" : NumberDecimal("5"), "quantity" : NumberInt("10"), "date" : ISODate("2014-03-15T09:00:00Z") },
  { "_id" : 4, "item" : "xyz", "price" : NumberDecimal("5"), "quantity" : NumberInt("20"), "date" : ISODate("2014-04-04T11:21:39.736Z") },
  { "_id" : 5, "item" : "abc", "price" : NumberDecimal("10"), "quantity" : NumberInt("10"), "date" : ISODate("2014-04-04T21:23:13.331Z") },
  { "_id" : 6, "item" : "def", "price" : NumberDecimal("7.5"), "quantity" : NumberInt("5"), "date" : ISODate("2015-06-04T05:08:13Z") },
  { "_id" : 7, "item" : "def", "price" : NumberDecimal("7.5"), "quantity" : NumberInt("10"), "date" : ISODate("2015-09-10T08:43:00Z") },
  { "_id" : 8, "item" : "abc", "price" : NumberDecimal("10"), "quantity" : NumberInt("5"), "date" : ISODate("2016-02-06T20:20:13Z") },
])

Now, suppose you want to apply the MongoDB Groupby function to calculate the Total Sales Amount, Average Sales Quantity, and Sale Count for each day in the year 2014. The following piece of code will help you in extracting the required information:

db.sales.aggregate([
  // First Stage
  {
    $match : { "date": { $gte: new ISODate("2014-01-01"), $lt: new ISODate("2015-01-01") } }
  },
  // Second Stage
  {
    $group : {
       _id : { $dateToString: { format: "%Y-%m-%d", date: "$date" } },
       totalSaleAmount: { $sum: { $multiply: [ "$price", "$quantity" ] } },
       averageQuantity: { $avg: "$quantity" },
       count: { $sum: 1 }
    }
  },
  // Third Stage
  {
    $sort : { totalSaleAmount: -1 }
  }
 ])

Where:

  • The $match stage filters the documents so that only documents from 2014 are passed on to the next stage.
  • The $group stage organizes the documents by Date and computes the Total Sale Amount, Average Quantity, and Total Count for each group.
  • The $sort stage arranges the results in descending order based on the Total Sale Amount for each group.

Output:

{ "_id" : "2014-04-04", "totalSaleAmount" : NumberDecimal("200"), "averageQuantity" : 15, "count" : 2 }
{ "_id" : "2014-03-15", "totalSaleAmount" : NumberDecimal("50"), "averageQuantity" : 10, "count" : 1 }
{ "_id" : "2014-03-01", "totalSaleAmount" : NumberDecimal("40"), "averageQuantity" : 1.5, "count" : 2 }
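
The figures in this output can be cross-checked by hand with a plain JavaScript sketch that mirrors the three stages using array methods. This is an in-memory approximation, not MongoDB itself: the array restates the sample documents with ordinary numbers and Date objects in place of NumberDecimal, NumberInt, and ISODate, and every variable name is made up for illustration.

```javascript
// Sample documents from the article, restated with plain JS numbers and Dates.
const sales = [
  { _id: 1, item: "abc", price: 10, quantity: 2, date: new Date("2014-03-01T08:00:00Z") },
  { _id: 2, item: "jkl", price: 20, quantity: 1, date: new Date("2014-03-01T09:00:00Z") },
  { _id: 3, item: "xyz", price: 5, quantity: 10, date: new Date("2014-03-15T09:00:00Z") },
  { _id: 4, item: "xyz", price: 5, quantity: 20, date: new Date("2014-04-04T11:21:39.736Z") },
  { _id: 5, item: "abc", price: 10, quantity: 10, date: new Date("2014-04-04T21:23:13.331Z") },
  { _id: 6, item: "def", price: 7.5, quantity: 5, date: new Date("2015-06-04T05:08:13Z") },
  { _id: 7, item: "def", price: 7.5, quantity: 10, date: new Date("2015-09-10T08:43:00Z") },
  { _id: 8, item: "abc", price: 10, quantity: 5, date: new Date("2016-02-06T20:20:13Z") },
];

// Stage 1 (mirrors $match): keep only documents dated in 2014 (UTC).
const rangeStart = new Date("2014-01-01T00:00:00Z");
const rangeEnd = new Date("2015-01-01T00:00:00Z");
const matched = sales.filter((d) => d.date >= rangeStart && d.date < rangeEnd);

// Stage 2 (mirrors $group): key each document by its UTC day string,
// which plays the role of $dateToString with the "%Y-%m-%d" format.
const groups = new Map();
for (const d of matched) {
  const day = d.date.toISOString().slice(0, 10);
  const g = groups.get(day) ?? { _id: day, totalSaleAmount: 0, quantitySum: 0, count: 0 };
  g.totalSaleAmount += d.price * d.quantity; // $sum of $multiply
  g.quantitySum += d.quantity;               // running total used for $avg
  g.count += 1;                              // $sum: 1
  groups.set(day, g);
}

// Stage 3 (mirrors $sort): order by totalSaleAmount, descending.
const results = [...groups.values()]
  .map((g) => ({
    _id: g._id,
    totalSaleAmount: g.totalSaleAmount,
    averageQuantity: g.quantitySum / g.count,
    count: g.count,
  }))
  .sort((a, b) => b.totalSaleAmount - a.totalSaleAmount);

console.log(results);
// [
//   { _id: '2014-04-04', totalSaleAmount: 200, averageQuantity: 15, count: 2 },
//   { _id: '2014-03-15', totalSaleAmount: 50, averageQuantity: 10, count: 1 },
//   { _id: '2014-03-01', totalSaleAmount: 40, averageQuantity: 1.5, count: 2 }
// ]
```

The sketch uses binary floating-point numbers, which happen to be exact for these values. In MongoDB, the NumberDecimal prices keep totalSaleAmount as a decimal value, which is why the real output shows NumberDecimal("200") rather than 200.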

Conclusion

This article introduced you to MongoDB along with the salient features that it offers. Furthermore, it introduced you to the MongoDB Groupby Aggregation method along with its syntax and example queries. As your business begins to grow, data is generated at an exponential rate across all of your company’s SaaS applications, Databases, and other sources.

To meet these growing storage and computing needs, you would need to invest a portion of your engineering bandwidth to integrate data from all sources, clean and transform it, and finally load it to a Cloud Data Warehouse for further Business Analytics. All of these challenges can be efficiently handled by a Cloud-Based ETL tool such as Hevo Data.

Hevo Data, a No-code Data Pipeline provides you with a consistent and reliable solution to manage data transfer between a variety of sources like MongoDB and a wide variety of Desired Destinations, with a few clicks. Hevo Data with its strong integration with 100+ sources (including 40+ free sources) allows you to not only export data from your desired data sources & load it to the destination of your choice, but also transform & enrich your data to make it analysis-ready so that you can focus on your key business needs and perform insightful analysis using BI tools.

Want to take Hevo for a spin? Sign Up for a 14-day free trial and experience the feature-rich Hevo suite first hand. You can also have a look at the unbeatable pricing that will help you choose the right plan for your business needs.

Share with us your experience of learning about the MongoDB Groupby Aggregation Method in the comments below!

Former Research Analyst, Hevo Data

Rakesh is a Cloud Engineer with a passion for data, software architecture, and writing technical content. He has experience writing articles on various topics related to data integration and infrastructure.
