Aggregation operations in MongoDB process documents and return computed results. An aggregation collects values from multiple documents, groups them, and then performs operations such as Sum, Average, Minimum, and Maximum on each group to produce a result. It is similar to SQL’s aggregate functions used together with a GROUP BY clause.
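The group-then-aggregate idea can be sketched in plain JavaScript, with no MongoDB server involved. The function name groupAndAggregate and the sample values are illustrative assumptions. The values mirror a few documents from the sales collection used later in this article, and the result is roughly what SQL computes with SUM, AVG, MIN, and MAX plus GROUP BY item.

```javascript
// Group-then-aggregate in plain JavaScript (no MongoDB server involved).
// Roughly what SQL expresses as:
//   SELECT item, SUM(quantity), AVG(quantity), MIN(quantity), MAX(quantity)
//   FROM sales GROUP BY item;
function groupAndAggregate(docs, keyOf, valueOf) {
  // 1. Collect each document's value under its group key.
  const groups = new Map();
  for (const doc of docs) {
    const key = keyOf(doc);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(valueOf(doc));
  }
  // 2. Reduce each group's values to summary numbers.
  const results = [];
  for (const [key, values] of groups) {
    const sum = values.reduce((acc, v) => acc + v, 0);
    results.push({
      key,
      sum,
      avg: sum / values.length,
      min: Math.min(...values),
      max: Math.max(...values),
    });
  }
  return results;
}

// A few documents shaped like the sales sample used later (illustrative only).
const docs = [
  { item: "abc", quantity: 2 },
  { item: "xyz", quantity: 10 },
  { item: "abc", quantity: 10 },
];
console.log(groupAndAggregate(docs, d => d.item, d => d.quantity));
```

Groups appear in the order their keys are first seen, because a JavaScript Map preserves insertion order; MongoDB makes no such ordering promise for $group output.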

This article gives an overview of MongoDB and its key features. It then explains the MongoDB Groupby Aggregation Method (the $group stage), along with its general syntax and example queries.

What is MongoDB?


MongoDB is a well-known, source-available NoSQL database written in C++. It is a document-oriented database that uses JSON-like documents with a dynamic schema to store data. This means you can store records without having to define the data structure, the number of fields, or the field types in advance. Documents in MongoDB are similar to JSON objects and are stored in BSON, a binary JSON-like format.

Key Features of MongoDB

MongoDB offers several features that set it apart from conventional relational databases. Some of these features are discussed below:

  • Schema-Less Database: A schema-less database allows different kinds of documents to be stored in a single collection (the equivalent of a table). In other words, a single MongoDB collection can hold multiple documents, each of which can have a different number of fields, content, and size.
  • Indexing: MongoDB automatically creates a unique index on the _id field of every collection, and you can create secondary indexes on other fields to speed up retrieval from large data sets. Other fields are not indexed unless you create an index for them.
  • Scalability: Sharding in MongoDB allows for Horizontal Scalability. Sharding refers to the process of distributing data across multiple Servers.
  • Replication: MongoDB provides high availability through replica sets, groups of servers that each hold a copy of the same data. If the primary server fails, one of the secondaries can be elected as the new primary, so the data can still be retrieved.


Simplify MongoDB ETL Using Hevo’s No-code Data Pipeline

Hevo is the ideal data pipeline solution for integrating MongoDB as a source, enabling seamless data extraction, transformation, and loading. This ensures smooth data flow and real-time updates, optimizing your analytics and data management processes.

Let’s see some unbeatable features of Hevo Data:

  1. Fully Managed: Hevo Data is a fully managed service and is straightforward to set up.
  2. Schema Management: Hevo Data automatically maps the source schema to the destination, so you can perform analysis without worrying about schema changes.
  3. Real-Time: Hevo Data supports both batch and real-time data transfer, so your data is always analysis-ready.
  4. Live Support: Hevo offers 24/5 support and customer-centric solutions for your business use case.

How to use the MongoDB Groupby Aggregation Method?

A) General Syntax of MongoDB Groupby Aggregation Method

{
  $group: {
    _id: <expression>, // group key
    <field1>: { <accumulator1>: <expression1> },
    ...
    <fieldN>: { <accumulatorN>: <expressionN> }
  }
}
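In mongosh and in the Node.js driver, a pipeline stage is just a JavaScript object, so the template can also be assembled in code. The sketch below uses a hypothetical helper, groupStage, which is not part of MongoDB or its drivers. It enforces the rule that each output field names exactly one accumulator, plus one rule of its own (no extra _id field), and returns the stage object.

```javascript
// Hypothetical helper (not part of MongoDB or any driver) that assembles a
// $group stage object following the general syntax above.
function groupStage(keyExpression, fields) {
  const spec = { _id: keyExpression }; // the group key always comes first
  for (const [name, accumulator] of Object.entries(fields)) {
    // Helper's own rule: _id is supplied separately as the group key.
    if (name === "_id") {
      throw new Error("_id is the group key; pass it as keyExpression");
    }
    // Each output field must use exactly one accumulator operator, e.g. { $sum: ... }.
    const operators = Object.keys(accumulator);
    if (operators.length !== 1 || !operators[0].startsWith("$")) {
      throw new Error(`field '${name}' must specify exactly one accumulator`);
    }
    spec[name] = accumulator;
  }
  return { $group: spec };
}

// Usage: one group per item, with a document count and a total quantity.
const stage = groupStage("$item", {
  count: { $sum: 1 },
  totalQuantity: { $sum: "$quantity" },
});
console.log(JSON.stringify(stage));
// {"$group":{"_id":"$item","count":{"$sum":1},"totalQuantity":{"$sum":"$quantity"}}}
```

The returned object can be placed in the array passed to aggregate(), for example db.sales.aggregate([stage]) in mongosh.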

B) Parameters Involved in MongoDB Groupby Aggregation

The parameters involved in the syntax description of MongoDB Groupby Aggregation Method are as follows:

  • $group: The $group stage outputs one document for each distinct group key, as defined by the _id expression. Each output document contains an _id field holding that key and may also include computed fields that hold the results of accumulator expressions.
  • _id: This field is mandatory in a $group stage. To compute accumulated values over all input documents as a single group, set _id to null.
  • Accumulator: Accumulators are operators in MongoDB Groupby that maintain their state (e.g., total, maximum, minimum, and related data) while documents move through the pipeline. Some of the Accumulator operators are listed below:
    • $addToSet: For each group, this operator returns an array of unique expression values.
    • $avg: For each group, this operator returns the average of the numeric values. Non-numeric values are ignored.
    • $first: This operator returns a value from the first document of each group. The result is only meaningful when the documents are sorted, for example by a preceding $sort stage.
    • $last: This operator returns a value from the last document of each group. As with $first, the result depends on the document order.
    • $max: This operator returns the maximum value from each group.
    • $min: This operator returns the minimum value from each group.
    • $mergeObjects: This operator returns a document that was generated by combining the input documents for each group.
    • $push: This operator returns an array of expression values for each group of documents.
    • $sum: This operator returns the sum of the numeric values for each group, ignoring non-numeric values. { $sum: 1 } counts the documents in each group.
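To make the accumulator descriptions concrete, the following plain-JavaScript sketch approximates several of them over a single group: the quantity values of all eight documents in the sample sales collection created in the next section, as if _id were null. It is a simplification, not MongoDB's implementation. It assumes numbers or other primitive values, and the real $min and $max also compare non-numeric values using BSON comparison order, while $addToSet does not guarantee element order.

```javascript
// Plain-JavaScript approximations of some $group accumulators, applied to the
// values of one group. Simplified for numbers and other primitive values.
const numeric = vals => vals.filter(v => typeof v === "number");

const accumulators = {
  // $sum and $avg skip non-numeric values; $avg yields null if none remain.
  $sum: vals => numeric(vals).reduce((a, b) => a + b, 0),
  $avg: vals => {
    const nums = numeric(vals);
    return nums.length ? nums.reduce((a, b) => a + b, 0) / nums.length : null;
  },
  // Numeric-only simplification of $min / $max.
  $min: vals => (numeric(vals).length ? Math.min(...numeric(vals)) : null),
  $max: vals => (numeric(vals).length ? Math.max(...numeric(vals)) : null),
  $push: vals => [...vals],              // every value, duplicates kept
  $addToSet: vals => [...new Set(vals)], // unique values only
  $first: vals => vals[0],               // meaningful only after a $sort
  $last: vals => vals[vals.length - 1],
};

// quantity values of the eight sample sales documents, in _id order;
// with _id: null they all belong to a single group.
const quantities = [2, 1, 10, 20, 10, 5, 10, 5];

const summary = {};
for (const [op, fn] of Object.entries(accumulators)) {
  summary[op] = fn(quantities);
}
summary.count = accumulators.$sum(quantities.map(() => 1)); // like { $sum: 1 }
console.log(summary);
```

In mongosh, a comparable query would be a $group stage with _id: null and fields such as totalQuantity: { $sum: "$quantity" } and quantities: { $push: "$quantity" }.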

C) Conceptual Example: Using MongoDB Groupby Aggregation

To understand how the MongoDB Groupby Aggregation Method works, create a sample collection named sales with the following data:


db.sales.insertMany([
  { "_id" : 1, "item" : "abc", "price" : NumberDecimal("10"), "quantity" : NumberInt("2"), "date" : ISODate("2014-03-01T08:00:00Z") },
  { "_id" : 2, "item" : "jkl", "price" : NumberDecimal("20"), "quantity" : NumberInt("1"), "date" : ISODate("2014-03-01T09:00:00Z") },
  { "_id" : 3, "item" : "xyz", "price" : NumberDecimal("5"), "quantity" : NumberInt("10"), "date" : ISODate("2014-03-15T09:00:00Z") },
  { "_id" : 4, "item" : "xyz", "price" : NumberDecimal("5"), "quantity" : NumberInt("20"), "date" : ISODate("2014-04-04T11:21:39.736Z") },
  { "_id" : 5, "item" : "abc", "price" : NumberDecimal("10"), "quantity" : NumberInt("10"), "date" : ISODate("2014-04-04T21:23:13.331Z") },
  { "_id" : 6, "item" : "def", "price" : NumberDecimal("7.5"), "quantity" : NumberInt("5"), "date" : ISODate("2015-06-04T05:08:13Z") },
  { "_id" : 7, "item" : "def", "price" : NumberDecimal("7.5"), "quantity" : NumberInt("10"), "date" : ISODate("2015-09-10T08:43:00Z") },
  { "_id" : 8, "item" : "abc", "price" : NumberDecimal("10"), "quantity" : NumberInt("5"), "date" : ISODate("2016-02-06T20:20:13Z") }
])

Now, suppose you want to use the MongoDB Groupby Aggregation Method to calculate the Total Sales Amount, Average Sales Quantity, and Sale Count for each day in 2014. The following query extracts this information:

db.sales.aggregate([
  // First Stage
  {
    $match : { "date": { $gte: new ISODate("2014-01-01"), $lt: new ISODate("2015-01-01") } }
  },
  // Second Stage
  {
    $group : {
       _id : { $dateToString: { format: "%Y-%m-%d", date: "$date" } },
       totalSaleAmount: { $sum: { $multiply: [ "$price", "$quantity" ] } },
       averageQuantity: { $avg: "$quantity" },
       count: { $sum: 1 }
    }
  },
  // Third Stage
  {
    $sort : { totalSaleAmount: -1 }
  }
])

Where:

  • The $match stage filters the documents so that only documents from 2014 are passed on to the next stage.
  • The $group stage groups the documents by date (formatted as YYYY-MM-DD) and computes the Total Sale Amount, Average Quantity, and document count for each group.
  • The $sort stage arranges the results in descending order based on the Total Sale Amount for each group.
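To check the output below by hand, the three stages can be replayed in plain JavaScript. This sketch is an approximation under stated assumptions, not how MongoDB executes pipelines: it uses ordinary JavaScript numbers instead of NumberDecimal and NumberInt (exact for these particular values) and formats dates in UTC, which is the default timezone of $dateToString. The function name dailySales2014 is hypothetical.

```javascript
// Sample data from above, with plain numbers in place of NumberDecimal/NumberInt.
const sales = [
  { _id: 1, item: "abc", price: 10, quantity: 2, date: new Date("2014-03-01T08:00:00Z") },
  { _id: 2, item: "jkl", price: 20, quantity: 1, date: new Date("2014-03-01T09:00:00Z") },
  { _id: 3, item: "xyz", price: 5, quantity: 10, date: new Date("2014-03-15T09:00:00Z") },
  { _id: 4, item: "xyz", price: 5, quantity: 20, date: new Date("2014-04-04T11:21:39.736Z") },
  { _id: 5, item: "abc", price: 10, quantity: 10, date: new Date("2014-04-04T21:23:13.331Z") },
  { _id: 6, item: "def", price: 7.5, quantity: 5, date: new Date("2015-06-04T05:08:13Z") },
  { _id: 7, item: "def", price: 7.5, quantity: 10, date: new Date("2015-09-10T08:43:00Z") },
  { _id: 8, item: "abc", price: 10, quantity: 5, date: new Date("2016-02-06T20:20:13Z") },
];

function dailySales2014(docs) {
  // Stage 1 ($match): keep documents dated in 2014.
  const start = new Date("2014-01-01T00:00:00Z");
  const end = new Date("2015-01-01T00:00:00Z");
  const matched = docs.filter(d => d.date >= start && d.date < end);

  // Stage 2 ($group): key is the UTC date as "YYYY-MM-DD", like "%Y-%m-%d".
  const groups = new Map();
  for (const d of matched) {
    const key = d.date.toISOString().slice(0, 10);
    const g = groups.get(key) ?? { _id: key, totalSaleAmount: 0, quantitySum: 0, count: 0 };
    g.totalSaleAmount += d.price * d.quantity; // $sum of $multiply
    g.quantitySum += d.quantity;               // running total for $avg
    g.count += 1;                              // $sum: 1
    groups.set(key, g);
  }
  const grouped = [...groups.values()].map(g => ({
    _id: g._id,
    totalSaleAmount: g.totalSaleAmount,
    averageQuantity: g.quantitySum / g.count,
    count: g.count,
  }));

  // Stage 3 ($sort): descending by totalSaleAmount.
  return grouped.sort((a, b) => b.totalSaleAmount - a.totalSaleAmount);
}

console.log(dailySales2014(sales));
```

The final sort matters because $group does not guarantee any particular order for its output documents.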

Output:

{ "_id" : "2014-04-04", "totalSaleAmount" : NumberDecimal("200"), "averageQuantity" : 15, "count" : 2 }
{ "_id" : "2014-03-15", "totalSaleAmount" : NumberDecimal("50"), "averageQuantity" : 10, "count" : 1 }
{ "_id" : "2014-03-01", "totalSaleAmount" : NumberDecimal("40"), "averageQuantity" : 1.5, "count" : 2 }

Conclusion

This article introduced you to MongoDB and the salient features it offers. It then covered the MongoDB Groupby Aggregation Method, along with its syntax and an example query. As your business grows, so does the volume of data generated across your company’s SaaS applications, databases, and other sources.

To meet these growing storage and computing needs, you would need to invest part of your engineering bandwidth in integrating data from all sources, cleaning and transforming it, and finally loading it into a Cloud Data Warehouse for further Business Analytics. All of these challenges can be handled efficiently by a Cloud-Based ETL tool such as Hevo Data.

Hevo Data, a No-code Data Pipeline, provides a consistent and reliable solution for managing data transfer between a variety of sources, such as MongoDB, and a wide range of destinations with a few clicks.

Want to take Hevo for a spin? Sign up for a 14-day free trial and experience the feature-rich Hevo suite firsthand. Also, check out our unbeatable pricing to choose the best plan for your organization.

Share your experience of learning about the MongoDB Groupby Aggregation Method in the comments below!

FAQs

1. How do you use group by in MongoDB?

In MongoDB, you use the $group aggregation pipeline stage to group documents by a specific field. It lets you perform operations such as sum, average, or count on the grouped data. For example, db.collection.aggregate([{ $group: { _id: "$fieldName", count: { $sum: 1 } } }]) counts the documents for each distinct value of fieldName.

2. Why is $group needed in MongoDB?

$group is essential in MongoDB for aggregating data. It helps calculate metrics such as totals, averages, and counts by grouping documents based on specified fields, enabling advanced data analysis within the database.

3. Why use MongoDB over JSON?

MongoDB stores data as BSON, a binary-encoded serialization of JSON-like documents that also supports additional types such as dates and decimals. Unlike plain JSON files, MongoDB offers indexing, rich querying, and horizontal scalability for dynamic and complex data structures.

Rakesh Tiwari
Former Research Analyst, Hevo Data

Rakesh is a research analyst at Hevo Data with more than three years of experience in the field. He specializes in technologies including API integration and machine learning. His combination of technical skills and a flair for writing led him to write about highly complex topics. He has written numerous articles on a variety of data engineering topics, such as data integration, data analytics, and data management. He enjoys simplifying difficult subjects to help data practitioners with their doubts related to data engineering.