MongoDB Groupby Aggregation Method Simplified

Aggregation operations in MongoDB process data records/documents and return the computed results. Aggregation collects values from various documents, groups them, and then performs operations such as Sum, Average, Minimum, and Maximum on that grouped data to return a computed result. It is similar to using GROUP BY together with Aggregate Functions in SQL.

Upon a complete walkthrough of this article, you will gain a holistic understanding of MongoDB. You will also learn about the MongoDB Groupby Aggregation Method along with general syntax and example queries of MongoDB Groupby.

What is MongoDB?

MongoDB is a well-known Open-Source NoSQL Database written in C++. MongoDB is a Document-oriented Database that uses JSON-like documents with a Dynamic Schema to store data. It means that you can store your records without having to worry about the Data Structure, the number of fields or the types of fields used to store values. Documents in MongoDB are similar to JSON objects.

You can change the structure of records (which MongoDB refers to as Documents) by simply adding new fields or deleting existing ones. This feature of MongoDB allows you to easily represent Hierarchical Relationships, Store Arrays, and other complex Data Structures.
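As a small illustration of the dynamic schema described above, the sketch below defines two documents with different shapes that could live in the same Collection. The collection name products and every field name here are hypothetical, chosen only for this example, and the code is plain JavaScript so it runs without a database.

```javascript
// Hypothetical documents for one collection; names and fields are illustrative only.
const products = [
  // A flat document with three fields.
  { _id: 1, name: "Notebook", price: 4.5 },
  // A richer document: adds an array and a nested (hierarchical) sub-document.
  {
    _id: 2,
    name: "Laptop",
    price: 899,
    tags: ["electronics", "sale"],
    specs: { ram: "16GB", storage: { type: "SSD", size: "512GB" } }
  }
];

// In mongosh, both documents could be inserted into the same collection:
//   db.products.insertMany(products)

// The two documents do not share the same number of fields.
const fieldCounts = products.map(doc => Object.keys(doc).length);
console.log(fieldCounts); // [ 3, 5 ]
```

This assumes the Collection has no schema validation rules configured, which is the default.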

Nowadays, many tech giants, including Facebook, eBay, Adobe, and Google, use MongoDB to store their large amounts of data.

Key Features of MongoDB

MongoDB offers a wide range of unique features that make it a better solution in comparison to other conventional databases. Some of these features are discussed below:

  • Schema-less Database: A Schema-less Database allows various types of Documents to be stored in a single Collection (the equivalent of a table). In other words, in the MongoDB database, a single Collection can hold multiple Documents, each of which can have a different number of Fields, Content, and Size.

    It is not necessary for one document to be similar to another which is a prerequisite in Relational Databases. Due to this feature, MongoDB offers great flexibility to the users.
  • Indexing: MongoDB automatically creates a unique Primary Index on the _id field of every Collection, and you can create Secondary Indices on any other field (or combination of fields) in a Document, which makes it easier to retrieve information from the pool of data.
  • Scalability: Sharding in MongoDB allows for Horizontal Scalability. Sharding refers to the process of distributing data across multiple Servers.

    A large amount of data is partitioned into data chunks using the Shard Key, and these data chunks are evenly distributed across Shards that reside across many Physical Servers.
  • Replication: MongoDB offers high availability of data by creating multiple copies of the data and sending these copies to a different server so that if one server fails, the data can still be retrieved from another Server. You can learn more about MongoDB Replication.

Simplify MongoDB ETL Using Hevo’s No-code Data Pipeline

Hevo is the only real-time ELT No-code Data Pipeline platform that cost-effectively automates data pipelines that are flexible to your needs. With integration with 150+ Data Sources (40+ free sources), we help you not only export data from sources & load data to the destinations but also transform & enrich your data, & make it analysis-ready.

How to use the MongoDB Groupby Aggregation Method?

A) General Syntax of MongoDB Groupby Aggregation Method

{
  $group: {
    _id: <expression>,   // Group key
    <field1>: { <accumulator1>: <expression1> },
    <field2>: { <accumulator2>: <expression2> },
    ...
    <fieldN>: { <accumulatorN>: <expressionN> }
  }
}
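To make the syntax above more concrete, here is a minimal sketch of two $group pipelines written as plain JavaScript arrays. It assumes a collection named sales whose documents carry item and quantity fields; the output field names totalQuantity and saleCount are illustrative choices, not fixed names.

```javascript
// Group key taken from a field: one output document per distinct "item" value.
const perItem = [
  {
    $group: {
      _id: "$item",                          // group key
      totalQuantity: { $sum: "$quantity" },  // sum of quantity within each group
      saleCount: { $sum: 1 }                 // adds 1 per document, i.e. a count
    }
  }
];

// Constant group key: _id: null places every input document in a single group.
const grandTotal = [
  { $group: { _id: null, totalQuantity: { $sum: "$quantity" } } }
];

// In mongosh these arrays would be passed to aggregate(), for example:
//   db.sales.aggregate(perItem)
//   db.sales.aggregate(grandTotal)
console.log(JSON.stringify(perItem));
console.log(JSON.stringify(grandTotal));
```

Keeping the pipeline in a variable makes it easy to inspect or reuse before sending it to the server.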

B) Parameters Involved in MongoDB Groupby Aggregation

The parameters involved in the syntax description of MongoDB Groupby Aggregation Method are as follows:

  • $group: $group outputs a document for each distinct grouping of input documents based on the specified _id expression. This aggregation produces the _id field, which contains the group by key of distinct records. The output documents may also include computed fields containing the values of an accumulator expression.
  • _id: It is a mandatory field of the $group stage and holds the group key. To calculate accumulated values across all input documents as a single group, set _id to null (or any other constant value).
  • Accumulator: Accumulators are operators in MongoDB Groupby that maintain their state (e.g., total, maximum, minimum, and related data) while documents move through the pipeline. Some of the Accumulator operators are listed below:
    • $addToSet: For each group, this operator returns an array of unique expression values.
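    • $avg: This operator returns the average of the numeric values that the expression yields for each group. Non-numeric values are ignored by this operator.
    • $first: This operator returns a value from the first document of each group. The result is only meaningful when the documents arrive in a defined order, for example after a $sort stage.
    • $last: This operator returns a value from the last document of each group. As with $first, the documents should be in a defined order.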
    • $max: This operator returns the maximum value from each group.
    • $min: This operator returns the minimum value from each group.
    • $mergeObjects: This operator returns a document that was generated by combining the input documents for each group.
    • $push: This operator returns an array of expression values for each group of documents.
    • $sum: This operator returns the sum of the numeric values for each group. Non-numeric values are ignored by this operator.
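The behavior of several of these Accumulators can be sketched in plain JavaScript over an in-memory array. This is only an illustration of what each operator returns per group, not how MongoDB implements them; the sample documents are hypothetical, and the element order of a real $addToSet result is not guaranteed.

```javascript
// Hypothetical in-memory stand-in for a collection (not from the article's data).
const docs = [
  { item: "abc", quantity: 2 },
  { item: "xyz", quantity: 10 },
  { item: "abc", quantity: 10 },
  { item: "abc", quantity: 2 },
  { item: "abc", quantity: 5 }
];

// Collect the quantity values of each group, keyed by item, in input order.
const groups = new Map();
for (const doc of docs) {
  if (!groups.has(doc.item)) groups.set(doc.item, []);
  groups.get(doc.item).push(doc.quantity);
}

// Compute plain-JavaScript analogues of the accumulators for every group.
const result = [...groups].map(([item, values]) => {
  const sum = values.reduce((a, b) => a + b, 0);
  return {
    _id: item,                        // group key
    sum,                              // like $sum
    avg: sum / values.length,         // like $avg
    min: Math.min(...values),         // like $min
    max: Math.max(...values),         // like $max
    first: values[0],                 // like $first
    last: values[values.length - 1],  // like $last
    push: values,                     // like $push: keeps duplicates
    addToSet: [...new Set(values)]    // like $addToSet: unique values only
  };
});

console.log(result);
```

For the "abc" group, push keeps all four values while addToSet drops the repeated 2, which is the practical difference between the two operators.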

C) Conceptual Example: Using MongoDB Groupby Aggregation

To understand the working of the MongoDB Groupby Aggregation method, create a sample collection named sales with the following data:


db.sales.insertMany([
  { "_id" : 1, "item" : "abc", "price" : NumberDecimal("10"), "quantity" : NumberInt("2"), "date" : ISODate("2014-03-01T08:00:00Z") },
  { "_id" : 2, "item" : "jkl", "price" : NumberDecimal("20"), "quantity" : NumberInt("1"), "date" : ISODate("2014-03-01T09:00:00Z") },
  { "_id" : 3, "item" : "xyz", "price" : NumberDecimal("5"), "quantity" : NumberInt( "10"), "date" : ISODate("2014-03-15T09:00:00Z") },
  { "_id" : 4, "item" : "xyz", "price" : NumberDecimal("5"), "quantity" :  NumberInt("20") , "date" : ISODate("2014-04-04T11:21:39.736Z") },
  { "_id" : 5, "item" : "abc", "price" : NumberDecimal("10"), "quantity" : NumberInt("10") , "date" : ISODate("2014-04-04T21:23:13.331Z") },
  { "_id" : 6, "item" : "def", "price" : NumberDecimal("7.5"), "quantity": NumberInt("5" ) , "date" : ISODate("2015-06-04T05:08:13Z") },
  { "_id" : 7, "item" : "def", "price" : NumberDecimal("7.5"), "quantity": NumberInt("10") , "date" : ISODate("2015-09-10T08:43:00Z") },
  { "_id" : 8, "item" : "abc", "price" : NumberDecimal("10"), "quantity" : NumberInt("5" ) , "date" : ISODate("2016-02-06T20:20:13Z") },
])

Now, suppose you want to apply the MongoDB Groupby function to calculate the Total Sales Amount, Average Sales Quantity, and Sale Count for each day in the year 2014. The following piece of code will help you in extracting the required information:

db.sales.aggregate([
  // First Stage
  {
    $match : { "date": { $gte: new ISODate("2014-01-01"), $lt: new ISODate("2015-01-01") } }
  },
  // Second Stage
  {
    $group : {
       _id : { $dateToString: { format: "%Y-%m-%d", date: "$date" } },
       totalSaleAmount: { $sum: { $multiply: [ "$price", "$quantity" ] } },
       averageQuantity: { $avg: "$quantity" },
       count: { $sum: 1 }
    }
  },
  // Third Stage
  {
    $sort : { totalSaleAmount: -1 }
  }
 ])

Where:

  • The $match stage filters the documents so that only documents from 2014 are passed on to the next stage.
  • The $group stage organizes the documents by Date and computes the Total Sale Amount, Average Quantity, and Total Count for each group.
  • The $sort stage arranges the results in descending order based on the Total Sale Amount for each group.

Output:

{ "_id" : "2014-04-04", "totalSaleAmount" : NumberDecimal("200"), "averageQuantity" : 15, "count" : 2 }
{ "_id" : "2014-03-15", "totalSaleAmount" : NumberDecimal("50"), "averageQuantity" : 10, "count" : 1 }
{ "_id" : "2014-03-01", "totalSaleAmount" : NumberDecimal("40"), "averageQuantity" : 1.5, "count" : 2 }
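The result documents above can be cross-checked without a database. The sketch below replays the three stages in plain JavaScript over a copy of the sample data. As a simplification, it uses ordinary numbers and ISO date strings in place of NumberDecimal, NumberInt, and ISODate, and it is only an approximation of what the server does.

```javascript
// Copy of the sample sales data with plain numbers and ISO strings (a simplification).
const sales = [
  { _id: 1, item: "abc", price: 10, quantity: 2, date: "2014-03-01T08:00:00Z" },
  { _id: 2, item: "jkl", price: 20, quantity: 1, date: "2014-03-01T09:00:00Z" },
  { _id: 3, item: "xyz", price: 5, quantity: 10, date: "2014-03-15T09:00:00Z" },
  { _id: 4, item: "xyz", price: 5, quantity: 20, date: "2014-04-04T11:21:39.736Z" },
  { _id: 5, item: "abc", price: 10, quantity: 10, date: "2014-04-04T21:23:13.331Z" },
  { _id: 6, item: "def", price: 7.5, quantity: 5, date: "2015-06-04T05:08:13Z" },
  { _id: 7, item: "def", price: 7.5, quantity: 10, date: "2015-09-10T08:43:00Z" },
  { _id: 8, item: "abc", price: 10, quantity: 5, date: "2016-02-06T20:20:13Z" }
];

// Stage 1 (like $match): keep only documents dated in 2014.
const from = new Date("2014-01-01T00:00:00Z");
const to = new Date("2015-01-01T00:00:00Z");
const matched = sales.filter(s => {
  const d = new Date(s.date);
  return d >= from && d < to;
});

// Stage 2 (like $group): group by UTC calendar day and accumulate per group.
const byDay = new Map();
for (const s of matched) {
  const day = new Date(s.date).toISOString().slice(0, 10); // "YYYY-MM-DD"
  if (!byDay.has(day)) {
    byDay.set(day, { _id: day, totalSaleAmount: 0, quantitySum: 0, count: 0 });
  }
  const g = byDay.get(day);
  g.totalSaleAmount += s.price * s.quantity; // like $sum of $multiply
  g.quantitySum += s.quantity;               // running total used for the average
  g.count += 1;                              // like $sum: 1
}
const grouped = [...byDay.values()].map(g => ({
  _id: g._id,
  totalSaleAmount: g.totalSaleAmount,
  averageQuantity: g.quantitySum / g.count,  // like $avg
  count: g.count
}));

// Stage 3 (like $sort): descending by totalSaleAmount.
grouped.sort((a, b) => b.totalSaleAmount - a.totalSaleAmount);
console.log(grouped);
```

Running this with Node.js yields the same three groups, in the same order, as the aggregation output.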

Conclusion

This article introduced you to MongoDB along with the salient features that it offers. Furthermore, it introduced you to the MongoDB Groupby Aggregation method along with its syntax and example queries. As your business begins to grow, data is generated at an exponential rate across all of your company’s SaaS applications, Databases, and other sources.

To meet these growing storage and computing needs, you would need to invest a portion of your engineering bandwidth to Integrate data from all sources, Clean & Transform it, and finally load it to a Cloud Data Warehouse for further Business Analytics. All of these challenges can be efficiently handled by a Cloud-Based ETL tool such as Hevo Data.

Hevo Data, a No-code Data Pipeline, provides you with a consistent and reliable solution to manage data transfer between a variety of sources like MongoDB and a wide variety of Desired Destinations, with a few clicks. Hevo Data, with its strong integration with 150+ sources (including 40+ free sources), allows you to not only export data from your desired data sources & load it to the destination of your choice, but also transform & enrich your data to make it analysis-ready so that you can focus on your key business needs and perform insightful analysis using BI tools.

Want to take Hevo for a spin? Sign Up for a 14-day free trial and experience the feature-rich Hevo suite firsthand. Also check out our unbeatable pricing to choose the best plan for your organization.

Share with us your experience of learning about the MongoDB Groupby Aggregation Method in the comments below!

Former Research Analyst, Hevo Data

Rakesh is a research analyst at Hevo Data with more than three years of experience in the field. He specializes in technologies, including API integration and machine learning. The combination of technical skills and a flair for writing brought him to the field of writing on highly complex topics. He has written numerous articles on a variety of data engineering topics, such as data integration, data analytics, and data management. He enjoys simplifying difficult subjects to help data practitioners with their doubts related to data engineering.
