Ultimate MongoDB MapReduce Tutorial: Key Commands, Syntax & 4 Examples

Last Modified: December 29th, 2022

A NoSQL database such as MongoDB is a strong choice if you intend to develop an application that manages large volumes of data and needs high-performance storage in an easy-to-use environment.

MongoDB is among the most popular open-source NoSQL databases and is written primarily in C++. It is popular among agile development teams because of its flexibility: it stores data as documents, which makes it well suited to building highly available and scalable apps. Because MongoDB provides drivers for all major programming languages, you can start developing your application quickly in the language you already use.

The purpose of this article is to discuss MongoDB MapReduce so we can learn how MapReduce works with MongoDB. In addition, we will learn the MongoDB MapReduce commands, syntax, and examples.

What is MongoDB MapReduce?

Map-reduce is a data processing paradigm in MongoDB that enables you to process large data sets and produce aggregated results. Map-reduce operations in MongoDB are performed by the mapReduce() method, which takes two main functions: map and reduce. The map function groups the data by a key, and the reduce function performs an operation on the values grouped under each key.

mapReduce() is intended for large collections of data. With map-reduce, you can aggregate data using key-based operations such as max and avg, much like GROUP BY in SQL. The map function processes each document independently, the reduce function condenses the values collected for each key, and the combined results are returned inline or written to a collection. Because each document is mapped independently, the work can be split up and processed in parallel.
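To make the paradigm concrete, here is a minimal sketch in plain JavaScript that imitates the three steps (map, group by key, reduce) on an in-memory array. It is not how MongoDB implements map-reduce: the runMapReduce helper and the sample sales data are hypothetical, and emit is passed to the map function as an argument here, whereas in MongoDB it is available as a global. The reduce function computes a maximum per key, one of the key-based operations mentioned above.

```javascript
// Hypothetical in-memory stand-in for a collection (not from the article).
const sales = [
  { region: "east", amount: 40 },
  { region: "west", amount: 15 },
  { region: "east", amount: 25 },
  { region: "west", amount: 60 },
];

// Illustrative helper: imitates map -> group by key -> reduce.
// This is a teaching sketch, not MongoDB's implementation.
function runMapReduce(docs, mapFn, reduceFn) {
  const groups = new Map();
  // emit() collects each value under its key, as the map phase would.
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  // Map phase: call mapFn once per document, with `this` bound to the document.
  for (const doc of docs) mapFn.call(doc, emit);
  // Reduce phase: condense every key's values into a single value.
  const results = [];
  for (const [key, values] of groups) {
    results.push({ _id: key, value: reduceFn(key, values) });
  }
  return results;
}

const maxPerRegion = runMapReduce(
  sales,
  function (emit) { emit(this.region, this.amount); },
  (key, values) => Math.max(...values)
);
console.log(maxPerRegion);
```

Swapping the reduce function for a sum or a count gives the other group-by style aggregations without touching the map step.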

Simplify Data Analysis with Hevo’s No-code Data Pipeline

Hevo Data, a No-code Data Pipeline, helps load data from any data source such as databases, SaaS applications, cloud storage, SDKs, and streaming services, and simplifies the ETL process. It supports 100+ data sources (including 30+ free data sources) like Asana, and setup is a 3-step process: select the data source, provide valid credentials, and choose the destination. Hevo not only loads the data onto the desired data warehouse or destination but also enriches the data and transforms it into an analysis-ready form without your having to write a single line of code.

Its completely automated pipeline delivers data in real time without any loss from source to destination. Its fault-tolerant and scalable architecture ensures that the data is handled in a secure, consistent manner with zero data loss and supports different forms of data. The solutions provided are consistent and work with different BI tools as well.

Check out why Hevo is the Best:

  • Secure: Hevo has a fault-tolerant architecture that ensures that the data is handled in a secure, consistent manner with zero data loss.
  • Schema Management: Hevo takes away the tedious task of schema management & automatically detects the schema of incoming data and maps it to the destination schema.
  • Minimal Learning: Hevo, with its simple and interactive UI, is extremely simple for new customers to work on and perform operations.
  • Hevo Is Built To Scale: As the number of sources and the volume of your data grows, Hevo scales horizontally, handling millions of records per minute with very little latency.
  • Incremental Data Load: Hevo allows the transfer of data that has been modified in real-time. This ensures efficient utilization of bandwidth on both ends.
  • Live Support: The Hevo team is available round the clock to extend exceptional support to its customers through chat, email, and support calls.
  • Live Monitoring: Hevo allows you to monitor the data flow and check where your data is at a particular point in time.

MongoDB MapReduce Syntax and Parameters

The mapReduce method is called on a collection and takes a map function, a reduce function, and an options document.

Syntax:

db.collection_name.mapReduce(
   map_function,
   reduce_function,
   {
     out: <collection or action document>,
     query: <document>,
     sort: <document>,
     limit: <number>,
     finalize: <function>,
     scope: <document>,
     jsMode: <boolean>,
     verbose: <boolean>,
     collation: <document>
   }
)

Parameters:

The following is a description of each parameter:

  • Collection name: The name of the collection whose documents are the input to the mapReduce operation.
  • mapReduce: The method that runs the map-reduce operation. It takes a map function, which emits key-value pairs, and a reduce function, which condenses all the values emitted for a key into a single result.
  • Options: A document of additional parameters for the mapReduce command. The individual options are described below.
  • Out: Specifies where the results of the mapReduce operation are stored. On a primary member of a replica set, output can be written to a collection or returned inline; on secondary members, only inline output is available.
  • Query: The selection criteria, written with query operators, that determine which documents are passed to the map function.
  • Sort: Sorts the input documents. This option is mainly useful for optimization, for example sorting by the emit key so that fewer reduce operations are needed.
  • Limit: The maximum number of documents passed to the map function.
  • Finalize: An optional function that runs after the reduce function and modifies its output.
  • Scope: Specifies global variables that are accessible in the map, reduce, and finalize functions.
  • JsMode: Specifies whether intermediate data is converted into BSON format between the execution of the map and reduce functions. Defaults to false.
  • Verbose: Specifies whether timing information is included in the result. Defaults to false.
  • Collation: An optional parameter that specifies the collation (language-specific rules for string comparison, such as letter case and accent marks) to use during the mapReduce operation.
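As a rough illustration of how these parameters fit together, the sketch below builds an options document that could be passed as the third argument to mapReduce. Every value in it (the status filter, the sort key, the limit, the taxRate variable, and the order_totals collection name) is an assumption made up for this example, and the mapReduce call itself is left as a comment because it needs a running MongoDB shell.

```javascript
// Hypothetical options document for db.collection.mapReduce(map, reduce, options).
// All values below are illustrative assumptions, not taken from the article.
const options = {
  out: { merge: "order_totals" },   // merge results into an existing collection
  query: { status: "A" },           // only documents matching this filter are mapped
  sort: { cust_id: 1 },             // sort input by the emit key to cut down reduce calls
  limit: 1000,                      // cap the number of input documents
  finalize: function (key, reducedValue) {
    // Runs once per key after reduce; taxRate is supplied through `scope`
    // when the function executes inside MongoDB.
    return reducedValue * (1 + taxRate);
  },
  scope: { taxRate: 0.2 },          // global variables visible to map, reduce, and finalize
  jsMode: false,                    // convert intermediate data to BSON between map and reduce
  verbose: false,                   // the default
};

// In the mongo shell this would be used as:
// db.examples.mapReduce(mapFunction, reduceFunction, options);
console.log(Object.keys(options).join(", "));
```

Only out is needed in practice; the remaining fields can be left out when their defaults are acceptable.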

MongoDB MapReduce Examples

Let’s look at some examples of MongoDB MapReduce to gain a better understanding:

Example – 1

In a map-reduce operation, MongoDB applies the map phase to every input document, and the map function emits key-value pairs. For keys that have multiple values, MongoDB applies the reduce phase, which collects and condenses the aggregated data. It then stores the results in a collection.

If necessary, the output of the reduce function may be passed through a finalize function to further process the aggregation results. All map-reduce functions in MongoDB are JavaScript and run within the mongod process.

A map-reduce operation takes the documents of a single collection as input, and any sorting and limiting is performed before the map stage begins.
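The flow described in this example can be sketched in plain JavaScript. The snippet below is a simulation under stated assumptions, not MongoDB's implementation: the simulate function, its option names, and the sample documents are hypothetical. It applies limit before mapping, calls reduce only for keys that received more than one value, and then passes every key through finalize.

```javascript
// Hypothetical input documents (not from the article).
const docs = [
  { tag: "a", n: 1 },
  { tag: "b", n: 2 },
  { tag: "a", n: 3 },
  { tag: "c", n: 4 },
];

// Illustrative simulation of the stage order: limit -> map -> reduce -> finalize.
function simulate(input, { limit, map, reduce, finalize }) {
  const limited = input.slice(0, limit);      // limiting happens before mapping
  const groups = new Map();
  for (const doc of limited) {
    const [key, value] = map(doc);            // each document yields one key-value pair
    groups.set(key, [...(groups.get(key) || []), value]);
  }
  let reduceCalls = 0;
  const output = [];
  for (const [key, values] of groups) {
    let value = values[0];
    if (values.length > 1) {                  // reduce is skipped for single-value keys
      value = reduce(key, values);
      reduceCalls += 1;
    }
    output.push({ _id: key, value: finalize(key, value) });
  }
  return { output, reduceCalls };
}

const result = simulate(docs, {
  limit: 3,                                   // the { tag: "c" } document is never mapped
  map: (doc) => [doc.tag, doc.n],
  reduce: (key, values) => values.reduce((a, b) => a + b, 0),
  finalize: (key, value) => ({ total: value }),
});
console.log(JSON.stringify(result));
```

Because reduce is not guaranteed to run for every key, any step that must apply to all results belongs in finalize rather than in reduce.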

Example – 2

Next, we’ll examine another example from a collection named examples, which contains the following types of documents:

{
    _id: ObjectId("50a8240b927d5d8b5891743c"),
    cust_id: "a123",
    ord_date: new Date("Jan 04, 2019"),
    status: 'A',
    price: 25,
    items: [ { sku: "m", qty: 5, price: 2.5 },
          { sku: "n", qty: 5, price: 2.5 } ]
}

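Define the map function that will process each input document. Inside the function, this refers to the document being processed:

var mapFunction1 = function() {
    // Emit one (cust_id, price) pair per order document.
    emit(this.cust_id, this.price);
};

Define the corresponding reduce function with two arguments, CustId and Prices. Prices is an array of the price values emitted for that customer:

var reduceFunction1 = function(CustId, Prices) {
    // Sum all prices emitted for this customer.
    return Array.sum(Prices);
};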

Now perform map-reduce on the entire examples collection.

db.examples.mapReduce(
           mapFunction1,
           reduceFunction1,
           { out: "map_reduce_example" }
          )

The results are written to the map_reduce_example collection, with one document per cust_id holding that customer's total price.
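Map-reduce has been deprecated since MongoDB 5.0 in favor of the aggregation pipeline, so it may help to see the same per-customer total expressed that way. The sketch below only builds the pipeline as a plain array; the output collection name agg_example is a hypothetical choice, and the aggregate call is shown as a comment because it needs a live shell with the examples collection.

```javascript
// Aggregation pipeline that groups orders by cust_id and sums price,
// mirroring what mapFunction1 and reduceFunction1 compute.
const pipeline = [
  { $group: { _id: "$cust_id", value: { $sum: "$price" } } },
  { $out: "agg_example" }, // hypothetical output collection name
];

// In the mongo shell:
// db.examples.aggregate(pipeline);
console.log(JSON.stringify(pipeline, null, 2));
```

The $group stage plays the role of both map (choosing the key) and reduce (summing the values), and $out corresponds to the out option.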

Example – 3 

This example calculates the number of orders and the total quantity for each item, along with the average quantity per order.

It uses the same examples collection and groups the results by the sku field of each item.

Define a map function that processes each input document:

var mapFunction2 = function() {
    // Emit one (sku, { count, qty }) pair for every item in the order.
    for (var idx = 0; idx < this.items.length; idx++) {
        var key = this.items[idx].sku;
        var value = {
            count: 1,
            qty: this.items[idx].qty
        };
        emit(key, value);
    }
};

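Define a reduce function with two arguments, key and ObjVals. ObjVals is an array of the objects emitted for that key, and the returned object has the same shape as the emitted values:

var reduceFunction2 = function(key, ObjVals) {
    // Declare with var so reducedVal does not leak into the global scope.
    var reducedVal = { count: 0, qty: 0 };
    for (var idx = 0; idx < ObjVals.length; idx++) {
        reducedVal.count += ObjVals[idx].count;
        reducedVal.qty += ObjVals[idx].qty;
    }
    return reducedVal;
};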

Define a finalize function with two arguments, keys and reducedVal. It adds the average quantity per order to the reduced value:

var finalizeFunction2 = function (keys, reducedVal) {
           reducedVal.avg = reducedVal.qty/reducedVal.count;
           return reducedVal;
         };

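Following this, perform the map-reduce operation on the examples collection, restricted to orders placed after January 26, 2019:

db.examples.mapReduce(
    mapFunction2,
    reduceFunction2,
    {
        // Merge the results into the existing map_reduce_example collection.
        out: { merge: "map_reduce_example" },
        // Only process orders dated after 26 January 2019 (ISO date format).
        query: { ord_date: { $gt: new Date("2019-01-26") } },
        finalize: finalizeFunction2
    }
)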
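MongoDB may call the reduce function more than once for the same key, feeding earlier output back in as input, so the value returned by reduce needs the same shape as the values emitted by map. The quick check below runs in plain JavaScript, outside MongoDB, with hypothetical sample values and local copies of the reduce and finalize functions from this example. It verifies that reducing in two steps gives the same result as reducing all at once, and then applies the finalize step.

```javascript
// Local copy of the reduce function from this example, runnable outside the mongo shell.
function reduceFunction2(key, objVals) {
  var reducedVal = { count: 0, qty: 0 };
  for (var idx = 0; idx < objVals.length; idx++) {
    reducedVal.count += objVals[idx].count;
    reducedVal.qty += objVals[idx].qty;
  }
  return reducedVal;
}

// Local copy of the finalize function: adds the average quantity per order.
function finalizeFunction2(key, reducedVal) {
  reducedVal.avg = reducedVal.qty / reducedVal.count;
  return reducedVal;
}

// Hypothetical values emitted by the map function for one sku.
const emitted = [
  { count: 1, qty: 5 },
  { count: 1, qty: 10 },
  { count: 1, qty: 3 },
];

// Reduce everything in one call...
const allAtOnce = reduceFunction2("m", emitted);
// ...or reduce the first two values, then re-reduce that output with the third.
const partial = reduceFunction2("m", emitted.slice(0, 2));
const inTwoSteps = reduceFunction2("m", [partial, emitted[2]]);

console.log(allAtOnce, inTwoSteps);

// finalize runs once, after all reducing for the key is done.
const finalized = finalizeFunction2("m", allAtOnce);
console.log(finalized);
```

Computing the average inside reduce instead would break this property, since an average of averages is not the overall average; that is why the division is deferred to finalize.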

Example – 4 

Here, let us consider a school database with a students collection. Each document in the collection holds a student’s name and the marks they received in a particular subject. MapReduce will be used to total the marks for each student.

The students collection contains the following documents.

> db.students.find({});
{ "_id" : ObjectId("5a1f9ce431c157f3ec2aec39"), "name" : "Midhu", "subject" : "science", "marks" : 68 }
{ "_id" : ObjectId("5a1f9ce431c157f3ec2aec3a"), "name" : "Midhu", "subject" : "maths", "marks" : 98 }
{ "_id" : ObjectId("5a1f9ce431c157f3ec2aec3b"), "name" : "Midhu", "subject" : "sports", "marks" : 77 }
{ "_id" : ObjectId("5a1f9ce431c157f3ec2aec3c"), "name" : "Akhil", "subject" : "science", "marks" : 67 }
{ "_id" : ObjectId("5a1f9ce431c157f3ec2aec3d"), "name" : "Akhil", "subject" : "maths", "marks" : 87 }
{ "_id" : ObjectId("5a1f9ce431c157f3ec2aec3e"), "name" : "Akhil", "subject" : "sports", "marks" : 89 }
{ "_id" : ObjectId("5a1f9ce431c157f3ec2aec3f"), "name" : "Anish", "subject" : "science", "marks" : 67 }
{ "_id" : ObjectId("5a1f9ce431c157f3ec2aec40"), "name" : "Anish", "subject" : "maths", "marks" : 78 }
{ "_id" : ObjectId("5a1f9ce431c157f3ec2aec41"), "name" : "Anish", "subject" : "sports", "marks" : 90 }

Prepare Map function

The map function should emit a key-value pair for each document. Here, the name is the key, and marks are the value.

var map = function() {emit(this.name,this.marks);};

Prepare Reduce function

The reduce function receives a key and an array of the values emitted for that key, and returns a single value. Here, it adds up the marks for each name.

var reduce = function(name,marks) {return Array.sum(marks);};

Prepare MapReduce function

Call mapReduce on the students collection, passing the map and reduce functions along with an options document.

db.students.mapReduce(
   map,
   reduce,
   { out: "totals" }
);

out: "totals" means the results are written into the totals collection of the database.

Using Mongo Daemon

Using the following command, start the Mongo daemon.

~$ sudo mongod --port 27017 --dbpath /var/lib/mongodb

Mongo Daemon will now wait for connections on port 27017.

Run MapReduce

Use the Mongo shell to run the commands prepared above (the map function, the reduce function, and the mapReduce call).

> var map = function() {emit(this.name,this.marks);};
> var reduce = function(name,marks) {return Array.sum(marks);};
> db.students.mapReduce(
...    map,
...    reduce,
...    { out: "totals" }
... );
{
    "result" : "totals",
    "timeMillis" : 599,
    "counts" : {
        "input" : 9,
        "emit" : 9,
        "reduce" : 3,
        "output" : 3
    },
    "ok" : 1
}
> db.totals.find({})
{ "_id" : "Akhil", "value" : 243 }
{ "_id" : "Anish", "value" : 235 }
{ "_id" : "Midhu", "value" : 243 }

The marks have been aggregated (summed) for each key, and the results have been placed in the totals collection.
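The totals can be double-checked with a few lines of plain JavaScript. The snippet below copies the marks from the students listing above into an array and sums them per name; it uses Array.prototype.reduce because Array.sum is a helper provided by MongoDB's JavaScript environment rather than standard JavaScript. The variable names are illustrative.

```javascript
// Marks copied from the students collection shown above.
const students = [
  { name: "Midhu", marks: 68 }, { name: "Midhu", marks: 98 }, { name: "Midhu", marks: 77 },
  { name: "Akhil", marks: 67 }, { name: "Akhil", marks: 87 }, { name: "Akhil", marks: 89 },
  { name: "Anish", marks: 67 }, { name: "Anish", marks: 78 }, { name: "Anish", marks: 90 },
];

// Sum marks per name, mirroring emit(this.name, this.marks) followed by Array.sum(marks).
const totals = students.reduce((acc, { name, marks }) => {
  acc[name] = (acc[name] || 0) + marks;
  return acc;
}, {});

console.log(totals); // { Midhu: 243, Akhil: 243, Anish: 235 }
```

These sums match the values stored in the totals collection.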

Conclusion

The MongoDB mapReduce command is built on two functions, map and reduce. It is a technique for processing large data sets and producing aggregated results from them.

This guide discussed the mapReduce command and how it works in MongoDB, along with its syntax, parameters, and several examples.

MongoDB is a trusted data source that many companies use because of its many benefits, but transferring data from it into a data warehouse can be a hectic task. An automated data pipeline helps solve this issue, and this is where Hevo comes into the picture. Hevo Data is a No-code Data Pipeline with 100+ pre-built integrations that you can choose from.

Hevo can help you integrate your data from numerous sources and load it into a destination so you can analyze real-time data with a BI tool such as Tableau. It will make your life easier and data migration hassle-free. It is user-friendly, reliable, and secure.

SIGN UP for a 14-day free trial and see the difference!

Share your experience of learning about MongoDB MapReduce in the comments section below.

Samuel Salimon
Freelance Technical Content Writer, Hevo Data

Samuel specializes in freelance writing within the data industry, adeptly crafting informative and engaging content centered on data science by merging his problem-solving skills.