Conducting MongoDB Query Performance Analysis Simplified 101



One of the common challenges every growing business faces is efficiently handling exponentially growing data. Alongside Traditional Relational Databases, organizations now use Document-oriented, Open-source NoSQL Databases. There are several NoSQL databases available, but MongoDB is one of the most widely used, and it is available both as a Cloud Service and for Deployment on Self-Managed Systems.

In this article, you will learn about MongoDB Query Performance Analysis. You will also gain a holistic understanding of MongoDB, its key features, how to conduct MongoDB Query Performance Analysis, and ways to optimize Query Performance. Read along for an in-depth look at MongoDB Query Performance Analysis.


What is MongoDB?

MongoDB is a schema-free NoSQL database developed by MongoDB Inc. It is written mainly in C++, with some JavaScript. It stores data as Documents grouped into Collections, and it also lets you define schemas (validation rules) when you need them. Unlike a traditional relational database, it does not store data in the form of rows.

Like a general-purpose RDBMS, MongoDB aims to be easy to use, including for people with little or no prior programming knowledge. It works with data in a semi-structured format, which lets it process large volumes of data efficiently. It can be hosted on most major cloud platforms, including Google Cloud, Microsoft Azure, and Amazon Web Services.

MongoDB stores data as Binary JSON (BSON) and uses the MongoDB Query Language (MQL) as an alternative to SQL. BSON supports data types that plain JSON cannot represent natively, such as distinct integer (int, long) and floating-point (double, decimal) types, and dates. MQL is designed around JSON-like documents, which makes it a natural fit for querying MongoDB data.

In MongoDB, data is stored in BSON (Binary JSON) documents, and each document is essentially a set of key-value pairs. Because MongoDB stores schemaless data easily, it is well suited to capturing data whose structure is not known in advance. This document-oriented approach is designed to fit naturally with modern programming techniques.


Key Features of MongoDB

[Figure: MongoDB Architecture]

The main features that make MongoDB unique are:

1) High Performance

Data operations on MongoDB are fast and easy because of its NoSQL nature. Data can be stored, manipulated, and retrieved quickly without compromising data integrity.

2) Scalability

MongoDB can distribute data quickly and evenly across a cluster of machines, which helps it handle growing data volumes in the Big Data era. Sharding is the process MongoDB uses to scale horizontally, partitioning data across multiple servers as the size of the data increases.

3) Availability

Data is highly available in MongoDB because it keeps multiple copies of the same data on different servers (a replica set). If one server fails, the data can still be served from another server with minimal interruption.

4) Flexibility

MongoDB can be combined with other Database Management Systems, both SQL and NoSQL. Its document-oriented structure gives MongoDB a dynamic, flexible schema, so different types of data can be stored and manipulated easily.


Simplify MongoDB ETL with Hevo’s No-code Data Pipeline

A fully managed No-code Data Pipeline platform like Hevo Data helps you integrate and load data from 150+ different sources (including 40+ free sources), such as MongoDB, to a Data Warehouse or Destination of your choice in real time, effortlessly. With its minimal learning curve, Hevo can be set up in just a few minutes, allowing users to load data without compromising performance. Its strong integration with numerous sources lets users bring in different kinds of data smoothly without writing a single line of code.

Its completely automated pipeline delivers data in real time without any loss from source to destination. Its fault-tolerant and scalable architecture ensures that data is handled in a secure, consistent manner with zero data loss, and it supports different forms of data. The solutions provided are consistent and work with different Business Intelligence (BI) tools as well.


Check out why Hevo is the Best:

  • Secure: Hevo has a fault-tolerant architecture that ensures that the data is handled in a secure, consistent manner with zero data loss.
  • Schema Management: Hevo takes away the tedious task of schema management & automatically detects the schema of incoming data and maps it to the destination schema.
  • Minimal Learning: Hevo, with its simple and interactive UI, is extremely simple for new customers to work on and perform operations.
  • Hevo Is Built To Scale: As the number of sources and the volume of your data grows, Hevo scales horizontally, handling millions of records per minute with very little latency.
  • Incremental Data Load: Hevo transfers modified data in real time. This ensures efficient utilization of bandwidth on both ends.
  • Live Support: The Hevo team is available round the clock to extend exceptional support to its customers through chat, email, and support calls.
  • Live Monitoring: Hevo allows you to monitor the data flow and check where your data is at a particular point in time.

How to Conduct MongoDB Query Performance Analysis?

Conducting MongoDB Query Performance Analysis can be difficult if you don’t know which aspects to measure. Fortunately, MongoDB provides a very useful tool for it: the explain() method run in "executionStats" mode. It reports general measurements, such as the number of documents examined and the execution time, which you can use to analyze a query's performance.

The cursor.explain("executionStats") and db.collection.explain("executionStats") methods provide statistics about the performance of a query. These statistics help you determine whether and how a query uses an index.

For further information, see the db.collection.explain() documentation.
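As a rough aid for reading these statistics, the sketch below condenses an explain("executionStats") result into a few numbers worth checking. summarizeExplain is a hypothetical helper written for this article, not a MongoDB API. It assumes the result has the queryPlanner.winningPlan and executionStats layout used in this article's examples and follows only single inputStage links; the exact layout varies between server versions (some newer versions may nest the plan differently), so treat it as a sketch.

```javascript
// Hypothetical helper (not a MongoDB API): summarizes an explain("executionStats") result.
// Assumes the winning plan is a chain linked by single `inputStage` fields;
// plans that branch through `inputStages` arrays (e.g. for $or) are not walked.
function summarizeExplain(explain) {
  const stats = explain.executionStats;

  // Collect stage names from the top of the winning plan down, e.g. ["FETCH", "IXSCAN"].
  const stages = [];
  for (let stage = explain.queryPlanner.winningPlan; stage; stage = stage.inputStage) {
    stages.push(stage.stage);
  }

  return {
    stages,
    usesIndex: stages.includes("IXSCAN"),
    collectionScan: stages.includes("COLLSCAN"),
    nReturned: stats.nReturned,
    keysExamined: stats.totalKeysExamined,
    docsExamined: stats.totalDocsExamined,
    // Documents examined per document returned: 1 is ideal; null if nothing was returned.
    docsPerResult: stats.nReturned > 0 ? stats.totalDocsExamined / stats.nReturned : null,
  };
}

// Trimmed copy of the COLLSCAN output from the "Query with No Index" example.
const collscanExplain = {
  queryPlanner: { winningPlan: { stage: "COLLSCAN" } },
  executionStats: { nReturned: 3, totalKeysExamined: 0, totalDocsExamined: 10 },
};
console.log(summarizeExplain(collscanExplain));
```

Because mongosh accepts JavaScript function definitions, the same function could also be applied to a live result, for example summarizeExplain(db.inventory.find( { quantity: { $gte: 100, $lte: 200 } } ).explain("executionStats")). A docsPerResult well above 1 suggests the query is doing work an index might avoid.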

Conducting MongoDB Query Performance Analysis

MongoDB Query Performance Analysis is illustrated below with two cases: the same query run without an index and with an index.

Let us consider a collection named inventory with the following documents:

{ "_id" : 1, "item" : "f1", type: "food", quantity: 500 }
{ "_id" : 2, "item" : "f2", type: "food", quantity: 100 }
{ "_id" : 3, "item" : "p1", type: "paper", quantity: 200 }
{ "_id" : 4, "item" : "p2", type: "paper", quantity: 150 }
{ "_id" : 5, "item" : "f3", type: "food", quantity: 300 }
{ "_id" : 6, "item" : "t1", type: "toys", quantity: 500 }
{ "_id" : 7, "item" : "a1", type: "apparel", quantity: 250 }
{ "_id" : 8, "item" : "a2", type: "apparel", quantity: 400 }
{ "_id" : 9, "item" : "t2", type: "toys", quantity: 50 }
{ "_id" : 10, "item" : "f4", type: "food", quantity: 75 }

1) Query with No Index

The following query returns Documents with quantity values ranging from 100 to 200, inclusive:

db.inventory.find( { quantity: { $gte: 100, $lte: 200 } } )

The following Documents are returned by the above query:

{ "_id" : 2, "item" : "f2", "type" : "food", "quantity" : 100 }
{ "_id" : 3, "item" : "p1", "type" : "paper", "quantity" : 200 }
{ "_id" : 4, "item" : "p2", "type" : "paper", "quantity" : 150 }

To view the query plan selected, chain the cursor.explain("executionStats") method to the end of the find command:

db.inventory.find(
   { quantity: { $gte: 100, $lte: 200 } }
).explain("executionStats")

explain() returns the following results (abbreviated):

{
   "queryPlanner" : {
         "plannerVersion" : 1,
         ...
         "winningPlan" : {
            "stage" : "COLLSCAN",
            ...
         }
   },
   "executionStats" : {
      "executionSuccess" : true,
      "nReturned" : 3,
      "executionTimeMillis" : 0,
      "totalKeysExamined" : 0,
      "totalDocsExamined" : 10,
      "executionStages" : {
         "stage" : "COLLSCAN",
         ...
      },
      ...
   },
   ...
}
  • COLLSCAN, shown in queryPlanner.winningPlan.stage, indicates a Collection scan. A Collection scan means that the mongod had to scan the entire Collection, Document by Document, to identify the results. This is generally an expensive operation and can result in slow queries.
  • The value 3 in executionStats.nReturned indicates that the query matches and returns three documents.
  • The value 0 in executionStats.totalKeysExamined indicates that this query does not use an index.
  • The value 10 in executionStats.totalDocsExamined indicates that MongoDB had to scan all 10 documents in the collection to find the three matching documents.

The difference between the number of matching documents and the number of examined documents indicates that the query could benefit from the use of an Index to improve efficiency.
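To make the COLLSCAN numbers concrete, here is a toy JavaScript model of a collection scan over the same ten inventory documents. It is not how mongod is implemented internally; collScan is a hypothetical function that simply shows why totalDocsExamined equals the collection size when no index can be used.

```javascript
// The ten inventory documents from the example above.
const inventory = [
  { _id: 1, item: "f1", type: "food", quantity: 500 },
  { _id: 2, item: "f2", type: "food", quantity: 100 },
  { _id: 3, item: "p1", type: "paper", quantity: 200 },
  { _id: 4, item: "p2", type: "paper", quantity: 150 },
  { _id: 5, item: "f3", type: "food", quantity: 300 },
  { _id: 6, item: "t1", type: "toys", quantity: 500 },
  { _id: 7, item: "a1", type: "apparel", quantity: 250 },
  { _id: 8, item: "a2", type: "apparel", quantity: 400 },
  { _id: 9, item: "t2", type: "toys", quantity: 50 },
  { _id: 10, item: "f4", type: "food", quantity: 75 },
];

// Toy collection scan: every document is examined, whether or not it matches.
function collScan(docs, min, max) {
  const results = [];
  let docsExamined = 0;
  for (const doc of docs) {
    docsExamined++; // each document has to be read and tested against the filter
    if (doc.quantity >= min && doc.quantity <= max) {
      results.push(doc);
    }
  }
  return { results, keysExamined: 0, docsExamined };
}

const scan = collScan(inventory, 100, 200);
console.log(scan.results.map((doc) => doc._id), scan.keysExamined, scan.docsExamined); // [ 2, 3, 4 ] 0 10
```

The cost grows with the size of the collection, not with the number of matches, which is exactly the gap between 10 examined and 3 returned.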

2) Query with Index

You can add an Index on the quantity field to support the query on that field.

After creating the Index, you can use the explain() method to view the query plan statistics:

db.inventory.createIndex( { quantity: 1 } )
db.inventory.find(
   { quantity: { $gte: 100, $lte: 200 } }
).explain("executionStats")

explain() returns the following results (abbreviated):

{
   "queryPlanner" : {
         "plannerVersion" : 1,
         ...
         "winningPlan" : {
               "stage" : "FETCH",
               "inputStage" : {
                  "stage" : "IXSCAN",
                  "keyPattern" : {
                     "quantity" : 1
                  },
                  ...
               }
         },
         "rejectedPlans" : [ ]
   },
   "executionStats" : {
         "executionSuccess" : true,
         "nReturned" : 3,
         "executionTimeMillis" : 0,
         "totalKeysExamined" : 3,
         "totalDocsExamined" : 3,
         "executionStages" : {
            ...
         },
         ...
   },
   ...
}
  • IXSCAN is displayed by queryPlanner.winningPlan.inputStage.stage to indicate index use.
  • The value 3 in executionStats.nReturned indicates that the query matches and returns three Documents.
  • MongoDB scanned three index entries, as indicated by the value 3 in executionStats.totalKeysExamined. The number of keys examined corresponds to the number of documents returned, indicating that the mongod only needed to examine index keys to return the results. The mongod didn’t have to scan all of the documents, and only the three that matched had to be pulled into memory. As a result, the query is highly efficient.
  • executionStats.totalDocsExamined displays 3, which indicates that MongoDB scanned three documents.

Without the index, the query would scan the entire Collection of ten Documents to return 3 matching Documents. It would also have to examine each Document in full, potentially pulling each one into memory. As a result, the query operation becomes costly and potentially slow.
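The indexed plan can be modeled the same way. In this hypothetical sketch, the index is an array of { key, id } entries sorted by quantity. A binary search seeks to the first key of at least 100 (the IXSCAN), and only the entries inside the range lead to document lookups (the FETCH). buildIndex, lowerBound, and ixScanFetch are illustrative names, and the model is a simplification: it counts only the keys inside the range, whereas a real server's counters are produced by its own execution engine.

```javascript
// The ten inventory documents from the example above.
const inventory = [
  { _id: 1, item: "f1", type: "food", quantity: 500 },
  { _id: 2, item: "f2", type: "food", quantity: 100 },
  { _id: 3, item: "p1", type: "paper", quantity: 200 },
  { _id: 4, item: "p2", type: "paper", quantity: 150 },
  { _id: 5, item: "f3", type: "food", quantity: 300 },
  { _id: 6, item: "t1", type: "toys", quantity: 500 },
  { _id: 7, item: "a1", type: "apparel", quantity: 250 },
  { _id: 8, item: "a2", type: "apparel", quantity: 400 },
  { _id: 9, item: "t2", type: "toys", quantity: 50 },
  { _id: 10, item: "f4", type: "food", quantity: 75 },
];

// Toy single-field index: { key, id } entries sorted by key (ties broken by _id).
function buildIndex(docs, field) {
  return docs
    .map((doc) => ({ key: doc[field], id: doc._id }))
    .sort((a, b) => a.key - b.key || a.id - b.id);
}

// Binary search: position of the first entry whose key is >= min.
function lowerBound(index, min) {
  let lo = 0;
  let hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (index[mid].key < min) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Toy IXSCAN + FETCH: seek to the start of the range, walk keys in order,
// and fetch only the documents whose keys fall within [min, max].
function ixScanFetch(docs, index, min, max) {
  const byId = new Map(docs.map((doc) => [doc._id, doc])); // stands in for document storage
  const results = [];
  let keysExamined = 0;
  let docsExamined = 0;
  for (let i = lowerBound(index, min); i < index.length && index[i].key <= max; i++) {
    keysExamined++; // one index entry inside the range (IXSCAN)
    results.push(byId.get(index[i].id)); // load the referenced document (FETCH)
    docsExamined++;
  }
  return { results, keysExamined, docsExamined };
}

const quantityIndex = buildIndex(inventory, "quantity");
const indexed = ixScanFetch(inventory, quantityIndex, 100, 200);
console.log(indexed.results.map((doc) => doc._id), indexed.keysExamined, indexed.docsExamined); // [ 2, 4, 3 ] 3 3
```

In this model the results come back in index-key order (quantity 100, 150, 200), and the work done tracks the number of matches rather than the collection size.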

MongoDB Query Performance Analysis: How to Optimize?

MongoDB Query performance can be optimized in the following ways.

1) Create Indexes to Support Queries

You can create Indexes for commonly used queries. If a query searches multiple fields, you can create a Compound Index. Indexes are preferable because scanning an Index is usually much faster than scanning a Collection: Index structures are smaller than the Documents they reference, and they store those references in order of the indexed values.

For example, suppose you have a Collection named posts that contains blog posts. If you regularly run queries on the author_name field, you can optimize them by creating an Index on the author_name field:

db.posts.createIndex( { author_name : 1 } )
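For the multi-field case mentioned above, a Compound Index such as db.inventory.createIndex( { type: 1, quantity: 1 } ) could support queries that filter on type and quantity together (an illustrative choice of fields, reusing the inventory collection from the earlier analysis). The toy model below, with hypothetical function names, shows why such an index helps: entries are ordered by type first and quantity second, so an equality match on type plus a range on quantity forms one contiguous run that a scan can seek to and stop at.

```javascript
// The ten inventory documents from the earlier analysis.
const inventory = [
  { _id: 1, item: "f1", type: "food", quantity: 500 },
  { _id: 2, item: "f2", type: "food", quantity: 100 },
  { _id: 3, item: "p1", type: "paper", quantity: 200 },
  { _id: 4, item: "p2", type: "paper", quantity: 150 },
  { _id: 5, item: "f3", type: "food", quantity: 300 },
  { _id: 6, item: "t1", type: "toys", quantity: 500 },
  { _id: 7, item: "a1", type: "apparel", quantity: 250 },
  { _id: 8, item: "a2", type: "apparel", quantity: 400 },
  { _id: 9, item: "t2", type: "toys", quantity: 50 },
  { _id: 10, item: "f4", type: "food", quantity: 75 },
];

// Order entries of a compound index on { type: 1, quantity: 1 }:
// by type first, then by quantity among entries with the same type.
function compareCompound(a, b) {
  if (a.type !== b.type) return a.type < b.type ? -1 : 1;
  return a.quantity - b.quantity;
}

// Toy compound index: one { type, quantity, id } entry per document, kept sorted.
const typeQuantityIndex = inventory
  .map((doc) => ({ type: doc.type, quantity: doc.quantity, id: doc._id }))
  .sort(compareCompound);

// Binary search: position of the first entry that sorts at or after `probe`.
function lowerBound(index, probe) {
  let lo = 0;
  let hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (compareCompound(index[mid], probe) < 0) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Equality on type plus a range on quantity: seek to (type, min) and stop at the
// first entry with a different type or a quantity above max.
function scanTypeAndRange(index, type, min, max) {
  const ids = [];
  let keysExamined = 0;
  for (
    let i = lowerBound(index, { type, quantity: min });
    i < index.length && index[i].type === type && index[i].quantity <= max;
    i++
  ) {
    keysExamined++;
    ids.push(index[i].id);
  }
  return { ids, keysExamined };
}

console.log(scanTypeAndRange(typeQuantityIndex, "food", 100, 400)); // { ids: [ 2, 5 ], keysExamined: 2 }
```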

Indexes also improve the efficiency of queries that regularly sort on a specific field.

For example, if you regularly run queries that sort on the timestamp field, you can optimize them by creating an Index on the timestamp field.

  • Creating this Index:
db.posts.createIndex( { timestamp : 1 } )
  • Optimizes this query:
db.posts.find().sort( { timestamp : -1 } )

The direction of a single-key Index is irrelevant because MongoDB can read Indexes in both ascending and descending order.
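A small model of that point: with hypothetical post data and plain numbers standing in for real timestamps, an ascending index can serve sort( { timestamp : -1 } ) simply by being walked from its far end, so no separate sort step over the documents is needed. indexOrder is an illustrative name, not a MongoDB function.

```javascript
// Hypothetical posts; plain numbers stand in for real Date timestamps.
const posts = [
  { _id: 1, title: "Indexing basics", timestamp: 300 },
  { _id: 2, title: "Reading explain output", timestamp: 100 },
  { _id: 3, title: "Compound indexes", timestamp: 500 },
  { _id: 4, title: "Projections", timestamp: 200 },
];

// Ascending single-field index on timestamp: { key, id } entries sorted by key.
const timestampIndex = posts
  .map((doc) => ({ key: doc.timestamp, id: doc._id }))
  .sort((a, b) => a.key - b.key);

// Return _ids in index order; direction -1 walks the same index from the end.
function indexOrder(index, direction) {
  const ids = [];
  if (direction === 1) {
    for (let i = 0; i < index.length; i++) ids.push(index[i].id);
  } else {
    for (let i = index.length - 1; i >= 0; i--) ids.push(index[i].id);
  }
  return ids;
}

console.log(indexOrder(timestampIndex, 1));  // [ 2, 4, 1, 3 ]
console.log(indexOrder(timestampIndex, -1)); // [ 3, 1, 4, 2 ]
```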

2) Limit the Number of Query Results to Reduce Network Demand

Cursors in MongoDB return results in batches of multiple documents. If you know how many results you want, you can use the limit() method to reduce the demand on network resources.

This is typically used in combination with sort operations. For example, if you only need 10 results from your query to the posts collection, you can issue the following command:

db.posts.find().sort( { timestamp : -1 } ).limit(10)
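As a back-of-the-envelope model of why this matters, suppose (hypothetically) that 5,000 posts match, the client reads every result, and each batch carries 100 documents. Real batch sizes depend on the server and driver, so batchSize here is an illustrative parameter and cursorTransfer is a hypothetical function. As in MongoDB, a limit of 0 is treated as no limit.

```javascript
// Toy estimate of what a client receives while iterating a cursor to the end.
function cursorTransfer(matchingDocs, batchSize, limit = 0) {
  // limit 0 means "no limit"; otherwise at most `limit` documents are sent.
  const documentsSent = limit > 0 ? Math.min(matchingDocs, limit) : matchingDocs;
  // Each batch arrives as a separate reply from the server.
  const batches = Math.ceil(documentsSent / batchSize);
  return { documentsSent, batches };
}

console.log(cursorTransfer(5000, 100));     // { documentsSent: 5000, batches: 50 }
console.log(cursorTransfer(5000, 100, 10)); // { documentsSent: 10, batches: 1 }
```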

3) Use Projections to Return Only Necessary Data

When you only need a subset of fields from a Document, you can improve performance by returning only the fields you require.

For example, if you only need the timestamp, title, author, and abstract fields in your query to the posts collection, you can issue the following command:

db.posts.find( {}, { timestamp : 1 , title : 1 , author : 1 , abstract : 1} ).sort( { timestamp : -1 } )
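The effect of a projection can be sketched with a hypothetical project function that handles only top-level inclusion projections like the one above. It mirrors one MongoDB rule worth remembering: _id is returned unless the projection explicitly sets it to 0. The post document is invented for illustration, and JSON length is used only as a rough proxy for the bytes sent over the network.

```javascript
// Toy inclusion projection for top-level fields only.
function project(doc, spec) {
  const out = {};
  // _id is included by default unless the projection excludes it with _id: 0.
  if (spec._id !== 0 && "_id" in doc) out._id = doc._id;
  for (const [field, include] of Object.entries(spec)) {
    if (field !== "_id" && include && field in doc) out[field] = doc[field];
  }
  return out;
}

// Hypothetical post with a large body that a listing page does not need.
const post = {
  _id: 1,
  timestamp: 300,
  title: "Query tuning",
  author: "Jane Doe",
  abstract: "How to read explain output.",
  body: "x".repeat(10000),
};

const slim = project(post, { timestamp: 1, title: 1, author: 1, abstract: 1 });
console.log(Object.keys(slim)); // [ '_id', 'timestamp', 'title', 'author', 'abstract' ]
console.log(JSON.stringify(post).length, JSON.stringify(slim).length);
```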

4) Use hint() to Select a Particular Index

In most cases, the Query Optimizer chooses the best Index for a given operation. You can, however, use the hint() method to force MongoDB to use a specific Index. You can also use hint() to support performance testing, or for queries on a field or fields that are included in multiple Indexes.
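In the mongo shell, a hinted query would look something like db.posts.find( { author_name: "Jane Doe" } ).hint( { author_name : 1 } ), where the author value is hypothetical. The toy sketch below models only the override behavior: normally the plan with the lowest measured work wins, but a hint naming an index forces that plan, and a hint that matches no index is an error. MongoDB's real planner is more involved (it trial-runs candidate plans and caches the winner), so choosePlan, the candidate list, and the estimatedWork numbers are all invented for illustration. The index names follow MongoDB's default field_direction naming, such as author_name_1.

```javascript
// Toy plan selection (not MongoDB's planner): lowest estimated work wins unless hinted.
function choosePlan(candidates, hintIndexName) {
  if (hintIndexName !== undefined) {
    const forced = candidates.find((plan) => plan.indexName === hintIndexName);
    if (!forced) {
      // MongoDB likewise rejects a hint that does not correspond to an existing index.
      throw new Error(`hint ${hintIndexName} does not match any index`);
    }
    return forced;
  }
  return candidates.reduce((best, plan) => (plan.estimatedWork < best.estimatedWork ? plan : best));
}

// Hypothetical candidate plans for a query that filters on author_name and sorts on timestamp.
const candidates = [
  { indexName: "author_name_1", estimatedWork: 12 },
  { indexName: "timestamp_1", estimatedWork: 40 },
];

console.log(choosePlan(candidates).indexName);                // author_name_1
console.log(choosePlan(candidates, "timestamp_1").indexName); // timestamp_1
```

Forcing each plan in turn and comparing their explain("executionStats") output is one way hint() can support the performance testing mentioned above.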

5) Use the Increment Operator to Perform Operations Server-Side

To increment or decrement values in Documents, you can use MongoDB’s $inc operator. Instead of selecting a Document, making simple changes in the client, and then writing the entire Document back to the server, the operator increments the value of the field on the server.

The $inc operator can also help avoid race conditions, which can occur when two application instances query for the same document, manually increment a field, and save the entire document back at the same time.
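The race condition can be reproduced deterministically in plain JavaScript. The store Map is a hypothetical stand-in for a collection, and the two "clients" are interleaved by hand (this code runs on a single thread, so the interleaving is scripted rather than truly concurrent). The read-modify-write path loses an update; the $inc-style path, which applies the increment to the stored value in one step, does not. In the mongo shell, the server-side version would be an update such as db.posts.updateOne( { _id : 1 }, { $inc : { views : 1 } } ), where views is a hypothetical field.

```javascript
// Hypothetical stand-in for a collection: _id -> stored document.
const store = new Map([[1, { _id: 1, views: 0 }]]);

// Client-side steps: read a copy of the document, later write a whole document back.
function readDoc(id) {
  return { ...store.get(id) }; // shallow copy, like a document received by the client
}
function replaceDoc(doc) {
  store.set(doc._id, { ...doc }); // overwrites whatever is stored now
}

// Server-side increment in the spirit of $inc: one step applied to the stored value.
function incField(id, field, amount) {
  store.get(id)[field] += amount;
}

// Two clients interleave: both read before either writes back.
const clientA = readDoc(1);
const clientB = readDoc(1);
clientA.views += 1;
clientB.views += 1;
replaceDoc(clientA);
replaceDoc(clientB); // B overwrites A's write, so one increment is lost
const afterReadModifyWrite = store.get(1).views;

// Reset, then apply the same two increments server-side.
store.set(1, { _id: 1, views: 0 });
incField(1, "views", 1);
incField(1, "views", 1);
const afterServerSideInc = store.get(1).views;

console.log(afterReadModifyWrite, afterServerSideInc); // 1 2
```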

Conclusion

In this article, you have learned about MongoDB Query Performance Analysis. This article also covered MongoDB, its key features, how to conduct MongoDB Query Performance Analysis, and ways to optimize Query Performance in detail.

Hevo Data, a No-code Data Pipeline provides you with a consistent and reliable solution to manage data transfer between a variety of sources and a wide variety of Desired Destinations with a few clicks.


Hevo Data with its strong integration with 150+ data sources (including 40+ Free Sources) allows you to not only export data from your desired data sources & load it to the destination of your choice but also transform & enrich your data to make it analysis-ready. Hevo also allows integrating data from non-native sources using Hevo’s in-built Webhooks Connector. You can then focus on your key business needs and perform insightful analysis using BI tools. 

Want to give Hevo a try?

Sign up for a 14-day free trial and experience the feature-rich Hevo suite first hand. You may also have a look at Hevo's pricing, which will help you select the best plan for your requirements.

Share your experience of understanding MongoDB Query Performance Analysis in the comment section below! We would love to hear your thoughts.

Manisha Jena
Research Analyst, Hevo Data

Manisha is a data analyst with experience in diverse data tools like Snowflake, Google BigQuery, SQL, and Looker. She has hands-on experience using the data analytics stack to solve various problems through analysis. Manisha has written more than 100 articles on diverse topics related to the data industry. Her quest for creative problem solving through technical content writing, and the chance to help data practitioners with their day-to-day challenges, keep her writing.
