MongoDB Configuration 101: Best Practices to Optimize your Files & Services



MongoDB is a popular NoSQL database. Unlike relational database management systems such as MySQL, MongoDB doesn’t organize data into rows and columns. Instead, it stores data in JSON-like documents. This lets MongoDB hold many kinds of data, including data that doesn’t fit neatly into relations. MongoDB also scales well and offers high performance, which is why it is one of the NoSQL database management systems that application developers prefer most.

However, as with other database management systems, a number of issues can occur and degrade the performance of MongoDB, which in turn hurts the performance of your application. That’s why you should learn how to fine-tune your MongoDB configuration to get the best performance.

In this article, you will learn how to check the MongoDB configuration settings and apply the best practices that can help you to optimize your files and services. 


What is MongoDB?

MongoDB is a popular free and open-source cross-platform document-oriented database built for efficiently storing and processing massive volumes of data. Unlike traditional relational databases, MongoDB is classified as a NoSQL Database Management System that uses Collections and JSON-like Documents instead of tables consisting of rows and columns. Each collection consists of multiple documents, and each document stores its data as key-value pairs.

Officially introduced under an open-source development model in 2009, the MongoDB database is designed, maintained, and managed by MongoDB Inc. under a combination of the Server Side Public License and the Apache License. MongoDB is widely used by organizations such as MetLife, Barclays, Viacom, the New York Times, Facebook, Nokia, eBay, Adobe, and Google to meet their rapidly growing data processing and storage requirements. MongoDB is highly flexible: it provides drivers for several programming languages such as C, C++, C#, Go, Java, Node.js, Perl, PHP, Python, Ruby, Scala, and Swift, and it works with libraries such as Motor (an asynchronous Python driver) and Mongoid (an object-document mapper for Ruby).

Key Features of MongoDB


With constant efforts from the online community, MongoDB has evolved over the years. Some of its eye-catching features are:

  • High Data Availability & Stability: MongoDB’s Replication feature provides multiple servers for disaster recovery and backup. Since several servers store the same data or shards of data, MongoDB provides greater data availability & stability. This ensures all-time data access and security in case of server crashes, service interruptions, or even good old hardware failure. 
  • Accelerated Analytics: Ad-hoc queries can involve thousands to millions of variables. MongoDB indexes BSON documents and uses the MongoDB Query Language (MQL), which lets you run and refine ad-hoc queries in real time. MongoDB supports field queries, range queries, and regular expression searches, along with user-defined functions.
  • Indexing: MongoDB offers a wide range of index types, including indexes with language-specific sort orders, to support complex access patterns to datasets and keep queries fast. Because query patterns and application requirements keep evolving, MongoDB also lets you create indexes on demand.
  • Horizontal Scalability: With the help of Sharding, MongoDB provides horizontal scalability by distributing data across multiple servers using the Shard Key. Each shard in a MongoDB Cluster stores part of the data and acts as a separate database. Together, the shards handle growing volumes of data efficiently and with zero downtime. Queries go through the mongos router, which directs each one to the correct shard based on the Shard Key.
  • Load Balancing: Real-time Replication and Sharding contribute towards large-scale Load Balancing. Ensuring top-notch Concurrency Controls and Locking Protocols, MongoDB can effectively handle multiple concurrent read and write requests for the same data.  
  • Aggregation: Similar to the SQL GROUP BY clause, MongoDB can batch process data and present a single result, even after running several other operations on the grouped data. MongoDB offers three kinds of aggregation: the Aggregation Pipeline, the Map-Reduce function (deprecated since MongoDB 5.0 in favor of the pipeline), and Single-Purpose Aggregation methods.
Simplify MongoDB ETL with Hevo’s No-code Data Pipeline

Hevo Data, a No-code Data Pipeline, helps to load data from any data source such as Databases, SaaS applications, Cloud Storage, SDKs, and Streaming Services, and simplifies the ETL process. It supports MongoDB & MongoDB Atlas, along with 150+ data sources (including 40+ Free Data Sources). Setup is a 3-step process: select the data source, provide valid credentials, and choose the destination. Hevo not only loads the data onto the desired Data Warehouse but also enriches the data and transforms it into an analysis-ready form without your having to write a single line of code.

Its completely automated pipeline delivers data in real-time without any loss from source to destination. Its fault-tolerant and scalable architecture ensures that the data is handled in a secure, consistent manner with zero data loss and supports different forms of data. The solutions provided are consistent and work with different BI tools as well.


Check out why Hevo is the Best:

  • Secure: Hevo has a fault-tolerant architecture that ensures that the data is handled in a secure, consistent manner with zero data loss.
  • Schema Management: Hevo takes away the tedious task of schema management & automatically detects the schema of incoming data and maps it to the destination schema.
  • Minimal Learning: Hevo, with its simple and interactive UI, is extremely simple for new customers to work on and perform operations.
  • Connectors: Hevo supports 150+ Integrations to SaaS platforms such as WordPress, FTP/SFTP, Files, Databases, BI tools, and Native REST API & Webhooks Connectors. It supports various destinations, including Data Warehouses (Google BigQuery, Amazon Redshift, Snowflake, Firebolt), Amazon S3 Data Lakes, Databricks, and Databases such as MySQL, SQL Server, TokuDB, MongoDB, DynamoDB, and PostgreSQL, to name a few.
  • Hevo Is Built To Scale: As the number of sources and the volume of your data grows, Hevo scales horizontally, handling millions of records per minute with very little latency.
  • Incremental Data Load: Hevo allows the transfer of data that has been modified in real-time. This ensures efficient utilization of bandwidth on both ends.
  • Live Support: The Hevo team is available round the clock to extend exceptional support to its customers through chat, email, and support calls.
  • Live Monitoring: Hevo allows you to monitor the data flow and check where your data is at a particular point in time.

How to Optimize MongoDB Configuration?

MongoDB provides the following settings to set up your MongoDB Configuration for optimal performance:

1. MongoDB Configuration: Locking Performance

Locking is an important part of MongoDB configuration. A database receives many reads, writes, and updates from different users. These operations do not run strictly one after another, so one user may try to access data that another user is in the middle of updating. This can lead to conflicts. To prevent them, databases use locks on resources such as documents and collections.

While one operation holds an exclusive (write) lock, other operations that need the same data have to wait until the lock is released. This avoids conflicts, but it can also degrade database performance.

The good news is that MongoDB comes with useful metrics that help you check whether locking is degrading your database performance. The two most common are the globalLock and locks sections of the db.serverStatus() output:

  • db.serverStatus().globalLock
  • db.serverStatus().locks

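In the globalLock output, currentQueue shows the number of operations waiting for a lock. If currentQueue.total stays high, operations are queuing behind one another, which points to a concurrency problem. The totalTime field is the time, in microseconds, since the database started and created the global lock, so it is roughly the server uptime. In the locks output, the timeAcquiringMicros counters show how long operations have waited to acquire each type of lock. Values that are large relative to totalTime mean that operations have spent much of the uptime waiting on locks.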

With these metrics, you can track down the operations that are holding or waiting on locks and take the necessary action to improve the performance of MongoDB.
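The queue check described above can be sketched as a small JavaScript helper. This is a minimal sketch, not an official tool: the function name summarizeLockPressure, the queueThreshold cutoff of 10, and the sample values are all illustrative assumptions. Only the globalLock field names come from the db.serverStatus() output.

```javascript
// Convert a plain number or a BSON Long-like value to a JavaScript number.
function toNum(v) {
  return typeof v === "number" ? v : Number(v.toString());
}

// Summarize lock pressure from a serverStatus-like document.
// queueThreshold is an arbitrary illustrative cutoff, not a MongoDB default.
function summarizeLockPressure(status, queueThreshold = 10) {
  const queue = status.globalLock.currentQueue;
  const active = status.globalLock.activeClients;
  const queuedTotal = toNum(queue.total);
  return {
    queuedReaders: toNum(queue.readers),
    queuedWriters: toNum(queue.writers),
    queuedTotal: queuedTotal,
    activeClients: toNum(active.total),
    queueLooksHigh: queuedTotal > queueThreshold,
  };
}

// Hypothetical sample shaped like db.serverStatus() output; the values are made up.
const sampleLockStatus = {
  globalLock: {
    totalTime: 3600000000,
    currentQueue: { total: 14, readers: 4, writers: 10 },
    activeClients: { total: 25, readers: 5, writers: 20 },
  },
};

// In mongosh, `db` exists and the live status is used; elsewhere, fall back to the sample.
const lockStatus =
  typeof db !== "undefined" ? db.serverStatus() : sampleLockStatus;
console.log(summarizeLockPressure(lockStatus));
```

A single reading says little on its own. Sampling the summary every few seconds and watching whether queuedTotal stays above the threshold is likely to be more informative.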

2. MongoDB Configuration: WiredTiger Cache

MongoDB deprecated the MMAPv1 storage engine in version 4.0 and removed it in version 4.2. If you still run MMAPv1 on an older release, you should move to the WiredTiger storage engine, which has been the default since MongoDB 3.2. WiredTiger handles concurrency better and performs better. It also offers compression and, in MongoDB Enterprise, encryption at rest.

By default, the WiredTiger internal cache uses the larger of 50% of (RAM minus 1 GB) or 256 MB. The cache size has a large effect on how well WiredTiger performs, so as part of your MongoDB configuration you should check whether the default needs to change. The storage.wiredTiger.engineConfig.cacheSizeGB setting controls it. Ideally, the cache is big enough to hold the application’s whole working set.

Run the following command to check the cache usage statistics:

db.serverStatus().wiredTiger.cache

The command returns a lot of data, but you can focus on a few fields:

  • wiredTiger.cache.maximum bytes configured: The maximum size of the cache. 
  • wiredTiger.cache.bytes currently in the cache: The size of the data currently stored in the cache. It should be less than the size of the above parameter. 
  • wiredTiger.cache.tracked dirty bytes in the cache: The size of the dirty data stored in the cache. Its value should be less than that of bytes currently in the cache. 

Comparing these values tells you whether you should increase the size of the cache. For read-heavy applications, also look at the wiredTiger.cache.bytes read into cache field. It is a cumulative counter, so watch how fast it grows: if data is constantly being read into the cache, increasing the cache size may improve read performance.
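The comparison between these fields can be sketched in JavaScript as follows. This is a hedged sketch: the helper name cacheUsage and the sample numbers are assumptions made for illustration, and only the quoted field names come from the db.serverStatus().wiredTiger.cache output.

```javascript
// Convert a plain number or a BSON Long-like value to a JavaScript number.
function toNum(v) {
  return typeof v === "number" ? v : Number(v.toString());
}

// Express cache fill and dirty data as percentages of the configured maximum.
function cacheUsage(cache) {
  const max = toNum(cache["maximum bytes configured"]);
  const used = toNum(cache["bytes currently in the cache"]);
  const dirty = toNum(cache["tracked dirty bytes in the cache"]);
  return {
    usedPct: (used / max) * 100,
    dirtyPct: (dirty / max) * 100,
  };
}

// Hypothetical sample values: 1 GiB cache, 768 MiB in use, 64 MiB dirty.
const sampleCacheStats = {
  "maximum bytes configured": 1073741824,
  "bytes currently in the cache": 805306368,
  "tracked dirty bytes in the cache": 67108864,
};

// In mongosh, `db` exists and live statistics are used; elsewhere, fall back to the sample.
const cacheStats =
  typeof db !== "undefined"
    ? db.serverStatus().wiredTiger.cache
    : sampleCacheStats;
console.log(cacheUsage(cacheStats));
```

If usedPct sits close to 100 for long periods while the read counter keeps climbing, that is one sign the working set may not fit in the cache.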

3. MongoDB Configuration: MongoDB Logging

The location of the MongoDB log is defined by the systemLog.path setting (or the --logpath command-line option). On Linux package installations, it defaults to /var/log/mongodb/mongod.log, and the MongoDB configuration file is at /etc/mongod.conf.

The following command changes the log verbosity of a single component, here raising the query component to level 2:

db.setLogLevel(2, "query")
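To change several components at once, the same setting can be sent through the setParameter command with a logComponentVerbosity document. The helper below is a rough sketch that builds that document: the name buildVerbosityCommand and the example components are assumptions for illustration, while setParameter and logComponentVerbosity are the real command and parameter names.

```javascript
// Build a setParameter command that sets verbosity for several log components.
// Keys are component names ("query", or dotted such as "storage.journal").
// Levels run from 0 to 5, and -1 means "inherit from the parent component".
function buildVerbosityCommand(levels) {
  const doc = {};
  for (const [component, level] of Object.entries(levels)) {
    if (!Number.isInteger(level) || level < -1 || level > 5) {
      throw new RangeError(
        `verbosity for "${component}" must be an integer from -1 to 5`
      );
    }
    // Walk or create the nested objects for dotted component names.
    let node = doc;
    for (const part of component.split(".")) {
      node[part] = node[part] || {};
      node = node[part];
    }
    node.verbosity = level;
  }
  return { setParameter: 1, logComponentVerbosity: doc };
}

// Example input; the component choices here are illustrative only.
const verbosityCmd = buildVerbosityCommand({ query: 2, "storage.journal": 1 });
console.log(JSON.stringify(verbosityCmd));

// In mongosh, the command would be applied with:
//   db.adminCommand(verbosityCmd)
```

Validating the levels before sending the command keeps a typo from reaching the server, which is the main reason to build the document in a helper rather than by hand.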

The log file can grow large, and you may want to start a fresh one before profiling. The logRotate command, which must be run against the admin database, closes the current log file and opens a new one:

db.adminCommand({ logRotate: 1 });

4. MongoDB Configuration: Free Performance Monitoring

MongoDB 4.0 introduced a free, cloud-hosted performance monitoring feature for standalone instances and replica sets. When you enable it during MongoDB configuration, the monitoring data is sent to the cloud service periodically, and you don’t need any additional agents. MongoDB decommissioned this service in August 2023, so the commands in this section apply only to older deployments.

The configuration process only takes a single command, after which you will be given a web address where you can access the performance stats. To enable free monitoring during runtime, run the following command:

db.enableFreeMonitoring()

Just copy the URL provided in the output and paste it on your web browser. You will be able to monitor performance statistics after a single MongoDB configuration command. 


The dashboard will show you metrics such as operation execution time, disk utilization, memory, system CPU usage, query targeting, and more. 

You can disable this feature via this command:

db.disableFreeMonitoring()

You can also enable or disable the feature at MongoDB startup, using either the --enableFreeMonitoring command-line option or the cloud.monitoring.free.state configuration file setting. These are some of the important features to consider during MongoDB configuration.

Conclusion

In this article, you have learned about some of the important MongoDB configuration settings that you can fine-tune for optimal performance. MongoDB scales well and offers high performance, but as with other database management systems, issues can arise that degrade it. Knowing which metrics to check and which settings to adjust helps you keep that performance up.

To avoid conflicts, MongoDB uses locks, and it comes with metrics that you can use to check whether locking is degrading your database performance. You should also check whether you need to change the amount of memory allocated to the WiredTiger data cache. To view MongoDB performance statistics in a web browser, you can enable the free performance monitoring feature, either at startup or at runtime.

To get a complete overview of your business performance, it is essential to consolidate data from MongoDB and all the other applications used across your firm. To achieve this, you need to assign a portion of your engineering bandwidth to integrate data from all sources, clean and transform it, and finally load it to a Cloud Data Warehouse or a destination of your choice for further Business Analytics. A Cloud-based ETL tool such as Hevo Data can take on all of these tasks.


Hevo Data, a No-code Data Pipeline, can seamlessly transfer data from 150+ sources such as MongoDB & MongoDB Atlas to a Data Warehouse or a Destination of your choice to be visualized in a BI Tool. It is a reliable, completely automated, and secure service that doesn’t require you to write any code!

If you are using MongoDB as your NoSQL Database Management System and searching for a no-fuss alternative to Manual Data Integration, then Hevo can automate this for you. Hevo, with its strong integration with 150+ Data sources & BI tools (including 40+ Free Sources), allows you to not only export & load data but also transform & enrich your data & make it analysis-ready in a jiffy.

Want to take Hevo for a ride? Sign Up for a 14-day free trial and simplify your Data Integration process. Do check out Hevo’s pricing details to understand which plan fulfills all your business needs.

Tell us about your experience of learning about the optimal MongoDB Configuration settings! Share your thoughts with us in the comments section below.

Nicholas Samuel
Technical Content Writer, Hevo Data

Skilled in freelance writing within the data industry, Nicholas is passionate about unraveling the complexities of data integration and data analysis through informative content for those delving deeper into these subjects. He has written more than 150 blogs on databases, processes, and tutorials that help data practitioners solve their day-to-day problems.
