As data volumes grow rapidly, modern applications need the ability to capture and react to changes as they occur, in real time. This is where the MongoDB Change Streams feature plays a crucial role.

The MongoDB Change Streams feature (available in MongoDB v3.6 & above) allows you to stream data changes in real time. Applications can use Change Streams, which are built on MongoDB’s aggregation framework, to monitor changes occurring on a collection or, from v4.0 onwards, on a whole database or deployment. This further enhances the database’s real-time capabilities.

This feature is crucial if a database is to accurately reflect business activity in use cases such as capturing sensor data for an IoT data pipeline or keeping enterprise-wide reports up to date as operational data changes.

What Is MongoDB?

MongoDB is a popular high-performance NoSQL database that enables you to store your data in a non-relational format. The basic unit of data in MongoDB is a document: a set of key-value pairs, where different documents can have different fields. Documents are stored and exchanged as BSON (Binary JSON), a binary serialization of JSON-like documents.

MongoDB stores its data as objects, which are commonly identified as documents. These documents are stored in collections, analogous to how tables work in relational databases.
MongoDB is known for its scalability, ease of use, reliability & no compulsion to use a fixed schema among all stored documents, giving them the ability to have varying fields (columns).  

See how to install MongoDB on Ubuntu.

What Is MongoDB Change Streams?

  • MongoDB Change Streams track real-time data changes across a collection, a database, or an entire deployment, allowing you to react to these changes immediately. They give applications the power to track changes without having to tail the operations log (oplog) themselves.
  • MongoDB Change Streams are built on the aggregation framework, which gives the applications using it the unique ability to not only filter the notifications but also transform them.
  • Setting up a replica set is a must before using MongoDB Change Streams. A replica set is a group of mongod processes that maintain the same data set, so that your data is replicated across multiple servers. This provides redundancy and high availability, especially in cases of server failure.
  • MongoDB uses a method known as sharding to distribute (partition) data across multiple machines, which supports high throughput even when working with large datasets. A deployment that uses sharding is known as a sharded cluster.

Prerequisites

The following are some prerequisites you should be familiar with before following this guide:

  • Working knowledge of MongoDB.
  • MongoDB version 3.6 or greater.
  • A general idea about the concepts of sharding & replication.
  • A general idea about the MongoDB operations log.

MongoDB Change Streams In Practice

Let’s have a look at the concepts you will come across here:

1. Availability Of MongoDB Change Streams

Change Streams are available for replica sets and sharded clusters only:

  1. Replica Set Protocol: Both replica sets and sharded clusters must use replica set protocol version 1 (pv1).
  2. Storage Engine: Replica sets and sharded clusters must use the WiredTiger storage engine. You can check the active engine from the mongo shell with db.serverStatus().storageEngine
  3. Enabling Read Concern: If you’re using MongoDB 4.0 or earlier, Change Streams are available only when "majority" read concern support is enabled. Starting from v4.2, there is no compulsion to enable it; Change Streams are available in both modes.
# mongod.conf: "majority" read concern support is a server-level setting, not a per-collection one
replication:
  enableMajorityReadConcern: true
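
The availability requirements above can be checked from application code before a stream is opened. The following is a minimal sketch, not an official check: it assumes you have already fetched the output of the hello command (isMaster on older servers) and of serverStatus as plain dictionaries, and the function name change_stream_problems is made up for illustration.

```python
def change_stream_problems(hello: dict, server_status: dict) -> list:
    """Return a list of problems; an empty list means the basic checks passed.

    `hello` is the reply of the hello/isMaster command and `server_status`
    the reply of serverStatus, both passed in as plain dicts (assumption).
    """
    problems = []
    is_replica_member = "setName" in hello          # set on replica set members
    is_mongos = hello.get("msg") == "isdbgrid"      # reported by mongos routers
    if not (is_replica_member or is_mongos):
        problems.append("not a replica set member or mongos")
    engine = server_status.get("storageEngine", {}).get("name")
    # Assumption: a mongos stores no data itself, so the engine check is skipped there.
    if not is_mongos and engine != "wiredTiger":
        problems.append(f"storage engine is {engine!r}, expected 'wiredTiger'")
    return problems
```

With PyMongo, the two inputs could be obtained with client.admin.command("hello") and client.admin.command("serverStatus").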

2. Defining A Change Stream

Change Streams can be defined by calling the watch() method on a collection as follows:

db.collection_name.watch()

The watch() method opens a cursor that reports every write to your collection. It accepts an aggregation pipeline as an optional parameter.
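
Each write reported by watch() arrives as a change event document. As a rough sketch of what handling one might look like, the function below summarizes an event based on its operationType, ns, documentKey, and updateDescription fields. The function name describe_change and the output format are illustrative assumptions, not part of any driver.

```python
def describe_change(event: dict) -> str:
    """Build a one-line summary of a change event (illustrative helper)."""
    op = event["operationType"]
    ns = event.get("ns", {})
    target = f'{ns.get("db")}.{ns.get("coll")}'
    key = event.get("documentKey", {}).get("_id")
    if op == "insert":
        return f"insert into {target}: _id={key}"
    if op == "update":
        # updateDescription lists only the fields that changed
        changed = sorted(event["updateDescription"]["updatedFields"])
        return f"update in {target}: _id={key}, fields={changed}"
    if op == "delete":
        return f"delete from {target}: _id={key}"
    # Other event types (replace, drop, invalidate, ...) get a generic summary
    return f"{op} on {target}"
```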

3. Opening A Change Stream

The way a Change Stream is opened depends upon whether you’re working with a sharded cluster or replica set:

  • Replica Set: The open Change Stream operation can be issued from any data-bearing member.
  • Sharded Cluster: The open Change Stream operation must be issued through a mongos.
  • Example: Using the following code, you can open the Change Stream and iterate over the cursor to retrieve Change Stream documents for a collection named webinar. This example code is written in Python.
cursor = db.webinar.watch()
document = next(cursor)
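
In practice, a consumer usually keeps reading events in a loop and remembers where it stopped. The sketch below assumes a PyMongo collection (anything with a compatible watch() method works) and a caller-supplied handle callback; the consume function and its limit parameter are hypothetical names used for illustration. It stores the _id of each event, which serves as the resume token, so a later call can continue from that point.

```python
def consume(collection, handle, resume_token=None, limit=None):
    """Feed change events to handle() and return the last resume token seen."""
    # Only pass resume_after when we actually have a token to resume from
    kwargs = {"resume_after": resume_token} if resume_token is not None else {}
    seen = 0
    with collection.watch(**kwargs) as stream:   # the stream is closed on exit
        for event in stream:                     # blocks until the next event
            handle(event)
            resume_token = event["_id"]          # each event's _id is its resume token
            seen += 1
            if limit is not None and seen >= limit:
                break
    return resume_token
```

Persisting the returned token (for example in a small bookkeeping collection) is one way to pick up where the consumer left off after a restart, as long as the corresponding entry is still in the oplog.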

4. Modifying The Output Of A MongoDB Change Stream

  • The output of a Change Stream can be tweaked by using one or more pipeline stages while setting up the Change Stream configuration.
  • Some examples are $match, $project, $replaceRoot, $set (v4.2 & above), etc.
  • Example Query: This Python query makes use of the $match and $addFields stages.
pipeline = [
    {'$match': {'fullDocument.username': 'mark'}},
    {'$addFields': {'newField': 'this is a new field!'}}
]
]
cursor = db.webinar.watch(pipeline=pipeline)
document = next(cursor)
  • By default, an update event carries only the changed fields (the delta) rather than the whole document. To also receive the current version of the entire document, set the fullDocument option to updateLookup. The following example uses the Node.js driver:
const collection = db.collection("name_collection");
const changeStream = collection.watch([], { fullDocument: "updateLookup" });

For more information on the MongoDB Change Streams response, you can look into the change events manual.
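
The pipeline stages and the full-document option discussed in this section can be combined in one call. Here is a hedged Python sketch: build_pipeline and open_filtered_stream are hypothetical helper names, and the username field comes from the example above. In PyMongo, the full-document option is passed as the full_document keyword argument of watch().

```python
def build_pipeline(username, operation_types=("insert", "update", "replace")):
    """Build a change stream pipeline for one user's writes (illustrative)."""
    return [
        {"$match": {
            "operationType": {"$in": list(operation_types)},
            "fullDocument.username": username,
        }},
        # An inclusion $project keeps _id, which holds the resume token
        {"$project": {"operationType": 1, "documentKey": 1, "fullDocument": 1}},
    ]


def open_filtered_stream(collection, username):
    """Open a stream that also returns the full document for updates."""
    # Matching on fullDocument.* for update events relies on updateLookup,
    # because update events otherwise carry only the changed fields.
    return collection.watch(build_pipeline(username), full_document="updateLookup")
```

Filtering on the server side like this keeps unneeded events off the network, which is usually preferable to filtering in the application.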

5. Access Control

Access control applies to all deployments that enforce authentication and authorization. Applications must have privileges that grant the changeStream & find actions.

  • In order to open a Change Stream for a particular collection, applications need to have privileges that grant actions on that collection.
{ resource: { db: <dbname>, collection: <collection> }, actions: [ "find", "changeStream" ] }
  • In order to open a Change Stream for a database, applications need to have privileges that grant actions on all non-system collections belonging to that database.
{ resource: { db: <dbname>, collection: "" }, actions: [ "find", "changeStream" ] }
  • In order to open a Change Stream for an entire deployment, applications need to have privileges that grant actions on all non-system collections for all databases, that are a part of the deployment.
{ resource: { db: "", collection: "" }, actions: [ "find", "changeStream" ] }
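
One way to grant these privileges is a dedicated role. The sketch below builds the document for MongoDB’s createRole command; the helper name change_stream_role and the role name used in the usage note are made up for illustration. Empty strings for db_name or collection widen the scope in the same way as in the privilege documents above.

```python
def change_stream_role(role_name, db_name="", collection=""):
    """Build a createRole command granting find + changeStream (illustrative)."""
    return {
        "createRole": role_name,
        "privileges": [{
            "resource": {"db": db_name, "collection": collection},
            "actions": ["find", "changeStream"],
        }],
        "roles": [],  # no inherited roles in this sketch
    }
```

With PyMongo, the command could be run as client["sales"].command(change_stream_role("webinarWatcher", "sales", "webinar")), where the database and collection names are placeholders. A role whose privileges span several databases typically has to be created in the admin database.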

You can log your Change Streams data into the downstream systems & even send them as notifications.

For more information on the operations & usability of MongoDB Change Streams, you can look into the MongoDB Change Streams manual.

6. MongoDB Change Stream Recommendations

The following are some recommendations for the deployment of MongoDB Change Streams:

  • Ensure that the documents that represent the Change Stream response adhere to the 16MB limit for BSON documents.
  • Increase the size of the oplog if a significant downtime is anticipated (such as upgrades etc.). This will ensure that the operations are retained for a time longer than the estimated downtime.
  • Consider utilizing filters for sharded collections with high activity levels. This helps the instances keep up with all the changes happening across the shards.
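
To make the oplog recommendation concrete, here is a back-of-the-envelope sizing sketch. It assumes you have measured how much oplog your workload generates per hour; the function name, the safety factor of 1.5, and the 990 MB lower bound are assumptions to verify against your MongoDB version. The result is shaped as a replSetResizeOplog command document, whose size is given in megabytes.

```python
import math

MIN_OPLOG_MB = 990  # assumed lower bound accepted by replSetResizeOplog


def oplog_resize_command(oplog_mb_per_hour, downtime_hours, safety_factor=1.5):
    """Estimate an oplog size that outlasts a planned downtime (rough sketch)."""
    needed = math.ceil(oplog_mb_per_hour * downtime_hours * safety_factor)
    return {"replSetResizeOplog": 1, "size": float(max(needed, MIN_OPLOG_MB))}
```

For example, 200 MB of oplog per hour and a 6-hour maintenance window gives 200 × 6 × 1.5 = 1800 MB. The command would be run against the admin database of each replica set member that should be resized.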

Real-life Use Cases

1. Real-time Analytics

Use Case: An e-commerce website can track events such as items being added to the cart or purchases being made. Thanks to Change Streams, the site can update its analytics dashboards directly, without running any batch jobs.
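
As a small illustration of this idea, the sketch below keeps running counts per event type as insert events arrive. The event_type field and the update_dashboard name are hypothetical; a real dashboard would push these counts to whatever store backs it.

```python
from collections import Counter


def update_dashboard(counts: Counter, event: dict) -> Counter:
    """Increment a per-type counter for each inserted document (illustrative)."""
    if event.get("operationType") == "insert":
        doc = event.get("fullDocument", {})
        # "event_type" is an assumed field, e.g. "add_to_cart" or "purchase"
        counts[doc.get("event_type", "unknown")] += 1
    return counts
```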

2. Data Synchronization

Use Case: If an organization uses multiple databases, Change Streams can keep them in sync. For example, when data is updated in one database, a Change Stream on it can propagate the update to another, so that all systems hold the latest information.
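
The core of such a sync job is applying each change event to the target. The following sketch uses a plain dictionary keyed by _id as a stand-in for the target system, and apply_change is a hypothetical name. It assumes only top-level fields change; nested updates reported with dotted paths would need extra handling.

```python
def apply_change(target: dict, event: dict) -> None:
    """Apply one change event to a dict-based replica keyed by _id (sketch)."""
    op = event["operationType"]
    key = event["documentKey"]["_id"]
    if op in ("insert", "replace"):
        target[key] = dict(event["fullDocument"])
    elif op == "update":
        doc = target.setdefault(key, {"_id": key})
        desc = event["updateDescription"]
        doc.update(desc.get("updatedFields", {}))   # changed or added fields
        for field in desc.get("removedFields", []):
            doc.pop(field, None)                    # fields that were unset
    elif op == "delete":
        target.pop(key, None)
    # Other event types (drop, invalidate, ...) are ignored in this sketch
```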

3. Notifications and Alerts

Use Case: A messaging app can use Change Streams to deliver message notifications to users immediately. Once a message has been sent, the Change Stream picks up the change and triggers a delivery notification to the recipient in real time.

4. Workflow Automation

Use Case: As soon as the status of a task changes in a project management tool, a Change Stream can trigger automation such as sending notifications to team members or updating related tasks, thereby keeping the project’s workflows streamlined.

5. Health Monitoring

Use Case: Change Streams can be used to track database operations and changes for auditing. When errors occur, alerts can be triggered so that system administrators can respond quickly.

6. Building Event-Driven Applications

Use Case: A social networking site can respond to user interactions such as likes and comments. Change Streams let the application update feeds dynamically, which means better engagement for users.

7. Data Warehousing and ETL

Use Case: Enterprises can use Change Streams to capture data from MongoDB for data warehousing. When the data changes, it can be pushed into a data warehouse for reporting and analysis, keeping business intelligence up to date.

Hevo Data, a fully managed No-code Data Pipeline solution, can help you meet your daily CDC requirements with ease. Its interactive UI & pre-built integration with MongoDB (among 150+ sources) will help you not only monitor real-time updates and changes but also react to them in an immediate yet simple way. Hevo’s point-and-click interface ensures the lowest possible time to production.

Some Key Features Of Hevo

  1. Minimal Setup Time: Hevo can be set up & used by anyone from the team, as there is a minimal learning curve involved.
  2. Interactive UI: Owing to its simple point & click interface, the user can connect to a source of their choice in a matter of minutes. This allows the user to interact and maintain their data in real-time.
  3. Incremental Data Load: Hevo allows transferring of data that has been modified in real-time. This ensures efficient utilization of bandwidth on both ends.
  4. Huge Source Platform Support: Hevo can help you bring in data from 150+ sources, thereby making it the ideal partner for your business’s growing data needs.
  5. Live Monitoring: Hevo allows you to monitor the data flow so you can check where your data is at a particular point in time.
  6. Secure: Hevo has a fault-tolerant architecture that ensures that the data is handled in a secure, consistent manner with zero data loss.

Conclusion

This article outlines how to master the skill of utilizing the power of MongoDB Change Streams to monitor the changes as they happen & act upon them practically in real-time with no hassle. With data growing at an exponential rate, in real-life situations, handling such humongous amounts of data can be grueling. This is where Hevo comes into the picture, offering its users a fully automated, No-code Solution that helps monitor data quite easily with minimal supervision.

Hevo Data, a No-code Data Pipeline, provides you with a consistent and reliable solution to manage data transfer between a variety of sources and a wide variety of Desired Destinations with a few clicks.

Hevo Data with its strong integration with 150+ sources (including 60+ free sources) like MongoDB allows you to not only export data from your desired data sources & load it to the destination of your choice, but also transform & enrich your data to make it analysis-ready so that you can focus on your key business needs and perform insightful analysis using BI tools. 

Sign Up for a 14-day free trial and experience the feature-rich Hevo suite first hand. You may also have a look at the amazing price, which will assist you in selecting the best plan for your requirements.

We would love to hear from you about your experiences of learning about MongoDB Change Streams. Share your thoughts in the comments section below.

FAQs

1. What is the difference between Oplog and change stream?

The oplog is a special MongoDB collection that records all modifications to the documents in a replica set, primarily for replication. Change Streams offer applications a way to receive real-time notifications about changes within the database, enabling them to respond to those changes as they happen.

2. How do I change fields in MongoDB?

You can use the updateOne() or updateMany() methods with the $set operator in MongoDB to change fields. This operator lets you specify which fields you’d like to modify as well as the new values for them.

3. What is the difference between resumeAfter and startAfter in MongoDB?

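resumeAfter resumes a Change Stream after the change event identified by a resume token, which is how a stream is typically continued after an interruption. It cannot resume from the token of an invalidate event (raised, for example, when a watched collection is dropped or renamed). startAfter (v4.2 & above) starts a new Change Stream after the event identified by the token, and it also works when that event is an invalidate event.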
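
In PyMongo, these two options correspond to the resume_after and start_after keyword arguments of watch(). A minimal sketch of choosing between them, assuming the application has recorded whether the last event it saw was an invalidate event (resume_options is a hypothetical helper name):

```python
def resume_options(token, after_invalidate=False):
    """Pick the watch() keyword argument to continue from a saved token."""
    if token is None:
        return {}  # no saved position: start from the current point in time
    if after_invalidate:
        # A token from an invalidate event needs start_after (v4.2 & above)
        return {"start_after": token}
    return {"resume_after": token}
```

The result can be unpacked into the call, as in collection.watch(**resume_options(saved_token)).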

Rashid Y
Technical Content Writer, Hevo Data

Rashid is a technical content writer with a passion for the data industry. Leveraging his problem-solving skills, he delivers informative and engaging content on data science. With a deep understanding of complex data concepts and a talent for clear, compelling communication, Rashid creates content that informs and captivates his audience.