With data growing at the speed of light, modern databases are becoming more and more powerful and hence they need the ability to capture & react to changes as they occur in real-time. This is where the MongoDB Change Streams feature plays a crucial role to make it possible.

MongoDB Change Streams feature (available for MongoDB v3.6 & above), allows you to stream data in real-time. Data changes occurring on databases or even on collections can be monitored by applications making use of MongoDB’s Change Streams which is based on its aggregation framework. Thus, it further enhances the database’s real-time capabilities.

This feature is crucial if a database is to accurately depict business activities in use cases like capturing sensor data for an IoT data pipeline or updating enterprise-wide reports such as operational data changes, etc.

What Is MongoDB?

MongoDB Change Streams- MongoDB Logo.

MongoDB is a popular high-performance NoSQL database that enables you to store your data in a non-relational format. The basic unit of data in MongoDB is a set of key-value pairs that allow documents to have different fields and functions. BSON (Binary JSON) can be used to communicate with the data stored in MongoDB. 

MongoDB stores its data as objects, which are commonly identified as documents. These documents are stored in collections, analogous to how tables work in relational databases.

MongoDB is known for its scalability, ease of use, reliability & no compulsion for using a fixed schema among all stored documents, giving them the ability to have varying fields (columns).  

What Is MongoDB Change Streams?

MongoDB Change Streams- MongoDB Change Streams

MongoDB Change Streams track real-time data changes across a database, a collection, or an entire deployment, allowing you to immediately react to these changes. It gives users the power to track changes without having to continuously monitor the operations log (oplog).

MongoDB Change Streams are built on the aggregation framework which gives the applications using it, the unique ability to not only filter the notifications but also transform them.

Setting up a replica set is a must before using MongoDB Change Streams. A Replica Set is a group of daemon processes for a data set that ensures that your data is distributed and replicated across multiple servers. This ensures data integrity, especially in cases of any server failure or experience-related issues. 

MongoDB uses a method known as sharding to replicate the data across various machines with a high throughput even when working with large datasets. The nodes resulting from this process are known as sharded clusters. 

Key Features Of MongoDB Change Streams

  • Resumable: The Change Stream response includes resume tokens. These resume tokens can be sent by an application when it loses connection with the database and the Change Stream picks up from that specific point in time.
  • Scalable: Change stream scales across different nodes, and thus it is able to take on high workloads.
  • Ease Of Use: Change Streams are easy to use as they make use of the existing MongoDB drivers and query format. 
  • Filter Capability: It is possible to filter the changes and specify the parts that a user wants to send to the downstream systems.
  • Security: Users are only able to deploy Change Streams on those collections for which they have been granted read access.
Download the Ultimate Guide on Database Replication
Download the Ultimate Guide on Database Replication
Download the Ultimate Guide on Database Replication
Learn the 3 ways to replicate databases & which one you should prefer.
Ways To Set Up Real-Time Data Streaming In MongoDB

Method 1: Using MongoDB Change Streams

Making use of Change Streams is one such way to monitor real-time data streaming. MongoDB Change Streams track real-time data changes across a database, a collection, or an entire deployment, allowing you to immediately react to these changes.

MongoDB Change Streams are built on the aggregation framework which gives the applications using it, the unique ability to not only filter the notifications but also transform them.

Method 2: Using A No-Code Data Pipeline, Hevo

Hevo Data, a No-Code Data Pipeline solution can help you manage your daily CDC needs in a fully automated & hassle-free manner with minimal supervision. Hevo’s interactive UI & pre-built integration with MongoDB (among 150+ sources) will help you not only monitor the real-time updates but also react to them in an immediate yet simple way.

Sign up here for a 14-day Free Trial!

Prerequisites

The following are some prerequisites you should be familiar with before following this manual:

  • Working knowledge of MongoDB.
  • MongoDB version 3.6 or greater.
  • A general idea about the concepts of sharding & replication.
  • A general idea about the MongoDB operations log.
MongoDB Change Streams- MongoDB Oplogs.
Image Source: MongoDB

Method 1: MongoDB Change Streams In Practice

Let’s have a look at the concepts you will come across here:

1. Availability Of MongoDB Change Streams

These support sharded clusters and replica sets only:

  1. Replica Set Protocol: The replica set version 1, must be used by both sharded clusters and replica sets.
  2. Engine: WiredTiger storage engine is the desired choice that should be used by replica sets and shared clusters. You can learn more about WiredTiger here.
  3. Enabling Read Concern: To make the Change Streams available, you must enable the majority to read concern if you’re using MongoDB 4.0 and earlier. Starting from v4.2 there is no compulsion to enable it, MongoDB Change Streams are available in both modes.

2. Defining A Change Stream

Change Streams can be defined using collection_name.watch() method as follows:

db.collection_name.watch()

The watch method will signal every write to your collection. It accepts aggregation pipelines as its parameters. This enables users to specify parts of the data that they are particularly interested in, thus controlling the Change Stream outputs.

You can make use of most modern language drivers to create and consume Change Streams.

3. Opening A Change Stream

The way a Change Stream is opened depends upon whether you’re working with a sharded cluster or replica set:

  • Replica Set: The open Change Stream operation can be issued with the help of any data-bearing member.
  • Sharded Cluster: The open Change Stream operation must be issued by using the mongos.

Example: Using the following code, you can open the Change Stream and iterate over the cursor to retrieve Change Stream documents for a collection named webinar. This example code is written in python.

cursor = db.webinar.watch()
document = next(cursor)

4. Modifying The Output Of A MongoDB Change Stream

The output of a Change Stream can be tweaked by using one or more pipeline stages while setting up the Change Stream configuration.

Some examples are: $match, $project, $replaceRoot, $set(v4.2 & above), etc.

Example Query: This python query makes use of the $match, $addedFields stages, etc.

pipeline = [
    {'$match': {'fullDocument.username': 'mark'}},
    {'$addFields': {'newField': 'this is an new field!'}}
]
cursor = db.webinar.watch(pipeline=pipeline)
document = next(cursor)

Usually, a part of the document is returned as a response in the output. However, Change Streams configurations can be tweaked to get the entire document:

collection = db.collection("name_collection")
changeStream = collection.watch({ fullDocument: “updateLookup”})

For more information on the MongoDB Change Streams response, you can look into the change events manual.

5. Access Control

It is a must for all the deployments that emphasize on authentication. Applications must have privileges to grant Change Stream & find actions.

  • In order to open a Change Stream for a particular collection, applications need to have privileges that grant actions on that collection.
{ resource: { db: <dbname>, collection: <collection> }, actions: [ "find", "changeStream" ] }
  • In order to open a Change Stream for a database, applications need to have privileges that grant actions on all non-system collections belonging to that database.
{ resource: { db: <dbname>, collection: "" }, actions: [ "find", "changeStream" ] }
  • In order to open a Change Stream for an entire deployment, applications need to have privileges that grant actions on all non-system collections for all databases, that are a part of the deployment.
{ resource: { db: "", collection: "" }, actions: [ "find", "changeStream" ] }

You can log your Change Streams data into the downstream systems & even send them as notifications.

For more information on the operations & usability of MongoDB Change Streams, you can look into MongoDB Change Streams manual.

6. MongoDB Change Stream Recommendations

The following are some recommendations for the deployment of MongoDB Change Streams:

  • Ensure that the documents which represent the Change Stream response adhere to the 16MB limit for BSON documents.
  • Increase the size of the oplog if a significant downtime is anticipated (such as upgrades etc.). This will ensure that the operations are retained for a time longer than the estimated downtime.
  • Consider utilizing filters for sharded collections with high activity levels. This is done to help the instances keep up with all the changes happening across the shards. 

Method 2: Real-Time Streaming Using Hevo

MongoDB Change Streams- Hevo logo.
Image Source: Hevo Data

Hevo Data, a fully managed No-code Data Pipeline solution can help you meet your daily CDC requirements with ease. Its interactive UI & pre-built integration with MongoDB (among 100+ sources) will help you not only monitor the real-time updates and changes but also react to them in an immediate yet simple way. Hevo’s point and click interface ensure the lowest time to production possible.

Some Key Features Of Hevo

  1. Minimal Setup Time: Hevo can be set up & used by anyone from the team, as there is a minimal learning curve involved.
  2. Interactive UI: Owing to its simple point & click interface, the user can connect to a source of their choice in a matter of minutes. This allows the user to interact and maintain their data in real-time.
  3. Incremental Data Load: Hevo allows transferring of data that has been modified in real-time. This ensures efficient utilization of bandwidth on both ends.
  4. Huge Source Platform Support: Hevo can help you bring in data from 100’s of sources, thereby making it the ideal partner for your business’s growing data needs.
  5. Live Monitoring: Hevo allows you to monitor the data flow, so you can check where your data is at a particular point in time.
  6. Secure: Hevo has a fault-tolerant architecture that ensures that the data is handled in a secure, consistent manner with zero data loss
Get Started with Hevo for free

Conclusion

This article outlines how to master the skill of utilizing the power of MongoDB Change Streams to monitor the changes as they happen & act upon them practically in real-time with no hassle. With data growing at an exponential rate, in real-life situations, handling such humongous amounts of data can be grueling. This is where Hevo comes into the picture, offering its users a fully automated, No-code Solution that helps monitor data quite easily with minimal supervision.

Hevo Data, a No-code Data Pipeline, provides you with a consistent and reliable solution to manage data transfer between a variety of sources and a wide variety of Desired Destinations with a few clicks.

Visit our Website to Explore Hevo

Hevo Data with its strong integration with 100+ sources (including 30+ free sources) like MongoDB allows you to not only export data from your desired data sources & load it to the destination of your choice, but also transform & enrich your data to make it analysis-ready so that you can focus on your key business needs and perform insightful analysis using BI tools. 

Sign Up for a 14-day free trial and experience the feature-rich Hevo suite first hand. You may also have a look at the amazing price, which will assist you in selecting the best plan for your requirements.

We would love to hear from you about your experiences of learning about MongoDB Change Streams. Share your thoughts in the comments section below.

Rashid Y
Freelance Technical Content Writer, Hevo Data

Rashid is passionate about freelance writing within the data industry, and delivers informative and engaging content on data science by incorporating his problem-solving skills. He finds joy in breaking down complex topics for helping data practitioners solve their everyday problems.

A No-Code Data Pipeline For MongoDB