Easy Steps To Master MongoDB Change Streams: Efficient Real-Time Streaming

on ETL, Tutorials • June 16th, 2020 • Write for Hevo

With data growing at the speed of light, modern databases are becoming more and more powerful and hence they need the ability to capture & react to changes as they occur in real-time. This is where the MongoDB Change Streams feature plays a crucial role to make it possible.

MongoDB’s Change Streams feature (available for MongoDB v3.6 & above), allows you to stream data in real-time. Data changes occurring on databases or even on collections can be monitored by applications making use of MongoDB’s Change Streams which is based on its aggregation framework. Thus, it further enhances the database’s real-time capabilities.

This feature is crucial if a database is to accurately depict business activities in use cases like capturing sensor data for an IoT data pipeline or updating enterprise-wide reports such as operational data changes, etc.

Table Of Contents

Introduction To MongoDB

MongoDB Logo.

MongoDB is a popular high-performance NoSQL database that enables you to store your data in a non-relational format. The basic unit of data in MongoDB is a set of key-value pairs that allow documents to have different fields and functions. BSON (Binary JSON) can be used to communicate with the data stored in MongoDB. 

MongoDB stores its data as objects which are commonly identified as documents. These documents are stored in collections, analogous to how tables work in relational databases.

MongoDB is known for its scalability, ease of use, reliability & no compulsion for using a fixed schema among all stored documents, giving them the ability to have varying fields (columns).  

Introduction To MongoDB Change Streams

MongoDB Change Streams

MongoDB Change Streams track real-time data changes across a database, a collection, or an entire deployment, allowing you to immediately react to these changes. It gives users the power to track changes without having to continuously monitor the operations log (oplog). MongoDB Change Streams are built on the aggregation framework which gives the applications using it, the unique ability to not only filter the notifications but also transform them.

Setting up a replica set is a must before using MongoDB Change Streams. A Replica Set is a group of daemon processes for a data set that ensures that your data is distributed and replicated across multiple servers. This ensures data integrity especially in cases of any server failure or experience related issues. 

MongoDB uses a method known as sharding to replicate the data across various machines with a high throughput even when working with large datasets. The nodes resulting from this process are known as sharded clusters. 

Key Features Of MongoDB Change Streams

  • Resumable: The Change Stream response includes resume tokens. These resume tokens can be sent by an application when it loses connection with the database and the Change Stream picks up from that specific point in time.
  • Scalable: Change stream scales across different nodes, and thus it is able to take on high workloads.
  • Ease Of Use: Change Streams are easy to use as they make use of the existing MongoDB drivers and query format. 
  • Filter Capability: It is possible to filter the changes and specify the parts that a user wants to send to the downstream systems.
  • Security: Users are only able to deploy Change Streams on those collections for which they have been granted read access.

Ways To Set Up Real-Time Data Streaming

Method 1: Using MongoDB Change Streams

Making use of Change Streams is one such way to monitor real-time data streaming. MongoDB Change Streams track real-time data changes across a database, a collection, or an entire deployment, allowing you to immediately react to these changes. MongoDB Change Streams are built on the aggregation framework which gives the applications using it, the unique ability to not only filter the notifications but also transform them.

Method 2: Using A No-Code Data Pipeline, Hevo

Hevo Data, a No-Code Data Pipeline solution can help you manage your daily CDC needs in a fully automated & hassle-free manner with minimal supervision. Hevo’s interactive UI & pre-built integration with MongoDB (among 100+ sources) will help you not only monitor the real-time updates but also react to them in an immediate yet simple way. Get started with a 14-day free trial!

Prerequisites

The following are some prerequisites you should be familiar with before following this manual:

  • Working knowledge of MongoDB.
  • MongoDB version 3.6 or greater.
  • A general idea about the concepts of sharding & replication.
  • A general idea about the MongoDB operations log.
MongoDB Oplogs.

Method 1: MongoDB Change Streams In Practice

Let’s have a look at the concepts you will come across here:

1. Availability Of MongoDB Change Streams

These support sharded clusters and replica sets only:

  1. Replica Set Protocol: The replica set version 1, must be used by both sharded clusters and replica sets.
  2. Engine: WiredTiger storage engine is the desired choice that should be used by replica sets and shared clusters. You can learn more about WiredTiger here.
  3. Enabling Read Concern: To make the Change Streams available, you must enable the majority to read concern if you’re using MongoDB 4.0 and earlier. Starting from v4.2 there is no compulsion to enable it, MongoDB Change Streams are available in both modes.

2. Defining A Change Stream

Change Streams can be defined using collection_name.watch() method as follows:

db.collection_name.watch()

The watch method will signal every write to your collection. It accepts aggregation pipelines as its parameters. This enables users to specify parts of the data that they are particularly interested in, thus controlling the Change Stream outputs.

You can make use of most modern language drivers to create and consume Change Streams.

3. Opening A Change Stream

The way a Change Stream is opened depends upon whether you’re working with a sharded cluster or replica set:

  • Replica Set: The open Change Stream operation can be issued with the help of any data-bearing member.
  • Sharded Cluster: The open Change Stream operation must be issued by using the mongos.

Example: Using the following code, you can open the Change Stream and iterate over the cursor to retrieve Change Stream documents for a collection named webinar. This example code is written in python.

cursor = db.webinar.watch()
document = next(cursor)

4. Modifying The Output Of A MongoDB Change Stream

The output of a Change Stream can be tweaked by using one or more pipeline stages while setting up the Change Stream configuration.

Some examples are: $match, $project, $replaceRoot, $set(v4.2 & above), etc.

Example Query: This python query makes use of the $match, $addedFields stages, etc.

pipeline = [
    {'$match': {'fullDocument.username': 'mark'}},
    {'$addFields': {'newField': 'this is an new field!'}}
]
cursor = db.webinar.watch(pipeline=pipeline)
document = next(cursor)

Usually, a part of the document is returned as a response in the output. However, Change Streams configurations can be tweaked to get the entire document:

collection = db.collection("name_collection")
changeStream = collection.watch({ fullDocument: “updateLookup”})

For more information on the MongoDB Change Streams response, you can look into the change events manual.

5. Access Control

It is a must for all the deployments that emphasize on authentication. Applications must have privileges to grant Change Stream & find actions.

  • In order to open a Change Stream for a particular collection, applications need to have privileges that grant actions on that collection.
{ resource: { db: <dbname>, collection: <collection> }, actions: [ "find", "changeStream" ] }
  • In order to open a Change Stream for a database, applications need to have privileges that grant actions on all non-system collections belonging to that database.
{ resource: { db: <dbname>, collection: "" }, actions: [ "find", "changeStream" ] }
  • In order to open a Change Stream for an entire deployment, applications need to have privileges that grant actions on all non-system collections for all databases, that are a part of the deployment.
{ resource: { db: "", collection: "" }, actions: [ "find", "changeStream" ] }

You can log your Change Streams data into the downstream systems & even send them as notifications.

For more information on the operations & usability of MongoDB Change Streams, you can look into MongoDB Change Streams manual.

6. MongoDB Change Stream Recommendations

The following are some recommendations for deployment of Change Streams:

  • Ensure that the documents which represent the Change Stream response adhere to the 16MB limit for BSON documents.
  • Increase the size of the oplog if a significant downtime is anticipated (such as upgrades etc.). This will ensure that the operations are retained for a time longer than the estimated downtime.
  • Consider utilizing filters for sharded collections with high activity levels. This is done to help the instances keep up with all the changes happening across the shards. 

Method 2: Real-Time Streaming Using Hevo

Hevo Data Logo.

Hevo Data, a fully managed No-code Data Pipeline solution can help you meet your daily CDC requirements with ease. Its interactive UI & pre-built integration with MongoDB (among 100+ sources) will help you not only monitor the real-time updates and changes but also react to them in an immediate yet simple way. Hevo’s point and click interface ensures the lowest time to production possible.

Some Key Features Of Hevo

  1. Minimal Setup Time: Hevo can be set up & used by anyone from the team as there is a minimal learning curve involved.
  2. Interactive UI: Owing to its simple point & click interface, the user can connect to a source of their choice in a matter of minutes. This allows the user to interact and maintain their data in real-time.
  3. Incremental Data Load: Hevo allows to transfer data that has been modified in real-time. This ensures efficient utilization of bandwidth on both ends.
  4. Huge Source Platform Support: Hevo can help you bring in data from 100’s of sources thereby making it the ideal partner for your business’s growing data needs.
  5. Live Monitoring: Hevo allows you to monitor the data flow so you can check where your data is at a particular point in time.
  6. Secure: Hevo has a fault-tolerant architecture that ensures that the data is handled in a secure, consistent manner with zero data loss.

Experience the power and simplicity by signing up for a 14-day free trial with Hevo.

Conclusion

This article outlines how to master the skill of utilizing the power of MongoDB Change Streams to monitor the changes as they happen & act upon them practically in real-time with no hassle. With data growing at an exponential rate, in real-life situations handling such humongous amounts of data can be grueling and this is where Hevo comes into the picture offering its users a fully automated, no-code solution that helps monitor data quite easily with minimal supervision.

Want to give Hevo a try? Get started by signing up for a 14-day free trial and experience the feature-rich Hevo suite first hand. Have a look at our unbeatable pricing, that will help you choose the right plan for you.

We would love to hear from you about your preferred way to manage your CDC needs. Share your thoughts in the comments section below.

A No-Code Data Pipeline For MongoDB