Setting up Kafka MongoDB Connection: 5 Easy Steps

on Data Integration, ETL, Tutorials • October 7th, 2021


Do you want to transfer your MongoDB data using Kafka? Are you finding it challenging to set up a Kafka MongoDB Connection? Well, look no further! This article will answer all your queries & relieve you of the stress of finding a truly efficient solution. Follow our easy step-by-step guide to help you master the skill of efficiently transferring your data with the help of MongoDB Kafka Connector.

It will help you take charge in a hassle-free way without compromising efficiency. This article aims at making the data export process as smooth as possible.

Upon a complete walkthrough of the content, you will be able to successfully set up a Kafka MongoDB Connection to seamlessly transfer data to the destination of your choice for a fruitful analysis in real-time. It will further help you build a customized ETL pipeline for your organization. Through this article, you will get a deep understanding of the tools and techniques & thus, it will help you hone your skills further.

What is Kafka?


Apache Kafka is an open-source, distributed publish-subscribe messaging platform built to handle high volumes of messages. It uses the leader-follower concept to replicate messages in a fault-tolerant way, and it segments & stores messages in Kafka Topics organized by subject. Kafka lets you set up real-time streaming data pipelines & applications that transform data and stream it from source to target.
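
As a quick illustration of Kafka’s publish & subscribe model, the sketch below (assuming a local broker on localhost:9092, Zookeeper on localhost:2181, and the Confluent distribution’s command scripts used later in this guide; the topic name demo_topic is hypothetical) creates a Topic, publishes a message to it, and reads it back:

# create a topic to hold messages on a given subject
./bin/kafka-topics --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic demo_topic

# publish a message to the topic
echo "hello kafka" | ./bin/kafka-console-producer --broker-list localhost:9092 --topic demo_topic

# subscribe to the topic and read the message back
./bin/kafka-console-consumer --bootstrap-server localhost:9092 --topic demo_topic --from-beginning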

Key features of Kafka:

  • Scalability: Kafka has exceptional scalability and can be scaled easily without downtime.
  • Data Transformation: Kafka offers Kafka Streams (KStreams) and KSQL (in the case of Confluent Kafka) for on-the-fly data transformation.
  • Fault-Tolerant: Kafka uses brokers to replicate data and persists the data to make it a fault-tolerant system.
  • Security: Kafka can be combined with various security measures like Kerberos to stream data securely.
  • Performance: Kafka is distributed, partitioned, and has a very high throughput for publishing and subscribing to the messages.

For further information on Kafka, you can check the official Apache Kafka website.

Simplify Your MongoDB & Kafka ETL Using Hevo’s No-code Data Pipeline

Hevo Data, an Automated No-code Data Pipeline, helps you directly transfer data from MongoDB and Kafka to Data Warehouses, Databases, or any other destination of your choice in a completely hassle-free manner. Hevo provides a one-stop solution for all Kafka and MongoDB use cases. Hevo initializes a connection with Kafka Bootstrap Servers and collects the data stored in their Topics & Clusters. Moreover, with Hevo, you can either Poll your data using MongoDB’s OpLog or perform Real-time streaming of data using MongoDB’s Change Streams.

Get Started With Hevo For Free

Hevo is fully managed and completely automates the process of not only loading data from 100+ data sources (including 40+ free sources) but also enriching the data and transforming it into an analysis-ready form without having to write a single line of code. Its fault-tolerant architecture ensures that the data is handled in a secure and flexible manner with zero data loss. Hevo’s consistent & reliable solution to manage data in real-time allows you to focus more on Data Analysis, instead of Data Consolidation.

What is MongoDB?


MongoDB is an open-source NoSQL database that uses a document-oriented data model to store data and lets you query it with its own document-based query language. MongoDB is widely used among organizations and is one of the most powerful NoSQL databases in the market.

MongoDB, being a NoSQL database, doesn’t use the concept of rows and columns to store data; instead, it stores data as key-value pairs in the form of documents (analogous to records) and maintains all these documents in collections (analogous to tables). All MongoDB documents are stored in the BSON (Binary JSON) format.

MongoDB allows you to modify schemas without any downtime. It is highly elastic, letting you combine and store multivariate data types without compromising on its powerful indexing & data access options or its validation rules.
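
As a small illustration of the document model, the hedged sketch below (assuming a local MongoDB instance on the default port, the mongo shell, and MongoDB 3.2 or later for insertOne; the database, collection, and field names are hypothetical) inserts a key-value document into a collection and queries it back:

# insert a document (a set of key-value pairs) into the "customers" collection of the "test_db" database
mongo test_db --eval 'db.customers.insertOne({ name: "Jane", plan: "pro", signup_date: new Date() })'

# query the collection, filtering on one of the keys
mongo test_db --eval 'printjson(db.customers.find({ plan: "pro" }).toArray())'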

For further information on MongoDB, you can check the official MongoDB website.

Why do we need Kafka MongoDB Connection?

A Kafka MongoDB Connection allows seamless data transfer from MongoDB as a source to other destinations, including another MongoDB instance. Kafka streams data from MongoDB in a hassle-free manner. Enterprises need to publish high volumes of data to many subscribers, or to manage multiple streams flowing into MongoDB. MongoDB is widely used in the industry, and Kafka helps developers lay out multiple asynchronous streams using the Kafka MongoDB Connection.

MongoDB doesn’t natively support selective streaming of its data, and a Kafka MongoDB Connection helps developers set up a stream that sends only selected data from MongoDB to Business Intelligence tools, applications, or Reporting tools, as sketched below.
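
For instance, the Debezium MongoDB connector used later in this guide exposes filtering properties that restrict the stream to particular databases, collections, or fields. A hedged sketch of such configuration lines (the property names follow the 0.8.x connector used in this guide; check the Debezium documentation for your version, and note that the database, collection, and field values here are hypothetical):

# stream only the listed database and collection
database.whitelist=test_mongo_db
collection.whitelist=test_mongo_db.test_mongo_table
# optionally drop sensitive fields from the emitted change events
field.blacklist=test_mongo_db.test_mongo_table.ssn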

Prerequisites

To set up the Kafka MongoDB connection, you need:

  • Working knowledge of MongoDB.
  • Working knowledge of Kafka.
  • MongoDB installed at the host workstation.
  • Kafka installed at the host workstation.

Steps to set up the Kafka MongoDB Connection

Kafka supports connecting with MongoDB and numerous other NoSQL databases with the help of in-built connectors provided by Confluent Hub. These connectors help bring data from a source of your choice into Kafka and then stream it from Kafka Topics to the destination of your choice. There are several such connectors for MongoDB that help establish a Kafka MongoDB Connection.

You can set up the Kafka MongoDB Connection with the Debezium MongoDB connector using the following steps:

Step 1: Installing Kafka

To start setting up the Kafka MongoDB Connection, you will have to download and install Kafka, in either standalone or distributed mode. Kafka’s official documentation will help you get started with the installation process.
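
As a rough sketch of the installation (assuming you download the Confluent Platform 5.0.0 tarball from confluent.io, which matches the directory names used later in this guide; the exact archive name depends on the build you download):

# extract the Confluent Platform archive you downloaded from confluent.io
mkdir -p /Users/software
tar -xzf confluent-5.0.0*.tar.gz -C /Users/software/

# all ./bin/... commands in the following steps are run from the installation directory
cd /Users/software/confluent-5.0.0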

Step 2: Installing the Debezium MongoDB Connector for Kafka

Confluent provides users with a diverse set of in-built connectors that act as data sources and sinks and help users transfer their data via Kafka. One such connector that lets users establish the Kafka MongoDB Connection is the Debezium MongoDB Connector.

To install the Debezium MongoDB connector, go to Confluent Hub’s official website and search for MongoDB using the search bar at the top of your screen to locate the connector with ease.


Once you’ve found the desired MongoDB connector, click on the download button. A zip file will now start downloading on your system. You then need to extract the zip file and copy all the jar files found in its lib folder to your Confluent installation.
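
A hedged sketch of this step from the command line (assuming the downloaded zip is named as shown, and that your Confluent installation lives at /Users/software/confluent-5.0.0, the path used in the next step):

# extract the connector archive downloaded from Confluent Hub
unzip debezium-debezium-connector-mongodb-0.8.1.zip

# copy the connector jar files from its lib folder into the Confluent installation
mkdir -p /Users/software/confluent-5.0.0/debezium-debezium-connector-mongodb-0.8.1
cp debezium-debezium-connector-mongodb-0.8.1/lib/*.jar /Users/software/confluent-5.0.0/debezium-debezium-connector-mongodb-0.8.1/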


This is how you can install the Debezium MongoDB connector to start setting up a Kafka MongoDB Connection.

Step 3: Adding Jar Files to the Class-Path & Starting Confluent

Once you have all the relevant jar files, you need to put them on the class-path so that the application can find and load them. To do this, open your ~/.bash_profile file using the following line of code:

vi ~/.bash_profile

Modify the file by adding the following lines and then save it to bring the changes into effect.

# add the Debezium MongoDB connector jars to the Java CLASSPATH
CLASSPATH=/Users/software/confluent-5.0.0/debezium-debezium-connector-mongodb-0.8.1/*
export CLASSPATH
# keep local system binaries on the PATH
PATH=$PATH:/usr/local/sbin
export PATH

Once you’ve made the changes, source the ~/.bash_profile file as follows:

source ~/.bash_profile

Once you’ve made the necessary modifications, you need to ensure that you have Confluent Kafka set up and running on your system. In case you don’t have Kafka running, you can use the following lines of code to start Zookeeper, Kafka, and the Schema Registry. Ensure that you execute them in separate terminals:

./bin/zookeeper-server-start ./etc/kafka/zookeeper.properties

./bin/kafka-server-start  ./etc/kafka/server.properties

./bin/schema-registry-start ./etc/schema-registry/schema-registry.properties

This is how you can add the jar files to the class-path and start Confluent to set up the Kafka MongoDB Connection.

Step 4: Creating Configuration Files & Kafka Topics

Once you have Kafka set up & running on your system, you now need to create a configuration file containing information about MongoDB’s connection URL, port, database name, collection name, etc.

To do this, create a file named “connect-mongodb-source.properties” in your Confluent installation’s etc/kafka directory (the path used when starting the connector in Step 5) and update it by adding the following lines:

# unique name for this connector instance
name=mongodb-source-connector
# Debezium's MongoDB source connector class
connector.class=io.debezium.connector.mongodb.MongoDbConnector
# replica set name, followed by the host:port of a MongoDB member
mongodb.hosts=repracli/localhost:27017
# logical name used as the prefix for the generated Kafka Topics
mongodb.name=mongo_conn
# number of threads used for the initial sync of the collections
initial.sync.max.threads=1
tasks.max=1

With the configuration file ready, you now need to create a Kafka Topic to hold the streaming data. Kafka further allows you to perform analysis on this data using functionalities such as Kafka Streams and KSQL, or other tools such as Spark Streaming.

You can create a Kafka Topic by executing the following command on a new terminal:

./bin/kafka-topics --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic mongo_conn.test_mongo_db.test_mongo_table

The above command will create a new Kafka Topic named “mongo_conn.test_mongo_db.test_mongo_table”, which follows the <mongodb.name>.<database>.<collection> naming pattern that the Debezium connector uses for its change-event topics.
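
If you want to confirm that the topic was created before moving on, you can list the topics known to the broker:

./bin/kafka-topics --list --zookeeper localhost:2181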

This is how you can create configuration files and Kafka Topics to set up the Kafka MongoDB Connection.

Step 5: Enabling the Connector

This is the final step required for your Kafka MongoDB Connection. Once you’ve made the necessary configurations and created a Kafka Topic, you now need to start the Kafka connector that will bring in data from your MongoDB data source and push it into Kafka Topics. To do this, you can use the following command in the same terminal:

./bin/connect-standalone ./etc/schema-registry/connect-avro-standalone.properties ./etc/kafka/connect-mongodb-source.properties

With your connector up and running, open a new terminal and launch the console consumer to check whether data is populating the topic. You can do this by running the following command in the new terminal:

./bin/kafka-console-consumer --bootstrap-server localhost:9092 --topic mongo_conn.test_mongo_db.test_mongo_table  --from-beginning

The output shows the existing entries from the source MongoDB collection. This is how you can set up the Kafka MongoDB Connection.
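
To verify the pipeline end to end, you can insert a new document into the source collection from another terminal and watch the corresponding change event appear in the console consumer. A hedged sketch (assuming the mongo shell, MongoDB 3.2 or later for insertOne, and the test_mongo_db/test_mongo_table names used above):

# insert a test document into the collection being streamed by the connector
mongo test_mongo_db --eval 'db.test_mongo_table.insertOne({ name: "test_record", created_at: new Date() })'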

What Makes Your Data Integration Experience With Hevo Unique?

These are some benefits of having Hevo Data as your Data Automation Partner:

  • Exceptional Security: A Fault-tolerant Architecture that ensures Zero Data Loss.
  • Built to Scale: Exceptional Horizontal Scalability with Minimal Latency for Modern-data Needs.
  • Built-in Connectors: Support for Kafka, MongoDB and other 100+ Data Sources, including Databases, SaaS Platforms, Files & More. Native Webhooks & REST API Connector available for Custom Sources.
  • Data Transformations: Best-in-class & Native Support for Complex Data Transformation at your fingertips. Code & No-code Flexibility designed for everyone.
  • Smooth Schema Mapping: Fully-managed Automated Schema Management for incoming data, mapped to the desired destination.
  • Blazing-fast Setup: Straightforward interface for new customers to work on, with minimal setup time.

ETL your data from MongoDB & Kafka to your destination warehouse with Hevo’s easy-to-set-up and No-code interface. Try our 14-day full access free trial.

Sign up here for a 14-Day Free Trial!

Conclusion

This article teaches you how to set up the Kafka MongoDB Connection with ease. It provides in-depth knowledge about the concepts behind every step to help you understand and implement them efficiently. These methods can be challenging, especially for a beginner, as writing code to ETL unstructured data can be quite taxing and resource-intensive.

Hevo can abstract away all the above challenges by letting users ETL data in minutes and without code. Hevo’s Data Pipeline offers a much faster and easier-to-set-up No-code alternative for unifying data from Kafka and MongoDB. Hevo caters to 100+ data sources (including 40+ free sources) and can seamlessly transfer your Kafka and MongoDB data to the Data Warehouse of your choice in real-time. Hevo’s Data Pipeline enriches your data and manages the transfer process in a fully automated and secure manner. It will make your life easier and make data migration hassle-free.

Learn more about Hevo

Tell us about your experience of setting up the Kafka MongoDB Connection! Share your thoughts in the comments section below!
