Setting up Kafka MongoDB Connection: 5 Easy Steps

on Data Integration, ETL, Tutorials • October 7th, 2021


Do you want to transfer your MongoDB data using Kafka? Are you finding it challenging to set up a Kafka MongoDB Connection? Well, look no further! This article will answer all your queries & relieve you of the stress of finding a truly efficient solution. Follow our easy step-by-step guide to help you master the skill of efficiently transferring your data with the help of MongoDB Kafka Connector.

It will help you take charge in a hassle-free way without compromising efficiency. This article aims at making the data export process as smooth as possible.

Upon a complete walkthrough of the content, you will be able to successfully set up a Kafka MongoDB Connection to seamlessly transfer data to the destination of your choice for a fruitful analysis in real-time. It will further help you build a customized ETL pipeline for your organization. Through this article, you will get a deep understanding of the tools and techniques & thus, it will help you hone your skills further.


What is Kafka?


Apache Kafka is an open-source, distributed messaging platform that lets you publish & subscribe to high volumes of messages. It uses the leader-follower concept to replicate messages in a fault-tolerant way, and it segments & stores messages as Kafka Topics depending upon the subject. Kafka allows you to set up real-time streaming data pipelines & applications that transform data and stream it from source to target.

Key features of Kafka:

  • Scalability: Kafka has exceptional scalability and can be scaled easily without downtime.
  • Data Transformation: Kafka offers KStream and KSQL (in the case of Confluent Kafka) for on-the-fly data transformation.
  • Fault-Tolerant: Kafka uses brokers to replicate data and persists the data to make it a fault-tolerant system.
  • Security: Kafka can be combined with various security measures like Kerberos to stream data securely.
  • Performance: Kafka is distributed, partitioned, and has a very high throughput for publishing and subscribing to the messages.
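
To get a feel for Kafka's publish & subscribe model, here is a minimal sketch using the console tools that ship with Kafka. It assumes a broker running locally on localhost:9092 and Zookeeper on localhost:2181; the topic name demo_topic is purely illustrative. Run the producer and consumer in separate terminals:

./bin/kafka-topics --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic demo_topic

./bin/kafka-console-producer --broker-list localhost:9092 --topic demo_topic

./bin/kafka-console-consumer --bootstrap-server localhost:9092 --topic demo_topic --from-beginning

Any line you type into the producer terminal will appear in the consumer terminal, illustrating the publish & subscribe flow.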

For further information on Kafka, you can check the official website here.

Simplify Your MongoDB & Kafka ETL Using Hevo’s No-code Data Pipeline

Hevo Data, an Automated No-code Data Pipeline, helps you directly transfer data from MongoDB and Kafka to Data Warehouses, Databases, or any other destination of your choice in a completely hassle-free manner. Hevo provides a one-stop solution for all Kafka and MongoDB use cases. Hevo initializes a connection with Kafka Bootstrap Servers and collects the data stored in their Topics & Clusters. Moreover, with Hevo, you can either Poll your data using MongoDB’s OpLog or perform Real-time streaming of data using MongoDB’s Change Streams.

Get Started With Hevo For Free

Hevo is fully managed and completely automates the process of not only loading data from 100+ data sources (including 40+ free sources) but also enriching the data and transforming it into an analysis-ready form without having to write a single line of code. Its fault-tolerant architecture ensures that the data is handled in a secure and flexible manner with zero data loss. Hevo’s consistent & reliable solution to manage data in real-time allows you to focus more on Data Analysis, instead of Data Consolidation.

What is MongoDB?


MongoDB is an open-source NoSQL database that uses a document-oriented data model to store data and allows you to query data using the NoSQL query language. MongoDB is widely used among organizations and is one of the most potent NoSQL databases in the market.

MongoDB, being a NoSQL database, doesn’t use the concept of rows and columns to store data; instead, it stores data as key-value pairs in the form of documents (analogous to records) and maintains all these documents in collections (analogous to tables). All MongoDB documents are stored in the BSON (Binary JSON) format.
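
For example, here is an illustrative document being inserted from the mongo shell (the database, collection, and field names are arbitrary):

use test_db
db.customers.insertOne({ "name": "Jane Doe", "email": "jane@example.com", "orders": [101, 102] })

Note that other documents in the same collection are free to carry a different set of fields, which is what makes the schema flexible.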

MongoDB allows you to modify schemas without any downtime. It is highly elastic and hence lets you combine and store data of varied types without compromising on its powerful indexing & data access options and validation rules.

For further information on MongoDB, you can check the official website here.

Why do we need Kafka MongoDB Connection?

A Kafka MongoDB Connection allows seamless data transfer from MongoDB as a Source to other destinations, or to another MongoDB instance, and Kafka streams data out of MongoDB in a hassle-free manner. Enterprises need to publish high volumes of data to many subscribers or manage multiple streams into MongoDB. MongoDB is widely used in the industry, and Kafka helps developers lay out multiple asynchronous streams using a Kafka MongoDB Connection.

MongoDB doesn’t natively support selective streaming, and a Kafka MongoDB Connection can help developers set up a stream that sends only selected data from MongoDB to Business Intelligence tools, applications, or Reporting tools.


Prerequisites

To set up the Kafka MongoDB Connection, you need:

  • Working knowledge of MongoDB.
  • Working knowledge of Kafka.
  • MongoDB installed at the host workstation.
  • Kafka installed at the host workstation.

Steps to set up the Kafka MongoDB Connection

Kafka supports connecting with MongoDB and numerous other NoSQL databases with the help of connectors available on Confluent Hub. These connectors help bring data from a source of your choice into Kafka and then stream it from Kafka Topics to the destination of your choice. Similarly, there are several connectors for MongoDB that help establish a Kafka MongoDB Connection.

You can set up the Kafka MongoDB Connection with the Debezium MongoDB connector using the following steps:

Step 1: Installing Kafka

To start setting up the Kafka MongoDB Connection, you will have to download and install Kafka, either in standalone or distributed mode. You can follow Kafka’s official documentation to get started with the installation process.

Step 2: Installing the Debezium MongoDB Connector for Kafka

Confluent provides users with a diverse set of in-built connectors that act as data sources and sinks, and help users transfer their data via Kafka. One such connector that lets users establish a Kafka MongoDB Connection is the Debezium MongoDB Connector.

To install the Debezium MongoDB connector, go to Confluent Hub’s official website and search for MongoDB using the search bar at the top of your screen. You can also click here to locate the connector on Confluent Hub with ease.


Once you’ve found the desired MongoDB connector, click on the download button. A zip file will now start downloading to your system. You now need to extract the zip file and copy all the jar files found in its lib folder to your Confluent installation, as sketched below.
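
For example, a hedged sketch of the extract-and-copy step (the zip file name and the Confluent path are placeholders; adjust them to match your download and installation):

unzip debezium-debezium-connector-mongodb-<version>.zip

mkdir -p /path/to/confluent/share/java/kafka-connect-mongodb

cp debezium-debezium-connector-mongodb-<version>/lib/*.jar /path/to/confluent/share/java/kafka-connect-mongodb/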


This is how you can install the Debezium MongoDB connector to start setting up a Kafka MongoDB Connection.

Step 3: Adding Jar Files to the Class-Path & Starting Confluent

Once you have all the relevant jar files, you need to add them to the class-path so that the application can recognise and load them. To do this, open the .bash_profile file using the following command:

vi ~/.bash_profile

Modify the file by adding lines like the following and then save it to bring the changes into effect. The jar directory here is a placeholder; point it at the location where you copied the Debezium jar files:

export CLASSPATH="$CLASSPATH:/path/to/confluent/share/java/kafka-connect-mongodb/*"
export PATH

Once you’ve made the changes, source the .bash_profile file as follows:

source ~/.bash_profile

Once you’ve made the necessary modifications, you need to ensure that you have Confluent Kafka set up and running on your system. In case you don’t have Kafka running, you can use the following commands to start Zookeeper, Kafka, and the Schema Registry. Ensure that you execute each of them in a separate terminal:

./bin/zookeeper-server-start ./etc/kafka/zookeeper.properties

./bin/kafka-server-start ./etc/kafka/server.properties

./bin/schema-registry-start ./etc/schema-registry/schema-registry.properties
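
To sanity-check that everything came up, you can, for example, query the Schema Registry’s REST API, which listens on port 8081 by default; it should respond with a JSON list of registered subjects (empty at this point):

curl http://localhost:8081/subjects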

This is how you can add the required jar files to the class-path and start Confluent to set up the Kafka MongoDB Connection.

Step 4: Creating Configuration Files & Kafka Topics

Once you have Kafka set up & running on your system, you now need to create the configuration file containing the information about MongoDB’s connection URL, port, database name, collection name, etc.

To do this, create a properties file (the name is up to you; for example, mongodb-source.properties) and update it with the connection details of your MongoDB deployment.

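The exact contents depend on your environment; the following is a minimal sketch of a Debezium MongoDB source configuration, assuming a replica set named rs0 on localhost:27017 and the logical name mongo_conn, which becomes the Topic prefix used in the next step. Note that the Debezium connector reads MongoDB’s oplog, so your MongoDB instance must be running as a replica set:

name=mongo_conn
connector.class=io.debezium.connector.mongodb.MongoDbConnector
tasks.max=1
mongodb.hosts=rs0/localhost:27017
mongodb.name=mongo_conn
database.include.list=test_mongo_db
collection.include.list=test_mongo_db.test_mongo_table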

With the configuration file ready, you now need to create Kafka Topics to hold the streaming data. Kafka further allows you to analyze this data using functionalities such as KStream and KSQL, or other tools such as Spark Streaming.
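
As an illustrative aside, if you have KSQL available (Confluent Kafka) and the connector publishes Avro records to the Schema Registry, a sketch of registering the change-event Topic as a stream for analysis could look like this (the stream name is arbitrary):

CREATE STREAM mongo_changes WITH (KAFKA_TOPIC='mongo_conn.test_mongo_db.test_mongo_table', VALUE_FORMAT='AVRO');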

You can create a Kafka Topic by executing the following command on a new terminal:

./bin/kafka-topics --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic mongo_conn.test_mongo_db.test_mongo_table

The above command will create a new Kafka Topic known as “mongo_conn.test_mongo_db.test_mongo_table”.
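
To confirm that the Topic exists, you can list all available Topics:

./bin/kafka-topics --list --zookeeper localhost:2181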

This is how you can create configuration files and Kafka Topics to set up the Kafka MongoDB Connection.

Step 5: Enabling the Connector

This is the final step required for your Kafka MongoDB Connection. Once you’ve made the necessary configurations and created a Kafka Topic, you now need to enable the Kafka connector that will bring in data from your MongoDB data source and push it into Kafka Topics. To do this, run the Connect standalone worker in the same terminal, passing it the worker properties that ship with Confluent and the connector configuration file you created in Step 4 (the command below assumes the example file name used earlier and that the file sits in ./etc/kafka/):

./bin/connect-standalone ./etc/schema-registry/connect-avro-standalone.properties ./etc/kafka/mongodb-source.properties

With your connector up and running, open a new terminal and launch the console consumer to check whether data is arriving in the Topic. You can do this by running the following command in the new terminal:

./bin/kafka-console-consumer --bootstrap-server localhost:9092 --topic mongo_conn.test_mongo_db.test_mongo_table --from-beginning

The output shows the entries from your MongoDB collection, delivered as change events. This is how you can set up the Kafka MongoDB Connection.
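
To verify end-to-end streaming, you can insert a new document from the mongo shell (using the database and collection names assumed above) and watch it appear in the consumer terminal:

mongo

use test_mongo_db
db.test_mongo_table.insertOne({ "name": "kafka_mongodb_test" })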

What Makes Your Data Integration Experience With Hevo Unique?

These are some benefits of having Hevo Data as your Data Automation Partner:

  • Exceptional Security: A Fault-tolerant Architecture that ensures Zero Data Loss.
  • Built to Scale: Exceptional Horizontal Scalability with Minimal Latency for Modern-data Needs.
  • Built-in Connectors: Support for Kafka, MongoDB and other 100+ Data Sources, including Databases, SaaS Platforms, Files & More. Native Webhooks & REST API Connector available for Custom Sources.
  • Data Transformations: Best-in-class & Native Support for Complex Data Transformation at your fingertips. Code & No-code Flexibility designed for everyone.
  • Smooth Schema Mapping: Fully-managed Automated Schema Management that maps incoming data to the desired destination.
  • Blazing-fast Setup: Straightforward interface for new customers to work on, with minimal setup time.

ETL your data from MongoDB & Kafka to your destination warehouse with Hevo’s easy-to-setup and No-code interface. Try our 14-day full access free trial.

Sign up here for a 14-Day Free Trial!


Conclusion

This article teaches you how to set up the Kafka MongoDB Connection with ease. It provides in-depth knowledge about the concepts behind every step of the MongoDB Kafka Connector to help you understand and implement them efficiently. These methods can be challenging, especially for a beginner, as writing code to ETL unstructured data can be quite taxing and resource-intensive.

Hevo can abstract away all of the above challenges by letting users ETL data in minutes and without code. Hevo’s Data Pipeline offers a much faster, easy-to-set-up, No-code alternative for unifying data from Kafka and MongoDB. Hevo caters to 100+ data sources (including 40+ free sources) and can seamlessly transfer your Kafka and MongoDB data to the Data Warehouse of your choice in real-time. Hevo’s Data Pipeline enriches your data and manages the transfer process in a fully automated and secure manner. It will make your life easier and make data migration hassle-free.

Learn more about Hevo

Tell us about your experience of setting up the Kafka MongoDB Connection! Share your thoughts in the comments section below!
