Apache Kafka is a Distributed Streaming Platform that handles trillions of real-time messages per day. Producers and consumers write and read messages to and from Kafka servers, respectively. By default, the Kafka distribution ships with producer and consumer script files for running producer and consumer consoles from the command prompt. However, CLI tools alone are rarely enough for industry-level applications. To use Kafka streaming in such applications, you have to build and automate the Kafka producer or consumer with a programming language like Java or Python.
Kafka producers and consumers built using Java take care of publishing and fetching real-time data, making it easy to automate and deploy industrial applications that involve data streaming. In this article, you will learn about Kafka, Kafka consumers, and how to build a Kafka Java consumer client.
Prerequisites
To get started with Kafka Java, you’re expected to have a clear understanding of data streaming.
What is Kafka?
Kafka is an open-source Distributed Streaming Platform that stores and handles real-time data for building event-driven applications. It runs as a dedicated set of servers distributed across a Kafka cluster. These servers are responsible for collecting, storing, organizing, and distributing real-time messages. The Kafka environment mainly consists of three components: Kafka producers, Kafka servers, and Kafka consumers.
Kafka producers publish or write data into Kafka servers, while Kafka consumers fetch or read data from them. Because the servers are distributed across the Kafka environment, the platform is highly fault-tolerant: in the rare case that one Kafka server fails, the real-time data or messages remain safe on the other servers.
What is Data Streaming?
Data Streaming refers to processing data continuously as it arrives, across parallel, connected systems. It lets applications process records in parallel, so that one record is handled without waiting for the output of the previous record. A distributed streaming platform thus simplifies the tasks of stream processing and parallel execution. Such a streaming platform has the following capabilities:
- It works similar to an enterprise messaging system, where it publishes and subscribes to streams of records.
- It processes streams of records as soon as they occur.
- It stores the streams of records in a fault-tolerant durable way.
What is a Messaging System?
A messaging system refers to a simple exchange of messages between two or more devices, persons, etc. A publish-subscribe messaging system allows a sender to write/send the message and a receiver to read the message. In Apache Kafka, a sender is known as a producer who publishes messages, and a receiver is known as a consumer who consumes that message by subscribing to it.
Hevo Data is a No-code Data Pipeline that offers a fully managed solution to set up data integration from Apache Kafka and 100+ Data Sources (including 30+ Free Data Sources) and will let you directly load data to a Data Warehouse or the destination of your choice. It will automate your data flow in minutes without writing any line of code. Its fault-tolerant architecture makes sure that your data is secure and consistent. Hevo provides you with a truly efficient and fully automated solution to manage data in real-time and always have analysis-ready data.
Let’s look at some of the salient features of Hevo:
- Fully Managed: It requires no management and maintenance as Hevo is a fully automated platform.
- Data Transformation: It provides a simple interface to perfect, modify, and enrich the data you want to transfer.
- Real-Time: Hevo offers real-time data migration. So, your data is always ready for analysis.
- Schema Management: Hevo can automatically detect the schema of the incoming data and map it to the destination schema.
- Scalable Infrastructure: Hevo has in-built integrations for hundreds of sources that can help you scale your data infrastructure as required.
- Live Monitoring: Advanced monitoring gives you a one-stop view to watch all the activities that occur within Data Pipelines.
- Live Support: Hevo team is available round the clock to extend exceptional support to its customers through chat, email, and support calls.
Understanding Kafka Consumer
In the Kafka architecture, the consumer is the end application or user that consumes the real-time data stored on Kafka servers. In other words, Kafka consumers read data from specific topics or partitions on the Kafka servers, which they do by subscribing to those topics on the Kafka servers or brokers. When several consumers work together as a consumer group to fetch data from a particular topic, each consumer in the group reads data from a different partition of that topic.
Now that you’re familiar with Kafka and Kafka consumers, let’s dive straight into Kafka Java consumer clients.
Why do you need Kafka Java Consumer?
Similar to how a Kafka producer optimizes writes to Kafka, a Kafka consumer is used for the optimal consumption of data. The primary purpose of Kafka's Java consumer is to take the consumer and connection properties and read records from the appropriate Kafka broker. Complexities such as offset management, concurrent consumption across application instances, and delivery semantics can easily be taken care of by the Java Consumer API, as the sketch below illustrates.
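As an illustration, here is a minimal sketch of manual offset management with the Java Consumer API. It assumes an already configured KafkaConsumer with enable.auto.commit set to false and an active topic subscription; the processing step is a stand-in, not a prescribed implementation.

import java.time.Duration;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

// Sketch: commit offsets only after records are processed,
// which gives at-least-once delivery semantics.
static void pollAndCommit(KafkaConsumer<String, String> consumer) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(1000));
    for (ConsumerRecord<String, String> record : records) {
        System.out.println(record.value()); // stand-in for real processing
    }
    consumer.commitSync();
}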
Building the Kafka Java Consumer
Follow the below-mentioned steps to build Kafka Java Consumer.
Step 1: Installing Kafka
Before building a Kafka Java consumer client to fetch records or messages from Kafka servers, you need to have your Kafka cluster up and running. Depending on how much data you need to fetch, you can keep as many Kafka servers active as you like. For building and running a Kafka Java consumer client, a single Kafka server and a single Zookeeper instance are enough to stream data from the producer to the consumer side efficiently.
- Initially, you have to download and set up the Kafka environment. Ensure that Java 8 or later is installed and running on your local machine. Set up and configure the Java path so that your operating system can locate the Java utilities.
- Download the Kafka package from the official website. After installing and configuring the Kafka files, you are ready to start the Kafka server and Zookeeper instance.
Step 2: Set up the Kafka Environment
- First, start the Zookeeper instance. Zookeeper serves as a centralized coordination service in Kafka that maintains the Kafka cluster's metadata: server, producer, and consumer details, etc.
- To start the Zookeeper instance, open a new command prompt and execute the following command.
.\bin\windows\zookeeper-server-start.bat .\config\zookeeper.properties
- Then, for starting the Kafka server, open another new command prompt and execute the command given below.
.\bin\windows\kafka-server-start.bat .\config\server.properties
- After executing the above-mentioned commands, you will have successfully started the Kafka server and Zookeeper instance.
Note: You have to make sure not to accidentally close the command prompts that run the Kafka server and Zookeeper instances.
The above method is one way to set up the Kafka environment needed to build a Kafka Java consumer client. The other way is to use the Docker Compose tool to run the Kafka environment locally without manually installing and configuring the Kafka application.
- To use a Docker Compose file to set up the local Kafka environment, you need the Docker Compose tool pre-installed. To install Docker Compose on your local machine, you can visit the link and follow the instructions in the official documentation.
- After installing and setting up the Docker Compose tool, you are all set to run the docker-compose file for starting the Kafka cluster with Kafka server and Zookeeper instance.
- To run the Kafka environment, you can access the docker-compose file by cloning it from GitHub. You can also download the docker-compose file separately from the respective GitHub repository.
- For cloning the docker-compose file, execute the following command.
git clone https://github.com/codingharbour/kafka-docker-compose.git
- Now, navigate to the folder named “single-node-kafka” and execute the following command.
docker-compose up -d
- After the command is executed, you will see output like the following.
Creating network "single-node-kafka_default" with the default driver
Creating sn-zookeeper ... done
Creating sn-kafka ... done
- Now, your Kafka cluster is started and running successfully. You can execute the below command to see the running instances.
docker-compose ps
- After the command is executed, you will see output listing the running containers; both sn-zookeeper and sn-kafka should show a State of "Up".
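For reference, here is a minimal sketch of what such a single-node docker-compose.yml might look like. It assumes Confluent Platform images and mirrors the container names above; the actual file in the cloned repository may differ in image versions and listener configuration.

version: '2'
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:6.2.0
    container_name: sn-zookeeper
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
  kafka:
    image: confluentinc/cp-kafka:6.2.0
    container_name: sn-kafka
    depends_on:
      - zookeeper
    ports:
      - "9092:9092"
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: sn-zookeeper:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://localhost:9092
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1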
- Now, you are ready to build the Kafka Java consumer client to consume records from Kafka servers. For continuously fetching records or messages from Kafka, you need the Kafka client library. The easiest way to pull in the Kafka client library is to add the dependency shown below to your pom.xml file, which contains your project's configuration details. For the overall structure of the pom file, you can refer here.
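A typical dependency entry looks like the following. The version number here is an assumption; use whichever recent kafka-clients release matches your broker.

<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka-clients</artifactId>
    <version>2.8.0</version>
</dependency>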
Step 3: Creating a Kafka Topic
After installing the Kafka client library, you have to create a topic that stores and organizes messages received from producers. Further, consumers can fetch records or messages by subscribing to the respective topic.
- For creating topics, you can use the kafka-topics CLI that comes with the Kafka package by default. You can access it from the Docker container named "sn-kafka."
- Navigate to the terminal and execute the following command.
docker exec -it sn-kafka /bin/bash
- On executing the above command, you get a shell inside the respective Docker container, where the Kafka CLI tools for creating topics are available.
- Now, execute the following command to create a new Kafka topic.
kafka-topics --bootstrap-server localhost:9092 \
  --create --topic java_topic \
  --partitions 1 --replication-factor 1
- On executing the above command, you created a new Kafka topic named java_topic with a single partition and a replication factor of 1. Using this topic, you can produce and consume messages to and from the Kafka server.
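Optionally, you can confirm that the topic was created by describing it from the same container:

kafka-topics --bootstrap-server localhost:9092 --describe --topic java_topic

The output should list java_topic along with its partition count and replication factor.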
Step 4: Creating a Kafka Producer
- By default, the Kafka distribution provides a command-line utility in the form of a script file (.sh) for producing messages into Kafka topics. Enter the following command to create a producer console.
kafka/bin/kafka-console-producer.sh --broker-list localhost:9092 --topic java_topic
- You can also use the CLI tool present inside the docker container for creating a producer console. Navigate to the sn-kafka container by executing the following command.
docker exec -it sn-kafka /bin/bash
- Now, run the following command to create a producer console.
kafka-console-producer --broker-list localhost:9092 --topic java_topic
- After executing the above command, the Kafka producer console is created successfully; each line you type at its prompt and submit with Enter is published to the topic. You can now start building the Kafka Java consumer client.
Step 5: Building Kafka Consumer using Java
The last step is to create the Kafka Java consumer client. Apache Kafka provides several consumer properties for building consumers in Java. Some of the essential properties used to build a Kafka consumer are bootstrap.servers, group.id, key.deserializer, and value.deserializer.
- Create a new Java Project in your desired IDE for building a Kafka consumer.
- Define a new Java class and enter the consumer properties as shown below.
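The following is a minimal sketch of such a class's configuration, assuming string-encoded keys and values; the group id "java-consumer-group" is a placeholder you can rename.

import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

Properties props = new Properties();
// Address of the Kafka server started earlier.
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
// Placeholder consumer group name (an assumption for this sketch).
props.put(ConsumerConfig.GROUP_ID_CONFIG, "java-consumer-group");
// Deserializers that decode the serialized bytes sent by the producer.
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
// With no committed offset, start reading from the beginning of the topic.
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");

KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);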
- In the above code, the deserializer classes decode the serialized messages sent by the Kafka producers back into usable strings.
- The bootstrap.servers parameter defines the host and port at which the Kafka server is available. The group.id parameter is a unique name that identifies the consumer group this consumer belongs to on the receiver side.
- The AUTO_OFFSET_RESET_CONFIG parameter defines where the consumer starts consuming messages when it has no committed offset. As the value is set to "earliest," the consumer reads every message that already exists in the respective topic, from the beginning.
- After creating a consumer console in Java, you are now ready to fetch or consume data.
- For fetching messages from the consumer side, you have to subscribe to the respective Kafka topic. As you previously created a new Kafka topic named java_topic, you can subscribe to that specific topic for consuming messages.
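As a sketch, subscribing and polling with the consumer configured above might look like this:

import java.time.Duration;
import java.util.Collections;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;

// Subscribe to the topic created earlier.
consumer.subscribe(Collections.singletonList("java_topic"));

// Poll in a loop so the consumer keeps fetching newly produced messages.
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
    for (ConsumerRecord<String, String> record : records) {
        System.out.printf("offset=%d, key=%s, value=%s%n",
                record.offset(), record.key(), record.value());
    }
}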
- On successfully running the above code, you are subscribed to the Kafka topic named java_topic.
- Now, you can produce some messages into the respective topic from the producer console. Within a few seconds, you can see that the newly built consumer console starts fetching all the messages sent from the producer side.
By executing the above-mentioned steps, you can successfully create or build a Kafka Java consumer client.
Conclusion
In this article, you have learned about Kafka, Kafka consumers, and how to build a Kafka Java consumer client for fetching messages from Kafka servers. This article focused only on building the consumer client in Java, while the producer console was created separately using Kafka's default command-line interface. You can also build both producer and consumer clients in Java to attain maximum throughput. However, in businesses, extracting complex data from a diverse set of Data Sources can be a challenging task, and this is where Hevo saves the day!
Hevo Data with its strong integration with 100+ Sources & BI tools such as Apache Kafka, allows you to not only export data from sources & load data in the destinations, but also transform & enrich your data, & make it analysis-ready so that you can focus only on your key business needs and perform insightful analysis using BI tools.
Give Hevo Data a try and sign up for a 14-day free trial today. Hevo offers plans & pricing for different use cases and business needs, check them out!
Share your experience of building a Kafka Java consumer client in the comments section below.