Apache Kafka is a distributed stream processing platform that stores and handles real-time data across Kafka servers. Its ecosystem consists of Kafka producers, brokers, and consumers. Kafka producers publish real-time data or messages to Kafka servers, and Kafka consumers fetch those messages from the respective servers.
To transfer real-time data through Kafka servers, you can use the console producer and consumer CLI tools that ship with the Apache Kafka installation.
However, if you are building an industry-level real-time application that will be deployed to a production environment, you should build the Kafka consumer or producer application in a high-level language such as Java or Python.
An application written in a high-level programming language can go through the usual SDLC phases, including testing and production, which helps ensure a proper deployment.
In this article, you will learn about Kafka and Kafka consumers, and how to build and deploy a Kafka consumer application to a production environment.
Table of Contents
- What is Kafka?
- What are Kafka Consumers?
- Building Kafka Consumers
- Deploying Kafka Consumers
Prerequisite: a fundamental understanding of real-time data streaming.
What is Kafka?
Kafka is an open-source, distributed event streaming platform that allows you to develop real-time event-driven or data-driven applications. A Kafka cluster is made up of multiple servers that collect, store, organize, and manage the real-time messages that flow continuously into it.
Because data is distributed across multiple Kafka servers, Kafka provides high availability of real-time data. Even if one of the Kafka servers fails, the data remains available on other servers, which ensures fault tolerance.
Because of these capabilities, Kafka is used by some of the world’s most prominent companies, including Netflix, Uber, Airbnb, and Spotify.
What are Kafka Consumers?
The Kafka ecosystem consists of three main components: Kafka producers, servers (brokers), and consumers, all of which contribute to end-to-end data streaming.
Kafka consumers are the end-users or applications that retrieve the real-time messages that Kafka producers publish to Kafka servers. To fetch those messages, Kafka consumers have to subscribe to the respective topics on the Kafka servers.
A Kafka consumer can fetch messages from a specific server by targeting the particular topic or partition to which producers previously published messages.
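To make the idea of reading from a partition more concrete, here is a minimal in-memory sketch in plain Java. It is not the Kafka client API: the class names PartitionLog and OffsetTrackingReader are hypothetical, and the only Kafka concept it borrows is that a partition behaves like an append-only log in which each message has a position (an offset) and a consumer remembers how far it has read.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory stand-in for one topic partition: an append-only list of messages. */
final class PartitionLog {
    private final List<String> messages = new ArrayList<>();

    /** Appends a message (the producer side) and returns the offset it was stored at. */
    long append(String message) {
        messages.add(message);
        return messages.size() - 1;
    }

    /** Returns up to maxRecords messages starting at the given non-negative offset. */
    List<String> read(long fromOffset, int maxRecords) {
        int start = (int) Math.min(fromOffset, messages.size());
        int end = Math.min(start + maxRecords, messages.size());
        return new ArrayList<>(messages.subList(start, end));
    }
}

/** Hypothetical toy consumer that remembers its position in a single partition. */
final class OffsetTrackingReader {
    private final PartitionLog partition;
    private long position = 0;

    OffsetTrackingReader(PartitionLog partition) {
        this.partition = partition;
    }

    /** Fetches the next batch and advances the position past it. */
    List<String> poll(int maxRecords) {
        List<String> batch = partition.read(position, maxRecords);
        position += batch.size();
        return batch;
    }

    /** Offset of the next message this reader will fetch. */
    long position() {
        return position;
    }
}
```

In this sketch, a reader that calls poll repeatedly receives each message once and in the order it was appended, and an empty batch simply means nothing new has been published yet.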
Building Kafka Consumers
a. Setting up Kafka Cluster
Before building the Kafka consumer, you have to install and set up a Kafka cluster. You can use either the default Apache Kafka installation on your local machine or the Confluent Cloud platform to set up the Kafka cluster for the consumer application.
Since Confluent Cloud is a fully managed service, it lets you set up the Kafka cluster in a straightforward way with significantly less coding.
With Confluent Cloud, instead of setting up and running the Kafka environment locally, you can set up Kafka clusters through an interactive and user-friendly UI (User Interface).
- Sign up for the Confluent Cloud platform. Follow this link and click on “Start Free.”
- After filling in the required details, click on the “Start Free” button. You are now signed up for the Confluent Cloud platform. You can use the free trial for a limited period. After that, since it is a fully managed cloud service, you pay based on the cloud space and instances you use.
- After logging in, click on the “Add cloud environment” button and provide a unique name for the new Kafka cloud environment. After providing all the essential details, click on the “Learn” button on top. You have now successfully set up and launched a new Kafka cluster.
- In the next step, create a new directory for the Kafka consumer application and, inside it, a directory for the configuration data of the Kafka cluster.
mkdir kafka-consumer-application && cd kafka-consumer-application
mkdir configuration
- After creating the directories, write the cluster connection information into a local configuration file, which the Kafka consumer application reads during execution.
- Navigate to the Kafka cluster from your Confluent Cloud Console. You can find connection information under the “Clients” view section.
- Now, you have to create new credentials for your Kafka cluster and schema registry to start working with Kafka consumer applications.
- After creating credentials, Confluent Cloud provides the configuration details needed to connect to the Kafka environment. Copy these configuration details into a new file named ccloud.properties in the configuration directory.
- In the next step, install the Confluent CLI or use the Confluent Cloud console to perform Kafka topic management operations, including writing messages to and reading messages from the Kafka topics in the Kafka cluster.
- Now, you are all set to build a Kafka consumer application. Initially, you have to create a new Kafka topic, which stores and organizes the messages sent from producers.
b. Creating a Kafka topic
- Execute the following command to create a new topic.
confluent kafka topic create input-topic
- The above command creates a Kafka topic named “input-topic.”
c. Configuring the Kafka Consumer Application
- In the following steps, you will configure the Kafka consumer application in Java using Gradle. Gradle is a build automation tool that helps in different phases of the software development life cycle, including building, testing, and deployment.
- Now, create a new Gradle build file with the following code.
- After writing the code, save the file and name it “build.gradle.”
- Now, execute the following command to get the Gradle wrapper that serves as a script file to run all Gradle tasks.
- In the next step, create a new configuration file named “dev.properties” in the configuration directory (configuration/dev.properties).
- Write the following code in the newly created configuration file and save it.
- Now, update the configuration file with the Confluent Cloud information. The command below appends the essential Confluent Cloud information to the dev.properties file.
cat configuration/ccloud.properties >> configuration/dev.properties
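The append step works because a .properties file is a flat list of key=value lines, so the application sees one combined configuration after the concatenation. The following is a minimal sketch of that behavior using only java.util.Properties. The keys group.id, auto.offset.reset, bootstrap.servers, and security.protocol are standard Kafka client settings, but every value shown here, and the input.topic.name key, are illustrative assumptions and not the contents of the actual dev.properties or ccloud.properties files.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

final class ConsumerConfigSketch {
    /** Parses .properties text the same way Properties reads a file from disk. */
    static Properties parse(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        // Illustrative stand-in for configuration/dev.properties (values are assumptions).
        String dev = String.join("\n",
                "group.id=consumer-application",
                "auto.offset.reset=earliest",
                "input.topic.name=input-topic");
        // Illustrative stand-in for configuration/ccloud.properties (placeholder values).
        String ccloud = String.join("\n",
                "bootstrap.servers=BOOTSTRAP_SERVER_PLACEHOLDER",
                "security.protocol=SASL_SSL");

        // Same effect as: cat configuration/ccloud.properties >> configuration/dev.properties
        Properties merged = parse(dev + "\n" + ccloud);
        for (String key : merged.stringPropertyNames()) {
            System.out.println(key + "=" + merged.getProperty(key));
        }
    }
}
```

Two details are worth keeping in mind with this approach. If the same key appears in both files, the later line wins when the combined file is loaded. Also, dev.properties should end with a newline before the append, otherwise the first appended line is joined to the last existing one.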
- To build the Kafka consumer application in Java, you first have to create a dedicated directory for the source code. Execute the following command to create it.
mkdir -p src/main/java/io/confluent/developer
- Now, to write, compile, and execute the Java code for the Kafka consumer application, create a new file named KafkaConsumerApplication.java in the developer directory created in the previous step.
- In KafkaConsumerApplication.java, write the Java code provided on the official Confluent website.
- In that code, the KafkaConsumerApplication constructor receives the consumer and the consumer records handler as parameters.
- The runConsume method subscribes the consumer to the specified Kafka topic.
- Because the polling loop is controlled by the instance variable keepConsuming, the Kafka consumer keeps fetching messages from the Kafka servers indefinitely, i.e., with no fixed end.
- The recordsHandler.process(consumerRecords) call passes the polled consumerRecords to the consumerRecordsHandler interface, whose implementation reads the messages fetched from the Kafka topic.
- Finally, consumer.close() is called to prevent resource leakage.
- In the next step, you have to create an interface for the supporting (helper) classes.
- Create a file named consumerRecordsHandler.java inside the developer directory you created earlier for the development files of the Kafka consumer application. The file path should be as given below.
- After creating the interface file, create a new Java file named FileWritingRecordsHandler.java in the same developer directory, following the path given below.
- After creating the Java file, write the code given below and save the file.
- The code snippet given below creates an interface named consumerRecordsHandler that handles all the consumer records fetched from Kafka servers.
- The code snippet given below shows the FileWritingRecordsHandler class, which writes the values of the records fetched from the Kafka servers to a separate file.
- After writing the code in FileWritingRecordsHandler.java, execute the command below to compile and package the Kafka consumer application.
- The above command creates an uberjar for the Kafka consumer application. An uberjar archives all the classes, packages, and dependencies needed to run the application in a single JAR file.
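The consume loop and the records handler described in the steps above can be sketched in plain Java as follows. This is not the code from the Confluent website and it does not use the Kafka client library: RecordSource, RecordsHandler, FileAppendingHandler, and ConsumeLoopSketch are hypothetical stand-ins, chosen so that the example runs with the JDK alone. Only the shape is taken from the description above: loop while keepConsuming is true, hand each polled batch to a handler, and close the consumer in a finally block. Marking the flag volatile and adding a shutdown() method are assumptions of this sketch.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

/** Hypothetical stand-in for the Kafka consumer: poll() returns the next batch of record values. */
interface RecordSource {
    List<String> poll();

    void close();
}

/** Plays the role of the records handler: it receives every polled batch. */
interface RecordsHandler {
    void process(List<String> records);
}

/** Appends each record value to a file, one value per line. */
final class FileAppendingHandler implements RecordsHandler {
    private final Path path;

    FileAppendingHandler(Path path) {
        this.path = path;
    }

    @Override
    public void process(List<String> records) {
        if (records.isEmpty()) {
            return; // nothing was polled this round
        }
        try {
            Files.write(path, records, StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

/** Poll loop: runs until shutdown() flips the flag, then closes the source. */
final class ConsumeLoopSketch {
    // volatile so that a shutdown() call from another thread is seen by the loop
    private volatile boolean keepConsuming = true;
    private final RecordSource source;
    private final RecordsHandler handler;

    ConsumeLoopSketch(RecordSource source, RecordsHandler handler) {
        this.source = source;
        this.handler = handler;
    }

    void runConsume() {
        try {
            while (keepConsuming) {
                handler.process(source.poll());
            }
        } finally {
            source.close(); // always release the source, even if processing fails
        }
    }

    void shutdown() {
        keepConsuming = false;
    }
}
```

Passing the source and the handler through the constructor keeps the loop easy to test: a test can supply an in-memory source and a handler that collects records in a list, without a running Kafka cluster.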
d. Compiling and Running the Kafka Consumer application
- Execute the following command to run the Kafka consumer application.
java -jar build/libs/kafka-consumer-application-standalone-0.0.1.jar configuration/dev.properties
- After starting the Kafka consumer application, produce messages into the previously created Kafka topic so that the consumer application has data to consume from it.
- To produce messages into a Kafka topic, open a new terminal and run the following command.
confluent kafka topic produce input-topic
- The above command starts a Kafka console producer that publishes messages to the Kafka topic named input-topic.
- In the producer console, paste the following messages and press Enter.
- In the next step, check whether the produced messages were consumed by the Kafka consumer application. To inspect the data transfer from producer to consumer, execute the following command in a new terminal.
- On executing the above command, you will get an output as shown below.
- The output confirms that the Kafka consumer application successfully consumes the messages sent from the producer console.
Deploying the Kafka consumer application
- To deploy the Kafka consumer application to the production environment, create a new configuration file in the main configuration directory. The path should be configuration/prod.properties.
- Now, write the following code in the newly created prod.properties file.
- In the next step, create a Docker image that contains the Kafka consumer application.
- Execute the following command to build the Docker image.
gradle jibDockerBuild --image=io.confluent.developer/kafka-consumer-application-join:0.0.1
- After building the Docker image, you are all set to deploy your Kafka consumer application to the production environment. Execute the following command to launch the Docker container.
docker run -v $PWD/configuration/prod.properties:/config.properties io.confluent.developer/kafka-consumer-application-join:0.0.1 config.properties
By following the above steps, you have successfully built and deployed a Kafka consumer application.
This article focused on building and deploying a Kafka consumer application using the Confluent Cloud platform, a fully managed service that lets users write less code than the traditional way of transferring data between Kafka servers.
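Both the local run and the Docker run pass the path of a properties file as the last command-line argument. A minimal sketch of how an application entry point might read that argument is shown below. The class name ConfigFromArgs is hypothetical, and this is an assumption about the pattern rather than the actual code of the consumer application.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

final class ConfigFromArgs {
    /** Reads the properties file whose path is given as the first command-line argument. */
    static Properties load(String[] args) {
        if (args.length < 1) {
            throw new IllegalArgumentException(
                    "Expected the path to a properties file as the first argument");
        }
        Path path = Paths.get(args[0]);
        Properties props = new Properties();
        try (InputStream in = Files.newInputStream(path)) {
            props.load(in);
        } catch (IOException e) {
            throw new UncheckedIOException("Cannot read " + path, e);
        }
        return props;
    }

    public static void main(String[] args) {
        Properties props = load(args);
        // Print only the key names so that credentials are not echoed to the console.
        props.stringPropertyNames().stream().sorted().forEach(System.out::println);
    }
}
```

Keeping the configuration outside the JAR in this way is what allows the same build to run with dev.properties locally and with a mounted prod.properties inside the container.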
However, you can also use the default Apache Kafka installation on your local machine to build an end-to-end Kafka message streaming application.
Kafka is a trusted platform that many companies rely on for its benefits, but transferring data from it into a data warehouse can be a tedious task. An automated data pipeline helps solve this issue, and this is where Hevo comes into the picture. Hevo Data is a No-code Data Pipeline with 100+ pre-built integrations that you can choose from.
Hevo can help you integrate data from numerous sources and load it into a destination, where you can analyze real-time data with a BI tool such as Tableau. It makes data migration hassle-free, and it is user-friendly, reliable, and secure.