Apache Kafka is a distributed streaming platform that stores and handles trillions of real-time messages every day. Its ecosystem comprises Kafka servers (brokers), producers, consumers, and a centralized coordination service called ZooKeeper.
Kafka producers write messages into Kafka servers, while Kafka consumers fetch or consume data from Kafka servers.
Usually, to produce and consume simple messages or message queues, you can use the default CLI tools that come with the Kafka installation.
However, to stream real-time data with Kafka for enterprise or industry-level applications, the producer code is typically written in a full-fledged language such as Java, validated by test cases, and then deployed to a production environment.
In this article, you will learn about Kafka, Kafka producers, and how to build, test, and deploy Kafka producer applications using Java.
Prerequisites
- A fundamental understanding of streaming data.
- Gradle installed on your system (the build steps below use the Gradle wrapper).
- A basic understanding of Confluent Cloud.
Understanding Kafka and Kafka Producers
Originally developed at LinkedIn in 2010, Kafka is an open-source, distributed message streaming platform that collects, stores, and organizes real-time data.
In other words, Kafka is a message streaming service that handles real-time data and can be used to build event-driven or data-driven applications.
Kafka is also called a publish-subscribe messaging service because real-time data can be written to (published) and fetched from (subscribed) the distributed Kafka servers.
The ecosystem of Kafka consists of three main components, namely Kafka producers, servers, and consumers. The dedicated set of Kafka servers distributed across the Kafka ecosystem can store and handle trillions of messages streaming per day into Kafka clusters.
Kafka producers stream or publish messages into Kafka servers, while Kafka consumers read data from the respective servers.
Inside Kafka servers, messages are stored in topics, which organize messages in the order in which they arrive.
Kafka producers must always specify the name of the topic to which they will publish messages on the Kafka servers.
Similarly, Kafka consumers must specify the name of the topic from which they want to fetch messages.
Kafka Producer Configuration
There are many configuration settings available that affect the producer’s behavior.
Core Configuration: You can set the bootstrap.servers property so that the producer can find the Kafka cluster.
Message Durability: You can control the durability of messages written to Kafka through the acks setting. acks=1 (the long-time default) waits only for the partition leader to confirm the write, acks=all is used when you need confirmation that the message has been replicated to all in-sync replicas (and is the default in recent client versions), and acks=0 delivers maximum throughput because the producer does not wait for any confirmation that the message was received.
Message Ordering: Messages are normally written to a partition in the order they are sent; however, a send that fails and is retried can overtake other in-flight messages, so producers that need strict ordering should enable idempotence or limit in-flight requests so that retries preserve the original order.
Batching and Compression: Kafka producers collect outgoing messages into batches, and can compress those batches, to improve throughput; the batch size and how long the producer waits to fill a batch are both configurable.
Queuing Limits: The producer uses a bounded memory buffer to limit memory usage. When that limit is hit, additional sends block until space frees up; if the configured maximum wait time is exceeded, the send fails.
Building Kafka Producers
Setup Kafka Cluster
Before Kafka producers can start writing messages or records into Kafka servers, you have to set up and run a Kafka environment. You can run it either from the default Kafka installation on your local machine or on the Confluent Cloud platform.
Using the Confluent Cloud platform for building producers is straightforward and effective since it is a fully managed service that involves significantly less setup. Follow these steps:
- Initially, sign up and log in to the Confluent Cloud platform.
- After logging in to the Confluent Cloud platform, click the "Add cloud environment" button.
- Then, give the environment a unique name. Since it is a cloud service, you pay according to the cloud space and instances you need; however, you can experiment with the trial version.
- After providing the required details, click the "Learn" button.
- In the next step, initialize the "Kafka producer application" project.
- Execute the commands shown below to allocate a local directory for the producer application and to create a separate directory for the configuration data.
mkdir kafka-producer-application && cd kafka-producer-application
mkdir configuration
- Now, write all the cluster information into a local file. Navigate to your Kafka cluster from the Confluent Cloud console; the "Clients" view gives you the cluster configuration information.
- Create new credentials for your Kafka cluster and Schema Registry. After creating the essential credentials, Confluent Cloud provides you with the overall configuration details required to run the Kafka environment.
- Then, install the Confluent CLI for writing messages to, and reading messages from, the Kafka clusters.
- In the next step, you can create topics inside which producers can write and publish messages.
Hevo Data is a No-code Data Pipeline that offers a fully managed solution to set up data integration from Apache Kafka and 100+ Data Sources (including 30+ Free Data Sources) and will let you directly load data to a Data Warehouse or the destination of your choice.
Hevo will automate your data flow in minutes without writing any line of code. Its fault-tolerant architecture makes sure that your data is secure and consistent. Hevo provides you with a truly efficient and fully automated solution to manage data in real time and always have analysis-ready data.
Create a Topic
- Open a new CLI and execute the following command.
confluent kafka topic create output-topic --partitions 1
- On executing the above command, you created a new Kafka topic named “output-topic” with a single partition.
Configuring the Producer Console
- In the below steps, you will configure the "Kafka producer application" project in Java. Using a build automation tool like Gradle, you can build end-to-end applications with very few assumptions and prerequisites. Create a new Gradle build file named "build.gradle", following the build script given in the Confluent tutorial.
- After creating the build file, run the following command to generate the Gradle wrapper, a script you add to your Gradle project so the build can be run without a local Gradle installation.
gradle wrapper
- Now, create a new development configuration file in the configuration directory you created before. The path should be configuration/dev.properties.
- After creating the new configuration file, you have to update it with your Confluent Cloud information. Execute the following command to append the Confluent Cloud information to the dev.properties file.
cat configuration/ccloud.properties >> configuration/dev.properties
- Now, you are all set to create a Kafka producer application. Before building the Kafka producer application, create a new directory by executing the following command.
mkdir -p src/main/java/io/confluent/developer
- In the next step, create a new KafkaProducerApplication.java file by following the code given on the official Confluent website.
- In the KafkaProducerApplication.java code, the Producer instance is passed in as a constructor parameter, along with the name of the topic to which the producer application will write records.
- Each input message is split with message.split("-") into a key and a value, and a ProducerRecord is built to publish the record to the configured output topic (output-topic); a hedged sketch of this structure appears just after this list.
Sending Messages from a Producer Application
- Create an input.txt file in the base directory containing the keys and values you want to publish, one pair per line, separated by a hyphen (for example, 1-first record).
- Then, navigate to your terminal and execute the following command.
./gradlew shadowJar
- On executing the above command, you have created an uberjar for the Kafka producer application, which packages the compiled classes together with all the dependencies needed to run the application end to end.
- Execute the following command to launch the Kafka producer application with input.txt file.
java -jar build/libs/kafka-producer-application-standalone-0.0.1.jar configuration/dev.properties input.txt
- After executing the above command, the Kafka producer application processes the input.txt file, and the producer console output confirms the records that were sent.
- You can also add new key-value pairs to the input.txt file and re-run the above command to publish the new records.
- You can check the working of the producer application by consuming the produced messages from the respective topic.
- Execute the following command to run the consumer console that reads messages from a specific topic.
confluent kafka topic consume output-topic -b --print-key --delimiter " : "
- After executing the above command, your consumer console fetches messages from the Kafka topic named output-topic.
Testing the Application
- In the following steps, you will test the newly created Kafka producer application. Initially, you have to create a test configuration file inside the main configuration directory (configuration/test.properties).
- Populate the newly created test file with the required test configuration.
- Then, create a new directory for the test files by executing the following command.
mkdir -p src/test/java/io/confluent/developer
- In the next step, write the test code in KafkaProducerApplicationTest.java by following the code given on the official Confluent website (a hedged sketch based on Kafka's MockProducer follows these steps).
- After writing the tests, run them with the command below.
./gradlew test
- Once the tests pass, your application is ready to move on to deployment.
Deploying the Application
- Initially, you have to create a production configuration file at the path configuration/prod.properties.
- Make sure you fill in the bootstrap.servers parameter with the address of your production cluster.
- Then, you have to build a Docker image that contains your application.
- In the command prompt, execute the following command to invoke the Gradle Jib plugin, which builds a new Docker image.
gradle jibDockerBuild --image=io.confluent.developer/kafka-producer-application-join:0.0.1
- After building the Docker image, you can launch the container in the production environment. Execute the following command to launch the container that runs your Kafka producer application.
docker run -v $PWD/configuration/prod.properties:/config.properties io.confluent.developer/kafka-producer-application-join:0.0.1 config.properties
On executing the above steps, you have successfully built and deployed the Kafka producer application.
Frequently Asked Questions (FAQs)
Does Kafka have REST API?
Kafka provides a REST API named the Kafka REST Proxy. It enables you to send messages over HTTP and can be used both to produce and to consume data, as well as to inspect cluster metadata.
How does Kafka communicate?
Kafka writes data to disk as a structured, replicated commit log, which makes it fault-tolerant and lets storage scale out. Producers and consumers communicate with the brokers to publish and fetch this data, and the Kafka Streams API builds on the same mechanism, reading input streams and writing output streams for reliable stream processing.
Is Kafka connect a producer?
The Kafka Connect source API is a framework built on top of the Kafka Producer API. It gives developers a higher-level, more convenient interface for producing data into Kafka from external systems.
Can Kafka be used as a message queue?
Kafka can be used as a message queue, but it is technically a distributed streaming platform that offers benefits beyond queuing, such as durable storage and stream processing.
Conclusion
Kafka is a go-to message streaming platform in the marketplace, known for its reliable streaming and real-time applications. Its client ecosystem centers on two components: Kafka producers and Kafka consumers.
Kafka producers are responsible for streaming messages into Kafka; they are applications that can be built to suit your requirements. In this article, you learned about Kafka and Kafka producers, followed a guide to set up a Kafka cluster, and walked through the steps to build, test, and deploy a Kafka producer application.
There are many different applications like Kafka that organizations use on a daily basis. These applications generate data, and this data, when utilized efficiently, generates valuable insights.
But transferring data from a source to the desired destination is a hectic task. An automated data pipeline that extracts this complex data and loads it into a warehouse makes the process a breeze. This is where Hevo comes in: Hevo is a no-code data pipeline that is reliable and efficient.
Hevo Data has strong integrations with sources such as Apache Kafka, which allows you to not only export data from sources and load it into destinations, but also transform and enrich your data and make it analysis-ready, so that you can focus only on your key business needs.
Give Hevo Data a try and sign up for a 14-day free trial today. Hevo offers plans and pricing for different use cases and business needs; check them out!
You can now also go ahead and get some more knowledge on Building Kafka Java Consumer Client.
Please do share your experience of working with Kafka Producers in the comments section below.