The Kafka CLI is a set of shell scripts, bundled with the Kafka distribution, that give you command-line access for managing your Kafka resources. You can use these scripts to run text commands that perform specific tasks within your Kafka environment. For many administrative tasks, it is the quickest way to interact with a Kafka cluster, since the scripts ship with Kafka itself.
This post illustrates how to perform crucial Kafka tasks using Kafka CLI commands. Before diving in, let’s first understand what Kafka is and how it is used in the industry.
What is Kafka?
Kafka is a popular open-source Publish/Subscribe messaging system that is used for building streaming analytics platforms and data integration pipelines. Kafka is both a queue for parallelizing tasks and a messaging-oriented middleware for service integration. The Kafka message broker (cluster) ingests and stores streams of messages (records) from event producers, which are later distributed to consumer services asynchronously when requested.
Producers publish events to a Kafka cluster, which can consist of a single node or multiple nodes (brokers). Kafka stores the messages in a topic. Within each partition of a topic, messages are kept in an immutable, append-only sequence, ordered by when they were written to the log. Downstream applications can then subscribe to the topics that they are interested in.
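To make the idea of an immutable, ordered sequence concrete, here is a toy model in shell of a single partition, assuming a plain text file stands in for the log. The helper names append_event and read_from_offset and the LOG_FILE path are hypothetical; real Kafka stores records in binary segment files and assigns offsets on the broker, but the append-only behavior is the same idea.

```shell
#!/usr/bin/env bash
# Toy model of ONE partition: a plain text file used as an append-only log.
# Each line is a record; its offset is the zero-based line number.
# Illustration only: real Kafka logs are binary segment files on the broker.
LOG_FILE="${LOG_FILE:-/tmp/toy-partition.log}"

# Append a record to the end of the log and print the offset it received.
append_event() {
  [ -f "$LOG_FILE" ] || : > "$LOG_FILE"
  local offset
  offset=$(wc -l < "$LOG_FILE")
  printf '%s\n' "$1" >> "$LOG_FILE"
  echo $(( offset ))
}

# Print every record from OFFSET onward, the way a consumer reads after
# seeking to a position. Existing records are only read, never changed.
read_from_offset() {
  tail -n +"$(( $1 + 1 ))" "$LOG_FILE"
}
```

On an empty log, calling append_event twice prints 0 and then 1, and read_from_offset 1 prints only the second record. New records always go to the end, which is why consumers can rely on offsets to resume where they left off.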
This decoupled design is more flexible and resilient than broadcasting messages through synchronous remote procedure calls (RPCs), where producers must wait for consumers to receive the events.
Understanding Core Kafka Concepts
- Topic: A named resource in which a particular stream of messages is stored and to which it is published.
- Producer: A client application that creates and publishes records/messages to a Kafka topic(s).
- Consumer: A client application that subscribes to a topic(s) to receive data from it.
- Message: The combination of data and associated metadata that a producer application writes to a topic and is eventually consumed by consumers.
Common Kafka Use Cases
Event sourcing and stream processing architectures are gaining popularity as a data strategy. Currently, over 60% of the Fortune 100 use Kafka in their tech stack, including Cisco, Goldman Sachs, Target, and Intuit. Kafka has a wide range of use cases; here are the most common ones:
- Ingestion of User Interaction and Server Events: To make use of user interaction events from end-user apps or server events from your system, you can forward the events to Kafka, process them, and then deliver them to analytical databases. Kafka can ingest events from many clients simultaneously.
- Data Replication: Kafka can be used to stream incremental changes in databases and forward those to a destination such as a data lake or data warehouse for data replication and analysis.
- ESB (Enterprise Service Bus): Companies are using Kafka as an ESB, which helps them transition from a monolithic to a microservices architecture. Data is made available to applications and services across the entire organization in real time.
- Fraud Detection: The ability to collect, process, and distribute financial events and database updates in real time enables teams to detect threats and fraud as they happen.
- Data Streaming from Applications and IoT Devices: Applications can publish a stream of real-time user interaction events. Similarly, IoT sensors can stream data to Kafka for use in other downstream systems.
- Load Balancing: A service can be deployed on servers that span multiple data centers, with all instances subscribing to a common topic (data stream) as members of one consumer group. If the instances in one data center go down, the others can automatically take over their share of the work.
Simplify your data migration for Kafka and perform crucial tasks easily with Hevo’s automated pipeline. Hevo assists in simplifying Kafka ETL and Data Analysis by providing:
- 150+ Connectors.
- Real-time automated data loading.
- Built-in transformations.
Working with Kafka CLI Commands
Before you begin working with Kafka CLI, make sure you meet the following requirements:
- Install Java 8+ on your workstation.
- Download and extract Kafka on your workstation.
In this section, you will learn how to use the following Kafka CLI commands:
1. Spin Up a Kafka Environment
- Open your terminal and navigate to the directory where you extracted the Kafka archive (a .tgz file):
$ cd kafka_2.13-3.1.0
- Run the following command to launch the ZooKeeper service:
$ bin/zookeeper-server-start.sh config/zookeeper.properties
- In another terminal, run the following command to start the Kafka broker service:
$ bin/kafka-server-start.sh config/server.properties
After successfully executing these commands, you will have an active Kafka environment. Let’s now look at some basic commands that you can execute using the Kafka CLI.
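If you script this setup, you may want to wait for both services before running further commands. The following is a minimal bash sketch, assuming the default ports used in this post (2181 for ZooKeeper, 9092 for the broker). wait_for_port is a hypothetical helper, and an open port only shows that a process is listening, not that the broker has finished starting up.

```shell
#!/usr/bin/env bash
# Wait until something accepts TCP connections on HOST:PORT.
# Uses bash's /dev/tcp pseudo-device, so it must run under bash (not sh).
# Usage: wait_for_port HOST PORT [MAX_TRIES]
wait_for_port() {
  local host="$1" port="$2" tries="${3:-30}" i=1
  while [ "$i" -le "$tries" ]; do
    # The redirection succeeds only if the TCP connection is accepted;
    # the subshell closes the descriptor again when it exits.
    if (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; then
      return 0
    fi
    [ "$i" -lt "$tries" ] && sleep 1
    i=$(( i + 1 ))
  done
  echo "nothing is listening on ${host}:${port} after ${tries} attempts" >&2
  return 1
}
```

For example, after starting the two services in the background, a script can run wait_for_port localhost 2181 && wait_for_port localhost 9092 before creating any topics.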
2. Create a Topic
A topic in Kafka is akin to a directory or folder on your computer. The main difference is that a topic stores events (records/messages) rather than files, and these events are normally distributed across multiple nodes/brokers.
Events can be application logs, web clickstreams, data emitted by IoT sensors, and much more. So before creating events, the first step is to create a topic or topics that will store and organize these records.
To create a Kafka topic using the Kafka CLI, you will use the bin/kafka-topics.sh shell script that’s bundled in the downloaded Kafka distribution. Launch another terminal session and execute the following command:
$ bin/kafka-topics.sh --create --topic my-topic --bootstrap-server localhost:9092 --partitions 3 --replication-factor 1
You have successfully created a topic called my-topic.
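When topic creation is part of a script that may run more than once, the --if-not-exists option of kafka-topics.sh avoids an error if the topic is already there. This is a minimal sketch, assuming you run it from the Kafka directory; the create_topic function and the KAFKA_BIN and BOOTSTRAP variables are hypothetical names for illustration.

```shell
#!/usr/bin/env bash
# KAFKA_BIN and BOOTSTRAP are variables made up for this sketch, not Kafka
# settings: KAFKA_BIN points at the bin/ directory of the extracted download.
KAFKA_BIN="${KAFKA_BIN:-bin}"
BOOTSTRAP="${BOOTSTRAP:-localhost:9092}"

# Create a topic unless it already exists, so the script is safe to re-run.
# Usage: create_topic NAME [PARTITIONS] [REPLICATION_FACTOR]
create_topic() {
  local topic="$1" partitions="${2:-3}" replication="${3:-1}"
  "$KAFKA_BIN/kafka-topics.sh" --create --if-not-exists \
    --topic "$topic" \
    --bootstrap-server "$BOOTSTRAP" \
    --partitions "$partitions" \
    --replication-factor "$replication"
}
```

With the defaults, create_topic my-topic runs the same command as above plus --if-not-exists.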
3. View Details About a Topic
Kafka exposes a command that you can take advantage of to view metadata about the topics in your Kafka cluster. The following Kafka CLI command provides information on the number of partitions and replicas, among other topic details:
$ bin/kafka-topics.sh --bootstrap-server=localhost:9092 --describe --topic my-topic
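Older Kafka releases also let you pass the ZooKeeper address instead of the cluster address, with the same results. The --zookeeper option was removed from kafka-topics.sh in Kafka 3.0, so this form works only on Kafka 2.x and earlier, not with the 3.1.0 download used in this post:
$ bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic my-topic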
4. List Topics
You can use the bin/kafka-topics.sh shell script with the --list option to display a list of all the topics in the Kafka cluster:
$ bin/kafka-topics.sh --bootstrap-server=localhost:9092 --list
On Kafka 2.x and earlier, you can also pass the ZooKeeper service URL instead of the cluster URL:
$ bin/kafka-topics.sh --list --zookeeper localhost:2181
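Scripts often need to check whether a topic exists before acting on it. One way to sketch this, assuming the --list output prints one topic name per line, is to match the name exactly with grep; topic_exists and the KAFKA_BIN/BOOTSTRAP variables are hypothetical.

```shell
#!/usr/bin/env bash
# Exit with status 0 if TOPIC appears in the cluster's topic list.
# KAFKA_BIN and BOOTSTRAP are hypothetical helper variables for this sketch.
# Usage: topic_exists TOPIC
topic_exists() {
  "${KAFKA_BIN:-bin}/kafka-topics.sh" \
    --bootstrap-server "${BOOTSTRAP:-localhost:9092}" --list \
    | grep -Fxq -- "$1"   # -F literal text, -x whole line, -q no output
}
```

Exact matching (-x) matters here: a plain substring search for my would also match my-topic. The list can also contain internal topics such as __consumer_offsets once consumer groups have been used.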
5. Publish Events to a Topic
Kafka clients read and write events by communicating with the Kafka brokers. The brokers are responsible for persisting these events in a distributed, partitioned commit log (the topic).
You can use the Kafka CLI’s console producer client to write new events to your Kafka topic. Run the following command to do this:
$ bin/kafka-console-producer.sh --topic my-topic --bootstrap-server localhost:9092
After you run this command, a prompt opens. Type each message and press Enter to publish it to the Kafka topic; every press of Enter submits one message.
My first event
My second event
My third event
You have now published 3 events to the my-topic topic. Press Ctrl+C to stop the producer client.
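The console producer reads from standard input, so you can also publish events non-interactively by piping lines into it. This is a minimal sketch with a hypothetical publish_events helper; KAFKA_BIN and BOOTSTRAP are hypothetical variables that default to the paths and address used in this post.

```shell
#!/usr/bin/env bash
# Publish each argument as one event by piping lines into the console
# producer's standard input; the producer exits when the input ends.
# KAFKA_BIN and BOOTSTRAP are hypothetical helper variables for this sketch.
# Usage: publish_events TOPIC EVENT...
publish_events() {
  local topic="$1"
  shift
  printf '%s\n' "$@" \
    | "${KAFKA_BIN:-bin}/kafka-console-producer.sh" \
        --topic "$topic" --bootstrap-server "${BOOTSTRAP:-localhost:9092}"
}
```

For example, publish_events my-topic "My first event" "My second event" "My third event" publishes the same three events as above without opening a prompt.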
6. View Events
To view the events stored in the Kafka topic, use the Kafka CLI’s consumer client. Open a new terminal session and run the following command:
$ bin/kafka-console-consumer.sh --topic my-topic --from-beginning --bootstrap-server localhost:9092
My first event
My second event
My third event
You can keep this consumer running and use the Kafka CLI’s producer in another terminal to publish new events with the command from the previous step. The new events appear almost instantly in the consumer client’s terminal session. Press Ctrl+C to stop the consumer.
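For scripts and quick checks, the console consumer’s --max-messages and --timeout-ms options make it exit on its own instead of running until Ctrl+C. This is a sketch with a hypothetical read_events helper; the 10-second default timeout is an arbitrary choice for illustration.

```shell
#!/usr/bin/env bash
# Read at most COUNT events from the beginning of TOPIC and then exit,
# giving up early if no event arrives for TIMEOUT_MS (default 10 seconds).
# KAFKA_BIN and BOOTSTRAP are hypothetical helper variables for this sketch.
# Usage: read_events TOPIC COUNT [TIMEOUT_MS]
read_events() {
  "${KAFKA_BIN:-bin}/kafka-console-consumer.sh" \
    --topic "$1" --from-beginning --max-messages "$2" \
    --timeout-ms "${3:-10000}" \
    --bootstrap-server "${BOOTSTRAP:-localhost:9092}"
}
```

With the topic from this post, read_events my-topic 3 should print the three events published earlier and then exit.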
7. Change Message Retention Period
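When the producer client sends an event to the Kafka broker, that event is appended to the end of the commit log of one of the topic’s partitions. By default, events are retained for 168 hours (7 days), after which they become eligible for deletion to free up disk space. Because Kafka deletes data one log segment at a time, an event can remain on disk somewhat longer than the retention period.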
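Based on your application’s requirements, you might want to override this default for a specific topic. Topic-level settings such as retention are changed with the bin/kafka-configs.sh script: kafka-topics.sh rejects --alter combined with --config when you connect through --bootstrap-server, and Kafka 3.x no longer supports its --zookeeper option. For example, consider the following command:
$ bin/kafka-configs.sh --bootstrap-server localhost:9092 --entity-type topics --entity-name my-topic --alter --add-config retention.ms=300000
This sets a retention period of 5 minutes (300,000 ms) for the messages in the topic “my-topic”.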
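Because retention.ms is expressed in milliseconds, it is easy to get the arithmetic wrong. The following helper, a hypothetical to_retention_ms function with a suffix convention invented for this sketch (s, m, h, d), converts a human-readable duration into the value to pass.

```shell
#!/usr/bin/env bash
# Convert a duration such as 5m, 12h or 7d into the number of milliseconds
# that the retention.ms topic setting expects. The suffixes s, m, h and d
# are a convention of this sketch, not a Kafka feature.
# Usage: to_retention_ms DURATION
to_retention_ms() {
  local value="${1%?}" unit="${1#"${1%?}"}" n
  case "$value" in
    ''|*[!0-9]*) echo "not a number: $1" >&2; return 1 ;;
  esac
  n=$(( 10#$value ))   # force base 10 so values like 08 are not read as octal
  case "$unit" in
    s) echo $(( n * 1000 )) ;;
    m) echo $(( n * 60 * 1000 )) ;;
    h) echo $(( n * 60 * 60 * 1000 )) ;;
    d) echo $(( n * 24 * 60 * 60 * 1000 )) ;;
    *) echo "unsupported unit in: $1" >&2; return 1 ;;
  esac
}
```

For example, to_retention_ms 5m prints 300000, the 5-minute value used above, and to_retention_ms 7d prints 604800000, which matches the 168-hour default.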
8. Add Partitions to a Topic
In a multi-node cluster, you might consider splitting your topics into multiple partitions to achieve higher throughput. This is because if you simply constrain a topic to a single node, it limits the ability to scale out. Instead of limiting yourself to a single node, you should take advantage of the extra CPU and RAM by distributing your topics across multiple machines. To add partitions to your Kafka topic, use the following command:
$ bin/kafka-topics.sh --bootstrap-server localhost:9092 --alter --topic my-topic --partitions 10
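Kafka only lets you increase a topic’s partition count, never decrease it, so choose the new value with care. Adding partitions also changes which partition a given message key maps to, which can affect per-key ordering for keyed data. There is no single recommended maximum, but each partition adds overhead on the brokers, so size the count to your target throughput and the number of consumers you expect to run in parallel.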
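Because a partition count can only go up, it can help to check the current value before altering a topic. This sketch reads the count from the --describe output, assuming the PartitionCount: field that Kafka prints there, and refuses any number that is not larger. kafka_topics, partition_count, add_partitions, KAFKA_BIN, and BOOTSTRAP are hypothetical names.

```shell
#!/usr/bin/env bash
# Run kafka-topics.sh against the cluster with any extra arguments.
# KAFKA_BIN and BOOTSTRAP are hypothetical helper variables for this sketch.
kafka_topics() {
  "${KAFKA_BIN:-bin}/kafka-topics.sh" \
    --bootstrap-server "${BOOTSTRAP:-localhost:9092}" "$@"
}

# Print TOPIC's current partition count, parsed from the "PartitionCount:"
# field of the --describe output (field name assumed from Kafka 2.x/3.x).
partition_count() {
  kafka_topics --describe --topic "$1" \
    | sed -n 's/.*PartitionCount:[[:space:]]*\([0-9][0-9]*\).*/\1/p' \
    | head -n 1
}

# Raise TOPIC to WANTED partitions. Kafka can only increase the count, so
# any value that is not larger than the current one is rejected up front.
add_partitions() {
  local topic="$1" wanted="$2" current
  current=$(partition_count "$topic")
  if [ -z "$current" ]; then
    echo "could not read the partition count of $topic" >&2
    return 1
  fi
  if [ "$wanted" -le "$current" ]; then
    echo "$topic already has $current partitions; choose a larger number" >&2
    return 1
  fi
  kafka_topics --alter --topic "$topic" --partitions "$wanted"
}
```

For a topic created with 3 partitions, add_partitions my-topic 10 runs the alter command, while add_partitions my-topic 2 stops with an error before contacting the cluster a second time.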
9. Delete a Topic
You can delete a topic from the Kafka broker using the following Kafka CLI command:
$ bin/kafka-topics.sh --bootstrap-server localhost:9092 --delete --topic my-topic
Deleting a topic also removes its topic-level configuration overrides, so you don’t need to remove a custom retention policy like the one set earlier before running that command. If you only want to revert the retention setting to the broker default without deleting the topic, remove the override instead:
$ bin/kafka-configs.sh --bootstrap-server localhost:9092 --entity-type topics --entity-name my-topic --alter --delete-config retention.ms
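Topic deletion may complete asynchronously on the brokers, so a script that recreates a topic right after deleting it can race with the removal. This is a minimal sketch, using a hypothetical delete_topic_and_wait helper and the hypothetical KAFKA_BIN/BOOTSTRAP variables, that deletes the topic and then polls --list until the name is gone.

```shell
#!/usr/bin/env bash
# Delete TOPIC, then poll the topic list until the name disappears, because
# the removal may finish asynchronously on the brokers.
# KAFKA_BIN and BOOTSTRAP are hypothetical helper variables for this sketch.
# Usage: delete_topic_and_wait TOPIC [MAX_CHECKS]
delete_topic_and_wait() {
  local topic="$1" checks="${2:-10}" i=1
  local cmd="${KAFKA_BIN:-bin}/kafka-topics.sh" server="${BOOTSTRAP:-localhost:9092}"
  "$cmd" --bootstrap-server "$server" --delete --topic "$topic" || return 1
  while [ "$i" -le "$checks" ]; do
    # Done once the exact topic name no longer appears in the list.
    if ! "$cmd" --bootstrap-server "$server" --list | grep -Fxq -- "$topic"; then
      return 0
    fi
    [ "$i" -lt "$checks" ] && sleep 1
    i=$(( i + 1 ))
  done
  echo "$topic is still listed after $checks checks" >&2
  return 1
}
```

A non-zero exit status tells the calling script that the topic was still listed after the last check, so it should not try to recreate it yet.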
Conclusion
In this article, you learned how to set up a simple single-node Kafka cluster. Along the way, you also learned how to use various Kafka CLI commands to produce and consume messages among other tasks. You can use the commands showcased on this page as a reference when developing and managing your own Kafka applications.
This tutorial runs Kafka with Apache ZooKeeper, which manages the Kafka cluster’s metadata. Apache Kafka 2.8.0 introduced KRaft mode as an early-access feature that removes the ZooKeeper dependency by implementing a built-in consensus layer, and KRaft was marked production-ready for new clusters in Kafka 3.3.
However, as a developer, streaming complex data from a diverse set of sources such as databases, CRMs, project management tools, streaming services, and marketing platforms using Kafka can be quite challenging. This is where a simpler alternative like Hevo can save your day!
Hevo Data is a No-Code Data Pipeline that offers a faster way to move data from 150+ data sources, including Kafka and 40+ free sources, into your data warehouse to be visualized in a BI tool. Hevo is fully automated and does not require you to write code.
Want to take Hevo for a spin?
SIGN UP and experience the feature-rich Hevo suite first hand. You can also have a look at the unbeatable pricing that will help you choose the right plan for your business needs.
Any questions or feedback regarding Kafka CLI Commands? Reach out to us in the comments section below.
Jeremiah is a specialist in crafting insightful content for the data industry, and his writing offers informative and simplified material on the complexities of data integration and analysis. He enjoys staying updated on the latest trends and uses his knowledge to help businesses succeed.