Apache Kafka is an Open-Source Stream Processing and Management Platform that receives, stores, organizes, and distributes data across different end-users or applications. Since users can push hundreds of thousands of messages or events into Kafka Servers, issues like Data Overloading and Data Duplication can arise. Because of these problems, the data in Kafka Servers often ends up unorganized and cluttered.
Consequently, consumers or end-users cannot effectively fetch the desired data from Kafka Servers. To eliminate the complications of having messy and unorganized data in the Kafka Servers, users can create different Kafka Topics in a Kafka Server.
Kafka Topics allow users to store and organize data according to different categories and use cases, making it easy to produce and consume messages to and from the Kafka Servers.
In this article, you will learn about Kafka, Kafka Topics, and the steps for creating Kafka Topics in a Kafka Server. You will also learn about the process of Kafka Topic Configuration.
Prerequisites
- Fundamental knowledge of Streaming Data.
What is Apache Kafka?
Developed initially by LinkedIn, Apache Kafka is an Open-Source and Distributed Stream Processing platform that stores and handles real-time data. In other words, Kafka is an Event Streaming service that allows users to build event-driven or data-driven applications.
Since Kafka is used for sending (publish) and receiving (subscribe) messages between processes, servers, and applications, it is also called a Publish-Subscribe Messaging System.
By default, Kafka has effective built-in features of partitioning, fault tolerance, data replication, durability, and scalability. Because of such effective capabilities, Apache Kafka is being used by the world’s most prominent companies, including Netflix, Uber, Cisco, and Airbnb.
As the ability of businesses to collect data explodes, data teams have a crucial role to play in fueling data-driven decisions. Yet, they struggle to consolidate the data scattered across sources into their warehouse to build a single source of truth. Broken pipelines, data quality issues, bugs and errors, and lack of control and visibility over the data flow make data integration a nightmare.
1000+ data teams rely on Hevo’s Data Pipeline Platform to integrate data from over 150+ sources in a matter of minutes. Billions of data events from sources as varied as SaaS apps, Databases, File Storage and Streaming sources can be replicated in near real-time with Hevo’s fault-tolerant architecture. What’s more – Hevo puts complete control in the hands of data teams with intuitive dashboards for pipeline monitoring, auto-schema management, custom ingestion/loading schedules.
All of this combined with transparent pricing and 24×7 support makes us the most loved data pipeline software on review sites.
Take our 14-day free trial to experience a better way to manage data pipelines.
What are Apache Kafka Topics?
Apache Kafka has a dedicated and fundamental unit for Event or Message organization, called Topics. In other words, Kafka Topics are Virtual Groups or Logs that hold messages and events in a logical order, allowing users to send and receive data between Kafka Servers with ease.
When a Producer sends messages or events into a specific Kafka Topic, the Topic appends the messages one after another, thereby creating a Log File. Furthermore, Producers can push messages onto the tail of these newly created logs, while Consumers pull messages from a specific Kafka Topic.
By creating Kafka Topics, users can perform Logical Segregation between Messages and Events, which works the same as the concept of different tables having different types of data in a database.
In Apache Kafka, you can create any number of topics based on your use cases. However, each topic should have a unique and identifiable name to differentiate it across various Kafka Brokers in a Kafka Cluster.
Kafka Topic Partition
Apache Kafka’s single biggest advantage is its ability to scale. If Kafka Topics were constrained to a single Machine or a single Cluster, that constraint would quickly become the biggest hindrance to scaling. Luckily, Apache Kafka has a solution for this.
Apache Kafka divides Topics into several Partitions. For better understanding, you can imagine a Kafka Topic as a giant set and Kafka Partitions as smaller subsets of Records owned by that Topic. Each Record holds a unique sequential identifier called the Offset, which is assigned incrementally by Apache Kafka. This lets each Kafka Partition work as a single log, written in append-only mode.
The way Kafka Partitions are structured gives Apache Kafka the ability to scale with ease. Kafka Partitions allow Topics to be parallelized by splitting the data of a particular Kafka Topic across multiple Brokers. Each Broker holds a subset of Records that belongs to the entire Kafka Cluster.
Apache Kafka achieves replication at the Partition level. Redundant units in Topic Partitions are referred to as Replicas, and each Partition generally contains one or more Replicas. That is, the Partition contains messages that are replicated through several Kafka Brokers in the Cluster. Each Partition (Replica) has one Server that acts as a Leader and another set of Servers that act as Followers.
A Kafka Leader replica handles all read/write requests for a particular Partition, and the Kafka Followers replicate the Leader. If the Leader Server goes down, one of the Follower Servers automatically becomes the new Leader. As a good practice, you should try to balance the Leaders so that each Broker is the Leader of roughly the same number of Partitions.
When a Producer publishes a Record to a Topic, it is sent to the Partition's Leader. The Leader appends the Record to its commit log and increments the Record Offset. Kafka makes Records available to Consumers only after they have been committed, and all incoming data accumulates in the Kafka Cluster's logs.
Each Kafka Producer uses metadata about the Cluster to recognize the Leader Broker for each Partition. Producers can also attach a key to a Record; by default, the hash of that key determines which Partition the Record is written to. As an alternative, you can skip key-based assignment and specify the Partition yourself.
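To make the key-to-Partition behaviour concrete, here is a minimal Java Producer sketch. The Topic name test, the key user-42, and the Broker address localhost:9092 are illustrative assumptions rather than required values.

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class KeyedProducerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed Broker address
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // With a key, the default partitioner hashes "user-42" to pick the Partition,
            // so every Record carrying this key lands in the same Partition.
            producer.send(new ProducerRecord<>("test", "user-42", "first event"));

            // Alternatively, name the Partition explicitly (Partition 0 here) and skip the hash.
            producer.send(new ProducerRecord<>("test", 0, "user-42", "second event"));
        }
    }
}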
How to Create Apache Kafka Topics?
Here are the three simple steps used to create an Apache Kafka Topic:
Step 1: Setting up the Apache Kafka Environment
In this method, you will be creating Kafka Topics using the default command-line tool, i.e., command prompt. In other words, you can write text commands in the command prompt terminal to create and configure Kafka Topics.
Below are the steps for creating Kafka topics and configuring the newly created topics to send and receive messages.
- For creating Kafka Topics, you have to start and set up the Kafka Environment. Before that, make sure that Kafka and Zookeeper are pre-installed and configured on your local machine. You also have to ensure that Java 8+ is installed and running on your computer. Then, configure the PATH and JAVA_HOME environment variables so that your operating system can locate the Java utilities.
- For starting Zookeeper, open a new command prompt and enter the below command. Apache Kafka uses Zookeeper to manage and store all the Metadata and Cluster Information, so the Zookeeper instance must be running before (and alongside) the Kafka Broker during message transfer.
.\bin\windows\zookeeper-server-start.bat .\config\zookeeper.properties
- For starting Apache Kafka, open another command prompt and enter the below command:
.\bin\windows\kafka-server-start.bat .\config\server.properties
Now, Zookeeper and Kafka have started and are running successfully.
Step 2: Creating and Configuring Apache Kafka Topics
- In the following steps, you will see how to create Kafka Topics and configure them for efficient message transfer.
- For creating a new Kafka Topic, open a separate command prompt window and enter the below command:
kafka-topics.bat --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
When the above command executes successfully, you will see a message in your command prompt saying, "Created topic test." With this command, you have created a new Topic called test with a single Partition and a Replication Factor of one.
The command consists of attributes like Create, Zookeeper, localhost:2181, Replication-factor, Partitions:
- Create: It is a basic command for creating a new Kafka topic.
- Partitions: The newly created Topics can be divided and stored in one or more Partitions to enable uniform scaling and balancing of messages or loads. In the above-mentioned basic command, you will be creating only one Partition. When a Kafka Server is handling or streaming large data, you can even have Ten Partitions for a single Topic and 10,000 Partitions per Kafka Cluster.
- Replication Factor: The Replication Factor defines the number of copies or replicas of a Topic's data across the Kafka Cluster. A Replication Factor of 1 keeps only a single copy of the Topic (no redundancy), while a Replication Factor of 2 keeps two copies on different Brokers. This Replication feature makes the Kafka Cluster highly fault-tolerant: since Topics are replicated and spread across other Kafka Servers, if one of the servers fails, the Topic's data is still available on the other servers.
- Zookeeper localhost:2181: This attribute states that your Zookeeper instance runs on port 2181. When you enter the command to create Topics in Kafka, the command will be redirected to the Zookeeper instance running along with Kafka. Further, Zookeeper redirects your command request to the Kafka Server or Broker to create a new Kafka Topic.
Kafka Topics should always have a unique name to differentiate and uniquely identify them from other Topics created in the future. In this case, the Topic is given the unique name test. This name is used to configure or customize the Topic in the further steps.
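As an illustration, here is a sketch of the same command adjusted for a larger setup, using a hypothetical Topic name test-multi with three Partitions and a Replication Factor of 2. This assumes your Cluster has at least two Brokers; on newer Kafka versions you would pass --bootstrap-server localhost:9092 instead of --zookeeper.

kafka-topics.bat --create --zookeeper localhost:2181 --replication-factor 2 --partitions 3 --topic test-multi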
In the above steps, you have successfully created a new Kafka Topic. You can list the previously created Kafka Topics using the command given below.
kafka-topics.bat --zookeeper localhost:2181 --list
You can also get the information about the newly created Topic by using the following command. The below-given command describes the information of Kafka Topics like topic name, number of partitions, and replicas.
kafka-topics.bat --zookeeper localhost:2181 --describe --topic <the_topic_name>
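The exact layout varies between Kafka versions, but for the test Topic created above the describe output looks roughly like this, listing the Partition count, the Replication Factor, and the Leader, Replicas, and in-sync replicas (Isr) for each Partition:

Topic: test    PartitionCount: 1    ReplicationFactor: 1    Configs:
    Topic: test    Partition: 0    Leader: 0    Replicas: 0    Isr: 0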
After creating Topics in Kafka, you can start producing and consuming messages in the further steps. By default, you have the "bin\windows\kafka-console-producer.bat" and "bin\windows\kafka-console-consumer.bat" scripts in your main Kafka Directory. These pre-written Producer and Consumer scripts are responsible for running the Kafka Producer and Kafka Consumer consoles, respectively.
Initially, you have to use a Kafka Producer for sending or producing Messages into the Kafka Topic. Then, you will use Kafka Consumer for receiving or consuming messages from Kafka Topics.
For that, open a new command prompt and enter the following command.
kafka-console-producer.bat --broker-list localhost:9092 --topic test
After the command executes, you will see a ">" prompt with a blinking cursor. This confirms that you are in the Producer Console or Window.
Once the Kafka Producer is started, you have to start the Kafka Consumer. Open a new command window and enter the below command according to your Kafka version. If you are using an old Kafka version (before 2.0), enter the command given below.
kafka-console-consumer.bat --zookeeper localhost:2181 --topic test
If you are using a recent Kafka version (2.0 or later), enter the command given below.
kafka-console-consumer.bat --bootstrap-server localhost:9092 --topic test --from-beginning
In the above commands, test is the name of the Topic into which users produce and store messages in the Kafka Server. The same Topic name is used on the Consumer side to consume or receive messages from the Kafka Server.
After implementing the above steps, you have successfully started the Producer and Consumer Consoles of Apache Kafka. Now, you can write messages in the Producer Panel and receive Messages in the Consumer Panel.
Using manual scripts and custom code to move data into the warehouse is cumbersome. Frequent breakages, pipeline errors and lack of data flow monitoring makes scaling such a system a nightmare. Hevo’s reliable data pipeline platform enables you to set up zero-code and zero-maintenance data pipelines that just work.
- Reliability at Scale: With Hevo, you get a world-class fault-tolerant architecture that scales with zero data loss and low latency.
- Monitoring and Observability: Monitor pipeline health with intuitive dashboards that reveal every stat of pipeline and data flow. Bring real-time visibility into your ELT with Alerts and Activity Logs
- Stay in Total Control: When automation isn’t enough, Hevo offers flexibility – data ingestion modes, ingestion, and load frequency, JSON parsing, destination workbench, custom schema management, and much more – for you to have total control.
- Auto-Schema Management: Correcting improper schema after the data is loaded into your warehouse is challenging. Hevo automatically maps source schema with destination warehouse so that you don’t face the pain of schema errors.
- 24×7 Customer Support: With Hevo you get more than just a platform, you get a partner for your pipelines. Discover peace with round the clock “Live Chat” within the platform. What’s more, you get 24×7 support even during the 14-day full-feature free trial.
- Transparent Pricing: Say goodbye to complex and hidden pricing models. Hevo’s Transparent Pricing brings complete visibility to your ELT spend. Choose a plan based on your business needs. Stay in control with spend alerts and configurable credit limits for unforeseen spikes in data flow.
Step 3: Send and Receive Messages using Apache Kafka Topics
Open the Apache Kafka Producer Console and Consumer Console side by side. Start typing any text or messages in the Producer Console. You will see that the messages you post in the Producer Console are received and displayed in the Consumer Console.
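As an illustration, typing two messages at the Producer prompt and watching the Consumer Console might look like this:

Producer Console:
> Hello Kafka
> This is my first message

Consumer Console:
Hello Kafka
This is my first message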
By this method, you have configured the Apache Kafka Producer and Consumer to write and read messages successfully.
You can navigate to the Data Directory of Apache Kafka (the log directory configured in server.properties) to confirm that the Topic creation was successful. When you open the Kafka Data Directory, you can find the Topics created earlier to store messages; each Topic is represented as a folder.
If you have created multiple Partitions for a Topic, you will see one folder per Partition inside the same directory, named after the Topic and the Partition number (for example, test-0). In the Kafka Data Directory, you will also see a number of "__consumer_offsets" folders, which belong to the internal Topic Kafka uses to store Consumer offset information for the Topics being consumed.
From these steps, you can confirm and ensure that Apache Kafka is properly working for message Producing and Consuming operations.
Conclusion
In this article, you have learned about Apache Kafka, Apache Kafka Topics, and the steps to create Apache Kafka Topics. You have learned the manual method of creating Topics and customizing Topic configurations in Apache Kafka using the command-line tool (command prompt).
However, you can also implement Topic creation programmatically, for example with the Kafka Admin API (or the TopicBuilder class if you are using Spring for Apache Kafka). With the manual method as a base, you can explore the programmatic approach later.
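As a minimal sketch of that programmatic route, the snippet below uses the Kafka Admin API's AdminClient to create the same single-partition test Topic. It assumes a Broker is reachable at localhost:9092 and that the kafka-clients library is on the classpath; the class name CreateTopicExample is just an illustration.

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

import java.util.Collections;
import java.util.Properties;

public class CreateTopicExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Assumed Broker address, matching the console examples above.
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // Topic name, Partition count, and Replication Factor mirror the CLI example.
            NewTopic topic = new NewTopic("test", 1, (short) 1);
            admin.createTopics(Collections.singletonList(topic)).all().get();
            System.out.println("Created topic: " + topic.name());
        }
    }
}

If the Topic already exists, the get() call fails with an ExecutionException wrapping a TopicExistsException, so in practice you may want to catch and ignore that case.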
Extracting complicated data from Apache Kafka, on the other hand, can be Difficult and Time-Consuming. If you’re having trouble with these issues and want to find a solution, Hevo Data is a good place to start!
VISIT OUR WEBSITE TO EXPLORE HEVO
Hevo Data is a No-Code Data Pipeline that offers a faster way to move data from 150+ Data Sources including Apache Kafka, Kafka Confluent Cloud, and other 40+ Free Sources, into your Data Warehouse to be visualized in a BI tool.
You can use Hevo’s Data Pipelines to replicate the data from your Apache Kafka Source or Kafka Confluent Cloud to the Destination system. Hevo is fully automated and hence does not require you to code.
Want to take Hevo for a spin? SIGN UP for a 14-day Free Trial and experience the feature-rich Hevo suite first hand. You can also have a look at the unbeatable pricing that will help you choose the right plan for your business needs.
Share your experience of learning about Apache Kafka Topic Creation & Working in the comments section below!