The Apache Kafka architecture comprises producers, brokers, consumers, and Zookeeper. In this architecture, Zookeeper serves as a centralized coordination service that manages the metadata of the Kafka cluster. However, you can also install and run Kafka without Zookeeper. In this case, instead of storing the metadata in Zookeeper, Kafka stores it in a topic partition within Kafka itself.
In this article, you will learn about Kafka, Zookeeper, and running Apache Kafka without Zookeeper. You will also learn how to install Apache Kafka without Zookeeper.
What is Apache Kafka?
Apache Kafka is an open-source distributed streaming platform that collects, processes, stores, and manages real-time data streaming continuously into Kafka servers. A Kafka cluster is a set of servers (brokers) working together to store and organize this data. Users can then access the continuous data stream to build data-driven applications. Apache Kafka is also known as a publish-subscribe messaging system because users can publish messages to the Kafka servers and subscribe to messages from them. Because of these capabilities, Kafka is used for many use cases, including stream processing, real-time analytics, and user activity tracking.
Key Features of Kafka
Apache Kafka is popular because of features that help ensure uptime, simplify scaling, and handle massive data volumes. Let’s have a look at some of the features it provides:
- High Scalability: The partitioned log model used by Kafka distributes data over several servers, allowing it to extend beyond the capacity of a single server.
- Low Latency: Kafka decouples data streams, resulting in low latency and high throughput.
- Fault-Tolerant & Durable: Data is written to disk, and partitions are distributed and replicated across several servers. This protects data from server failure and makes it fault-tolerant and durable. If a broker fails, the cluster can keep serving data from the replicas held on the remaining brokers.
- Extensibility: As Kafka has surged in popularity in recent years, many other applications have built connectors for it. This makes it possible to add extra capabilities, such as integration with other systems, with little effort.
- Metrics and Monitoring: Kafka is a popular tool for tracking operational data. This requires gathering data from several apps and consolidating it into centralized feeds with metrics. To read more about how you can analyze your data in Kafka, you can refer to Real-time Reporting with Kafka Analytics.
What is Zookeeper?
Zookeeper is an open-source coordination service for managing distributed applications. Since Apache Kafka is a distributed streaming platform, it uses Zookeeper to store and manage the configuration information about Kafka topics, servers, producers, and consumers. In other words, Apache Kafka uses Zookeeper as a centralized synchronization service that stores and manages information of Kafka clusters, including the overall metadata of Kafka brokers or servers.
Key Features of Zookeeper
The key features of Zookeeper are as follows:
- Naming Service: ZooKeeper assigns a unique identity to each node, much like DNS does, which aids in identification.
- Updating the Status of a Node: Zookeeper has the ability to update the status of every node. As a result, this functionality enables it to keep up-to-date information about each Node in the Cluster.
- Managing the Cluster: Zookeeper keeps track of the status of each Node in real-time. This reduces the likelihood of errors and ambiguity, which is how it manages the cluster.
- Automatic Failure Recovery: While modifying, ZooKeeper locks the data, so that if a failure happens in the database, it can be recovered automatically by the Cluster.
- Scalability: Zookeeper’s performance can be improved by deploying multiple machines.
- Fast: Zookeeper operates relatively quickly in “read-dominant” workloads.
- Ordered Messages: Zookeeper keeps track of the messages by stamping each update with a number indicating its order.
- Reliability: Once Zookeeper applies an update, it persists from that point forward until a client overwrites it.
- Sequential Consistency: Sequential Consistency implies that changes from a client are applied in the same sequence in which they were delivered.
- Single System Image: Regardless of which Zookeeper server a client connects to, the client will get the same view of the service.
A fully managed No-code Data Pipeline platform like Hevo Data helps you integrate and load data from 150+ different Data sources (including 40+ free sources) such as Kafka to a Data Warehouse or Destination of your choice in real-time. Hevo, with its minimal learning curve, can be set up in just a few minutes, allowing users to load data without compromising performance. Its integration with numerous sources allows users to bring in data of different kinds smoothly without writing a single line of code.
Check out some of the cool features of Hevo:
- Completely Automated: The Hevo platform can be set up in just a few minutes and requires minimal maintenance.
- Transformations: Hevo provides preload transformations through Python code. It also allows you to run transformation code for each event in the Data Pipelines you set up. You need to edit the event object’s properties received in the transform method as a parameter to carry out the transformation. Hevo also offers drag and drop transformations like Date and Control Functions, JSON, and Event Manipulation to name a few. These can be configured and tested before putting them to use.
- Connectors: Hevo supports 100+ integrations to SaaS platforms, files, Databases, analytics, and BI tools. It supports various destinations including Google BigQuery, Amazon Redshift, Snowflake Data Warehouses; Amazon S3 Data Lakes; and MySQL, SQL Server, TokuDB, DynamoDB, PostgreSQL Databases to name a few.
- Real-Time Data Transfer: Hevo provides real-time data migration, so you can have analysis-ready data always.
- 100% Complete & Accurate Data Transfer: Hevo’s robust infrastructure ensures reliable data transfer with zero data loss.
- Scalable Infrastructure: Hevo has in-built integrations for 100+ sources (including 40+ free sources) that can help you scale your data infrastructure as required.
- 24/7 Live Support: The Hevo team is available round the clock to extend exceptional support to you through chat, email, and support calls.
- Schema Management: Hevo takes away the tedious task of schema management & automatically detects the schema of incoming data and maps it to the destination schema.
- Live Monitoring: Hevo allows you to monitor the data flow so you can check where your data is at a particular point in time.
How does Apache Kafka run without Zookeeper?
Kafka 2.8.0 provides an early-access preview of running Kafka without Zookeeper. Usually, Kafka uses Zookeeper to store and manage all the metadata about Kafka clusters. Kafka also relies on Zookeeper as a centralized coordinator for the Kafka brokers or servers. In the new mode, instead of storing this metadata in Zookeeper, Kafka stores it in a topic partition inside Kafka itself. To run Kafka without Zookeeper, you run it in Kafka Raft metadata mode, i.e., KRaft.
The KRaft controllers collectively form a KRaft quorum, which stores all the metadata of the Kafka cluster. With this method, you remove the dependency on Zookeeper from the Kafka architecture. Running Kafka without Zookeeper also reduces system complexity and data redundancy. As Kafka plans to discontinue Zookeeper as its centralized configuration service, you will end up with a simplified Kafka architecture that has no third-party service dependencies.
Steps to Install Apache Kafka without Zookeeper
This article uses Kafka 2.8.0 built for Scala 2.12 (the kafka_2.12-2.8.0 release) to install Kafka without Zookeeper.
The steps to be followed to install Kafka without Zookeeper are as follows:
A) Download Apache Kafka
The steps followed to download Apache Kafka are as follows:
Option 1: In Windows Operating System
- Step 1: Initially, go to the official website of Apache Kafka and click on the “Download Kafka” button.
- Step 2: On the next page, you will see various Kafka versions. From that, choose the latest Kafka version that removes Zookeeper dependency. You can download the preferred version by clicking on the respective Kafka version.
- Step 3: Now, you will be redirected to the new page having the direct download link of Kafka. Click the link to download Kafka to your PC directly.
Option 2: In Linux Operating System
- Step 1: If you are using a Linux OS, you can download Kafka directly from your terminal using the “wget” command, a non-interactive network downloader that fetches files from a server without leaving the terminal.
- Step 2: Open your terminal and type “wget” followed by the direct download link for the Kafka release archive, in this case kafka_2.12-2.8.0.tgz.
- Step 3: Press the “Enter” key and wait until the download completes.
- Step 4: After downloading, extract the archive by running the below command in your terminal.
tar xzf kafka_2.12-2.8.0.tgz
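The download and extraction steps above can be sketched as a small script. The URL is an assumption: it points at the Apache archive location where the 2.8.0 release is normally kept, so compare it with the link shown on the official download page before relying on it. The function name `download_kafka` is hypothetical.

```shell
#!/bin/sh
# Sketch: download and unpack Kafka 2.8.0 (built for Scala 2.12).
# KAFKA_URL is an assumed archive location, not taken from the article.
SCALA_VERSION="2.12"
KAFKA_VERSION="2.8.0"
KAFKA_ARCHIVE="kafka_${SCALA_VERSION}-${KAFKA_VERSION}.tgz"
KAFKA_URL="https://archive.apache.org/dist/kafka/${KAFKA_VERSION}/${KAFKA_ARCHIVE}"

# Download the archive (skipped if it is already present), then extract it.
download_kafka() {
    [ -f "$KAFKA_ARCHIVE" ] || wget "$KAFKA_URL" || return 1
    tar xzf "$KAFKA_ARCHIVE"
}

echo "Archive: $KAFKA_ARCHIVE"
echo "Source:  $KAFKA_URL"
```

Calling `download_kafka` after these definitions fetches the archive and extracts it into a kafka_2.12-2.8.0 folder in the current directory.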
B) Run KRaft
The steps to be followed are:
- Step 1: Navigate to your Kafka folder so that the commands you run from now on are relative to that folder.
- Step 2: For the archive extracted above, run “cd kafka_2.12-2.8.0” in the terminal.
- Step 3: Now, you are in the Kafka folder. Next, go to the “kraft” folder inside the “config” folder of the Kafka home directory by running “cd config/kraft”.
- Step 4: Inside the kraft folder, you will see some sample configuration files. Of these, the “server.properties” file helps you start new Kafka clusters without Zookeeper.
- Step 5: Now, copy the “server.properties” file three times to create three new configuration files, one for each node of a three-node Kafka cluster.
- Step 6: Name the new property files server1.properties, server2.properties, and server3.properties, indicating three different servers.
- Step 7: Pass the new file name as the second argument to the “cp” command, as shown below.
cp server.properties server1.properties
cp server.properties server2.properties
cp server.properties server3.properties
Here, the names of the newly created files are server1.properties, server2.properties, server3.properties.
- Step 8: After creating the new config files, you can start configuring the properties of each file, beginning with server1.properties. You can use the “vi” command, which lets you edit the file within the terminal instead of using an external editor application.
- Step 9: To edit the server1.properties file, run “vi server1.properties”.
- Step 10: This command opens the server1.properties file so that you can edit and configure it.
- Step 11: In the server file, you will find many lines that each define a property. For three servers running on one machine, the properties that typically need attention are node.id, listeners, advertised.listeners, and log.dirs, which must be unique per server, and controller.quorum.voters, which must list all three controllers.
- Step 12: Only modify the properties that need to change and leave the other properties unchanged.
- Step 13: After modifying the properties, save the file. You have now configured the server1.properties file.
- Step 14: Now, open the server2.properties file to perform the same configuration process by running “vi server2.properties”.
- Step 15: Follow the same procedure that you used for the server1.properties file.
- Step 16: Modify server2.properties so that its node ID, listener ports, and log directory are unique to the second server.
- Step 17: After configuring, save the file.
- Step 18: Finally, modify the server3.properties file in the same way. Edit only the properties that require modification and leave the other properties unchanged.
Now, all the server property files have been modified and updated.
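As an alternative to editing each file by hand, the per-server changes can be sketched with `sed`. Everything specific here is an assumption rather than the article's original configuration: the broker ports 9092 to 9094, the controller ports 19092 to 19094, the `/tmp/serverN/kraft-combined-logs` directories, and the helper name `make_node_config`. The sketch also assumes the sample file sets each of these keys on an uncommented line.

```shell
#!/bin/sh
# Sketch: derive one node's configuration from the sample KRaft server.properties.
# Host, ports, and log directories below are illustrative assumptions.
VOTERS="1@localhost:19092,2@localhost:19093,3@localhost:19094"

# Usage: make_node_config <sample_file> <node_id> <broker_port> <controller_port>
# Prints the rewritten properties to standard output; other lines pass through.
make_node_config() {
    sed -e "s|^node\.id=.*|node.id=$2|" \
        -e "s|^controller\.quorum\.voters=.*|controller.quorum.voters=$VOTERS|" \
        -e "s|^listeners=.*|listeners=PLAINTEXT://:$3,CONTROLLER://:$4|" \
        -e "s|^advertised\.listeners=.*|advertised.listeners=PLAINTEXT://localhost:$3|" \
        -e "s|^log\.dirs=.*|log.dirs=/tmp/server$2/kraft-combined-logs|" \
        "$1"
}

# Example, run from the config/kraft folder:
#   make_node_config server.properties 1 9092 19092 > server1.properties
#   make_node_config server.properties 2 9093 19093 > server2.properties
#   make_node_config server.properties 3 9094 19094 > server3.properties
```

Writing to standard output instead of editing in place keeps the sample file untouched, so the three files can be regenerated if a port or directory needs to change.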
- Step 19: In the next step, you will create a new UUID that will serve as your cluster ID. For that, run the “random-uuid” subcommand of the kafka-storage.sh tool from the Kafka home directory.
- Step 20: After executing the above command, you will get a unique UUID for the cluster. Note it down for future reference.
- Step 21: Now, you need to format the existing log directories or storage locations so that Kafka can store log files in the respective server’s folder instead of storing them in temporary directories. You have to format the locations separately for each server file.
- Step 22: The basic command to format locations based on each server property file is given below.
./bin/kafka-storage.sh format -t <uuid> -c <server_config_location>
- Step 23: In the above command, replace <uuid> with the UUID you copied before. Replace the <server_config_location> with the respective server property file.
- Step 24: Enter the command given below to format locations based on each server property file.
- For server1.properties file
./bin/kafka-storage.sh format -t uuid -c
- For server2.properties file
./bin/kafka-storage.sh format -t uuid -c
- For server3.properties file
./bin/kafka-storage.sh format -t uuid -c
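The cluster ID and formatting steps can be sketched together as one function. The configuration paths are an assumption based on the earlier steps (three files in config/kraft, commands run from the Kafka home directory), and `format_all_nodes` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: generate one cluster ID and format the storage directory of each node.
# Assumes the Kafka home directory is the working directory and that the three
# property files live in config/kraft (an assumption based on the earlier steps).
KAFKA_STORAGE="${KAFKA_STORAGE:-./bin/kafka-storage.sh}"
CONFIG_DIR="${CONFIG_DIR:-./config/kraft}"

format_all_nodes() {
    # Every node of the same cluster must be formatted with the same cluster ID.
    cluster_id=$("$KAFKA_STORAGE" random-uuid) || return 1
    echo "Cluster ID: $cluster_id"
    for n in 1 2 3; do
        "$KAFKA_STORAGE" format -t "$cluster_id" \
            -c "$CONFIG_DIR/server$n.properties" || return 1
    done
}
```

Calling `format_all_nodes` from the Kafka home directory prints the generated cluster ID and then formats the three storage locations in turn, stopping at the first failure.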
- Step 25: After formatting the locations, you are ready to start the Kafka servers.
- Step 26: Before starting the servers, you need to set up the heap properties. For that, you can execute the command given below.
export KAFKA_HEAP_OPTS="-Xmx200M -Xms100M"
- Step 27: With this command, you set the initial heap size (-Xms) to 100 MB and the maximum heap size (-Xmx) to 200 MB for Kafka. If you do not set this variable, the kafka-server-start.sh script uses a 1 GB heap by default. Based on your requirements and use cases, you can increase the heap size to 3 GB or more.
- Step 28: In the below steps, you will start the servers.
Starting Server 1:
Starting Server 2:
Starting Server 3:
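Starting the three servers can be sketched as a loop over the configuration files. The paths are assumptions based on the earlier steps, `start_all_nodes` is a hypothetical helper name, and the `-daemon` option is used here so that each server runs in the background. Without it, each server needs its own terminal.

```shell
#!/bin/sh
# Sketch: start the three servers in the background, one per configuration file.
# Assumes the Kafka home directory is the working directory; the config paths
# are assumptions based on the earlier steps.
KAFKA_START="${KAFKA_START:-./bin/kafka-server-start.sh}"
CONFIG_DIR="${CONFIG_DIR:-./config/kraft}"

start_all_nodes() {
    for n in 1 2 3; do
        echo "Starting Server $n"
        # -daemon detaches the server so the loop can continue with the next node.
        "$KAFKA_START" -daemon "$CONFIG_DIR/server$n.properties" || return 1
    done
}
```

Calling `start_all_nodes` in the same shell session where KAFKA_HEAP_OPTS was exported lets all three servers pick up the heap settings.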
- Step 29: Now, you have successfully started Kafka without Zookeeper. To check whether Kafka has started, you can execute the following command.
ps -ef | grep kafka
Once the command is executed, you will see several process entries in your terminal, which provide information about the newly started servers. With this, you can confirm that the Kafka servers have started and are running.
After starting Kafka, you can create topics to store a stream of real-time data. You can then run a Kafka producer and a Kafka consumer in separate terminals to start producing and receiving messages. Once you start receiving messages, you can be sure that you have successfully installed Apache Kafka without Zookeeper.
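A quick end-to-end check along these lines can be sketched with the command-line tools that ship with Kafka. The topic name `kraft-test` and the bootstrap address `localhost:9092` are hypothetical and must match a broker listener you actually configured. The function names are also illustrative.

```shell
#!/bin/sh
# Sketch: smoke-test the cluster with a topic, a producer, and a consumer.
# The topic name and bootstrap address are hypothetical values.
BOOTSTRAP="${BOOTSTRAP:-localhost:9092}"
TOPIC="${TOPIC:-kraft-test}"
KAFKA_TOPICS="${KAFKA_TOPICS:-./bin/kafka-topics.sh}"
KAFKA_PRODUCER="${KAFKA_PRODUCER:-./bin/kafka-console-producer.sh}"
KAFKA_CONSUMER="${KAFKA_CONSUMER:-./bin/kafka-console-consumer.sh}"

# Create a topic replicated across all three servers.
create_topic() {
    "$KAFKA_TOPICS" --create --topic "$TOPIC" \
        --partitions 3 --replication-factor 3 \
        --bootstrap-server "$BOOTSTRAP"
}

# Run in one terminal: each line typed is sent as a message.
run_producer() {
    "$KAFKA_PRODUCER" --topic "$TOPIC" --bootstrap-server "$BOOTSTRAP"
}

# Run in a second terminal: prints messages from the beginning of the topic.
run_consumer() {
    "$KAFKA_CONSUMER" --topic "$TOPIC" --from-beginning \
        --bootstrap-server "$BOOTSTRAP"
}
```

Run `create_topic` once from the Kafka home directory, then `run_producer` and `run_consumer` in two separate terminals. Lines typed into the producer should appear in the consumer.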
Note: Using Apache Kafka without Zookeeper is still in its preview or testing phase. Therefore, it is advised not to implement it for production.
In this article, you have learned about running Apache Kafka without Zookeeper. This article also provided information on Apache Kafka, its key features, Zookeeper, its key features, and how to install Apache Kafka without Zookeeper. While using Apache Kafka with Zookeeper, you may run into issues such as data duplication and added system complexity. Installing and using Kafka without Zookeeper removes that extra dependency and simplifies the architecture.
Hevo Data, a No-code Data Pipeline, provides you with a consistent and reliable solution to manage data transfer between a variety of sources and a wide variety of destinations with a few clicks.
Hevo Data with its strong integration with 150+ data sources (including 40+ Free Sources) allows you to not only export data from your desired data sources & load it to the destination of your choice but also transform & enrich your data to make it analysis-ready. Hevo also allows integrating data from non-native sources using Hevo’s in-built Webhooks Connector. You can then focus on your key business needs and perform insightful analysis using BI tools.
Want to give Hevo a try?
Sign Up for a 14-day free trial and experience the feature-rich Hevo suite first hand. You may also have a look at the amazing Hevo Price, which will assist you in selecting the best plan for your requirements.
Share your experience of understanding the installation of Kafka without Zookeeper in the comment section below! We would love to hear your thoughts.