Apache Kafka is a distributed message broker designed to handle large volumes of real-time data efficiently. Unlike traditional brokers such as ActiveMQ and RabbitMQ, Kafka runs as a cluster of one or more servers. This distributed design makes it highly scalable, gives it built-in fault tolerance, and lets it deliver higher throughput than its counterparts. In this guide, we will walk through the steps to set up Kafka on Ubuntu 16.04.
There are 9 steps to install Kafka on Ubuntu:
- Step 1: Install Java
- Step 2: Install ZooKeeper
- Step 3: Create a Service User for Kafka
- Step 4: Installing Kafka
- Step 5: Configuring Kafka Server
- Step 6: Ensure Permissions of Directories
- Step 7: Testing Installation
- Step 8: Launching Kafka as a Service on Startup
- Step 9: Setting up a Multi-node Cluster
Step 1: Install Java
Kafka is written in Java and Scala and requires a JRE (version 1.7 or later) to run. In this step, we will make sure Java is installed.
sudo apt-get update
sudo apt-get install default-jre
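Before moving on, it is worth confirming that a Java runtime is actually on the PATH. A minimal sketch, assuming a POSIX shell:

```shell
# print the installed Java version, or a hint if the JRE is missing
if command -v java >/dev/null 2>&1; then
  java -version 2>&1 | head -n 1
else
  echo "java not found - run: sudo apt-get install default-jre"
fi
```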
Step 2: Install ZooKeeper
ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. Kafka uses ZooKeeper to track the heartbeats of its nodes, to store configuration, and, most importantly, to elect leaders.
sudo apt-get install zookeeperd
We will now check if ZooKeeper is alive and if it’s OK 😛
telnet localhost 2181
At the Telnet prompt, we will enter
ruok
(are you OK?). If all is well, ZooKeeper will end the Telnet session and reply with
imok
Step 3: Create a Service User for Kafka
Since Kafka is a network application, creating a non-sudo user specifically for it minimizes the damage should the machine be compromised.
sudo adduser --system --no-create-home --disabled-password --disabled-login kafka
Step 4: Installing Kafka
cd ~
wget "http://www-eu.apache.org/dist/kafka/1.0.1/kafka_2.12-1.0.1.tgz"
Optionally, check the integrity of the downloaded file
curl http://kafka.apache.org/KEYS | gpg --import
wget https://dist.apache.org/repos/dist/release/kafka/1.0.1/kafka_2.12-1.0.1.tgz.asc
gpg --verify kafka_2.12-1.0.1.tgz.asc kafka_2.12-1.0.1.tgz
Create a directory for extracting Kafka
sudo mkdir /opt/kafka
sudo tar -xvzf kafka_2.12-1.0.1.tgz --directory /opt/kafka --strip-components 1
Optionally, delete the Kafka tarball and the .asc file
rm kafka_2.12-1.0.1.tgz kafka_2.12-1.0.1.tgz.asc
Step 5: Configuring Kafka Server
Kafka persists data to disk, so we will now create a directory for it.
sudo mkdir -p /var/lib/kafka/data
Open /opt/kafka/config/server.properties in your favourite editor.
sudo nano /opt/kafka/config/server.properties
By default, Kafka doesn’t allow us to delete topics. To be able to delete topics, find the line below and change its value to true.
delete.topic.enable = true
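If you prefer not to edit the file by hand, the same change can be scripted with sed. The sketch below runs against a throwaway temporary file so nothing real is touched; in practice you would point it at /opt/kafka/config/server.properties (with sudo).

```shell
# demo on a throwaway file; the real target would be server.properties
conf=$(mktemp)
echo 'delete.topic.enable = false' > "$conf"
# rewrite the whole line so it reads: delete.topic.enable = true
sed -i 's/^delete.topic.enable *=.*/delete.topic.enable = true/' "$conf"
grep '^delete.topic.enable' "$conf"   # prints: delete.topic.enable = true
rm -f "$conf"
```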
Next, we will change the log directory so that Kafka stores its data in the directory we created above
log.dirs=/var/lib/kafka/data
Kafka automatically deletes its oldest log segments after a retention period, or once the logs reach a certain total size. To adjust retention by time, change the line below.
log.retention.hours=168 # other accepted keys are log.retention.ms and log.retention.minutes
Or limit retention by disk size instead, using the log.retention.bytes setting (a per-partition limit).
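After these edits, the changed lines in server.properties should look roughly like this (the retention value shown is Kafka's default of 168 hours; the size limit is a commented example):

```
delete.topic.enable = true
log.dirs=/var/lib/kafka/data
log.retention.hours=168
# or cap the size kept per partition instead, e.g.:
#log.retention.bytes=1073741824
```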
Step 6: Ensure Permissions of Directories
First, we will make sure the kafka user we created in step 3 owns all of the Kafka-related directories.
sudo chown -R kafka:nogroup /opt/kafka
sudo chown -R kafka:nogroup /var/lib/kafka
Step 7: Testing Installation
In one terminal, start a Kafka server
sudo /opt/kafka/bin/kafka-server-start.sh /opt/kafka/config/server.properties
In another terminal, create a topic
/opt/kafka/bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
List all topics with the command below; it will print test, the topic we just created
/opt/kafka/bin/kafka-topics.sh --list --zookeeper localhost:2181
Let’s start publishing messages to the test topic
/opt/kafka/bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
We will now start a consumer on the test topic and read it from the beginning.
/opt/kafka/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning
Enter some messages in the producer terminal.
You will see them appear in the consumer terminal.
Because we started Kafka with sudo for this test, files created by the broker may now be owned by root. To fix this, change the ownership of the directories back to kafka by rerunning step 6.
Step 8: Launching Kafka as a Service on Startup
In order to launch Kafka as a service on Ubuntu 16.04, we will create a unit file describing the service.
For this, we will create a unit file in /etc/systemd/system directory
sudo nano /etc/systemd/system/kafka.service
[Unit]
Description=Highly available, distributed message broker
After=network.target

[Service]
User=kafka
ExecStart=/opt/kafka/bin/kafka-server-start.sh /opt/kafka/config/server.properties

[Install]
WantedBy=multi-user.target
You can choose to forward the broker log to a separate file so that your syslog stays clean. This log file will slowly grow over time, so you may want to trim it periodically. Note that systemd does not perform shell redirection on its own, so the command has to be wrapped in a shell. Change the ExecStart line as follows.
ExecStart=/bin/sh -c '/opt/kafka/bin/kafka-server-start.sh /opt/kafka/config/server.properties > /opt/kafka/server.log 2>&1'
Make systemd aware of the new unit file, then start the service
sudo systemctl daemon-reload
sudo systemctl start kafka.service
Next, we will enable the service so that it starts automatically on boot
sudo systemctl enable kafka.service
Check the status of the service with
sudo systemctl status kafka.service
Step 9: Setting up a Multi-node Cluster
Although Kafka can run on a single node, we can run it on multiple nodes for data redundancy and automatic failover.
For ease of understanding, let’s assume the first node we installed on is
node-1 with IP 10.0.0.1
and the nodes we will install next are
node-2 with IP 10.0.0.2
node-3 with IP 10.0.0.3
To set up a multi-node cluster, we will first follow step 1 through step 5 on each new node. In step 5, along with the other settings in /opt/kafka/config/server.properties, we will change the following settings.
broker.id should be unique for each node in the cluster.
for node-2
broker.id=1
for node-3
broker.id=2
Change the zookeeper.connect value so that it lists all ZooKeeper hosts with their client port
zookeeper.connect=10.0.0.1:2181,10.0.0.2:2181,10.0.0.3:2181
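All three brokers share the same zookeeper.connect value. As a small sanity sketch, the string can be assembled from the node IPs above (2181 is ZooKeeper's default client port):

```shell
# build the zookeeper.connect value from the node IPs used in this guide
ips="10.0.0.1 10.0.0.2 10.0.0.3"
zk_connect=$(printf '%s:2181,' $ips | sed 's/,$//')
echo "zookeeper.connect=$zk_connect"
# prints: zookeeper.connect=10.0.0.1:2181,10.0.0.2:2181,10.0.0.3:2181
```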
Note: although ZooKeeper is not required on every node, it’s good practice to run it on several; keep the ensemble size odd (e.g. 3 or 5) so that a majority quorum can always be formed.
We will also need to change the ZooKeeper settings in /etc/zookeeper/conf/zoo.cfg (the default location used by the zookeeperd package) to list all of the ZooKeeper nodes
server.0=10.0.0.1:2888:3888
server.1=10.0.0.2:2888:3888
server.2=10.0.0.3:2888:3888
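On Ubuntu, the zookeeperd package keeps these settings in /etc/zookeeper/conf/zoo.cfg. Each node must also state its own ID in /etc/zookeeper/conf/myid, matching the N of its server.N entry (ZooKeeper's documentation recommends IDs between 1 and 255, so you may prefer numbering from 1). A sketch for node-1, assuming the package-default paths:

```
# /etc/zookeeper/conf/zoo.cfg (excerpt, identical on every node)
server.0=10.0.0.1:2888:3888
server.1=10.0.0.2:2888:3888
server.2=10.0.0.3:2888:3888

# /etc/zookeeper/conf/myid on node-1 contains only the node's ID:
0
```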
sudo systemctl restart zookeeper.service
Now we can follow step 6 onwards to complete the installation.
Hope this guide has helped you set up Kafka on Ubuntu 16.04. Do let me know in the comments if you face any difficulty.