How to Install Kafka on Ubuntu 20.04: 8 Easy Steps

Data Integration, ETL, Tutorial • February 7th, 2022

Apache Kafka is a distributed message broker designed to handle large volumes of real-time data efficiently. Unlike traditional brokers such as ActiveMQ and RabbitMQ, Kafka runs as a cluster of one or more servers, which makes it highly scalable. This distributed design also gives it built-in fault tolerance and higher throughput than its counterparts.

This article walks you through installing Kafka on Ubuntu 20.04 in 8 simple steps. It also gives a brief introduction to Kafka and Ubuntu 20.04. Let’s get started.

What is Apache Kafka?

Apache Kafka is a distributed event streaming platform that can be used to build high-performance data pipelines, data integration, stream analytics, and mission-critical applications. Users can utilize Kafka Streams in particular to implement end-to-end event streaming. Users can also create and read event streams, as well as import and export data from other systems, using Kafka as a data stream platform.

Apache Kafka is a distributed, highly scalable, elastic, fault-tolerant, and secure data stream platform that can be used on-premises as well as in the cloud. Users can also select between “self-managing their Kafka setups” and using “vendor-managed services” based on the requirements. Kafka is one of the five most active Apache Software Foundation projects, according to the developers, and is trusted by more than 80% of Fortune 100 organizations.

What is Ubuntu 20.04?

Ubuntu 20.04 LTS, codenamed Focal Fossa, is a long-term support release of the Linux-based Ubuntu operating system. It was released on April 23rd, 2020, and ships with the Linux 5.4 kernel and the GNOME 3.36 desktop.

Snap packages, Ubuntu’s secure application format, remain available in this release, and developers can use the Snapcraft tool to build and distribute them.

Download the Guide on How to Set Up a Data Analytics Stack
Learn how to build a self-service data analytics stack for your use case.

Simplify Integration Using Hevo’s No-code Data Pipeline

Hevo Data helps you directly transfer data from Kafka and 100+ data sources (including 40+ free sources) to Business Intelligence tools, Data Warehouses, or a destination of your choice in a completely hassle-free & automated manner. Hevo is fully managed and completely automates the process of not only loading data from your desired source but also enriching the data and transforming it into an analysis-ready form without having to write a single line of code. Its fault-tolerant architecture ensures that the data is handled in a secure, consistent manner with zero data loss.

Hevo takes care of all the data preprocessing needed to set up the integration and lets you focus on key business activities, drawing more powerful insights on how to generate more leads, retain customers, and take your business to new heights of profitability. It provides a consistent and reliable way to manage data in real time and always have analysis-ready data in your desired destination.

Get Started with Hevo for Free

Check out what makes Hevo amazing:

  • Real-Time Data Transfer: Hevo, with its strong integration with 100+ Sources (including 30+ Free Sources), allows you to transfer data quickly and efficiently. This ensures efficient utilization of bandwidth on both ends.
  • Data Transformation: It provides a simple interface to perfect, modify, and enrich the data you want to transfer. 
  • Secure: Hevo has a fault-tolerant architecture that ensures that the data is handled in a secure, consistent manner with zero data loss.
  • Tremendous Connector Availability: Hevo houses a large variety of connectors and lets you bring in data from numerous Marketing & SaaS applications, databases, etc. such as HubSpot, Marketo, MongoDB, Oracle, Salesforce, Redshift, etc. in an integrated and analysis-ready form.
  • Simplicity: Using Hevo is easy and intuitive, ensuring that your data is exported in just a few clicks. 
  • Completely Managed Platform: Hevo is fully managed. You need not invest time and effort to maintain or monitor the infrastructure involved in executing code.
  • Live Support: The Hevo team is available round the clock to extend exceptional support to its customers through chat, email, and support calls.
Sign up here for a 14-Day Free Trial!

How to Install Kafka on Ubuntu 20.04

Now that you have a basic grasp of both technologies, let’s try to understand the procedure to install Kafka on Ubuntu. Below are the steps you can follow to install Kafka on Ubuntu:

Step 1: Install Java and ZooKeeper

Kafka is written in Java and Scala, so it needs a Java runtime; the Kafka 2.6 release used in this guide requires Java 8 or later. In this step, you need to ensure Java is installed.

sudo apt-get update
sudo apt-get install default-jre
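
Before moving on, it can help to confirm which Java version was installed. The sketch below assumes that the first matching line printed by java -version has the form: openjdk version "11.0.11" followed by a date. The helper name java_major_version is hypothetical and is not part of Java or Kafka.

```shell
# Hypothetical helper: extract the major Java version from `java -version` output.
java_major_version() {
  # Pull out the quoted version string, e.g. 11.0.11 or 1.8.0_292.
  jv=$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
  case "$jv" in
    1.*) jv=${jv#1.} ;;  # legacy numbering: 1.8.0_292 means Java 8
  esac
  # Keep only the leading number (drop everything from the first ".", "_" or "-").
  printf '%s\n' "${jv%%[._-]*}"
}
```

A possible use is java_major_version "$(java -version 2>&1)". The 2>&1 is needed because java -version writes to standard error.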

ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. Kafka uses Zookeeper for maintaining the heartbeats of its nodes, maintaining configuration, and most importantly to elect leaders.

sudo apt-get install zookeeperd

Now check that ZooKeeper is alive and healthy:

telnet localhost 2181

At the telnet prompt, enter:

ruok

ruok stands for “are you okay”. If ZooKeeper is healthy, it replies with the following and then closes the telnet session:

imok
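
The same check can be scripted instead of typed into telnet. This is a minimal sketch, assuming nc (netcat) is installed and that ZooKeeper accepts the ruok command on the given port. The function names zk_reply_ok and zk_check are hypothetical.

```shell
# Succeeds only if the reply to "ruok" is exactly "imok".
zk_reply_ok() {
  [ "$1" = "imok" ]
}

# Hypothetical helper: send "ruok" to ZooKeeper and report the result.
# Assumes nc (netcat) is installed; -w 2 gives up after 2 seconds.
zk_check() {
  zk_host=${1:-localhost}
  zk_port=${2:-2181}
  zk_reply=$(printf 'ruok' | nc -w 2 "$zk_host" "$zk_port" 2>/dev/null)
  if zk_reply_ok "$zk_reply"; then
    echo "ZooKeeper at $zk_host:$zk_port answered imok"
  else
    echo "ZooKeeper at $zk_host:$zk_port did not answer imok" >&2
    return 1
  fi
}
```

Calling zk_check localhost 2181 returns a non-zero status when ZooKeeper is unreachable or unhealthy, which makes it usable in provisioning scripts.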

Step 2: Create a Service User for Kafka

Because Kafka is a network application, creating a dedicated non-root sudo user for it minimizes the risk if the machine is compromised.

$ sudo adduser kafka

Follow the prompts and set a password to create the kafka user. Next, add the user to the sudo group using the following command:

$ sudo adduser kafka sudo

Your user is now ready. Log in to the account using the following command:

$ su -l kafka

Step 3: Download Apache Kafka

Now, you need to download and extract Kafka binaries in your Kafka user’s home directory. You can create your directory using the following command:

$ mkdir ~/Downloads

Download the Kafka binaries using curl (older releases that are no longer hosted on downloads.apache.org are kept at archive.apache.org/dist/kafka/):

$ curl "https://downloads.apache.org/kafka/2.6.2/kafka_2.13-2.6.2.tgz" -o ~/Downloads/kafka.tgz
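
It is worth verifying the archive before extracting it. The sketch below assumes you have copied the expected SHA-512 value from the checksum file that Apache publishes alongside the release. The helper name verify_sha512 is hypothetical; sha512sum is part of GNU coreutils.

```shell
# Hypothetical helper: compare a file's SHA-512 digest with an expected value.
# The expected value may contain spaces, line breaks, or uppercase hex digits.
verify_sha512() {
  vs_file=$1
  # Normalize the expected value: strip whitespace, lowercase the hex digits.
  vs_expected=$(printf '%s' "$2" | tr -d ' \n\t' | tr 'A-F' 'a-f')
  vs_actual=$(sha512sum "$vs_file" | cut -d ' ' -f 1)
  [ -n "$vs_actual" ] && [ "$vs_actual" = "$vs_expected" ]
}
```

For example, verify_sha512 ~/Downloads/kafka.tgz "<expected value>" && echo "checksum OK", where the placeholder stands for the value you copied.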

Create a new directory called kafka and change into it. This will be the base directory of the Kafka installation.

$ mkdir ~/kafka && cd ~/kafka

Now simply extract the archive you have downloaded using the following command:

$ tar -xvzf ~/Downloads/kafka.tgz --strip 1

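The --strip 1 flag (an abbreviation of GNU tar’s --strip-components=1) ensures that the archive’s contents are extracted directly into ~/kafka/ rather than into a subdirectory.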

Step 4: Configuring Kafka Server

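A Kafka topic is the category, group, or feed name to which messages are published. Older Kafka releases did not allow topics to be deleted by default. Since Kafka 1.0, delete.topic.enable defaults to true, but setting it explicitly makes the behavior clear. To set it, you must edit the configuration file.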

The server.properties file specifies Kafka’s configuration options. Use nano or your favorite editor to open this file:

$ nano ~/kafka/config/server.properties

First, add a setting that allows Kafka topics to be deleted. Append the following to the bottom of the file:

delete.topic.enable = true

Next, change the directory where Kafka stores its logs by finding the log.dirs property and setting it as follows:

log.dirs=/home/kafka/logs

Save and close the file. The next step is to set up the systemd unit files.
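
If you prefer to script the two configuration changes rather than edit the file by hand, something like the following could work. It is a minimal sketch: set_property is a hypothetical helper, not a Kafka tool, and it assumes simple keys and values that do not contain the characters | or &.

```shell
# Hypothetical helper: set key=value in a Java-style properties file.
# Replaces an existing uncommented "key=..." line, or appends one if absent.
# Note: dots in the key are treated as regex wildcards, which is good enough here.
set_property() {
  sp_key=$1
  sp_value=$2
  sp_file=$3
  if grep -q "^${sp_key}[[:space:]]*=" "$sp_file"; then
    # "|" is the sed delimiter because values such as paths contain "/".
    sed "s|^${sp_key}[[:space:]]*=.*|${sp_key}=${sp_value}|" "$sp_file" > "$sp_file.tmp" &&
      cat "$sp_file.tmp" > "$sp_file" &&
      rm -f "$sp_file.tmp"
  else
    printf '%s=%s\n' "$sp_key" "$sp_value" >> "$sp_file"
  fi
}
```

With this in place, set_property log.dirs /home/kafka/logs ~/kafka/config/server.properties and set_property delete.topic.enable true ~/kafka/config/server.properties apply the same edits, and running them twice leaves the file unchanged.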

Step 5: Setting Up Kafka Systemd Unit Files

In this step, you need to create systemd unit files for the Kafka and ZooKeeper services. These let you start, stop, and restart the services with the systemctl command.

Create the systemd unit file for ZooKeeper with the command below:

$ sudo nano /etc/systemd/system/zookeeper.service

Next, you need to add the below content:

[Unit]
Requires=network.target remote-fs.target
After=network.target remote-fs.target

[Service]
Type=simple
User=kafka
ExecStart=/home/kafka/kafka/bin/zookeeper-server-start.sh /home/kafka/kafka/config/zookeeper.properties
ExecStop=/home/kafka/kafka/bin/zookeeper-server-stop.sh
Restart=on-abnormal

[Install]
WantedBy=multi-user.target

Save this file and then close it. Then you need to create a Kafka systemd unit file using the following command snippet:

$ sudo nano /etc/systemd/system/kafka.service

Now, you need to enter the following unit definition into the file:

[Unit]
Requires=zookeeper.service
After=zookeeper.service

[Service]
Type=simple
User=kafka
ExecStart=/bin/sh -c '/home/kafka/kafka/bin/kafka-server-start.sh /home/kafka/kafka/config/server.properties > /home/kafka/kafka/kafka.log 2>&1'
ExecStop=/home/kafka/kafka/bin/kafka-server-stop.sh
Restart=on-abnormal

[Install]
WantedBy=multi-user.target

This unit file depends on zookeeper.service, as specified in the [Unit] section. This ensures that ZooKeeper is started automatically when the Kafka service is launched.
The [Service] section specifies that systemd should use the kafka-server-start.sh and kafka-server-stop.sh shell scripts to start and stop the service. It also specifies that Kafka should be restarted if it exits abnormally.
After you’ve defined the units, use the following command to start Kafka:

$ sudo systemctl start kafka

Check the status of the Kafka unit to confirm that the server has started successfully:

$ sudo systemctl status kafka

Output:

kafka.service
     Loaded: loaded (/etc/systemd/system/kafka.service; disabled; vendor preset: enabled)
     Active: active (running) since Wed 2021-02-10 00:09:38 UTC; 1min 58s ago
   Main PID: 55828 (sh)
      Tasks: 67 (limit: 4683)
     Memory: 315.8M
     CGroup: /system.slice/kafka.service
             ├─55828 /bin/sh -c /home/kafka/kafka/bin/kafka-server-start.sh /home/kafka/kafka/config/server.properties > /home/kafka/kafka/kafka.log 2>&1
             └─55829 java -Xmx1G -Xms1G -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+ExplicitGCInvokesConcurrent -XX:MaxInlineLevel=15 -Djava.awt.headless=true -Xlog:gc*:file=>

Feb 10 00:09:38 cart-67461-1 systemd[1]: Started kafka.service.

On port 9092, you now have a Kafka server listening.
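
Kafka can take a few seconds to open its port after systemd reports the service as started. A script that depends on the broker could poll the port first. This sketch assumes nc (netcat) is installed; wait_for_port is a hypothetical helper name.

```shell
# Hypothetical helper: wait until host:port accepts TCP connections.
# Assumes nc (netcat) is installed; gives up after the given number of tries.
wait_for_port() {
  wp_host=$1
  wp_port=$2
  wp_tries=${3:-30}
  while [ "$wp_tries" -gt 0 ]; do
    # -z only probes the port, -w 1 limits each attempt to one second.
    if nc -z -w 1 "$wp_host" "$wp_port" 2>/dev/null; then
      return 0
    fi
    wp_tries=$((wp_tries - 1))
    sleep 1
  done
  return 1
}
```

For example, wait_for_port localhost 9092 30 && echo "Kafka is listening" waits up to roughly 30 attempts before giving up.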

The Kafka service is now running. However, if you reboot your server, Kafka will not restart automatically. To enable the Kafka service on server boot, run the following commands:

$ sudo systemctl enable zookeeper
$ sudo systemctl enable kafka

You have successfully done the setup and installation of the Kafka server.

Step 6: Testing installation

In this stage, you’ll put your Kafka setup to the test. To ensure that the Kafka server is functioning properly, you will publish and consume a “Hello World” message.

Publishing messages in Kafka requires:

  • A producer, which enables the publication of records and data to topics.
  • A consumer, which reads messages and data from topics.

To get started, make a new topic called TutorialTopic:

$ ~/kafka/bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic TutorialTopic

The kafka-console-producer.sh script lets you create a producer from the command line. It expects the Kafka server’s hostname and port, along with a topic name, as arguments.

Publish the string “Hello, World” to the TutorialTopic topic:

$ echo "Hello, World" | ~/kafka/bin/kafka-console-producer.sh --broker-list localhost:9092 --topic TutorialTopic > /dev/null

Next, create a Kafka consumer using the kafka-console-consumer.sh script. It expects the Kafka broker’s hostname and port, along with a topic name, as arguments.

The command below consumes messages from TutorialTopic. Note the use of the --from-beginning flag, which allows messages published before the consumer was started to be consumed:

$ ~/kafka/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic TutorialTopic --from-beginning

Hello, World will appear in your terminal if there are no configuration issues:

Hello, World

The script will keep running while it waits for further messages to be published. Open a new terminal window and log into your server to try this.
Start a producer in this new terminal to send out another message:

$ echo "Hello World from Sammy at DigitalOcean!" | ~/kafka/bin/kafka-console-producer.sh --broker-list localhost:9092 --topic TutorialTopic > /dev/null

This message will appear in the consumer’s output:

Hello, World
Hello World from Sammy at DigitalOcean!

To stop the consumer script, press CTRL+C once you’ve finished testing.
On Ubuntu 20.04, you’ve now installed and set up a Kafka server.
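
The manual round trip above can also be wrapped in a small script that exits on its own. This is a sketch under a few assumptions: Kafka lives in ~/kafka as in this guide, the broker listens on localhost:9092, and the consumer’s --timeout-ms option is used so that it stops once no new message arrives for ten seconds. The names KAFKA_BIN, has_line, and kafka_smoke_test are hypothetical.

```shell
# Location of the Kafka scripts; override KAFKA_BIN if Kafka lives elsewhere.
KAFKA_BIN=${KAFKA_BIN:-$HOME/kafka/bin}

# Succeeds if the text in $1 contains a line exactly equal to $2.
has_line() {
  printf '%s\n' "$1" | grep -Fxq -- "$2"
}

# Hypothetical helper: publish one message, read the topic back, and
# succeed only if the message shows up in the consumer output.
kafka_smoke_test() {
  st_topic=$1
  st_msg=$2
  echo "$st_msg" | "$KAFKA_BIN/kafka-console-producer.sh" \
    --broker-list localhost:9092 --topic "$st_topic" > /dev/null || return 1
  st_out=$("$KAFKA_BIN/kafka-console-consumer.sh" \
    --bootstrap-server localhost:9092 --topic "$st_topic" \
    --from-beginning --timeout-ms 10000 2>/dev/null)
  has_line "$st_out" "$st_msg"
}
```

Running kafka_smoke_test TutorialTopic "Hello, World" && echo "round trip OK" then gives a single pass or fail result instead of a consumer that has to be stopped with CTRL+C.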

In the next step, you’ll take a few quick measures to tighten the security of your Kafka server.

Step 7: Hardening Kafka Server

Now that the installation is complete, you can remove the kafka user’s admin privileges. Log out and back in as any other non-root sudo user before proceeding. Type exit if you’re still in the same shell session as when you started this tutorial.

Remove the Kafka user from the sudo group:

$ sudo deluser kafka sudo

Lock the Kafka user’s password with the passwd command to strengthen the security of your Kafka server even more. This ensures that no one may use this account to log into the server directly:

$ sudo passwd kafka -l

At this point, only root or a sudo user can log in as kafka, by entering the following command:

$ sudo su - kafka

If you want to unlock it in the future, use passwd with the -u option:

$ sudo passwd kafka -u
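
To double-check the hardening from a script, you could inspect the output of id -nG kafka and sudo passwd -S kafka. The sketch below only interprets those outputs. Both function names are hypothetical, and it assumes the passwd -S format in which the second field is L for a locked password.

```shell
# Succeeds if the space-separated group list in $1 contains group $2.
group_list_has() {
  for g in $1; do
    [ "$g" = "$2" ] && return 0
  done
  return 1
}

# Succeeds if a `passwd -S` status line reports a locked password ("L").
passwd_status_locked() {
  # Field 2 of `passwd -S` output is L (locked), NP (no password) or P (usable).
  [ "$(printf '%s\n' "$1" | awk '{print $2}')" = "L" ]
}
```

For example, group_list_has "$(id -nG kafka)" sudo && echo "kafka is still in the sudo group", and passwd_status_locked "$(sudo passwd -S kafka)" && echo "password is locked".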

You’ve now successfully restricted the admin capabilities of the kafka user. You can either start using Kafka right away or continue to the next optional step, which adds KafkaT to your system.

Step 8: Installing KafkaT (Optional)

Airbnb created a tool called KafkaT. It allows you to view information about your Kafka cluster and execute administrative activities directly from the command line. You will, however, need Ruby to use it because it is a Ruby gem. To build the other gems that KafkaT relies on, you’ll also need the build-essential package. Using apt, install them:

$ sudo apt install ruby ruby-dev build-essential

The gem command can now be used to install KafkaT:

$ sudo CFLAGS=-Wno-error=format-overflow gem install kafkat

The -Wno-error=format-overflow compiler flag is required to suppress warnings and errors from the ZooKeeper gem during the KafkaT installation process.

KafkaT uses the .kafkatcfg configuration file to determine the installation and log directories of your Kafka server. The file should also include an entry that points KafkaT to your ZooKeeper instance.

Create a new file named .kafkatcfg:

$ nano ~/.kafkatcfg

To specify the required information about your Kafka server and Zookeeper instance, add the following lines:

{
  "kafka_path": "~/kafka",
  "log_path": "/home/kafka/logs",
  "zk_path": "localhost:2181"
}

You are now ready to use KafkaT. For a start, here’s how you would use it to view details about all Kafka partitions:

$ kafkat partitions

You will see the following output:

[DEPRECATION] The trollop gem has been renamed to optimist and will no longer be supported. Please switch to optimist as soon as possible.
/var/lib/gems/2.7.0/gems/json-1.8.6/lib/json/common.rb:155: warning: Using the last argument as keyword parameters is deprecated
...
Topic                 Partition   Leader      Replicas        ISRs    
TutorialTopic         0             0         [0]             [0]
__consumer_offsets    0               0           [0]                           [0]
...
...

You will see TutorialTopic, as well as __consumer_offsets, an internal topic used by Kafka for storing client-related information. You can safely ignore lines starting with __consumer_offsets.

To learn more about KafkaT, refer to its GitHub repository.

Conclusion

This article gave you a brief introduction to Apache Kafka and Ubuntu 20.04 and walked you through the steps to install Kafka on Ubuntu. Extracting complex data from a diverse set of data sources such as Apache Kafka can be a challenging task, and this is where Hevo saves the day!

Visit our Website to Explore Hevo

Hevo Data offers a faster way to move data from 100+ data sources, such as SaaS applications, Apache Kafka, or databases, into your Data Warehouse to be visualized in a BI tool. Hevo is fully automated and hence does not require you to code.

Want to take Hevo for a spin? Sign Up for a 14-day free trial and experience the feature-rich Hevo suite first hand. You can also have a look at the unbeatable pricing that will help you choose the right plan for your business needs.

I hope this guide has helped you install Kafka on Ubuntu 20.04. Do let me know in the comments if you face any difficulty.
