Debezium Kafka Auto Topic Creation Simplified: A Comprehensive Guide 101

By: Manjiri Gaikwad | Published: February 10, 2022


Debezium uses Kafka to handle real-time changes in databases, helping developers build data-driven applications. A Kafka cluster consists of one or more servers called Brokers, and these brokers hold topics that store the database changes as events. Such topics can be created through either the Kafka Broker or Kafka Connect: the Kafka Broker can create topics automatically with its default settings, while Kafka Connect (from version 2.6.0 onward) can create them automatically with custom configurations.

In this tutorial, you will learn about Debezium Kafka Auto Topic Creation, i.e., Debezium topic creation through Kafka Connect and the Kafka Broker.


Prerequisites

  • Fundamental understanding of Streaming Data.

What is Debezium?


Debezium is an Open-Source, Distributed Platform for tracking real-time changes in databases and generating events from them. Debezium uses the Change Data Capture (CDC) approach, a technique for replicating data between databases in real-time. When a Debezium database connector starts, it tracks all the changes in its database and stores them in Kafka topics; the events generated from these changes are then consumed individually by applications.

What is the need for Debezium?

Since a database is significant for any application, keeping it safe and secure is essential. Earlier, database administrators used to record changes in the databases' source files. With Debezium, however, you can track data changes in real-time and store them in different locations. Databases connect to Debezium through its database connectors, such as the MongoDB Connector, MySQL Connector, SQL Server Connector, and PostgreSQL Connector.

What is a Kafka Topic in Debezium?

Debezium relies on Kafka clusters of one or more servers called Kafka Brokers, and each broker can hold one or more topics. Kafka Topics are the segments used to categorize messages. These messages are the real-time data changes of databases tracked by the connectors; they are sent to Kafka topics as change events. Kafka topics can be created in two ways: through the Kafka Broker or through Kafka Connect.

For automatic topic creation, the Kafka Broker uses the auto.create.topics.enable property. The topic.creation.enable property in Kafka Connect determines whether Kafka Connect is allowed to create topics. When automatic topic creation is enabled and a Debezium source connector emits a change data event record for a table that has no Kafka topic yet, the topic is created at runtime.

Simplify Kafka ETL and Data Integration using Hevo’s No-code Data Pipeline

Hevo Data is a No-code Data Pipeline that offers a fully managed solution to set up Data Integration for 100+ Data Sources (including 40+ Free sources) and will let you directly load data from sources like Apache Kafka to a Data Warehouse or the Destination of your choice. It will automate your data flow in minutes without writing any line of code. Its fault-tolerant architecture makes sure that your data is secure and consistent. Hevo provides you with a truly efficient and fully automated solution to manage data in real-time and always have analysis-ready data. 

Get Started with Hevo for Free

Let’s look at some of the salient features of Hevo:

  • Fully Managed: It requires no management and maintenance as Hevo is a fully automated platform.
  • Data Transformation: It provides a simple interface to perfect, modify, and enrich the data you want to transfer. 
  • Real-Time: Hevo offers real-time data migration. So, your data is always ready for analysis.
  • Schema Management: Hevo can automatically detect the schema of the incoming data and map it to the destination schema.
  • Connectors: Hevo supports 100+ Integrations with SaaS platforms, FTP/SFTP, Files, Databases, BI tools, and Native REST API & Webhooks Connectors. It supports various destinations, including Google BigQuery, Amazon Redshift, Snowflake, and Firebolt Data Warehouses; Amazon S3 Data Lakes; Databricks; and MySQL, SQL Server, TokuDB, MongoDB, and PostgreSQL Databases, to name a few.
  • Secure: Hevo has a fault-tolerant architecture that ensures that the data is handled in a secure, consistent manner with zero data loss.
  • Hevo Is Built To Scale: As the number of sources and the volume of your data grows, Hevo scales horizontally, handling millions of records per minute with very little latency.
  • Live Monitoring: Advanced monitoring gives you a one-stop view to watch all the activities that occur within Data Pipelines.
  • Live Support: The Hevo team is available round the clock to extend exceptional support to its customers through chat, email, and support calls.
Sign up here for a 14-Day Free Trial!

Getting Started with Debezium Kafka Auto Topic Creation

A) Kafka Topic Creation

Topics created by the Kafka Broker share only a single default configuration, whereas topics created by Kafka Connect can apply several configurations at creation time. By default, the Kafka Broker configuration allows the broker to create topics at runtime.

Suppose you are using a Kafka version older than 2.6.0 and want to create topics with some specific configuration. In that case, you have to disable Debezium Kafka Auto Topic creation at the broker and create the topics explicitly. You can do that with the below property.

auto.create.topics.enable = false

To enable Debezium Kafka Auto Topic creation in Kafka Connect, you can use the following property.

topic.creation.enable = true

Some internal Kafka Connect topics are already created when you start Kafka Connect. You can list them using the below command.

kafka-topics.sh --bootstrap-server $HOSTNAME:9092 --list

It will show the below list of topics.

connect_configs
connect_offsets
connect_statuses

When you want the Kafka Broker to create topics, you need to set the below property; topics will then be created with the default configuration.

auto.create.topics.enable = true

You have to add the above property to the broker configuration, as shown below.

[Image: Broker Configuration]
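
The screenshot is not reproduced here, so below is a minimal sketch of what the relevant part of a broker's server.properties file could look like. Only the auto.create.topics.enable line comes from this tutorial; the other values are common illustrative defaults, not settings from the original image.

# server.properties (sketch, illustrative values)
broker.id=0
listeners=PLAINTEXT://:9092
log.dirs=/tmp/kafka-logs
zookeeper.connect=localhost:2181

# Allow the broker to create topics automatically with its default settings
auto.create.topics.enable=true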

When you do not want to use Debezium Kafka Auto Topic creation in Kafka Connect, you can set the below property.

topic.creation.enable = false

B) Connector Configuration

Kafka Connect works with groups, each of which specifies a collection of topic configuration properties together with a list of regular expressions matching the names of the topics to which that configuration applies.

Kafka Connect has a default group that applies when no other group matches. In this tutorial, you will see the PostgreSQL configuration for Debezium Kafka Auto Topic creation.

The PostgreSQL configuration consists of the below set of properties.

[Image: Connector Configuration]
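
The configuration screenshot is not reproduced here. As a sketch, a Debezium PostgreSQL connector is typically registered with the Kafka Connect REST API using JSON like the following; the connector name, hostname, credentials, and database name are illustrative assumptions, not values from the original image.

{
  "name": "inventory-connector",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "localhost",
    "database.port": "5432",
    "database.user": "postgres",
    "database.password": "postgres",
    "database.dbname": "postgres",
    "database.server.name": "dbserver1"
  }
}

The topic.creation.* properties shown in the next two sections are added to the "config" object of such a request.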

C) Default Configuration

All topics that do not match any of the topic.creation groups will use the default group configuration.

Set the topic's replication factor to 3 and its number of partitions to 10 as the default. A topic's configuration determines how much space the topic is allowed and how its data is managed; the process of expiring data from a topic is called cleanup.

Since messages stored in Kafka are in JSON format by default, all the data is in string form. This string format produces many duplicate entries in a Kafka topic, so message compression is carried out on the topic to optimize space usage.

Set the cleanup policy to compact to enable log compaction on the topic.

Set the compression type with compression.type = "lz4". As a result, all the messages on disk will be compressed in LZ4 format.

The default configuration for Kafka topic creation is set through the following properties.

[Image: Default Configuration]
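
The screenshot of the default group is not reproduced here. A sketch of the corresponding entries, as they would appear in the connector configuration, follows; the property names are standard Kafka Connect topic-creation settings, and the values follow the text above.

"topic.creation.default.replication.factor": "3",
"topic.creation.default.partitions": "10",
"topic.creation.default.cleanup.policy": "compact",
"topic.creation.default.compression.type": "lz4"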

D) Group Configuration

Suppose the inventory schema of your database contains a table named products. Debezium captures changes from this table to a topic whose name combines the logical server name, schema, and table name: dbserver1.inventory.products.

All messages for table names starting with products should go to that topic and be stored with a retention time of 3 months, or 90 days. Retention time is the amount of time published messages remain available for consumption; after that, they are discarded. The cleanup policy is set to delete, meaning the messages are discarded once they reach their retention time.

You have to register a group named productlog using the topic.creation.groups property.

Define the topic names for the above group and specify the below configuration for it.

[Image: Group Configuration]
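
The screenshot is not reproduced here. A sketch of the corresponding connector entries follows; the retention value is derived from the 90-day figure above (90 days = 7,776,000,000 ms), and the include pattern is an illustrative regular expression matching the topic name from this tutorial.

"topic.creation.groups": "productlog",
"topic.creation.productlog.include": "dbserver1\\.inventory\\.products.*",
"topic.creation.productlog.retention.ms": "7776000000",
"topic.creation.productlog.cleanup.policy": "delete"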

From the above, topic.creation.groups defines the group name, here productlog. The topic.creation.productlog.include property holds a list of regular expressions matching the topic names to which the productlog group configuration should be applied. Thus, the group will match all topics whose names start with dbserver1.inventory.products.

Start the Debezium connector, then inspect the newly created topic, for example:

kafka-topics.sh --bootstrap-server $HOSTNAME:9092 --describe --topic dbserver1.inventory.products

It will show how the Debezium Kafka Auto Topic is created and defined, as in the following output.

[Image: Output]
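
The output screenshot is not reproduced here. For such a topic, kafka-topics.sh --describe prints output of the following general shape; the partition, broker, and config values below are illustrative, not taken from the original image.

Topic: dbserver1.inventory.products  PartitionCount: 10  ReplicationFactor: 3  Configs: cleanup.policy=delete,retention.ms=7776000000
  Topic: dbserver1.inventory.products  Partition: 0  Leader: 1  Replicas: 1,2,3  Isr: 1,2,3
  Topic: dbserver1.inventory.products  Partition: 1  Leader: 2  Replicas: 2,3,1  Isr: 2,3,1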

E) Adding or Removing Topics

You can use the tools under the bin directory of the Kafka distribution; each tool prints its usage details when run with no arguments.

You can either add topics manually or have them created automatically when data is first published to a non-existent topic. To create a topic manually, use the below tool.

bin/kafka-topics.sh --zookeeper zk_host:port/chroot --create --topic my_topic_name 
--partitions 20 --replication-factor 3 --config x=y

The replication factor controls how many servers replicate each message that is written. The partition count controls how many logs the topic will be divided into. Each partition must fit entirely on a single server, so if you have 20 partitions, no more than 20 servers (not counting replicas) will handle the full dataset. The partition count also caps the maximum parallelism of the topic's consumers.

F) Modifying the Topics

You can change the partition count of a topic, or its configuration, with the below tool.

bin/kafka-topics.sh --zookeeper zk_host:port/chroot --alter --topic my_topic_name 
--partitions 40

To add the configuration, use the below tool.

bin/kafka-topics.sh --zookeeper zk_host:port/chroot --alter --topic my_topic_name --config x=y

Adding partitions does not change the partitioning of existing data, but it may disturb consumers that rely on a key's partition assignment: if data is partitioned by hash(key) % number_of_partitions, keys may be shuffled to different partitions when partitions are added.

To remove the configuration, use the below tool.


bin/kafka-topics.sh --zookeeper zk_host:port/chroot --alter --topic my_topic_name --delete-config x

To delete the topic, use the below tool. Note that deletion takes effect only if the broker setting delete.topic.enable is true (the default in recent Kafka versions).

bin/kafka-topics.sh --zookeeper zk_host:port/chroot --delete --topic my_topic_name

Conclusion

In this tutorial, you have learned about Debezium Kafka Auto Topic creation and its two approaches. You have also seen the configuration properties of Kafka topics. Before Kafka version 2.6, custom setups required creating Kafka topics manually; this tutorial uses Kafka 2.6, which has built-in topic creation for connectors. Other operations, such as adding, modifying, or deleting Kafka topics, were also explained in the tutorial.

As your business begins to grow, data is generated at an exponential rate across all of your company's SaaS applications, Databases, and other sources. To meet these growing storage and computing needs, you would need to invest a portion of your engineering bandwidth to Integrate data from all sources, Clean & Transform it, and finally load it to a Cloud Data Warehouse for further Business Analytics. All of these challenges can be efficiently handled by a Cloud-Based ETL tool such as Hevo Data.

Visit our Website to Explore Hevo

Hevo Data, a No-code Data Pipeline, provides you with a consistent and reliable solution to manage data transfer between a variety of sources like Apache Kafka and a wide variety of Desired Destinations, with a few clicks. Hevo Data, with its strong integration with 100+ sources (including 40+ free sources), allows you to not only export data from your desired data sources & load it to the destination of your choice but also transform & enrich your data to make it analysis-ready, so that you can focus on your key business needs and perform insightful analysis using BI tools.

Want to take Hevo for a spin? Sign Up for a 14-day free trial and experience the feature-rich Hevo suite first hand. You can also have a look at the unbeatable pricing that will help you choose the right plan for your business needs.

Share with us your experience of learning about Debezium Kafka Auto Topic Creation in the comments below!

Manjiri Gaikwad
Freelance Technical Content Writer, Hevo Data

Manjiri loves data science and produces insightful content on AI, ML, and data science. She applies her flair for writing to simplify the complexities of data integration and analysis, solving problems faced by data professionals and businesses in the data industry.
