Debezium uses Kafka to stream real-time database changes and help developers build data-driven applications. A Kafka cluster consists of one or more servers called brokers, and these brokers host topics that store the database changes as events. Such topics can be created through either the Kafka broker or Kafka Connect: the broker can auto-create topics with a single default configuration, while Kafka Connect (from Kafka 2.6 onward) can auto-create topics with customized configurations.
In this tutorial, you will learn about Debezium Kafka Auto Topic Creation, i.e., Debezium topic creation through Kafka Connect and the Kafka broker.
Prerequisites
- Fundamental understanding of Streaming Data.
What is Debezium?
Debezium is an Open-Source, Distributed Platform for tracking real-time changes in databases and generating events from them. Debezium uses the Change Data Capture (CDC) approach, a technique for replicating data between databases in real time. When a Debezium database connector starts, it tracks all the changes in the database and stores them in Kafka topics, and applications then individually consume the events generated from these changes.
What is Kafka Topic in Debezium?
- A Debezium deployment includes a Kafka cluster with one or more servers called Kafka Brokers. And these Kafka Brokers can hold one or more topics.
- Kafka Topics are the categories used to organize messages. These messages are the real-time data changes of databases tracked by the connectors; they are sent to Kafka topics as events. Kafka topics can be created in two ways: through the Kafka Broker or through Kafka Connect.
- For automatic topic creation, the Kafka broker uses the auto.create.topics.enable property. The topic.creation.enable property in Kafka Connect determines whether Kafka Connect is allowed to create topics.
- When automatic topic creation is enabled and a Debezium source connector emits a change data event record for a table that has no corresponding Kafka topic, the topic is created automatically at runtime. This is the Debezium Auto Topic.
Hevo Data is a No-code Data Pipeline that offers a fully managed solution to set up Data Integration for 150+ Data Sources (including 60+ Free sources) and will let you directly load data from sources like Apache Kafka to a Data Warehouse or the Destination of your choice. Let’s look at some of the salient features of Hevo:
- Risk management and security framework for cloud-based systems with SOC2 Compliance.
- Always up-to-date data with real-time data sync.
- Transform your data with custom Python scripts or using the drag-and-drop feature.
Don’t just take our word for it: try Hevo and experience why industry leaders like Whatfix say, “We’re extremely happy to have Hevo on our side.”
Sign up here for a 14-Day Free Trial!
Getting Started with Debezium Kafka Auto Topic Creation
1) Kafka Topic Creation
Topics created by the Kafka Broker share a single default configuration, whereas topics created by Kafka Connect can apply several different configurations. By default, the Kafka Broker configuration enables the broker to create topics at runtime.
- Suppose you are using a Kafka version older than 2.6.0 and you want to create topics with a specific configuration. In that case, you have to disable automatic topic creation at the broker and create the topics explicitly. You can do that with the property below.
auto.create.topics.enable = false
- To enable Debezium Kafka Auto Topic creation, you can use the following property.
topic.creation.enable = true
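- In a distributed Kafka Connect deployment, this property belongs in the worker configuration file. A minimal sketch, assuming a worker configured via connect-distributed.properties (note that topic.creation.enable already defaults to true on Kafka 2.6+):
# connect-distributed.properties (Kafka Connect worker configuration)
# Allow source connectors to create topics with custom settings at runtime
topic.creation.enable=true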
- Some internal Kafka Connect-related topics are already created when you start Kafka Connect. You can list them with the command below.
kafka-topics.sh --bootstrap-server $HOSTNAME:9092 --list
- It will show a list of topics like the one below.
connect_configs
connect_offsets
connect_statuses
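- The exact names of these internal topics come from the worker's storage settings. As an assumption based on the listed output (the stock connect-distributed.properties ships with different names), the worker here would be configured with something like:
# connect-distributed.properties (assumed worker settings matching the names above)
config.storage.topic=connect_configs
offset.storage.topic=connect_offsets
status.storage.topic=connect_statuses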
- When you want the Kafka Broker to create topics with the default configuration, set the property below.
auto.create.topics.enable = true
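- This property lives in the broker configuration. A minimal sketch, assuming the broker is configured via server.properties; the partition and replication values shown are the stock broker defaults that auto-created topics inherit:
# server.properties (Kafka broker configuration)
auto.create.topics.enable=true
# Auto-created topics inherit these broker-wide defaults
num.partitions=1
default.replication.factor=1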
- When you do not want to use Debezium Kafka Auto Topic creation in Kafka Connect, you can set the below property.
topic.creation.enable = false
2) Connector Configuration
Kafka Connect works with groups, each of which specifies a collection of topic configuration properties together with a list of regular expressions matching the topic names to which that configuration applies.
Kafka Connect has a default group that applies when no other group is created. In this tutorial, you will see the PostgreSQL connector configuration for Debezium Kafka Auto Topic creation.
The PostgreSQL connector configuration consists of the settings below.
{
  "name": "inventory-connector",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "tasks.max": 1,
    "database.hostname": "postgres",
    "database.port": 5432,
    "database.user": "postgres",
    "database.password": "postgres",
    "database.dbname": "postgres",
    "database.server.name": "dbserver1",
    "schema.include.list": "inventory"
  }
}
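You can register this configuration through the Kafka Connect REST API. A minimal sketch, assuming the Connect worker's REST interface is reachable on localhost:8083 and the JSON above is saved as inventory-connector.json (a hypothetical filename):
curl -i -X POST -H "Accept: application/json" -H "Content-Type: application/json" \
  --data @inventory-connector.json http://localhost:8083/connectors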
3) Default Configuration
- All topics that do not match any topic.creation group fall back to the default group configuration.
- Set the topic's replication factor to 3 and its partition count to 10 as the default. The topic configuration determines how much space is allotted to the topic and how its data is managed.
- Set the cleanup policy to compact to enable log compaction for the topic.
- Set the compression type with compression.type = "lz4". As a result, all messages on disk are compressed in LZ4 format.
- The default configuration for Kafka topic creation is set through the topic.creation.default.* connector properties, as shown below.
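The fragment below mirrors the default-group properties used in the full connector configuration in the next section:
"topic.creation.default.replication.factor": 3,
"topic.creation.default.partitions": 10,
"topic.creation.default.cleanup.policy": "compact",
"topic.creation.default.compression.type": "lz4"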
4) Group Configuration
- Suppose the inventory schema of the database contains tables whose names start with product. Changes from these tables are captured to topics named after the server, schema, and table, for example dbserver1.inventory.products.
- You have to register a group named productlog using the topic.creation.groups property.
- Define which topic names the above group applies to and specify the configuration below.
{
  "name": "inventory-connector",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "tasks.max": 1,
    "database.hostname": "postgres",
    "database.port": 5432,
    "database.user": "postgres",
    "database.password": "postgres",
    "database.dbname": "postgres",
    "database.server.name": "dbserver1",
    "schema.include.list": "inventory",
    "topic.creation.default.replication.factor": 3,
    "topic.creation.default.partitions": 10,
    "topic.creation.default.cleanup.policy": "compact",
    "topic.creation.default.compression.type": "lz4",
    "topic.creation.groups": "productlog",
    "topic.creation.productlog.include": "dbserver1\\.inventory\\.product.*",
    "topic.creation.productlog.replication.factor": 1,
    "topic.creation.productlog.partitions": 20,
    "topic.creation.productlog.cleanup.policy": "delete",
    "topic.creation.productlog.retention.ms": 7776000000,
    "topic.creation.productlog.compression.type": "producer"
  }
}
- In the configuration above, topic.creation.groups defines the group name, here productlog. The topic.creation.productlog.include property holds a list of regular expressions matched against topic names to decide where the productlog group configuration applies. Thus, the group matches all topics whose names start with dbserver1.inventory.product.
- Start the Debezium connector, then inspect the newly created topics with kafka-topics.sh. The output below shows how the Debezium Kafka Auto Topics were created and configured.
$ kafka-topics.sh --bootstrap-server $HOSTNAME:9092 --describe --topic dbserver1.inventory.products
Topic: dbserver1.inventory.products PartitionCount: 20 ReplicationFactor: 1
Configs: compression.type=producer,cleanup.policy=delete,retention.ms=7776000000,segment.bytes=1073741824
$ kafka-topics.sh --bootstrap-server $HOSTNAME:9092 --describe --topic dbserver1.inventory.orders
Topic: dbserver1.inventory.orders PartitionCount: 10 ReplicationFactor: 3
Configs: compression.type=lz4,cleanup.policy=compact,segment.bytes=1073741824,delete.retention.ms=2592000000
5) Adding or Removing Topics
- You can use the tools under the bin directory of the Kafka distribution; each tool prints its usage details when run with no arguments.
- You can create topics manually with the tool below, or let them be created automatically when data is first published to a new topic.
bin/kafka-topics.sh --zookeeper zk_host:port/chroot --create --topic my_topic_name
--partitions 20 --replication-factor 3 --config x=y
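- Since Kafka 2.2, kafka-topics.sh can also talk to the brokers directly instead of going through ZooKeeper; on the Kafka 2.6 setup used in this tutorial, the equivalent command would be:
bin/kafka-topics.sh --bootstrap-server $HOSTNAME:9092 --create --topic my_topic_name \
  --partitions 20 --replication-factor 3 --config x=y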
- The replication factor controls how many servers replicate each message that is written.
- The partition count controls how many logs the topic is divided into. Each partition must fit entirely on a single server, so if you have 20 partitions, no more than 20 servers can share the full dataset.
6) Modifying the Topics
- You can change the number of partitions of a topic or its configuration with the tool below.
bin/kafka-topics.sh --zookeeper zk_host:port/chroot --alter --topic my_topic_name
--partitions 40
- To add the configuration, use the below tool.
bin/kafka-topics.sh --zookeeper zk_host:port/chroot --alter --topic my_topic_name --config x=y
- Adding partitions does not move existing data, but it may disturb consumers that depend on semantic partitioning. If data is partitioned by hash(key) % number_of_partitions, adding partitions changes the mapping: a key hashing to 30, for example, maps to partition 10 with 20 partitions but to partition 30 with 40 partitions.
- To remove the configuration, use the below tool.
bin/kafka-topics.sh --zookeeper zk_host:port/chroot --alter --topic my_topic_name --delete-config x
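- Altering topic configs through kafka-topics.sh with --zookeeper is deprecated; on recent Kafka versions, kafka-configs.sh is the supported tool for both adding and removing topic configurations:
# Add or update a topic-level config
bin/kafka-configs.sh --bootstrap-server $HOSTNAME:9092 --entity-type topics \
  --entity-name my_topic_name --alter --add-config x=y
# Remove a topic-level config, reverting it to the broker default
bin/kafka-configs.sh --bootstrap-server $HOSTNAME:9092 --entity-type topics \
  --entity-name my_topic_name --alter --delete-config x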
- To delete the topic, use the below tool.
bin/kafka-topics.sh --zookeeper zk_host:port/chroot --delete --topic my_topic_name
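- Topic deletion takes effect only if the brokers allow it; the broker-side switch is delete.topic.enable, which has defaulted to true since Kafka 1.0:
delete.topic.enable = true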
Conclusion
In this tutorial, you have learned about Debezium Kafka Auto Topic creation and its two approaches. You have also seen the configuration properties of Kafka topics. Before Kafka version 2.6, custom setups required creating Kafka topics manually; this tutorial uses Kafka 2.6, which includes built-in topic creation for connectors. Other operations, such as adding, deleting, or modifying Kafka topics, are also explained in the tutorial.
As your business begins to grow, data is generated at an exponential rate across all of your company’s SaaS applications, Databases, and other sources. To meet the growing storage and computing needs of this data, you would be required to invest a portion of your engineering bandwidth to Integrate data from all sources, Clean & Transform it, and finally load it to a Cloud Data Warehouse for further Business Analytics. All of these challenges can be efficiently handled by a Cloud-Based ETL tool such as Hevo Data.
Want to take Hevo for a spin? Sign Up for a 14-day free trial and experience the feature-rich Hevo suite firsthand. You can also have a look at the unbeatable pricing that will help you choose the right plan for your business needs.
Share with us your experience of learning about Debezium Kafka Auto Topic Creation in the comments below!
Frequently Asked Questions
1. How to create a Kafka topic automatically?
Use auto.create.topics.enable=true in the Kafka broker configuration, or create topics programmatically using the AdminClient API.
2. Does Debezium use Kafka?
Yes. Debezium runs on Kafka Connect to perform change data capture and streams the database changes it captures to Kafka topics.
3. How to create a Kafka topic programmatically?
You can use the AdminClient API in languages like Java and Python to create Kafka topics dynamically within your applications.
Manjiri is a proficient technical writer and a data science enthusiast. She holds an M.Tech degree and leverages the knowledge acquired through that to write insightful content on AI, ML, and data engineering concepts. She enjoys breaking down the complex topics of data integration and other challenges in data engineering to help data professionals solve their everyday problems.