Understanding Kafka Debezium Event Sourcing: 7 Critical Steps

on Data Integration, Debezium, Kafka • January 31st, 2022 • Write for Hevo

Organizations use real-time data to streamline several business processes with robust applications. However, it is not straightforward to build a fault-tolerant application since a colossal amount of changes occur within databases. The slightest of undesired change can result in numerous failures. To mitigate these challenges, organizations use different techniques that can track modifications of data in databases and facilitate real-time information for dependent applications or systems to run effectively. Today, a combination of Debezium and Kafka is embraced by organizations to record changes in databases and provide information to subscribers (other applications). 

In this article, you will learn about Debezium, features of Debezium, and how to perform event sourcing using Debezium and Kafka. 

Table of Contents

Prerequisites

  • Understanding of streaming data.

What is Kafka?

Kafka Debezium: Kafka Logo
Image Source

Initially developed by LinkedIn, Kafka is a distributed event streaming platform that powers most data-driven applications. However, today Kafka is a part of the Apache Software Foundation and is one of Apache’s most popular open-source solutions. Written in Java and Scala, Kafka supports publish-subscribe messaging to support real-time streaming of data across several applications. Such extensive integration makes Kafka popular among developers to streamline data delivery not only for building data-driven applications but also to support real-time analytics. 

What is Debezium?

Kafka Debezium: Debezium Logo
Image Source

Debezium is an open-source distributed platform that is built on top of Apache Kafka. The platform is popular among database developers as it converts your existing databases into event streams, allowing applications to view and respond to changes in the databases at row level.

It’s always been difficult to keep track of databases and be notified when the data changes. Some databases provide frameworks or APIs for monitoring changes, however as there is no standard, the method of each database is unique and necessitates a great deal of specialized expertise and code. Thanks to Debezium, monitoring databases is no longer a challenge for developers. The distributed open-source change data capture platform supports monitoring a variety of database systems with ease.

To simplify the event sourcing, Debezium is building up a library of connectors that capture changes from a variety of DBMS and produce events with almost similar structures. This, as a result, makes it faster for applications to consume as well as respond to the events regardless of where the changes originated. 

Debezium provides Kafka Connect compatible connectors that monitor specific database management systems. The platform records the history of data changes in Kafka logs, from where applications consume them. Debezium captures row-level changes in your databases so that your real-time applications can see and respond to those changes. At present, Debezium includes support for monitoring various popular databases like PostgreSQL servers, SQL Server databases, MySQL database servers, MongoDB replica sets or sharded clusters, and others. 

Key Features of Debezium

  • Low Delay: Debezium produces change events with a very low delay while avoiding increased CPU usage of frequent polling.
  • Data Changes: Debezium makes sure that all the data changes are captured and requires no changes to your data model.
  • Capture Deletes and Old Record State: Debezium can capture deletes as well as Old Record State and further metadata such as Transaction Id and causing query, depending on the capabilities and configuration of the database.
  • Change Data Capture Feature: The change data capture feature of Debezium is amended with a range of related capabilities and options, such as filters, masking, monitoring, message transformations, and snapshots. 
  • Fault-tolerant: Debezium is architected to be tolerant of faults and failures. The platform distributes the monitoring processes across multiple machines so that, if anything goes wrong, the connectors can be restarted.

Uses of Debezium

One of the primary uses of this platform is to enable applications to respond almost immediately whenever there is a change of data in databases. Debezium can be used for various other cases. Some of them are as follows:

  • Cache Invalidation: Debezium automatically invalidates entries in a cache as soon as the records for entries change or are removed.
  • Monolithic Applications: Mostly, applications update a database and carry out other associated work after the changes are committed. These include updating a cache, updating search indexes, and more. With the help of Debezium, these activities can be performed in separate threads or separate processes when the data is committed in the original database.
  • Data Integration: Keeping numerous systems synced might be difficult, but with Debezium and simple event processing logic, you can build ETL solutions effectively.
  • Sharing Databases: When more than one application shares a single database, it is frequently difficult for one application to become aware of changes made by another. However, with Debezium, each application can easily monitor the database and react to the changes.

Simplify Kafka ETL with Hevo’s No-code Data Pipeline

A fully managed No-code Data Pipeline platform like Hevo helps you integrate data from 100+ data sources (including 40+ Free Data Sources) like Kafka to a destination of your choice in real-time in an effortless manner. Hevo with its minimal learning curve can be set up in just a few minutes allowing the users to load data without having to compromise performance. Its strong integration with umpteenth sources provides users with the flexibility to bring in data of different kinds, in a smooth fashion without having to code a single line. 

GET STARTED WITH HEVO FOR FREE

Check Out Some of the Cool Features of Hevo:

  • Completely Automated: The Hevo platform can be set up in just a few minutes and requires minimal maintenance.
  • Real-Time Data Transfer: Hevo provides real-time data migration, so you can have analysis-ready data always.
  • Transformations: Hevo provides preload transformations through Python code. It also allows you to run transformation code for each event in the Data Pipelines you set up. You need to edit the event object’s properties received in the transform method as a parameter to carry out the transformation. Hevo also offers drag and drop transformations like Date and Control Functions, JSON, and Event Manipulation to name a few. These can be configured and tested before putting them to use.
  • Connectors: Hevo supports 100+ Integrations from sources like Kafka to SaaS platforms, files, databases, analytics, and BI tools. It supports various destinations including Amazon Redshift, Firebolt, Snowflake Data Warehouses; Databricks, Amazon S3 Data Lakes, MySQL, SQL Server, TokuDB, DynamoDB, PostgreSQL databases to name a few.  
  • 100% Complete & Accurate Data Transfer: Hevo’s robust infrastructure ensures reliable data transfer with zero data loss.
  • Scalable Infrastructure: Hevo has in-built integrations for 100+ sources, that can help you scale your data infrastructure as required.
  • 24/7 Live Support: The Hevo team is available round the clock to extend exceptional support to you through chat, email, and support calls.
  • Schema Management: Hevo takes away the tedious task of schema management & automatically detects the schema of incoming data and maps it to the destination schema.
  • Live Monitoring: Hevo allows you to monitor the data flow so you can check where your data is at a particular point in time.

Simplify your Data Analysis with Hevo today! 

SIGN UP HERE FOR A 14-DAY FREE TRIAL!

What is Event Sourcing?

Event Sourcing is a method for software to keep track of its state as a log of domain events. The basic principle behind event sourcing is to ensure that every change in an application’s state is captured in an event object. Unlike CDC (Change Data Capture), which relies on the transaction log, event sourcing focus on journal events.

The implementations of Event Sourcing usually have the following characteristics:

  • Domain events, which are generated by the application’s business logic, will introduce additional states for applications.
  • The state of applications is recorded with journals, which are append-only event logs.
  • Journal is what differentiates event sourcing from CDC. It is considered the source of truth for applications and is replayable to rebuild the state of the application.
  • In general, Journal groups ‘domain events’ by an ID to capture the state of an object. 
Kafka Debezium: Event Sourcing
Image Source

Why use Event Sourcing?

As organizations are building microservices and distributing data across multiple data stores, data can be lost, out of sync, or even corrupted in a matter of seconds, posing a serious threat to mission-critical systems. Event Sourcing mitigates such issues by providing a representation of past and current application states as a series of events. Some of the other features that Event Sourcing provides include performance, simplification, integration with other subsystems, production troubleshooting, fixing errors, testing, flexibility, and more. 

Kafka Debezium Event Sourcing Setup

Steps for Kafka Debezium Event Sourcing will be discussed in this section. To leverage Debezium for event sourcing, you need three separate services: ZooKeeper, Kafka, and the Debezium connector. 

Note: Each service will be used with a different container.

Kafka Debezium Event Sourcing: Start Zookeeper

  • Step 1: The below command runs a new container using version 1.8 of the debezium/zookeeper image:
Kafka Debezium: Event Sourcing Start Zookeeper Step 1
Image Source
  • Step 2: On executing the above command, the zookeeper will start listening on port 2181.

Kafka Debezium Event Sourcing: Start Kafka

  • Step 1: Now, open a new terminal and run the following command to start Kafka in a new container.
Kafka Debezium: Event Sourcing Start Kafka Step 1
Image Source

Kafka Debezium Event Sourcing: Start a MySQL Database

  • Step 1: After starting zookeeper and Kafka, we need a database server that can be used by Debezium to capture changes. Start a new terminal and run the following command for starting MySQL database server.
Kafka Debezium: Event Sourcing Start a MySQL Database Step 1
Image Source

Kafka Debezium Event Sourcing: Start a MySQL Command-Line Client

  • Step 1: To access sample data, run the below command after starting the MySQL server. This runs a new container. 
Kafka Debezium: Event Sourcing Start a MySQL Command-Line Client Step 1
Image Source
  • Step 2: Then, type the command – use inventory – to switch to the inventory database. From here, you can list the tables and access the data.

Kafka Debezium Event Sourcing: Start Kafka Connect

  • Step 1: Start a new terminal and run the below command to start Kafka connect.
Kafka Debezium: Event Sourcing Start Kafka Connect Step 1
Image Source

Kafka Debezium Event Sourcing: Deploying the MySQL Connector

  • Step 1: After starting all the above services, you can now deploy the Debezium MySQL connector and start monitoring the inventory database. 
  • Step 2: For deploying the MySQL connector, you should register the MySQL connector to monitor the inventory database. You can do this using the curl command.
  • Step 3: The below command uses the localhost to connect to the Docker host.
Kafka Debezium: Event Sourcing Deploying the MySQL Connector Step 3
Image Source

Note: You might need to escape the double-quotes.

  • Step 4: Now, verify the connector using the below command.
Kafka Debezium: Event Sourcing Deploying the MySQL Connector Step 4
Image Source
  • Step 5: Review the connector tasks with the below command.
Kafka Debezium: Event Sourcing Deploying the MySQL Connector Step 5
Image Source

Kafka Debezium Event Sourcing: View Change Events

  • Step 1: On deploying the Debezium MySQL connector, you can start monitoring the inventory database for data change events. All the events are written to the topics with the dbserver-1 prefix. 

Conclusion

In this article, you have learned about Debezium, Event Sourcing, and the steps to perform event sourcing with Debezium and Kafka. Debezium is a distributed platform that turns your existing databases into event streams. This helps your applications see as well as respond immediately to each row-level change in the databases. In the above tutorial, we have used the MySQL connector. You can further try running the tutorial with Debezium connectors for MongoDB, Postgres, and Oracle.

Extracting complex data from a diverse set of data sources to carry out an insightful analysis can be challenging, and this is where Hevo saves the day! Hevo offers a faster way to move data from 100+ Data Sources including Databases or SaaS applications like Kafka into your Data Warehouse to be visualized in a BI tool. Hevo is fully automated and hence does not require you to code.

VISIT OUR WEBSITE TO EXPLORE HEVO

Want to take Hevo for a spin?SIGN UP and experience the feature-rich Hevo suite first hand. You can also have a look at the unbeatable pricing that will help you choose the right plan for your business needs.

No-code Data Pipeline for Kafka