Debezium Logging 101: Critical Concepts and Configurations

on Data Integration, Data Streaming, Debezium • February 17th, 2022

Logging in databases refers to recording database activity, down to every minor update, as messages. These log messages or log files are essential for recovering after a system failure. Debezium is an open-source distributed platform used to keep track of real-time changes in databases. It has a separate connector for each database, such as PostgreSQL, MySQL, SQL Server, Oracle, etc. The logging process starts when one such connector is connected to a database. While connected, the connector produces valuable information about its activity, and this information is recorded as log messages.

Debezium allows users to change the configuration of the logging system and loggers to generate more log messages, which helps diagnose connector failures.

Prerequisites

  • Basics of Debezium connectors.

What is Debezium?

Debezium is an open-source event streaming platform that tracks real-time changes in databases. It provides connectors for databases such as MySQL, SQL Server, Oracle, PostgreSQL, etc. When a Debezium connector is connected to a database, it captures all changes to that database and sends them to Kafka topics. Different applications can then read these changes for further processing.

Debezium follows the Change Data Capture (CDC) approach, which is used to replicate data between databases in real-time. Other ways to approach CDC are Postgres audit triggers, Postgres logical decoding, and timestamp columns. In the audit trigger-based method, triggers are created in the database to capture events related to insert, update, and delete operations.

The disadvantage of this method is that it affects the performance of the database. The Postgres logical decoding method instead uses the write-ahead log to maintain a log of the activities occurring in the database. The write-ahead log is the internal log that describes database changes at the storage level. This method, however, increases the complexity of the database setup. Finally, the timestamp column method requires querying the table and monitoring the changes accordingly.

This method requires the user's time and effort to create the queries and track the changes. Debezium is therefore used as an alternative to all the above approaches: it is a distributed platform and is fast, so applications can respond to data changes quickly.

What is Debezium Logging?

All databases keep logs that record database changes. In case of system failures, logs are needed to restore and recover the system. Logging refers to the process of keeping logs. Similarly, Debezium has extensive logging built into its connectors.

Users can change the logging configuration in Debezium to control which log statements appear in the logs and where they are sent. By default, connectors produce relatively few log messages while they are connected to the source databases. These logs are adequate when the connector operates normally but might not be sufficient when a connector stops or misbehaves. In such scenarios, you can change the logging levels to produce more logs.

Simplify ETL and Data Streaming with Hevo’s No-code Data Pipeline

Hevo Data, a No-code Data Pipeline, helps to load data from any data source such as Databases, SaaS applications, Cloud Storage, SDKs, and Streaming Services and simplifies the ETL process. It supports 100+ data sources (including 30+ free data sources) and is a 3-step process: select the data source, provide valid credentials, and choose the destination. Hevo not only loads the data onto the desired Data Warehouse/destination but also enriches the data and transforms it into an analysis-ready form without having to write a single line of code.

Its completely automated pipeline delivers data in real-time without any loss from source to destination. Its fault-tolerant and scalable architecture ensures that the data is handled in a secure, consistent manner with zero data loss and supports different forms of data. The solutions provided are consistent and work with different BI tools as well.

Check out why Hevo is the Best:

  • Secure: Hevo has a fault-tolerant architecture that ensures that the data is handled in a secure, consistent manner with zero data loss.
  • Transformations: Hevo provides preload transformations through Python code. It also allows you to run transformation code for each event in the Data Pipelines you set up. You need to edit the event object’s properties received in the transform method as a parameter to carry out the transformation. Hevo also offers drag and drop transformations like Date and Control Functions, JSON, and Event Manipulation to name a few. These can be configured and tested before putting them to use.
  • Schema Management: Hevo takes away the tedious task of schema management & automatically detects the schema of incoming data and maps it to the destination schema.
  • Minimal Learning: Hevo, with its simple and interactive UI, is extremely simple for new customers to work on and perform operations.
  • Hevo Is Built To Scale: As the number of sources and the volume of your data grows, Hevo scales horizontally, handling millions of records per minute with very little latency.
  • Incremental Data Load: Hevo allows the transfer of data that has been modified in real-time. This ensures efficient utilization of bandwidth on both ends.
  • Live Support: The Hevo team is available round the clock to extend exceptional support to its customers through chat, E-Mail, and support calls.
  • Live Monitoring: Hevo allows you to monitor the data flow and check where your data is at a particular point in time.

Understanding Debezium Logging Concepts

Here are the key Debezium Logging Concepts to keep in mind:

Debezium Logging Concepts: Loggers

Applications send log messages to specific loggers, which are arranged in hierarchies. The root logger resides at the top of the hierarchy and defines the default configuration for all the loggers beneath it. For example, the io.debezium logger is the parent of the io.debezium.connector logger.

Debezium Logging Concepts: Log Levels

Every log message is produced at a specific log level. The levels are as follows.

  • ERROR: It specifies errors, exceptions, or other issues.
  • WARN: It specifies potential problems.
  • INFO: It specifies the status and low volume information.
  • DEBUG: It specifies detailed activity, which helps detect unexpected behavior or failure.
  • TRACE: It specifies very detailed and high-volume activity.

Debezium Logging Concepts: Appenders

An appender is a destination to which log messages are written; it also controls the format of those messages. To configure logging, you specify the desired level for each logger and the appender to which its messages should be written. Since the loggers are hierarchical, the configuration of the root logger serves as a default for all the loggers below it.
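As a rough illustration of how a logger level and an appender fit together, here is a minimal sketch in log4j 1.x properties syntax. The appender name file and the path /tmp/connect-debezium.log are hypothetical and chosen only for illustration; they are not Debezium or Kafka Connect defaults.

```properties
# Root logger: default level WARN, written to the hypothetical "file" appender.
log4j.rootLogger=WARN, file

# Appender: the destination (a file) and the message format (a pattern layout).
# The file path below is an assumption for illustration only.
log4j.appender.file=org.apache.log4j.FileAppender
log4j.appender.file.File=/tmp/connect-debezium.log
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{ISO8601} %-5p %m (%c)%n

# Child logger: overrides only the level and inherits the root logger's appender.
log4j.logger.io.debezium=INFO
```

Because no appender is named for io.debezium, its messages go to the root logger's appender, while every other logger keeps the WARN default.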

Methods to Modify Debezium Logging Configuration

When Debezium connectors run in a Kafka Connect process, Kafka Connect's log4j configuration file supplies the default logger configuration. By default, this file contains the following configuration.

Debezium Logging: Log4j Configuration

In the configuration above, the root logger defines the default configuration. It is set to the INFO level, so it includes INFO, WARN, and ERROR messages, and it writes them to the stdout appender.

The stdout appender writes the log messages to the console and uses a pattern layout to format them.
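A default configuration of this kind might look like the following minimal sketch, assuming log4j 1.x properties syntax. The exact conversion pattern is an assumption and may differ from the file shipped with your Kafka Connect distribution.

```properties
# Root logger: INFO and above (INFO, WARN, ERROR) go to the stdout appender.
log4j.rootLogger=INFO, stdout

# stdout appender: writes to the console using a pattern layout.
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
# Assumed pattern: timestamp, level, message, then the logger name.
log4j.appender.stdout.layout.ConversionPattern=%d{ISO8601} %-5p  %m (%c)%n
```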

You can change the logging configuration of the Debezium connectors in the following ways:

Debezium Logging Configuration Modification: Change the Logging Level

The default logging level provides sufficient information to detect whether the connector is healthy or not. If the connector is not healthy, you can change the logging level to detect the issue.

Debezium connectors send their log messages to loggers whose names match the fully qualified name of the Java class generating the log message. Debezium organizes its code into packages of related classes. Therefore, you can control all the log messages of one class or of all the classes under the same package.

The steps for changing the logging levels are as follows.

  • Step 1: Open the log4j.properties file.
  • Step 2: Configure the logger for the connectors. In this tutorial, you use a MySQL database connector.

The log4j.properties file consists of the following configuration.

Debezium Logging: Log4j Properties File

In the configuration above, the logger named io.debezium.connector.mysql is set to the DEBUG level and sends DEBUG, INFO, WARN, and ERROR messages to the stdout appender.

The second logger, named io.debezium.relational.history, covers the database history classes and likewise sends DEBUG, INFO, WARN, and ERROR messages to the stdout appender.

The third and fourth lines turn off additivity for these loggers, meaning their log messages are not also sent to the appenders of the parent loggers.
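The logger settings described here can be sketched as follows, assuming log4j 1.x properties syntax and a stdout appender that is already defined elsewhere in the file. The two logger entries and the two additivity entries are the four settings discussed in the text.

```properties
# Raise the MySQL connector classes and the database history classes to DEBUG,
# then turn off additivity so messages are not also passed to parent appenders.
log4j.logger.io.debezium.connector.mysql=DEBUG, stdout
log4j.logger.io.debezium.relational.history=DEBUG, stdout
log4j.additivity.io.debezium.connector.mysql=false
log4j.additivity.io.debezium.relational.history=false
```

Because stdout is attached both to the root logger and to these loggers, leaving additivity on would write each of their messages to the console twice.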

  • Step 3: You can change the logging level of the specific subset of classes if necessary.

Raising the log level for the whole connector greatly increases the verbosity of the log, which can make it hard to understand what is happening. In such cases, you can change the logging level only for the particular subset of classes related to the issue you are troubleshooting.

You can use the below steps to change the level of logging.

  • Step 1: You can set the logging level to DEBUG or TRACE.
  • Step 2: Review the log messages.
  • Step 3: You can find the log messages related to the associated issue. The name of the Java class that produced the message is shown at the end of every log message.
  • Step 4: Set the connector’s logging level to INFO.
  • Step 5: For each Java class identified, you need to configure the logger.

For example, suppose a MySQL connector is skipping some events when processing the binlog. Instead of setting the logging level of the entire connector to DEBUG or TRACE, you can keep it at INFO and then configure DEBUG or TRACE just for the class that reads the binlog, as follows.

Debezium Logging: Log4j Step 5
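A minimal sketch of such a targeted configuration is shown below, assuming log4j 1.x properties syntax. The class name io.debezium.connector.mysql.BinlogReader is an assumption about which class reads the binlog; it may differ between Debezium versions, so take the actual name from the end of the relevant log messages.

```properties
# Keep the connector as a whole, and the history classes, at INFO.
log4j.logger.io.debezium.connector.mysql=INFO, stdout
log4j.logger.io.debezium.relational.history=INFO, stdout
# Assumed class name: raise only the binlog-reading class to DEBUG.
log4j.logger.io.debezium.connector.mysql.BinlogReader=DEBUG, stdout

# Turn off additivity for each configured logger to avoid duplicate output.
log4j.additivity.io.debezium.connector.mysql=false
log4j.additivity.io.debezium.relational.history=false
log4j.additivity.io.debezium.connector.mysql.BinlogReader=false
```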

Debezium Logging Configuration Modification: Mapped Diagnostic Contexts

Kafka Connect workers and Debezium connectors use multiple threads to perform different activities. With several threads writing to the same log, it becomes difficult to find the log messages that belong to one particular activity. Debezium provides several mapped diagnostic contexts (MDC) that add information about each thread, making those log messages easier to find.

Debezium provides the following mapped diagnostic contexts properties.

  • dbz.connectorType: It is a short alias for the type of connector, such as MySql, Mongo, Postgres, and more. All threads associated with the same type of connector use the same value, so you can find all log messages produced by a given type of connector.
  • dbz.connectorName: It is the name of the connector or the database server, as defined in the connector configuration. All threads associated with a specific connector instance use the same value, so you can find all log messages produced by that connector instance.
  • dbz.connectorContext: It is a short name for an activity running as a separate thread in the connector's task. When a connector assigns threads to specific resources, the name of that resource may be used instead. Each thread associated with a connector uses a distinct value, so you can find all the log messages associated with a particular activity.

To enable mapped diagnostic contexts for the connector, configure the appender in the log4j.properties file.

Steps to enable mapped diagnostic contexts:

  • Step 1: Open the log4j.properties file.
  • Step 2: Configure any supported appender so that its pattern includes the mapped diagnostic context properties.

In the example below, the stdout appender is used.

Debezium Logging: Mapped Diagnostic Contexts Step 2
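One way to include the three properties is through the conversion pattern of the stdout appender, as in this minimal sketch. It assumes log4j 1.x properties syntax and an existing stdout appender with a pattern layout; the order and separators in the pattern are illustrative choices.

```properties
# %X{key} prints the value stored under that mapped diagnostic context key.
log4j.appender.stdout.layout.ConversionPattern=%d{ISO8601} %-5p  %X{dbz.connectorType}|%X{dbz.connectorName}|%X{dbz.connectorContext}  %m   [%c]%n
```

With a pattern like this, each log line carries the connector type, connector name, and thread context, separated by vertical bars, between the log level and the message.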

It produces the below log messages.

Debezium Logging: Mapped Diagnostic Contexts Step 2 Part 2
Image Source

Every message from above includes the connector type, the connector's name, and the thread's activity.

Conclusion

In this tutorial, you have learned about the key concepts and configurations needed for Debezium logging. Debezium logging is most useful when a connector goes down or stops: you can then increase the logging level to generate more log messages and diagnose the problem. You can also use Kafka Connect loggers to configure Debezium loggers and logging levels.

Companies need to analyze their business data stored in multiple data sources. The data needs to be loaded to the Data Warehouse to get a holistic view of the data. Hevo Data is a No-code Data Pipeline solution that helps to transfer data from 100+ sources to desired Data Warehouse. It fully automates the process of transforming and transferring data to a destination without writing a single line of code. Hevo helps simplify ETL and Data Streaming for your business requirements.

Want to take Hevo for a spin? Sign Up here for a 14-day free trial and experience the feature-rich Hevo suite first hand.

Share your experience of learning about Debezium Logging in the comments section below!