The signaling mechanism in Debezium allows users to trigger a one-time action or modify a connector's behavior, for example by initiating an ad hoc snapshot of a table. To send a signal to a connector, users issue a SQL command that adds a signal message to a signaling table, also known as a signaling data collection.
The signaling data collection is designated exclusively for communicating with Debezium. Whenever a logging record or an ad hoc snapshot record is added to the signaling data collection, Debezium detects and reads the signal and then initiates the requested operation.
Currently, the feature of sending signals to a Debezium connector is in an incubating state. In this blog, we will learn what Debezium signals are, how to configure them, and the actions they can trigger.
Table of Contents
- Prerequisites
- What is Debezium?
- Debezium Connectors
- Debezium Signals
- Conclusion
Prerequisites
Basic knowledge of data streaming.
What is Debezium?
In 2016, Randall Hauch, a software developer at Red Hat, founded the Debezium project to handle an in-house data streaming problem. Hauch served as project lead for the first 16 months and remained a contributor to the project. Debezium is an open-source, low-latency data streaming platform for change data capture (CDC), released under the Apache License, Version 2.0. It allows users to turn existing databases into event streams, enabling applications to monitor and respond to row-level changes in those databases almost immediately.
Debezium is built on top of Apache Kafka and includes Kafka Connect-compatible connectors for monitoring database management systems. Debezium maintains the history of data changes in Kafka logs, allowing applications to be stopped and resumed at any moment. On restart, an application consumes all of the events it missed while it was not running, which guarantees that all events are handled correctly and completely.
Debezium Connectors
Debezium's main purpose is to provide a library of connectors that capture changes from a range of database management systems and generate events with very similar structures. This makes it much easier for applications to consume and respond to the events, regardless of where the changes originated.
Currently, the following connectors are in use:
- SQL Server
- MySQL
- PostgreSQL
- MongoDB
- Oracle
- Db2
- Vitess (Incubating)
- Cassandra (Incubating)
Debezium Signals
Debezium Signals: Configuration
Because the Debezium signaling mechanism is disabled by default, users must explicitly enable it for each connector.
Procedure
- Create a signaling data collection table in the source database to transmit signals to the connector.
- For source databases such as SQL Server or Db2 that implement a native CDC mechanism, enable CDC for the signaling table (see the example after this procedure).
- In the Debezium connector configuration, add the name of the signaling data collection.
To do this, add the property signal.data.collection and set its value to the fully qualified name of the signaling data collection created in Step 1.
The naming formats for each connector are as follows:
- SQL Server: <databaseName>.<schemaName>.<tableName>
- MySQL: <databaseName>.<tableName>
- PostgreSQL: <schemaName>.<tableName>
- MongoDB: <databaseName>.<collectionName>
- Db2: <schemaName>.<tableName>
- Oracle: <databaseName>.<schemaName>.<tableName>
- Finally, add the name of the signaling data collection created in Step 1 to the table.include.list property.
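For instance, on SQL Server the CDC step above might look like the following, assuming CDC is already enabled on the database and the signaling table from Step 1 was created as debezium_signal in the dbo schema of a hypothetical database named testDB (the database, schema, and table names are illustrative):
EXEC sys.sp_cdc_enable_table @source_schema = N'dbo', @source_name = N'debezium_signal', @role_name = NULL;
With that in place, signal.data.collection would be set to testDB.dbo.debezium_signal, following the SQL Server naming format above, and the signaling table would also be included in the table.include.list property.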
Signaling data collection structure
Signals sent to a connector to trigger a certain operation are stored in a signaling data collection, also known as a signaling table.
Signaling Data Collection/Signaling Table
It implements a command pattern based on a source database table for sending commands, also known as signals, to Debezium in order to perform certain actions. The framework is extensible, allowing a connector to provide custom commands in addition to the Debezium core commands.
To use the signal table, the signal.data.collection option must be specified in the connector's configuration. This option gives the fully qualified name of the table from which signal requests are read. If this option is not supplied or is empty, the signal table functionality is disabled.
The structure of the signaling data collection:
- Contains three fields/columns.
- Fields must be arranged in a certain order.
| Field | Data Type | Description |
|---|---|---|
| id | STRING | A unique string that identifies a signal instance, typically a UUID. Users can assign an id to each signal that they submit to the signaling table and use it for logging, debugging, or de-duplication. Whenever a signal triggers Debezium to perform an incremental snapshot, Debezium generates a signal message with an id string. |
| type | STRING | Specifies the type of signal to send. Some signal types can be used with any connector that supports signaling, while others are available only with specific connectors. |
| data | STRING | Specifies JSON-formatted parameters to pass to the signal action. Each signal type requires a specific set of data. |
Create a signaling data collection
In order to create a signaling table, submit a standard SQL DDL query to the source database.
Prerequisites
Sufficient access privileges are required to create a table on the desired database.
Procedure
To create the signaling table, submit a SQL DDL statement to the source database that creates a table matching the required structure, as demonstrated in the example below:
Example:
CREATE TABLE debezium_signal (id VARCHAR(42) PRIMARY KEY, type VARCHAR(32) NOT NULL, data VARCHAR(2048) NULL);
Hevo Data, a No-code Data Pipeline, helps you load data from any data source, such as databases, SaaS applications, cloud storage, SDKs, and streaming services, and simplifies the ETL process.
Hevo supports 100+ data sources (including 40+ free data sources) like Asana, and setup is a 3-step process: select the data source, provide valid credentials, and choose the destination. Hevo not only loads the data into the desired Data Warehouse/destination but also enriches the data and transforms it into an analysis-ready form without requiring a single line of code.
Check out why Hevo is the Best:
- Secure: Hevo has a fault-tolerant architecture that ensures that the data is handled in a secure, consistent manner with zero data loss.
- Schema Management: Hevo takes away the tedious task of schema management & automatically detects the schema of incoming data and maps it to the destination schema.
- Minimal Learning: Hevo, with its simple and interactive UI, is extremely simple for new customers to work on and perform operations.
- Hevo Is Built To Scale: As the number of sources and the volume of your data grows, Hevo scales horizontally, handling millions of records per minute with very little latency.
- Incremental Data Load: Hevo allows the transfer of data that has been modified in real-time. This ensures efficient utilization of bandwidth on both ends.
- Live Support: The Hevo team is available round the clock to extend exceptional support to its customers through chat, email, and support calls.
- Live Monitoring: Hevo allows you to monitor the data flow and check where your data is at a particular point in time.
Debezium Signals: Actions
To initiate the Debezium Signals, users can use the following actions:
Logging signals
Users can request that a connector add an entry to the log by creating a signaling table entry with the log signal type. After the signal is processed, the connector prints the specified message to the log.
Users can optionally customize the signal to include the streaming coordinates in the generated message.
Example:
INSERT INTO debezium_signal (id, type, data) VALUES ('1', 'log', '{"message": "Hello World"}');
Example: signaling record for adding a log message.
| Column | Value | Description |
|---|---|---|
| id | 924e3ff8-2245-43ca-ba77-2af9af02fa07 | The unique identifier of the signal instance. |
| type | log | The action type of the signal. |
| data | {"message": "Signal message at offset {}"} | The message parameter specifies the string to print to the log. If a placeholder ({}) is added to the message, it is replaced with streaming coordinates. |
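As a sketch, the signaling record above could be submitted to the signaling table created earlier with a statement like the following (the id value is arbitrary):
INSERT INTO debezium_signal (id, type, data) VALUES ('924e3ff8-2245-43ca-ba77-2af9af02fa07', 'log', '{"message": "Signal message at offset {}"}');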
Schema Changes Signal
Whenever the type field in the signaling data collection is schema-changes, this signal is detected. It instructs Debezium to generate a SchemaChangeEvent to the schema changes topic based on the changes described in the data column, formatted as JSON. In addition, the signal causes Debezium to update its in-memory representation of the table's schema structure.
Trigger Ad-hoc Snapshot in Debezium Signals
An ad hoc snapshot signal specifies the tables to include in the snapshot. The snapshot can capture either the entire contents of the database or only a subset of its tables.
Users can request that a connector initiate an ad hoc snapshot simply by creating a signaling table entry with the execute-snapshot signal type. After the signal is processed, the connector runs the requested snapshot operation.
Ad hoc snapshots are produced at runtime, after the connector has already started streaming change events from a database. Users can initiate an ad hoc snapshot at any time.
Ad hoc snapshots are available for the following Debezium connectors: SQL Server, MySQL, PostgreSQL, MongoDB, Oracle, and Db2.
Example: ad hoc snapshot signal record.
| Column | Value |
|---|---|
| id | d139b9b7-7777-4547-917d-e1775ea61d41 |
| type | execute-snapshot |
| data | {"data-collections": ["public.MyFirstTable", "public.MySecondTable"]} |
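As a sketch, the record above could be inserted into the signaling table like this (the table names in data-collections are the placeholders from the example):
INSERT INTO debezium_signal (id, type, data) VALUES ('d139b9b7-7777-4547-917d-e1775ea61d41', 'execute-snapshot', '{"data-collections": ["public.MyFirstTable", "public.MySecondTable"]}');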
Currently, the execute-snapshot action triggers incremental snapshots only.
Debezium features an additional snapshot method known as incremental snapshotting to enable flexibility in managing snapshots.
Incremental snapshots
During an incremental snapshot, the connector captures the initial state of the tables that the user specifies. It employs a watermarking technique to track the progress of the snapshot, and it captures tables in chunks rather than all at once.
Advantages of Incremental Snapshots in Debezium Signals
- The near-real-time event streaming from the transaction log continues uninterrupted.
- If the incremental snapshot process is interrupted, it can be resumed from the point at which it stopped.
- An incremental snapshot can be initiated at any moment by the user.
Incremental Snapshot Process in Debezium Signals
When users run an incremental snapshot:
- Debezium sorts each table by its primary key and then divides the table into chunks.
- Working chunk by chunk, it captures each table row in a chunk.
- For each row, the snapshot generates a READ event.
- That event represents the value of the row at the time the snapshot of the chunk started.
- Meanwhile, streaming continues, so INSERT, UPDATE, or DELETE operations that occur during the snapshot are still captured as change events.
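Conceptually, each chunk is read with a query that selects the next batch of rows ordered by the primary key, along the lines of the sketch below (the table name, key column, chunk size, and boundary value are illustrative, not Debezium's exact query):
SELECT * FROM public.MyFirstTable WHERE id > 2048 ORDER BY id LIMIT 1024; -- 2048 = last primary key of the previous chunk, 1024 = illustrative chunk size
The rows returned by such a query become the READ events described above, while the watermarking mentioned earlier traces how far the snapshot has progressed, which is what allows an interrupted snapshot to resume from the point where it stopped.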
Conclusion
In this blog, we learned in detail about Debezium, Debezium connectors, and the Debezium signals that are sent to connectors. We also learned how to configure Debezium signals and the actions that can be triggered with them.
Visit our website to explore Hevo.
Hevo can help you integrate data from 100+ sources, such as SaaS applications and databases, and load it into a destination such as a Redshift Data Warehouse to be analyzed in real-time and visualized in a BI tool. It will make your life easier and data migration hassle-free, and it is user-friendly, reliable, and secure.
Sign up for a 14-day free trial and see the difference!
Share your experience of learning about Debezium Signals in the comments section below.