In a microservices architecture, a large number of applications communicate with each other. Application performance monitoring is useful for debugging a single application. However, once an application is split into multiple services, it is also important to know how long each service takes, at which stage an exception occurs, and what the system’s overall health is.

In short, it is essential to measure and evaluate the latency between services in order to know how long a request takes to travel from one application to another.

This is where distributed tracing comes into play. As more services are deployed in cloud-based environments, tracing has become a critical component of the cloud architecture that supports them.

Debezium adds tracing support through the OpenTracing specification. In this blog, we will learn about distributed tracing in Debezium.

Prerequisites 

Knowledge about data streaming.

What is Debezium?

  • Debezium is an open-source data streaming platform for change data capture (CDC). It is built on top of Kafka and Kafka Connect and reuses them to achieve durability, reliability, and fault tolerance.
  • Debezium monitors databases and emits an event for each row-level change, which applications then consume. Because changes are streamed as they happen, applications can react to them with low latency.
  • Furthermore, Debezium provides a single model of all change events, eliminating the need for applications to worry about the complexities of each database management system.
  • Because Debezium records the history of data changes in durable, replicated logs, processing can be stopped and resumed at any moment. All events that occurred during the downtime can still be consumed afterwards, so every event is processed.

What is Tracing?

Introduction to Tracing

Tracing means recording information about an application’s behavior as it runs. However, traditional tracing mechanisms struggle when troubleshooting applications built on a distributed software architecture.

Since microservices scale independently, it is normal to have numerous versions of the same service operating at the same time on various servers, locations, and environments.

This might result in a complicated network through which a request must travel. Traditional solutions built for a single service application make tracking these requests extremely hard. 

What is OpenTracing?

In OpenTracing, a trace depicts a workflow as it propagates across a distributed system. Microservice-oriented applications rely heavily on observability. Tracing offers visibility into an application as the number of processes grows, but instrumenting a system for tracing has traditionally been time-consuming.

The OpenTracing standard mitigates this challenge by making it simpler to instrument applications for distributed tracing. In OpenTracing, a distributed trace is a collection of spans, where each span represents a logical unit of work that has been completed. 
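To make the idea of "a trace is a collection of spans" concrete, here is a minimal sketch in Python. It is not the OpenTracing API; the `Span` class, the `start_span` helper, and the operation names are hypothetical and exist only to show how child spans share a trace and point back to their parent.

```python
from dataclasses import dataclass, field
from typing import Optional
import itertools

_ids = itertools.count(1)  # simple id generator, for illustration only


@dataclass
class Span:
    """One logical unit of work inside a trace (illustrative model)."""
    operation: str
    trace_id: int
    parent_id: Optional[int] = None
    span_id: int = field(default_factory=lambda: next(_ids))
    tags: dict = field(default_factory=dict)


def start_span(operation, parent=None):
    # A child span shares its parent's trace id; a root span starts a new trace.
    trace_id = parent.trace_id if parent else next(_ids)
    return Span(operation, trace_id, parent.span_id if parent else None)


root = start_span("checkout")                      # hypothetical operation names
child = start_span("reserve-stock", parent=root)

# The trace is simply every span that carries the same trace id.
trace = [s for s in (root, child) if s.trace_id == root.trace_id]
print([s.operation for s in trace])
```

Real tracers add timestamps, logs, and a propagation format on top of this, but the parent/child relationship is the part that lets a backend reassemble the request path.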

What is Distributed Tracing in Debezium?

Introduction to Distributed Tracing

Distributed tracing addresses the problems faced by traditional tracing, along with numerous other performance issues, because it tracks a request through each service and provides an end-to-end description of that request.

Distributed tracing, also known as distributed request tracing, allows users to monitor applications built on a microservices architecture. Through this mechanism, users can track the path of a request or transaction as it propagates through applications that are monitored over distributed cloud environments.

Distributed Tracing in Debezium

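Whenever an application writes a record into a database that is later processed by Debezium, it must take an additional step to keep the trace going. The active trace is demarcated by the write to the database, so the span context has to be stored together with the record for Debezium to pick it up.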

Debezium’s Integration with OpenTracing

The Debezium integration with distributed tracing comprises three components:

  1. ActivateTracingSpan SMT
  2. Outbox Extension
  3. Event Router SMT

ActivateTracingSpan SMT

Distributed tracing in Debezium is enabled through the ActivateTracingSpan SMT (Single Message Transformation), which is the main implementation point for tracing. The tracing span context is provided by the application that writes to the database.

The span context must be injected into a java.util.Properties instance that is serialized and written to the database as a separate table field. 
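As a rough illustration of what "serialized and written to the database as a separate table field" can look like, the sketch below mimics the key=value text form of a java.util.Properties instance in Python. The helper names, the `uber-trace-id` key with its made-up value, and the `tracingspancontext` column name are assumptions for illustration; check the Debezium documentation for the field name your version expects.

```python
def export_span_context(context: dict) -> str:
    """Serialize a span context as key=value lines, loosely mimicking java.util.Properties."""
    return "".join(f"{key}={value}\n" for key, value in sorted(context.items()))


def import_span_context(text: str) -> dict:
    """Parse key=value lines back into a dict, skipping blank lines and '#' comments."""
    context = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            key, _, value = line.partition("=")
            context[key] = value
    return context


# Hypothetical span context; the key and value are illustrative only.
context = {"uber-trace-id": "abc123:def456:0:1"}

# The application stores the serialized context in its own column next to the business data.
row = {
    "id": 42,
    "payload": "order created",
    "tracingspancontext": export_span_context(context),  # assumed column name
}
print(row["tracingspancontext"])
```

Keeping the context in a dedicated column means the business columns stay untouched and the SMT only has to look in one place.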

If no span context is provided, the SMT creates a new span. In that case, Debezium operations and their metadata are still traced, but the trace does not connect back to the application that wrote the record.

Once the SMT is invoked with a message, it:
  1. Extracts the parent span context, if one is present in the message.
  2. Creates a new span named db-log-write, with its start timestamp set to the database log write timestamp.
  3. Inserts the fields from the source block into the span as tags.
  4. Creates a processing span named debezium-read as a child of the db-log-write span, with its start timestamp set to the processing time of the event.
  5. Inserts the fields from the envelope into the processing span as tags.
  6. Injects the processing span context into the message headers.
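The six steps above can be sketched as a toy function in Python. This is not Debezium’s implementation: spans are plain dicts, the message layout is simplified, and the `span-context` header name and the sample values are hypothetical.

```python
def activate_tracing_span(message, context_field="tracingspancontext"):
    """Toy model of the six steps; spans are plain dicts rather than real tracer objects."""
    value = message["value"]
    source = value["source"]

    # Step 1: extract the parent span context if the row carries one.
    parent = value["after"].get(context_field)

    # Steps 2-3: db-log-write starts at the database log write timestamp
    # and is tagged with the fields of the source block.
    db_span = {"name": "db-log-write", "parent": parent,
               "start_ms": source["ts_ms"], "tags": dict(source)}

    # Steps 4-5: debezium-read is a child of db-log-write, starts at the
    # processing time of the event, and is tagged with envelope fields.
    envelope_tags = {k: v for k, v in value.items() if k in ("op", "ts_ms")}
    read_span = {"name": "debezium-read", "parent": db_span["name"],
                 "start_ms": value["ts_ms"], "tags": envelope_tags}

    # Step 6: inject the processing span context into the message headers.
    message["headers"]["span-context"] = f"{read_span['name']}@{read_span['start_ms']}"
    return db_span, read_span


message = {
    "headers": {},
    "value": {
        "op": "c",
        "ts_ms": 1700000000500,  # processing time (made-up value)
        "source": {"db": "inventory", "table": "orders", "ts_ms": 1700000000000},
        "after": {"id": 42, "tracingspancontext": "uber-trace-id=abc123:def456:0:1"},
    },
}
db_span, read_span = activate_tracing_span(message)
print(db_span["name"], "->", read_span["name"], message["headers"])
```

The gap between the two start timestamps is what makes this pair of spans useful: it shows how long a change waited in the database log before Debezium processed it.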

Outbox Extension

With tracing integration, keeping a trace intact across process boundaries is challenging, because end-to-end tracing requires all related spans to be recorded in the same trace.

However, the OpenTracing specification defines how to export and import trace-related metadata, allowing a trace to be shared across processes. The Outbox extension uses this approach to export the metadata into a dedicated column in the outbox table, from which the Event Router SMT can import it and continue the trace.

The objective of the Outbox extension is to give Quarkus applications a way to use the Outbox pattern together with Debezium’s CDC connector pipeline, so that data can be exchanged with any consumer reliably and asynchronously.

Outbox Quarkus Extension

The Outbox Quarkus extension is inspired by the Outbox Event Router single message transformation (SMT). Microservices frequently need to communicate with one another, and the Outbox pattern combined with the Outbox Event Router SMT in Debezium is an excellent way to do so.

Once an outbox event is emitted and arrives at the EventDispatcher, the following happens:

  1. The extension creates a new outbox-write span as a child of the current active span, or as a root span if no parent span is available.
  2. The span metadata is exported into a distinct field of the outbox event, and the outbox event is written to the outbox table.
  3. The Event Router SMT receives the event and imports the span metadata from the field.

Two new spans are then created:

  • db-log-write, which uses the database write timestamp as its start timestamp. The source block’s fields are added to the span as tags.
  • debezium-read, which uses the processing timestamp as its start timestamp. The envelope’s fields are added to the span as tags.

Event Router SMT

The outbox pattern is a great method for exchanging data between numerous microservices in a secure and reliable manner. Implementing it prevents inconsistencies between a service’s internal state and the events consumed by other services that require the same data.

To implement the outbox pattern in a Debezium application, configure a Debezium connector to:

  • Capture changes in an outbox table.
  • Apply the Debezium outbox event router single message transformation (SMT).

A Debezium connector that is configured to apply the outbox SMT should capture changes in the outbox table only.
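A connector registration that meets the two requirements above might look like the following sketch, which builds the JSON body one would send to the Kafka Connect REST API. The connector name, the choice of the PostgreSQL connector, and the `public.outbox` table are assumptions for illustration, connection settings are omitted, and the property names should be checked against the documentation for the Debezium version in use.

```python
import json

# Hypothetical connector registration; connection settings (host, user, ...) are omitted.
connector = {
    "name": "outbox-connector",  # assumed name
    "config": {
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        # Capture changes from the outbox table only (assumed schema and table name).
        "table.include.list": "public.outbox",
        # Apply the outbox event router SMT to every captured change.
        "transforms": "outbox",
        "transforms.outbox.type": "io.debezium.transforms.outbox.EventRouter",
    },
}

payload = json.dumps(connector, indent=2)
print(payload)
```

Restricting the include list to the outbox table keeps the SMT from receiving rows that do not follow the outbox event structure.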

Benefits of Distributed Tracing 

  • Distributed tracing helps teams get to the root of application performance issues even before users are aware that something is wrong. Once an issue is discovered, organizations can quickly identify and fix its root cause.
  • Observability can also detect performance bottlenecks anywhere in the software stack and flag code that should be improved, giving teams early warnings when microservices are in trouble.
  • Distributed tracing improves cooperation and communication across teams by pinpointing the specific regions where problems are present. This strengthens the working connections that are necessary for quick troubleshooting as well as providing business-growth ideas.

Tracing Option – Kafka Producer tracing

Tracing can also be enabled at the Kafka producer level. When enabled, the producer extracts Debezium’s processing span context from the Kafka message headers, creates a new child span, and records information about the write when the message is sent to the broker.

The new span is then injected into the message headers, allowing a message consumer to pick up the trace and continue end-to-end tracing.
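The extract, create, and re-inject sequence can be sketched as follows. This is a toy model rather than a real Kafka producer interceptor; the function name, the `span-context` header key, the `kafka-send` span name, and the topic are all hypothetical.

```python
def trace_producer_send(headers, topic, header_key="span-context"):
    """Toy producer-side tracing: continue the trace found in the message headers."""
    # Extract the context left in the headers by Debezium's processing span, if any.
    parent = headers.get(header_key)

    # Create a child span that records information about the write to the broker.
    span = {"name": "kafka-send", "parent": parent, "tags": {"topic": topic}}

    # Re-inject so that consumers continue from the producer span rather than its parent.
    headers[header_key] = f"{span['name']}|parent={parent}"
    return span


headers = {"span-context": "debezium-read@1700000000500"}  # made-up incoming context
span = trace_producer_send(headers, "outbox.event.order")
print(span, headers)
```

Overwriting the header, instead of adding a second one, is what keeps the chain linear: each hop sees exactly one parent.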


Conclusion 

Microservices architecture is the current trend in application development. While this strategy gives developer teams significant flexibility in terms of autonomous deployments and development pace, it makes tracking down a bug in production harder.

This is where distributed tracing comes to the rescue. With it, organizations can troubleshoot faster and bring new applications to market sooner, giving them a competitive edge.

In this blog, you learned about Tracing mechanisms, such as OpenTracing and Distributed Tracing in Debezium, and why they are useful.

Shravani Kharat
Technical Content Writer, Hevo Data
