Distributed Tracing in microservice applications using Debezium: Easy Guide

on Data Streaming, Debezium, Distributed Tracing, Tracing • February 28th, 2022

In a microservices architecture, a large number of applications communicate with each other. Conventional application performance monitoring is useful for debugging a single application. However, once an application expands into multiple services, it becomes important to know the time taken by each service, the stage at which an exception occurs, and the system’s overall health. In short, it is essential to measure and evaluate the latency between services in order to know how long a request takes to travel from one application to another.

This is where distributed tracing comes into play. As more services are deployed in cloud-based environments, tracing has become a critical component of the architecture that supports them. Tracing support is added to Debezium through the OpenTracing specification. In this blog, we will learn about distributed tracing in Debezium.

Prerequisites 

Knowledge about data streaming.

What is Debezium?

Distributed Tracing Debezium: debezium logo
Image Source: debezium.io

Debezium is an open-source project and a data streaming platform for change data capture (CDC). Because it is built on top of Kafka and Kafka Connect, Debezium reuses their durability, reliability, and fault tolerance. Debezium monitors databases and emits an event for each row-level change, which applications can then consume. Since changes are streamed as they happen, applications see them with low latency.

Furthermore, Debezium provides a single model for all change events, so applications do not need to worry about the complexities of each database management system. Because Debezium records the history of data changes in durable, replicated logs, an application can be stopped and resumed at any moment. When it resumes, it can consume all the events that occurred while it was not running, ensuring that all events are processed correctly.

Simplify Data Analysis with Hevo’s No-code Data Pipeline

Hevo Data, a No-code Data Pipeline, helps to load data from any data source such as Databases, SaaS applications, Cloud Storage, SDKs, and Streaming Services and simplifies the ETL process. It supports 100+ data sources (including 40+ free data sources) like Asana and is a 3-step process: select the data source, provide valid credentials, and choose the destination. Hevo not only loads the data onto the desired Data Warehouse/destination but also enriches the data and transforms it into an analysis-ready form without having to write a single line of code.

Its completely automated pipeline delivers data in real-time without any loss from source to destination. Its fault-tolerant and scalable architecture ensures that the data is handled in a secure, consistent manner with zero data loss and supports different forms of data. The solutions provided are consistent and work with different BI tools as well.

Check out why Hevo is the Best:

  • Secure: Hevo has a fault-tolerant architecture that ensures that the data is handled in a secure, consistent manner with zero data loss.
  • Schema Management: Hevo takes away the tedious task of schema management & automatically detects the schema of incoming data and maps it to the destination schema.
  • Minimal Learning: Hevo, with its simple and interactive UI, is extremely simple for new customers to work on and perform operations.
  • Hevo Is Built To Scale: As the number of sources and the volume of your data grows, Hevo scales horizontally, handling millions of records per minute with very little latency.
  • Incremental Data Load: Hevo allows the transfer of data that has been modified in real-time. This ensures efficient utilization of bandwidth on both ends.
  • Live Support: The Hevo team is available round the clock to extend exceptional support to its customers through chat, email, and support calls.
  • Live Monitoring: Hevo allows you to monitor the data flow and check where your data is at a particular point in time.

What is Tracing?

Distributed Tracing Debezium: Tracing in Microservice
Image Source: site24x7static.com

Introduction to Tracing

Tracing an application means collecting information about its behavior as it runs. However, traditional tracing mechanisms struggle to troubleshoot applications based on a distributed software architecture. Since microservices scale independently, it is normal to have numerous versions of the same service operating at the same time on various servers, locations, and environments. This can result in a complicated network through which a request must travel. Traditional solutions built for a single-service application make tracking these requests extremely hard.

What is OpenTracing?

Microservice-oriented applications rely heavily on observability. Tracing keeps an application visible as the number of processes grows, but instrumenting a system for tracing has traditionally been time-consuming. The OpenTracing standard mitigates this challenge by making it simple to instrument applications for distributed tracing. In OpenTracing, a trace depicts a workflow as it propagates across a distributed system. A distributed trace is a collection of spans, where each span represents a logical unit of work.
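
To make the relationship between traces and spans concrete, here is a minimal sketch in Python. It is an illustrative data model only, not the OpenTracing API: the Span class, the start_span helper, and the operation names are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Optional
import itertools

# Simple counter used to hand out unique ids in this sketch.
_ids = itertools.count(1)


@dataclass
class Span:
    """One logical unit of work inside a trace (illustrative model only)."""
    operation_name: str
    trace_id: int
    span_id: int = field(default_factory=lambda: next(_ids))
    parent_id: Optional[int] = None
    tags: dict = field(default_factory=dict)


def start_span(operation_name, parent=None):
    """Start a root span, or a child span that joins the parent's trace."""
    if parent is None:
        # No parent: this span begins a brand new trace.
        return Span(operation_name, trace_id=next(_ids))
    return Span(operation_name, trace_id=parent.trace_id, parent_id=parent.span_id)


# Hypothetical operations: a request that performs a database insert.
root = start_span("http-request")
child = start_span("db-insert", parent=root)

# The trace is simply the collection of spans that share a trace id.
trace = [root, child]
```

Every span carries the id of its trace and, except for the root, the id of its parent, which is what lets a tracing backend reassemble the workflow afterwards.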

What is Distributed Tracing in Debezium?

Introduction to Distributed Tracing

Distributed tracing can solve the problems faced by traditional tracing, along with numerous other performance issues, because it tracks a request through each service and provides an end-to-end description of that request.

Distributed tracing, also known as distributed request tracing, allows users to monitor applications built on a microservices architecture. Through this mechanism, users can track the path of a request or transaction as it propagates through applications that are monitored over distributed cloud environments.

Distributed Tracing in Debezium

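Whenever an application writes a record into a database that is later processed by Debezium, it must take additional measures, because the trace context does not cross that boundary on its own. The write to the database marks the boundary of the application’s active trace, so the application has to store the span context alongside the record for the trace to continue in Debezium.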

Debezium’s Integration with OpenTracing

Debezium’s integration with OpenTracing comprises three components:

  1. ActivateTracingSpan SMT
  2. Outbox Extension
  3. Event Router SMT

ActivateTracingSpan SMT

In Debezium, distributed tracing is enabled through the ActivateTracingSpan SMT (Single Message Transformation), which is the main implementation point of tracing. The tracing span context is provided by the application writing to the database. The span context must be injected into a java.util.Properties instance that is serialized and written to the database as a separate table field.
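
The application side of this can be sketched as follows. This is a simplified Python illustration, not the Java code an application would really use: the to_properties helper only mimics the basic key=value layout of a serialized java.util.Properties instance, the span context keys and the orders table are hypothetical, and the column name tracingspancontext is an assumption about the field the SMT reads.

```python
import sqlite3


def to_properties(context):
    """Render key/value pairs as key=value lines, the basic layout of a
    serialized java.util.Properties instance (escaping rules are omitted)."""
    return "\n".join(f"{key}={value}" for key, value in sorted(context.items()))


# Hypothetical span context; the real keys depend on the tracer in use.
span_context = {"trace-id": "abc123", "span-id": "def456"}

# An in-memory database stands in for the application's real database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, item TEXT, tracingspancontext TEXT)"
)

# The span context travels with the business data as a separate field.
conn.execute(
    "INSERT INTO orders (item, tracingspancontext) VALUES (?, ?)",
    ("book", to_properties(span_context)),
)

stored = conn.execute("SELECT tracingspancontext FROM orders").fetchone()[0]
```

Because the context is stored in the same row as the data, it shows up in the change event that Debezium captures for that row.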

If the span context is not specified, the SMT still builds a new span. In that situation, only the Debezium operations and metadata are traced.
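
As a rough idea of what enabling the SMT looks like, the sketch below expresses connector settings as a Python dict. The alias "tracing" is arbitrary, and the option tracing.span.context.field with the value tracingspancontext is an assumption on our part, so please verify the names against the Debezium documentation for your version. The smt_options helper is hypothetical and only shows how Kafka Connect scopes options to an SMT through its alias prefix.

```python
import json

# Connector settings expressed as a Python dict for illustration.
# The option names below are assumptions; check them for your version.
tracing_smt_config = {
    "transforms": "tracing",
    "transforms.tracing.type": "io.debezium.transforms.tracing.ActivateTracingSpan",
    "transforms.tracing.tracing.span.context.field": "tracingspancontext",
}


def smt_options(config, alias):
    """Return the options addressed to the SMT registered under this alias."""
    prefix = f"transforms.{alias}."
    return {
        key[len(prefix):]: value
        for key, value in config.items()
        if key.startswith(prefix)
    }


print(json.dumps(smt_options(tracing_smt_config, "tracing"), indent=2))
```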

Once the SMT is invoked with a message, it will:
  1. Extract the parent span context from the message, if it is present.
  2. Create a new db-log-write span with its start timestamp set to the database log write timestamp.
  3. Insert the fields from the source block into the span as tags.
  4. Create a debezium-read processing span as a child of the db-log-write span, with its start timestamp set to the processing time of the event.
  5. Insert the fields from the envelope into the processing span as tags.
  6. Inject the processing span context into the message headers.
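
The steps above can be sketched in Python as follows. This is a simulation for illustration, not Debezium’s implementation: spans are plain dicts, the header keys trace-id and span-id are hypothetical (real keys depend on the tracer), and the message is a simplified change event in which source.ts_ms stands for the database log write time and the envelope’s ts_ms for the processing time.

```python
import uuid


def parse_properties(text):
    """Parse key=value lines (a simplified java.util.Properties layout)."""
    lines = (text or "").splitlines()
    pairs = (line.split("=", 1) for line in lines if "=" in line)
    return {key.strip(): value.strip() for key, value in pairs}


def new_span(name, trace_id, parent_id, start_ms, tags):
    """Build a span as a plain dict (illustrative, not a tracer API)."""
    return {"name": name, "trace_id": trace_id, "span_id": uuid.uuid4().hex[:16],
            "parent_id": parent_id, "start_ms": start_ms, "tags": dict(tags)}


def activate_tracing_span(message, context_field="tracingspancontext"):
    value = message["value"]
    # Step 1: extract the parent span context, if the row carries one.
    parent = parse_properties((value.get("after") or {}).get(context_field))
    trace_id = parent.get("trace-id") or uuid.uuid4().hex
    # Steps 2 and 3: db-log-write span, tagged with the source block fields.
    db_span = new_span("db-log-write", trace_id, parent.get("span-id"),
                       value["source"]["ts_ms"], value["source"])
    # Steps 4 and 5: debezium-read child span, tagged with envelope fields.
    envelope_tags = {k: v for k, v in value.items() if k in ("op", "ts_ms")}
    read_span = new_span("debezium-read", trace_id, db_span["span_id"],
                         value["ts_ms"], envelope_tags)
    # Step 6: inject the processing span context into the message headers.
    message.setdefault("headers", {}).update(
        {"trace-id": trace_id, "span-id": read_span["span_id"]})
    return [db_span, read_span]


# Hypothetical change event for an insert into an orders table.
message = {
    "headers": {},
    "value": {
        "op": "c",
        "ts_ms": 1700000000500,
        "source": {"connector": "postgresql", "table": "orders",
                   "ts_ms": 1700000000000},
        "after": {"id": 1,
                  "tracingspancontext": "trace-id=abc123\nspan-id=def456"},
    },
}
spans = activate_tracing_span(message)
```

After the call, the message headers carry the context of the debezium-read span, so whatever consumes the message next can attach its own spans to the same trace.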

Outbox Extension

With tracing integration, the challenge is to keep a trace intact across process boundaries, because all the spans that are related to each other must be recorded in the same trace to enable end-to-end tracing. The OpenTracing specification addresses this by specifying how to export and import trace-related metadata, allowing the trace to be shared across processes.

The Outbox extension uses this approach to export the metadata into a specific column in the outbox table, which the Event Router SMT can then import to continue the trace. The objective of the Outbox extension is to give Quarkus applications a reusable way to apply the Outbox pattern together with Debezium’s CDC connector pipeline, so that data can be exchanged with any consumer in a reliable and asynchronous manner.

Outbox Quarkus Extension

The Outbox Quarkus extension is inspired by the Outbox Event Router single message transformation (SMT). Microservices frequently need to communicate with one another, and using the Outbox pattern combined with the Outbox Event Router SMT in Debezium is an excellent way to do so.

Once an outbox event is emitted or arrives at the EventDispatcher, the extension will:

  1. Create a new outbox-write span as a child of the current active span or as a root span if no parent span is available.
  2. Export the span metadata into a distinct field of the outbox event, then write the outbox event to the outbox table.
  3. The Event Router SMT then receives the event and imports the span metadata from the field.

Two new spans are then created:

  • db-log-write, using the database write timestamp as its start timestamp. The source block’s fields are added to the span as tags.
  • debezium-read, using the processing timestamp as its start timestamp. The envelope’s fields are added to the span as tags.
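
The export and import of span metadata through the outbox table can be sketched as below. This is a Python simulation under stated assumptions, not the Quarkus extension itself: the emit_outbox_event and import_span_context helpers, the reduced set of outbox columns, the tracingspancontext column name, and the key names inside the exported metadata are all hypothetical.

```python
import sqlite3
import uuid

# Hypothetical span that is active in the application when the event fires.
active_span = {"trace_id": "abc123", "span_id": "1111"}


def emit_outbox_event(conn, aggregate_type, payload, parent=None):
    """Application side: create the outbox-write span and persist the event."""
    # Step 1: child of the active span, or a root span if there is none.
    span = {
        "name": "outbox-write",
        "trace_id": parent["trace_id"] if parent else uuid.uuid4().hex,
        "span_id": uuid.uuid4().hex[:16],
        "parent_id": parent["span_id"] if parent else None,
    }
    # Step 2: export the span metadata into its own column, then write the row.
    exported = f"trace-id={span['trace_id']}\nspan-id={span['span_id']}"
    conn.execute(
        "INSERT INTO outbox (aggregatetype, payload, tracingspancontext) "
        "VALUES (?, ?, ?)",
        (aggregate_type, payload, exported),
    )
    return span


def import_span_context(field_value):
    """Router side (step 3): recover the span metadata from the column."""
    return dict(line.split("=", 1) for line in field_value.splitlines())


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE outbox (id INTEGER PRIMARY KEY, aggregatetype TEXT, "
    "payload TEXT, tracingspancontext TEXT)"
)
span = emit_outbox_event(conn, "order", '{"id": 1}', parent=active_span)
field_value = conn.execute("SELECT tracingspancontext FROM outbox").fetchone()[0]
imported = import_span_context(field_value)
```

The imported context identifies the outbox-write span, so the db-log-write and debezium-read spans created afterwards can join the trace that started in the application.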

Event Router SMT

The outbox pattern is a great method for exchanging data between multiple microservices safely and reliably. Implementing it prevents inconsistencies between a service’s internal state and the state reflected in the events consumed by other services that require the same data.

To implement the outbox pattern in a Debezium application, configure a Debezium connector to:

  • Capture changes in an outbox table.
  • Apply the Debezium outbox event router single message transformation (SMT).

A Debezium connector that is configured to apply the outbox SMT should capture changes in an outbox table only.
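
A connector configuration along these lines might look like the sketch below, again written as a Python dict. The table name public.outbox and the alias "outbox" are hypothetical, the captures_only_outbox helper is our own illustration rather than part of Debezium, and the option names should be checked against the documentation of the connector you use.

```python
# Connector settings expressed as a Python dict for illustration.
outbox_connector_config = {
    "table.include.list": "public.outbox",  # hypothetical schema and table
    "transforms": "outbox",
    "transforms.outbox.type": "io.debezium.transforms.outbox.EventRouter",
}


def captures_only_outbox(config, outbox_table):
    """Check that the connector is restricted to the outbox table."""
    raw = config.get("table.include.list", "")
    tables = [name.strip() for name in raw.split(",") if name.strip()]
    return tables == [outbox_table]


# A quick sanity check before deploying the configuration.
ok = captures_only_outbox(outbox_connector_config, "public.outbox")
```

Restricting the connector to the outbox table keeps ordinary business tables out of the event router, which only knows how to handle outbox rows.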

Benefits of Distributed Tracing 

Distributed tracing helps teams get to the root of application performance issues, often before users are aware that something is wrong. Once an issue is discovered, organizations can quickly identify and address its root cause. In addition, this observability can reveal performance bottlenecks anywhere in the software stack and flag code that should be improved, giving teams early warning when microservices are in trouble.

Distributed tracing improves cooperation and communication across teams by pinpointing the specific areas where problems are present. This strengthens the working relationships that are necessary for quick troubleshooting and can also surface ideas for business growth.

Tracing Option – Kafka Producer tracing

Tracing can be enabled at the Kafka producer level if desired. If enabled, the producer extracts Debezium’s processing span context from the Kafka message headers when the message is sent to the broker, creates a new child span, and records information about the write to the broker. The new span is then injected into the message headers, allowing a message consumer to recover the trace and continue end-to-end tracing.
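
The producer-side behavior can be simulated in a few lines of Python. This is a sketch of the idea only, not the Kafka client or any tracing library: the on_send function, the span name send-to-broker, the header keys, and the topic name are all assumptions made for the example.

```python
import uuid


def on_send(record):
    """Simulate a tracing hook that runs when a record is sent to the broker."""
    headers = record["headers"]
    # Extract the processing span context left in the headers by Debezium.
    parent_span_id = headers.get("span-id")
    trace_id = headers.get("trace-id") or uuid.uuid4().hex
    # Create a child span that records information about the write.
    child = {
        "name": "send-to-broker",  # hypothetical operation name
        "trace_id": trace_id,
        "span_id": uuid.uuid4().hex[:16],
        "parent_id": parent_span_id,
        "tags": {"topic": record["topic"]},
    }
    # Inject the new span so a consumer can pick the trace up from here.
    headers["trace-id"] = trace_id
    headers["span-id"] = child["span_id"]
    return child


# Hypothetical record as it leaves the SMT chain.
record = {
    "topic": "orders.events",
    "headers": {"trace-id": "abc123", "span-id": "read-span-1"},
    "value": b"...",
}
producer_span = on_send(record)
```

A consumer reading the record sees the producer span’s context in the headers and can start its own span as a child of it.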

Conclusion 

Microservices architecture is the current trend in application development. While this strategy provides significant flexibility for developer teams in terms of autonomous deployments and development pace, it has a downside when trying to track down a bug in production. This is where Distributed Tracing comes to the rescue of organizations. As a result, today, organizations can bring new applications to market faster, giving them a competitive edge.

In this blog, you learned about Tracing mechanisms, such as OpenTracing and Distributed Tracing in Debezium, and why they are useful.

MongoDB is a trusted source that many companies use because of its benefits, but transferring data from it into a data warehouse is a hectic task. An automated data pipeline helps solve this issue, and this is where Hevo comes into the picture. Hevo Data is a No-code Data Pipeline with 100+ pre-built integrations that you can choose from.

Hevo can help you Integrate your data from numerous sources and load them into a destination to Analyze real-time data with a BI tool such as Tableau. It will make your life easier and data migration hassle-free. It is user-friendly, reliable, and secure.

Share your experience of learning about Distributed Tracing in Debezium in the comments section below. 
