Kafka CDC SQL Server: A Detailed Guide

on Data Integration, Tutorials • June 26th, 2020

KAFKA CDC SQL SERVER

Table of contents

Introduction

The main objective of this post is to introduce you to Apache Kafka so that you can build a strong foundation and understand how to stream change data in your SQL Server database. It follows a stepwise approach to ‘Kafka CDC SQL Server’.

In this post, you will learn how to leverage Apache Kafka to continuously stream DML and DDL events that are committed to your SQL Server tables in real-time so that they can be easily consumed by other applications and services.

The change data capture functionality was introduced in SQL Server 2016 SP1. This feature allows SQL Server users to monitor databases and tables and it stores the changes that are committed into specifically created CDC tables.

Kafka

Kafka is a distributed streaming platform that is built on the principles of a messaging system. Over the years, it has evolved into a full-fledged streaming platform that offers all the core capabilities to implement stream processing applications over a real-time data pipeline.

Kafka CDC SQL server : Apache Kafka
Image from Apache Kafka

Why Use Kafka?

If you have data sitting in a SQL database, it could be:

  • New log entries from your web servers.
  • New sensor data from your IoT systems.
  • New stock trades.

How do you get new data into your Kafka cluster? Kafka lets you publish and process this data in real-time to your distributed cluster.

What Can It Do?

  • Kafka can be used as an enterprise Pub/Sub messaging service. 
  • Kafka can be used for stream processing. 
  • Kafka provides connectors to import and export change records from databases and other systems as they occur.

In this post, you will learn how to stream data from a SQL Server source and then read the stream and load it into a destination. You will utilize the Debezium SQL Server connector, an open-source distributed platform built on top of Kafka Connect. Debezium is used for streaming rows from SQL Server Change Data Capture (CDC) to Kafka topics.

Prerequisites

Methods To Achieve SQL Server CDC

Method 1: SQL CDC Using Kafka

In this method, you will use Apache Kafka to continuously stream data that is committed to your SQL Server tables in real-time. You will use the Debezium SQL Server connector for streaming rows from SQL Server Change Data Capture (CDC) to Kafka topics.

Method 2: SQL CDC Using Hevo

GET STARTED WITH HEVO FOR FREE[/hevoButton]

Hevo, a No-code Data Pipeline, can help you load data from source tables into target tables without having to write or maintain complex code.

SIGN UP HERE FOR A 14-DAY FREE TRIAL

Method 1: SQL CDC Using Kafka

Steps Overview

Step 1: Enable CDC For The Table(s) That Should Be Captured By The Connector

1. Enable CDC For The SQL Server Database

USE TestDB 
GO 
EXEC sys.sp_cdc_enable_db 
GO

2. Enable CDC For The SQL Server Database Table That You Are Tracking

USE TestDB 
GO 
EXEC sys.sp_cdc_enable_table 
@source_schema = N'dbo', 
@source_name = N'Posts', 
@role_name = N'Admin', 
@supports_net_changes = 1 
GO

Step 2: Install Java On Your Linux VM

Kafka libs are based on Java and therefore we should download it and add it to an environment variable.

> sudo yum install java-1.8.0
> export JAVA_HOME=/usr/lib/jvm/jre-1.8.0

Step 3: Download And Install Apache Kafka

Download the 2.5.0 release and un-tar it.

> mkdir kafka/
> cd kafka
> wget https://archive.apache.org/dist/kafka/2.5.0/kafka_2.12-2.5.0.tgzz
> tar -xzf kafka_2.12-2.5.0.tgz
> cd kafka_2.12-2.5.0

Step 4: Download And Install A Debezium Connector

Download the SQL Server Connector plug-in archive and extract the files into your Kafka Connect environment. Next, add the parent directory of the extracted plug-in(s) to Kafka Connect’s plugin path.

> wget https://repo1.maven.org/maven2/io/debezium/debezium-connector-sqlserver/1.2.0.Final/debezium-connector-sqlserver-1.2.0.Final-plugin.tar.gz
> tar -xzf /kafka/connect/debezium-connector-sqlserver
> plugin.path=/kafka/connect

Step 5: Start A Kafka Server

The next step is to start the Kafka server but Kafka uses Zookeeper — another open-source project that came out of the Hadoop project. Zookeeper is used to coordinate various processes on Kafka’s distributed cluster of brokers. 

Kafka provides a CLI tool to start Zookeeper which is available in your bin directory. On a different CLI window run the following command.

> bin/zookeeper-server-start.sh config/zookeeper.properties
[2020-07-19 19:08:32,351] INFO Reading configuration from: config/zookeeper.properties (org.apache.zookeeper.server.quorum.QuorumPeerConfig)
...

This tool takes some configurations and we provide a config file with all the default values.

Now start the Kafka server.

> bin/kafka-server-start.sh config/server.properties
[2020-07-19 19:08:48,052] INFO Verifying properties (kafka.utils.VerifiableProperties)
[2020-07-19 19:08:48,101] INFO Property socket.send.buffer.bytes is overridden to 1044389 (kafka.utils.VerifiableProperties)
...

Step 6: Configure Kafka Connect

The Kafka Connect configuration can be loaded into Kafka Connect via the REST API. The following example adds the Debezium SQL Server connector configuration in a JSON format. 

This example specifies that the Debezium connector instance should monitor a SQL Server instance running at port 1433 on 102.5.232.115.

curl -k -X POST -H "Accept:application/json" -H "Content-Type:application/json" https://localhost:8083/connectors/ -d “
{
  "name": "sqlserver-posts-connector",  
  "config": {
    "connector.class": "io.debezium.connector.sqlserver.SqlServerConnector", 
    "database.hostname": "102.5.232.115", 
    "database.port": "1433", 
    "database.user": "admin", 
    "database.password": "Supersecretpassword!", 
    "database.dbname": "TestDB", 
    "database.server.name": "staging", 
    "table.whitelist": "dbo.Posts", 
    "database.history.kafka.bootstrap.servers": "kafka:9092", 
    "database.history.kafka.topic": "dbhistory.Posts" 
  }
}
”

Now check that the connector is running successfully using this command.

curl -k https://localhost:8083/connectors/sqlserver-posts-connector/status

Step 7: Monitor Kafka Logs For Change Data

If the connector is in RUNNING state, you should see the Connect worker logs indicating the successful ingestion of data from SQL Server to Kafka.

Method 2: SQL CDC Using Hevo

This article introduced you to Kafka CDC SQL Server. But there are challenges you might face with hand-coding. You need tech bandwidth and expertise to manage a clean execution. Otherwise, there is the unnecessary risk of getting bogged down in the complexity and the details. So things become a lot smoother and opportunities suddenly become abundant when you let the right platform handle this.

Hevo supports SQL Server CDC out of the box and provides an intuitive graphical interface to enable this feature. Also, you will get a granular activity trail of all the changes made and synced to any target database or data warehouse. 

visit our website to explore hevo[/hevoButton]

Here’s how easy it is to set up SQL Server CDC with Hevo:

  • Authenticate and connect your SQL Server data source.
  • Select Change Data Capture as your replication mode and configure your destination.

Hevo’s point-and-click interface ensures the lowest time for production possible. Check out Hevo’s cool features:

  • Real-time Data Transfer: Hevo provides real-time data migration, so you can have analysis-ready data always.
  • Reliable Data Migration: Hevo’s robust infrastructure ensures that there is no data loss and the data transfer is highly reliable.
  • Schema Management: Hevo takes away the tedious task of schema management and automatically detects the schema of incoming data and maps it to the destination schema.
  • Secure: End-to-end encryption and two-factor authentication ensure that your data is secured.
  • Minimal Learning: Hevo with its simple and interactive UI, is extremely intuitive for new customers to work on and perform operations.
  • Live Monitoring: Hevo allows you to monitor the data flow so you can check where your data is at a particular point in time.
  • Data Transformation: It provides a simple interface to perfect, modify, and enrich the data you want to export. 

Conclusion

In this blog, You would have got a detailed understanding of Kafka. The two methods to set up SQL Server CDC using Kafka and Hevo are explained in detail. Get started with Hevo for seamless SQL Server CDC set up.

Watch this short product video to understand how Hevo works.

SIGN UP for a 14-day free trial and see the difference!

What are your thoughts on Kafka CDC SQL Server? Let us know in the comments!

No-code Data Pipeline for MS SQL