Connect Kafka to SQL Server: 2 Easy Methods

on Data Integration, ETL, Tutorials • November 19th, 2020

Do you want to transfer your Microsoft SQL Server data using Apache Kafka? Are you finding it challenging to connect Apache Kafka to SQL Server? If yes, then you’ve landed at the right place! This article will answer all your queries & relieve you of the stress of finding a truly efficient solution. Follow our easy step-by-step guide to help you master the skill of efficiently transferring your data from Microsoft SQL Server using Apache Kafka.

It will help you take charge in a hassle-free way without compromising efficiency. This article aims at making the data export process as smooth as possible.

Upon a complete walkthrough of the content, you will be able to successfully set up a connection between Apache Kafka & Microsoft SQL Server to seamlessly transfer data to/from Microsoft SQL Server for a fruitful analysis in real-time. It will further help you build a customized ETL pipeline for your organization. Through this article, you will get a deep understanding of the tools and techniques & thus, it will help you hone your skills further.


Introduction to Apache Kafka

Kafka Logo.

Apache Kafka is a popular real-time data streaming software that allows users to store, read and analyze streaming data using its open-source framework. Being open-source, it is available free of cost to users. Leveraging its distributed nature, users can achieve high throughput, minimal latency, computation power, etc. and handle large volumes of data with ease.

Written in Scala, Apache Kafka supports bringing in data from a large variety of sources and stores it in the form of “topics” by processing the information stream. It uses two key abstractions: Producers, which act as an interface between the data source and topics, and Consumers, which allow users to read and process the data stored in Apache Kafka topics.
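The producer/consumer model above can be illustrated with a toy in-memory “topic”. This is purely an illustrative sketch (a Python list standing in for a single-partition Kafka log, with the consumer tracking its own offset), not the real Kafka API:

```python
# Toy sketch of Kafka's log model: a "topic" is an append-only log,
# producers append records, and each consumer reads from its own offset.
class ToyTopic:
    def __init__(self):
        self.log = []  # append-only record log (one "partition")

    def produce(self, record):
        """Producer side: append a record to the end of the log."""
        self.log.append(record)

    def consume(self, offset):
        """Consumer side: return records at/after `offset` and the next offset."""
        records = self.log[offset:]
        return records, len(self.log)

topic = ToyTopic()
topic.produce({"id": 1001, "first_name": "Jeff"})
topic.produce({"id": 1002, "first_name": "Sasha"})

# A consumer starting from the beginning sees both records in order.
records, next_offset = topic.consume(0)
print(records)      # both records, in production order
print(next_offset)  # 2
```

Because consumers keep their own offsets, many independent consumers can read the same topic without interfering with each other, which is what makes the source-to-many-destinations pattern in this article possible.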

Key features of Apache Kafka:

  • Scalability: Apache Kafka has exceptional scalability and can be scaled easily without downtime.
  • Data Transformation: Apache Kafka offers Kafka Streams (KStreams) and KSQL (in the case of Confluent Kafka) for on-the-fly data transformation.
  • Fault-Tolerant: Apache Kafka uses Brokers to replicate data and persists the data to make it a fault-tolerant system.
  • Security: Apache Kafka can be combined with various security measures like Kerberos to stream data securely.
  • Performance: Apache Kafka is distributed and partitioned and offers very high throughput for publishing and subscribing to messages.

For further information on Apache Kafka, you can check the official website here.

Introduction to Microsoft SQL Server

SQL Server Logo.

Microsoft SQL Server, also known as MS SQL, is a relational database management system developed by Microsoft to store structured data. Microsoft SQL Server is built on top of SQL (Structured Query Language) to interact with database objects. It uses Transact-SQL, or T-SQL (Microsoft’s proprietary extension of SQL), as its programming interface, adding constructs such as variable declarations, stored procedures, and exception handling. Initially, Microsoft SQL Server ran only on Windows Server and Windows devices; it is now supported on Linux as well.

Key features of Microsoft SQL Server:

  • Scalable: It is easy to scale and supports large amounts of data.
  • Performance: It performs exceptionally well on both Windows & Linux.
  • Secure: It ensures data security & availability irrespective of whether the data is at rest or in use. It further supports data recovery during crashes/failures.
  • Integrations: It is easy to integrate MS SQL with tools like Hadoop for big-data analytics using T-SQL commands.
  • Threading: MS SQL efficiently supports multi-threading & parallel processing, even with a massive amount of data & ensures powerful analytics.

For further information on Microsoft SQL Server, you can check the official Microsoft website here.

Ways to Connect Apache Kafka With Microsoft SQL Server

Method 1: Using the Debezium SQL Server Connector to Connect Apache Kafka to SQL Server 

Using Debezium’s SQL Server connector to load data from Microsoft SQL Server into Apache Kafka is one such way. Apache Kafka lets users transfer data from a source such as Microsoft SQL Server to a destination of their choice using one of the connectors provided by Confluent Hub. This method requires you to configure your SQL Server instance, install the Debezium SQL Server connector, and register it with Kafka Connect via a POST REST API call so that it can stream changes to your Kafka Consumer using SQL Server’s Change Data Capture (CDC) functionality.

Method 2: Using Hevo Data, a No-code Data Pipeline

A fully managed, No-code Data Pipeline platform like Hevo Data helps you load data from 100+ sources, including Apache Kafka and Microsoft SQL Server, and transfer it to a data warehouse/destination of your choice in real-time, in an effortless manner. Hevo, with its minimal learning curve, can be set up in a matter of minutes, making users ready to load data without compromising performance. Its strong integration with various sources such as databases, files, analytics engines, etc. gives users the flexibility to bring in data of all kinds as smoothly as possible, without having to write a single line of code.

Get started with Hevo today! Sign up here for a 14-day free trial! 

Prerequisites

  • Working knowledge of Apache Kafka.
  • Working knowledge of MS SQL Server.
  • A general idea of Change Data Capture.
  • MS SQL installed at the host workstation.
  • Apache Kafka installed at the host workstation.

Methods to Connect Apache Kafka to SQL Server

There are multiple ways in which you can connect Apache Kafka to SQL Server:

Method 1: Using the Debezium SQL Server Connector to Connect Apache Kafka to SQL Server

Apache Kafka supports connecting with Microsoft SQL Server and numerous other databases/data warehouses with the help of various in-built connectors. These connectors help bring in data from a source of your choice to Apache Kafka and then stream it to the destination of your choice from Kafka Topics.

Kafka to SQL Server using Debezium

This can be implemented using the following steps:

Step 1: Configuring Microsoft SQL Server to Enable CDC

To configure your Microsoft SQL Server instance, launch the Microsoft SQL Server Management Studio on your system, and select the properties option by right-clicking on the server option in your object explorer.

Selecting the Properties Option.

Once you’ve clicked on it, a new window will open up on your screen showing the server properties. To allow Debezium’s SQL Server connector to fetch data in real-time from your Microsoft SQL Server instance, click on the “Security” option and choose “SQL Server and Windows Authentication” as the desired authentication mechanism.

Configuring Server Authentication.

Now restart your Microsoft SQL Server instance to bring the changes into effect. With your Microsoft SQL Server Management Studio now up and running, expand the “Security” folder found in the object explorer and right-click on the “Logins” folder.

You now need to create a new login for your Microsoft SQL Server. To do this, add a new username as “testuser” along with a password of your choice. 

Setting up the User in SQL Server.

Click on the “Server Roles” option found in the panel on the left, select “sysadmin” from the list of options available on your screen, and click on OK.

Selecting the Sysadmin Server Roles.

Once you’ve made all the necessary configurations, you now need to enable the CDC functionality. To do this, use Windows PowerShell to start the Notepad application & create a new SQL file named “Test_DB.sql”. Add the following lines of code to your file and save it:

-- Create a temporary test database and enable CDC on it
CREATE DATABASE Test_DB;
GO
USE Test_DB;
EXEC sys.sp_cdc_enable_db;
GO
ALTER DATABASE Test_DB
SET CHANGE_TRACKING = ON
(CHANGE_RETENTION = 2 DAYS, AUTO_CLEANUP = ON);
GO

-- Create a temporary table & enable CDC for the same
CREATE TABLE prospects (
  id INTEGER IDENTITY(1001,1) NOT NULL PRIMARY KEY,
  first_name VARCHAR(255) NOT NULL,
  last_name VARCHAR(255) NOT NULL,
  email VARCHAR(255) NOT NULL UNIQUE
);
INSERT INTO prospects(first_name,last_name,email)
  VALUES ('Jeff','Thomas','thomas.jeff@afer.com');
INSERT INTO prospects(first_name,last_name,email)
  VALUES ('Sasha','Bailey','s.bailey@foobar.com');
INSERT INTO prospects(first_name,last_name,email)
  VALUES ('Edward','Hart','edhart@walker.com');

EXEC sys.sp_cdc_enable_table @source_schema = 'dbo', @source_name = 'prospects', @role_name = NULL, @supports_net_changes = 0;
GO

With your SQL file now ready, restart your Microsoft SQL Server instance using the following lines of code:

net stop MSSQLSERVER
net start MSSQLSERVER

Launch your SQL Server Agent and execute your “Test_DB.sql” file using the following lines of code:

net start SQLSERVERAGENT
sqlcmd -U testuser -P {your password} -i {location of Test_DB.sql}
Executing the SQL file using Windows PowerShell.

This is how you can configure Microsoft SQL Server and enable the CDC functionality to connect Apache Kafka to SQL Server.

Step 2: Installing Apache Kafka on your Workstation

With your Microsoft SQL Server now ready, you need to download and install Apache Kafka, either in standalone or distributed mode. You can follow Apache Kafka’s official documentation to get started with the installation process.

Step 3: Installing the Debezium Microsoft SQL Server Connector for Apache Kafka

Confluent provides users with a diverse set of in-built connectors that act as a source or a sink and allow users to transfer data to the destination of their choice from their desired data source such as Microsoft SQL Server via Apache Kafka. One such connector that lets users connect Apache Kafka to SQL Server is the Debezium SQL Server connector for Apache Kafka.

To install the Debezium SQL Server connector, go to Confluent Hub’s official website and search for Microsoft SQL Server using the search bar found at the top of your screen. In case you’re finding it challenging to locate the correct connector, you can click here to go to the connector’s page directly.

Installing the Debezium SQL Server Connector.

Once you’ve found the desired Debezium SQL Server connector, you can now download the connector by executing the following command on the Confluent CLI:

confluent-hub install debezium/debezium-connector-sqlserver:1.2.2

In case you don’t have the Confluent CLI installed on your system, you can click here and follow Confluent’s official documentation that will help you install and configure the Confluent CLI. 

You can also click on the download button to install the connector manually. Once you’ve clicked on it, a zip file will start downloading on your system. Extract the zip file and copy all jar files found in its lib folder to your Confluent installation.
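The jar-copying step above can be scripted. The sketch below (stdlib Python only) builds a small stand-in zip with the same lib/*.jar layout and extracts only the jars into a target plugin directory; the archive contents and file names here are made up for illustration, not the real connector’s:

```python
import pathlib
import tempfile
import zipfile

def extract_jars(zip_path, target_dir):
    """Copy every lib/*.jar inside the connector zip into target_dir."""
    target = pathlib.Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(zip_path) as zf:
        for member in zf.namelist():
            if member.startswith("lib/") and member.endswith(".jar"):
                # Flatten: write the jar directly into the plugin directory.
                out = target / pathlib.Path(member).name
                out.write_bytes(zf.read(member))
                extracted.append(out.name)
    return sorted(extracted)

# Build a stand-in connector zip (entry names are hypothetical).
workdir = pathlib.Path(tempfile.mkdtemp())
demo_zip = workdir / "debezium-connector-sqlserver.zip"
with zipfile.ZipFile(demo_zip, "w") as zf:
    zf.writestr("lib/debezium-core.jar", b"jar-bytes")
    zf.writestr("lib/debezium-connector-sqlserver.jar", b"jar-bytes")
    zf.writestr("doc/README.md", b"not a jar")

jars = extract_jars(demo_zip, workdir / "plugins")
print(jars)  # only the two .jar entries; the README is skipped
```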

This is how you can install the Debezium SQL Server connector that will help you connect Apache Kafka to SQL Server.

Are you tired of following a tedious setup process? Click here to check out the method of using Hevo Data that will let you integrate Apache Kafka to SQL Server in an automated and hassle-free manner!

Step 4: Starting the Apache Kafka, Zookeeper & Connect Server

Once you’ve installed the Debezium SQL Server connector for Apache Kafka, you now need to start Zookeeper, the Kafka server, and the Schema Registry. To do this, execute the following lines of code in separate terminals:

./bin/zookeeper-server-start ./etc/kafka/zookeeper.properties
./bin/kafka-server-start ./etc/kafka/server.properties
./bin/schema-registry-start ./etc/schema-registry/schema-registry.properties

In case you’re running Confluent Platform v5.3 or above, you can also use the following line of code:

confluent local start connect

Once you’ve enabled your Apache Kafka servers, you will have to create a JSON configuration file for Kafka Connect, which you can load using a POST REST API call. It instructs the Debezium connector to monitor your Microsoft SQL Server instance at the specified host IP and port number. You can make the API call using the following lines of code:

curl -i -X POST -H "Accept:application/json" -H "Content-Type:application/json" localhost:8083/connectors/ -d '{
  "name": "test-connector",
  "config": {
    "connector.class": "io.debezium.connector.sqlserver.SqlServerConnector",
    "database.hostname": "{Private IP Address}",
    "database.port": "1433",
    "database.user": "testuser",
    "database.password": "password!",
    "database.dbname": "Test_DB",
    "database.server.name": "Test_DB",
    "table.whitelist": "dbo.prospects",
    "database.history.kafka.bootstrap.servers": "localhost:9092",
    "database.history.kafka.topic": "dbhistory.fulfillment"
  }
}'
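Rather than hand-editing the inline JSON, you can also assemble the registration payload programmatically and sanity-check it before POSTing. A minimal sketch (stdlib Python; the helper function name is ours, and the POST itself is left as a comment since it needs a running Kafka Connect worker):

```python
import json

def sqlserver_connector_config(name, host, user, password, dbname, table):
    """Assemble a Debezium SQL Server connector registration payload."""
    return {
        "name": name,
        "config": {
            "connector.class": "io.debezium.connector.sqlserver.SqlServerConnector",
            "database.hostname": host,
            "database.port": "1433",
            "database.user": user,
            "database.password": password,
            "database.dbname": dbname,
            "database.server.name": dbname,
            "table.whitelist": table,
            "database.history.kafka.bootstrap.servers": "localhost:9092",
            "database.history.kafka.topic": "dbhistory.fulfillment",
        },
    }

payload = sqlserver_connector_config(
    "test-connector", "localhost", "testuser", "password!", "Test_DB", "dbo.prospects"
)
body = json.dumps(payload, indent=2)
print(body)

# To register it (requires a running Kafka Connect worker on port 8083):
# import urllib.request
# req = urllib.request.Request(
#     "http://localhost:8083/connectors/", data=body.encode(),
#     headers={"Content-Type": "application/json"})
# urllib.request.urlopen(req)
```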

You can now verify if the Debezium Connector is running properly or not, using the following curl command:

curl http://localhost:8083/connectors/test-connector/status

This is how you can start the Apache Kafka, Zookeeper & Connect server.
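The status endpoint returns JSON describing the connector and its tasks. A small sketch of checking that response, using a sample of the shape the Kafka Connect REST API returns (the field values below are illustrative, not captured output):

```python
import json

# Sample of the JSON shape returned by GET /connectors/<name>/status
# (values are illustrative).
sample_status = json.loads("""
{
  "name": "test-connector",
  "connector": {"state": "RUNNING", "worker_id": "127.0.0.1:8083"},
  "tasks": [{"id": 0, "state": "RUNNING", "worker_id": "127.0.0.1:8083"}]
}
""")

def is_healthy(status):
    """True when the connector and every one of its tasks report RUNNING."""
    return (status["connector"]["state"] == "RUNNING"
            and all(t["state"] == "RUNNING" for t in status["tasks"]))

print(is_healthy(sample_status))  # True
```

Checking the task states as well as the connector state matters: a connector can report RUNNING while one of its tasks has FAILED.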

Step 5: Integrating Apache Kafka to SQL Server to Start Ingesting Data

With your Apache Kafka servers now up & running, you can start ingesting data from Microsoft SQL Server. To do this, you will have to create a Consumer for your Apache Kafka server that reads from the Kafka Topic created for your Test_DB table. You can use the following line of code to create a Kafka Consumer:

./bin/kafka-console-consumer --bootstrap-server localhost:9092 --topic Test_DB.dbo.prospects --from-beginning

You will now be able to see that your Kafka Consumer starts fetching data from your Kafka Topic in JSON format.

Kafka Consumer Fetching Data.
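Each message the Consumer prints is a Debezium change event whose payload carries the before/after row images and an op code (“c” for create/insert, “u” for update, “d” for delete). A sketch of pulling the inserted row out of such an event; the event below is a trimmed, illustrative example rather than captured output:

```python
import json

# Trimmed, illustrative Debezium change event for an INSERT ("op": "c").
event = json.loads("""
{
  "payload": {
    "before": null,
    "after": {"id": 1001, "first_name": "Jeff", "last_name": "Thomas",
              "email": "thomas.jeff@afer.com"},
    "op": "c"
  }
}
""")

def inserted_row(change_event):
    """Return the new row image for insert events, else None."""
    payload = change_event["payload"]
    return payload["after"] if payload["op"] == "c" else None

row = inserted_row(event)
print(row["email"])  # thomas.jeff@afer.com
```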

You can also verify whether Apache Kafka is consuming your data correctly by modifying your database and then checking your Kafka Consumer output. To do this, launch Windows PowerShell on your system and create a new SQL file named “new_insert.sql” with the following lines of code:

USE Test_DB;
INSERT INTO prospects(first_name,last_name,email)
VALUES ('Helen','Keyler','keyler.helen@wr.com');

Save your file and execute it using the following command:

sqlcmd -U testuser -P password! -i {location of new_insert.sql}

You will now be able to see Apache Kafka consuming new records from your Microsoft SQL Server table as follows:

Kafka Consuming New Records.

This is how you can connect Apache Kafka to SQL Server using the Debezium SQL Server connector to start streaming your data seamlessly.

Limitations of using the Debezium SQL Server Connector:

  • Using the Debezium SQL Server connector requires you to write custom code and make API calls, and hence you must have strong technical knowledge.
  • Apache Kafka Topics are case-sensitive, and hence you must be careful while making API calls to connect the Debezium SQL Server connector to avoid any failures/exceptions while streaming data.
  • The Debezium SQL Server connector monitors and streams any data changes made to your source table; however, it cannot replicate schema changes made to the source table. It thus requires users to modify the structure of the capture table manually whenever the source table’s structure changes.

Method 2: Using Hevo Data, a No-code Data Pipeline

Hevo Logo.

Hevo Data, a No-code Data Pipeline, helps you transfer data from Apache Kafka (among 100+ sources) to Microsoft SQL Server & lets you visualize it in a BI tool. Hevo is fully-managed and completely automates the process of not only loading data from your desired source but also enriching the data and transforming it into an analysis-ready form without having to write a single line of code. Its fault-tolerant architecture ensures that the data is handled in a secure, consistent manner with zero data loss.

It provides a consistent & reliable solution to manage data in real-time and always have analysis-ready data in your desired destination. It allows you to focus on key business needs and perform insightful analysis using various BI tools such as Power BI, Tableau, etc. 

Steps to use Hevo Data:

Hevo Data focuses on two simple steps to get you started:

  • Configure Source: Connect Hevo Data with a data source of your choice such as Apache Kafka.
  • Integrate Data: Load data from Apache Kafka to SQL Server by providing a unique name for your destination along with your database credentials such as username and password. To help Hevo connect with your Microsoft SQL Server database, you will also have to provide information such as the host IP, port number and the name & schema of your database.

Check out what makes Hevo amazing:

  • Secure: Hevo has a fault-tolerant architecture that ensures that the data is handled in a secure, consistent manner with zero data loss.
  • Schema Management: Hevo takes away the tedious task of schema management & automatically detects schema of incoming data and maps it to the destination schema.
  • Minimal Learning: Hevo, with its simple and interactive UI, is extremely simple for new customers to work on and perform operations.
  • Hevo Is Built To Scale: As the number of sources and the volume of your data grows, Hevo scales horizontally, handling millions of records per minute with very little latency.
  • Incremental Data Load: Hevo allows the transfer of data that has been modified in real-time. This ensures efficient utilization of bandwidth on both ends.
  • Live Support: The Hevo team is available round the clock to extend exceptional support to its customers through chat, email, and support calls.
  • Live Monitoring: Hevo allows you to monitor the data flow and check where your data is at a particular point in time.

Simplify your data analysis with Hevo today! Sign up here for a 14-day free trial!

Conclusion

This article teaches you how to connect Apache Kafka to SQL Server with ease. It provides in-depth knowledge about the concepts behind every step to help you understand and implement them efficiently. These methods, however, can be challenging, especially for a beginner, & this is where Hevo saves the day. Hevo Data, a No-code Data Pipeline, helps you transfer data from a source of your choice in a fully-automated and secure manner without having to write any code. Hevo, with its strong integration with 100+ sources & BI tools, allows you to not only export & load data but also transform & enrich your data & make it analysis-ready in a jiffy.

Want to take Hevo for a spin? Sign up here for the 14-day free trial and experience the feature-rich Hevo suite first hand. You can also have a look at our unbeatable pricing that will help you choose the right plan for your business needs!

Tell us about your experience of connecting Apache Kafka to SQL Server! Share your thoughts in the comments section below!
