Do you want to transfer your Microsoft SQL Server data using Apache Kafka? Are you finding it challenging to connect Apache Kafka to SQL Server? If yes, then you’ve landed at the right place! This article will answer all your queries & relieve you of the stress of finding a truly efficient solution.

Follow our easy step-by-step guide to help you successfully set up a connection between Apache Kafka & Microsoft SQL Server. You can seamlessly transfer data to/from Microsoft SQL Server for fruitful real-time analysis. It will also help you build a customized ETL pipeline for your organization. With Kafka Connect SQL Server, you can easily configure data pipelines to stream data in real-time between Kafka topics and SQL Server tables.

Let’s jump right into it!

Methods to Connect Apache Kafka to SQL Server

Method 1: Using Hevo to Connect Apache Kafka to SQL Server

Skip the coding complexity and multiple setup steps. With Hevo, you can seamlessly migrate data from Apache Kafka to SQL Server in minutes. Simplify your data integration process effortlessly!

Method 2: Using the Debezium SQL Server Connector to Connect Apache Kafka to SQL Server

Connecting Apache Kafka to SQL Server with Python libraries can be complex and time-consuming. Instead, use the Debezium SQL Server Connector for a straightforward and efficient data migration solution. Follow our simple steps to establish your connection with ease!


Method 1: Using Hevo to Connect Apache Kafka to SQL Server

Hevo, a No-code Data Pipeline, helps you transfer data from Apache Kafka (among 150+ Data sources) to Microsoft SQL Server & lets you visualize it in a BI tool. Hevo is fully managed and completely automates the process of not only loading data from your desired source but also enriching the data and transforming it into an analysis-ready form without having to write a single line of code.

Its fault-tolerant architecture ensures that your data is handled in a secure, consistent manner with zero data loss, in real time.

Steps to use Hevo:

Hevo focuses on two simple steps to get you started:

  • Configure Source: Connect Hevo with a data source of your choice such as Apache Kafka by providing a unique name for your Data Pipeline, along with information about your Bootstrap Servers. You can also select the specific topics from where you want to ingest data.
Kafka to SQL Server: Source Configuration
Image Source: Hevo Docs
  • Integrate Data: Load data from Apache Kafka to SQL Server by providing a unique name for your destination along with your database credentials such as username and password. To help Hevo connect with your Microsoft SQL Server database, you will also have to provide information such as the host IP, port number and the name & schema of your database.
Kafka to SQL Server: Destination Configuration
Image Source: Hevo Docs

If you’re looking to expand your knowledge about Kafka, this guide offers insights into how Kafka can be used for Change Data Capture, allowing you to capture and process data changes in real time. Additionally, if you’re interested in learning about Kafka’s role in data ingestion, read our helpful guide Kafka for Data Ingestion.


Method 2: Using the Debezium SQL Server Connector to Connect Apache Kafka to SQL Server

This method talks about using Debezium SQL Server to establish Kafka Connect SQL Server Integration.

Apache Kafka supports connecting with Microsoft SQL Server and numerous other databases/data warehouses with the help of various in-built connectors. These connectors help bring in data from a source of your choice to Apache Kafka and then stream it to the destination of your choice from Kafka Topics.

Kafka to SQL Server using Debezium
Image Source: Medium

This can be implemented using the following steps:

Step 1: Configuring Microsoft SQL Server to Enable CDC

To configure your Microsoft SQL Server instance, launch Microsoft SQL Server Management Studio on your system, right-click on the server in the Object Explorer, and select the Properties option.

Selecting the Properties Option.
Image Source: Medium

Once you’ve clicked on it, a new window showing the server properties will open up on your screen. To allow Debezium’s SQL Server connector to fetch data in real-time from your Microsoft SQL Server instance, click on the “Security” page and choose “SQL Server and Windows Authentication mode” as the desired authentication mechanism.

Configuring Server Authentication.
Image Source: Medium

Now restart your Microsoft SQL Server instance to bring the changes into effect. With your Microsoft SQL Server Management Studio now up and running, expand the “Security” folder in the Object Explorer and right-click on the “Logins” folder.

You now need to create a new login for your Microsoft SQL Server. To do this, add a new username as “testuser” along with a password of your choice. 

Setting up the User in SQL Server.
Image Source: Medium

Click on the “Server Roles” option in the panel on the left, select “sysadmin” from the list of roles available on your screen, and click OK.

Selecting the Sysadmin Server Roles.
Image Source: Medium
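
If you prefer scripting to the UI, the same login setup can be done in T-SQL. This is a minimal sketch, assuming the “testuser” login from the steps above; the password is a placeholder you should replace:

-- Create the login used by the Debezium connector (placeholder password)
CREATE LOGIN testuser WITH PASSWORD = 'YourStrongPassword!';
-- Grant it the sysadmin server role, matching the UI steps above
ALTER SERVER ROLE sysadmin ADD MEMBER testuser;
GO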

Once you’ve made all the necessary configurations, you now need to enable the CDC functionality. To do this, use Windows PowerShell to start the Notepad application and create a new SQL file named “Test_DB.sql”. Add the following lines of code to your file and save it:

-- Creating a temporary test database
CREATE DATABASE Test_DB;
GO
USE Test_DB;
EXEC sys.sp_cdc_enable_db;
ALTER DATABASE Test_DB
SET CHANGE_TRACKING = ON
(CHANGE_RETENTION = 2 DAYS, AUTO_CLEANUP = ON)

-- Creating a temporary table & enabling CDC for the same
CREATE TABLE prospects (
  id INTEGER IDENTITY(1001,1) NOT NULL PRIMARY KEY,
  first_name VARCHAR(255) NOT NULL,
  last_name VARCHAR(255) NOT NULL,
  email VARCHAR(255) NOT NULL UNIQUE
);
INSERT INTO prospects(first_name,last_name,email)
  VALUES ('Jeff','Thomas','thomas.jeff@afer.com');
INSERT INTO prospects(first_name,last_name,email)
  VALUES ('Sasha','Bailey','s.bailey@foobar.com');
INSERT INTO prospects(first_name,last_name,email)
  VALUES ('Edward','Hart','edhart@walker.com');

EXEC sys.sp_cdc_enable_table @source_schema = 'dbo', @source_name = 'prospects', @role_name = NULL, @supports_net_changes = 0;
GO
  • CREATE DATABASE Test_DB; creates a new temporary database named Test_DB.
  • USE Test_DB; switches to the Test_DB database for subsequent operations.
  • EXEC sys.sp_cdc_enable_db; enables Change Data Capture (CDC) for the database, tracking changes to data.
  • CREATE TABLE prospects (...); creates a prospects table and inserts sample data.
  • EXEC sys.sp_cdc_enable_table enables CDC on the prospects table to track changes made to its data.
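
If you want to confirm that CDC was switched on correctly, a quick check against the same instance (assuming the Test_DB database and prospects table from the script above) could look like this:

-- Confirm CDC is enabled at the database and table level
USE Test_DB;
SELECT name, is_cdc_enabled FROM sys.databases WHERE name = 'Test_DB';
SELECT name, is_tracked_by_cdc FROM sys.tables WHERE name = 'prospects';
GO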

With your SQL file now ready, restart your Microsoft SQL Server instance using the following lines of code:

net stop MSSQLSERVER
net start MSSQLSERVER
  • net stop MSSQLSERVER stops the SQL Server service (named MSSQLSERVER) on the system.
  • This command halts the SQL Server instance, preventing database operations.
  • net start MSSQLSERVER restarts the SQL Server service after it has been stopped.
  • This command reinitializes the SQL Server instance, allowing database operations to resume.

Launch your SQL Server Agent and execute your “Test_DB.sql” file using the following lines of code:

    net start SQLSERVERAGENT
    sqlcmd -U testuser -P {your password} -i {location of Test_DB.sql}
    • net start SQLSERVERAGENT starts the SQL Server Agent service, used for running automated tasks like jobs and alerts.
    • It enables SQL Server Agent to manage and execute tasks or schedules in SQL Server.
    • sqlcmd -U testuser -P {your password} -i {location of Test_DB.sql} runs a SQL script (Test_DB.sql) using the sqlcmd utility.
    • The -U and -P parameters provide the username and password for authentication.
    • The -i parameter specifies the location of the SQL script to be executed on the SQL Server instance.
Executing the SQL File using Windows PowerShell.
    Image Source: Medium

    This is how you can configure Microsoft SQL Server and enable the CDC functionality to connect Apache Kafka to SQL Server.


    Step 2: Installing Apache Kafka on your Workstation

With your Microsoft SQL Server now ready, you need to download and install Apache Kafka, either in standalone or distributed mode. You can follow Apache Kafka’s official documentation to get started with the installation process.

    Step 3: Installing the Debezium Microsoft SQL Server Connector for Apache Kafka

    Confluent provides users with a diverse set of in-built connectors that act as a source or a sink and allow users to transfer data to the destination of their choice from their desired data source such as Microsoft SQL Server via Apache Kafka. One such connector that lets users connect Apache Kafka to SQL Server is the Debezium SQL Server connector for Apache Kafka.

    To install the Debezium SQL Server connector, go to Confluent Hub’s official website and search for Microsoft SQL Server using the search bar found at the top of your screen. In case you’re finding it challenging to locate the correct connector, you can click here to go to the connector’s page directly.

    Installing the Debezium SQL Server Connector.
    Image Source: Medium

    Once you’ve found the desired Debezium SQL Server connector, you can now download the connector by executing the following command on the Confluent CLI:

    confluent-hub install debezium/debezium-connector-sqlserver:1.2.2

    In case you don’t have the Confluent CLI installed on your system, you can click here and follow Confluent’s official documentation that will help you install and configure the Confluent CLI. 

You can also click on the download button to install the connector. Once you’ve clicked on it, a zip file will start downloading on your system. Extract the zip file and copy all the jar files found in its lib folder to your Confluent installation.

    This is how you can install the Debezium SQL Server connector that will help you connect Apache Kafka to SQL Server.

    Solve your data replication problems with Hevo’s reliable, no-code, automated pipelines with 150+ connectors.
    Get your free trial right away!

    Step 4: Starting the Apache Kafka, Zookeeper & Connect Server

    Once you’ve installed the Debezium SQL Server connector for Apache Kafka, you now need to start the Zookeeper, Kafka and schema registry. To do this, execute the following lines of code on different terminals:

./bin/zookeeper-server-start ./etc/kafka/zookeeper.properties
./bin/kafka-server-start ./etc/kafka/server.properties
./bin/schema-registry-start ./etc/schema-registry/schema-registry.properties
• ./bin/zookeeper-server-start ./etc/kafka/zookeeper.properties starts the Zookeeper server using the provided configuration file.
    • Zookeeper is used to manage and coordinate Kafka brokers in a Kafka cluster.
    • ./bin/kafka-server-start ./etc/kafka/server.properties starts the Kafka server (broker) using the specified configuration file.
    • Kafka brokers handle the storage, retrieval, and distribution of messages in the Kafka system.
    • ./bin/schema-registry-start ./etc/schema-registry/schema-registry.properties starts the Schema Registry service, which manages and validates Avro schemas for Kafka messages.

    In case you’re running Confluent Platform v5.3 or above, you can also use the following line of code:

    confluent local start connect

Once you’ve enabled your Apache Kafka servers, you will have to create a JSON configuration for Kafka Connect that you can load using a POST call to its REST API. It instructs the Debezium connector to monitor your Microsoft SQL Server instance using the specified host IP and port number. You can make the API call using the following lines of code:

curl -i -X POST -H "Accept:application/json" -H "Content-Type:application/json" localhost:8083/connectors/ -d '{
  "name": "test-connector",
  "config": {
    "connector.class": "io.debezium.connector.sqlserver.SqlServerConnector",
    "database.hostname": "{Private IP Address}",
    "database.port": "1433",
    "database.user": "testuser",
    "database.password": "password!",
    "database.dbname": "Test_DB",
    "database.server.name": "Test_DB",
    "table.whitelist": "dbo.prospects",
    "database.history.kafka.bootstrap.servers": "localhost:9092",
    "database.history.kafka.topic": "dbhistory.fulfillment"
  }
}'
    • The curl -i -X POST command sends a POST request to create a new connector on the Kafka Connect API running at localhost:8083.
    • The request includes headers to accept and send JSON data, specifying Accept:application/json and Content-Type:application/json.
    • The body of the request contains the connector configuration in JSON format, which specifies the use of the Debezium SQL Server connector (io.debezium.connector.sqlserver.SqlServerConnector).
• The configuration includes database connection details (hostname, port, user, password, etc.) and specifies the database and table to capture changes from (dbo.prospects).
    • The connector will store database change history in Kafka, using the dbhistory.fulfillment topic for tracking.

    You can now verify if the Debezium Connector is running properly or not, using the following curl command:

curl localhost:8083/connectors/test-connector/status

    This is how you can start the Apache Kafka, Zookeeper & Connect server.

    Step 5: Integrating Apache Kafka to SQL Server to Start Ingesting Data

With Apache Kafka servers now up & running on your system, you can start ingesting data from Microsoft SQL Server. To do this, you will have to create a Consumer for your Apache Kafka server that will read data from the topic corresponding to your Test_DB table. You can use the following line of code to create a Kafka Consumer:

./bin/kafka-console-consumer --bootstrap-server localhost:9092 --topic Test_DB.dbo.prospects --from-beginning

    You will now be able to see that your Kafka Consumer starts fetching data from your Kafka Topic in the JSON format.

    Kafka Consumer Fetching Data.
    Image Source: Medium

You can also verify if Apache Kafka is consuming your data correctly or not by modifying your database and then checking your Kafka Consumer output. To do this, launch Windows PowerShell on your system and create a new SQL file named “new_insert.sql” with the following lines of code:

USE Test_DB;
INSERT INTO prospects(first_name,last_name,email)
VALUES ('Helen','Keyler','keyler.helen@wr.com');

    Save your file and execute it using the following command:

    sqlcmd -U testuser -P password! -i {location of new_insert.sql}

    You will now be able to see Apache Kafka consuming new records from your Microsoft SQL Server table as follows:

    Kafka Consuming New Records.
    Image Source: Medium
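
You can also confirm the capture on the SQL Server side by querying the CDC change table directly. Assuming the default capture instance name dbo_prospects for the dbo.prospects table, a quick check looks like this:

-- The CDC change table should now contain the newly inserted row
USE Test_DB;
SELECT * FROM cdc.dbo_prospects_CT;
GO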

    This is how you can connect Apache Kafka to SQL Server using the Debezium SQL Server connector to start streaming your data seamlessly.

    Limitations of using the Debezium SQL Server Connector:

    • Using the Debezium SQL Server connector requires you to write custom code and make API calls, hence you must have strong technical knowledge.
    • Apache Kafka Topics are case-sensitive, so you must be careful while making API calls to connect the Debezium SQL Server connector to avoid any failures/exceptions while streaming data.
• The Debezium SQL Server connector monitors and streams any data changes made to your source table; however, it cannot replicate structural changes. It thus requires users to modify the capture table manually whenever the source table structure changes (see the sketch after this list).
    • If any component in this system fails, there is a risk of data loss or compromised transactional integrity. It’s the responsibility of the development team to ensure data integrity and minimize the risk of any potential issues.
    • Although Debezium offers multiple connectors, some of them have scalability issues. Even with Postgres, plugins such as wal2json can cause out-of-memory exceptions when converting write-ahead log output to JSON. Another problem is the inability to snapshot tables while still open for incoming events. For large tables, this means that there may be extended periods when the table is unavailable while snapshotting.
    • Certain data connectors have limitations in supporting specific data types. For instance, the Oracle connector by Debezium has a limitation in handling BLOB data types as they are still in an incubation state. Hence, using BLOB data types in Debezium-based modules meant for production is not advisable.
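
For the structural-change limitation above, a minimal T-SQL sketch of manually refreshing the capture table (assuming the prospects table and its default capture instance name dbo_prospects from this guide) could look like this:

-- Drop the old capture instance after the source table structure has changed
EXEC sys.sp_cdc_disable_table @source_schema = 'dbo', @source_name = 'prospects', @capture_instance = 'dbo_prospects';
-- Re-enable CDC so a new capture table is created with the updated schema
EXEC sys.sp_cdc_enable_table @source_schema = 'dbo', @source_name = 'prospects', @role_name = NULL, @supports_net_changes = 0;
GO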

    Conclusion

This article teaches you how to connect Apache Kafka to SQL Server with ease. It provides in-depth knowledge about the concepts behind every step to help you understand and implement them efficiently. The Kafka Connect SQL Server integration is a powerful setup that supports various data formats and provides connectors for seamless integration with different data sources.

The Kafka Connect SQL Server connector’s configuration options also allow you to fine-tune the data ingestion parameters. These methods, however, can be challenging, especially for a beginner, and this is where Hevo saves the day.

    FAQ on Connect Kafka to SQL Server

    How can Kafka be used with SQL Server?

    1. Create a Kafka cluster.
2. Configure the Kafka Connect connector for SQL Server, namely the JDBC Source Connector, to pull data from SQL Server into Kafka.
3. Configure the JDBC Source Connector’s settings for connecting to SQL Server.
    4. Now, Kafka consumers can do whatever needs to be done based on streamed data.

    What is Kafka in a Database?

    Kafka is not a traditional database. It is a distributed streaming platform for building real-time data pipelines and streaming applications. It allows the real-time processing and integration of data from several sources, including databases.

    What is Kafka Server Used For?

The Kafka server is used for:
• Real-time Data Streaming: capturing data in real time from different sources.
• Data Integration: integrating data across several systems and applications.
• Messaging System: publish-subscribe messaging.
• Event Sourcing: storing and processing event logs.

    What is the SQL Language in Kafka?

Kafka does not natively include a SQL language. However, tools like KSQL (Kafka’s streaming SQL engine, now ksqlDB) provide a SQL-like interface to process and query streams in Kafka. With KSQL, one can write queries that filter, transform, and aggregate streaming data.
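
As a small illustration, here is a hedged KSQL/ksqlDB sketch over the change topic produced in this guide; the topic and column names are assumptions based on the earlier examples, and real Debezium events wrap each row in a before/after envelope, so this is a simplification:

-- Register the change topic as a stream (names assumed from this guide)
CREATE STREAM prospects_stream (first_name VARCHAR, last_name VARCHAR, email VARCHAR)
  WITH (KAFKA_TOPIC='Test_DB.dbo.prospects', VALUE_FORMAT='JSON');

-- Continuously filter the stream for a specific email domain (push query syntax in ksqlDB)
SELECT first_name, last_name, email
FROM prospects_stream
WHERE email LIKE '%@foobar.com'
EMIT CHANGES;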

    Divij Chawla
    Marketing Operations and Analytics Manager, Hevo Data

    Divij Chawla is interested in data analysis, software architecture, and technical content creation. With extensive experience driving Marketing Operations and Analytics teams, he excels at defining strategic objectives, delivering real-time insights, and setting up efficient processes. His technical expertise includes developing dashboards, implementing CRM systems, and optimising sales and marketing workflows. Divij combines his analytical skills with a deep understanding of data-driven solutions to create impactful content and drive business growth.