Do you want to transfer your Microsoft SQL Server data using Apache Kafka? Are you finding it challenging to connect Apache Kafka to SQL Server? If yes, then you’ve landed at the right place! This article will answer all your queries & relieve you of the stress of finding a truly efficient solution.
Follow our easy step-by-step guide to successfully set up a connection between Apache Kafka and Microsoft SQL Server. You can seamlessly transfer data to/from Microsoft SQL Server for fruitful real-time analysis. It will also help you build a customized ETL pipeline for your organization. With Kafka Connect SQL Server, you can easily configure data pipelines to stream data in real time from Kafka topics to SQL Server tables.
Let’s jump right into it!
Method 1: Using Hevo to Connect Apache Kafka to SQL Server
Hevo, a No-code Data Pipeline, helps you transfer data from Apache Kafka (among 150+ Data sources) to Microsoft SQL Server & lets you visualize it in a BI tool. Hevo is fully managed and completely automates the process of not only loading data from your desired source but also enriching the data and transforming it into an analysis-ready form without having to write a single line of code.
Its fault-tolerant architecture ensures that the data is handled in a secure, consistent manner with zero data loss and in real time.
Steps to use Hevo:
Hevo focuses on two simple steps to get you started:
- Configure Source: Connect Hevo with a data source of your choice such as Apache Kafka by providing a unique name for your Data Pipeline, along with information about your Bootstrap Servers. You can also select the specific topics from where you want to ingest data.
- Integrate Data: Load data from Apache Kafka to SQL Server by providing a unique name for your destination along with your database credentials such as username and password. To help Hevo connect with your Microsoft SQL Server database, you will also have to provide information such as the host IP, port number and the name & schema of your database.
If you’re looking to expand your knowledge about Kafka, this guide offers insights into how Kafka can be used for Change Data Capture, allowing you to capture and process data changes in real time. Additionally, if you’re interested in learning about Kafka’s role in data ingestion, read our helpful guide Kafka for Data Ingestion.
Method 2: Using the Debezium SQL Server Connector to Connect Apache Kafka to SQL Server
This method uses the Debezium SQL Server connector to establish the Kafka Connect SQL Server integration.
Apache Kafka supports connecting with Microsoft SQL Server and numerous other databases/data warehouses with the help of various in-built connectors. These connectors help bring in data from a source of your choice to Apache Kafka and then stream it to the destination of your choice from Kafka Topics.
This can be implemented using the following steps:
Step 1: Configuring Microsoft SQL Server to Enable CDC
To configure your Microsoft SQL Server instance, launch the Microsoft SQL Server Management Studio on your system, and select the properties option by right-clicking on the server option in your object explorer.
Once you’ve clicked on it, a new window will open up on your screen showing the server properties. To allow Debezium’s SQL Server connector to fetch data in real time from your Microsoft SQL Server instance, click on the “Security” option and choose “SQL Server and Windows Authentication mode” as the desired authentication mechanism.
Now restart your Microsoft SQL Server instance to bring the changes into effect. With your Microsoft SQL Server Management Studio now up and running, expand the “Security” folder found in the object explorer and right-click on the “Logins” folder.
You now need to create a new login for your Microsoft SQL Server. To do this, add a new username as “testuser” along with a password of your choice.
Click on the server roles option found in the panel on the left, select “sysadmin” from the list of options available on your screen, and click on OK.
Once you’ve made all the necessary configurations, you now need to enable the CDC functionality. To do this, use Windows PowerShell to start the Notepad application and create a new SQL file named “Test_DB.sql”. Add the following lines of code to your file and save it:
-- Creating a temporary test database
CREATE DATABASE Test_DB;
GO
USE Test_DB;
GO
-- Enabling CDC at the database level and turning on change tracking
EXEC sys.sp_cdc_enable_db;
GO
ALTER DATABASE Test_DB
SET CHANGE_TRACKING = ON
(CHANGE_RETENTION = 2 DAYS, AUTO_CLEANUP = ON);
GO
-- Creating a temporary table & enabling CDC for the same
CREATE TABLE prospects (
id INTEGER IDENTITY(1001,1) NOT NULL PRIMARY KEY,
first_name VARCHAR(255) NOT NULL,
last_name VARCHAR(255) NOT NULL,
email VARCHAR(255) NOT NULL UNIQUE
);
INSERT INTO prospects(first_name,last_name,email)
VALUES ('Jeff','Thomas','thomas.jeff@afer.com');
INSERT INTO prospects(first_name,last_name,email)
VALUES ('Sasha','Bailey','s.bailey@foobar.com');
INSERT INTO prospects(first_name,last_name,email)
VALUES ('Edward','Hart','edhart@walker.com');
EXEC sys.sp_cdc_enable_table @source_schema = 'dbo', @source_name = 'prospects', @role_name = NULL, @supports_net_changes = 0;
GO
With your SQL file now ready, restart your Microsoft SQL Server instance using the following lines of code:
net stop MSSQLSERVER
net start MSSQLSERVER
Launch your Microsoft SQL Server Agent and execute your “Test_DB.sql” file using the following lines of code:
net start SQLSERVERAGENT
sqlcmd -U testuser -P {your password} -i {location of Test_DB.sql}
This is how you can configure Microsoft SQL Server and enable the CDC functionality to connect Apache Kafka to SQL Server.
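Before moving on, you may want to confirm that CDC is actually enabled. The following is a quick check, sketched under the assumption that you used the Test_DB database and prospects table created above; run it from sqlcmd or Management Studio:
-- Confirm CDC is enabled at the database level
USE Test_DB;
SELECT name, is_cdc_enabled FROM sys.databases WHERE name = 'Test_DB';
-- Confirm CDC is enabled for the prospects table
SELECT name, is_tracked_by_cdc FROM sys.tables WHERE name = 'prospects';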
Step 2: Installing Apache Kafka on your Workstation
With your Microsoft SQL Server now ready, you need to download and install Apache Kafka, either in standalone or distributed mode. You can follow Apache Kafka’s official documentation to get started with the installation process.
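For reference, a minimal single-node (standalone) setup on Linux looks roughly like the following; the release version and file names are illustrative, so adjust them to whatever you download. If you install the Confluent Platform instead (as the later steps assume), the equivalent scripts live under its bin and etc directories.
# Download and extract an Apache Kafka release (version shown is illustrative)
tar -xzf kafka_2.13-2.8.0.tgz
cd kafka_2.13-2.8.0
# Standalone mode: one ZooKeeper node and one Kafka broker on the same machine
bin/zookeeper-server-start.sh config/zookeeper.properties
bin/kafka-server-start.sh config/server.properties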
Step 3: Installing the Debezium Microsoft SQL Server Connector for Apache Kafka
Confluent provides users with a diverse set of in-built connectors that act as a source or a sink and allow users to transfer data to the destination of their choice from their desired data source such as Microsoft SQL Server via Apache Kafka. One such connector that lets users connect Apache Kafka to SQL Server is the Debezium SQL Server connector for Apache Kafka.
To install the Debezium SQL Server connector, go to Confluent Hub’s official website and search for Microsoft SQL Server using the search bar found at the top of your screen. In case you’re finding it challenging to locate the correct connector, you can click here to go to the connector’s page directly.
Once you’ve found the desired Debezium SQL Server connector, you can now download the connector by executing the following command on the Confluent CLI:
confluent-hub install debezium/debezium-connector-sqlserver:1.2.2
In case you don’t have the Confluent CLI installed on your system, you can click here and follow Confluent’s official documentation that will help you install and configure the Confluent CLI.
You can also click on the download button to install the connector. Once you’ve clicked on it, a zip file will start downloading on your system. Extract the zip file and copy all the jar files found in its lib folder into a directory that your Kafka Connect installation scans for plugins.
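If you go the manual route, the steps look roughly like this; the paths below are assumptions, so point them at your actual installation and make sure plugin.path in your Connect worker properties includes the target directory.
# Unzip the connector archive downloaded from Confluent Hub (file name is illustrative)
unzip debezium-debezium-connector-sqlserver-1.2.2.zip -d /tmp/debezium-sqlserver
# Copy the jars into a directory that Kafka Connect scans for plugins
mkdir -p /usr/local/confluent/share/java/debezium-connector-sqlserver
cp /tmp/debezium-sqlserver/*/lib/*.jar /usr/local/confluent/share/java/debezium-connector-sqlserver/
# Ensure the Connect worker config contains, for example: plugin.path=/usr/local/confluent/share/java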
This is how you can install the Debezium SQL Server connector that will help you connect Apache Kafka to SQL Server.
Step 4: Starting the Apache Kafka, Zookeeper & Connect Server
Once you’ve installed the Debezium SQL Server connector for Apache Kafka, you now need to start the Zookeeper, Kafka and schema registry. To do this, execute the following lines of code on different terminals:
./bin/zookeeper-server-start ./etc/kafka/zookeeper.properties
./bin/kafka-server-start ./etc/kafka/server.properties
./bin/schema-registry-start ./etc/schema-registry/schema-registry.properties
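Note that the Kafka Connect worker itself must also be running before you can register a connector over its REST API in the next step. Assuming the same Confluent Platform layout as the commands above, it can be started in its own terminal roughly like this:
./bin/connect-distributed ./etc/kafka/connect-distributed.properties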
In case you’re running Confluent Platform v5.3 or above, you can also use the following line of code:
confluent local start connect
Once you’ve started your Apache Kafka servers, you will have to create a JSON configuration for Kafka Connect that you can load using a POST call to the Connect REST API. It will instruct the Debezium connector to monitor your Microsoft SQL Server instance at the given host IP and port number. You can make the API call using the following lines of code:
curl -i -X POST -H "Accept:application/json" -H "Content-Type:application/json" localhost:8083/connectors/ -d '{
"name": "test-connector",
"config": {
"connector.class": "io.debezium.connector.sqlserver.SqlServerConnector",
"database.hostname": "{Private IP Address}",
"database.port": "1433",
"database.user": "testuser",
"database.password": "password!",
"database.dbname": "Test_DB",
"database.server.name": "Test_DB",
"table.whitelist": "dbo.prospects",
"database.history.kafka.bootstrap.servers": "localhost:9092",
"database.history.kafka.topic": "dbhistory.fulfillment"
}
}'
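If the inline JSON becomes hard to read, an equivalent approach is to save the configuration body to a file (the file name here is just an example) and point curl at it:
# Save the connector configuration as register-sqlserver.json, then post it to Kafka Connect
curl -i -X POST -H "Accept:application/json" -H "Content-Type:application/json" localhost:8083/connectors/ -d @register-sqlserver.json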
You can now verify if the Debezium Connector is running properly or not, using the following curl command:
curl -i http://localhost:8083/connectors/test-connector/status
This is how you can start the Apache Kafka, Zookeeper & Connect server.
Step 5: Integrating Apache Kafka to SQL Server to Start Ingesting Data
With Apache Kafka servers now up and running on your system, you can start ingesting data from Microsoft SQL Server. To do this, you will have to create a Consumer for your Apache Kafka server that will read data from the topic created for your Test_DB table. You can use the following line of code to create a Kafka Consumer:
./bin/kafka-console-consumer --bootstrap-server localhost:9092 --topic Test_DB.dbo.prospects --from-beginning
You will now be able to see that your Kafka Consumer starts fetching data from your Kafka Topic in the JSON format.
You can also verify whether Apache Kafka is consuming your data correctly by modifying your database and then checking your Kafka Consumer output. To do this, launch Windows PowerShell on your system and create a new SQL file named “new_insert.sql” with the following lines of code:
USE Test_DB;
INSERT INTO prospects(first_name,last_name,email)
VALUES ('Helen','Keyler','keyler.helen@wr.com');
Save your file and execute it using the following command:
sqlcmd -U testuser -P password! -i {location of new_insert.sql}
You will now be able to see Apache Kafka consuming new records from your Microsoft SQL Server table as follows:
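The exact output depends on your converter settings, but with plain JSON converters the change event for the new row looks roughly like this (abridged, with the schema, source, and timestamp fields trimmed for readability):
{
  "before": null,
  "after": {
    "id": 1004,
    "first_name": "Helen",
    "last_name": "Keyler",
    "email": "keyler.helen@wr.com"
  },
  "op": "c"
}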
This is how you can connect Apache Kafka to SQL Server using the Debezium SQL Server connector to start streaming your data seamlessly.
Limitations of using the Debezium SQL Server Connector:
- Using the Debezium SQL Server connector requires you to write custom code and make API calls, hence you must have strong technical knowledge.
- Apache Kafka Topics are case-sensitive, so you must be careful while making API calls to connect the Debezium SQL Server connector to avoid any failures/exceptions while streaming data.
- The Debezium SQL Server connector monitors and streams any data changes made to your source table; however, it cannot replicate any structure-based changes to your source table. It thus requires users to update the capture table manually whenever the source table structure changes (see the SQL sketch after this list).
- If any component in this system fails, there is a risk of data loss or compromised transactional integrity. It’s the responsibility of the development team to ensure data integrity and minimize the risk of any potential issues.
- Although Debezium offers multiple connectors, some of them have scalability issues. Even with Postgres, plugins such as wal2json can cause out-of-memory exceptions when converting write-ahead log output to JSON. Another problem is the inability to snapshot tables while still open for incoming events. For large tables, this means that there may be extended periods when the table is unavailable while snapshotting.
- Certain data connectors have limitations in supporting specific data types. For instance, the Oracle connector by Debezium has a limitation in handling BLOB data types as they are still in an incubation state. Hence, using BLOB data types in Debezium-based modules meant for production is not advisable.
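To illustrate the limitation above about structural changes, the usual workaround on the SQL Server side is to create a fresh capture instance for the changed table and retire the old one once downstream consumers have switched over. Below is a rough sketch, assuming the prospects table from the example and its default capture instance name:
-- After altering the source table, create a second capture instance for it
EXEC sys.sp_cdc_enable_table @source_schema = 'dbo', @source_name = 'prospects', @role_name = NULL, @supports_net_changes = 0, @capture_instance = 'dbo_prospects_v2';
GO
-- Once the connector and consumers have switched over, drop the old capture instance
EXEC sys.sp_cdc_disable_table @source_schema = 'dbo', @source_name = 'prospects', @capture_instance = 'dbo_prospects';
GO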
Conclusion
This article teaches you how to connect Apache Kafka to SQL Server with ease. It provides in-depth knowledge about the concepts behind every step to help you understand and implement them efficiently. The Kafka SQL Server integration is a powerful tool that supports various data formats and provides connectors for seamless integration with different data sources.
The Kafka SQL Server Connector configuration options also allow for fine-tuning the data ingestion parameters. These methods, however, can be challenging, especially for a beginner, and this is where Hevo saves the day.
FAQs on Connecting Kafka to SQL Server
How can Kafka be used with SQL Server?
1. Create a Kafka cluster.
2. Configure the Kafka Connect connector for SQL Server, namely the JDBC Source Connector, to pull data from SQL Server into Kafka (a sample configuration is sketched after this list).
3. Configure settings from the JDBC Source Connector on how to connect to SQL Server.
4. Now, Kafka consumers can do whatever needs to be done based on streamed data.
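As a rough illustration of step 2, a JDBC Source Connector configuration for SQL Server could look like the sketch below; the connection details, table, and topic names are placeholders, and the incrementing mode assumes the table has a monotonically increasing id column.
{
  "name": "sqlserver-jdbc-source",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "connection.url": "jdbc:sqlserver://localhost:1433;databaseName=Test_DB",
    "connection.user": "testuser",
    "connection.password": "{your password}",
    "table.whitelist": "prospects",
    "mode": "incrementing",
    "incrementing.column.name": "id",
    "topic.prefix": "sqlserver-"
  }
}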
What is Kafka in a Database?
Kafka is not a traditional database. It is a distributed streaming platform for building real-time data pipelines and streaming applications. It allows the real-time processing and integration of data from several sources, including databases.
What is Kafka Server Used For?
The Kafka server is used for:
- Real-time Data Streaming: capturing data in real time from different sources.
- Data Integration: integrating data across several systems and applications.
- Messaging System: acting as a publish-subscribe messaging system.
- Event Sourcing: storing and processing event logs.
What is the SQL Language in Kafka?
Kafka does not include a native SQL language. However, tools like KSQL (now ksqlDB) provide an SQL-like interface to process and query streams in Kafka. With KSQL, you can write queries that filter, transform, and aggregate streaming data.
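For instance, a simplified KSQL/ksqlDB session over a flat JSON topic could look roughly like this; the stream and column names are assumptions based on the example table, and Debezium’s nested change-event envelope would need to be flattened before such queries apply directly.
-- Register a Kafka topic as a queryable stream (column list abridged)
CREATE STREAM prospects_stream (first_name VARCHAR, last_name VARCHAR, email VARCHAR)
  WITH (KAFKA_TOPIC = 'Test_DB.dbo.prospects', VALUE_FORMAT = 'JSON');
-- Continuously filter and project the stream
SELECT first_name, email FROM prospects_stream WHERE last_name = 'Keyler' EMIT CHANGES;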
Divij Chawla is interested in data analysis, software architecture, and technical content creation. With extensive experience driving Marketing Operations and Analytics teams, he excels at defining strategic objectives, delivering real-time insights, and setting up efficient processes. His technical expertise includes developing dashboards, implementing CRM systems, and optimising sales and marketing workflows. Divij combines his analytical skills with a deep understanding of data-driven solutions to create impactful content and drive business growth.