SQL Server Streaming: Easy Way To Stream Data In Real-Time

This article introduces the concepts of streaming data, its benefits, and its applications. Streaming data is beneficial in scenarios where new dynamic data is generated continually. Hence, real-time streaming applications are becoming popular across a host of industries.

Real-time data streaming from SQL Server will help you make fast, data-driven decisions on your transactional data. The article begins with an introduction to the high-level concept of streaming data and then shifts to the practical implementation of SQL Server streaming data. Let’s get started!

SQL Server Streaming Data. — Image Source: www.analyticsvidhya.com/blog

Streaming data can be defined as the data that is generated continuously from a wide variety of sources. It is simultaneously transferred usually in small sizes (order of kilobytes) to be processed, analyzed in a sequential fashion. This means that the data flows in continuous streams that have no specified beginning or end but rather provide the opportunity for the data to be acted upon in real-time.

Analyzing streaming data can help organizations gain insights about various metrics such as server activity, website clicks, Geo-location of users, etc. This allows them to understand the trends and performance of their product in the market.

Some common examples of streaming data are log files generated by a web application, financial data from stock trading, location data from GPS applications, data from sensors in industrial equipment, etc.

Method 1: SQL Server Streaming Using Apache Kafka

Apache Kafka is a distributed data store that is primarily used for ingesting and analysing streaming data. It provides a unified data platform that helps to develop real-time data pipelines and applications. Kafka connector API along with debezium can implement SQL Server streaming. This method makes use of custom scripts written by the user to stream data from the SQL Server by enabling change data capture feature.

Method 2: Using Hevo, The Easier Approach

Hevo Data provides a hassle-free & a fully managed solution using its No-code Data Pipelines. It helps you replicate the data effortlessly from SQL Server without any intervention. Hevo’s pre-built integration with SQL Server (among 150+ Sources) will take full charge of the data replication process, allowing you to focus on key business activities.

Get started with hevo for free

Table of Contents

Comparing Batch & Stream Processing

Batch & Real-time processing. — Image Source: www.medium.com

Initially, data was always processed using the concept of batch processing. Batch processing requires data to be queried in chunks. This is usually done on the entire data or over most of the data. This data processing method inadvertently leads to latency which can vary from minutes to hours depending on the size of the data being processed. The Hadoop MapReduce framework is the best way to process data in batches.

Stream processing takes an alternative approach in which data processing is limited to the most recent or over a rolling time-window. This results in small-sized data, containing individual records or only a few records. The latency is almost unnoticeable as the data is processed within seconds or milliseconds. Real-time data applications, therefore, leverage stream processing in its architecture.

Methods To Implement Real-Time Data Streaming In SQL Server

Real-time data streaming from SQL server can be done using the following two methods:

Method 1: SQL Server Streaming Using Apache Kafka
Method 2: Using Hevo, The Easier Approach

Method 1: SQL Server Streaming Using Apache Kafka

Image Source: www.vecta.io/symbols

Step 1: Setting Up CDC On SQL Server
Step 2: Setting Up The Client Machine
Step 3: Configuring Kafka And Connector Properties

Step 1: Setting Up CDC On SQL Server

Create a new database using the sqlcmd utility. Sqlcmd utility can be used to interact with the various database’s created by the users. Use the following command to create a new database called myDB and a table in it:

CREATE DATABASE MyDB;
GO
USE MyDB;
GO
CREATE TABLE MyTable(ID int, MyTitle nvarchar(max));

Now to enable change data capture (CDC) for the created database, use the following command:

USE MyDB 
GO 
EXEC sys.sp_cdc_enable_db 
GO

To enable CDC for the newly created table, you can use the following command:

USE MyDB 
GO 
EXEC sys.sp_cdc_enable_table 
@source_schema = N’dbo’, 
@source_name = N’MyTable’, 
@role_name = N’MyRole’, 
@supports_net_changes = 1 
GO

Now check whether the CDC configurations are done properly. You can use the following command:

USE MyDB
GO 
EXEC sys.sp_cdc_help_change_data_capture 
GO

For the CDC features to function properly, you must have a SQL Server agent service instance running. If you are using the SQL Server on the Windows platform, find the SQL Server application and right-click on it to start the agent service. If you are using the Linux platform, download the SQL Server agent and follow the instructions to install it properly. You can check the official documentation to guide you in the installation process.

This is how you can enable, CDC features for SQL Server streaming process.

Step 2: Setting Up The Client Machine

To set up the client machine, you first need to download & untar the Kafka libs. Use the following command to do so:

mkdir kafka/
cd kafka
wget https://archive.apache.org/dist/kafka/2.2.1/kafka_2.12-2.2.1.tgz
tar -xzf kafka_2.12-2.2.1.tgz

Kafka libs are based on java, therefore, ensure that java is installed on your system and added as an environment variable. You can install java using the following command if required:

sudo yum install java-1.8.0
export JAVA_HOME=/usr/lib/jvm/jre-1.8.0

Download the plug-in for the SQL Server connector and extract the files into your Kafka connect environment. Now, add the parent directory of the extracted plug-in(s) to the plugin path.

wget https://repo1.maven.org/maven2/io/debezium/debezium-connector-sqlserver/0.10.0.Final/debezium-connector-sqlserver-0.10.0.Final-plugin.tar.gz

tar -xzf debezium-connector-sqlserver-0.10.0.Final-plugin.tar.gz
mkdir plugins
mv debezium-connector-sqlserver plugins/

This is how you can set up the client machine for SQL Server streaming process.

Step 3: Configuring Kafka And Connector Properties

Now create a file caller worker_properties and append the following lines of code to it:

offset.storage.file.filename=/tmp/connect.offsets
bootstrap.servers=<bootstrap servers list>
offset.flush.interval.ms=10000
rest.port=10082
rest.host.name=<hostname>
rest.advertised.port=10082
rest.advertised.host.name=<hostname>
internal.key.converter=org.apache.kafka.connect.json.JsonConverter
internal.value.converter=org.apache.kafka.connect.json.JsonConverter
internal.key.converter.schemas.enable=false
internal.value.converter.schemas.enable=false
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
plugin.path=/home/plugins
#If kafka is TLS authenticated, uncomment below lines.
#security.protocol=SSL 
#ssl.truststore.location=/tmp/kafka.client.truststore.jks
#producer.security.protocol=SSL
#producer.ssl.truststore.location=/tmp/kafka.client.truststore.jks

Create a file called connector_properties and append the following lines of code to it:

name=sql-server-connection
connector.class=io.debezium.connector.sqlserver.SqlServerConnector
database.hostname=<DB Hostname>
database.port=<DB Port>
database.user=<DB User>
database.password=<DB Password>
database.dbname=<DB Name>
database.server.name=<Assign any name>
table.whitelist=<schema_name.table_name>
database.history.kafka.bootstrap.servers=<bootstrap servers list>
database.history.kafka.topic=dbhistory.history
#If kafka is TLS authenticated, uncomment below lines.
#database.history.producer.security.protocol=SSL
#database.history.producer.ssl.truststore.location=/tmp/kafka.client.truststore.jks

The Kafka connect configuration can be loaded via the REST API. The following example adds debezium SQL Server connector configuration in a JSON format.

Example query:

curl -k -X POST -H "Accept:application/json" -H "Content-Type:application/json" https://localhost:8083/connectors/ -d “
{
  "name": "sqlserver-posts-connector",  
  "config": {
    "connector.class": "io.debezium.connector.sqlserver.SqlServerConnector", 
    "database.hostname": "102.5.232.115", 
    "database.port": "1433", 
    "database.user": "admin", 
    "database.password": "Supersecretpassword!", 
    "database.dbname": "TestDB", 
    "database.server.name": "staging", 
    "table.whitelist": "dbo.Posts", 
    "database.history.kafka.bootstrap.servers": "kafka:9092", 
    "database.history.kafka.topic": "dbhistory.Posts" 
  }
}
”

SQL Server and client machine are now ready to stream data. Start the process by enabling Kafka connect using the following command:

bin/connect-standalone.sh worker.properties connector.properties

This is how you can set up SQL Server streaming using Kafka to monitor data in real-time.

Method 2: Using Hevo Data, The Easier Approach

Hevo Data, a No-code Data Pipeline can help you replicate data from SQL Server (among 100+ sources) swiftly to a database/data warehouse of your choice. Hevo is fully-managed and automates the process of monitoring and replicating the changes on the secondary database rather than making the user write the code repeatedly. Its fault-tolerant architecture ensures that the data is handled in a secure, consistent manner with zero data loss. Hevo provides you with a truly efficient and fully-automated solution to replicate your data in real-time.

You can easily set up SQL Server streaming with Hevo in just 3 steps:

Authenticate & connect the SQL Server data source.
Select change data capture (CDC) as the replication mode.
Select and configure your desired destination.

Have a look at the amazing features of Hevo:

Data Transformation: It provides a simple interface to perfect, modify, and enrich the data you want to export.
Hevo Is Built To Scale: As the number of sources and the volume of your data grows, Hevo scales horizontally, handling millions of records per minute with very little latency.
Live Support: The Hevo team is available round the clock to extend exceptional support to its customers through chat, email, and support call.
Minimal Learning: Hevo with its simple and interactive UI, is extremely simple for new customers to work on and perform operations.
Secure: Hevo has a fault-tolerant architecture that ensures that the data is handled in a secure, consistent manner with zero data loss.
Completely Managed Platform: Hevo is fully-managed. You need not invest time and effort to maintain or monitor the infrastructure involved in executing codes.

Example Use Cases for Streaming SQL

In the future, the use of streaming SQl will be expanded for anyone with the database. Presently, users with message brokers like Kafka can do so. Some examples of applications of SQL streaming are:

Business Intelligence and Analytics: Businesses use SQL streaming for coordination of time intervals in batch processing, dealing with inconsistency issues caused by caching and slow-loading dashboards.
Microservices: SQL streaming replaces code that does complex data coordination and transformation. Anything that appears like a query in microservices is replaced by SQL streaming.
Real-Time Applications: Streaming SQL can be effectively used in systems that require real-time and instant updates without any latency
New Capabilities: SQL streaming offers new features, such as business automation to make quicker decisions. It also facilitates user-facing real-time analytics, which was earlier done only by a few services such as LinkedIn, Google, etc.

Benefits Of Real-Time Data Streaming

Enterprises leveraging big data need to implement a form of real-time data streaming as they cannot wait for data to be processed in batches & limit their ability to respond to changing events. Real-time data streaming allows reacting to the records being generated. This is especially useful when measures need to be taken for safeguarding the system from external attacks or any malicious use. A prime example of this is fraud detection.

Financial institutions use fraud detection to validate financial transactions in real-time. This domain can’t survive on batch processing of data as the fraud may have already been perpetrated before it is analyzed. It is therefore paramount for streaming data to be analyzed in real-time so that a suspicious transaction can be flagged before it occurs.

Another common application of real-time data streaming is in geo-location applications where content has must be updated or recommended to users based on their current location. In such applications, the geo-location data is implemented as streaming data to provide up-to-date changes in the records. These changes can then be propagated to other components which respond to them and update the offerings available to the user via the application’s user interface.

The examples explored above are a small subset of what is possible with real-time streaming data as it has found application in various verticals such as business analytics, retail, marketing, threat detection, etc.

Conclusion

This article enriches you with in-depth knowledge about the various concepts behind the streaming process in SQL Server. It provides a step-by-step comprehensive guide to help you perform SQL server streaming in the smoothest way possible. SQL Server streaming method can, however, prove to be challenging especially for beginners & this is where Hevo can help you out. Hevo Data, a No-code Data Pipeline can help you replicate data in real-time without having to write any code. Hevo being a fully-managed system provides a highly secure automated solution to help perform replication in just a few clicks using its interactive UI.

visit our website to explore hevo

Want to give Hevo a try? sign up and have a look at our unbeatable Hevo pricing, that will help you choose the right plan for you.

Share your experience of working with SQL Server streaming process. Get in touch with us in the comment section below.

Ofem Eteng Technical Content Writer, Hevo Data

Ofem Eteng is a seasoned technical content writer with over 12 years of experience. He has held pivotal roles such as System Analyst (DevOps) at Dagbs Nigeria Limited and Full-Stack Developer at Pedoquasphere International Limited. He specializes in data science, data analytics and cutting-edge technologies, making him an expert in the data industry.