SQL Server Streaming: Easy Way To Stream Data In Real-Time

on Data Integration, ETL, Tutorials • June 30th, 2020 • Write for Hevo

This article is aimed at introducing the concepts of streaming data, its benefits, and applications. Streaming data is beneficial in the scenarios where new dynamic data is generated continually and hence real-time streaming applications are becoming popular across a host of industries. The article begins with an introduction to the high-level concept of streaming data and then shifts to the practical implementation of using MS SQL server streaming.

Table Of Contents

Introduction To SQL Server

sql server streaming

It is a relational database developed by Microsoft to store the structured data. It is known as MS SQL. SQL Server is built on top of SQL (Structured Query Language) to interact with database objects. SQL Server is tied to Transact-SQL or T-SQL (Microsoft propriety language) for its programming interface, such as declaring the variable, stored procedure, exception handling, etc. Initially, SQL Server ran only on Windows Server and Windows devices, however now it is supported on Linux as well.

Key features of SQL server:

  • Scalable: It is easy to scale and supports large amounts of data.
  • Performance: It performs exceptionally well on both Windows & Linux.
  • Secure: It ensures data security & availability irrespective of whether the data is at rest or being worked on. It further supports data recovery during crashes/failures.
  • Analytics Support: MS SQL supports data analytics & machine learning. It uses languages like Python or R to perform such operations.
  • Integrations: It is very easy to integrate MS SQL with tools like Hadoop for big-data analytics using T-SQL commands.
  • Threading: MS SQL efficiently supports multi-threading & parallel processing even with a massive amount of data & ensures powerful analytics.

For further information, you can check the Microsoft site on SQL Server here.

Introduction To Streaming Data

Streaming data can be defined as the data that is generated continuously from a wide variety of sources. It is simultaneously transferred usually in small sizes (order of kilobytes) to be processed, analyzed in a sequential fashion. This means that the data flows in continuous streams that have no specified beginning or end but rather provide the opportunity for the data to be acted upon in real-time.

Analyzing streaming data can help organizations gain insights about various metrics such as server activity, website clicks, Geo-location of users, etc. This allows them to understand the trends and performance of their product in the market.

Some common examples of streaming data are log files generated by a web application, financial data from stock trading, location data from GPS applications, data from sensors in industrial equipment, etc.

Methods To Implement Real-Time Data Streaming In SQL Server

Method 1: SQL Server Streaming Using Apache Kafka

Apache Kafka is a distributed data store that is primarily used for ingesting and analysing streaming data. It provides a unified data platform that helps to develop real-time data pipelines and applications. Kafka connector API along with debezium can implement SQL Server streaming. This method makes use of custom scripts written by the user to stream data from the SQL Server by enabling change data capture feature.

Method 2: Using Hevo, The Easier Approach

Hevo Data provides a hassle-free & a fully managed solution using its No-code Data Pipelines. It helps you replicate the data effortlessly from SQL Server without any intervention. Hevo’s pre-built integration with SQL Server (among 100+ Sources) will take full charge of the data replication process, allowing you to focus on key business activities.

Get started today! Sign up here for a 14-day free trial!

Prerequisites

  • Working knowledge of SQL Server.
  • Kafka libs (v2.2.1).
  • A client machine to set up Debezium and Kafka connect API.
  • Debezium SQL Server plugin.

Comparing Batch & Stream Processing

Initially, data was always processed using the concept of batch processing. Batch processing requires data to be queried in chunks. This is usually done on the entire data or over most of the data. This data processing method inadvertently leads to latency which can vary from minutes to hours depending on the size of the data being processed. The Hadoop MapReduce framework is the best way to process data in batches.

Stream processing takes an alternative approach in which data processing is limited to the most recent or over a rolling time-window. This results in small-sized data, containing individual records or only a few records. The latency is almost unnoticeable as the data is processed within seconds or milliseconds. Real-time data applications, therefore, leverage stream processing in its architecture.

Methods To Implement Real-Time Data Streaming In SQL Server

Data can be streamed from the SQL Server in real-time using the following two methods:

Method 1: SQL Server Streaming Using Apache Kafka

Apache Kafka is a distributed data store that is primarily used for ingesting and analysing streaming data. It provides a unified data platform that helps to develop real-time data pipelines and applications. Kafka connector API along with debezium can implement SQL Server streaming. This method makes use of custom scripts written by the user to stream data from the SQL Server by enabling change data capture feature. This method can be implemented using the following steps:

Step 1: Setting Up CDC On SQL Server

Create a new database using the sqlcmd utility. Sqlcmd utility can be used to interact with the various database’s created by the users. Use the following command to create a new database called myDB and a table in it:

CREATE DATABASE MyDB;
GO
USE MyDB;
GO
CREATE TABLE MyTable(ID int, MyTitle nvarchar(max));

Now to enable change data capture (CDC) for the created database, use the following command:

USE MyDB 
GO 
EXEC sys.sp_cdc_enable_db 
GO

To enable CDC for the newly created table, you can use the following command:

USE MyDB 
GO 
EXEC sys.sp_cdc_enable_table 
@source_schema = N’dbo’, 
@source_name = N’MyTable’, 
@role_name = N’MyRole’, 
@supports_net_changes = 1 
GO

Now check whether the CDC configurations are done properly. You can use the following command:

USE MyDB
GO 
EXEC sys.sp_cdc_help_change_data_capture 
GO

For the CDC features to function properly, you must have a SQL Server agent service instance running. If you are using the SQL Server on the Windows platform, find the SQL Server application and right-click on it to start the agent service. If you are using the Linux platform, download the SQL Server agent and follow the instructions to install it properly. You can check the official documentation to guide you in the installation process.

This is how you can enable, CDC features for SQL Server streaming process.

Step 2: Setting Up The Client Machine

To set up the client machine, you first need to download & untar the Kafka libs. Use the following command to do so:

mkdir kafka/
cd kafka
wget https://archive.apache.org/dist/kafka/2.2.1/kafka_2.12-2.2.1.tgz
tar -xzf kafka_2.12-2.2.1.tgz

Kafka libs are based on java, therefore, ensure that java is installed on your system and added as an environment variable. You can install java using the following command if required:

sudo yum install java-1.8.0
export JAVA_HOME=/usr/lib/jvm/jre-1.8.0

Download the plug-in for the SQL Server connector and extract the files into your Kafka connect environment. Now, add the parent directory of the extracted plug-in(s) to the plugin path.

wget https://repo1.maven.org/maven2/io/debezium/debezium-connector-sqlserver/0.10.0.Final/debezium-connector-sqlserver-0.10.0.Final-plugin.tar.gz

tar -xzf debezium-connector-sqlserver-0.10.0.Final-plugin.tar.gz
mkdir plugins
mv debezium-connector-sqlserver plugins/

This is how you can set up the client machine for SQL Server streaming process.

Step 3: Configuring Kafka And Connector Properties

Now create a file caller worker_properties and append the following lines of code to it:

offset.storage.file.filename=/tmp/connect.offsets
bootstrap.servers=<bootstrap servers list>
offset.flush.interval.ms=10000
rest.port=10082
rest.host.name=<hostname>
rest.advertised.port=10082
rest.advertised.host.name=<hostname>
internal.key.converter=org.apache.kafka.connect.json.JsonConverter
internal.value.converter=org.apache.kafka.connect.json.JsonConverter
internal.key.converter.schemas.enable=false
internal.value.converter.schemas.enable=false
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
plugin.path=/home/plugins
#If kafka is TLS authenticated, uncomment below lines.
#security.protocol=SSL 
#ssl.truststore.location=/tmp/kafka.client.truststore.jks
#producer.security.protocol=SSL
#producer.ssl.truststore.location=/tmp/kafka.client.truststore.jks

Create a file called connector_properties and append the following lines of code to it:

name=sql-server-connection
connector.class=io.debezium.connector.sqlserver.SqlServerConnector
database.hostname=<DB Hostname>
database.port=<DB Port>
database.user=<DB User>
database.password=<DB Password>
database.dbname=<DB Name>
database.server.name=<Assign any name>
table.whitelist=<schema_name.table_name>
database.history.kafka.bootstrap.servers=<bootstrap servers list>
database.history.kafka.topic=dbhistory.history
#If kafka is TLS authenticated, uncomment below lines.
#database.history.producer.security.protocol=SSL
#database.history.producer.ssl.truststore.location=/tmp/kafka.client.truststore.jks

The Kafka connect configuration can be loaded via the REST API. The following example adds debezium SQL Server connector configuration in a JSON format. 

Example query:

curl -k -X POST -H "Accept:application/json" -H "Content-Type:application/json" https://localhost:8083/connectors/ -d “
{
  "name": "sqlserver-posts-connector",  
  "config": {
    "connector.class": "io.debezium.connector.sqlserver.SqlServerConnector", 
    "database.hostname": "102.5.232.115", 
    "database.port": "1433", 
    "database.user": "admin", 
    "database.password": "Supersecretpassword!", 
    "database.dbname": "TestDB", 
    "database.server.name": "staging", 
    "table.whitelist": "dbo.Posts", 
    "database.history.kafka.bootstrap.servers": "kafka:9092", 
    "database.history.kafka.topic": "dbhistory.Posts" 
  }
}
”

SQL Server and client machine are now ready to stream data. Start the process by enabling Kafka connect using the following command:

bin/connect-standalone.sh worker.properties connector.properties

This is how you can set up SQL Server streaming using Kafka to monitor data in real-time.

Method 2: Using Hevo Data, The Easier Approach

Hevo Data, a No-code Data Pipeline can help you replicate data from SQL Server (among 100+ sources) swiftly to a database/data warehouse of your choice. Hevo is fully-managed and automates the process of monitoring and replicating the changes on the secondary database rather than making the user write the code repeatedly. Its fault-tolerant architecture ensures that the data is handled in a secure, consistent manner with zero data loss. Hevo provides you with a truly efficient and fully-automated solution to replicate your data in real-time.

You can easily set up SQL Server streaming with Hevo in just 3 steps:

  • Authenticate & connect the SQL Server data source.
  • Select change data capture (CDC) as the replication mode.
  • Select and configure your desired destination.

Have a look at the amazing features of Hevo:

  • Data Transformation: It provides a simple interface to perfect, modify, and enrich the data you want to export. 
  • Hevo Is Built To Scale: As the number of sources and the volume of your data grows, Hevo scales horizontally, handling millions of records per minute with very little latency.
  • Live Support: The Hevo team is available round the clock to extend exceptional support to its customers through chat, email, and support call.
  • Minimal Learning: Hevo with its simple and interactive UI, is extremely simple for new customers to work on and perform operations.
  • Secure: Hevo has a fault-tolerant architecture that ensures that the data is handled in a secure, consistent manner with zero data loss.
  • Completely Managed Platform: Hevo is fully-managed. You need not invest time and effort to maintain or monitor the infrastructure involved in executing codes.

Benefits Of Real-Time Data Streaming

Enterprises leveraging big data need to implement a form of real-time data streaming as they cannot wait for data to be processed in batches & limit their ability to respond to changing events. Real-time data streaming allows reacting to the records being generated. This is especially useful when measures need to be taken for safeguarding the system from external attacks or any malicious use. A prime example of this is fraud detection.

Financial institutions use fraud detection to validate financial transactions in real-time. This domain can’t survive on batch processing of data as the fraud may have already been perpetrated before it is analyzed. It is therefore paramount for streaming data to be analyzed in real-time so that a suspicious transaction can be flagged before it occurs.

Another common application of real-time data streaming is in geo-location applications where content has must be updated or recommended to users based on their current location. In such applications, the geo-location data is implemented as streaming data to provide up-to-date changes in the records. These changes can then be propagated to other components which respond to them and update the offerings available to the user via the application’s user interface.

The examples explored above are a small subset of what is possible with real-time streaming data as it has found application in various verticals such as business analytics, retail, marketing, threat detection, etc.

Conclusion

This article enriches you with in-depth knowledge about the various concepts behind the streaming process in SQL Server. It provides a step-by-step comprehensive guide to help you perform SQL server streaming in the smoothest way possible. SQL Server streaming method can, however, prove to be challenging especially for beginners & this is where Hevo can help you out. Hevo Data, a No-code Data Pipeline can help you replicate data in real-time without having to write any code. Hevo being a fully-managed system provides a highly secure automated solution to help perform replication in just a few clicks using its interactive UI.

Want to give Hevo a try? Sign up for a 14-day free trial! Have a look at our unbeatable pricing, that will help you choose the right plan for you.

Share your experience of working with SQL Server streaming process. Get in touch with us in the comment section below.

No-code Data Pipeline For SQL Server