Batch Processing is essential for corporations and organizations that need to manage massive volumes of data effectively, and it is particularly well-suited to regular, repetitive tasks. SQL Server, Microsoft's relational database management system used by many Fortune 100 companies, supports Batch Processing in a variety of ways.
In this article, you will understand what Batch Processing is and gain some understanding of SQL Server Integration Services (SSIS). You will then learn several methods to set up SQL Batch Processing, and finally explore some of the key benefits each method offers. So, let's get started on setting up SQL Batch Processing.
Performance Test: Batch Mode Execution vs Row Mode Execution
In this test, we will compare the CPU usage of the same query executed in batch mode versus row mode. The SET STATISTICS TIME option provides an easy and convenient way to measure a query's CPU utilization. We will also use the DBCC DROPCLEANBUFFERS command so that each query runs against a cold buffer cache.
To make the comparison fair, we will run the two queries under identical conditions. First, we will execute the following SQL batch query example, which processes rows in batch mode:
SET STATISTICS TIME ON
GO
DBCC DROPCLEANBUFFERS
GO
SELECT ModifiedDate, CarrierTrackingNumber,
       SUM(OrderQty * UnitPrice) AS TotalRevenue
FROM Sales.SalesOrderDetailEnlarged
GROUP BY ModifiedDate, CarrierTrackingNumber
- Enable Execution Time Stats: `SET STATISTICS TIME ON` shows query execution times.
- Clear Cache: `DBCC DROPCLEANBUFFERS` clears the SQL Server buffer cache for accurate performance testing.
- Query Data: The `SELECT` statement retrieves and sums total sales revenue from `Sales.SalesOrderDetailEnlarged`, grouped by `ModifiedDate` and `CarrierTrackingNumber`.
We can parse the STATISTICS TIME output with Statistics Parser to turn it into a more readable report. Next, we will run the same query in row mode by disabling batch mode with a query hint:
SET STATISTICS TIME ON
GO
DBCC DROPCLEANBUFFERS
GO
SELECT ModifiedDate, CarrierTrackingNumber,
       SUM(OrderQty * UnitPrice) AS TotalRevenue
FROM Sales.SalesOrderDetailEnlarged
GROUP BY ModifiedDate, CarrierTrackingNumber
OPTION (USE HINT('DISALLOW_BATCH_MODE'))
- Enable Statistics: `SET STATISTICS TIME ON` enables the display of execution time statistics for the query.
- Clear Buffer Cache: `DBCC DROPCLEANBUFFERS` clears the buffer cache so the query reads fresh data from disk, giving a more accurate measurement of performance.
- Select and Aggregate Data: The `SELECT` statement retrieves `ModifiedDate` and `CarrierTrackingNumber`, calculating the total revenue (`SUM(OrderQty * UnitPrice)`) for each combination of the two columns.
- Group Results: Results are grouped by `ModifiedDate` and `CarrierTrackingNumber`.
- Use Hint: `OPTION(USE HINT('DISALLOW_BATCH_MODE'))` prevents the query from using batch mode processing, so it executes in row mode.
The performance results on a SQL Server 2019 environment can be summarized as follows:
Batch mode query processing performs noticeably better than row mode processing: CPU utilization in batch mode is over four times lower, and the elapsed time is about 2.5 times shorter.
In most cases, the elapsed time should be longer than the CPU time; however, that is not the case with these queries. The explanation for this seemingly odd result is that the query optimizer chose parallel execution plans, and in a parallel plan the reported CPU time is the sum of the CPU time consumed by all worker threads.
Tip: Adaptive Joins
The query optimizer chooses among several join methods when building an execution plan to join rows from different tables. When it uses the batch execution mode, the optimizer also considers the adaptive join operator. The primary advantage of this operator is that it defers the choice of join technique, either a hash join or a nested loops join, until runtime, based on a row count threshold. We will now run the following query and examine its execution plan:
SELECT ProductID,
       SUM(LineTotal)         AS TotalLine,
       SUM(UnitPrice)         AS TotalUnitPrice,
       SUM(UnitPriceDiscount) AS TotalDiscount
FROM Sales.SalesOrderDetailEnlarged SOrderDet
INNER JOIN Sales.SalesOrderHeaderEnlarged SalesOr
    ON SOrderDet.SalesOrderID = SalesOr.SalesOrderID
GROUP BY ProductID
- Selects Data: Retrieves `ProductID` along with the totals of `LineTotal`, `UnitPrice`, and `UnitPriceDiscount`.
- Joins Tables: Combines data from `SalesOrderDetailEnlarged` and `SalesOrderHeaderEnlarged` using `SalesOrderID`.
- Groups Results: Aggregates the results by `ProductID` to calculate totals for each product.
The execution plan uses the hash join technique because the row count exceeds the adaptive join row threshold.
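If you want to see how the same query behaves without the adaptive join operator, one option is to disable batch mode adaptive joins with a query hint and compare the two plans. This is a sketch that assumes SQL Server 2017 or later, where this hint is available:
-- Same query as above, but with the adaptive join operator disabled for this query only
SELECT ProductID,
       SUM(LineTotal)         AS TotalLine,
       SUM(UnitPrice)         AS TotalUnitPrice,
       SUM(UnitPriceDiscount) AS TotalDiscount
FROM Sales.SalesOrderDetailEnlarged SOrderDet
INNER JOIN Sales.SalesOrderHeaderEnlarged SalesOr
    ON SOrderDet.SalesOrderID = SalesOr.SalesOrderID
GROUP BY ProductID
OPTION (USE HINT('DISABLE_BATCH_MODE_ADAPTIVE_JOINS'))
With the hint in place, the plan commits to a single join algorithm at compile time instead of deciding at runtime, which makes it easy to contrast with the adaptive join plan.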
How to Set Up SQL Server Batch Processing?
In this section, you will learn three methods to work with SQL Batch Processing:
Method 1: SQL Server Batch Processing Using Hevo Data
Hevo supports several SQL Server variants as a Source. Choose your preferred variant and proceed to configure it as a Source. For the sake of this article, let's use Amazon RDS SQL Server as an example.
Step 1: Configure SQL Server as the Source:
- In the Navigation Bar, select PIPELINES.
- In the Pipelines List View, click + CREATE.
- Choose the SQL Server version on the Select Source Type screen.
- Since we are using Amazon RDS SQL Server in this post, specify the Pipeline Name, SQL Server Host, SQL Server Port, SQL Server User, and SQL Server Password on the Configure your Amazon RDS SQL Server Source page. Also choose an Ingestion Mode and specify the Schema Name, Database Name, and SSH Connection settings.
- Click TEST CONNECTION. This button becomes active once all required fields have been filled in. Hevo's built-in connectivity checker verifies the connection settings you have supplied.
- Click TEST & CONTINUE to proceed with configuring the Destination. This button also becomes active once all required fields have been filled in.
Step 2: Set up SQL Server as the Destination.
To configure SQL Server as the Destination in Hevo, follow these steps:
- In the Navigation Bar, choose DESTINATIONS > + CREATE.
- Choose SQL Server from the Add Destination screen.
- Provide the Destination Name, Database Host, Database Port, Database User, Database Password, and Database Name on the Configure your SQL Server Destination page.
- Click TEST CONNECTION. This button becomes active once all required fields have been filled in.
- Click SAVE & CONTINUE to finish the Destination setup. This button also becomes active once all required fields have been filled in.
Some Key Benefits Of Using Hevo
- Hevo eliminates the laborious process of managing schemas by automatically identifying the schema of incoming data and mapping it to the target schema.
- Hevo offers drag-and-drop and Python-based transformations to help you clean and prepare data before loading it to your destination.
- Hevo supports real-time transfer of changed data, which ensures efficient bandwidth use on both ends.
Method 2: Batch Mode on Rowstore
There are many situations where the data set and query characteristics are ideal for SQL Batch Processing, yet the table isn't a strong fit for columnstore indexes in SQL Server. Such situations include:
- It might be a table with a high update rate, which, if not managed effectively, can affect columnstore performance.
- It might be a legacy table constructed before columnstore indexes were available, or a table with a schema that can’t be modified.
To address these challenges, Batch Mode on Rowstore processing was introduced in SQL Server 2019. Instead of the typical row-by-row processing technique, this execution feature handles multiple rows at once, which improves speed and reduces CPU utilization for queries that perform aggregation, sorting, and group-by operations over a large amount of data.
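Batch Mode on Rowstore is used automatically when the database runs under compatibility level 150 or higher, and it can be controlled at the database or query level. The following is a minimal sketch, assuming SQL Server 2019 or later and a database named AdventureWorks (a hypothetical name, substitute your own):
-- Batch Mode on Rowstore requires database compatibility level 150 (SQL Server 2019) or higher
ALTER DATABASE AdventureWorks SET COMPATIBILITY_LEVEL = 150;
GO
-- Optionally turn the feature off or back on for the whole database
ALTER DATABASE SCOPED CONFIGURATION SET BATCH_MODE_ON_ROWSTORE = OFF;
ALTER DATABASE SCOPED CONFIGURATION SET BATCH_MODE_ON_ROWSTORE = ON;
GO
-- Or control it per query with a hint
SELECT ModifiedDate, SUM(OrderQty * UnitPrice) AS TotalRevenue
FROM Sales.SalesOrderDetailEnlarged
GROUP BY ModifiedDate
OPTION (USE HINT('ALLOW_BATCH_MODE'));  -- allow batch mode on rowstore for this query
After running a query, you can confirm which mode was used by checking the Actual Execution Mode property of the operators in the actual execution plan.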
Method 3: SSIS Batch Processing
SQL Server Integration Services (SSIS) works well with the existing components in its Toolbox to facilitate database Batch Processing. An easy way to perform SQL Batch Processing in SSIS is to group the rows to be processed into batches, process each batch, and then mark each batch as completed.
Let’s create a simple logical flow to perform SQL Batch Processing using the SSIS package to complete the task.
Step 1: Set Up the Database
You can choose any sample database and load it into your SQL Server. After uploading, check that all the rows are processed.
Step 2: Get a Batch List
In this step, you build a batch list: a query that divides the source data into batches and returns a result set with a single row for each batch, as sketched below.
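As an illustration (not part of any specific SSIS package), the following query builds a batch list from the sample table used earlier in this article, assigning rows to batches of 10,000 based on SalesOrderID; the batch size and key column are assumptions you would adapt to your own data:
-- One row per batch: batch number plus the key range it covers
SELECT
    (SalesOrderID - 1) / 10000 AS BatchID,
    MIN(SalesOrderID)          AS FromSalesOrderID,
    MAX(SalesOrderID)          AS ToSalesOrderID,
    COUNT(*)                   AS RowsInBatch
FROM Sales.SalesOrderDetailEnlarged
GROUP BY (SalesOrderID - 1) / 10000
ORDER BY BatchID;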
Step 3: Process Batch Loop
After creating the batch list, you set up a loop over it. You can use a Foreach Loop container that iterates over the rows of the result set, executing once for each batch.
Step 4: Create a Transaction Container
You need to set up a Sequence container. It includes the tasks that must be completed for each loop iteration. It also controls the transaction that will commit if everything goes well or roll back if something goes wrong.
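If you prefer to keep the commit/rollback logic in T-SQL rather than relying solely on the container's transaction settings, a common alternative (a sketch, not the only approach) is to wrap the per-batch work in an explicit transaction with TRY/CATCH:
BEGIN TRY
    BEGIN TRANSACTION;

    -- per-batch work goes here (append rows, run aggregations, mark the batch processed)

    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    -- undo the partial batch if anything fails, then surface the error to SSIS
    IF @@TRANCOUNT > 0
        ROLLBACK TRANSACTION;
    THROW;
END CATCH;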
Step 5: Append Batch
At this stage, you run a SQL task that extracts the current batch of rows and appends them to a history or staging table, as sketched below.
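A minimal sketch of such an append, assuming the batch key range is passed in from the loop as parameters and that a history table named Sales.SalesOrderDetailHistory exists (both the parameters and the table are assumptions for illustration):
-- @FromSalesOrderID and @ToSalesOrderID would be supplied by the Foreach Loop variables
DECLARE @FromSalesOrderID INT = 1,
        @ToSalesOrderID   INT = 10000;

INSERT INTO Sales.SalesOrderDetailHistory (SalesOrderID, SalesOrderDetailID, OrderQty, UnitPrice, ModifiedDate)
SELECT SalesOrderID, SalesOrderDetailID, OrderQty, UnitPrice, ModifiedDate
FROM Sales.SalesOrderDetailEnlarged
WHERE SalesOrderID BETWEEN @FromSalesOrderID AND @ToSalesOrderID;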
Step 6: Execute SQL Task
You can now run any SQL work the batch requires, such as an aggregation step that computes batch-level totals and updates an aggregation table (see the sketch below).
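For example, the step might roll the batch up by product and merge the totals into a summary table; the table Sales.ProductSalesSummary and its columns are hypothetical, used here only to show the shape of such a step:
-- Upsert batch-level totals into a (hypothetical) summary table
DECLARE @FromSalesOrderID INT = 1, @ToSalesOrderID INT = 10000;  -- supplied by the loop in practice

MERGE Sales.ProductSalesSummary AS target
USING (
    SELECT ProductID, SUM(OrderQty * UnitPrice) AS BatchRevenue
    FROM Sales.SalesOrderDetailEnlarged
    WHERE SalesOrderID BETWEEN @FromSalesOrderID AND @ToSalesOrderID
    GROUP BY ProductID
) AS src
    ON target.ProductID = src.ProductID
WHEN MATCHED THEN
    UPDATE SET target.TotalRevenue = target.TotalRevenue + src.BatchRevenue
WHEN NOT MATCHED THEN
    INSERT (ProductID, TotalRevenue) VALUES (src.ProductID, src.BatchRevenue);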
Step 7: Mark Batch as Processed
After you have completed all of the preceding stages, you run a final SQL task that marks the rows of the current batch as processed in the source table, as sketched below.
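A minimal sketch, assuming the source table has (or you add) a Processed flag column; the column name and the key range parameters are assumptions for illustration:
DECLARE @FromSalesOrderID INT = 1, @ToSalesOrderID INT = 10000;  -- supplied by the loop in practice

UPDATE Sales.SalesOrderDetailEnlarged
SET Processed = 1                                               -- hypothetical flag column
WHERE SalesOrderID BETWEEN @FromSalesOrderID AND @ToSalesOrderID;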
The above steps are just one way to approach your SQL Batch Processing using SSIS.
What are the Benefits of SQL Batch Processing?
Now that you have explored these methods of working with SQL Batch Processing, let's look at some of the benefits they offer.
Benefits of Batch Mode on Rowstore in SQL Server
- This method boosts the speed of analytical queries.
- It also lowers the CPU utilization for analytical queries.
- You can handle numerous rows simultaneously.
For queries that aggregate over a large number of rows, which is exactly what batch processing was built for, the gains in query performance and efficiency are substantial.
In the example from "Introducing Batch Mode on Rowstore" on the Microsoft Tech Community, the execution time for the same query, on the same hardware and data set, went from 10 seconds to 3 seconds. That is more than 3x faster with no modifications to the application or query!
Benefits of SQL Batch Processing using SSIS
- SSIS can read from a wide range of data sources, can easily transform data in memory, and can bulk load data without the need to stage it.
- SSIS operates as a distinct process from the database engine, allowing you to do many CPU-intensive processes without putting the database engine under strain. You can also run SSIS on a separate machine.
- Bulk load procedures can be readily scaled up to obtain exceptionally high throughput.
Conclusion
In this article, you learned various methods to set up SQL Batch Processing. You can use Hevo, Batch Mode on Rowstore, or SSIS to perform SQL Batch Processing. You also explored some of the notable benefits of each of these methods that help with batch processing in SQL Server.
However, knowing where to start and how to combine data from various applications to SQL Server can be a challenge for many companies. This is where Hevo can help save your day!
Hevo is the only real-time ELT No-code Data Pipeline platform that cost-effectively automates data pipelines that are flexible to your needs. It supports 150+ Data Sources such as SQL Server, including 40+ Free Sources, into your Data Warehouse to be visualized in a BI tool.
Try a 14-day free trial and experience the feature-rich Hevo suite firsthand. Also, check out our unbeatable pricing to choose the best plan for your organization.
FAQs
1. What is batch job in SQL?
A batch job in SQL is a collection of SQL commands executed together to carry out update, processing, or reporting tasks. It is typically scheduled to run automatically.
2. What is the difference between batch and transaction in SQL?
A batch in SQL is a group of commands executed together, while a transaction is a sequence of operations treated as a single unit of work, ensuring atomicity and consistency.
3. How to execute SQL query in batches?
To execute SQL queries in batches, divide the data or operations into smaller chunks using constructs such as TOP or OFFSET/FETCH in SQL Server (or LIMIT in other databases), or use a loop in a script to process each batch serially. A common T-SQL pattern is shown below.
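As a small illustration, here is a widely used T-SQL pattern for processing rows in fixed-size chunks; the table name, flag column, and batch size are assumptions you would replace with your own:
-- Delete (or otherwise process) rows in chunks of 5,000 until none remain
DECLARE @BatchSize INT = 5000;

WHILE 1 = 1
BEGIN
    DELETE TOP (@BatchSize)
    FROM dbo.StagingOrders          -- hypothetical table
    WHERE Processed = 1;            -- hypothetical flag column

    IF @@ROWCOUNT = 0
        BREAK;                      -- nothing left to process
END;
Processing in small chunks like this keeps each transaction short, which reduces locking and transaction log pressure compared to one large statement.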
Shubhnoor is a data analyst with a proven track record of translating data insights into actionable marketing strategies. She leverages her expertise in market research and product development, honed through experience across diverse industries and at Hevo Data. Currently pursuing a Master of Management in Artificial Intelligence, Shubhnoor is a dedicated learner who stays at the forefront of data-driven marketing trends. Her data-backed content empowers readers to make informed decisions and achieve real-world results.