SQL Server CDC (Change Data Capture): Easy Steps for Real-time Data Load

on Tutorials • March 13th, 2020 • Write for Hevo

This post discusses the steps to set up a robust Microsoft SQL Server CDC (Change Data Capture) in detail. Here a snapshot of what the blog will cover:

  1. Introduction to Microsoft SQL Server
  2. Understanding SQL Server Change Data Capture (CDC)
  3. SQL Server Incremental Data Load
  4. Steps to Implement Change Data Capture in SQL Server
  5. SQL Server CDC: Limitations of Building Custom Code
  6. Exploring an Easier Approach to SQL Server CDC

Before we get started, let us briefly understand SQL Server and its features. 

Understanding Microsoft SQL Server

Microsoft SQL Server is a Relational Database Management System (RDBMS). If you or the company you work for have data, you use it to create and manage different databases. SQL Server has been around for about 3 decades which makes it a very mature product with an immense amount of features. It offers superior data security, integrity, reliability, and performance. It also has great integration with other Microsoft products such as ASP.NET, Excel, Sharepoint, Silverlight, etc.

The rise in popularity of data warehouse systems has led to a scenario where a combination of SQL Server transactional database and a separate analytical data warehouse is a common sight in a business’ data stack. A common challenge found in such cases is in implementing continuous real-time sync between Microsoft SQL Server and the data warehouse so that the analytics engines always have up to date data.

This is accomplished using the change data capture paradigm. This post details how to implement Microsoft SQL Server CDC (SQL Server change data capture) to copy data to various destinations in real-time.

What is SQL Server Change Data Capture (CDC)?

SQL Server CDC or Change Data Capture is the process of capturing and recording changes made to the Microsoft SQL Server database. CDC records INSERT, UPDATE, and DELETE operations performed on a source table and then publishes this information to a target table. Change Data Capturing (CDC) is asynchronous by default. It can be applied when building caches, messaging, search engines, backups, and as part of a larger solution to alleviate system failures.

Where can we use CDC in Business Intelligence?

A typical use case is when you have a source database with user data and a Data Warehouse with Data Scientists doing analysis and reporting.

In most big projects, you will find yourself being tasked with maintaining audit trails for tables. Let’s take for example this Users table:

IDFirstNameLastNameEmail
1JorgeRamosramos@yahoo.com
2AndrewStudwickandrew@aol.com
3ThomasTucheltuchel@gmail.com

If someone changes Jorge to Juan, then you want to have some mechanism in place that maintains some kind of audit trail table where you can retrieve the old value Jorge and the new value Juan. In this way, CDC helps us to capture DELETE, INSERT, and UPDATE events on table data.

Methods to Implement SQL Server CDC (Change Data Capture)

Changes in SQL Server data can be captured using these two methods:

  1. Using the SQL Server Change Data Capture feature
  2. Using Hevo, a No-code Data Pipeline

In this post, we are going to explore the SQL Server Change Data Capture (CDC) method and the steps required to achieve this. Our focus will be on how to use the CDC feature with the SQL Server Integration Services (SSIS) to achieve incremental data load.

SQL Server Incremental Data Load

In Incremental Data Load, you want to make sure that you are only processing the most recent data (i.e.) Just the data since the last ETL data load. Obviously, you don’t want to process every transaction from inception to date, but rather, you want to only process rows that have changed since the last ETL load.

This process problem space is the one we refer to as Change Data Capture. The term refers to the fact that we want to capture just the changed data from the source database within a specified window of time. 

The best way to achieve this is by letting SQL Server tell you. The SQL Server CDC feature tracks which rows in the source have changed in a table or tables. This allows you to only read the rows that have changed in the source. These operations have minimal impact on the database since CDC utilizes SQL Server logs. 

Enabling CDC is a 2 step process:

  1. Database Level – Enable CDC on the database level.
  2. Table Level – Define specific tables on which to enable CDC.

Prerequisites:

  1. To enable CDC functionality for the database you are planning to track, you must have “sysadmin” privileges.
  2. You should ensure that you’re running SQL Server Developer, Enterprise, or Standard Edition. CDC functionality is not supported on the Web and Express Editions.
  3. Change Data Capture requires that a SQL Server Agent is running on a SQL Server instance.

Steps to Implement Change Data Capture in SQL Server:

1. Open SQL Server Management Studio and create a database.

-- Create a database
CREATE DATABASE Source_DB

USE [Source_DB];
GO
EXEC sp_changedbowner 'admin'
GO

2. Create a table.

-- Create a Users table
CREATE TABLE Users
(
    ID int NOT NULL PRIMARY KEY,
    FirstName varchar(30),
    LastName varchar(30),
    Email varchar(50)
)

3. Enable CDC on the database.

We are going to enable SQL Server on our ‘Source_DB’ by executing the following code:

Enabling SQL Server CDC

‘1’ means CDC is enabled and ‘0’ means it’s not enabled.

4. Define the specific table on which to enable CDC.


-- Enable CDC on a specific table
EXEC sys.sp_cdc_enable_table
@source_schema = 'dbo',
@source_name = 'Users',
@role_name = NULL,
@supports_net_changes = 1

-- Check that CDC is enabled in the table
EXECUTE sys.sp_cdc_help_change_data_capture
@source_schema = 'dbo',
@source_name = 'Users';
GO

This will enable CDC on the Users table under the ‘dbo’ schema.

5. Insert some values into the table.

INSERT INTO Users Values (1, 'Jorge', 'Ramos', 'ramos@yahoo.com')
INSERT INTO Users Values (2, 'Andrew', 'Strudwick', 'andrew@yahoo.com')
INSERT INTO USERS Values (3, 'Thomas', 'Tuchel', 'tuchel@gmail.com')

-- Query the results of the changes captured from the Users table
SELECT * FROM [cdc].[dbo_Users_CT]
GO

Here is the result-set of changes captured for the above changes:

Inserting values to SQL server - blog image

You can see from the results that we’ve got our values in. The value ‘2’ in the column _$operation means that was an insert and on the last 4 columns, you will see all the data that was inserted.

The first 3 columns show the log sequence number.

6. Verify that CDC is working.
Let’s make a few modifications, delete 2 rows and update 1 row.

DELETE FROM Users WHERE ID = 1
UPDATE Users SET LastName = 'Snow' WHERE ID = 2
DELETE FROM Users WHERE ID = 3

When we look at our CDC table, you’ll see that there are some additional records.

SELECT * FROM [cdc].[dbo_Users_CT] GO
SQL Server CDC - Verifying its working

Operation code 1 means it is a delete, 3 was the value before the change and 4 is the new change.

SQL Server CDC: Limitations of Building Custom Code

  1. Using the SQL Server CDC feature as an audit solution will demand a substantial amount of effort to maintain and administer properly. The greatest challenge lies in configuring how long data should be kept in the change table or even whether you should create new tables to store the new change data.
  2. Change data capture is not supported in databases that are isolated from the SQL Server instance that hosts the database.
  3. If the SQL Server Agent service is not running, CDC capture jobs will not execute.

An Easier Approach for SQL Server CDC

Hevo, a No-code Data Pipeline

To alleviate these bottlenecks, you should consider using a managed Data Pipeline solution like Hevo. Hevo can help you to load data from source tables into target tables without having to write or maintain complex code. Hevo’s point and click interface ensure the lowest time to production possible.

Here’s how simple it to set up SQL Server CDC with Hevo:

  • Authenticate and connect your SQL Server data source
  • Select Change Data Capture as your replication mode
  • Point to the destination where you want to move data

Hevo supports SQL Server CDC out of the box and provides an easy to use graphical interface for enabling this feature. Additionally, you will get a granular activity trail of all the changes made and synced to any target database or data warehouse

Watch this quick product video to understand how Hevo can seamlessly help you achieve SQL Server CDC:

Alternately, experience the power and simplicity of implementing change data capture in SQL Server by signing up for a 14-day free trial with Hevo.

What are your thoughts about setting up SQL Server CDC? Let us know in the comments?

No-code Data Pipeline For SQL Server