Microsoft SQL Server CDC (Change Data Capture): 5 Easy Steps

on Tutorials, Data Driven Strategies, Data Processing • January 6th, 2022 • Write for Hevo

SQL Server CDC - Change Data Capture in SQL Server

Data is now considered to be one of the most valuable assets of any organization. It makes transactions within a business easier and facilitates a smooth flow of operations. With organizations relying on evidence-based decision-making more than ever before, data also acts as a key decision-making tool. Almost all companies today, irrespective of their market position or size, leverage the data collected to analyze their business & customers, and thereby make smarter and informed business decisions that help drive business growth and profitability.

Hence, businesses are constantly looking for ways to manage their data better. One of the most widely used practices is Change Data Capture. This article will help you understand what Change Data Capture (CDC) is and how you can set up Microsoft SQL Server CDC (Change Data Capture). You can also read our article about CDC tools.

Table of Contents

What is Microsoft SQL Server

SQL Server CDC: Microsoft SQL Server Logo
Image Source: https://deepinthecode.com/2019/01/31/use-is-null-rather-than-null-when-upgrading-to-newer-versions-of-sql-server/

Microsoft SQL Server is a well-known Relational Database Management System (RDBMS). Considering it is an RDBMS, its primary purpose is to store and retrieve data as per the requirements of the users and the applications connected to it. Microsoft SQL Server houses an implementation for a wide range of functionalities including Transactional Processing, Business Intelligence, Data Analytics, etc. that are required by most businesses today. It is considered to be one of the top three leading Database Technologies along with Oracle Database and IBM DB2.

Along with support for ANSI SQL, which is the popular Structured Query Language (SQL), Microsoft SQL Server comes with its own implementation of SQL called Transact SQL (T-SQL). T-SQL supports additional capabilities that allow users to declare variables, handle Exceptions, etc. The main interface tool for Microsoft SQL Server is called SQL Server Management Studio (SSMS).

More information on Microsoft SQL Server can be found here.

Understanding the Key Features of Microsoft SQL Server

The key features of Microsoft SQL Server are as follows:

  • Accelerated Data Recovery: There might be situations wherein a complex transaction being run on the Microsoft SQL Server fails or the Server crashes while performing an operation. In these situations, the database has to be recovered to its initial state by rolling back all incomplete transactions and operations. This process was considered to be time-consuming until Microsoft’s implementation of Accelerated Data Recovery in Microsoft SQL Server. The process that performs database recovery in Microsoft SQL Server has been redeveloped using Transaction Logs which has led to a drastic decrease in the time taken.
  • Advanced Encryption: Considering that databases store sensitive information for most businesses, it is essential for Database Providers to ensure that their databases are secure. Microsoft SQL Server 2016 introduced a new Encryption Technology called AlwaysEncrypted that allowed transparent Column Encryption without giving Database Administrators access to Decryption Keys.
  • Intelligent Query Processing: Microsoft SQL Server houses a robust in-built Query Optimizer that generates an execution plan for the queries being executed. Various improvements have been made to this Query Optimizer to ensure that the databases provide the best in class performance to their users. These improvements include Dynamic Memory Grants for Rowstore Tables, Batch Mode on Rowstore, Table Variable Deferred Compilation, etc.
  • Advanced Analytics: All data stored in Microsoft SQL Server can be easily leveraged to perform a comprehensive analysis as per requirements using Microsoft SQL Server Analysis Service (SSAS) which is a Data Mining and an Online Analytical Processing (OLAP) tool in Microsoft SQL Server.
  • Advanced Integrations: Microsoft SQL Server also allows users to perform a broad range of tasks such as Data Extraction, Transformation, Loading, Data Migration, etc. using the Microsoft SQL Server Integration Services (SSIS). SSIS can be used to extract and transform data from various sources such as Relational Databases, Flat Files, XML Files, etc. using its Graphical User Interface (GUI) without writing a single line of code.

What is Microsoft SQL Server CDC (Change Data Capture)

Microsoft SQL Server CDC (Change Data Capture) is the process of capturing and recording changes made to the Microsoft SQL Server database. CDC records INSERT, UPDATE, and DELETE operations performed on a source table and then publishes this information to a target table. Microsoft SQL Server CDC (Change Data Capture) is asynchronous by default. It can be applied when building caches, messaging, search engines, backups, and as part of a larger solution to alleviate system failures.

A typical use case is when you have a source database with user data and a Data Warehouse with Data Scientists performing the necessary analysis and reporting their insights. In most big projects, you will find yourself being tasked with maintaining audit trails for tables. For example, consider the following table:

IDFirstNameLastNameEmail
1JorgeRamosramos@yahoo.com
2AndrewStudwickandrew@aol.com
3ThomasTucheltuchel@gmail.com

If someone changes Jorge to Juan, then you want to have some mechanism in place that maintains some kind of audit trail table where you can retrieve the old value Jorge and the new value Juan. In this way, Microsoft SQL Server CDC (Change Data Capture) helps us to capture DELETE, INSERT, and UPDATE events on table data.

Simplify Microsoft SQL Server ETL with Hevo’s No-code Data Pipelines

Hevo is a No-code Data Pipeline that offers a fully managed solution to set up data integration from Microsoft SQL Server and 100+ data sources (including 40+ free data sources) to numerous Business Intelligence tools, Data Warehouses, or a destination of choice. It will automate your data flow in minutes without writing any line of code. Its fault-tolerant architecture makes sure that your data is secure and consistent. Hevo provides you with a truly efficient and fully-automated solution to manage data in real-time and always have analysis-ready data.

Get started with hevo for free

Let’s look at Some Salient Features of Hevo:

  • Secure: Hevo has a fault-tolerant architecture that ensures that the data is handled in a secure, consistent manner with zero data loss.
  • Schema Management: Hevo takes away the tedious task of schema management & automatically detects schema of incoming data and maps it to the destination schema.
  • Minimal Learning: Hevo, with its simple and interactive UI, is extremely simple for new customers to work on and perform operations.
  • Hevo Is Built To Scale: As the number of sources and the volume of your data grows, Hevo scales horizontally, handling millions of records per minute with very little latency.
  • Incremental Data Load: Hevo allows the transfer of data that has been modified in real-time. This ensures efficient utilization of bandwidth on both ends.
  • Live Support: The Hevo team is available round the clock to extend exceptional support to its customers through chat, email, and support calls.
  • Live Monitoring: Hevo allows you to monitor the data flow and check where your data is at a particular point in time.
Sign up here for a 14-day free trial!

Understanding Microsoft SQL Server Incremental Data Load

In Incremental Data Load, you want to make sure that you are only processing the most recent data, i.e., just the data since the last ETL data load. Obviously, you don’t want to process every transaction from inception to date, but rather, you want to only process rows that have changed since the last ETL load.

This process problem space is the one that is referred to as Change Data Capture. The term refers to the fact that you want to capture just the changed data from the source database within a specified window of time. 

Microsoft SQL Server CDC (Change Data Capture) feature tracks which rows in the source have changed in a table or tables and only read those rows. These operations have minimal impact on the database since Microsoft SQL Server CDC (Change Data Capture) utilizes SQL Server logs. Microsoft SQL Server CDC (Change Data Capture) can either be enabled at the Database Level or at the Table Level.

Download the Ultimate Guide on Database Replication
Download the Ultimate Guide on Database Replication
Download the Ultimate Guide on Database Replication
Learn the 3 ways to replicate databases & which one you should prefer.

Installation and Architecture

SQL Server CDC needs a SQL Server Agent running on an SQL Server interface. When you enable the feature on a SQL Server database table, two SQL Server Agent jobs will get created for that specific database. The first job is responsible for populating the database change tables with the changed information. The second job cleans up the change tables by deleting the records older than the configurable retention period of 3 days.

Change Data Capture relies on the SQL Server Transaction Log as the source of the data changes. Whenever a change is applied, it gets written to the Transaction Log File. If the CDC feature is enabled on that table, the transaction log replication Log Reader agent, which serves as the capture process for the CDC feature, will read the changelogs from the Transaction Log File itself. On top of it, it adds the metadata information about these changes and writes it to the associated SQL Server CDC Change Tables, as depicted in the figure given below:

SQL Server CDC: Data Flow
Image Source

Change Data Capture as an Audit Solution

You can even leverage Change Data Capture as an asynchronous SQL Server Audit Solution. This can help you audit and track the table’s DML changes such as UPDATE, INSERT, and DELETE operations. It, however, does not provide any options for the SELECT statements. Change Data Capture is deemed as a good Audit Solution since it can be configured easily with the help of a couple of T-SQL commands. This provides you with historical information about the values before the modification process along with detailed information about the data modification process.

Enabling CDC (Change Data Capture)

Users can enable Microsoft SQL Server CDC (Change Data Capture) by implementing the following steps:

  • Step 1: Open Microsoft SQL Server Management Studio and create a new database by executing the following query:
-- Create a database
CREATE DATABASE Source_DB

USE [Source_DB];
GO
EXEC sp_changedbowner 'admin'
GO
  • Step 2: Create a table in the database created in the last step by executing the following query:
-- Create a Users table
CREATE TABLE Users
(
    ID int NOT NULL PRIMARY KEY,
    FirstName varchar(30),
    LastName varchar(30),
    Email varchar(50)
)
  • Step 3: You can now enable Microsoft SQL Server CDC (Change Data Capture) for the entire database by executing the following query:
-- ====  
-- Enable Database for CDC template   
-- ====  
USE Source_DB  
GO  
EXEC sys.sp_cdc_enable_db  
GO

This will enable Microsoft SQL Server CDC (Change Data Capture) on the Users table under the dbo Schema. You can also confirm if Change Data Capture (CDC) was enabled for a given table or not by executing the following query:

-- Check that CDC is enabled on the database
SELECT name, is_cdc_enabled
FROM sys.databases WHERE database_id = DB_ID();

You will see a response similar to the following:

SQL Server CDC: CDC Enable Check

Alternatively, you can enable Microsoft SQL Server CDC (Change Data Capture) for specific tables in the database by executing the following query:

-- =========  
-- Enable a Table Specifying Filegroup Option Template  
-- =========  
USE Source_DB  
GO  
  
EXEC sys.sp_cdc_enable_table  
@source_schema = N'dbo',  
@source_name   = N'Users',  
@role_name     = N'admin',  
@filegroup_name = N'MyDB_CT',  
@supports_net_changes = 1  
GO  
  • Step 4: Insert some values into the table by executing the following queries:
INSERT INTO Users Values (1, 'Jorge', 'Ramos', 'ramos@yahoo.com')
INSERT INTO Users Values (2, 'Andrew', 'Strudwick', 'andrew@yahoo.com')
INSERT INTO USERS Values (3, 'Thomas', 'Tuchel', 'tuchel@gmail.com')

-- Query the results of the changes captured from the Users table
SELECT * FROM [cdc].[dbo_Users_CT]
GO

Here is the result-set of changes captured for the above changes:

SQL Server CDC: CDC Insert Log

You can see from the results that the new rows have been inserted and changes have been captured. The value ‘2’ in the column _$operation means that was an insert and on the last 4 columns, you will see all the data that was inserted. The first 3 columns show the log sequence number.

  • Step 5: You can now verify if Microsoft SQL Server CDC (Change Data Capture) is working by making a few changes to the existing data in the table:
DELETE FROM Users WHERE ID = 1
UPDATE Users SET LastName = 'Snow' WHERE ID = 2
DELETE FROM Users WHERE ID = 3

You can view your Microsoft SQL Server CDC (Change Data Capture) table by executing the following query:

SELECT * FROM [cdc].[dbo_Users_CT] GO
SQL Server CDC: CDC Insert, Delete, Update Logs

Operation code 1 means that a delete operation was executed, 3 was the value before the change and 4 is the new change.

Disabling CDC

The Change Data Capture can easily be disabled on a given table with the help of the sys.sp_cdc_disable_table system stored procedure, as depicted below:

SQL Server CDC: DCD Disable Partial
Image Source

You can even disable SQL Server CDC completely at the database level, without the need to disable it on CDC-enabled tables one at a time. You can again use the sys.sp_cdc_disable_table system stored procedure, as depicted below:

SQL Server CDC: Complete Disabling
Image Source

Auditing DML Changes

Once you’ve enabled CDC on the database table, you can perform the database DML changes easily. To check the impact of these changes and how they will be audited with the help of the CDC feature, follow along with the steps mentioned below:

SQL Server CDC: Auditing DML Changes 1
Image Source
  • Step 1: As discussed previously, the data modifications get written in the Change table related to the CDC-enabled table. For this situation, it is the [cdc].[dbo_Employee_Main_CT] table. To look at all the records inserted recently to the source table, you can query the Change table for all operations with type 2 along with complete information about the INSERT operation. This includes the inserted values, as mentioned below:
SQL Server CDC: Auditing DML Changes 2
Image Source
  • Step 2: By querying the Change table for all the operations with type 1, will result in all the records being deleted recently from the source table, with the values of the deleted records, will be shown as follows:
SQL Server CDC: Auditing DML Changes 3
Image Source
  • Step 3: Finally, you can query the Change table for the operations with types 3 and 4. This allows you to track the UPDATE statement and display the values before the update, under operation type 3. This would be followed by the value after the change, as mentioned under operation type 4.
SQL Server CDC: Auditing DML Changes 4
Image Source
  • Step 4: Microsoft, however, does not recommend querying the Change tables. Instead, you can query the CDC.fn_cdc_get_all_changes system function related to the SQL Server CDC-enabled table as shown:
SQL Server CDC: Auditing DML Changes 5
Image Source
  • Step 5: The CDC.fn_cdc_get_all_changes function can be queried as long as you provide the @row_filter_option, @from_lsn, and @to_lsn parameters. These parameters will be instrumental in retrieving all the DML changes information in the SQL Server CDC table, as depicted below:
SQL Server CDC: Auditing DML Changes 6
Image Source

Limitations of Microsoft SQL Server CDC (Change Data Capture)

The limitations associated with Microsoft SQL Server CDC (Change Data Capture) are as follows:

  • Using the Microsoft SQL Server CDC (Change Data Capture) feature as an audit solution will demand a substantial amount of effort to maintain and administer properly. The greatest challenge lies in configuring how long data should be kept in the Change Table or even whether you should create new tables to store the new change data.
  • Microsoft SQL Server CDC (Change Data Capture) is not supported in databases that are isolated from the Microsoft SQL Server instance that hosts the database.
  • If the Microsoft SQL Server Agent service is not running, Microsoft SQL Server CDC (Change Data Capture) capture jobs will not execute.

Conclusion

This article helped you understand how you can set up Microsoft SQL Server CDC (Change Data Capture) with ease. It provides in-depth knowledge about the concepts behind every step to help you understand and implement them efficiently.

Most businesses today, however, have an extremely high volume of data with a dynamic structure. Creating a Data Pipeline from scratch for such data is a complex process since businesses will have to utilize a high amount of resources to develop it and then ensure that it can keep up with the increased data volume and Schema variations. Businesses can instead use automated platforms like Hevo.

visit our website to explore hevo

Hevo helps you directly transfer data from a source of your choice to a Data Warehouse or desired destination in a fully automated and secure manner without having to write the code or export data repeatedly. It will make your life easier and make data migration hassle-free. It is User-Friendly, Reliable, and Secure.

Details on Hevo pricing can be found here. Give Hevo a try and sign up for a 14-Day free trial.

No-code Data Pipeline For SQL Server