Data is now considered to be one of the most valuable assets of any organization. It makes transactions within a business easier and facilitates a smooth flow of operations. With organizations relying on evidence-based decision-making more than ever before, data also acts as a key decision-making tool. Almost all companies today, irrespective of their market position or size, leverage the data collected to analyze their business & customers, and thereby make smarter and informed business decisions that help drive business growth and profitability.
Hence, businesses are constantly looking for ways to manage their data better. One of the most widely used practices is Change Data Capture. This article will help you understand what Change Data Capture (CDC) is and how you can set up Change Data Capture Microsoft SQL Server CDC. You can also read our article about CDC tools.
What is Microsoft SQL Server?
Microsoft SQL Server is a well-known Relational Database Management System (RDBMS). Considering it is an RDBMS, its primary purpose is to store and retrieve data as per the requirements of the users and the applications connected to it. Microsoft SQL Server houses an implementation for a wide range of functionalities including Transactional Processing, Business Intelligence, Data Analytics, etc. that are required by most businesses today. It is considered to be one of the top three leading Database Technologies along with Oracle Database and IBM DB2.
MSSQL CDC (Change Data Capture) is essential for real-time data replication and synchronization. Try Hevo’s no-code platform and see how Hevo has helped customers across 45+ countries by offering:
- Real-time data replication with ease.
- CDC Query Mode for capturing both inserts and updates.
- 150+ connectors(including 60+ free sources)
Don’t just take our word for it—listen to customers, such as Thoughtspot, Postman, and many more, to see why we’re rated 4.3/5 on G2.
Get Started with Hevo for Free
Along with support for ANSI SQL, which is the popular Structured Query Language (SQL), Microsoft SQL Server comes with its own implementation of SQL called Transact SQL (T-SQL). T-SQL supports additional capabilities that allow users to declare variables, handle Exceptions, etc. The main interface tool for Microsoft SQL Server is called SQL Server Management Studio (SSMS).
More information on Microsoft SQL Server can be found here.
Key Features of Microsoft SQL Server
The key features of Microsoft SQL Server are as follows:
- Accelerated Data Recovery: There might be situations wherein a complex transaction being run on the Microsoft SQL Server fails or the Server crashes while performing an operation. In these situations, the database has to be recovered to its initial state by rolling back all incomplete transactions and operations. This process was considered to be time-consuming until Microsoft’s implementation of Accelerated Data Recovery in Microsoft SQL Server. The process that performs database recovery in Microsoft SQL Server has been redeveloped using Transaction Logs which has led to a drastic decrease in the time taken.
- Advanced Encryption: Considering that databases store sensitive information for most businesses, it is essential for Database Providers to ensure that their databases are secure. Microsoft SQL Server 2016 introduced a new Encryption Technology called AlwaysEncrypted that allowed transparent Column Encryption without giving Database Administrators access to Decryption Keys.
- Intelligent Query Processing: Microsoft SQL Server houses a robust in-built Query Optimizer that generates an execution plan for the queries being executed. Various improvements have been made to this Query Optimizer to ensure that the databases provide the best in class performance to their users. These improvements include Dynamic Memory Grants for Rowstore Tables, Batch Mode on Rowstore, Table Variable Deferred Compilation, etc.
- Advanced Analytics: All data stored in Microsoft SQL Server can be easily leveraged to perform a comprehensive analysis as per requirements using Microsoft SQL Server Analysis Service (SSAS) which is a Data Mining and an Online Analytical Processing (OLAP) tool in Microsoft SQL Server.
- Advanced Integrations: Microsoft SQL Server also allows users to perform a broad range of tasks such as Data Extraction, Transformation, Loading, Data Migration, etc. using the Microsoft SQL Server Integration Services (SSIS). SSIS can be used to extract and transform data from various sources such as Relational Databases, Flat Files, XML Files, etc. using its Graphical User Interface (GUI) without writing a single line of code.
What is SQL Server CDC (Change Data Capture)?
SQL Server CDC (Change Data Capture) is the process of capturing and recording changes made to the Microsoft SQL Server database. CDC records INSERT, UPDATE, and DELETE operations performed on a source table and then publishes this information to a target table. Microsoft SQL Server CDC (Change Data Capture) is asynchronous by default. It can be applied when building caches, messaging, search engines, backups, and as part of a larger solution to alleviate system failures.
A typical use case is when you have a source database with user data and a Data Warehouse with Data Scientists performing the necessary analysis and reporting their insights. In most big projects, you will find yourself being tasked with maintaining audit trails for tables. For example, consider the following table:
ID | FirstName | LastName | Email |
1 | Jorge | Ramos | ramos@yahoo.com |
2 | Andrew | Studwick | andrew@aol.com |
3 | Thomas | Tuchel | tuchel@gmail.com |
If someone changes Jorge to Juan, then you want to have some mechanism in place that maintains some kind of audit trail table where you can retrieve the old value Jorge and the new value Juan. In this way, Microsoft SQL Server CDC (Change Data Capture) helps us to capture DELETE, INSERT, and UPDATE events on table data.
Understanding Microsoft SQL Server Incremental Data Load
In Incremental Data Load, you want to make sure that you are only processing the most recent data, i.e., just the data since the last ETL data load. Obviously, you don’t want to process every transaction from inception to date, but rather, you want to only process rows that have changed since the last ETL load.
This process problem space is the one that is referred to as Change Data Capture. The term refers to the fact that you want to capture just the changed data from the source database within a specified window of time.
Microsoft SQL Server CDC (Change Data Capture) feature tracks which rows in the source have changed in a table or tables and only reads those rows. These operations have minimal impact on the database since Microsoft SQL Server CDC (Change Data Capture) utilizes SQL Server logs. Microsoft SQL Server CDC (Change Data Capture) can either be enabled at the Database Level or at the Table Level.
Download the Ultimate Guide on Database Replication
Learn the 3 ways to replicate databases & which one you should prefer.
Installation and Architecture
SQL Server CDC needs a SQL Server Agent running on an SQL Server interface. When you enable the feature on a SQL Server database table, two SQL Server Agent jobs will get created for that specific database. The first job is responsible for populating the database change tables with the changed information. The second job cleans up the change tables by deleting the records older than the configurable retention period of 3 days.
Change Data Capture relies on the SQL Server Transaction Log as the source of the data changes. Whenever a change is applied, it gets written to the Transaction Log File. If the CDC feature is enabled on that table, the transaction log replication Log Reader agent, which serves as the capture process for the CDC feature, will read the changelogs from the Transaction Log File itself. On top of it, it adds the metadata information about these changes and writes it to the associated SQL Server CDC Change Tables, as depicted in the figure given below:
Change Data Capture as an Audit Solution
You can even leverage Change Data Capture as an asynchronous SQL Server Audit Solution. This can help you audit and track the table’s DML changes such as UPDATE, INSERT, and DELETE operations. It, however, does not provide any options for the SELECT statements. Change Data Capture is deemed as a good Audit Solution since it can be configured easily with the help of a couple of T-SQL commands. This provides you with historical information about the values before the modification process along with detailed information about the data modification process.
How to Enable CDC (Change Data Capture): 5 Steps
Users can enable Microsoft SQL Server CDC (Change Data Capture) by implementing the following steps:
- Step 1: Open Microsoft SQL Server Management Studio and create a new database by executing the following query:
-- Create a database
CREATE DATABASE Source_DB
USE [Source_DB];
GO
EXEC sp_changedbowner 'admin'
GO
- Step 2: Create a table in the database created in the last step by executing the following query:
-- Create a Users table
CREATE TABLE Users
(
ID int NOT NULL PRIMARY KEY,
FirstName varchar(30),
LastName varchar(30),
Email varchar(50)
)
- Step 3: You can now enable Microsoft SQL Server CDC (Change Data Capture) for the entire database by executing the following query:
-- ====
-- Enable Database for CDC template
-- ====
USE Source_DB
GO
EXEC sys.sp_cdc_enable_db
GO
This will enable Microsoft SQL CDC (Change Data Capture) on the Users table under the dbo Schema. You can also confirm if Change Data Capture (CDC) was enabled for a given table or not by executing the following query:
-- Check that CDC is enabled on the database
SELECT name, is_cdc_enabled
FROM sys.databases WHERE database_id = DB_ID();
You will see a response similar to the following:
Alternatively, you can enable Microsoft SQL Server CDC (Change Data Capture) for specific tables in the database by executing the following query:
-- =========
-- Enable a Table Specifying Filegroup Option Template
-- =========
USE Source_DB
GO
EXEC sys.sp_cdc_enable_table
@source_schema = N'dbo',
@source_name = N'Users',
@role_name = N'admin',
@filegroup_name = N'MyDB_CT',
@supports_net_changes = 1
GO
- Step 4: Insert some values into the table by executing the following queries:
INSERT INTO Users Values (1, 'Jorge', 'Ramos', 'ramos@yahoo.com')
INSERT INTO Users Values (2, 'Andrew', 'Strudwick', 'andrew@yahoo.com')
INSERT INTO USERS Values (3, 'Thomas', 'Tuchel', 'tuchel@gmail.com')
-- Query the results of the changes captured from the Users table
SELECT * FROM [cdc].[dbo_Users_CT]
GO
Here is the result-set of changes captured for the above changes:
You can see from the results that the new rows have been inserted and changes have been captured. The value ‘2’ in the column _$operation means that was an insert and on the last 4 columns, you will see all the data that was inserted. The first 3 columns show the log sequence number.
- Step 5: You can now verify if Microsoft SQL Server CDC (Change Data Capture) is working by making a few changes to the existing data in the table:
DELETE FROM Users WHERE ID = 1
UPDATE Users SET LastName = 'Snow' WHERE ID = 2
DELETE FROM Users WHERE ID = 3
You can view your Microsoft SQL Server CDC (Change Data Capture) table by executing the following query:
SELECT * FROM [cdc].[dbo_Users_CT] GO
Operation code 1 means that a delete operation was executed, 3 was the value before the change and 4 is the new change.
Integrate MS SQL Server to BigQuery
Integrate MS SQL Server to Databricks
Integrate MS SQL Server to Redshift
Disabling CDC
The Change Data Capture can easily be disabled on a given table with the help of the sys.sp_cdc_disable_table system stored procedure, as depicted below:
You can even disable SQL Server CDC completely at the database level, without the need to disable it on CDC-enabled tables one at a time. You can again use the sys.sp_cdc_disable_table system stored procedure, as depicted below:
Auditing DML Changes
Once you’ve enabled CDC on the database table, you can perform the database DML changes easily. To check the impact of these changes and how they will be audited with the help of the CDC feature, follow along with the steps mentioned below:
- Step 1: As discussed previously, the data modifications get written in the Change table related to the CDC-enabled table. For this situation, it is the [cdc].[dbo_Employee_Main_CT] table. To look at all the records inserted recently to the source table, you can query the Change table for all operations with type 2 along with complete information about the INSERT operation. This includes the inserted values, as mentioned below:
- Step 2: By querying the Change table for all the operations with type 1, will result in all the records being deleted recently from the source table, with the values of the deleted records, will be shown as follows:
- Step 3: Finally, you can query the Change table for the operations with types 3 and 4. This allows you to track the UPDATE statement and display the values before the update, under operation type 3. This would be followed by the value after the change, as mentioned under operation type 4.
- Step 4: Microsoft, however, does not recommend querying the Change tables. Instead, you can query the CDC.fn_cdc_get_all_changes system function related to the SQL Server CDC-enabled table as shown:
- Step 5: The CDC.fn_cdc_get_all_changes function can be queried as long as you provide the @row_filter_option, @from_lsn, and @to_lsn parameters. These parameters will be instrumental in retrieving all the DML changes information in the SQL Server CDC table, as depicted below:
SQL Server Change Data Capture: Pros & Cons
Limitations of SQL Server CDC (Change Data Capture)
The limitations associated with Microsoft SQL Server CDC (Change Data Capture) are as follows:
- Using the SQL Server CDC (Change Data Capture) feature as an audit solution will demand a substantial amount of effort to maintain and administer properly. The greatest challenge lies in configuring how long data should be kept in the Change Table or even whether you should create new tables to store the new change data.
- Microsoft SQL Server CDC (Change Data Capture) is not supported in databases that are isolated from the Microsoft SQL Server instance that hosts the database.
- If the Microsoft SQL Server Agent service is not running, Microsoft SQL Server CDC (Change Data Capture) capture jobs will not execute.
Migrate Data from SQL Server seamlessly Within Minutes!
No credit card required
Advantages of SQL Server CDC (Change Data Capture)
The advantages of SQL Server CDC (Change Data Capture) are as follows:
- The advantage of SQL Server Change Data Capture is that the majority of SQL Server replication mechanisms call for the tables to have a primary key. Since incremental data and real-time replication may be accomplished without primary keys on the tables to be duplicated, CDC will support tables without a primary key, and this is the use case it is most beneficial for.
- As CDC data is duplicated to the replicas, the CDC technique can be used for Always On Availability Groups on SQL Server.
SQL Server CDC Performance: Best Practices
- All tables enabled for CDC should have primary keys. CDC relies on these keys to track changes efficiently.
- Creating nonclustered indexes on frequently used columns in CDC queries may be worthwhile for their performance improvement.
- CDC captures all changes against the transaction log. The log size should be large enough to support a high volume of transactions without becoming too large.
- Depending on the workload, work out an appropriate CDC capture job frequency; for example, high transactional systems may require a more frequent schedule to reduce latency.
Conclusion
This article will help you understand how you can set up SQL Server CDC (Change Data Capture) with ease. It provides in-depth knowledge about the concepts behind every step to help you understand and implement them efficiently.
Most businesses today, however, have an extremely high volume of data with a dynamic structure. Creating a data pipeline from scratch for such data is a complex process since businesses will have to utilize a large amount of resources to develop it and then ensure that it can keep up with the increased data volume and schema variations. Businesses can instead use automated platforms like Hevo.
Hevo helps you directly transfer data from a source of your choice to a Data Warehouse or desired destination in a fully automated and secure manner without having to write the code or export data repeatedly. It will make your life easier and make data migration hassle-free. It is User-Friendly, Reliable, and Secure. You can schedule a personalized demo to learn more about Hevo.
FAQ on SQL Server CDC
What is CDC in SQL Server?
Change Data Capture (CDC) in SQL Server is a feature that tracks and captures changes (inserts, updates, and deletes) in tables. It logs these changes in separate tables, allowing you to replicate or analyze them without affecting the original source tables.
What is the role of CDC in SQL?
CDC enables you to track changes made to SQL Server tables over time, making it useful for auditing, synchronization, and data replication tasks. It helps ensure that data changes are captured and made available for further processing, especially in ETL and replication scenarios.
How do I enable CDC in SQL Server?
1. Ensure SQL Server Agent is running.
2. Enable CDC at the database level using the command:
EXEC sys.sp_cdc_enable_db;
3. Enable CDC for specific tables using:
EXEC sys.sp_cdc_enable_table @source_schema = 'schema_name', @source_name = 'table_name', @role_name = NULL;
This tracks the changes for the given table.
Vivek Sinha is a seasoned product leader with over 10 years of expertise in revolutionizing real-time analytics and cloud-native technologies. He specializes in enhancing Apache Pinot, focusing on query processing and data mutability. Vivek is renowned for his strategic vision and ability to deliver cutting-edge solutions that empower businesses to harness the full potential of their data.