What is Snowflake Change Data Capture (CDC)? | Simplified

on CDC, Change Data Capture, Data Integration, Data Replication, Data Warehouses, Snowflake • June 27th, 2022 • Write for Hevo

Snowflake Change Data Capture Featured Image | Hevo Data

In today’s world, data in an operational system changes quite frequently. Traditional Data Warehouses, on the other hand, do not contain up-to-date data. Up-to-date data is stored in operational systems before being put into the Data Warehouse at a set frequency. Snowflake is a modern Data Warehouse that has become an industry-leading Cloud-Based SaaS (Software-as-a-service) Data Platform. Snowflake Change Data Capture (CDC) makes it easy to detect new changes/data in a table for efficient processing.

In today’s data-driven economy, organizations are relying on evidence-based decision-making more than ever before. Today, almost all companies collect data to better understand their business and customers. The valuable insights extracted from the company data help in making smarter and informed business decisions that help drive business growth and profitability. Hence, it becomes increasingly important for companies to manage their data in a better and more efficient way, and this is where Change Data Capture comes in.

This article is all about Snowflake Change Data Capture and how it keeps you updated with the current data changes. But before getting started with Snowflake Change Data Capture, let’s discuss Snowflake and its salient features in brief.

Table of Contents

What is Snowflake?

Snowflake Change Data Capture: Snowflake Logo | Hevo Data
Image Source: www.commons.wikimedia.org

Snowflake is a modern Cloud Data Warehousing solution provided as a SaaS offering. It is built on Amazon Web Service, Microsoft Azure, or Google Cloud infrastructure that provides an unbounded platform for storing and retrieving data. Snowflake Data Warehouse makes use of a proprietary SQL Database Engine with a Cloud-specific architecture.

Snowflake requires no hardware or software to install, configure, or manage, making it suitable for enterprises that do not want to devote resources specifically to in-house server setup, maintenance, or support. Snowflake’s security and sharing features make it easy for businesses to securely exchange and share data in real-time using any ETL solution. Snowflake’s architecture allows for Big Data flexibility. When compared to other Data Warehouses in the market, Snowflake is noted for its scalability and relative ease of use.

Key Features of Snowflake

Below are the reasons responsible for the immense popularity of Snowflake.

  • Caching Paradigm: Snowflake employs a Caching Paradigm to quickly deliver results from the cache. It leverages Persistent Query results to avoid re-generation of the report.
  • Standard and Extended SQL Support: Snowflake supports most of the DDL and DML commands of SQL for querying data. In addition to that, it also supports advanced DML transactions, stored procedures, lateral views, etc.
  • Scalability: It is a well-known fact that Snowflake’s architecture separates its “Compute” and “Storage” units, thereby allowing them to scale differently. As a result, customers can pay only for the services used.
  • Security: Snowflake supports a variety of improved mechanisms for authentication. This includes Two-Factor Authentication and SSO via Federated Authentication.
  • Semi-Structured Data Support: Snowflake’s architecture supports the VARIANT schema on the Read data type that enables the storage of Structured and Semi-Structured data in the same area.

Replicate Data in Snowflake in Minutes Using Hevo’s No-Code Data Pipeline

Hevo Data, a Fully-managed Data Pipeline platform, can help you automate, simplify & enrich your data replication process in a few clicks. With Hevo’s wide variety of connectors and blazing-fast Data Pipelines, you can extract & load data from 100+ Data Sources straight into Snowflake or a Data Warehouse of your choice. Streamline your Snowflake Change Data Capture and prepare your data for analysis using Hevo’s robust & built-in Transformation Layer without writing a single line of code!

Get started with hevo for free

Hevo is the fastest, easiest, and most reliable data replication platform that will save your engineering bandwidth and time multifold. Try our 14-day full access free trial today to experience an entirely automated hassle-free Data Replication!

What is Change Data Capture (CDC)?

Snowflake Change Data Capture: CDC | Hevo Data
Image Source: www.hazelcast.com

Change Data Capture (CDC) is an ideal solution to capture the near real-time movement of data in Databases. CDC refers to the collection of software design patterns used to detect and track data changes in a Database. It triggers the event connected with data, causing a specific action to be executed for any Change Data Capture. All companies require access to real-time Data Streams to perform efficient Data Analytics and CDC provides near-real-time movement of data by processing data as soon as new database events occur.

Events are captured and streamed in real-time using CDC and it helps to achieve reliable, low-latency, and scalable data replication in high-velocity data environments. It eliminates the process of bulk data loading by implementing incremental data loading. This way, Data Warehouses or Databases remain active in order to execute specific actions as soon as a Change Data Capture event occurs. Furthermore, companies can send fresh data updates to BI (Business Intelligence) tools and team members in near real-time with CDC, keeping them up to date.

What is Snowflake Streams?

In today’s data-driven economy, data in your systems change quite frequently, and it would be a complex task to load the data fully into Snowflake every time. It will cost you both time and money. This is where Snowflake Change Data Capture (CDC) comes in. You can implement CDC in Snowflake effectively by just using a few commands, and this is made possible with the concept of Streams in Snowflake.

A Snowflake Stream object basically tracks all DML changes made to rows in a source table and stores the metadata of each change. This metadata between two transactional points of time in a table is used later in order to retrieve the changed data.

Snowflake Streams capture an initial snapshot of all the rows present in the source table as the current version of the table with respect to an initial point in time. Streams then enable Change Data Capture every time you insert, update, or delete data in your source table. The Streams will have additional columns whenever any DML change is committed. So, by capturing the CDC Events you can easily merge just the changes from source to target using the MERGE statement.

How to Setup Snowflake Change Data Capture with Streams?

To get started with Snowflake Change Data Capture, log in to Snowflake Web-Based UI or SnowSQL. Then, follow the below-mentioned steps.

Step 1: Run the following command to create a CDC_Stream.

create or replace database CDC_STREAM;
Use CDC_STREAM;

Step 2: You’ll need a source and a destination table to operate upon. Run the following command to create a source table named “employees_source”.

create or replace table employees_source (
id int,
first_name varchar(200),
last_name varchar(200) )

Step 3: Run the following command to create a destination table named “employees_destination”.

create or replace table employees_destination ( id int, first_name varchar(200), last_name varchar(200) );

Step 4: Run the following command to create a stream on top of the source table. This stream will track any data changes made in the EMPLOYEES table.

create or replace stream employee_stream on table employees_source;

Note: As long as there is no data management command to consume it, any changes to the source table with respect to data would be considered as INSERT and not UPDATE.

For the purpose of this demonstration, let’s add a couple of records to the employees_source table. Run the following command to do so.

insert into employees_source values (1,’Ron’,’Higgins’);
insert into employees_source values (2,’Joseph’,’Geller’);
insert into employees_source values (3,’Charlie’,’Bricks’);
insert into employees_source values (4,’Mathew’,’Smith’);
insert into employees_source values (5,’Daisy’,’Whelsh’);
insert into employees_source values (6,’Scott’,’Hall’);
insert into employees_source values (7,’Jose’,’Martinez’);
insert into employees_source values (8,’Bryan’,’Son’);
insert into employees_source values (9,’Nisha’,’Grant’);
insert into employees_source values (10,’Hamilton’,’Gell’);
insert into employees_source values (11,’Larry’,’Armstrong’);
insert into employees_source values (12,’Emilia’,’Rodrigez’);
insert into employees_source values (13,’Kate’,’Becket’);

Step 5: Let’s view the change log in the stream before proceeding.

select * from employee_stream;

As you can see, here are the records and metadata fields. There are 3 additional META columns (METADATA$ROW_ID, METADATA$ISUPDATE, and METADATA$ACTION) introduced and they make it very easy to detect if a row has been updated, deleted, or inserted. And, since it’s an initial load everything is inserted. 

Now, there are no records in the destination table as they’re still in the stream.

Step 6: Let’s move the records to the destination using the MERGE statement.

Use CDC_STREAM;
MERGE into employees_destination as T
using (select *
from employee_stream) AS S
ON T.id = s.id
when matched AND S.metadata$action = ‘INSERT’ AND S.metadata$isupdate
THEN
update set T.first_name = S.first_name, T.last_name = S.last_name
When matched
And S.metadata$action = ‘DELETE’ THEN DELETE
when not matched
And S.metadata$action = ‘INSERT’ THEN
INSERT (id,
first_name,
last_name)
VALUES (S.id,
S.first_name,
S.last_name);

Now, the data has moved to the destination and there will be nothing in the stream. After the MERGE command has consumed, the stream object will become empty.

The cycle continues, and the stream will continue to record all the changes that happen in the employees_source table.

Step 7: Let’s view the destination table now. Run the following command to do so.

select * from employees_destination;

As you can see, the destination table is updated with the records.

What Makes Hevo’s Snowflake ETL Process Best-In-Class

Providing a high-quality ETL solution can be a difficult task if you have a large volume of data. Hevo’s automated, No-code platform empowers you with everything you need to have for a smooth Snowflake Change Data Capture and data replication experience.

Check out what makes Hevo amazing:

  • Fully Managed: Hevo requires no management and maintenance as it is a fully automated platform.
  • Data Transformation: Hevo provides a simple interface to perfect, modify, and enrich the data you want to transfer.
  • Faster Insight Generation: Hevo offers near real-time data replication so you have access to real-time insight generation and faster decision making. 
  • Schema Management: Hevo can automatically detect the schema of the incoming data and map it to the destination schema.
  • Scalable Infrastructure: Hevo has in-built integrations for 100+ sources (with 40+ free sources) that can help you scale your data infrastructure as required.
  • Live Support: Hevo team is available round the clock to extend exceptional support to its customers through chat, email, and support calls.
Sign up here for a 14-day free trial!

Step 8: Now, it’s time to demonstrate the Snowflake Change Data Capture. For that purpose, go ahead and make a change in the source table. Let’s update a record and observe the stream.

update employees_source
set last_name="Harper"
where id=1;

This basically sets the last_name as “Harper” for id=1.

  • While observing the stream, you’ll find 2 records, INSERT and DELETE. This is because the original last_name (i.e Higgins) was deleted for id=1 and the new last_name (i.e Harper) was updated.
  • Use the MERGE command to update the destination table as shown below.
Use CDC_STREAM;
MERGE into employees_destination as T
using (select *
from employee_stream
Where Not (metadata$action = ‘DELETE’ AND metadata$isupdate = TRUE)) AS S
ON T.id = s.id
when matched AND S.metadata$action = ‘INSERT’ AND S.metadata$isupdate
THEN
update set T.first_name = S.first_name, T.last_name = S.last_name
When matched
And S.metadata$action = ‘DELETE’ THEN DELETE
when not matched
And S.metadata$action = ‘INSERT’ THEN
INSERT (id,
first_name,
last_name)
VALUES (S.id,
S.first_name,
S.last_name);

Step 9: Run the following command to take a look at the destination table.

select * from employees_destination;

Here, you will be clearly able to observe the changes made to the source table updated in the destination table. That’s it, that’s how easy Snowflake Change Data Capture is.

For a better understanding, you can try out Snowflake Change Data Capture by deleting or updating other records in the source table. And, don’t forget to use the MERGE command for the changes to reflect in the destination table.

Conclusion

Snowflake Change Data Capture has totally replaced the old ways of implementing CDC. Although Snowflake is already an industry-leading Cloud-Based Data Platform known for its speed and flexible warehousing options, Snowflake CDCs make it much worth it to use.

Snowflake Change Data Capture proves to be very helpful in cases where millions of records get transacted on a daily basis but you only want to update the modified ones. Doing a full load will eat up your resources and time, so just leverage the Snowflake Change Data Capture and use the MERGE command to update the destination.

However, hand-coding the CDC infrastructure can prove to be a tedious task without dedicated engineering resources. It is a lot more efficient to invest in an out-of-the-box tool like Hevo.

visit our website to explore hevo

Hevo Data with its strong integration with 100+ Sources allows you to not only export data from sources & load data to the destinations like Snowflake in real-time, but also transform & enrich your data, & make it analysis-ready so that you can focus only on your key business needs and perform insightful analysis using BI tools.

Give Hevo Data a try and sign up for a 14-day free trial today. Hevo offers plans & pricing for different use cases and business needs, check them out!

Share your experience of working around Snowflake Change Data Capture (CDC) in the comments section below.

No-code Data Pipeline For Snowflake CDC