Snowflake is a data warehousing solution offered as a Software-as-a-Service (SaaS) offering. It decouples storage and computes, thereby allowing you to pay separately for the two. It provides you with the flexibility of choosing the region and also the resource provider (AWS, Azure, or Google Cloud).

Like most other data warehouses, it stores data in columns instead of rows, enabling parallel query execution. Similarly, it uses SQL for querying data like most other warehouses. This article will introduce you to Snowflake and cover in detail a very niche topic in Snowflake’s SQL Syntax, Common Table Expressions (CTEs).

It will discuss the types, examples, and use cases of the Snowflake Common Table Expressions and will help you understand its implementation better.

Prerequisites

  • Familiarity with SQL is essential.
  • A Snowflake account.

Introduction to Snowflake

Snowflake Common Table Expression: Snowflake Logo

Snowflake is a completely relational, SQL-primarily based Cloud Data Warehouse that gives a Database as a Service (DBaaS) platform to users. Snowflake can deliver your enterprise the power and agility required to satisfy the converting facts desires.

It’s Cloud Storage lets you keep nearly limitless quantities of based and semi-based facts in a single area, as a result, consolidating the facts out of your numerous data sources. Furthermore, this Data Warehouse is easily scalable and permits you to satisfy your increasing data storage needs without extra overheads.

Key Features of Snowflake

The following features of Snowflake are responsible for its growing popularity in the global market:

  • Scalability: Snowflake provides separate storage and computes capabilities. A database is used to store data and Snowflake performs the calculation with a Virtual Data Warehouse. Therefore, it can provide high scalability at a low cost.
  • Low Maintenance: Snowflake’s design ensures that minimal user interaction and effort are required for any maintenance-related activity.
  • Query Optimization: Snowflake has automatic query optimization to safeguard you from having to manually improve your queries.
  • Load Balancer: Snowflake is the ideal choice if you wish to separate routine workloads into different virtual data warehouses so that analytical loads are not affected by a busy cluster during routine peak loads.

Introduction to Snowflake Common Table Expressions

A Snowflake Common Table Expression is essentially a subquery (named) defined inside a WITH Clause. To understand the Expression, you need to first understand a Subquery. A Subquery is a SELECT statement within a query, often used to query a temporary view, instead of an existing table.

For example, in everyday SQL, you would do something like this to determine the top rankers in a class:

SELECT t1.roll_no, t1.perc from
(SELECT roll_no, SUM(marks)*100.0/SUM(total_marks) as perc
FROM student_marks
GROUP BY roll_no) as t1
ORDER BY t1.perc DESC
LIMIT 5

The query within the parentheses in the above statement is the subquery. Snowflake Common Table Expressions essentially are named subqueries, used with a WITH clause. The syntax for a CTE is:

WITH cte_name (cte_col1, cte_col2,…) as ( -- subquery starts
SELECT col1, col2,….)

At this point, you may be wondering “What’s the big deal?”. Well, Snowflake Common Table Expressions become powerful with hierarchical data, because they allow recursion.

Types of Snowflake Common Table Expressions

The Snowflake Common Table Expressions are of two types:

  • Recursive: A recursive CTE, on the other hand, allows you to query itself. Confused? Let’s break it down.
  • Non-recursive: A non-recursive CTE is like your familiar subquery, which returns a temporary view for your main query, and doesn’t reference itself.

Here’s the syntax of a Recursive CTE:

WITH RECURSIVE cte_name (cte_col1, cte_col2, …) AS (
UNION ALL
)
SELECT …. From cte_name

Notice the structure of the subquery. You have an anchor clause and a recursive clause. The anchor clause is an independent query that CANNOT reference cte_name. It is generally used for the top of the hierarchy. The recursive clause CAN reference cte_name. What does that mean? Can the recursive clause query the entire view generated so far? No! It can only query the view generated in the previous iteration.

Thus, if the anchor clause contributed one row to the view, that one row forms the working table for the first iteration of the recursive clause. Let’s say this iteration gave four rows to the view. Thus, the view has a total of 5 rows. However, for the second iteration of the recursive clause, the working table consists ONLY of the four rows returned in the previous iteration of the recursive clause.

Example of Snowflake Common Table Expression

Still, confused about it? Let’s consider an example.

Let’s create a table named ‘places’, and populate it with some data.

create or replace table places (name varchar, code integer, division_code integer);

India has numerous states/ union territories. States are divided into districts. Districts are divided into towns, villages, or cities. Each of these divisions has a unique code which can be found here. We will consider a situation where we have the name of any division (it can be state, district, anything), its code, and the code of the division within which this particular division lies. Let’s populate the table with some data.

insert into places (name, code, division_code) values
('Chakan',555862,521),
('Lonavala',802810,521),
('Pune',521,27),
('Maharashtra',27,null),
('Karad',802870,527),
('Khandala',563201,527),
('Satara',527,27)

As you can see, we have some towns in Pune and Satara districts, which have the codes for Pune and Satara as their division codes respectively. Pune and Satara have Maharashtra’s code as their division code. Maharashtra is a state, has only no division above it (the state is the largest division in a country). Hence, its division code is NULL.

Now, let’s first try and map each place with the division it belongs to, using a CTE. A non-recursive CTE will also do.

WITH mapping_cte(name, division_name) as (
-- subquery begins
SELECT places.name as name, divisions.name as division
from places
LEFT OUTER JOIN places as divisions on places.division_code = divisions.code
-- subquery ends
)
SELECT * from mapping_cte

The output is:

Snowflake Common Table Expression: CTE Output
Self Snowflake Common Table Expression: CTE Output

As you can see, it correctly maps each place with the division it belongs to. However, there is no sense of hierarchy in the result. You cannot determine how many levels of hierarchies there are (perhaps you can in this small data set, but imagine you have a thousand rows here, then it would become very difficult.).

Now let’s try visualizing this using recursive CTEs. Over here, we will do two things:

  • Add an indent at each successive iteration. That will make the hierarchies visible.
  • Create a new column containing the concatenated version of the current place’s code, and all the divisions before it. This will help sort the data.

Here’s what the Recursive CTE looks like:

SELECT indent || name, sort_key from mapping_cte
ORDER BY sort_key
WITH RECURSIVE mapping_cte(indent, name, code, sort_key) as (-- Anchor Clause
SELECT '' as indent, places.name, places.code, TO_VARCHAR(places.code) as sort_key
from places
WHERE division_code IS NULL
UNION ALL
-- Recursive Clause
SELECT mapping_cte.indent ||'………', places.name, places.code, mapping_cte.sort_key || TO_VARCHAR(places.code)
from places
INNER JOIN mapping_cte ON mapping_cte.code = places.division_code
)
SELECT indent || name, sort_key from mapping_cte
ORDER BY sort_key

Here’s what the output looks like:

Snowflake Common Table Expression: CTE Output
Snowflake Common Table Expression: CTE Output

As you can see, the hierarchy is very clearly visible now. You can, of course, hide the sort key from the final output. It is shown here for demonstration purposes. Now let’s unpack what all was going on in the above query.

The anchor clause executes standalone and does not refer to the CTE name (mapping_cte). It is generally supposed to output the top of the hierarchy, which it does here, because of the WHERE condition (division_code IS NULL). At the end of the execution of the anchor clause
The final CTE view has one row corresponding to Maharashtra
The working table for the next recursive query has one row corresponding to Maharashtra

The Recursive clause, in the first iteration, takes the working table and finds out all rows whose division_codes match the code column in the working table. At the end of this iteration:

  • The final CTE view has 3 rows corresponding to Maharashtra, Pune, and Satara
  • The working table for the next recursive query has the two rows added by the current query, namely Pune and Satara

In the next iteration, the Recursive clause finds all rows whose division codes match the code column in the working table (Pune and Satara). It will now get all the remaining 4 matches. At the end of this iteration:

  • The final CTE view has 7 rows, corresponding to all the places
  • The working table for the next recursive query has the 4 rows added by the current query, namely Chakan, Lonavala, Khandala, and Karad

In the next iteration again the Recursive query tries to find rows whose division codes match the code column in the working table. However, it doesn’t find any this time. Thus, the query exits and the CTE view is fully formed.

Now the main query executes and orders the rows by the sort key. Since the sort key is a VARCHAR field and contains the codes concatenated from the top of the hierarchy to the current place, the sorting, along with the indentation, gives the hierarchical output that you see above.

Simplify your Data Analysis with Hevo’s No-code Data Pipeline

Hevo is the only real-time ELT No-code Data Pipeline platform that cost-effectively automates data pipelines that are flexible to your needs. With integration with 150+ Data Sources (40+ free sources), we help you not only export data from sources & load data to the destinations but also transform & enrich your data, & make it analysis-ready.

Start for free now!

Get Started with Hevo for Free

Use Cases of Snowflake Common Table Expression

Use Case for Recursive

Recursive Snowflake Common Table Expressions should be used with caution. They lead to infinite loops easily (although Snowflake stops the computation after a certain threshold of iterations). One scenario which can lead to an infinite loop is the presence of cycles in the data. In the above data, if we add Satara’s code as the division code for Maharashtra, we get a cycle. If the data is not supposed to contain a cycle, avoid it. Otherwise, avoid using recursive CTEs for cyclical data.

Use Case for Non-Recursive

Non-recursive Snowflake Common Table Expressions have the same use cases as subqueries. They make the code more readable and structured. Recursive CTEs should be used primarily with hierarchical data, especially when the hierarchical nature of the data is important in the output. In the above table, if I had to find all the places which contain the letter ‘S’ in their name, then you would not require a recursive CTE, even though the data is hierarchical.

Conclusion

The article introduced you to Snowflake and explained its key features. Moreover, it introduced you to Common Table Expressions, discussed how to work with them on Snowflake, what their types are (recursive and non-recursive), and when to use them, and more importantly, when not to use them. You have also seen an example to aid the understanding.

Now, to run SQL queries or perform Data Analytics on Snowflake data, you first need to export this data from various sources to your Snowflake Data Warehouse. This will require you to custom code complex scripts to develop the ETL processes.

Hevo Data can automate your data transfer process, hence allowing you to focus on other aspects of your business like Analytics, Customer Management, etc. It will provide you with a hassle-free experience and make your work life much easier.

Want to take Hevo for a spin? Sign Up for a 14-day free trial and experience the feature-rich Hevo suite first hand.

Share your understanding of Snowflake Common Table Expressions in the comments below!

Veeresh Biradar
Senior Customer Experience Engineer

Veeresh is a skilled professional specializing in JDBC, REST API, Linux, and Shell Scripting. With a knack for resolving complex issues and implementing Python transformations, he plays a crucial role in enhancing Hevo's data integration solutions.

No Code Data Pipeline For Your Snowflake Data Warehouse