Snowflake is a data warehousing solution offered as a Software-as-a-Service (SaaS) offering. It decouples storage and computes, thereby allowing you to pay separately for the two. It provides you with the flexibility of choosing the region and also the resource provider (AWS, Azure, or Google Cloud). Like most other data warehouses, it stores data in columns instead of rows, enabling parallel query execution. Similarly, it uses SQL for querying data like most other warehouses.
This article will introduce you to Snowflake and cover in detail a very niche topic in Snowflake’s SQL Syntax, Common Table Expressions (CTEs). It will discuss the types, examples, and use cases of the Snowflake Common Table Expressions and will help you understand its implementation better.
Table of Contents
- Introduction to Snowflake
- Introduction to Snowflake Common Table Expressions
- Types of Snowflake Common Table Expressions
- Example of Snowflake Common Table Expression
- Use Cases of Snowflake Common Table Expression
- Familiarity with SQL is essential.
- A Snowflake account.
Introduction to Snowflake
Snowflake is a completely relational, SQL-primarily based Cloud Data Warehouse that gives a Database as a Service (DBaaS) platform to users. Snowflake can deliver your enterprise the power and agility required to satisfy the converting facts desires. It’s Cloud Storage lets you keep nearly limitless quantities of based and semi-based facts in a single area, as a result, consolidating the facts out of your numerous data sources. Furthermore, this Data Warehouse is easily scalable and permits you to satisfy your increasing data storage needs without extra overheads.
Key Features of Snowflake
The following features of Snowflake are responsible for its growing popularity in the global market:
- Scalability: Snowflake provides separate storage and computes capabilities. A database is used to store data and Snowflake performs the calculation with a Virtual Data Warehouse. Therefore, it can provide high scalability at a low cost.
- Low Maintenance: Snowflake’s design ensures that minimal user interaction and effort are required for any maintenance-related activity.
- Query Optimization: Snowflake has automatic query optimization to safeguard you from having to manually improve your queries.
- Load Balancer: Snowflake is the ideal choice if you wish to separate routine workloads into different virtual data warehouses so that analytical loads are not affected by a busy cluster during routine peak loads.
You can read more about Snowflake, here.
Introduction to Snowflake Common Table Expressions
A Snowflake Common Table Expression is essentially a subquery (named) defined inside a WITH Clause. To understand a Snowflake Common Table Expression, you need to first understand a Subquery. A Subquery is a SELECT statement within a query, often used to query a temporary view, instead of an existing table.
For example, in everyday SQL, you would do something like this to determine the top rankers in a class:
SELECT t1.roll_no, t1.perc from (SELECT roll_no, SUM(marks)*100.0/SUM(total_marks) as perc FROM student_marks GROUP BY roll_no) as t1 ORDER BY t1.perc DESC LIMIT 5
The query within the parentheses in the above statement is the subquery. Snowflake Common Table Expressions essentially are named subqueries, used with a WITH clause. The syntax for a CTE is:
WITH cte_name (cte_col1, cte_col2,…) as ( -- subquery starts SELECT col1, col2,….)
At this point, you may be wondering “What’s the big deal?”. Well, Snowflake Common Table Expressions become powerful with hierarchical data, because they allow recursion.
Simplify your Data Analysis with Hevo’s No-code Data Pipeline
Hevo Data, a No-code Data Pipeline helps to load data from any data source such as Databases, SaaS applications, Cloud Storage, SDK,s, and Streaming Services and simplifies the ETL process. It supports 100+ data sources and loads the data onto the desired Data Warehouse like Snowflake, enriches the data, and transforms it into an analysis-ready form without writing a single line of code.
Its completely automated pipeline offers data to be delivered in real-time without any loss from source to destination. Its fault-tolerant and scalable architecture ensure that the data is handled in a secure, consistent manner with zero data loss and supports different forms of data. The solutions provided are consistent and work with different Business Intelligence (BI) tools as well.Get Started with Hevo for Free
Check out why Hevo is the Best:
- Secure: Hevo has a fault-tolerant architecture that ensures that the data is handled in a secure, consistent manner with zero data loss.
- Schema Management: Hevo takes away the tedious task of schema management & automatically detects the schema of incoming data and maps it to the destination schema.
- Minimal Learning: Hevo, with its simple and interactive UI, is extremely simple for new customers to work on and perform operations.
- Hevo Is Built To Scale: As the number of sources and the volume of your data grows, Hevo scales horizontally, handling millions of records per minute with very little latency.
- Incremental Data Load: Hevo allows the transfer of data that has been modified in real-time. This ensures efficient utilization of bandwidth on both ends.
- Live Support: The Hevo team is available round the clock to extend exceptional support to its customers through chat, email, and support calls.
- Live Monitoring: Hevo allows you to monitor the data flow and check where your data is at a particular point in time.
Types of Snowflake Common Table Expressions
The Snowflake Common Table Expressions are of two types:
- Recursive: A recursive CTE, on the other hand, allows you to query itself. Confused? Let’s break it down.
- Non-recursive: A non-recursive CTE is like your familiar subquery, which returns a temporary view for your main query, and doesn’t reference itself.
Here’s the syntax of a Recursive CTE:
WITH RECURSIVE cte_name (cte_col1, cte_col2, …) AS ( UNION ALL ) SELECT …. From cte_name
Notice the structure of the subquery. You have an anchor clause and a recursive clause. The anchor clause is an independent query that CANNOT reference cte_name. It is generally used for the top of the hierarchy. The recursive clause CAN reference cte_name. What does that mean? Can the recursive clause query the entire view generated so far? No! It can only query the view generated in the previous iteration.
Thus, if the anchor clause contributed one row to the view, that one row forms the working table for the first iteration of the recursive clause. Let’s say this iteration gave four rows to the view. Thus, the view has a total of 5 rows. However, for the second iteration of the recursive clause, the working table consists ONLY of the four rows returned in the previous iteration of the recursive clause.
Example of Snowflake Common Table Expression
Still, confused about Snowflake Common Table Expressions? Let’s consider an example.
Let’s create a table named ‘places’, and populate it with some data.
create or replace table places (name varchar, code integer, division_code integer);
India has numerous states/ union territories. States are divided into districts. Districts are divided into towns, villages, or cities. Each of these divisions has a unique code which can be found here. We will consider a situation where we have the name of any division (it can be state, district, anything), its code, and the code of the division within which this particular division lies. Let’s populate the table with some data.
insert into places (name, code, division_code) values ('Chakan',555862,521), ('Lonavala',802810,521), ('Pune',521,27), ('Maharashtra',27,null), ('Karad',802870,527), ('Khandala',563201,527), ('Satara',527,27)
As you can see, we have some towns in Pune and Satara districts, which have the codes for Pune and Satara as their division codes respectively. Pune and Satara have Maharashtra’s code as their division code. Maharashtra is a state, has only no division above it (the state is the largest division in a country). Hence, its division code is NULL.
Now, let’s first try and map each place with the division it belongs to, using a CTE. A non-recursive CTE will also do.
WITH mapping_cte(name, division_name) as ( -- subquery begins SELECT places.name as name, divisions.name as division from places LEFT OUTER JOIN places as divisions on places.division_code = divisions.code -- subquery ends ) SELECT * from mapping_cte
The output is:
As you can see, it correctly maps each place with the division it belongs to. However, there is no sense of hierarchy in the result. You cannot determine how many levels of hierarchies there are (perhaps you can in this small data set, but imagine you have a thousand rows here, then it would become very difficult.).
Now let’s try visualizing this using recursive CTEs. Over here, we will do two things:
- Add an indent at each successive iteration. That will make the hierarchies visible.
- Create a new column containing the concatenated version of the current place’s code, and all the divisions before it. This will help sort the data.
Here’s what the Recursive CTE looks like:
SELECT indent || name, sort_key from mapping_cte ORDER BY sort_key WITH RECURSIVE mapping_cte(indent, name, code, sort_key) as (-- Anchor Clause SELECT '' as indent, places.name, places.code, TO_VARCHAR(places.code) as sort_key from places WHERE division_code IS NULL UNION ALL -- Recursive Clause SELECT mapping_cte.indent ||'………', places.name, places.code, mapping_cte.sort_key || TO_VARCHAR(places.code) from places INNER JOIN mapping_cte ON mapping_cte.code = places.division_code ) SELECT indent || name, sort_key from mapping_cte ORDER BY sort_key
Here’s what the output looks like:
As you can see, the hierarchy is very clearly visible now. You can, of course, hide the sort key from the final output. It is shown here for demonstration purposes. Now let’s unpack what all was going on in the above query.
The anchor clause executes standalone and does not refer to the CTE name (mapping_cte). It is generally supposed to output the top of the hierarchy, which it does here, because of the WHERE condition (division_code IS NULL). At the end of the execution of the anchor clause
The final CTE view has one row corresponding to Maharashtra
The working table for the next recursive query has one row corresponding to Maharashtra
The Recursive clause, in the first iteration, takes the working table and finds out all rows whose division_codes match the code column in the working table. At the end of this iteration:
- The final CTE view has 3 rows corresponding to Maharashtra, Pune, and Satara
- The working table for the next recursive query has the two rows added by the current query, namely Pune and Satara
In the next iteration, the Recursive clause finds all rows whose division codes match the code column in the working table (Pune and Satara). It will now get all the remaining 4 matches. At the end of this iteration:
- The final CTE view has 7 rows, corresponding to all the places
- The working table for the next recursive query has the 4 rows added by the current query, namely Chakan, Lonavala, Khandala, and Karad
In the next iteration again the Recursive query tries to find rows whose division codes match the code column in the working table. However, it doesn’t find any this time. Thus, the query exits and the CTE view is fully formed.
Now the main query executes and orders the rows by the sort key. Since the sort key is a VARCHAR field and contains the codes concatenated from the top of the hierarchy to the current place, the sorting, along with the indentation, gives the hierarchical output that you see above.
Use Cases of Snowflake Common Table Expression
Use cases of Snowflake Common Table Expressions are divided as follow:
- Use Case for Recursive Snowflake Common Table Expressions
- Use Case for Non-Recursive Snowflake Common Table Expressions
Use Case for Recursive Snowflake Common Table Expressions
Recursive Snowflake Common Table Expressions should be used with caution. They lead to infinite loops easily (although Snowflake stops the computation after a certain threshold of iterations). One scenario which can lead to an infinite loop is the presence of cycles in the data. In the above data, if we add Satara’s code as the division code for Maharashtra, we get a cycle. If the data is not supposed to contain a cycle, avoid it. Otherwise, avoid using recursive CTEs for cyclical data.
Use Case for Non-Recursive Snowflake Common Table Expressions
Non-recursive Snowflake Common Table Expressions have the same use cases as subqueries. They make the code more readable and structured. Recursive CTEs should be used primarily with hierarchical data, especially when the hierarchical nature of the data is important in the output. In the above table, if I had to find all the places which contain the letter ‘S’ in their name, then you would not require a recursive CTE, even though the data is hierarchical.
The article introduced you to Snowflake and explained its key features. Moreover, it introduced you to Common Table Expressions, discussed how to work with them on Snowflake, what their types are (recursive and non-recursive), and when to use them, and more importantly, when not to use them. You have also seen an example to aid the understanding.Visit our Website to Explore Hevo
Now, to run SQL queries or perform Data Analytics on Snowflake data, you first need to export this data from various sources to your Snowflake Data Warehouse. This will require you to custom code complex scripts to develop the ETL processes. Hevo Data can automate your data transfer process, hence allowing you to focus on other aspects of your business like Analytics, Customer Management, etc. This platform allows you to transfer data from 100+ multiple sources to Cloud-based Data Warehouses like Snowflake, Amazon Redshift, Google BigQuery, etc. It will provide you with a hassle-free experience and make your work life much easier.
Want to take Hevo for a spin? Sign Up for a 14-day free trial and experience the feature-rich Hevo suite first hand.
Share your understanding of Snowflake Common Table Expressions in the comments below!