To empower your business decisions with data, you need Real-Time High-Quality data from all your data sources at a central repository. Traditional On-Premise Data Warehousing Solutions lack in terms of Scalability, Performance and require continuous Maintainance. A more Cost-Effective & Instantly Scalable with Best-in-Class Query Performance alternative is Snowflake.
It is a one-stop Cloud Data Warehousing and Analytics Solution that provides full ANSI SQL Language support for Data Analysis and Transformations. Set Operators like Snowflake Union, Except/Minus & Intersect are crucial while running queries. Using these you can easily combine two query results seamlessly.
In this article, you will learn how to effectively use the Snowflake Union, Except/Minus & Intersect Set Operators with several examples.
What is Snowflake?
Snowflake is a unified Cloud Data platform that provides a complete 360 Degree Data Analytics Stack that includes Data Warehouses, Data Lakes, Data Science, Data Applications, Data Sharing, etc. Enabling the users to take advantage of the Muti-Cloud Deployment Strategy, Snowflake allows you to choose your cloud platform from Amazon Redshift, Google BigQuery, and Microsoft Azure Data Warehouse.
Its unique architecture allows you to scale both Storage and Computational resources independently. Snowflake charges you per second and you only pay for what you use at that instant. Leveraging the MPP(Massively Parallel Processing) Compute Clusters and Shared Disk Architecture you can run an unlimited number of Virtual Warehouses in parallel i.e. unlimited independent workloads against the same data without affecting each other’s performance.
Key Features of Snowflake
Since its inception in 2012, Snowflake has grown into a Powerful and Robust Cloud Data Warehousing & Analytics platform that offers the following eye-catching features:
- Completely Managed: Snowflake is a True Cloud SaaS platform that doesn’t require you to install or configure any software. Fully Automated and Administered by the Snowflake Team, you don’t have to worry about software or hardware failures, updates, and several other maintenance tasks.
- Scalability: Snowflake’s compute resources can scale during query execution without any downtime or a need to reallocate storage units. The Auto Scaling feature detects if any scaling is required and automatically scales up or down without any user or admin intervention.
- Semi-Structured Data Support: Data is generated in several formats from mobile devices and sensors. Snowflake allows you to work with larger and broader data sets by providing support for Semi-Structured Data via its own patented VARIANT data type. This allows you to load Semi-structured data into Snowflake and query JSON in a fully relational manner.
- Data Sharing: You can securely share your data across your firm and outside the company with external vendors and clients. Powered by the Robust Metadata Management, Snowflake allows you to grant access and share a complete database or a slice of it.
- Data Security: You can rest assured while using Snowflake as all your data is automatically protected with AES 256 bit encryption in transit as well as at rest. Meeting the SOC I, SOC 1 and 2 Type II, and PCI DSS requirements, Snowflake is a trusted data platform for governments, financial services, healthcare, etc.
Hevo is a no-code ETL tool designed for users who want to integrate their data seamlessly. It offers a range of industry-leading features such as:
- 150+ pre-built connectors for seamless migration
- Fault-tolerant architecture to ensure no data is ever lost
- Drag-and-drop transformation capabilities
- Round-the-clock support and extensive documentation
Hevo Data is also Snowflake Ready Technology validated, ensuring seamless compatibility with Snowflake.
Get Started with Hevo for Free
How to use the Snowflake Union & other Set Operators?
One of the important tasks while analysing data is to combine queries. You can perform this by using Snowflake Set Operators.
There are 3 types of Snowflake Set Operators available:
A) Snowflake Interset
The Snowflake Intersect Operator returns rows from one query’s result set that is also present in another query’s result set and removes the duplicates if any.
- Snowflake Interest Syntax
SELECT ...
INTERSECT
SELECT ...
Let’s consider the following 2 sample tables for reference:
TABLE_1
+----+----+----+
| C1 | C2 | C3 |
|----+----+----|
| 1 | 5 | D |
| 2 | 6 | C |
| 3 | 7 | B |
| 4 | 8 | A |
+------+------+------+
TABLE_2
+----+----+----+
| C1 | C2 | C3 |
|----+----+----|
| 2 | 6 | C |
| 3 | 7 | B |
+----+----+----+
- Snowflake Intersect Example
SELECT C1, C2, C3 FROM TABLE_1
INTERSECT
SELECT C1, C2, C3 FROM TABLE_2;
Output:
+----+----+----+
| C1 | C2 | C3 |
|----+----+----|
| 2 | 6 | C |
| 3 | 7 | B |
+----+----+----+
B) Snowflake Minus/Except
Snowflake Minus or Except Set operators can be used interchangeably to remove rows from one query’s result that is present in another query’s result set and remove the duplicates if any.
- Snowflake Minus/Except Syntax
SELECT ...
MINUS
SELECT ...
SELECT ...
EXCEPT
SELECT ...
- Snowflake Minus/Except Example
SELECT C1, C2, C3 FROM TABLE_1
EXCEPT
SELECT C1, C2, C3 FROM TABLE_2;
Output:
+----+----+----+
| C1 | C2 | C3 |
|----+----+----|
| 1 | 5 | D |
| 4 | 8 | A |
+----+----+----+
C) Snowflake Union and Snowflake Union[All]
The Snowflake Union operator has the following two versions:
- Snowflake Union: Combines the result set queries and removes duplicates if any.
- Snowflake Union ALL: Combines the result set queries without removing duplicates if any
The default set operator for combining query set results is Snowflake Union(i.e with duplicate elimination).
- Union in Snowflake & Union ALL Syntax
SELECT ...
UNION
SELECT ...
SELECT ...
UNION ALL
SELECT ...
SELECT C1, C2, C3 FROM TABLE_1
UNION
SELECT C1, C2, C3 FROM TABLE_2;
Output:
+----+----+----+
| C1 | C2 | C3 |
|----+----+----|
| 1 | 5 | D |
| 2 | 6 | C |
| 3 | 7 | B |
| 4 | 8 | A |
+----+----+----+
- Snowflake Union ALL Example
SELECT C1, C2, C3 FROM TABLE_1
UNION ALL
SELECT C1, C2, C3 FROM TABLE_2;
Output:
+----+----+----+
| C1 | C2 | C3 |
|----+----+----|
| 1 | 5 | D |
| 2 | 6 | C |
| 3 | 7 | B |
| 4 | 8 | A |
| 2 | 6 | C |
| 3 | 7 | B |
+----+----+----+
In this Snowflake Union ALL example, you can observe that the duplicates have not been removed. For more information, you can visit the Snowflake Set Operators Documentation.
Automate Your Snowflake Data Pipelines:
Integrate Oracle to Snowflake
Integrate PostgreSQL to Snowflake
Integrate MongoDB to Snowflake
Integrate Salesforce to Snowflake
Snowflake Union & other Set Operators Usage Guidelines
While using the Snowflake Union & other Set Operators, it is a good practice to follow the usage guidelines given below:
- Ensure to select the same number of columns in the queries.
- Before selecting the columns, ensure that the data type of each column is consistent across the rows from different sources.
Example:
select VC from table1 -- VARCHAR
union
select IR from table2 -- INTEGER
The above Snowflake Union example will raise an error as it tries to combine the query results with two different data types i.e VARCHAR and INTEGER. You can correct this Snowflake Union example by using the Snowflake CAST or :: operator for converting the inputs into the same data type:
select VC::varchar from table1 —- VARCHAR
union
select IR::varchar from table2; —- VARCHAR
- Ensure that the meaning of the columns remains intact across both the queries. Also, the names of the output columns are based on the first query columns.
Example:
select MobileNo, LandlineNo from Home
union all
select LandlineNo, MobileNo from Office;
Here, the order of columns is changed in the second query. Thus, the meaning of the columns becomes different. The output will display as if the query were:
select MobileNo, LandlineNo from Home
union all
select LandlineNo as MobileNo, MobileNo as LandlineNo from Office;
- Your queries can have query operators to allow any number of query expressions to be combined with set operators.
- Clauses such as ORDER BY and LIMIT / FETCH are applied to your query result of the set operator.
Conclusion
In this article, you have learned how to effectively use the Snowflake Union, Intersect, and Minus/Except Set Operators. Snowflake’s Standard and Extended SQL support allows Data Analysts to easily perform queries. The Set Operators such as Snowflake Union play an important role in analysing your data by combining query results.
As you make strategic decisions based on your data analysis, your business will start growing gradually. Data starts generating at an exponential rate across all of your company’s SaaS applications as your business grows.
To efficiently process this astronomical amount of data for harnessing insights into your business performance, you would require to invest a portion of your engineering bandwidth to Integrate Data from all sources, Clean & Transform it, and finally Load it to a Cloud Data Warehouse such as Snowflake for further business analytics.
Frequently Asked Questions
What is union in Snowflake?
The ‘UNION’ in Snowflake is one of the set operators used to retrieve data from two or more ‘SELECT’ statements into a single result set. Each ‘SELECT’ statement within a ‘UNION’ must return the same number of columns of similar data types.
Does union in Snowflake remove duplicates?
Yes, ‘UNION’ in Snowflake removes duplicate records by default. If you want to include duplicates, use ‘UNION ALL.’
What is a Snowflake join?
The Snowflake join combines records from two or more tables in a database based on a standard column between the two. Types of join in Snowflake include :
1. INNER JOIN
2. LEFT JOIN, or LEFT OUTER JOIN
3. RIGHT JOIN or RIGHT OUTER JOIN
4. FULL JOIN or FULL OUTER JOIN
What is := in Snowflake?
It is used for variable assignment in SQL scripting; := assigns a value to a variable in Snowflake.
What does union () do?
‘UNION’ returns a new DataFrame that contains all the rows in the current DataFrame and another DataFrame ( other ), excluding any duplicate rows.
Sanchit Agarwal is an Engineer turned Data Analyst with a passion for data, software architecture and AI. He leverages his diverse technical background and 2+ years of experience to write content. He has penned over 200 articles on data integration and infrastructures, driven by a desire to empower data practitioners with practical solutions for their everyday challenges.