In this article, we are going to work through some examples of using the Snowflake ROW_NUMBER() analytic function to solve some business questions. We will see how to use this function to work with Snowflake row numbers. By the end, you will be able to understand the concepts behind the ROW_NUMBER() function.
Introduction to Snowflake
Snowflake is a fully managed service that provides customers with near-infinite scalability of concurrent workloads to easily integrate, load, analyze, and securely share their data. Its common applications include data lakes, data engineering, data application development, data science, and secure consumption of shared data.
Snowflake uses AWS, Azure, or GCP to host its services and provides an intuitive user interface that allows customers to perform analytics. It uses SQL to query the data that runs on its virtual machines.
Snowflake’s unique architecture natively integrates computing and storage. This architecture enables you to virtually enable your users and data workloads to access a single copy of your data without any detrimental effect on performance. Snowflake makes it possible by abstracting the complexity of underlying Cloud infrastructures.
Here are a few features of Snowflake as a Software as a Service (SaaS) offering:
- Snowflake is a Cloud Data Warehouse operating on a SaaS (software-as-a-service) model.
- Snowflake’s infrastructure is built on AWS (Amazon Web Services) and it offers an architecture that is based on a new SQL database engine that enables faster, more dynamic queries with performant data integration.
- Snowflake’s analytics layer integrates with modern tools and services that facilitate and optimize Data Science exploration.
Key Features of Snowflake
Here are a few key features of Snowflake:
- Scalable: Snowflake provides a secure, scalable architecture that can spin up an unlimited number of independent virtual warehouses in a few minutes. You can execute several parallel jobs without worrying about performance and memory management.
- Pay Per Use Model: Snowflake is available as pay per usage; i.e., you only have to pay for the time you use the services offered by Snowflake. You can shut down the warehouses once the execution gets completed, to save costs.
- High Processing Speed: Each Virtual Warehouse in Snowflake is associated with the MPP cluster (Massive Parallel Processing) that performs the parallel execution of a job without degrading other clusters’ performance.
- Separate Storage and Compute Layer: Snowflake uses different storage and computes layers that can scale up or down without affecting the other.
- Disaster Recovery: In Snowflake, data is replicated three times (by default) across the availability zones and regions, hence providing a complete fail-safe and fault-tolerant system.
To learn more about Snowflake, visit here.
Introduction to Snowflake ROW_NUMBER() Function
The ROW_NUMBER() is an analytic function that generates a non-persistent sequence of temporary values which are calculated dynamically when the query is executed. The ROW_NUMBER() function assigns a unique incrementing number for each row within a partition of a result set. The row number starts at 1 and continues up sequentially, to the end of the table.
Snowflake Row Number Syntax and Arguments
row_number() over (
[ partition by <expression1> [, <expression2> ... ] ]
order by <expression3> [ , <expression4> ... ] [ { asc | desc } ]
)
Snowflake Row Number Syntax: OVER
- The OVER clause defines the window or set of rows that the ROW_NUMBER() function operates on.
- The possible components of the OVER clause are ORDER BY (required), and PARTITION BY (optional).
Snowflake Row Number Syntax: PARTITION BY
- The PARTITION BY clause divides the rows into partitions (groups of rows) to which the function is applied.
- The PARTITION BY clause is optional. When you omit it, the ROW_NUMBER() function will treat the whole result set as a single partition/group.
Snowflake Row Number Syntax: ORDER BY
- The ORDER BY clause defines the sequential order of the rows within each partition of the result set.
- The ORDER BY clause is required, you must include it because the ROW_NUMBER() function is order sensitive.
Snowflake Row Number Syntax: Expression1 and Expression2
- expression1 and expression2 specify the column(s) or expression(s) to partition by. You can partition by one or more columns or expressions.
For example, suppose that you are selecting data across multiple states (or provinces) and you want row numbers from 1 to N within each state; in that case, you can partition by the state.
Snowflake Row Number Syntax: expression3 and expression4
- expression3 and expression4 are a component of the Snowflake Row Number specify the column(s) or expression(s) to use to determine the order of the rows. You can order by 1 or more expressions.
Snowflake ROW_NUMBER() Examples
In this section, we’ll take a look at some practical examples of using the Snowflake row number function to answer critical business questions. We’ll use sample data from the TPC-H dataset. The TPC Benchmark™ H (TPC-H) specification has broad industry-wide relevance.
The following statement uses the row number function to return the row number, customer name, and account balance from the customer table. The row number values are assigned based on the order of account balance.
use schema snowflake_sample_data.tpch_sf1; -- or snowflake_sample_data.{tpch_sf10 | tpch_sf100 | tpch_sf1000}
select name, c_acctbal, row_number()
over (order by c_acctbal desc)
from customer;
Here is the partial output:
Next, let’s look at how to assign row numbers within partitions. For example, we can get the most expensive line item partitioned by the supplier key using the following query:
with cte_lineitem as (
select
row_number() over(
partition by ps_suppkey
order by p_retailprice desc
) row_num,
ps_suppkey,
p_name,
p_retailprice
from
p_part
)
select * from cte_lineitem
where row_num = 1;
Here is the partial output:
The following statement returns the emails of employees in ascending order based on the date of hire. It shows the row number, email and hire date from the employee table (the employee who was hired first will have row number 1).
select email, hire_date, row_number()
over (order by hire_date asc)
from employees
Using Snowflake ROW_NUMBER() Function to Get the Top-N Query
Let’s switch to a simple inventories table with the following schema.
INVENTORIES |
*PRODUCT_ID *WAREHOUSE_ID PRODUCT_NAME |
To get the most abundant product in a warehouse, you can use the ROW_NUMBER() function of Snowflake Row Number as shown in the following query:
with list_products as (
select
row_number() over(
partition by warehouse_id
order by quantity desc
) row_num,
warehouse_id,
product_name,
quantity
from
inventories
)
select * from list_products
where row_num = 1;
Here is the output:
In this example:
- First, the PARTITION BY clause divided the rows into partitions by warehouse id.
- Then, the ORDER BY clause sorted the products in each warehouse by quantity in descending order.
- Next, the ROW_NUMBER() function is applied to each row in a specific warehouse id. It re-initialized the row number for each category.
- After that, the outer query selected the rows with row number 1 which is the most expensive product in each warehouse.
To get more than one product with the same N-highest prices with Snowflake Row Number, you can use the RANK() or DENSE_RANK() function.
Conclusion
So far, we have reviewed the Snowflake ROW_NUMBER function in detail. You covered a few examples of how to use the Snowflake ROW_NUMBER() function to make useful queries such as inner-N, top-N, and bottom-N.
You can make more of your insights by centralizing your various data sources into Snowflake faster with Hevo Data. Hevo supports real-time streaming from 150+ data sources and allows you to easily replicate these sources into your Snowflake data warehouse with just a few clicks. Our data ingestion tool stands out in terms of ease of use, the fact that it is zero maintenance, and that we have a knowledgeable and highly responsive customer support team that supports multiple time zones.
If you are interested in trying out Hevo for free, Sign Up for a 14-day free trial here.
Check out the short video to get a product overview.
FAQ
How to get the row number in Snowflake?
To get the row number in Snowflake, you can use the ROW_NUMBER()
window function. For example:
SELECT *, ROW_NUMBER() OVER (ORDER BY column_name) AS row_num
FROM table_name;
What is the difference between RANK and ROW_NUMBER in Snowflake?
The ROW_NUMBER()
function assigns a unique sequential integer to each row, while RANK()
assigns a rank to each row within a partition of a result set, with the same rank for ties. If there are ties, RANK()
will skip ranks, whereas ROW_NUMBER()
will not.
Does ROW_NUMBER start with 0 or 1?
The ROW_NUMBER()
function in Snowflake starts with 1. The first row will have a row number of 1, the second row ill have a row number of 2, and so on.
Talha is a Software Developer with over eight years of experience in the field. He is currently driving advancements in data integration at Hevo Data, where he has been instrumental in shaping a cutting-edge data integration platform for the past four years. Prior to this, he spent 4 years at Flipkart, where he played a key role in projects related to their data integration capabilities. Talha loves to explain complex information related to data engineering to his peers through writing. He has written many blogs related to data integration, data management aspects, and key challenges data practitioners face.