Snowflake is a popular Cloud Data Warehousing solution that has been implemented by scores of well-known firms, including Fortune 500 companies, as their Data Warehouse provider and manager. However, the process of understanding Snowflake Pricing is not straightforward.
- This article describes the many aspects of Snowflake Pricing that one should be aware of before going ahead with the implementation.
- Specifically, the article delves into the different usage-related cost accruals at the data storage level as well as the computational resources level.
- Furthermore, it explains the pricing plans offered by Snowflake and how to choose between them.
What is Snowflake?
- Snowflake provides a scalable Cloud-based platform for enterprises and developers and supports advanced Data Analytics. There are multiple data stores available, but Snowflake’s architectural capabilities and data-sharing capabilities are unique.
- Snowflake’s architecture enables storage and compute to scale independently, so customers can use each separately and pay only for what they use of each.
- A key strength of Snowflake is this separation of storage and compute.
- Snowflake is designed so that users need minimal effort or intervention for performance tuning and maintenance activities.
- Users set the minimum and maximum cluster count, and scaling happens automatically within that range at very high speed.
Snowflake Costs With Cost Intelligence Factor
Snowflake pricing depends on how the following services are being utilized by the user:
Factor 1: Virtual Warehouses
These are a set of servers, together called a compute cluster, that can carry out operations like query execution and data loading. Snowflake offers the following set of computing clusters categorized by their sizes (the number of servers in the cluster):
Type | # of servers |
X-Small | 1 |
Small | 2 |
Medium | 4 |
Large | 8 |
X-Large | 16 |
2X-Large | 32 |
3X-Large | 64 |
4X-Large | 128 |
The usage activity for these servers is tracked and converted to what is known as Snowflake credits. Hence, for availing any of these warehouse-related services one has to purchase a bunch of credits that can then be used to keep the servers operational, as well as for utilizing the services described in the upcoming sections – data storage and cloud services. There are two different ways to purchase credits; this will be covered in a later section.
For virtual warehouses, credit consumption is directly proportional to the cluster size. For example, a Small warehouse (2 servers) consumes about 0.0006 credits per second (2 credits per hour), and a 2X-Large warehouse (32 servers) consumes about 0.0089 credits per second (32 credits per hour). Billing is per second, so a warehouse that ran for 37 minutes and 12 seconds is billed for exactly that time, that is, 37.2 minutes.
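As a rough illustration of the per-second arithmetic, the query below works out the credits for a Small warehouse (2 credits per hour) that runs for 37 minutes and 12 seconds. It uses only literal values and assumes the run is longer than any minimum billing period.

```sql
-- credits = credits per hour * seconds running / 3600
select
    37 * 60 + 12                 as seconds_running,  -- 2232 seconds
    2 * (37 * 60 + 12) / 3600.0  as credits_used;     -- about 1.24 credits
```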
Another thing to be kept in mind is that Snowflake provides for the option of ‘suspending’ warehouses when they are not in operation. ‘Suspended’ warehouses are not billed or, in other words, they do not accrue usage credits.
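A minimal sketch of suspending and resuming a warehouse by hand is shown below. The warehouse name `my_wh` is a placeholder, and the statements assume a role with the privilege to operate that warehouse.

```sql
-- my_wh is a hypothetical warehouse name.
-- Stop the warehouse so that it no longer accrues credits.
alter warehouse my_wh suspend;

-- Start it again when it is needed.
alter warehouse my_wh resume if suspended;
```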
The warehouse activity can be monitored in a couple of ways:
- Using the web interface: Account -> Billing & Usage
- Using the SQL table function: WAREHOUSE_METERING_HISTORY
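For instance, the table function can be queried roughly as follows to total the credits used per warehouse over the last seven days. The seven-day window is an arbitrary choice for illustration, and the query assumes a role that is allowed to view usage data.

```sql
-- Credits used per warehouse over the last 7 days (window chosen for illustration).
select
    warehouse_name,
    sum(credits_used) as credits_used
from table(information_schema.warehouse_metering_history(
    date_range_start => dateadd('day', -7, current_date())))
group by warehouse_name
order by credits_used desc;
```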
Factor 2: Data Storage
Snowflake stores and manages data in the following three forms:
- Files held in internal stages, which are used for data loading. This is typically part of the ETL process, where data from an external source is first uploaded to a Snowflake stage and later copied into a Snowflake table using bulk data loading.
- Data held in Snowflake tables. A point to note here is that Snowflake automatically compresses all table data, so the physical space occupied by these tables is less than their combined raw sizes.
- Historical data retained for recovery purposes (Time Travel and Fail-safe).
Similar to usage monitoring for compute clusters, the storage usage information for the account is available through either the web interface (same location) or the following table functions:
- DATABASE_STORAGE_USAGE_HISTORY
- STAGE_STORAGE_USAGE_HISTORY
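As a sketch, the first of these table functions can be queried as follows to see the daily average storage per database. The seven-day window is an assumption made for illustration.

```sql
-- Daily average storage (in bytes) per database over the last 7 days.
select
    usage_date,
    database_name,
    average_database_bytes,
    average_failsafe_bytes
from table(information_schema.database_storage_usage_history(
    date_range_start => dateadd('day', -7, current_date())))
order by usage_date, database_name;
```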
In addition to data usage at the account level, admins can look into the data usage of specific tables in the following ways:
- Using the web interface: Databases -> select db_name -> Tables
- Using the SQL view: TABLE_STORAGE_METRICS
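One possible way to use TABLE_STORAGE_METRICS, queried here as a view in the ACCOUNT_USAGE schema, is to list the tables that occupy the most storage. The limit of 20 rows is arbitrary.

```sql
-- Largest tables by active storage, with their Time Travel and Fail-safe footprint.
select
    table_catalog,
    table_schema,
    table_name,
    active_bytes,
    time_travel_bytes,
    failsafe_bytes
from snowflake.account_usage.table_storage_metrics
where deleted = false
order by active_bytes desc
limit 20;
```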
Factor 3: Cloud Services
These are a set of administrative services that coordinate activities across Snowflake and keep them running smoothly. They include:
- Authentication
- Infrastructure management
- Metadata management
- Query parsing and optimization
- Access control
Cloud services require some compute of their own and therefore consume credits. However, each day the credits used by cloud services are reduced by up to 10% of that day's warehouse compute credits. For instance, if the warehouses used 100 credits and cloud services used 15 credits, the billable cloud services usage for that day = 15 - (10% of 100) = 5 credits.
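The daily adjustment can be sketched with literal values as below. The `greatest(..., 0)` guard reflects the assumption that the billable amount never goes below zero when cloud services usage is under the 10% threshold.

```sql
-- Worked example: 100 warehouse credits and 15 cloud services credits in one day.
select
    100                           as warehouse_credits,
    15                            as cloud_services_credits,
    greatest(15 - 0.10 * 100, 0)  as billable_cloud_services_credits;  -- 5 credits
```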
Cloud services usage is generally not monitored as closely as data storage and virtual warehouse usage. However, Snowflake provides a couple of ways to do so:
1) Query History
To see which query types are consuming cloud service credits, the following SQL can be used:
select
query_type,
sum(credits_used_cloud_services) as cloud_services_credits
from snowflake.account_usage.query_history
where
start_time >= '2020-01-01 00:00:01'
group by 1;
2) Warehouse History
To find out which virtual warehouses use up cloud service credits, the following query can be used:
select
warehouse_name,
sum(credits_used_cloud_services) as credits_used_cloud_services,
sum(credits_used_compute) as credits_used_compute,
sum(credits_used) as credits_used
from snowflake.account_usage.warehouse_metering_history
where
start_time >= '2020-01-01 00:00:01'
group by 1;
Snowflake Pricing Purchase Plans
Now that you know how costs accrue from the credits consumed by the different Snowflake services, this section covers the options for choosing a pricing plan:
- On-Demand
This is similar to the pay-as-you-go pricing plans of other cloud providers such as Amazon Web Services where you only pay for what you consume. At the end of the month, a bill is generated with the details of usage for that month. There is a $25 minimum for every month, and for data storage, the rates are typically set to $40 per TB.
- Pre-Purchased Capacity
With this option, a customer can purchase a set amount or capacity of Snowflake resources in advance. The major advantage of going with this plan is that the packaged pre-purchase rates will be available at a lower price than the corresponding On Demand option.
A popular strategy, especially when you are new to Snowflake and unsure of your needs, is to start with On-Demand and switch to Pre-Purchased Capacity later. Once the On-Demand cycle starts, monitor resource usage for a month or two, and once you have a good idea of your monthly data warehousing requirements, switch to a pre-purchased plan to reduce the recurring monthly charges.
Optimizing Snowflake Pricing
As pointed out in the sections before, there are many things to be dealt with in terms of understanding the usage of different Snowflake resources and how that translates into costs.
Here are a few things to be kept in mind that will help optimize these incurred costs:
- Depending on your location, choose the cloud region (such as US East or US West, depending on the cloud provider) carefully, to minimize latency and to have access to the required set of features. If you later move your data to a different region, data transfer costs apply per terabyte, so the larger your data store, the higher the cost.
- How you manage the operational status of your compute clusters can make quite a difference to costs. Make use of features such as ‘auto suspension’ and ‘auto resume’ unless you have a better strategy.
- The workload/data usage monitoring at an account level, warehouse level, database, or table level is necessary to make sure there aren’t unnecessary query operations or data storage contributing to the overall monthly costs.
- Compress data before storage wherever possible. Snowflake compresses data automatically in some cases, such as database tables, but not in all, so this is something to keep in mind and monitor regularly.
- Snowflake works better with the date or timestamp columns stored as such rather than them being stored as type varchar.
- Make more use of transient tables where the data can be recreated. They have no Fail-safe period, which reduces the storage costs for historical data.
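Two of the tips above can be sketched in SQL as follows. The warehouse name, table name, and columns are placeholders, and the 60-second idle timeout is only an example value to tune for your workload.

```sql
-- my_wh is a hypothetical warehouse: suspend after 60 idle seconds,
-- and resume automatically when the next query arrives.
alter warehouse my_wh set
    auto_suspend = 60
    auto_resume = true;

-- staging_events is a hypothetical transient table: it has no Fail-safe period,
-- so no Fail-safe storage accrues for it.
create transient table staging_events (
    event_id   number,
    event_time timestamp_ntz,  -- stored as a timestamp rather than as varchar
    payload    variant
);
```

Transient tables trade recoverability for lower storage cost, so they suit staging or intermediate data that can be reloaded from its source.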
Conclusion
- The article introduced you to Snowflake and explained in detail the factors on which Snowflake Pricing depends.
- Moreover, it discussed the Snowflake Pricing plans and the ways in which you can optimize your costs effectively.
- To build a good working knowledge of Snowflake, a useful next step is understanding Snowflake Create Table.
Avinash specializes in writing within the data industry, delivering informative and engaging content on data analytics, machine learning, AI, big data, and business intelligence. With a deep understanding of these fields, he excels at translating complex concepts into accessible and insightful narratives.