Snowflake Pricing: A Detailed Guide for 2020

Data Warehouse • March 31st, 2020

Snowflake is a popular cloud data warehousing solution that has been adopted by scores of well-known firms, including Fortune 500 companies, as their data warehouse provider and manager.

However, the process of understanding Snowflake pricing is not straightforward. 

We at Hevo Data (Hevo is an official Snowflake Data Pipeline Partner that helps companies load data from any data source to Snowflake in real time without writing any code) often come across customers who do not find it easy to decode Snowflake pricing.

This article describes the many aspects of Snowflake pricing that one should be aware of before going ahead with an implementation. Specifically, it delves into the usage-related cost accruals at both the data storage and compute levels. Furthermore, it explains the different pricing plans offered by Snowflake and the strategies for deciding on the right plan.

Snowflake Pricing – Usage

The pricing philosophy adopted by Snowflake is not very different from that of other popular cloud services such as Amazon Web Services: the emphasis is on giving the user maximum flexibility in how the different services are used. Specifically, Snowflake pricing depends on how the following three services are utilized –

  • Virtual Warehouses (compute)
  • Data Storage
  • Cloud Services

Virtual Warehouses

A virtual warehouse is a set of servers, together called a compute cluster, that carries out operations such as query execution and data loading. Depending on the size of your data and the number of users tasked with data warehousing and data management operations, Snowflake offers the following warehouse sizes, categorized by the number of servers in the cluster:

Warehouse size   Servers per cluster
X-Small          1
Small            2
Medium           4
Large            8
X-Large          16
2X-Large         32
3X-Large         64
4X-Large         128

The usage of these servers is tracked and converted into what are known as Snowflake credits. Hence, to run any of these warehouses, one purchases credits, which are consumed while the servers are operational; credits also pay for the cloud services described below, while data storage is billed separately at a per-terabyte rate. There are two different ways to purchase credits; this is covered in a later section.

In terms of virtual warehouses, the cluster size maps directly to credit usage: credits accrue at a rate of one credit per server per hour. For example, a Small (2-server) cluster consumes roughly 0.0006 credits per second (2 credits per hour), and a 2X-Large (32-server) cluster roughly 0.0089 credits per second (32 credits per hour). Billing is per second (with a 60-second minimum each time a warehouse starts), so a warehouse that was operational for 37 minutes and 12 seconds is billed for exactly those 2,232 seconds.
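
As a quick sanity check on the arithmetic, the following back-of-the-envelope calculation (runnable in any Snowflake worksheet; the warehouse size and runtime are just the example figures above) computes the credits billed for a Medium warehouse that ran for 37 minutes and 12 seconds –

-- Medium warehouse = 4 credits per hour; billing is per second
-- 37 min 12 s = 2,232 seconds
select 4 * 2232 / 3600.0 as credits_billed;  -- returns 2.48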

Another thing to keep in mind is that Snowflake provides the option of 'suspending' warehouses when they are not in operation. 'Suspended' warehouses are not billed or, in other words, they do not accrue usage credits.

Warehouse activity can be monitored in a couple of ways – 

  • Using the web interface – Account -> Billing & Usage
  • Using the SQL table function – WAREHOUSE_METERING_HISTORY
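
For example, the WAREHOUSE_METERING_HISTORY table function can be called from a worksheet as sketched below (the 7-day window is an arbitrary choice); it returns hourly credit usage per warehouse –

select *
from table(information_schema.warehouse_metering_history(
    dateadd('days', -7, current_date())));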

Data Storage

Data is stored and managed in Snowflake under the following three cases – 

  • All the data stored in internal stages, which are used for data loading. This is typically part of the ETL process: data from an external source is first uploaded into a Snowflake stage and later copied into a Snowflake table using bulk data loading (a minimal sketch of this flow follows this list).
  • All the storage space occupied by Snowflake tables. A point to note here is that Snowflake automatically compresses all table data, so the actual physical space occupied by these tables is less than their combined raw sizes.
  • Data retained for historical and disaster-recovery purposes, i.e. Time Travel and Fail-safe.
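
To illustrate the staging flow mentioned above, here is a minimal sketch (the file path, table name, and file format are hypothetical, and PUT must be run from a client such as SnowSQL rather than the web UI) –

-- Upload a local file to the table's internal stage (hypothetical names)
put file:///tmp/events.csv @%events;

-- Bulk-load the staged file into the table
copy into events
  from @%events
  file_format = (type = csv skip_header = 1);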

Similar to usage monitoring for compute clusters, account-level data storage usage is available either through the web interface (same location as above) or through the following table functions:

  • DATABASE_STORAGE_USAGE_HISTORY
  • STAGE_STORAGE_USAGE_HISTORY
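
As a sketch, the DATABASE_STORAGE_USAGE_HISTORY table function takes a date range (the 7-day window below is arbitrary) and reports average daily database and Fail-safe bytes; STAGE_STORAGE_USAGE_HISTORY works analogously for staged data –

select *
from table(information_schema.database_storage_usage_history(
    dateadd('days', -7, current_date()), current_date()));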

In addition to data usage at the account level, admins can look into the data usage of specific tables via the following – 

  • Using the web interface – Databases -> select db_name -> Tables
  • Using the SQL view – TABLE_STORAGE_METRICS
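
For instance, a query along these lines lists the largest tables in the current database together with their Time Travel and Fail-safe footprints (assuming the view's documented byte columns) –

select table_name,
       active_bytes      / power(1024, 3) as active_gb,
       time_travel_bytes / power(1024, 3) as time_travel_gb,
       failsafe_bytes    / power(1024, 3) as failsafe_gb
from information_schema.table_storage_metrics
order by active_bytes desc;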

Cloud Services

These are a set of administrative services that handle and coordinate a range of Snowflake tasks, including – 

  • Authentication
  • Infrastructure management
  • Metadata management
  • Query parsing and optimization
  • Access control

Cloud services require a certain amount of compute, and hence consume credits. However, an amount equal to 10% of the day's warehouse compute is deducted from the cloud services credits each day. So, for instance, if the compute from the operational clusters = 100 credits and cloud services compute = 15 credits, then the billed cloud services usage for that day = 15 - (10% of 100) = 5 credits.
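
The ACCOUNT_USAGE.METERING_DAILY_HISTORY view exposes this daily adjustment directly; a sketch along the following lines (assuming the view's documented columns) shows the discount applied each day –

select usage_date,
       credits_used_compute,
       credits_used_cloud_services,
       credits_adjustment_cloud_services,  -- the (up to) 10% daily discount
       credits_billed
from snowflake.account_usage.metering_daily_history
order by usage_date desc;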

Cloud services usage is generally not monitored for optimization as closely as data storage and virtual warehouses; however, Snowflake provides a couple of methods to do so – 

Query History

To understand the specific queries (by their type) that are consuming cloud service credits, the following SQL can be used – 

select
  query_type,
  sum(credits_used_cloud_services) as cloud_services_credits
from snowflake.account_usage.query_history
where start_time >= '2020-01-01 00:00:01'
group by 1;

Warehouse History 

To find out the virtual warehouses that use up cloud service credits, the following query can be used – 

select
  warehouse_name,
  sum(credits_used_cloud_services) as credits_used_cloud_services,
  sum(credits_used_compute) as credits_used_compute,
  sum(credits_used) as credits_used
from snowflake.account_usage.warehouse_metering_history
where start_time >= '2020-01-01 00:00:01'
group by 1;

Snowflake Pricing – Purchase Plans

Now that you have an idea of how costs accrue from the usage of the different Snowflake services, this section covers the options for choosing a pricing plan – 

  • On-Demand
    This is similar to the pay-as-you-go pricing plans of other cloud providers such as Amazon Web Services: you only pay for what you consume, and a bill detailing that month's usage is generated at the end of each month. There is a $25 monthly minimum, and data storage is typically priced at $40 per TB per month.
  • Pre-Purchased Capacity
    With this option, a customer purchases a set amount (capacity) of Snowflake resources in advance. The major advantage of this plan is that the pre-purchased rates are lower than the corresponding On-Demand rates.

A popular strategy, especially when you are new to Snowflake and unsure of your requirements, is to opt for On-Demand first and switch to Pre-Purchased later. Once the On-Demand cycle starts, monitor resource usage for a month or two; once you have a good picture of your monthly data warehousing needs, switch to a pre-purchased plan to optimize the recurring monthly charges.

Optimize/Reduce Snowflake Costs Incurred

As the previous sections show, there is a lot to understand about how the usage of different Snowflake resources translates into costs.

Here are a few things to be kept in mind that will help optimize these incurred costs:

  • Choose your cloud region (US East, US West, etc., depending on the cloud provider) wisely: it affects latency and the set of features available. Moving data to a different region later incurs data transfer costs charged per terabyte, so the larger your data store, the higher the cost of such a move.
  • Managing the operational status of your compute clusters can make quite a difference to cost. Use features such as 'auto-suspend' and 'auto-resume' unless you have a better strategy (see the sketch after this list).
  • Monitor workload and data usage at the account, warehouse, database, and table levels to make sure unnecessary query operations or data storage are not contributing to the overall monthly costs.
  • Compress data before storage wherever possible. In some cases, such as database tables, Snowflake compresses data automatically, but this is not always the case, so it is something to be mindful of and to monitor regularly.
  • Snowflake works better with date and timestamp columns stored as DATE or TIMESTAMP types rather than as VARCHAR.
  • Make more use of transient tables where possible: they carry no Fail-safe period (and only limited Time Travel retention), which reduces historical data storage costs.
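
To make the auto-suspend and transient-table points concrete, here is a minimal sketch (the warehouse and table names are hypothetical) showing how both options are set at creation time –

-- Warehouse that suspends after 60 idle seconds and resumes on the next query
create warehouse if not exists reporting_wh
  warehouse_size = 'XSMALL'
  auto_suspend = 60
  auto_resume = true
  initially_suspended = true;

-- Transient table: no Fail-safe period, so lower historical storage cost
create transient table if not exists staging_events (
  event_id   number,
  event_time timestamp_ntz,
  payload    variant
);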

What are your thoughts on Snowflake pricing? Let us know in the comments.
