Snowflake Features: 7 Comprehensive Aspects

on Engineering, Data Integration, Tutorial • January 18th, 2019 • Write for Hevo

Snowflake is a Cloud-Based Data Warehouse created in 2012 by three Data Warehousing professionals. Six years later, the firm was valued at $3.5 billion after receiving a $450 million venture capital investment. But what is Snowflake, and why is this Cloud-Based Data Warehouse making such a stir in the analytics community?

This article helps you uncover the unique features of the Snowflake Data Warehouse. You will also get a quick walk-through on the implementation of some of the important Snowflake features such as schemaless loading of JSON/XML, Time Travel, Cloning, Data Clustering, etc., provided by Snowflake. You will also look into the benefits of using Snowflake in comparison to other Data Warehouse Providers. Let’s get started.

Table of Contents

Introduction to Snowflake

Snowflake Logo
Image Source

Snowflake is a fully managed Cloud Data Warehouse that is offered as Software-as-a-Service (SaaS) or Database-as-a-Service (DaaS) to clients. Snowflake uses the ANSI SQL protocol, which supports both fully structured and semi-structured data formats such as JSON, Parquet, and XML. It is highly scalable in terms of user count and processing capacity, with pricing based on per-second resource use.

To know more about Snowflake, visit this link.

Snowflake Features and Capabilities

Snowflake takes care of all the database management and assures the best performance when tables are queried. All one needs to do is create tables, load the data, and query it. One need not create partitions, indexes like in RDBMS or run vacuum operations like in Redshift either.

The details of how Snowflake accomplishes this can be best understood by knowing the Snowflake Table structures and its principal concepts.

Overview of Snowflake Table Structures

Two principal concepts used by Snowflake to physically maintain Table Structures for best performance are:

  • Micro-Partitions
  • Data-Clustering

1) Micro-Partitions

Snowflake stores the data in a table using columnar fashion divided into a number of micro-partitions which are contiguous units of storage of size around 50 MB to 500 MB of uncompressed data. You can read more on micro-partitions here. Unlike traditional databases, partitions aren’t static and aren’t defined and maintained by users but are automatically managed by the Snowflake Data Warehouse in a dynamic fashion.

A table can consist of thousands or millions of micro-partitions in Snowflake depending on the table size.

snowflake table structures
Image Source: Self

Example:
Let’s consider a table consisting of Student data as shown below:
This data will be broken into multiple micro-partitions in Snowflake. Let’s assume 3 micro-partitions are created for this table, data will be partitioned as below in columnar fashion:

snowflake columnar architecture
Image Source: Self

As seen in the above tables, data will be partitioned into 3 parts and stored in a columnar fashion. Data is also sorted and stored at each micro-partition level.

2) Data-Clustering

Data-Clustering is similar to the concept of sort-key available in most MPP Databases. As data is loaded into Snowflake, it co-locates column data with the same values in the same micro-partition. This helps Snowflake to easily filter out data during data scan, as the entire micro-partition can be skipped if a value doesn’t exist in the range of that micro-partition.

Snowflake automatically does the job of clustering on the tables, and this natural clustering process of Snowflake is good enough for most cases and gives good performance even for big tables.

However, if a user wants to do manual clustering, there is a concept of the clustering key which can be defined on the table by the user and Snowflake uses this key to do the clustering on the table. This can be useful only for very large tables. Below is the architecture of Snowflake for your reference.

Image Source

Simplify Integrations Using Hevo’s No-code Data Pipeline

Hevo Data helps you directly transfer data from 100+ data sources (including 30+ free sources) to Business Intelligence tools, Data Warehouses, or a destination of your choice in a completely hassle-free & automated manner. Hevo is fully managed and completely automates the process of not only loading data from your desired source but also enriching the data and transforming it into an analysis-ready form without having to write a single line of code. Its fault-tolerant architecture ensures that the data is handled in a secure, consistent manner with zero data loss.

Hevo takes care of all your data preprocessing needs required to set up the integration and lets you focus on key business activities and draw a much powerful insight on how to generate more leads, retain customers, and take your business to new heights of profitability. It provides a consistent & reliable solution to manage data in real-time and always have analysis-ready data in your desired destination. 

Get Started with Hevo for Free

Check out what makes Hevo amazing:

  • Real-Time Data Transfer: Hevo with its strong Integration with 100+ Sources (including 30+ Free Sources), allows you to transfer data quickly & efficiently. This ensures efficient utilization of bandwidth on both ends.
  • Data Transformation: It provides a simple interface to perfect, modify, and enrich the data you want to transfer. 
  • Secure: Hevo has a fault-tolerant architecture that ensures that the data is handled in a secure, consistent manner with zero data loss.
  • Tremendous Connector Availability: Hevo houses a large variety of connectors and lets you bring in data from numerous Marketing & SaaS applications, databases, etc. such as HubSpot, Marketo, MongoDB, Oracle, Salesforce, Redshift, etc. in an integrated and analysis-ready form.
  • Simplicity: Using Hevo is easy and intuitive, ensuring that your data is exported in just a few clicks. 
  • Completely Managed Platform: Hevo is fully managed. You need not invest time and effort to maintain or monitor the infrastructure involved in executing codes.
  • Live Support: The Hevo team is available round the clock to extend exceptional support to its customers through chat, email, and support calls.
Sign up here for a 14-Day Free Trial!

Few Other Important Snowflake Features

Snowflake has gained wide acceptance in the market. Snowflake features have set it apart from other Data Warehouses and thus gaining popularity. Some of the popular Snowflake features include:

1) Time Travel

Time Travel is one of the unique Snowflake features. Time Travel allows you to track the change of data over time. This Snowflake feature is available to all accounts and enabled by default to all, free of cost.  Moreover, this Snowflake feature allows you to access the historical data of a Table. One can find out how the table looked at any point in time within the last 90 days.

For standard accounts, a maximum of 1 day Time Travel retention period is given by default. In the Enterprise edition, this can be extended up to 90 days.

2) Cloning

Another important Snowflake feature is Cloning. We can use the clone feature to create an instant copy of any Snowflake object such as databases, schemas, tables, and other Snowflake objects in near real-time without much waiting time. This is possible due to Snowflake’s architecture of storing data as immutable in S3 and versioning the changes and storing the changes as Metadata.

Hence a clone of an object is actually a Metadata operation and doesn’t actually duplicate the storage data. So let’s say you want a copy of the entire Production database for some testing purpose, you can create it in seconds using the Clone feature. Thus, Cloning is one of the important Snowflake features.

3) Undrop

UNDROP is also one of the unique Snowflake features. Using this, a dropped object can be restored using the Undrop command in Snowflake, as long as that object is still not purged by the system.

Undrop restores the object to the state it was before dropping the object. Objects such as Databases, Schemas, or Tables can be undropped.

drop database testdb;
undrop database testdb;
drop table some_table;
undrop table some_table;

Undrop can be used to restore multiple times as well, let’s say a table has been dropped 3 times and re-created each time. Each dropped version can be undropped in reverse order of drop i.e. last dropped version is restored first. This works as long as the existing table with the same name is renamed before Undrop operation.

Undrop will fail if a table with the same name already exists in the system. Hence it is important to rename the existing table and then perform the Undrop operation. Thus, the UNDROP command is one of the Snowflake features that is of utmost importance.

4) Fail-Safe

Another important Snowflake feature is Fail-Safe. Fail-Safe ensures historical data is protected in event of disasters such as disk failures or any other hardware failures. Snowflake provides 7 days of Fail-Safe protection of data which can be recovered only by Snowflake in event of a disaster.

Fail-Safe 7-day duration starts after the Time Travel period is ended. Hence, a table with 90 days of Time Travel will have an overall of 97 days for recovery. However, Fail-Safe cannot be recovered by users or through DDLs. It is handled by Snowflake only during catastrophic disasters.

These were some of the Snowflake features that set it apart from other Data Warehouses.

Connectivity Support in Snowflake Data Warehouse

One of the important Snowflake features is Connectivity. Snowflake provides various Connectivity options including Native connectors (e.g. Python), JDBC/ODBC drivers, a Command-Line tool called “SnowSQL”,  Web Interface which helps to manage Snowflake as well as to query the data.

Snowflake account can be hosted on either Amazon AWS or Microsoft Azure cloud platform. Hence depending on the organizations existing Cloud service provided one among AWS or Azure can be chosen.

Pricing of Snowflake Data Warehouse

Snowflake features come with reasonable pricing. Below are some of the insights related to Snowflake pricing.

  • Snowflake Data Warehouse charges for the Storage and Compute separately.
  • Data stored in the Snowflake will be charged as per the average monthly usage per TB or can be paid upfront costs per TB to save storage costs.
  • Compute costs are separate and will be charged at per second usage depending on the size of the virtual warehouse chosen from X-Small to 4X-Large.
  • Snowflake bills compute costs as credits, depending on the size of the warehouse, per hour credit cost varies from 1 credit per hour for X-Small to 128 credits per hour for 4X-Large. Note, Snowflake charges per second.
  • Depending on the Cloud Platform i.e. AWS or Azure and the region costs can vary.

Read more on Snowflake’s pricing here.

Benefits of Creating Account on Snowflake

Before getting started with the Snowflake features, some of the benefits provided by Snowflake include:

  1. Snowflake provides $400 free credits to try and explore the warehouse.
  2. Once registered, one needs to set up the warehouse to use for loading the data.
  3. Loading data into the warehouse is one of the trickiest things in Snowflake.
    There are two ways to do this:

You can create databases/schemas/tables as required and load the data using the “COPY INTO” command or query a file directly on cloud storage like S3 using Select statement.

Alternatively, you can use an official Snowflake ETL Partner like Hevo (14-day free trial) to easily load data from a wide array of sources into Snowflake.

Use-Cases of Snowflake Data Warehouse

Snowflake features can be used to host a Data Warehouse in the cloud of any scale for analysis of various data from different sources. Snowflake feature supports loading from various formats including traditional compressed CSVs to semi-structured data such as JSON/XML from Cloud storage like S3 or Azure. If you need to load data from a variety of different sources, you could use a data integration platform like Hevo.

Common use cases for using Snowflake features include:

  1. Analytics on data available in Data Lake on Cloud Storage like S3.
  2. Rapid delivery of analytics to Business without worrying about managing the warehouse.
  3. Data Sharing: If an organization wants to share data outside then Snowflake can be used to quickly set up to share data without transferring or developing pipelines.
  4. Snowflake can be used to set up an Insights-as-a-Service platform.

Benefits of Snowflake Over Other Data Warehouses

In order to provide the greatest service to consumers, Snowflake data warehouse employ a unique and modern design. It scales well both upwards and outwards, and it outperforms competitors on a varied range of metrics. Snowflake is also available across data clouds and regions, making it accessible to almost any organization.

Snowflake is very adaptable in terms of scaling up and down. Snowflake is meant to allow several consumers to simultaneously examine huge volumes of data from the same data source. Snowflake makes it possible for users to run many queries and complicated queries at the same time.

Despite the fact that Snowflake competes with other big Data Warehouses like Google BigQuery and Amazon Redshift, it is also a partner with them and hosts its own services on their clouds. This implies that Snowflake can store data on Amazon, Google, and even Microsoft Azure clouds, but each of these cloud providers can only store data on their own cloud. This means that businesses considering switching to Snowflake don’t have to worry about which cloud providers they’re already using or if they’re compatible with their other data sources because Snowflake supports them all.

These were some of the Snowflake features or benefits that make it an ideal choice for the users

Conclusion

If you are considering setting up a data warehouse, Snowflake is a strong contender, you can check out our article to know about Snowflake Architecture and Snowflake best practices for high-performance ETL. There are many features that work out of the box on Snowflake in comparison to Redshift and Bigquery, the other popular choices for Data Warehousing.

Visit our Website to Explore Hevo

Businesses can use automated platforms like Hevo Data to set this integration and handle the ETL process. It helps you directly transfer data from a source of your choice to a Data Warehouse, Business Intelligence tools, or any other desired destination in a fully automated and secure manner without having to write any code and will provide you a hassle-free experience.

Want to take Hevo for a spin? Sign Up for a 14-day free trial and experience the feature-rich Hevo suite first hand. You can also have a look at the unbeatable pricing that will help you choose the right plan for your business needs.

Do you want us to cover any other Snowflake features? Let us know in the comments.

No-code Data Pipeline for Snowflake