Snowflake is a Cloud-Based Data Warehouse created in 2012 by three Data Warehousing professionals. Six years later, the firm was valued at $3.5 billion after receiving a $450 million venture capital investment. But what is Snowflake, and why is this Cloud-Based Data Warehouse making such a stir in the analytics community?

This article helps you uncover the unique features of the Snowflake Data Warehouse. You will get a quick walk-through of the implementation of some important Snowflake features, such as schemaless loading of JSON/XML, Time Travel, Cloning, and Data Clustering, and you will also look at the benefits of using Snowflake compared to other Data Warehouse providers. Let’s get started.

What is Snowflake?

Snowflake is a fully managed Cloud Data Warehouse offered to clients as Software-as-a-Service (SaaS), sometimes described as Database-as-a-Service (DaaS). Snowflake supports ANSI-standard SQL and handles both fully structured data and semi-structured formats such as JSON, Parquet, and XML. It is highly scalable in terms of user count and processing capacity, with pricing based on per-second resource use.

Snowflake Features and Capabilities

Snowflake takes care of all database management and ensures the best performance when tables are queried. All you need to do is create tables, load the data, and query it. You don’t need to create partitions or indexes as in a traditional RDBMS, or run vacuum operations as in Redshift.

The details of how Snowflake accomplishes this are best understood through Snowflake’s table structures and the principal concepts behind them.

Overview of Snowflake Table Structures

Two principal concepts used by Snowflake to physically maintain Table Structures for best performance are:

  • Micro-Partitions
  • Data-Clustering

1) Micro-Partitions

Snowflake stores table data in a columnar fashion, divided into a number of micro-partitions: contiguous units of storage that each hold around 50 MB to 500 MB of uncompressed data. Unlike traditional databases, these partitions aren’t static and aren’t defined or maintained by users; the Snowflake Data Warehouse manages them automatically and dynamically.

A table can consist of thousands or millions of micro-partitions in Snowflake depending on the table size.

[Image: Snowflake table structures]

Example:
Let’s consider a table of student data. In Snowflake, this data is broken into multiple micro-partitions. Assuming 3 micro-partitions are created for this table, the data is partitioned in a columnar fashion as shown below:

[Image: Snowflake columnar micro-partition layout]

As seen above, the data is split into 3 parts and stored in a columnar fashion. Data is also sorted and stored at each micro-partition level.

2) Data-Clustering

Data-Clustering is similar to the sort-key concept available in most MPP databases. As data is loaded into Snowflake, column data with the same values is co-located in the same micro-partition. This lets Snowflake filter efficiently during a data scan: an entire micro-partition can be skipped if the queried value doesn’t fall within that micro-partition’s range.

Snowflake automatically clusters data in tables, and this natural clustering is good enough for most cases, giving good performance even for big tables.

However, if you want to control clustering manually, you can define a clustering key on the table, and Snowflake uses this key to cluster the data. This is typically worthwhile only for very large tables.
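As a minimal sketch of what defining a clustering key can look like (the sales table and its columns are illustrative examples, not from this article):

-- Define a clustering key when creating a hypothetical large table
create table sales (sale_date date, region string, amount number)
  cluster by (sale_date, region);

-- Or add/change the clustering key on an existing table
alter table sales cluster by (sale_date);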

Simplify Integrations Using Hevo’s No-code Data Pipeline

Hevo is a real-time, no-code ELT Data Pipeline platform that cost-effectively automates data pipelines flexible to your needs. With integrations to 150+ Data Sources (40+ free sources), Hevo helps you not only export data from sources and load it into destinations, but also transform and enrich your data to make it analysis-ready.

Get Started with Hevo for Free

A Few Other Important Snowflake Features

Snowflake has gained wide acceptance in the market, and its features have set it apart from other Data Warehouses. Some of the popular Snowflake features include:

1) Time Travel

Time Travel is one of Snowflake’s unique features. It lets you track how data changes over time and access the historical data of a table: you can find out how a table looked at any point in time within the last 90 days. This feature is available to all accounts and enabled by default, free of cost.

For standard accounts, a maximum Time Travel retention period of 1 day is provided by default. In the Enterprise edition, this can be extended up to 90 days.
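As a hedged sketch of what querying historical data can look like (the orders table is an illustrative name, and '<query-id>' is a placeholder for a real query ID):

-- Query the table as it was 5 minutes ago
select * from orders at(offset => -60*5);

-- Query the table as it was just before a specific statement ran
select * from orders before(statement => '<query-id>');

-- On Enterprise edition, extend the retention period up to 90 days
alter table orders set data_retention_time_in_days = 90;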

2) Cloning

Another important Snowflake feature is Cloning. The clone feature creates an instant copy of any Snowflake object, such as databases, schemas, and tables, in near real time without much waiting. This is possible because of Snowflake’s architecture: data is stored immutably in cloud storage such as S3, and changes are versioned and stored as metadata.

Hence, a clone of an object is actually a metadata operation and doesn’t duplicate the underlying storage. So if you want a copy of an entire production database for testing, you can create it in seconds using the Clone feature.
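A minimal sketch of what this can look like (the database and table names are illustrative):

-- Zero-copy clone of a production database for testing
create database testdb clone proddb;

-- Clones can also be combined with Time Travel, e.g. a table as of a day ago
create table orders_yesterday clone orders at(offset => -86400);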

3) Undrop

UNDROP is another unique Snowflake feature. A dropped object can be restored using the UNDROP command, as long as that object has not yet been purged by the system.

Undrop restores the object to the state it was in just before it was dropped. Databases, schemas, and tables can all be undropped.

-- Restore a dropped database
drop database testdb;
undrop database testdb;

-- Restore a dropped table
drop table some_table;
undrop table some_table;

Undrop can restore multiple versions as well. Say a table has been dropped 3 times and re-created each time: each dropped version can be undropped in reverse order of dropping, i.e., the last dropped version is restored first. This works as long as any existing table with the same name is renamed before the Undrop operation.

Undrop will fail if a table with the same name already exists in the system, so it is important to rename the existing table first and then perform the Undrop operation.
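A short sketch of the rename-then-undrop pattern (the table names are illustrative):

-- A live table named some_table blocks the restore, so rename it first
alter table some_table rename to some_table_current;
undrop table some_table;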

4) Fail-Safe

Another important Snowflake feature is Fail-Safe. Fail-Safe ensures historical data is protected in the event of disasters such as disk failures or other hardware failures. Snowflake provides 7 days of Fail-Safe protection, during which data can be recovered only by Snowflake in the event of a disaster.

The 7-day Fail-Safe period starts after the Time Travel period ends. Hence, a table with 90 days of Time Travel has an overall 97 days for recovery. Note that Fail-Safe data cannot be recovered by users or through DDL; it is handled by Snowflake only during catastrophic disasters.

5) Support for Semi-structured Data

Snowflake’s ability to combine structured and semi-structured data without complex technologies like Hadoop or Hive is one of its most significant big data capabilities. Data can come from a variety of places, including machine-generated data, sensors, and mobile devices. Snowflake supports semi-structured data ingestion in a variety of formats, including JSON, Avro, ORC, Parquet, and XML, using the VARIANT data type, which has a size limit of 16 MB. Snowflake also optimizes the data by extracting as much as possible into columnar format and storing the remainder in a single column. Data functions that can parse, extract, cast, and manipulate data can also be used to flatten nested structures in semi-structured data.
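A hedged sketch of working with VARIANT data (the table and JSON field names are illustrative):

-- Land raw JSON in a single VARIANT column
create table raw_events (v variant);

-- Query nested fields with path notation and casts
select v:device.id::string as device_id,
       v:readings[0].temp::float as first_temp
from raw_events;

-- Flatten a nested array into one row per element
select r.value:temp::float as temp
from raw_events, lateral flatten(input => v:readings) r;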

6) Continuous Data Pipelines

Continuous data pipelines automate many of the manual steps involved in loading data into Snowflake tables and then transforming the data for further analysis. Snowflake provides a number of features for continuous data ingestion, change data tracking, and recurring tasks, which together enable continuous data pipeline workflows.

Snowpipe for continuous data ingestion

Snowpipe is Snowflake’s continuous data ingestion service that runs in the background. It loads new data in micro-batches as soon as files become available in an external stage like AWS S3, making the data accessible in minutes rather than requiring manual bulk loads with COPY statements.
Snowpipe relies on notifications from external stages to learn when new files are available for ingestion; these files are then queued and loaded into Snowflake. Snowpipe also exposes a REST endpoint, which is helpful for applications that need to call Snowpipe with a list of data filenames to be ingested.
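A minimal sketch of a pipe definition (the stage and table names are illustrative and assumed to already exist):

-- Auto-ingest new JSON files from an external stage as they arrive
create pipe events_pipe auto_ingest = true as
  copy into raw_events
  from @my_s3_stage
  file_format = (type = 'json');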

Change data capture using Streams

When copying new data from staging tables to other tables, it’s helpful to know which data has changed so that only the changed data is copied. Snowflake table streams do exactly that: they capture metadata about DML changes made to a table, as well as the state of each row before and after the change, so that actions can be taken on the changed data. This is also known as change data capture (CDC).
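A hedged sketch of a stream (the tables and columns are illustrative; reading a stream in a DML statement advances its offset):

-- Track DML changes on a source table
create stream orders_stream on table orders;

-- Consume only the newly inserted rows
insert into orders_history
  select order_id, amount from orders_stream
  where metadata$action = 'INSERT';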

Scheduling SQL statements using Tasks

Tasks are a feature that allows you to run SQL statements on a schedule that you set when you create the task. Single SQL statements or stored procedures can be executed by tasks. They’re great for things like generating report tables on a regular basis, or as part of data pipelines that capture recently changed table rows from table streams and transform them for use in other tables.
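A minimal sketch of a scheduled task (the warehouse, tables, and columns are illustrative):

-- Refresh a report table every 60 minutes on a named warehouse
create task hourly_report
  warehouse = my_wh
  schedule = '60 minute'
as
  insert into daily_counts select current_date, count(*) from orders;

-- Tasks are created suspended, so resume the task to start the schedule
alter task hourly_report resume;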

7) Caching Results

To help speed up your queries and reduce costs, the Snowflake architecture includes caching at various levels. For example, when a query is run, Snowflake keeps its results for 24 hours. So, if the same query is run again by the same user, or by another user in the same account, the results are returned from the cache, assuming the underlying data hasn’t changed. This is especially useful for analysis work, as it eliminates the need to rerun complex queries to access previous data or to compare results before and after a change.
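As a small related example, Snowflake also lets you explicitly reuse a previous result set via the RESULT_SCAN function; a hedged illustration:

-- Treat the result of the most recent query in this session as a table
select * from table(result_scan(last_query_id()));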

8) Sharing data between accounts

Snowflake’s Secure Data Sharing feature allows you to share objects (such as tables) from a database in your account with another Snowflake account without having to duplicate the data. As a result, the shared data does not require additional storage space and does not add to the data consumer’s storage costs. Because data sharing is done through Snowflake’s metadata store, the setup is simple, and data consumers have immediate access to the information.

Snowflake’s architecture allows for the creation of a network of data providers and data consumers that can be used for a variety of purposes. One of them is the Snowflake data marketplace, which connects providers with consumers who want to share free or paid data. Consumers can access shared data directly in their accounts, which they can query and combine with other data sources as needed. Another Snowflake use case is data exchange, which allows users to collaborate on data with invited members. This simplifies use cases such as data sharing between business customers, suppliers, and partners.

Snowflake allows providers to create reader accounts for data consumers who don’t have Snowflake accounts. Reader accounts are a cost-effective way for consumers to access shared data without becoming Snowflake customers.
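A hedged sketch of the provider side of a share (the database, schema, table, and account names are all illustrative):

-- Create a share and grant access to specific objects
create share sales_share;
grant usage on database salesdb to share sales_share;
grant usage on schema salesdb.public to share sales_share;
grant select on table salesdb.public.orders to share sales_share;

-- Make the share visible to a consumer account
alter share sales_share add accounts = consumer_org.consumer_account;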

These were some of the Snowflake features that set it apart from other Data Warehouses.

Connectivity Support in Snowflake Data Warehouse

One of the important Snowflake features is connectivity. Snowflake provides various connectivity options, including native connectors (e.g., Python), JDBC/ODBC drivers, a command-line tool called “SnowSQL”, and a web interface that helps you manage Snowflake as well as query the data.

Snowflake accounts can be hosted on the Amazon AWS, Google Cloud, or Microsoft Azure cloud platforms, so you can choose based on your organization’s existing cloud service provider.

What is the Pricing of Snowflake Data Warehouse?

Snowflake features come with reasonable pricing. Below are some of the insights related to Snowflake pricing.

  • Snowflake Data Warehouse charges for storage and compute separately.
  • Storage is charged per TB based on average monthly usage, or you can pay upfront per TB to save on storage costs.
  • Compute costs are separate and are charged per second of usage, depending on the size of the virtual warehouse chosen, from X-Small to 4X-Large.
  • Snowflake bills compute as credits: the hourly credit cost varies from 1 credit per hour for an X-Small warehouse to 128 credits per hour for a 4X-Large. Note that Snowflake charges per second (see the worked example below).
  • Costs can vary depending on the cloud platform (e.g., AWS or Azure) and the region.
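For example, a Medium warehouse consumes 4 credits per hour, so running it for 15 minutes consumes 4 × 0.25 = 1 credit. Because billing is per second (with a 60-second minimum each time a warehouse resumes), a query that keeps an X-Small warehouse running for 90 seconds is billed 90/3600 ≈ 0.025 credits.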

What are the Benefits of Creating an Account on Snowflake?

Before getting started with Snowflake features, here are some benefits of creating an account and what the initial setup looks like:

  1. Snowflake provides $400 of free credits to try and explore the warehouse.
  2. Once registered, you need to set up a virtual warehouse to use for loading the data.
  3. Loading data into the warehouse is one of the trickier steps in Snowflake. There are two ways to do this:

You can create databases/schemas/tables as required and load the data using the COPY INTO command, or query a file directly on cloud storage like S3 using a SELECT statement.
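A hedged sketch of this first approach (the stage, file path, and table names are illustrative and assumed to exist):

-- Bulk-load staged CSV files into a table
copy into student
from @my_stage/data/
file_format = (type = 'csv' skip_header = 1);

-- Or query a staged file directly without loading it
select $1, $2, $3 from @my_stage/data/students.csv;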

Alternatively, you can use an official Snowflake ETL Partner like Hevo (14-day free trial) to easily load data from a wide array of sources into Snowflake.

What are the Use-Cases of Snowflake Data Warehouse?

Snowflake can be used to host a Data Warehouse in the cloud at any scale for analysis of data from different sources. It supports loading various formats, from traditional compressed CSVs to semi-structured data such as JSON/XML, from cloud storage like S3 or Azure. If you need to load data from a variety of different sources, you could use a data integration platform like Hevo.

Common use cases for using Snowflake features include:

  1. Analytics on data available in a Data Lake on cloud storage like S3.
  2. Rapid delivery of analytics to the business without worrying about managing the warehouse.
  3. Data Sharing: If an organization wants to share data externally, Snowflake can be set up quickly to share data without transferring it or building pipelines.
  4. Snowflake can be used to set up an Insights-as-a-Service platform.

What are the Benefits of Snowflake Over Other Data Warehouses?

To provide the greatest service to consumers, the Snowflake Data Warehouse employs a unique and modern design. It scales well both upward and outward, and it outperforms competitors on a varied range of metrics. Snowflake is also available across data clouds and regions, making it accessible to almost any organization.

Snowflake is very adaptable in terms of scaling up and down. It is designed to allow many consumers to simultaneously examine huge volumes of data from the same data source, and it lets users run many complex queries at the same time.

Although Snowflake competes with other big Data Warehouses like Google BigQuery and Amazon Redshift, it also partners with them and hosts its services on their clouds. This means Snowflake can store data on the Amazon, Google, and Microsoft Azure clouds, whereas each of those cloud providers can only store data on their own cloud. Businesses considering a switch to Snowflake therefore don’t have to worry about which cloud provider they’re already using or whether it is compatible with their other data sources, because Snowflake supports them all.

These were some of the Snowflake features and benefits that make it an ideal choice for users.

Conclusion

If you are considering setting up a data warehouse, Snowflake is a strong contender. You can check out our articles on Snowflake Architecture and Snowflake’s best practices for high-performance ETL. Many features work out of the box on Snowflake compared to Redshift and BigQuery, the other popular choices for Data Warehousing.

Visit our Website to Explore Hevo

Businesses can use automated platforms like Hevo Data to set up this integration and handle the ETL process. Hevo helps you directly transfer data from a source of your choice to a Data Warehouse, Business Intelligence tool, or any other desired destination in a fully automated and secure manner, without writing any code, providing a hassle-free experience.

Want to take Hevo for a spin? Sign Up for a 14-day free trial and experience the feature-rich Hevo suite first hand. You can also have a look at the unbeatable Hevo Pricing that will help you choose the right plan for your business needs.

Do you want us to cover any other Snowflake features? Let us know in the comments.

Principal Frontend Engineer, Hevo Data

With over a decade of experience, Suraj has played a crucial role in architecting and developing core frontend modules for Hevo. His expertise lies in building scalable UI solutions, collaborating across teams, and contributing to the open-source community, showcasing a deep commitment to innovation in the tech industry.
