Snowflake Architecture & Its Fundamental Concepts

This article helps focuses on an in-depth understanding of Snowflake architecture, how it stores and manages data, and its micro-partitioning concepts. By the end of this blog, you will also be able to understand how Snowflake architecture is different from the rest of the cloud-based Massively Parallel Processing Databases.

Table of Contents

What is a Data Warehouse?

Businesses today are overflowing with data. The amount of data produced every day is truly staggering. With Data Explosion, it has become seemingly difficult to capture, process, and store big or complex datasets. Hence, it becomes a necessity for organizations to have a Central Repository where all the data is stored securely and can be further analyzed to make informed decisions. This is where Data Warehouses come into the picture.

A Data Warehouse also referred to as “Single Source of Truth”, is a Central Repository of information that supports Data Analytics and Business Intelligence (BI) activities. Data Warehouses store large amounts of data from multiple sources in a single place and are intended to execute queries and perform analysis for optimizing their business. Its analytical capabilities allow organizations to derive valuable business insights from their data to improve decision-making.

What is the Snowflake Data Warehouse?

Snowflake Architecture: Snowflake Data Warehouse — Image Source

Snowflake is a cloud-based Data Warehouse solution provided as a Saas (Software-as-a-Service) with full support for ANSI SQL. It also has a unique architecture that enables users to just create tables and start querying data with very less administration or DBA activities needed.

Download the Cheatsheet on How to Set Up ETL to Snowflake

Learn the best practices and considerations for setting up high-performance ETL to Snowflake

Features of Snowflake Data Warehouse

Let’s discuss some major features of Snowflake data warehouse:

Security and Data Protection: Snowflake data warehouse offers enhanced authentication by providing Multi-Factor Authentication (MFA), federal authentication and Single Sign-on (SSO) and OAuth. All the communication between the client and server is protected y TLS.
Standard and Extended SQL Support: Snowflake data warehouse supports most DDL and DML commands of SQL. It also supports advanced DML, transactions, lateral views, stored procedures, etc.
Connectivity: Snowflake data warehouse supports an extensive set of client connectors and drivers such as Python connector, Spark connector, Node.js driver, .NET driver, etc.
Data Sharing: You can securely share your data with other Snowflake accounts.

Read more about the features of Snowflake data warehouse here. Let’s learn about Snowflake architecture in detail.

Hevo is a No-code Data Pipeline. It supports pre-built data integration from 100+ data sources at a reasonable price. It can automate your entire data migration process in minutes. It offers a set of features and supports compatibility with several databases and data warehouses.

Let’s see some unbeatable features of Hevo:

Simple: Hevo has a simple and intuitive user interface.
Fault-Tolerant: Hevo offers a fault-tolerant architecture. It can automatically detect anomalies and notifies you instantly. If there is any affected record, then it is set aside for correction.
Real-Time: Hevo has a real-time streaming structure, which ensures that your data is always ready for analysis.
Schema Mapping: Hevo will automatically detect schema from your incoming data and maps it to your destination schema.

Types of Data Warehouse Architecture

There are mainly 3 ways of developing a Data Warehouse:

Single-tier Architecture: This type of architecture aims to deduplicate data in order to minimize the amount of stored data.
Two-tier Architecture: This type of architecture aims to separate physical Data Sources from the Data Warehouse. This makes the Data Warehouse incapable of expanding and supporting multiple end-users.
Three-tier Architecture: This type of architecture has 3 tiers in it. The bottom tier consists of the Database of the Data Warehouse Servers, the middle tier is an Online Analytical Processing (OLAP) Server used to provide an abstracted view of the Database, and finally, the top tier is a Front-end Client Layer consisting of the tools and APIs used for extracting data.

Components of Data Warehouse Architecture

The 4 components of a Data Warehouse are as follows.

1. Data Warehouse Database

A Database forms an essential component of a Data Warehouse. A Database stores and provides access to company data. Amazon Redshift and Azure SQL come under Cloud-based Database services.

2. Extraction, Transformation, and Loading Tools (ETL)

All the operations associated with the Extraction, Transformation, and Loading (ETL) of data into the warehouse come under this component. Traditional ETL tools are used to extract data from multiple sources, transform it into a digestible format, and finally load it into a Data Warehouse.

3. Metadata

Metadata provides a framework and descriptions of data, enabling the construction, storage, handling, and use of the data.

4. Data Warehouse Access Tools

Access Tools allow users to access actionable and business-ready information from a Data Warehouse. These Warehouse Tools include Data Reporting tools, Data Querying Tools, Application Development tools, Data Mining tools, and OLAP tools.

Integrate MySQL to Snowflake

Get a Demo Try it

Integrate MongoDB to Snowflake

Get a Demo Try it

Snowflake Architecture

Snowflake architecture comprises a hybrid of traditional shared-disk and shared-nothing architectures to offer the best of both. Let us walk through these architectures and see how Snowflake combines them into new hybrid architecture.

Overview of Shared-Disk Architecture
Overview of Shared-Nothing Architecture
Snowflake Architecture – A Hybrid Model

Overview of Shared-Disk Architecture

Used in traditional databases, shared-disk architecture has one storage layer accessible by all cluster nodes. Multiple cluster nodes having CPU and Memory with no disk storage for themselves communicate with central storage layer to get the data and process it.

Snowflake Architecture: Shared-disk Architecture

Overview of Shared-Nothing Architecture

Contrary to Shared-Disk architecture, Shared-Nothing architecture has distributed cluster nodes along with disk storage, their own CPU, and Memory. The advantage here is that the data can be partitioned and stored across these cluster nodes as each cluster node has its own disk storage.

Snowflake Architecture: Shared-Nothing Architecture

Snowflake Architecture – A Hybrid Model

Snowflake supports a high-level architecture as depicted in the diagram below. Snowflake has 3 different layers:

Storage Layer
Compute Layer
Cloud Services Layer

Snowflake Architecture: Hybrid Architecture

1. Storage Layer

Snowflake stores data in a highly optimized, compressed columnar format on cloud storage platforms such as AWS S3, Azure Blob Storage, or Google Cloud Storage. Key Features of Snowflake Storage Layer include:

Optimized Storage Format: Snowflake stores data in a highly optimized, compressed columnar format.
Cloud Storage Integration: Data is stored on platforms like AWS S3, Azure Blob Storage, or Google Cloud Storage.
Micro-Partitioning: Data is organized into small, column-structured units called “micro-partitions” for efficient storage and retrieval.
Storage and Compute Separation: Snowflake separates the storage and compute layers, allowing each to scale independently.
SQL-Based Access: Users interact with data solely through SQL queries without managing the underlying storage infrastructure.

A Snowflake-native app to monitor Fivetran costs.

No external tools or data movement. Get insights, alerts, and forecasts within your Snowflake account.

Get the ETL Cost Monitor

A Snowflake-native app to monitor Fivetran costs.

2. Compute Layer

Snowflake uses “Virtual Warehouse” (explained below) for running queries. Snowflake separates the query processing layer from the disk storage. Queries execute in this layer using the data from the storage layer.

Virtual Warehouses are MPP compute clusters consisting of multiple nodes with CPU and Memory provisioned on the cloud by Snowflake. Multiple Virtual Warehouses can be created in Snowflake for various requirements depending upon workloads. Each virtual warehouse can work with one storage layer. Generally, a virtual Warehouse has its own independent compute cluster and doesn’t interact with other virtual warehouses.

Advantages of Virtual Warehouse

Some of the advantages of virtual warehouse are listed below:

Virtual Warehouses can be started or stopped at any time and also can be scaled at any time without impacting queries that are running.
They also can be set to auto-suspend or auto-resume so that warehouses are suspended after a specific period of inactive time and then when a query is submitted are resumed.
They can also be set to auto-scale with minimum and maximum cluster size, so for e.g. we can set minimum 1 and maximum 3 so that depending on the load Snowflake can provision between 1 to 3 multi-cluster warehouses.

3. Cloud Services Layer

All the activities such as authentication, security, metadata management of the loaded data and query optimizer that coordinate across Snowflake happens in this layer.

Examples of services handled in this layer:

When a login request is placed it has to go through this layer,
Query submitted to Snowflake will be sent to the optimizer in this layer and then forwarded to Compute Layer for query processing.
Metadata required to optimize a query or to filter a data are stored in this layer.

These three layers scale independently and Snowflake charges for storage and virtual warehouse separately. Services layer is handled within compute nodes provisioned, and hence not charged.

The advantage of this Snowflake architecture is that we can scale any one layer independently of others. For e.g. you can scale storage layer elastically and will be charged for storage separately. Multiple virtual warehouses can be provisioned and scaled when additional resources are required for faster query processing and to optimize performance. Know more about Snowflake architecture from here.

Connecting to Snowflake

Now that you’re familiar with Snowflake’s architecture, it’s now time to discuss how you can connect to Snowflake. Let’s take a look at some of the best third-party tools and technologies that form the extended ecosystem for connecting to Snowflake.

Snowflake Ecosystem — This list will take you through Snowflake’s partners, clients, third-party tools, and emerging technologies in the Snowflake ecosystem.
- Third-party partners and technologies are certified to provide native connectivity to Snowflake.
- Data Integration or ETL tools are known to provide native connectivity to Snowflake.
- Business intelligence (BI) tools simplify analyzing, discovering, and reporting on business data to help organizations make informed business decisions.
- Machine Learning & Data Science cover a broad category of vendors, tools, and technologies that extend Snowflake’s functionality to provide advanced capabilities for statistical and predictive modeling.
- Security & Governance tools ensure that your data is stored and maintained securely.
- Snowflake also provides native SQL Development and Data Querying interfaces.
- Snowflake supports developing applications using many popular programming languages and development platforms.
Snowflake Partner Connect — This list will take you through Snowflake partners who offer free trials for connecting to Snowflake.
General Configuration (All Clients) — This is a set of general configuration instructions that is applicable to all Snowflake-provided Clients (CLI, connectors, and drivers).
SnowSQL (CLI Client) — SnowSQL is a next-generation command-line utility for connecting to Snowflake. It allows you to execute SQL queries and perform all DDL and DML operations.
Connectors & Drivers – Snowflake provides drivers and connectors for Python, JDBC, Spark, ODBC, and other clients for application development. You can go through each of them listed below to start learning and using them.

You can always connect to Snowflake via the above-mentioned tools/technologies.

Conclusion

Ever since 2014, Snowflake has been simplifying how organizations store and interact with their data. In this blog, you have learned about Snowflake’s data warehouse, Snowflake architecture, and how it stores and manages data. You learned about various layers of the hybrid model in Snowflake architecture. Check out more articles about the Snowflake data warehouse to know about vital Snowflake data warehouse features and Snowflake best practices for ETL. You can have a good working knowledge of Snowflake by understanding Snowflake Create Table.

Hevo, an official Snowflake ETL Partner, can help bring your data from various sources to Snowflake in real-time. You can reach out to us or take up a free trial if you need help in setting up your Snowflake Architecture or connecting your data sources to Snowflake.

FAQ

What is the architecture of Snowflake?

Snowflake’s architecture is a cloud-based data platform that utilizes a multi-cluster, shared data model. It separates compute, storage, and services layers, allowing for independent scaling of resources.

Why use Snowflake instead of AWS?

Snowflake provides a specialized data warehousing solution with seamless integration for diverse data workloads, unlike AWS, which offers a broader range of cloud services.

What are the advantages of Snowflake architecture?

Separation of Compute and Storage
Automatic Scaling
Multi-Cloud Capability
Concurrency Handling

Suresh H Freelance Technical Content Writer, Hevo Data

Suresh is enthusiastic about writing on data science and uses his problem-solving approach for helping data teams stay ahead of their competition.

Snowflake Architecture & Concepts: A Comprehensive Guide

What is a Data Warehouse?

What is the Snowflake Data Warehouse?

Features of Snowflake Data Warehouse

Types of Data Warehouse Architecture

Components of Data Warehouse Architecture