This article helps focuses on an in-depth understanding of Snowflake architecture, how it stores and manages data, and its micro-partitioning concepts. By the end of this blog, you will also be able to understand how Snowflake architecture is different from the rest of the cloud-based Massively Parallel Processing Databases.
Here’s an outline of this blog:
- What is the Snowflake Data Warehouse?
- Features of Snowflake Data Warehouse
- Snowflake Architecture
What is the Snowflake Data Warehouse?
Snowflake is a cloud-based Data Warehouse solution provided as a Saas (Software-as-a-Service) with full support for ANSI SQL. It also has a unique architecture that enables users to just create tables and start querying data with very less administration or DBA activities needed. Know about Snowflake pricing here.
Features of Snowflake Data Warehouse
Let’s discuss some major features of Snowflake data warehouse:
- Security and Data Protection: Snowflake data warehouse offers enhanced authentication by providing Multi-Factor Authentication (MFA), federal authentication and Single Sign-on (SSO) and OAuth. All the communication between the client and server is protected y TLS.
- Standard and Extended SQL Support: Snowflake data warehouse supports most DDL and DML commands of SQL. It also supports advanced DML, transactions, lateral views, stored procedures, etc.
- Connectivity: Snowflake data warehouse supports an extensive set of client connectors and drivers such as Python connector, Spark connector, Node.js driver, .NET driver, etc.
- Data Sharing: You can securely share your data with other Snowflake accounts.
Read more about the features of Snowflake data warehouse here. Let’s learn about Snowflake architecture in detail.
Hevo Data: A Convenient Method to Explore your Snowflake Data
Hevo is a No-code Data Pipeline. It supports pre-built data integration from 100+ data sources at a reasonable price. It can automate your entire data migration process in minutes. It offers a set of features and supports compatibility with several databases and data warehouses.
Let’s see some unbeatable features of Hevo:
- Simple: Hevo has a simple and intuitive user interface.
- Fault-Tolerant: Hevo offers a fault-tolerant architecture. It can automatically detect anomalies and notifies you instantly. If there is any affected record, then it is set aside for correction.
- Real-Time: Hevo has a real-time streaming structure, which ensures that your data is always ready for analysis.
- Schema Mapping: Hevo will automatically detect schema from your incoming data and maps it to your destination schema.
- Data Transformation: It provides a simple interface to perfect, modify, and enrich the data you want to transfer.
- Live Support: Hevo team is available round the clock to extend exceptional support to you through chat, email, and support call.
All these features make Hevo a must-have option for everyone. Give Hevo a try by signing up for a 14-day free trial today.
Snowflake architecture comprises of a hybrid of traditional shared-disk and shared-nothing architectures to offer the best of both. Let us walk through these architectures and see how Snowflake combines them into new hybrid architecture.
- Overview of Shared-Disk Architecture
- Overview of Shared-Nothing Architecture
- Snowflake Architecture – A Hybrid Model
Overview of Shared-Disk Architecture
Used in traditional databases, shared-disk architecture has one storage layer accessible by all cluster nodes. Multiple cluster nodes having CPU and Memory with no disk storage for themselves communicate with central storage layer to get the data and process it.
Overview of Shared-Nothing Architecture
Contrary to Shared-Disk architecture, Shared-Nothing architecture has distributed cluster nodes along with disk storage, their own CPU, and Memory. The advantage here is that the data can be partitioned and stored across these cluster nodes as each cluster node has its own disk storage.
Snowflake Architecture – A Hybrid Model
Snowflake supports a high-level architecture as depicted in the diagram below. Snowflake has 3 different layers:
1. Storage Layer
Snowflake organizes the data into multiple micro partitions that are internally optimized and compressed. It uses a columnar format to store. Data is stored in the cloud storage and works as a shared-disk model thereby providing simplicity in data management. This makes sure users do not have to worry about data distribution across multiple nodes in the shared-nothing model.
Compute nodes connect with storage layer to fetch the data for query processing. As the storage layer is independent, we only pay for the average monthly storage used. Since Snowflake is provisioned on the Cloud, storage is elastic and is charged as per the usage per TB every month.
2. Compute Layer
Snowflake uses “Virtual Warehouse” (explained below) for running queries. Snowflake separates the query processing layer from the disk storage. Queries execute in this layer using the data from the storage layer.
Virtual Warehouses are MPP compute clusters consisting of multiple nodes with CPU and Memory provisioned on the cloud by Snowflake. Multiple Virtual Warehouses can be created in Snowflake for various requirements depending upon workloads. Each virtual warehouse can work with one storage layer. Generally, a virtual Warehouse has its own independent compute cluster and doesn’t interact with other virtual warehouses.
Advantages of Virtual Warehouse
Some of the advantages of virtual warehouse are listed below:
- Virtual Warehouses can be started or stopped at any time and also can be scaled at any time without impacting queries that are running.
- They also can be set to auto-suspend or auto-resume so that warehouses are suspended after a specific period of inactive time and then when a query is submitted are resumed.
- They can also be set to auto-scale with minimum and maximum cluster size, so for e.g. we can set minimum 1 and maximum 3 so that depending on the load Snowflake can provision between 1 to 3 multi-cluster warehouses.
3. Cloud Services Layer
All the activities such as authentication, security, metadata management of the loaded data and query optimizer that coordinate across Snowflake happens in this layer.
Examples of services handled in this layer:
- When a login request is placed it has to go through this layer,
- Query submitted to Snowflake will be sent to the optimizer in this layer and then forwarded to Compute Layer for query processing.
- Metadata required to optimize a query or to filter a data are stored in this layer.
These three layers scale independently and Snowflake charges for storage and virtual warehouse separately. Services layer is handled within compute nodes provisioned, and hence not charged.
The advantage of this Snowflake architecture is that we can scale any one layer independently of others. For e.g. you can scale storage layer elastically and will be charged for storage separately. Multiple virtual warehouses can be provisioned and scaled when additional resources are required for faster query processing and to optimize performance. Know more about Snowflake architecture from here.
In this blog, you have learned about Snowflake data warehouse, Snowflake architecture and how it stores and manages data. You learned about various layers of the hybrid model in Snowflake architecture. Check out more article about Snowflake data warehouse to know about vital Snowflake data warehouse features and Snowflake best practices for ETL.
Hevo, an official Snowflake ETL Partner, can help bring your data from various sources to Snowflake in real-time. You can reach out to us or take up a free trial if you need help in setting up your Snowflake Architecture or connecting your data sources to Snowflake.
Give Hevo a try by signing up for a 14-day free trial today.
Share your experience of using Snowflake data warehouse in the comment section below.