Snowflake is a Data Warehouse that runs on AWS, Azure, or GCP resources behind the scenes (you can select the cloud provider as well as the region). It delivers a scalable warehouse as a Software-as-a-Service (SaaS) offering that separates compute from storage, allowing you to resize your warehouse when your compute requirements increase and suspend it when there is no load. It also provides automatic concurrency scaling, spinning up additional clusters to absorb sudden spikes in your workload.
A Data Warehouse is different from a database in that a database is optimized for transactions (OLTP) whereas a data warehouse is optimized for analytics (OLAP). However, the method of dealing with data remains largely the same: SQL. There are differences, of course. For instance, Snowflake and several other data warehouses organize data in columns instead of rows, which lets them scan and aggregate large datasets in parallel very efficiently. The SQL syntax is all you need to worry about for processing data.
One aspect of Snowflake’s SQL syntax will be the focus of this article: the SELECT INTO statement. While Snowflake does not support the traditional SELECT INTO syntax, you will see an alternative that achieves equivalent results.
Table of Contents
- What is Snowflake?
- SELECT INTO Statement
- Snowflake Equivalent: CREATE TABLE AS SELECT
You will need a basic understanding of SQL to understand the contents of this article. If you wish to follow along and execute the example queries of this article yourself, you will need a Snowflake account. You can sign up for an account if you do not have one, and get a 30-day free trial.
What is Snowflake?
Snowflake is a Data Warehouse-as-a-Service platform built for the Cloud. Its architecture uses the scalable, elastic object storage of the underlying cloud provider (Amazon S3, Azure Blob Storage, or Google Cloud Storage) as the internal storage engine, and it can ingest Structured, Semi-Structured, and On-premise data through a variety of loading options.
Snowflake secures and protects data with access controls built on top of the underlying cloud storage (such as Amazon S3 Policy Controls, Azure SAS Tokens, and Google Cloud Storage access permissions), along with SSO. You can also scale your storage depending on your storage needs. Thus, a Snowflake Data Warehouse lets you store your data while enjoying features like scalability and data security, among many other benefits.
Key Features of Snowflake
Some of Snowflake’s key characteristics are as follows:
- Scalability: Snowflake’s Multi-Cluster Shared Data Architecture separates compute and storage resources. This strategy allows users to scale up resources when large amounts of data must be loaded quickly and scale back down when the process is finished, without interfering with other operations.
- Minimal Administration: It enables businesses to set up and manage a solution without relying on Database Administrators or IT teams. It does not require any software or hardware to be installed or activated.
- Security: Snowflake has a number of security features that range from how users access Snowflake to how data is stored. You can manage Network Policies by whitelisting IP addresses to restrict access to your account. Snowflake supports a number of authentication methods, including Two-Factor Authentication and SSO through Federated Authentication.
- Support for Semi-Structured Data: Snowflake’s architecture allows Structured and Semi-Structured data to be stored in the same location by utilizing the VARIANT data type, which follows a schema-on-read approach. VARIANT has the ability to store both structured and semi-structured data. Once the data is loaded, Snowflake automatically parses it, extracts the attributes, and stores it in a columnar format.
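As a quick illustration of schema-on-read, the sketch below shows how a JSON document can be stored in a single VARIANT column and queried with path notation. The raw_events table and payload column here are hypothetical, purely for illustration:

```sql
-- Hypothetical table: a single VARIANT column holds arbitrary JSON per row
CREATE TABLE raw_events (payload VARIANT);

-- PARSE_JSON turns a JSON string into a VARIANT value
INSERT INTO raw_events
  SELECT PARSE_JSON('{"user": {"id": 42, "city": "Mumbai"}, "action": "login"}');

-- Path notation (col:attr) extracts attributes; ::type casts them
SELECT payload:user:id::INT       AS user_id,
       payload:user:city::STRING  AS city,
       payload:action::STRING     AS action
FROM raw_events;
```

No schema had to be declared for the JSON up front; the attributes are extracted at query time.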
Simplify Snowflake Data Transfer with Hevo’s No-code Pipeline
Hevo Data is a No-code Data Pipeline that helps you transfer data from 100+ sources (including 40+ Free Data Sources) to Snowflake in real time in an effortless manner.
Key Features of Hevo Data:
- Fully Managed: Hevo Data is a fully managed service and is straightforward to set up.
- Transformations: Hevo provides preload transformations through Python code. It also allows you to run transformation code for each event in the Data Pipelines you set up. You need to edit the event object’s properties received in the transform method as a parameter to carry out the transformation. Hevo also offers drag and drop transformations like Date and Control Functions, JSON, and Event Manipulation to name a few. These can be configured and tested before putting them to use.
- Connectors: Hevo supports 100+ integrations to SaaS platforms, files, databases, analytics, and BI tools. It supports various destinations including Amazon Redshift, Snowflake Data Warehouses; Amazon S3 Data Lakes; and MySQL, SQL Server, TokuDB, DynamoDB, PostgreSQL databases to name a few.
- Ensure Unique Records: Hevo Data helps you ensure that only unique records are present in the tables if Primary Keys are defined.
- Multiple Sources: Hevo Data has various connectors incorporated with it, which can connect to multiple sources with ease.
- Automatic Mapping: Hevo Data automatically maps the source schema to perform analysis without worrying about the changes in the schema.
- Real-time Data Transfer: Hevo Data works on both batch as well as real-time data transfer.
- Resume from Point of Failure: Hevo Data can resume the ingestion from the point of failure if it occurs.
- Advanced Monitoring: Advanced monitoring gives you a one-stop view to watch all the activity that occurs within pipelines.
- 24/7 Support: With 24/7 Support, Hevo provides customer-centric solutions to the business use case.
Steps to load Snowflake data using Hevo Data:
- Sign up on Hevo Data and select Snowflake as the destination.
- Provide the user credentials and connect to the server.
- Select the database and schema to load the data into.
SELECT INTO Statement
Note: This statement is not supported by Snowflake. The equivalent is provided in the next section.
The general SELECT INTO statement helps you copy the contents of one table into a new table. If you wish to copy contents into an existing table, you would use the INSERT INTO syntax, but that is a topic for another article. Coming back to SELECT INTO, the statement syntax is:
SELECT col1, col2, ..., coln INTO new_table FROM existing_table WHERE condition
Here, col1, col2, etc. are the column names you would want to copy into the new table (you could use * if you wish to copy all columns). The WHERE condition acts as a filter allowing you to copy only rows meeting specific conditions.
An example would be:
SELECT * INTO marks_backup_2021 FROM marks WHERE year = 2021
As you can see, in the above query, you copy all the columns from the marks table into the new marks_backup_2021 table. The year = 2021 condition acts as a filter on the marks table.
Snowflake Equivalent: CREATE TABLE AS SELECT
Snowflake’s equivalent, CREATE TABLE ... AS SELECT (often abbreviated CTAS), is quite similar to SELECT INTO. The syntax is:
CREATE [OR REPLACE] TABLE db_name.schema.table_name AS (SELECT ...)
As you can see, this is a more generalized version of the SELECT INTO statement: it lets you store the results of any SELECT statement in a new table. This means you can modify column types, choose only specific columns to copy, change the names of columns, and a lot more. Let us consider an example.
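For instance, a minimal sketch of renaming a column and changing its type during the copy might look like the following. The employees source table and its columns are hypothetical:

```sql
-- Hypothetical source table: employees(id, name, salary stored as text)
CREATE OR REPLACE TABLE my_database.backup_data.employees_clean AS (
  SELECT id,
         name           AS employee_name,   -- rename a column
         salary::NUMBER AS salary_numeric   -- change the column type
  FROM my_database.public.employees
  WHERE salary IS NOT NULL                  -- copy only matching rows
);
```

The new table’s column names and types come entirely from the SELECT list, which is what gives CTAS its flexibility.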
When you sign up for a Snowflake account, you get a SNOWFLAKE_SAMPLE_DATA DB. This contains a WEATHER schema with a daily_14_total table. You will use this table and copy some rows into a new table. First, you can run a simple query on this table to get an idea of the data.
SELECT * from snowflake_sample_data.weather.daily_14_total LIMIT 10
As you can see, it has two columns: V (a JSON value containing the weather information and location information, like coordinates and the name of the city) and T (a timestamp).
Now, we would like to take a backup of all the weather data for Mumbai city. Let us see how that can be done using the CREATE TABLE AS SELECT syntax.
If you have an existing database and schema, you can use them. Otherwise, create a database called MY_DATABASE and a schema within it called BACKUP_DATA. Your query will look like this:
CREATE TABLE my_database.backup_data.mumbai_daily_14_total_backup AS (SELECT * from snowflake_sample_data.weather.daily_14_total WHERE V:city:name = 'Mumbai')
After execution, you should see the table creation message.
You can then query this table and verify that the column names and types are similar and that the rows indeed correspond to Mumbai city only.
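One way to do that verification (a sketch; the table name matches the CTAS statement above) is to compare the column definitions and confirm that no other city slipped through the filter:

```sql
-- Compare column names and types with the source table
DESCRIBE TABLE my_database.backup_data.mumbai_daily_14_total_backup;

-- This count should be 0 if the filter worked as intended
SELECT COUNT(*)
FROM my_database.backup_data.mumbai_daily_14_total_backup
WHERE V:city:name::STRING <> 'Mumbai';
```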
Note that the above example also shows that a table from one database can be copied into another. Over here, a table from SNOWFLAKE_SAMPLE_DATA DB was copied into MY_DATABASE.
You can make changes to the column names, add additional columns, and even remove some columns, as the query below shows.
CREATE TABLE my_database.backup_data.weather_data_cleaned AS (SELECT V:city:name as city, V:city:id as city_id, V:data as weather_data, to_timestamp(V:time) as ts from snowflake_sample_data.weather.daily_14_total LIMIT 100)
As you can see, this query creates a new table, weather_data_cleaned, which extracts 100 rows of derived columns from the V JSON column of daily_14_total and removes the T column altogether.
You can query this newly created table and check that the columns have been created as expected (notice the ts column, which was converted into the timestamp data type from the int data type, using the to_timestamp function).
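A quick spot-check (a sketch against the table created above) confirms the new column names and the converted timestamp:

```sql
-- The ts column should now display as TIMESTAMP values, not integers
SELECT city, city_id, ts
FROM my_database.backup_data.weather_data_cleaned
LIMIT 5;
```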
You saw the purpose of a SELECT INTO statement and what its syntax looks like. Next, you saw Snowflake’s equivalent, CREATE TABLE AS SELECT, and how it generalizes SELECT INTO by allowing you to store the results of any SELECT statement in a table. You also saw examples that illustrated the use of this syntax for modifying column names, types, etc. when copying into the new table. In case you want to export data from a source of your choice into your desired Database/destination like Snowflake, then Hevo Data is the right choice for you!
Hevo Data provides its users with a simpler platform for integrating data from 100+ sources for Analysis. It is a No-code Data Pipeline that can help you combine data from multiple sources. You can use it to transfer data from multiple data sources into your Data Warehouses such as Snowflake, Database, or a destination of your choice. It provides you with a consistent and reliable solution to managing data in real-time, ensuring that you always have Analysis-ready data in your desired destination.
Share your experience of learning about SELECT INTO in Snowflake! Let us know in the comments section below!