Loading Data to Snowflake: 4 Best Methods

• December 13th, 2021

Loading Data to Snowflake - Featured Image

In recent years, Snowflake has gained steam in the Cloud-based Data Warehouse space. More and more companies today are using Snowflake to drive operational efficiency, understanding their customers, understanding which products work, and which products don’t, and what kind of products people are interested in. 

This article introduces Snowflake and details 4 methods for loading data to Snowflake. These methods are different from each other in terms of the approach used and their use cases. Read along to decide which method suits you the best!

Table of Contents

What is Snowflake?

Snowflake Logo
Image Source

Snowflake is the leading Cloud-based Data Warehouse that has steadily grown and become popular in the past few years. Snowflake provides a scalable Cloud-based platform for enterprises and developers and supports advanced Data Analytics. There are multiple data stores available, but Snowflake’s architectural capabilities and data sharing capabilities are unique. Snowflake’s architecture enables storage and computing to scale independently, so customers can use storage and computing separately and pay for it.

The best property of Snowflake is that it provides separate storage and calculation options for data. Snowflake is designed to ensure that users do not require minimal effort or interaction to perform performance or maintenance-related activities. The minimum and maximum group size and scaling occurs automatically in this area at a very high speed

To learn more about Snowflake, visit here.

Methods of Loading Data to Snowflake

Method 1: Using SQL Commands for Loading Data to Snowflake

You can bulk load large amounts of data using SQL commands in SnowSQL using the Snowflake CLI. Although many different formats can be used as input in this method, CSV Files are used most commonly.

Method 2: Using Snowpipe for Loading Data to Snowflake

You can also automate the bulk loading of data using Snowpipe. It uses the COPY command and is beneficial when you need to input files from external sources into Snowflake.

Method 3: Using the Web Interface for Loading Data to Snowflake

You can use the web interface to load a limited amount of data. It has an in-built LOAD button that can input data into Snowflake. This method works only for small-sized data.

Method 4: Using Hevo Data for Loading Data to Snowflake

Hevo Data provides a hassle-free solution and helps you directly transfer data from various sources to Snowflake and numerous other Databases/Data Warehouses or destinations of your choice without any intervention in an effortless manner. Hevo is fully managed and completely automates the process of not only loading data from your desired source but also enriching the data and transforming it into an analysis-ready form without having to write a single line of code. Hevo’s pre-built integration with 100+ other data sources (including 30+ free data sources) will take full charge of the data transfer process, allowing you to focus on key business activities.

Get Started with Hevo for Free

Methods of Loading Data to Snowflake

Depending on the volume of data you intend to load and the frequency of loading, you can prefer one method over the other for Loading Data to Snowflake from the following:

Download the Cheatsheet on How to Set Up ETL to Snowflake
Download the Cheatsheet on How to Set Up ETL to Snowflake
Download the Cheatsheet on How to Set Up ETL to Snowflake
Learn the best practices and considerations for setting up high-performance ETL to Snowflake

Method 1: Using SQL Commands for Loading Data to Snowflake

SQL Logo
Image Source

This post details the process of bulk loading data to Snowflake using the SnowSQL client. Using SQL, you can bulk load data from any delimited plain-text file such as Comma-delimited CSV files. You can also bulk load semi-structured data from JSON, AVRO, Parquet, or ORC files. However, this post focuses on loading from CSV files.

Bulk loading is performed in 2 phases:

1) Staging the Files

Snowflake lets you stage files on internal locations called stages. Each table and user has a stage. Snowflake also supports creating named stages for example demo_stage. Staging can be done as follows:

  • First, you upload your data files to a location where Snowflake can access your files. This is referred to as staging your files.
  • Then you load your data from these staged files into your tables.

Internal stages enable convenient and secure storage of data files without requiring any external resources. However, if your data files are already staged in a supported cloud storage location such as GCS or S3 – you can skip Staging and load directly from these external locations – you just need to supply the URLS for the locations as well as access credentials if the location is protected. You can also create named stages that point to your external location.

2) Loading the Data

Loading data to Snowflake requires a running virtual Data Warehouse. The warehouse extracts the data from each file and inserts it as rows in the table. Data Warehouse size can impact loading performance. When loading large numbers of files or large files, you may want to choose a larger Data Warehouse.

You will now learn how to use the SnowSQL SQL client to load CSV files from a local machine into a table named Contacts in the demo Database demo_db. CSV files are easier to import into Database systems like Snowflake because they can represent relational data in a plain-text file.

You will use a named internal stage to store the files before loading. Now the following steps are required for Loading Data to Snowflake:

Step 1: Use the demo_db Database

Last login: Sun Jun 30 15:31:25 on ttys011
Superuser-MacBook-Pro:Documents hevodata$ snowsql -a bulk_data_load
User: johndoe
Password:
* SnowSQL * V1.1.65
Type SQL statements or !help
johndoe#(no warehouse)@(no database).(no schema)>USE DATABASE demo_db;
+----------------------------------------------------+
| status                                             |
|----------------------------------------------------|
| Statement executed successfully.                   |
+----------------------------------------------------+
1 Row(s) produced. Time Elapsed: 0.219s

Step 2: Create the Contacts Table

Use the following SQL command to create the Contacts table:

johndoe#(no warehouse)@(DEMO_DB.PUBLIC)>CREATE OR REPLACE TABLE        contacts (  id NUMBER   (38, 0)  first_name STRING,  last_name STRING,  company STRING,  email STRING,  workphone STRING,  cellphone STRING,  streetaddress STRING,  city STRING,  postalcode NUMBER   (38, 0));
+----------------------------------------------------+
| status                                             |
|----------------------------------------------------|
| Table CONTACTS successfully created.               |
+----------------------------------------------------+
1 Row(s) produced. Time Elapsed: 0.335s

Step 3: Populate the Table with Records

The Contacts table should contain records like this;

1, Chris, Harris, BBC Top Gear, harrismonkey@bbctopgearmagazine.com, 606-237-0055, 502-564-8100, PO Box 3320 3 Queensbridge, Northampton, NN4 7BF
2, Julie, Clark, American Aerobatics Inc, julieclark@americanaerobatics.com, 530-677-0634, 530-676-3434, 3114 Boeing Rd, Cameron Park, CA 95682
3, Doug, Danger, MotorCycle Stuntman LA, dougdanger@mcsla.com, 413-239-7198, 508-832-9494, PO Box 131 Brimfield, Massachusetts, 01010
4, John, Edward, Get Psyched, information@johnedward.net, 631-547-6043, 800-860-7581,  PO Box 383
Huntington, New York, 11743
5, Bob, Hope, Bob Hope Comedy, bobhope@bobhope.com, 818-841-2020, 310-990-7444, 3808 W Riverside Dr-100, Burbank, CA 91505

Step 4: Create an Internal Stage

Next, you will create an internal stage called CSV files.

johndoe#(no warehouse)@(DEMO_DB.PUBLIC)>CREATE STAGE csvfiles;
        
+----------------------------------------------------+
| status                                             |
|----------------------------------------------------|
| Stage area CSVFILES successfully created.          |
+----------------------------------------------------+
1 Row(s) produced. Time Elapsed: 0.311s

Step 5: Execute a PUT Command to Stage the Records in CSV Files

johndoe#(no warehouse)@(DEMO_DB.PUBLIC)>PUT file:///tmp/load/contacts0*.csv @csvfiles;
contacts01.csv_c.gz(0.00MB): [##########] 100.00% Done (0.417s, 0.00MB/s),
contacts02.csv_c.gz(0.00MB): [##########] 100.00% Done (0.377s, 0.00MB/s),
contacts03.csv_c.gz(0.00MB): [##########] 100.00% Done (0.391s, 0.00MB/s),
contacts04.csv_c.gz(0.00MB): [##########] 100.00% Done (0.396s, 0.00MB/s),
contacts05.csv_c.gz(0.00MB): [##########] 100.00% Done (0.399s, 0.00MB/s),

        
+----------------+-------------------+-------------+------------------------+
| source         | target            | source_size | target_size | status   |               
|---------------------------------------------------------------------------|
| contacts01.csv | contacts01.csv.gz |         534 |         420 | UPLOADED |
| contacts02.csv | contacts02.csv.gz |         504 |         402 | UPLOADED |
| contacts03.csv | contacts03.csv.gz |         511 |         407 | UPLOADED |
| contacts04.csv | contacts04.csv.gz |         501 |         399 | UPLOADED |
| contacts05.csv | contacts05.csv.gz |         499 |         396 | UPLOADED |
+------------------+-------------------+-------------+----------------------+
5 Row(s) produced. Time Elapsed: 2.111s

Notice that:

  • This command uses a wildcard contacts0*.csv to load multiple files. 
  • The @ symbol specifies where to stage the files – in this case, @csvfiles;
  • By default, the PUT command will compress data files using GZIP compression.

Step 6: Confirm that the CSV Files have been Staged

To see if the files are staged you can use the LIST command.

johndoe#(no warehouse)@(DEMO_DB.PUBLIC)>LIST @csvfiles; 

Step 7: Specify a Virtual Data Warehouse to Use

Now we are ready to load the files from the staged files into the CONTACTS table. First, you will specify a virtual warehouse to use.

johndoe#(no warehouse)@(DEMO_DB.PUBLIC)>USE WAREHOUSE dataload; 

+----------------------------------------------------+
| status                                             |
|----------------------------------------------------|
| Statement executed successfully.          |
+----------------------------------------------------+
1 Row(s) produced. Time Elapsed: 0.203s

Step 8: Load the Staged Files Into a Snowflake Table

johndoe#(DATALOAD)@(DEMO_DB.PUBLIC)>COPY INTO contacts;
                    FROM @csvfiles
                    PATTERN = ‘.*contacts0[1-4].csv.gz’
                    ON_ERROR = ‘skip_file’;
  • INTO specifies where the table data will be loaded.
  • PATTERN specifies the data files to load. In this case, we are loading files from data files with names that include the numbers 1-4.
  • ON_ERROR tells the command what to do when it encounters errors in the files.

Snowflake also provides powerful options for error handling as the data is loading. You can check out the Snowflake documentation to learn more about these options. 

If the load was successful, you can now query your table using SQL:

johndoe#(DATALOAD)@(DEMO_DB.PUBLIC)>SELECT * FROM contacts LIMIT 10;

Method 2: Using Snowpipe for Loading Data to Snowflake

Snowpipe Architecture
Image Source

You can also use Snowpipe for bulk loading data to Snowflake, particularly from files staged in external locations. Snowpipe uses the COPY command but with additional features that let you automate this process.

Snowpipe also eliminates the need for a virtual warehouse, instead, it uses external compute resources to continuously load data as files are staged and you are charged only for the actual data loaded.

Method 3: Using the Web Interface for Loading Data to Snowflake

Loading data to Snowflake using web interface - image
Image Source

The third option for loading data into Snowflake is the data loading wizard in the Snowflake Web Interface.

The Web UI allows you to simply select the table you want to load and by clicking the LOAD button you can easily load a limited amount of data into Snowflake. The wizard simplifies loading by combining the staging and data loading phases into a single operation and it also automatically deletes all the staged files after loading.

The wizard is intended for loading only small numbers of files containing small amounts of data. For large amounts of data, it is best to use one of the other options.

Method 4: Using Hevo Data for Loading Data to Snowflake

Hevo Data Logo
Image Source

Hevo Data is a No-code Data Pipeline solution that can help you move data from 100+ data sources to Snowflake, Databases such as SQL Server, BI tools, or a destination of your choice in a completely hassle-free & automated manner. Hevo is fully managed and completely automates the process of not only loading data from your desired source but also enriching the data and transforming it into an analysis-ready form without having to write a single line of code. Its fault-tolerant architecture ensures that the data is handled in a secure, consistent manner with zero data loss.

Hevo Data takes care of all your data preprocessing needs and lets you focus on key business activities and draw a much powerful insight on how to generate more leads, retain customers, and take your business to new heights of profitability. It provides a consistent & reliable solution to manage data in real-time and always have analysis-ready data in your desired destination. 

Sign up here for a 14-Day Free Trial!

Some of the salient features of Hevo include:

  • Fully Automated: The Hevo platform can be set up in just a few minutes and requires minimal maintenance.
  • Real-time Data Transfer: Hevo provides real-time data migration, so you can have analysis-ready data always.
  • 100% Complete & Accurate Data Transfer: Hevo’s robust infrastructure ensures reliable data transfer with zero data loss.
  • Scalable Infrastructure: Hevo has in-built integrations for 100’s of sources that can help you scale your data infrastructure as required.
  • Live Support: The Hevo team is available round the clock to extend exceptional support to its customers through chat, email, and support calls.
  • Schema Management: Hevo takes away the tedious task of schema management & automatically detects schema of incoming data and maps it to the destination schema.

Conclusion

The article introduced you to Snowflake Data Warehouse. Moreover, it explained 4 methods of Loading Data to Snowflake in a step-by-step manner. Each method has its unique benefits can you must select one of them according to your requirements.

Imagine having to be an expert at every source of data your organization owns and trying to hand-code all the data movement requirements. This is a herculean undertaking by all means but with Hevo, you can extract data from hundreds of sources, transform that data while on transit, and then load that data into your target storage of choice.

Visit our Website to Explore Hevo

Hevo Data will automate your data transfer process, hence allowing you to focus on other aspects of your business like Analytics, Customer Management, etc. This platform allows you to transfer data from 100+ multiple sources to Cloud-based Data Warehouses like Snowflake, Google BigQuery, Amazon Redshift, etc. It will provide you with a hassle-free experience and make your work life much easier.

Want to take Hevo for a spin?

Sign Up for a 14-day free trial and experience the feature-rich Hevo suite first hand.

What is your preferred method of Loading Data to Snowflake? Let us know in the comments.

No-code Data Pipeline for Snowflake