Snowflake has become a significant participant in the quickly changing field of cloud-based data warehousing, enabling companies to improve operational effectiveness and obtain insightful knowledge about customer behavior and product performance. 

This article explores the nuances of importing data in Snowflake and identifies four best practices. Because the approaches differ in their methods and uses, this guide will help you choose the strategy that best suits your requirements for loading data. Come along on this investigation to see how you can use Snowflake to unlock actionable intelligence for your company and optimize your data integration processes. Let’s dive into the different methods on how to load data into Snowflake.

Methods of Loading Data to Snowflake

Method 1: Using SQL Commands for Loading Data to Snowflake

You can bulk load large amounts of data using SQL commands in SnowSQL using the Snowflake CLI. Although many different formats can be used as input in this method, CSV Files are used most commonly.

Method 2: Using Snowpipe for Loading Data to Snowflake

You can also automate the bulk loading of data using Snowpipe in Snowflake. It uses the COPY command and is beneficial when you need to input files from external sources into Snowflake.

Method 3: Using the Web Interface for Loading Data to Snowflake

You can use the web interface to load a limited amount of data. It has an in-built LOAD button that can input data into Snowflake. This method works only for small-sized data.

Method 4: Using Hevo Data for Loading Data to Snowflake

Hevo Data provides a hassle-free solution and helps you directly transfer data from various sources to Snowflake and numerous other Databases/Data Warehouses or destinations of your choice without any intervention in an effortless manner.

Hevo is fully managed and completely automates the process of not only loading data from your desired source but also enriching the data and transforming it into an analysis-ready form without having to write a single line of code. Hevo’s pre-built integration with 150+ other data sources (including 50+ free data sources) will take full charge of the data transfer process, allowing you to focus on key business activities.

Get Started with Hevo for Free

Supported File Locations

The data files in cloud storage in Snowflake are called a stage. The COPY INTO <table> command for bulk and continuous data loading (i.e. Snowpipe) supports both the cloud storage accounts controlled by your organization (external stages) and the cloud storage within your Snowflake account (internal stages).

Bulk Loading Using the COPY Command

This option allows you to import batches of data from files already in cloud storage or transfer (or stage) data files from a local system to an internal cloud storage location like Snowflake before loading them into tables with the transfer command.

Continuous Loading Using Snowpipe

This option is intended to import small amounts of data (micro-batches) and gradually prepare them for analysis. Snowpipe loads data in minutes after adding files to a stage and submitting them for ingestion. This ensures that consumers receive the most recent findings as soon as the raw data is accessible.

Methods of Loading Data to Snowflake

Depending on the volume of data you intend to load and the frequency of loading, you can prefer one method over the other for Loading Data to Snowflake from the following:

Method 1: Using SQL Commands for Loading Data to Snowflake

SQL Logo: Loading data to Snowflake
Image Source

You can seamlessly utilize SQL commands to perform bulk-loading into Snowflake with the powerful SnowSQL client. This method is limited to loading data from typical delimited plain-text files like Comma-delimited CSVs and from a spectrum of semi-structured sources—JSON, AVRO, Parquet, or ORC files.

Bulk loading is performed in 2 phases:

1) Staging the Files

Snowflake lets you stage files on internal locations called stages. Each table and user has a stage. Snowflake also supports creating named stages for example demo_stage. Staging can be done as follows:

  • First, you upload your data files to a location where Snowflake can access your files. This is referred to as staging your files.
  • Then you load your data from these staged files into your tables.

Internal stages enable convenient and secure storage of data files without requiring any external resources. However, if your data files are already staged in a supported cloud storage location such as GCS or S3 – you can skip Staging and load directly from these external locations – you just need to supply the URLS for the locations as well as access credentials if the location is protected. You can also create named stages that point to your external location.

2) Loading the Data

Loading data to Snowflake requires a running virtual Data Warehouse. The warehouse extracts the data from each file and inserts it as rows in the table. Data Warehouse size can impact loading performance. When loading large numbers of files or large files, you may want to choose a larger Data Warehouse.

You will now learn how to use the SnowSQL SQL client to load CSV files from a local machine into a table named Contacts in the demo Database demo_db. CSV files are easier to import into Database systems like Snowflake because they can represent relational data in a plain-text file.

You will use a named internal stage to store the files before loading. Now, the following steps are required for data loading in Snowflake:

Step 1: Use the demo_db Database

Last login: Sun Jun 30 15:31:25 on ttys011
Superuser-MacBook-Pro:Documents hevodata$ snowsql -a bulk_data_load
User: johndoe
Password:
* SnowSQL * V1.1.65
Type SQL statements or !help
johndoe#(no warehouse)@(no database).(no schema)>USE DATABASE demo_db;
+----------------------------------------------------+
| status                                             |
|----------------------------------------------------|
| Statement executed successfully.                   |
+----------------------------------------------------+
1 Row(s) produced. Time Elapsed: 0.219s

Step 2: Create the Contacts Table

Use the following SQL command to create the Contacts table:

johndoe#(no warehouse)@(DEMO_DB.PUBLIC)>CREATE OR REPLACE TABLE        contacts (  id NUMBER   (38, 0)  first_name STRING,  last_name STRING,  company STRING,  email STRING,  workphone STRING,  cellphone STRING,  streetaddress STRING,  city STRING,  postalcode NUMBER   (38, 0));
+----------------------------------------------------+
| status                                             |
|----------------------------------------------------|
| Table CONTACTS successfully created.               |
+----------------------------------------------------+
1 Row(s) produced. Time Elapsed: 0.335s

Step 3: Populate the Table with Records

The Contacts table should contain records like this;

1, Chris, Harris, BBC Top Gear, harrismonkey@bbctopgearmagazine.com, 606-237-0055, 502-564-8100, PO Box 3320 3 Queensbridge, Northampton, NN4 7BF
2, Julie, Clark, American Aerobatics Inc, julieclark@americanaerobatics.com, 530-677-0634, 530-676-3434, 3114 Boeing Rd, Cameron Park, CA 95682
3, Doug, Danger, MotorCycle Stuntman LA, dougdanger@mcsla.com, 413-239-7198, 508-832-9494, PO Box 131 Brimfield, Massachusetts, 01010
4, John, Edward, Get Psyched, information@johnedward.net, 631-547-6043, 800-860-7581,  PO Box 383
Huntington, New York, 11743
5, Bob, Hope, Bob Hope Comedy, bobhope@bobhope.com, 818-841-2020, 310-990-7444, 3808 W Riverside Dr-100, Burbank, CA 91505

Step 4: Create an Internal Stage

Next, you will create an internal stage called CSV files.

johndoe#(no warehouse)@(DEMO_DB.PUBLIC)>CREATE STAGE csvfiles;
        
+----------------------------------------------------+
| status                                             |
|----------------------------------------------------|
| Stage area CSVFILES successfully created.          |
+----------------------------------------------------+
1 Row(s) produced. Time Elapsed: 0.311s

Step 5: Execute a PUT Command to Stage the Records in CSV Files

johndoe#(no warehouse)@(DEMO_DB.PUBLIC)>PUT file:///tmp/load/contacts0*.csv @csvfiles;
contacts01.csv_c.gz(0.00MB): [##########] 100.00% Done (0.417s, 0.00MB/s),
contacts02.csv_c.gz(0.00MB): [##########] 100.00% Done (0.377s, 0.00MB/s),
contacts03.csv_c.gz(0.00MB): [##########] 100.00% Done (0.391s, 0.00MB/s),
contacts04.csv_c.gz(0.00MB): [##########] 100.00% Done (0.396s, 0.00MB/s),
contacts05.csv_c.gz(0.00MB): [##########] 100.00% Done (0.399s, 0.00MB/s),

        
+----------------+-------------------+-------------+------------------------+
| source         | target            | source_size | target_size | status   |               
|---------------------------------------------------------------------------|
| contacts01.csv | contacts01.csv.gz |         534 |         420 | UPLOADED |
| contacts02.csv | contacts02.csv.gz |         504 |         402 | UPLOADED |
| contacts03.csv | contacts03.csv.gz |         511 |         407 | UPLOADED |
| contacts04.csv | contacts04.csv.gz |         501 |         399 | UPLOADED |
| contacts05.csv | contacts05.csv.gz |         499 |         396 | UPLOADED |
+------------------+-------------------+-------------+----------------------+
5 Row(s) produced. Time Elapsed: 2.111s

Notice that:

  • This command uses a wildcard contacts0*.csv to load multiple files. 
  • The @ symbol specifies where to stage the files – in this case, @csvfiles;
  • By default, the PUT command will compress data files using GZIP compression.

Step 6: Confirm that the CSV Files have been Staged

To see if the files are staged you can use the LIST command.

johndoe#(no warehouse)@(DEMO_DB.PUBLIC)>LIST @csvfiles; 

Step 7: Specify a Virtual Data Warehouse to Use

Now we are ready to load the files from the staged files into the CONTACTS table. First, you will specify a virtual warehouse to use.

johndoe#(no warehouse)@(DEMO_DB.PUBLIC)>USE WAREHOUSE dataload; 

+----------------------------------------------------+
| status                                             |
|----------------------------------------------------|
| Statement executed successfully.          |
+----------------------------------------------------+
1 Row(s) produced. Time Elapsed: 0.203s

Step 8: Load the Staged Files Into a Snowflake Table

johndoe#(DATALOAD)@(DEMO_DB.PUBLIC)>COPY INTO contacts;
                    FROM @csvfiles
                    PATTERN = ‘.*contacts0[1-4].csv.gz’
                    ON_ERROR = ‘skip_file’;
  • INTO specifies where the table data will be loaded.
  • PATTERN specifies the data files to load. In this case, we are loading files from data files with names that include the numbers 1-4.
  • ON_ERROR tells the command what to do when it encounters errors in the files.

Snowflake also provides powerful options for error handling as the data is loading. You can check out the Snowflake documentation to learn more about these options. 

If the load was successful, you can now query your table using SQL:

johndoe#(DATALOAD)@(DEMO_DB.PUBLIC)>SELECT * FROM contacts LIMIT 10;
Download the Cheatsheet on How to Set Up ETL to Snowflake
Download the Cheatsheet on How to Set Up ETL to Snowflake
Download the Cheatsheet on How to Set Up ETL to Snowflake
Learn the best practices and considerations for setting up high-performance ETL to Snowflake

Method 2: Data ingestion into Snowflake using Snowpipe

Snowpipe Architecture: Loading data to Snowflake
Image Source

Let’s look into how to ingest data into Snowflake using the SnowSQL client. SnowSQL is the command-line client for Snowflake. Here’s an essential guide:

  • Install SnowSQL: Download the SnowSQL client on your local machine. Follow the installation instructions provided by Snowflake.
  • Configure SnowSQL: Run the following command to configure SnowSQL with your Snowflake account details:
bash

     snowsql

Follow the prompts to enter your account URL, username, password, and other necessary details.

  • Prepare Data: Ensure your data is in a compatible format (CSV, JSON, etc.) and stored in a location accessible by Snowflake, such as a cloud storage service.
  • Write SQL Commands: Create a table or ensure the target table already exists in Snowflake. For example:
sql

     CREATE TABLE IF NOT EXISTS your_table (

         column1 DataType,

         column2 DataType,

         -- Add other columns as needed

     );
  • Generate a Stage URL: Use a Snowflake internal stage or an external stage URL to specify the location of your data files. For example:
sql

     CREATE STAGE IF NOT EXISTS your_stage URL = 's3://your-s3-bucket/your-folder/';
  • Load Data: Use the COPY INTO command to load data from your files into the table. Adjust the file format and stage URL accordingly.
sql

     COPY INTO your_table

     FROM @your_stage/your_data_file.csv

     FILE_FORMAT = (TYPE = 'CSV' FIELD_OPTIONALLY_ENCLOSED_BY = '"' SKIP_HEADER = 1);

Ensure that @your_stage points to the correct stage, and adjust the file format options based on your data.

  • Execute Commands using SnowSQL: Save your SQL commands in a file, e.g., load_data.sql. Then, execute it using SnowSQL:
bash

     snowsql -f load_data.sql
  • Monitor Load Progress: Check the progress and status of your queries using the Snowflake web interface or SnowSQL.
  • Verify Data: After the load is complete, use SQL queries to check the data in your table and ensure it matches your expectations.
  • Handle Errors: If there are any errors during the data load, review the error messages provided by Snowflake and troubleshoot accordingly.

Remember to replace placeholders like your_table, your_stage, and your_data_file.csv with your actual table name, stage, and file details. Always refer to the Snowflake documentation for the most accurate and up-to-date information.

There are a few pitfalls associated with this method.

  • Incorrect or incomplete data formatting (consider using the VALIDATE command after using COPY INTO to return all errors).
  • Insufficient credentials or permissions (consider running the CREATE THREAD request with the privileged user OWNERSHIP, ensuring the role is granted necessary access rights).
  • Latency and network connectivity issues during the data loading (check the network connectivity stability and latency for acceptable limits).

You can also use Snowpipe in Snowflake for bulk loading data to your Snowflake warehouse, particularly from files staged in external locations. Snowpipe in Snowflake uses the COPY command but with additional features that let you automate this process.

Snowpipe also eliminates the need for a virtual warehouse, instead, it uses external compute resources to load data as files are staged continuously and you are charged only for the actual data loaded.

Method 3: Using the Web Interface for Loading Data to Snowflake

Loading data to Snowflake using web interface
Image Source

The third option for loading data into Snowflake is the data loading wizard in the Snowflake Web Interface.

The Web UI allows you to select the table you want to load simply, and by clicking the LOAD button, you can easily load a limited amount of data into Snowflake. The wizard simplifies loading by combining the staging and data loading phases into a single operation, and it also automatically deletes all the staged files after loading.

The wizard is intended for loading only small numbers of files containing small amounts of data, up to 50 MB. For large amounts of data, it is best to use one of the other options.

You can load files with structured data (such as CSV and TSV) or semi-structured data (such as JSON, Parquet, ORC, Avro, or XML) using Snowsight or Classic Console.

Considered as one of the simplest methods, here are the simple steps to follow:

  1. Login to your Snowflake account. Select an existing table or create a new one to load your data.
  2. In Table details, click on the Load button. This will open the Data Wizard that you can use to load the desired file into your table, including selecting the required data warehouse.
  3. You can either select local files or a cloud storage location such as GCP, Microsoft Azure, Amazon S3, etc.
  4. In the dropdown list, select a file format or create a new named one.
  5. Finally, set the load options in case of an error. Click the Load button to allow Snowflake to load data to the selected table via the warehouse.

Loading data from your local device into an existing table

  1. Log in to your Snowsight account.
  2. Choose Data » Databases in the navigation menu.
  3. Choose your desired database and schema.
  4. Select the table that you want to load data into in the object explorer.
  5. Click on Load Data.
  6. In the Load Data into Table option, click on Browse.
  7. Add data files (structured or semi-structured).
  8. Select a warehouse if you do not have a default warehouse set for your user.
  9. Click Next.
  10. Execute one of the following tasks:
    • Choose a file format from your current database.
    • Choose a file type to customize, then select the relevant settings for your data file.
  11. Click on Load.
    Snowsight will load your file and display the number of rows inserted into the table.
  12. Execute one of the following tasks:
    • Select Query Data to open a worksheet with SQL syntax.
    • Select Done to close the dialog.

If your file cannot be loaded, for instance, because the columns in the file do not match the columns provided in the table, an error notice appears. To change your options, click Back.

Method 4: Using Hevo Data for Loading Data to Snowflake

Imagine having to be an expert at every source of data your organization owns and trying to hand-code all the data movement requirements. This is an arduous undertaking, but with Hevo, you can extract data from hundreds of sources, transform that data while in transit, and then load that data into your target storage of choice.

Hevo Data takes care of all your data preprocessing needs and lets you focus on key business activities and draw a much more powerful insight on how to generate more leads, retain customers, and take your business to new heights of profitability. It provides a consistent & reliable solution to manage data in real-time and always have analysis-ready data in your desired destination. 

Some of the salient features of Hevo include:

  • Fully Automated: The Hevo platform can be set up in just a few minutes and requires minimal maintenance.
  • Real-time Data Transfer: Hevo provides real-time data migration, so you can have analysis-ready data always.
  • 100% Complete & Accurate Data Transfer: Hevo’s robust infrastructure ensures reliable data transfer with zero data loss.
  • Schema Management: Hevo takes away the tedious task of schema management & automatically detects the schema of incoming data and maps it to the destination schema.

Alternatives to Loading Data

Using the following options, you can query your data directly in cloud storage without having to load it into Snowflake tables.

External Tables (Data Lake)

External tables allow you to query existing data saved in external cloud storage for analysis without initially putting it into Snowflake. The external cloud storage serves as the data’s single source of truth. Data sets created in Snowflake using materialized views are read-only.

This technique is beneficial for accounts with a large quantity of data saved in external cloud storage but only want to query a subset of it, such as the most recent data. Users can use materialized views on portions of this data to improve query performance.

Working With Amazon S3-compatible Storage

External stages and tables can be created on devices and software, on-premises, or in a private cloud that is fully compliant with the Amazon S3 API. This functionality simplifies and improves data management, governance, and analysis, regardless of where it is physically housed.

Conclusion

The article explained the four methods of loading data to Snowflake in a step-by-step manner. Each method has its unique benefits, and you can select one of them according to your requirements.

Read through these essentials to gain a deeper insight into Snowflake:

Imagine having to be an expert at every source of data your organization owns and trying to hand-code all the data movement requirements. This is a herculean undertaking by all means but with Hevo, you can extract data from hundreds of sources, transform that data while on transit, and then load that data into your target storage of choice.

Visit our Website to Explore Hevo

Want to take Hevo for a spin?

Sign Up for a 14-day free trial and experience the feature-rich Hevo suite firsthand. Check out our unbeatable Hevo Pricing to select the right plan for you!

What is your preferred method of loading data to Snowflake? Let us know in the comments.

mm
Software Engineer, Hevo Data

With around a decade of experience, Sarad has designed and developed fundamental components of Hevo. His expertise lies in building lean solutions for various software problems, mentoring fellow engineers and exploring new technologies"

No-code Data Pipeline for Snowflake