Snowflake has become a significant participant in the quickly changing field of cloud-based data warehousing, enabling companies to improve operational effectiveness and obtain insightful knowledge about customer behavior and product performance. 

This article explores the nuances of importing data in Snowflake and identifies four best practices. Because the approaches differ in their methods and uses, this guide will help you choose the strategy that best suits your requirements for loading data.

Come along on this investigation to see how you can use Snowflake to unlock actionable intelligence for your company and optimize your data integration processes.

Let’s deep dive!

Methods of Loading Data to Snowflake

  • Method 1: Using SQL Commands for Loading Data to Snowflake
  • Method 2: Data ingestion into Snowflake using Snowpipe
Load your Data from MySQL to Snowflake
Load your Data from Salesforce to Snowflake
Load your Data from MongoDB to Snowflake

Method 1: Using SQL Commands for Loading Data to Snowflake

  • You can seamlessly utilize SQL commands to perform bulk-loading into Snowflake with the powerful SnowSQL client.
  • This method is limited to loading data from typical delimited plain-text files like Comma-delimited CSVs and from a spectrum of semi-structured sources—JSON, AVRO, Parquet, or ORC files.
  • Bulk loading is performed in 2 phases:

PRO TIP: Derive Meaningful Insights from Your Unified Data on Snowflake

Method 1A – Staging the Files

Snowflake lets you stage files on internal locations called stages. Each table and user has a stage. Snowflake also supports creating named stages for example demo_stage. Staging can be done as follows:

  • First, you upload your data files to a location where Snowflake can access your files. This is referred to as staging your files.
  • Then you load your data from these staged files into your tables.

Internal stages enable convenient and secure storage of data files without requiring any external resources. However, if your data files are already staged in a supported cloud storage location such as GCS or S3 – you can skip Staging and load directly from these external locations – you just need to supply the URLS for the locations as well as access credentials if the location is protected. You can also create named stages that point to your external location.

Method 1B – Loading the Data

Loading data to Snowflake requires a running virtual Data Warehouse. The warehouse extracts the data from each file and inserts it as rows in the table. Data Warehouse size can impact loading performance. When loading large numbers of files or large files, you may want to choose a larger Data Warehouse.

You will now learn how to use the SnowSQL SQL client to load CSV files from a local machine into a table named Contacts in the demo Database demo_db. CSV files are easier to import into Database systems like Snowflake because they can represent relational data in a plain-text file.

You will use a named internal stage to store the files before loading. Now, the following steps are required for data loading in Snowflake:

Step 1: Use the demo_db Database

Last login: Sun Jun 30 15:31:25 on ttys011
Superuser-MacBook-Pro:Documents hevodata$ snowsql -a bulk_data_load
User: johndoe
Password:
* SnowSQL * V1.1.65
Type SQL statements or !help
johndoe#(no warehouse)@(no database).(no schema)>USE DATABASE demo_db;
+----------------------------------------------------+
| status                                             |
|----------------------------------------------------|
| Statement executed successfully.                   |
+----------------------------------------------------+
1 Row(s) produced. Time Elapsed: 0.219s

Step 2: Create the Contacts Table

Use the following SQL command to create the Contacts table:

johndoe#(no warehouse)@(DEMO_DB.PUBLIC)>CREATE OR REPLACE TABLE        contacts (  id NUMBER   (38, 0)  first_name STRING,  last_name STRING,  company STRING,  email STRING,  workphone STRING,  cellphone STRING,  streetaddress STRING,  city STRING,  postalcode NUMBER   (38, 0));
+----------------------------------------------------+
| status                                             |
|----------------------------------------------------|
| Table CONTACTS successfully created.               |
+----------------------------------------------------+
1 Row(s) produced. Time Elapsed: 0.335s

Step 3: Populate the Table with Records

The Contacts table should contain records like this;

1, Chris, Harris, BBC Top Gear, harrismonkey@bbctopgearmagazine.com, 606-237-0055, 502-564-8100, PO Box 3320 3 Queensbridge, Northampton, NN4 7BF
2, Julie, Clark, American Aerobatics Inc, julieclark@americanaerobatics.com, 530-677-0634, 530-676-3434, 3114 Boeing Rd, Cameron Park, CA 95682
3, Doug, Danger, MotorCycle Stuntman LA, dougdanger@mcsla.com, 413-239-7198, 508-832-9494, PO Box 131 Brimfield, Massachusetts, 01010
4, John, Edward, Get Psyched, information@johnedward.net, 631-547-6043, 800-860-7581,  PO Box 383
Huntington, New York, 11743
5, Bob, Hope, Bob Hope Comedy, bobhope@bobhope.com, 818-841-2020, 310-990-7444, 3808 W Riverside Dr-100, Burbank, CA 91505

Step 4: Create an Internal Stage

Next, you will create an internal stage called CSV files.

johndoe#(no warehouse)@(DEMO_DB.PUBLIC)>CREATE STAGE csvfiles;
        
+----------------------------------------------------+
| status                                             |
|----------------------------------------------------|
| Stage area CSVFILES successfully created.          |
+----------------------------------------------------+
1 Row(s) produced. Time Elapsed: 0.311s

Step 5: Execute a PUT Command to Stage the Records in CSV Files

johndoe#(no warehouse)@(DEMO_DB.PUBLIC)>PUT file:///tmp/load/contacts0*.csv @csvfiles;
contacts01.csv_c.gz(0.00MB): [##########] 100.00% Done (0.417s, 0.00MB/s),
contacts02.csv_c.gz(0.00MB): [##########] 100.00% Done (0.377s, 0.00MB/s),
contacts03.csv_c.gz(0.00MB): [##########] 100.00% Done (0.391s, 0.00MB/s),
contacts04.csv_c.gz(0.00MB): [##########] 100.00% Done (0.396s, 0.00MB/s),
contacts05.csv_c.gz(0.00MB): [##########] 100.00% Done (0.399s, 0.00MB/s),

        
+----------------+-------------------+-------------+------------------------+
| source         | target            | source_size | target_size | status   |               
|---------------------------------------------------------------------------|
| contacts01.csv | contacts01.csv.gz |         534 |         420 | UPLOADED |
| contacts02.csv | contacts02.csv.gz |         504 |         402 | UPLOADED |
| contacts03.csv | contacts03.csv.gz |         511 |         407 | UPLOADED |
| contacts04.csv | contacts04.csv.gz |         501 |         399 | UPLOADED |
| contacts05.csv | contacts05.csv.gz |         499 |         396 | UPLOADED |
+------------------+-------------------+-------------+----------------------+
5 Row(s) produced. Time Elapsed: 2.111s

Notice that:

  • This command uses a wildcard contacts0*.csv to load multiple files. 
  • The @ symbol specifies where to stage the files – in this case, @csvfiles;
  • By default, the PUT command will compress data files using GZIP compression.

Step 6: Confirm that the CSV Files have been Staged

To see if the files are staged you can use the LIST command.

johndoe#(no warehouse)@(DEMO_DB.PUBLIC)>LIST @csvfiles; 

Step 7: Specify a Virtual Data Warehouse to Use

Now we are ready to load the files from the staged files into the CONTACTS table. First, you will specify a virtual warehouse to use.

johndoe#(no warehouse)@(DEMO_DB.PUBLIC)>USE WAREHOUSE dataload; 

+----------------------------------------------------+
| status                                             |
|----------------------------------------------------|
| Statement executed successfully.          |
+----------------------------------------------------+
1 Row(s) produced. Time Elapsed: 0.203s

Step 8: Load the Staged Files Into a Snowflake Table

johndoe#(DATALOAD)@(DEMO_DB.PUBLIC)>COPY INTO contacts;
                    FROM @csvfiles
                    PATTERN = ‘.*contacts0[1-4].csv.gz’
                    ON_ERROR = ‘skip_file’;
  • INTO specifies where the table data will be loaded.
  • PATTERN specifies the data files to load. In this case, we are loading files from data files with names that include the numbers 1-4.
  • ON_ERROR tells the command what to do when it encounters errors in the files.

Snowflake also provides powerful options for error handling as the data is loading. You can check out the Snowflake documentation to learn more about these options. 

If the load was successful, you can now query your table using SQL:

johndoe#(DATALOAD)@(DEMO_DB.PUBLIC)>SELECT * FROM contacts LIMIT 10;

Method 2: Data ingestion into Snowflake using Snowpipe

Let’s look into how to ingest data into Snowflake using the SnowSQL client. SnowSQL is the command-line client for Snowflake. Here’s an essential guide:

  • Install SnowSQL: Download the SnowSQL client on your local machine. Follow the installation instructions provided by Snowflake.
  • Configure SnowSQL: Run the following command to configure SnowSQL with your Snowflake account details:
bash

     snowsql

Follow the prompts to enter your account URL, username, password, and other necessary details.

  • Prepare Data: Ensure your data is in a compatible format (CSV, JSON, etc.) and stored in a location accessible by Snowflake, such as a cloud storage service.
  • Write SQL Commands: Create a table or ensure the target table already exists in Snowflake. For example:
sql

     CREATE TABLE IF NOT EXISTS your_table (

         column1 DataType,

         column2 DataType,

         -- Add other columns as needed

     );
  • Generate a Stage URL: Use a Snowflake internal stage or an external stage URL to specify the location of your data files. For example:
sql

     CREATE STAGE IF NOT EXISTS your_stage URL = 's3://your-s3-bucket/your-folder/';
  • Load Data: Use the COPY INTO command to load data from your files into the table. Adjust the file format and stage URL accordingly.
sql

     COPY INTO your_table

     FROM @your_stage/your_data_file.csv

     FILE_FORMAT = (TYPE = 'CSV' FIELD_OPTIONALLY_ENCLOSED_BY = '"' SKIP_HEADER = 1);

Ensure that @your_stage points to the correct stage, and adjust the file format options based on your data.

  • Execute Commands using SnowSQL: Save your SQL commands in a file, e.g., load_data.sql. Then, execute it using SnowSQL:
bash

     snowsql -f load_data.sql
  • Monitor Load Progress: Check the progress and status of your queries using the Snowflake web interface or SnowSQL.
  • Verify Data: After the load is complete, use SQL queries to check the data in your table and ensure it matches your expectations.
  • Handle Errors: If there are any errors during the data load, review the error messages provided by Snowflake and troubleshoot accordingly.

Remember to replace placeholders like your_table, your_stage, and your_data_file.csv with your actual table name, stage, and file details. Always refer to the Snowflake documentation for the most accurate and up-to-date information.

There are a few pitfalls associated with this method.

  • Incorrect or incomplete data formatting (consider using the VALIDATE command after using COPY INTO to return all errors).
  • Insufficient credentials or permissions (consider running the CREATE THREAD request with the privileged user OWNERSHIP, ensuring the role is granted necessary access rights).
  • Latency and network connectivity issues during the data loading (check the network connectivity stability and latency for acceptable limits).

You can also use Snowpipe in Snowflake for bulk loading data to your Snowflake warehouse, particularly from files staged in external locations. Snowpipe in Snowflake uses the COPY command but with additional features that let you automate this process.

Snowpipe also eliminates the need for a virtual warehouse, instead, it uses external compute resources to load data as files are staged continuously and you are charged only for the actual data loaded.

Method 3: Using the Web Interface for Loading Data to Snowflake

  1. Login to your Snowflake account. Select an existing table or create a new one to load your data.
  2. In Table details, click on the Load button. This will open the Data Wizard that you can use to load the desired file into your table, including selecting the required data warehouse.
  3. You can either select local files or a cloud storage location such as GCP, Microsoft Azure, Amazon S3, etc.
  4. In the dropdown list, select a file format or create a new named one.
  5. Finally, set the load options in case of an error. Click the Load button to allow Snowflake to load data to the selected table via the warehouse.

Loading data from your local device into an existing table

  1. Log in to your Snowsight account.
  2. Choose Data » Databases in the navigation menu.
  3. Choose your desired database and schema.
  4. Select the table that you want to load data into in the object explorer.
  5. Click on Load Data.
  6. In the Load Data into Table option, click on Browse.
  7. Add data files (structured or semi-structured).
  8. Select a warehouse if you do not have a default warehouse set for your user.
  9. Click Next.
  10. Execute one of the following tasks:
    • Choose a file format from your current database.
    • Choose a file type to customize, then select the relevant settings for your data file.
  11. Click on Load.
    Snowsight will load your file and display the number of rows inserted into the table.
  12. Execute one of the following tasks:
    • Select Query Data to open a worksheet with SQL syntax.
    • Select Done to close the dialog.

If your file cannot be loaded, for instance, because the columns in the file do not match the columns provided in the table, an error notice appears. To change your options, click Back.

Method 4: Using Hevo Data for Loading Data to Snowflake

Imagine having to be an expert at every source of data your organization owns and trying to hand-code all the data movement requirements. This is an arduous undertaking, but with Hevo, you can extract data from hundreds of sources, transform that data while in transit, and then load that data into your target storage of choice.

Some of the salient features of Hevo include:

  • Fully Automated: The Hevo platform can be set up in just a few minutes and requires minimal maintenance.
  • Real-time Data Transfer: Hevo provides real-time data migration, so you can have analysis-ready data always.
  • 100% Complete & Accurate Data Transfer: Hevo’s robust infrastructure ensures reliable data transfer with zero data loss.
  • Schema Management: Hevo takes away the tedious task of schema management & automatically detects the schema of incoming data and maps it to the destination schema.

Supported File Locations

  • The data files in cloud storage in Snowflake are called a stage.
  • The COPY INTO <table> command for bulk and continuous data loading (i.e. Snowpipe) supports both the cloud storage accounts controlled by your organization (external stages) and the cloud storage within your Snowflake account (internal stages).

Bulk Loading Using the COPY Command

  • This option allows you to import batches of data from files already in cloud storage or transfer (or stage) data files from a local system to an internal cloud storage location like Snowflake before loading them into tables with the transfer command.

Continuous Loading Using Snowpipe

  • This option is intended to import small amounts of data (micro-batches) and gradually prepare them for analysis.
  • Snowpipe loads data in minutes after adding files to a stage and submitting them for ingestion. This ensures that consumers receive the most recent findings as soon as the raw data is accessible.

Alternatives to Loading Data

External Tables (Data Lake)

  1. External tables allow you to query existing data saved in external cloud storage for analysis without initially putting it into Snowflake.
  2. The external cloud storage serves as the data’s single source of truth. Data sets created in Snowflake using materialized views are read-only.
  3. This technique is beneficial for accounts with a large quantity of data saved in external cloud storage but only want to query a subset of it, such as the most recent data.
  4. Users can use materialized views on portions of this data to improve query performance.

Working With Amazon S3-compatible Storage

  1. External stages and tables can be created on devices and software, on-premises, or in a private cloud that is fully compliant with the Amazon S3 API.
  2. This functionality simplifies and improves data management, governance, and analysis, regardless of where it is physically housed.

Read through these essentials to gain a deeper insight into Snowflake:

  1. How to Integrate Salesforce to Snowflake: 3 Easy Methods
  2. MySQL to Snowflake: 2 Easy Methods of Replication
  3. Amazon S3 to Snowflake ETL: 2 Easy Methods
  4. Sql Server To Snowflake

FAQ on How to Load Data into Snowflake

1. How do I load data into a Snowflake database?

  • Using Snowflake Web Interface
  • Using SnowSQL Command Line Interface
  • Using Python with Snowflake Connector
  • Using JDBC or ODBC

2. How do I import data from Excel to Snowflake?

  • Convert Excel to CSV
  • Stage the CSV
  • Copy Data from the Stage into Table

3. Can we load data from SQL Server to Snowflake?
Yes, you can load data from SQL Server to Snowflake. You can manually export the files from SQL Server and then load them into Snowflake or use Hevo Data to automate this process.

4. How to load JSON data into Snowflake?

  • Prepare the JSON data.
  • Create the Target Table.
  • Stage the JSON file.
  • Define the file format.
  • Copy data from the stage into the table.

5. How do you copy data into a Snowflake?
You can load data into your target table using the COPY INTO command.

Conclusion

  1. The article explained the four methods of loading data to Snowflake in a step-by-step manner.
  2. Each method has its unique benefits, and you can select one of them according to your requirements.
Sarad Mohanan
Software Engineer, Hevo Data

With over a decade of experience, Sarad has been instrumental in designing and developing Hevo's fundamental components. His expertise lies in building lean solutions for various software challenges. Sarad is passionate about mentoring fellow engineers and continually exploring new technologies to stay at the forefront of the industry. His dedication and innovative approach have made significant contributions to Hevo's success.

No-code Data Pipeline for Snowflake