Snowflake COPY Command: 5 Critical Aspects

on Data Warehouse, Snowflake • December 28th, 2021

SNOWFLAKE COPY COMMAND

The Snowflake COPY command is used to load data from staged files on internal or external locations to an existing table or vice versa. In this article, you will understand the need for Snowflake COPY, the syntax to use, the required parameters, and also a few examples.

Table of Contents

Introduction to Snowflake

Snowflake Logo
Image Source

Snowflake runs entirely on cloud infrastructure and provides a fully managed data warehouse system. Snowflake runs and manages its infrastructure on a public cloud and cannot be installed or hosted on a private cloud. Snowflake has a hybrid architecture of traditional shared-disk database architectures and shared-nothing database architectures. 

Key features of Snowflake

  • Integrated Platform: Snowflake is a Software-as-a-Service model that can be accessed online via paid subscriptions.
  • Central Repository: Snowflake uses a central repository to store data that is accessible to all compute nodes in the data warehouse.
  • Performance: Snowflake uses MPP (Massive Parallel Processing) technique to process the queries in its compute clusters. Each node of the cluster stores the data locally to provide excellent data management.
  • No Hardware: With Snowflake, there is no need to set up, configure, install, and manage hardware.
  • Scalable: Snowflake has extraordinary scaling capabilities. You can easily scale up or down depending on the volume of data.
  • Fault-Tolerant: Snowflake provides exceptional fault-tolerant capabilities to recover the accidentally dropped Snowflake objects.
  • Standard and Extended SQL Support: Snowflake has the best support for ANSI SQL along with advanced SQL functionalities like Merge, Lateral view, statistical functions, and many more.
  • Improved Data-Driven Decision Making: Snowflake allows you to break down data silos and provide access to actionable insights across the organization. This is an essential first step to improve partner relationships, optimize pricing, reduce operational costs, drive sales effectiveness, and much more. 
  • Improved User Experiences and Product Offerings: With Snowflake in place, you can better understand user behavior and product usage. You can also leverage the full breadth of data to deliver customer success, vastly improve product offerings, and encourage data science innovation.  
  • Customized Data Exchange: Snowflake allows you to build your data exchange, securely sharing live, governed data. It also provides an incentive to build better data relationships across your business units and with your partners and customers. It does this by achieving a 360-degree view of your customer, which provides insight into key customer attributes like interests, employment, and many more. 
  • Robust Security: You can adopt a secure data lake as a single place for all compliance and cybersecurity data. Snowflake Data Lakes guarantee a fast incident response. This allows you to understand the complete picture of an incident by clubbing high-volume log data in a single location, and efficiently analyzing years of log data in seconds. You can now join semi-structured logs and structured enterprise data in one data lake. Snowflake lets you put your foot in the door without any indexing and easily manipulate and transform data once it is in Snowflake.  
Snowflake Copy: Virtual Private Cloud
Image Source

In this blog post, you’ll learn all about the Snowflake COPY command and its usage in real scenarios. You will also look at Hevo Data – an alternate and easy way to extract data.

Snowflake Architecture
Image Source

Hevo Data- A Simpler Approach

Hevo is a No-code Data Pipeline which helps you transfer data from 100+ sources (including 30+ Free Data Sources) to Snowflake in real-time in an effortless manner.

Get Started with Hevo for Free

Hevo with its minimal learning curve can be set up in just a few minutes allowing the users to load data without having to compromise performance. Its strong integration with umpteenth sources allows users to bring in data of different kinds in a smooth fashion without having to code a single line. 

Key Features of Hevo Data:

  • Fully Managed: Hevo Data is a fully managed service and is straightforward to set up.
  • Transformations: Hevo provides preload transformations through Python code. It also allows you to run transformation code for each event in the Data Pipelines you set up. You need to edit the event object’s properties received in the transform method as a parameter to carry out the transformation. Hevo also offers drag and drop transformations like Date and Control Functions, JSON, and Event Manipulation to name a few. These can be configured and tested before putting them to use.
  • Connectors: Hevo supports 100+ integrations to SaaS platforms, files, databases, analytics, and BI tools. It supports various destinations including Amazon Redshift, Snowflake Data Warehouses; Amazon S3 Data Lakes; and MySQL, SQL Server, TokuDB, DynamoDB, PostgreSQL databases to name a few.  
  • Ensure Unique Records: Hevo Data helps you ensure that only unique records are present in the tables if Primary Keys are defined.
  • Multiple Sources: Hevo Data has various connectors incorporated with it, which can connect to multiple sources with ease. 
  • Automatic Mapping: Hevo Data automatically maps the source schema to perform analysis without worrying about the changes in the schema.
  • Real-time Data Transfer: Hevo Data works on both batch as well as real-time data transfer. 
  • Resume from Point of Failure: Hevo Data can resume the ingestion from the point of failure if it occurs. 
  • Advanced Monitoring: Advanced monitoring gives you a one-stop view to watch all the activity that occurs within pipelines.
  • 24/7 Support: With 24/7 Support, Hevo provides customer-centric solutions to the business use case.

Steps to load Snowflake data using Hevo Data:

  • Sign up on the Hevo Data, and select Snowflake as the destination.
  • Provide the user credentials and connect to the server.
  • Select the database, and schema to load the data.
Sign up here for a 14-Day Free Trial!

What is Snowflake COPY?

The Snowflake COPY command allows you to load data from staged files on internal/external locations to an existing table or vice versa.

Snowflake offers two types of COPY commands:

  1. COPY INTO <location>: This will copy the data from an existing table to locations that can be:
    1. An internal stage table. 
    2. An external stage table pointing to an external site, i.e., Amazon S3, Google Cloud Storage, or Microsoft Azure.
    3. An external location like Amazon cloud, GCS, or Microsoft Azure.
  1. COPY INTO <table>: This command will copy the data from staged files to the existing table. The staged files can be present at:  
    1. The internal stage table.
    2. The external stage table points to an external location like a public cloud provided by Amazon, Google, or Microsoft.
    3. Or, an external location like Amazon cloud, GCS, or Microsoft Azure.
Download the Cheatsheet on How to Set Up ETL to Snowflake
Download the Cheatsheet on How to Set Up ETL to Snowflake
Download the Cheatsheet on How to Set Up ETL to Snowflake
Learn the best practices and considerations for setting up high-performance ETL to Snowflake

The Need for Snowflake COPY

In Analytics, there is often a requirement to copy data from one table to another table/location to share, collaborate, and perform data analytics within the team. The Snowflake COPY command can also be used to create a clone of the table or to copy the data for failover recoveries. 

There might be several other use cases to use the COPY command, let’s briefly discuss the usage of the COPY command with examples.

COPY Command Usage

You can use the COPY command by using the following syntax:

COPY INTO [<namespace>.]<table_name>
FROM { internalStage | externalStage | externalLocation }
[ FILES = ( '<file_name>' [ , '<file_name>' ] [ , ... ] ) ]
[ PATTERN = '<regex_pattern>' ]
[ FILE_FORMAT = ( { FORMAT_NAME = '[<namespace>.]<file_format_name>' |
TYPE = { CSV | JSON | AVRO | ORC | PARQUET | XML } [ formatTypeOptions ] } ) ]
[ copyOptions ]
[ VALIDATION_MODE = RETURN_<n>_ROWS | RETURN_ERRORS | RETURN_ALL_ERRORS ]

There are many parameters. However, we will be discussing mandatory and important ones. To get the full usage, refer to Snowflake documentation here.

Required Parameters  

[<namespace>].<table_name>: Specifies the name of the target table into which data is loaded.

Namespace is the combination of <database_name>.<schema_name> or <schema_name>. If the schema and database have already been selected, then this parameter is optional.

FROM: This specifies the internal or external location that contains the data file to load into the table.

Optional Parameters

FILES: This specifies the list of file names to load into the table. The file must be staged in either Snowflake’s internal or external location. 

PATTERN: A regular expression pattern to match the specific file names.

FILE_FORMAT: Specifies the data file format (CSV, JSON, AVRO, ORC, PARQUET, etc.).

VALIDATION_MODE: This instructs the command to validate the data files instead of loading them into target tables and allows you to perform the dry run to ensure the fail-safe delivery of data. 

Examples

Let’s consider a few examples to demonstrate the Snowflake COPY command.

Loading files using Basic Command

Stages

copy into table1 from '@mystage/path 1/file 1.csv';
copy into table2 from '@%mytable/path 1/file 1.csv';
copy into table3 from '@~/path 1/file 1.csv';

S3 bucket

copy into table4 from 's3://mybucket 1/prefix 1/file 1.csv';

Azure container

copy into table5 from 'azure://myaccount.blob.core.windows.net/mycontainer/encrypted_files/file 1.csv';

Loading files from Named External Stage

copy into mycsvtable
from @my_ext_stage/tutorials/dataloading/contacts1.csv;

Loading files from Amazon S3

copy into mytable
from s3://mybucket/data/files
storage_integration = myint
encryption=(master_key = 'eSxX0jzYfIamtnBKOEOwq80Au6NbSgPH5r4BDDwOaO8=')
file_format = (format_name = ‘csv’);

Loading files Using Patterns

copy into mytable
file_format = (type = 'CSV')
pattern='.*/.*/.*[.]csv[.]gz';

Conclusion

Snowflake is a powerful, fully managed platform used for data warehousing and analytics. In this blog post, you have learned about the Snowflake COPY command in great detail, its use, along with some examples.

However, when it comes to fully managed ETL, you can’t find a better solution than Hevo. It is a No-code Data Pipeline that will help you move data from multiple data sources to your destination. It is consistent and reliable. It has pre-built integrations from 100+ sources.

Visit our Website to Explore Hevo

Sign Up for a 14-day free trial to give Hevo a try.

Share your thoughts on the Snowflake COPY command in the comments below!

Data Engineering
Survey 2022
Calling all data engineers – fill out this short survey to help us build an industry report for our data engineering community.
TAKE THE SURVEY
Amazon Gift Cards of $25 each are on offer for all valid and complete survey submissions.