Data is essential for companies that want to make data-driven decisions and stay a step ahead of their competitors. Managing and analyzing massive amounts of data can be challenging if it is not planned and organized properly. Most business operations are handled by multiple apps, services, and websites that generate valuable data. One of the best ways to store huge amounts of structured or unstructured data is Amazon S3, a widely used storage service that can hold any type of data.

All this data needs to be processed and analyzed to be useful. Companies transform it so that it can be analyzed directly with Business Intelligence (BI) tools. Snowflake is a Data Warehousing solution that helps companies store and analyze data while delivering high query performance. To manage data cost-effectively, many companies unload data from Snowflake to S3.

Snowflake Unload to S3 allows companies to move data that is no longer needed in Snowflake out to S3. In this article, you will learn about Snowflake, Amazon S3, and how to perform Snowflake Unload to S3 using the AWS console and SQL commands.

Prerequisites

  • An active Snowflake account.
  • An active AWS account with S3.

Introduction to Snowflake

Snowflake is a Cloud Data Warehouse built on top of cloud infrastructure such as AWS (Amazon Web Services) that helps companies store and analyze massive amounts of data. It is a Software as a Service (SaaS) platform, which means there is no hardware for you to select, install, configure, or manage. All software updates, maintenance, management, upgrades, and tuning are handled by Snowflake. It separates compute from storage and charges for each independently.

Snowflake is highly scalable, allowing users to easily scale compute and storage up or down to meet their business needs. It delivers fast query performance and can handle terabytes of data with the help of its virtual warehouses, which are clusters of virtual compute instances.
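For example, compute in Snowflake is provisioned as virtual warehouses that can be created and resized on demand with plain SQL. The snippet below is a minimal sketch of this; the warehouse name and sizes are arbitrary examples, not values required anywhere else in this article.

-- Create a small virtual warehouse (hypothetical name) that suspends itself when idle.
create warehouse if not exists demo_wh
  warehouse_size = 'XSMALL'
  auto_suspend = 60      -- suspend after 60 seconds of inactivity to save credits
  auto_resume = true;

-- Scale the same warehouse up when heavier queries need more compute.
alter warehouse demo_wh set warehouse_size = 'LARGE';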

Key Features of Snowflake

Some of the main features of Snowflake are listed below:

  • SQL Support: Snowflake lets users easily access and manipulate data using standard SQL, including DDL, DML, and other advanced commands (a short example follows this list).
  • Data Import and Export: Snowflake supports bulk import and export of data, including support for character encodings, compressed files, delimited data files, etc.
  • Integrations: Snowflake integrates easily with many 3rd party apps and services commonly used by companies, enabling users to sync data between those platforms and Snowflake.
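To give a flavour of the SQL support and bulk export features mentioned above, the short sketch below creates a table, modifies it, and bulk-exports it as compressed CSV files to its table stage. All object names are illustrative and are not part of the unloading procedure described later in this article.

-- DDL: define a table (illustrative schema).
create or replace table customers (id int, name string, signup_date date);

-- DML: add and modify data.
insert into customers values (1, 'Acme Corp', '2021-10-01');
update customers set name = 'Acme Corporation' where id = 1;

-- Bulk export: write the table out as compressed, comma-delimited files to its table stage.
copy into @%customers from customers file_format = (type = 'CSV' compression = 'GZIP');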

To learn more about Snowflake, click here.

Introduction to Amazon S3

Amazon S3, or Amazon Simple Storage Service, is offered by AWS (Amazon Web Services) and allows companies to store their data as objects through a web service interface. It can store any type of object, which makes it highly adaptable and scalable and suitable for backup and recovery, disaster recovery, internet applications, and more.

Unlike a file system, which stores data in blocks, Amazon S3 stores each piece of data as an independent object along with its metadata and a unique object identifier. It is widely used by companies such as Netflix, Amazon for e-commerce, Twitter, etc.

Key Features of Amazon S3

Some of the main features of Amazon S3 are listed below.

  • Access Management: Amazon S3 comes with various access-management features to secure data, including capabilities for auditing and managing access to objects and buckets.
  • Analytics and Insights: Amazon S3 gives users visibility into the data they store, making it easier to understand, analyze, and optimize storage at scale.
  • Data Processing: Amazon S3 can automate data transformation activities through features such as S3 Object Lambda, which integrates with AWS Lambda.

To learn more about Amazon S3, click here.

Steps for Snowflake Unload to S3

Now that you have an understanding of Snowflake and Amazon S3, this section walks you through the process of Snowflake Unload to S3. The steps to perform Snowflake Unload to S3 are listed below:

Step 1: Allowing the Virtual Private Cloud IDs

The first step for Snowflake Unload to S3 is to explicitly grant Snowflake access to your AWS S3 storage account. For this, the Amazon S3 bucket in your AWS account and your Snowflake account should be located in the same region.

  • First, log in to your Snowflake account.
  • Then, set ACCOUNTADMIN as the active role for the user session by executing the following command in the Snowflake console.
use role accountadmin;
  • Now, retrieve the IDs of the AWS Virtual Private Cloud (VPC) in which your Snowflake account resides by querying the SYSTEM$GET_SNOWFLAKE_PLATFORM_INFO function in the console, as shown in the code given below.
select system$get_snowflake_platform_info();
  • From the query result returned above, record the VPC IDs for your Snowflake account.
  • Allow these VPC IDs by creating an Amazon S3 bucket policy that restricts access to the specific VPC(s).
  • Now, let's provide an AWS Identity and Access Management (IAM) role to Snowflake so that it can access the Amazon S3 bucket.

Step 2: Configuring an Amazon S3 Bucket 

  • You must allow Snowflake to access your Amazon S3 Bucket and folder to create new files. The following permissions are required for Snowflake Unload to S3:
    • s3:DeleteObject
    • s3:PutObject
  • First, create an IAM policy. Log in to your AWS Management Console and, from the dashboard, navigate to the Identity & Access Management (IAM) options under the Security, Identity, and Compliance section, as shown in the image below.
IAM Settings in AWS Console - Snowflake Unload to S3
  • Now, choose the “Account Settings” option from the left side navigation bar.
  • Here, expand the “Security Token Service Regions” list and locate the region where your AWS account is located.
  • Then, if the status is “Inactive”, click “Activate”.
  • Choose the “Policies” option from the left-hand navigation bar.
  • Then, click on the “Create Policy” option, as shown in the image below.
Creating a New Policy - Snowflake Unload to S3
  • Click on the “JSON” tab and add a policy document to allow Snowflake Unload to S3.
  • A sample policy document in JSON format is given below.
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
              "s3:PutObject",
              "s3:GetObject",
              "s3:GetObjectVersion",
              "s3:DeleteObject",
              "s3:DeleteObjectVersion"
            ],
            "Resource": "arn:aws:s3:::<bucket>/<prefix>/*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket",
                "s3:GetBucketLocation"
            ],
            "Resource": "arn:aws:s3:::<bucket>",
            "Condition": {
                "StringLike": {
                    "s3:prefix": [
                        "<prefix>/*"
                    ]
                }
            }
        }
    ]
}
  • In the above code, replace <bucket> and <prefix> with your bucket name and folder path prefix.
  • Then, click on the “Review policy” button and enter a policy name, e.g., snowflake_access.
  • Click on the “Create policy” button, as shown in the image below.
Finishing New Policy - Snowflake Unload to S3
  • Now, it’s time to create the IAM Role in AWS for Snowflake unload to S3.
  • Choose the “Roles” from the left navigation bar and click on the “Create role” button, as shown in the image below.
Creating New Role - Snowflake Unload to S3
  • Here, select “Another AWS account” as the trusted entity type.
  • In the Account ID field, provide your AWS account ID.
  • Then, select the “Require external ID” option and, for now, enter a placeholder ID such as “0000”. You will modify this later to grant access for Snowflake Unload to S3.
  • Click on the “Next” button, locate the policy that you created and again click on the “Next” button, as shown in the image below.
Reviewing the Role - Snowflake Unload to S3
  • Create the role by providing a valid name and description. You now have an IAM policy for the bucket, an IAM role, and the policy attached to that role.
  • After this, you have to record the “Role ARN” value located on the role summary page, as shown in the image below. 
Summary of the Role - Snowflake Unload to S3
  • Now, create a Cloud Storage integration in Snowflake that references this role, using the “CREATE STORAGE INTEGRATION” command.
  • Open the Snowflake console and enter the command given below, replacing the placeholders with your own values (a filled-in example follows the template).
CREATE STORAGE INTEGRATION <integration_name>
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = S3
  ENABLED = TRUE
  STORAGE_AWS_ROLE_ARN = '<iam_role>'
  [ STORAGE_AWS_OBJECT_ACL = 'bucket-owner-full-control' ]
  STORAGE_ALLOWED_LOCATIONS = ('s3://<bucket>/<path>/', 's3://<bucket>/<path>/')
  [ STORAGE_BLOCKED_LOCATIONS = ('s3://<bucket>/<path>/', 's3://<bucket>/<path>/') ]
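  • For example, using the illustrative names that also appear in the sample output later in this step (an integration called s3_int, the IAM role myrole, and the buckets mybucket1 and mybucket2), the filled-in command might look like the sketch below.
-- Illustrative storage integration; the role ARN, bucket names, and paths are placeholders.
CREATE STORAGE INTEGRATION s3_int
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = S3
  ENABLED = TRUE
  STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::001234567890:role/myrole'
  STORAGE_ALLOWED_LOCATIONS = ('s3://mybucket1/mypath1/', 's3://mybucket2/mypath2/')
  STORAGE_BLOCKED_LOCATIONS = ('s3://mybucket1/mypath1/sensitivedata/', 's3://mybucket2/mypath2/sensitivedata/');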
  • Everything is ready for Snowflake Unload to S3; now you have to retrieve the AWS IAM user that was created for your Snowflake account.
  • For this, execute the command given below to retrieve the ARN.
DESC INTEGRATION <integration_name>;
  • Here, provide the “integration_name” that you created.
  • For example, a sample command and its output are given below.
desc integration s3_int;

+---------------------------+---------------+================================================================================+------------------+
| property                  | property_type | property_value                                                                 | property_default |
+---------------------------+---------------+================================================================================+------------------|
| ENABLED                   | Boolean       | true                                                                           | false            |
| STORAGE_ALLOWED_LOCATIONS | List          | s3://mybucket1/mypath1/,s3://mybucket2/mypath2/                                | []               |
| STORAGE_BLOCKED_LOCATIONS | List          | s3://mybucket1/mypath1/sensitivedata/,s3://mybucket2/mypath2/sensitivedata/    | []               |
| STORAGE_AWS_IAM_USER_ARN  | String        | arn:aws:iam::123456789001:user/abc1-b-self1234                                 |                  |
| STORAGE_AWS_ROLE_ARN      | String        | arn:aws:iam::001234567890:role/myrole                                          |                  |
| STORAGE_AWS_EXTERNAL_ID   | String        | MYACCOUNT_SFCRole=2_a123456/s0aBCDEfGHIJklmNoPq=                               |                  |
+---------------------------+---------------+================================================================================+------------------+
  • Record the “STORAGE_AWS_IAM_USER_ARN” and “STORAGE_AWS_EXTERNAL_ID” values from the output.
  • Next, go back to the “Roles” option in the side navigation bar, open the role you created earlier, and click on the “Trust relationships” tab.
  • Then, click on the “Edit trust relationship” button.
  • Now, you can modify the policy document with the “DESC STORAGE INTEGRATION” output values you just recorded.
  • The sample policy document for Snowflake Unload to S3 is given below.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "AWS": "<snowflake_user_arn>"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "sts:ExternalId": "<snowflake_external_id>"
        }
      }
    }
  ]
}
  • In the above document, replace <snowflake_user_arn> and <snowflake_external_id> with the values you recorded.
  • Then, click on the “Update Trust Policy” button.
  • Now, you have to create an external stage for Snowflake Unload to S3 that references the storage integration you created earlier.
  • Before creating the stage with the “CREATE STAGE” command, grant the required privileges (CREATE STAGE on the schema and USAGE on the integration) to the role that will create and use the stage, as shown below.
grant create stage on schema public to role myrole;

grant usage on integration s3_int to role myrole;
  • Now, set “mydb.public” as the current database and schema for the user session, and then create the named stage “my_s3_stage”. You can follow the sample code given below; a hedged sketch of the file format it references is shown right after the code.
use schema mydb.public;

create stage my_s3_stage
  storage_integration = s3_int
  url = 's3://bucket1/path1'
  file_format = my_csv_format;
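  • The file format “my_csv_format” referenced above is not defined elsewhere in this article. A minimal sketch of what it could look like, assuming simple comma-delimited, gzip-compressed CSV files, is given below; adjust the options to match your own data.
-- Hypothetical definition of the file format referenced by the stage above.
-- The options shown are illustrative assumptions, not requirements.
create or replace file format my_csv_format
  type = 'CSV'
  field_delimiter = ','
  field_optionally_enclosed_by = '"'
  compression = 'GZIP';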

Step 3: Unloading Data into an External Stage

  • Create an external named stage using the web interface by clicking on the “Databases” option, then “<db_name>”, and then selecting “Stages”.
  • You can also do the same thing using the SQL command given below.
CREATE STAGE
  • The following sample code creates an external stage for Snowflake Unload to S3.
create or replace stage my_ext_unload_stage url='s3://unload/files/'
    storage_integration = s3_int
    file_format = my_csv_unload_format;
  • In the above code, the name of the external stage is “my_ext_unload_stage”, the bucket name is “unload”, and the folder path is “files”.
  • Here, “my_csv_unload_format” is the file format object that describes how the unloaded data files should be written; it is created with a “CREATE FILE FORMAT” command, similar to the file format sketched in Step 2.
  • Now, use the COPY INTO command for Snowflake Unload to S3 from a table using the external stage.
  • The code for Snowflake Unload to S3 using the stage is given below.
copy into @my_ext_unload_stage/d1 from mytable;
  • In the above code, all the rows in the “mytable” table are unloaded to the S3 bucket, and a “d1” filename prefix is applied to the generated files.
  • Then, retrieve the unloaded files from the S3 bucket using the S3 console. An optional variation and a quick verification query are sketched below.
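  • If you need more control over the output, the COPY INTO <location> command accepts additional options. The sketch below, which reuses the illustrative stage, table, and file format names from this step, unloads the result of a query, writes a header row, and then lists the files that were created; the specific options shown are examples, not requirements.
-- Unload the result of a query instead of a whole table (all names are illustrative).
copy into @my_ext_unload_stage/d1
  from (select * from mytable)
  file_format = (format_name = 'my_csv_unload_format')
  header = true         -- include column names as the first line of each file
  overwrite = true;     -- replace any existing files that share the d1 prefix

-- Verify the unloaded files from within Snowflake.
list @my_ext_unload_stage/d1;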

That’s it! You have successfully performed Snowflake Unload to S3.

Simplify Data Analysis with Hevo’s No-code Data Pipeline

Hevo Data, a No-code Data Pipeline, helps you load data from any data source such as Databases, SaaS applications, Cloud Storage, SDKs, and Streaming Services, and simplifies the ETL process. It supports 100+ data sources and involves just a 3-step process: select the data source, provide valid credentials, and choose the destination. Hevo not only loads the data onto the desired Data Warehouse/destination but also enriches the data and transforms it into an analysis-ready form without having to write a single line of code.

Get Started with Hevo for Free

Its completely automated pipeline delivers data in real time from source to destination without any loss. Its fault-tolerant and scalable architecture ensures that the data is handled in a secure, consistent manner with zero data loss and supports different forms of data. The solutions provided are consistent and work with different BI tools as well.

Check out why Hevo is the Best:

  1. Secure: Hevo has a fault-tolerant architecture that ensures that the data is handled in a secure, consistent manner with zero data loss.
  2. Schema Management: Hevo takes away the tedious task of schema management & automatically detects the schema of incoming data and maps it to the destination schema.
  3. Minimal Learning: Hevo, with its simple and interactive UI, is extremely simple for new customers to work on and perform operations.
  4. Hevo Is Built To Scale: As the number of sources and the volume of your data grows, Hevo scales horizontally, handling millions of records per minute with very little latency.
  5. Incremental Data Load: Hevo allows the transfer of data that has been modified in real-time. This ensures efficient utilization of bandwidth on both ends.
  6. Live Support: The Hevo team is available round the clock to extend exceptional support to its customers through chat, E-Mail, and support calls.
  7. Live Monitoring: Hevo allows you to monitor the data flow and check where your data is at a particular point in time.
Sign up here for a 14-Day Free Trial!

Conclusion

In this article, you learnt about Snowflake, Amazon S3, and the simple steps to perform Snowflake Unload to S3. The process consists of setting up the required permissions in the AWS console and the Snowflake account, and then configuring the staging area along with the data that needs to be unloaded from Snowflake to S3. Unloading data from Snowflake to S3 helps companies cut the cost of storing data that is not needed in the Data Warehouse while keeping it available in S3.

Visit our Website to Explore Hevo

Companies have business data available in multiple sources, and it's a tedious process to load data manually from those sources into Snowflake. Hevo Data is a No-code Data Pipeline that can help you transfer data from any data source to Snowflake. It fully automates the process of loading and transforming data from 100+ sources to a destination of your choice without writing a single line of code.

Want to take Hevo for a spin? Sign Up here for a 14-day free trial and experience the feature-rich Hevo suite first hand.

Share your experience of learning about Snowflake Unload to S3 in the comments section below!

Aditya Jadon
Research Analyst, Hevo Data

Aditya Jadon is a data science enthusiast with a passion for decoding the complexities of data. He leverages his B. Tech degree, expertise in software architecture, and strong technical writing skills to craft informative and engaging content. Aditya has authored over 100 articles on data science, demonstrating his deep understanding of the field and his commitment to sharing knowledge with others.