Loading data from Amazon S3 to Snowflake can be challenging, especially when managing complex ETL scripts for data extraction and transformation. Many teams struggle with automation, scalability, and maintenance in this process. In this blog, we’ll explore two methods to simplify this: one using a custom ETL script and the other with a no-code solution like Hevo for seamless data transfer.

Overview of Amazon S3

Amazon S3 is a highly scalable object storage service provided by AWS, used to store and retrieve any amount of data from anywhere on the web. Data is stored as objects inside buckets, which act as containers for those objects. Individual objects can range from 0 bytes up to 5 terabytes in size.

Use Cases

  • Data Backup and Recovery: You can store backups of critical data in S3, which provides durable storage and quick recovery in the event of failures or disasters.
  • Static Website Hosting: S3 can also host static websites by serving HTML, CSS, JavaScript, and media files directly from buckets, with the scalability and low-latency access built into the service.

Overview of Snowflake

Snowflake is a cloud-based data warehousing platform designed to handle large-scale data analytics and processing. Snowflake uses a multi-cluster, shared data architecture, separating storage and compute. This enables independent scaling of both resources, optimizing performance and cost-efficiency.
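
Because storage and compute are decoupled, you can resize or pause a virtual warehouse without touching the data itself. A minimal sketch, assuming a warehouse named COMPUTE_WH (the same placeholder used later in this guide):

ALTER WAREHOUSE COMPUTE_WH SET WAREHOUSE_SIZE = MEDIUM; -- scale compute up for a heavier workload
ALTER WAREHOUSE COMPUTE_WH SUSPEND; -- pause compute; the stored data is unaffected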

Use Cases

  • Data Warehousing: Snowflake is primarily used as a data warehouse, storing huge volumes of structured and semi-structured data for business intelligence and reporting.
  • Real-time Analytics: Snowflake supports near real-time data ingestion and processing, making it a strong fit for applications that require up-to-date insights and operational analytics.

Connect AWS S3 to Snowflake Hassle-free with Hevo Data

Manually setting up ETL pipelines for moving data from Amazon S3 to Snowflake can be time-consuming and complex, requiring constant monitoring and maintenance. Hevo automates the entire process, ensuring secure, real-time, and no-code data transfer with minimal effort.

  • Zero-Code Setup: No need for complex ETL scripts
  • Real-Time Data Sync: Keep Snowflake updated automatically
  • Schema Management: Auto-detect and adapt to schema changes
  • Reliable & Secure: End-to-end encryption and failure handling

Skip the manual hassle and start moving your data in minutes with Hevo!

Get Started with Hevo for Free

How to Set Up Amazon S3 to Snowflake Integration

Prerequisites

  • AWS Account: You have an S3 bucket with data (e.g., s3://my-bucket/datafolder/mydata.csv).
  • Snowflake Account: You have a Snowflake account with a user role that has privileges to create databases, schemas, tables, stages, and storage integrations (e.g., ACCOUNTADMIN or a custom role with similar privileges).
  • Snowflake Worksheet: Access to the Snowflake web UI (Snowsight) or SnowSQL for running commands.
  • IAM Role: Ability to create an IAM role in AWS to grant Snowflake access to your S3 bucket. You can also explore the S3KeySensor in Apache Airflow for monitoring specific files in S3.
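
If you are unsure whether your role has the required privileges, a quick way to review them is to list the role’s grants in a worksheet (replace ACCOUNTADMIN with your custom role if you use one):

SHOW GRANTS TO ROLE ACCOUNTADMIN; -- lists the privileges granted to the role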

Method 1: Manual ETL Process to Set up Amazon S3 to Snowflake Integration

Step 1: Prepare Your S3 Bucket

i) Verify Data in S3:

  • Log in to the AWS Management Console.
  • Navigate to S3 and locate your bucket (e.g., s3://my-bucket/datafolder/).
  • Ensure your data file (e.g., mydata.csv) is uploaded and accessible.
  • Note the bucket name and file path.

ii) Check File Format:

Confirm the format of your data (e.g., CSV with comma-separated values, headers optional). For this example, assume mydata.csv has columns: id, name, value.
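
For reference, a minimal mydata.csv matching those assumed columns might look like this (the rows below are purely illustrative):

id,name,value
1,alpha,10.5
2,beta,22.0
3,gamma,7.25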

Step 2: Set Up AWS IAM Role for Snowflake Access

i) Log in to AWS Management Console:

Go to the IAM service.

ii) Create an IAM Policy:

  • Navigate to Policies > Create Policy.
  • Select the JSON tab and paste the following policy (replace my-bucket with your bucket name):
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:GetObjectVersion",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::my-bucket/*",
        "arn:aws:s3:::my-bucket"
      ]
    }
  ]
}
  • Click Next, name it SnowflakeS3AccessPolicy, and create the policy.

iii) Create an IAM Role:

  • Go to Roles > Create Role.
  • Select AWS account as the trusted entity type, then choose Another AWS account and enter your own AWS account ID as a temporary placeholder (you’ll replace the trust policy with Snowflake’s IAM user details in Step 3).
  • Attach the SnowflakeS3AccessPolicy policy.
  • Name the role SnowflakeS3Role and create it.
  • Note the Role ARN (e.g., arn:aws:iam::123456789012:role/SnowflakeS3Role).

Step 3: Configure Snowflake Storage Integration

i) Log in to Snowflake:

Open the Snowflake web UI (Snowsight) and log in with a user that has ACCOUNTADMIN privileges.

ii) Set Role and Warehouse:

In a worksheet, run:

USE ROLE ACCOUNTADMIN;
USE WAREHOUSE COMPUTE_WH; -- Replace with your warehouse name

iii) Create a Storage Integration:

Run the following command (replace my-bucket with your bucket name and use the Role ARN you noted in Step 2):

CREATE STORAGE INTEGRATION s3_integration
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = 'S3'
  ENABLED = TRUE
  STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/SnowflakeS3Role'
  STORAGE_ALLOWED_LOCATIONS = ('s3://my-bucket/datafolder/');

This links Snowflake to your S3 bucket securely.

iv) Retrieve Snowflake IAM User Details:

DESC INTEGRATION s3_integration;

Note the values for STORAGE_AWS_IAM_USER_ARN (e.g., arn:aws:iam::999999999999:user/abc123) and STORAGE_AWS_EXTERNAL_ID.

v) Update IAM Role Trust Relationship:

  • Back in AWS IAM, go to Roles > SnowflakeS3Role > Trust relationships > Edit trust relationship.
  • Paste:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::999999999999:user/abc123"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "sts:ExternalId": "your_external_id"
        }
      }
    }
  ]
}
  • Replace the ARN and external ID with values from Step iv.
  • Save the changes.

Step 4: Create Snowflake Objects

i) Create Database and Schema:

In the Snowflake worksheet, run:

CREATE DATABASE my_db;
USE DATABASE my_db;
CREATE SCHEMA my_schema;
USE SCHEMA my_schema;

ii) Create a Table:

Define a table matching your CSV structure:

CREATE TABLE my_table (
  id INT,
  name STRING,
  value FLOAT
);

iii) Create a File Format:

Define the format of your S3 files (e.g., CSV):

CREATE FILE FORMAT my_csv_format
  TYPE = 'CSV'
  FIELD_DELIMITER = ','
  SKIP_HEADER = 1; -- Skip header row if present

iv) Create an External Stage:

Link the stage to your S3 bucket using the storage integration:

CREATE STAGE my_s3_stage
  STORAGE_INTEGRATION = s3_integration
  URL = 's3://my-bucket/datafolder/'
  FILE_FORMAT = my_csv_format;
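
Before loading, you can optionally confirm that Snowflake can see your files through the new stage:

LIST @my_s3_stage; -- returns the name, size, and last-modified timestamp of each file at the stage location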

Step 5: Load Data from S3 to Snowflake

i) Load Data with COPY INTO:

Run the following command to load data from the stage into the table:

COPY INTO my_table
  FROM @my_s3_stage
  PATTERN = '.*mydata.csv'; -- Matches your file name

ii) Verify the Load:

Check the results:

SELECT * FROM my_table LIMIT 10;

Quick Checks

If something looks off, run the COPY command in validation mode to surface load errors without actually loading any data:

COPY INTO my_table FROM @my_s3_stage VALIDATION_MODE = RETURN_ERRORS;

Or check the load history:

SELECT * FROM information_schema.load_history WHERE table_name = 'MY_TABLE';

Helpful Tips

  • Ensure you replace placeholders (e.g., bucket name, ARN, warehouse name) with your specific values to tailor the setup.
  • If your data is in a different format, such as JSON or Parquet, modify the FILE_FORMAT settings accordingly; a hedged sketch follows this list, and the Snowflake documentation has full details.
  • For future automation, consider exploring Snowpipe (also sketched below). This guide, however, is designed for a straightforward, one-time manual load.
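
As a rough illustration of the JSON case, only the file format and target table change; the stage stays the same. The names my_json_format and my_json_table are assumptions for this sketch, not objects created earlier in the guide:

CREATE FILE FORMAT my_json_format
  TYPE = 'JSON'
  STRIP_OUTER_ARRAY = TRUE; -- treat a top-level JSON array as one row per element

CREATE TABLE my_json_table (raw VARIANT); -- semi-structured data lands in a VARIANT column

COPY INTO my_json_table
  FROM @my_s3_stage
  FILE_FORMAT = (FORMAT_NAME = my_json_format)
  PATTERN = '.*\\.json';

For automation, a minimal Snowpipe sketch built on the same stage and file format could look like the following. The pipe name is hypothetical, and auto-ingestion only fires once you configure S3 event notifications to the pipe’s SQS queue (visible via DESC PIPE my_s3_pipe):

CREATE PIPE my_s3_pipe
  AUTO_INGEST = TRUE
  AS
  COPY INTO my_table
  FROM @my_s3_stage
  FILE_FORMAT = (FORMAT_NAME = my_csv_format);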

Limitations of Manual ETL Process to Set up Amazon S3 to Snowflake Integration

  • Time-Consuming: Manual setup of IAM roles, stages, and tables is slow and error-prone.
  • No Real-Time Updates: Data only loads when you run COPY INTO, not continuously.
  • Poor Scalability: Struggles with large or growing datasets without constant tweaks.
  • Error Risk: Typos or misconfigurations rely on you to catch and fix.
  • Weak Monitoring: Limited to basic checks like load_history, no automated oversight.
  • Resource Drain: Ties up warehouse compute, potentially raising costs.
  • Rigid Transformations: Requires pre-formatted data with no in-flight adjustments.

Method 2: Using Hevo Data to Set up Amazon S3 to Snowflake Integration

Loading data into Snowflake using Hevo is easier, more reliable, and faster.

You can move data from Amazon S3 to Snowflake in two simple steps, without writing a single line of code.

Step 1: Configure Amazon S3 as the Source

  • Connect to the Amazon S3 source by providing the connection settings.
  • Select the file format (JSON/CSV/AVRO) and create schema folders.

Step 2: Configure Snowflake as the Destination

  • Configure Snowflake Warehouse.

Hevo’s integrations with data warehouses take care of all the groundwork for moving data from Amazon S3 to Snowflake in a secure, consistent, and reliable manner.

Future Trends

Integrating Snowflake with advanced analytics and AI/ML tools helps you draw deeper insights and value from the data in S3, so teams can keep innovating and strengthening their competitive advantage. This aligns with the DataOps methodology, which focuses on improving communication and automating data flows.

Adoption of hybrid and multi-cloud strategies is also growing, and Snowflake can run across multiple cloud providers while keeping data consistent and accessible across S3 and other cloud storage services. You can also make the most of Snowflake’s auto-scaling and pay-as-you-go pricing model, which adapts to changing business needs and cost strategies.

Conclusion

Moving data from Amazon S3 to Snowflake is a common yet complex data engineering task. While a custom ETL approach offers flexibility, it requires significant effort to set up and maintain. On the other hand, a no-code solution like Hevo simplifies the process, ensuring seamless, automated, and reliable data movement without the operational overhead.

Sign up for a 14-day free trial with Hevo and experience effortless data integration!

Frequently Asked Questions

1. How to send data from S3 to Snowflake?

To send data from S3 to Snowflake, use Snowflake’s COPY INTO command or Snowpipe. Configure a stage in Snowflake linked to the S3 bucket, then execute the command to load data directly into Snowflake tables from the staged files.

2. Does Snowflake use S3?

Yes, Snowflake uses Amazon S3 (Simple Storage Service) as part of its architecture for storing and accessing data. 

3. What is S3 stage in Snowflake?

In Snowflake, an S3 stage is a reference to a location in Amazon S3 (Simple Storage Service) where data files are stored. It serves as an intermediary storage area between S3 and Snowflake’s compute resources.

4. Does Snowflake support JSON?

Yes, Snowflake supports JSON (JavaScript Object Notation) data format. Snowflake can efficiently store, query, and process JSON data, making it suitable for handling semi-structured data alongside traditional structured data. 
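
As a rough illustration (the table and field names here are hypothetical), JSON stored in a VARIANT column can be queried with Snowflake’s colon and dot notation and cast to standard types:

SELECT
  raw:customer.name::STRING AS customer_name,
  raw:purchase.total::FLOAT AS purchase_total
FROM my_json_table -- hypothetical table with a single VARIANT column named raw
WHERE raw:purchase.total::FLOAT > 100;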

Avinash Mohanakrishnan
Freelance Technical Content Writer, Hevo Data

Avinash loves blending his problem-solving skills with analytical thinking to dissect the complexities of data integration and analysis. He has extensive experience producing well-researched content that helps businesses in the data industry.