Loading data from Amazon S3 to Snowflake can be challenging, especially when managing complex ETL scripts for data extraction and transformation. Many teams struggle with automation, scalability, and maintenance in this process. In this blog, we’ll explore two methods to simplify this: one using a custom ETL script and the other with a no-code solution like Hevo for seamless data transfer.
Overview of Amazon S3
Amazon S3 is a highly scalable object storage service provided by AWS, used to store and retrieve any amount of data from anywhere on the web. Data is stored as objects inside buckets, which act as containers for those objects. Individual objects can range in size from 0 bytes up to 5 terabytes.
Use Cases
- Data Backup and Recovery: You can store backups of critical data in S3, which provides durable storage and quick recovery in the event of failures or disasters.
- Static Website Hosting: S3 can host static websites by serving HTML, CSS, JavaScript, and media files directly from buckets, with built-in scalability and low-latency access.
Overview of Snowflake
Snowflake is a cloud-based data warehousing platform designed to handle large-scale data analytics and processing. Snowflake uses a multi-cluster, shared data architecture, separating storage and compute. This enables independent scaling of both resources, optimizing performance and cost-efficiency.
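As a small illustration of that separation, compute is provisioned as virtual warehouses that you can create, resize, and suspend independently of the stored data. A minimal sketch (the warehouse name and settings below are only illustrative):
-- Illustrative only: an extra-small warehouse that suspends itself when idle
-- and resumes automatically when a new query arrives.
CREATE WAREHOUSE analytics_wh
  WAREHOUSE_SIZE = 'XSMALL'
  AUTO_SUSPEND = 60
  AUTO_RESUME = TRUE;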
Use Cases
- Data Warehousing: Snowflake is used mainly for data warehousing. It stores huge volumes of structured and semi-structured data for business intelligence and reporting.
- Real-time Analytics: Snowflake supports near real-time data ingestion and processing, making it a strong fit for applications that require up-to-date insights and operational analytics.
Manually setting up ETL pipelines for moving data from Amazon S3 to Snowflake can be time-consuming and complex, requiring constant monitoring and maintenance. Hevo automates the entire process, ensuring secure, real-time, and no-code data transfer with minimal effort.
- Zero-Code Setup: No need for complex ETL scripts
- Real-Time Data Sync: Keep Snowflake updated automatically
- Schema Management: Auto-detect and adapt to schema changes
- Reliable & Secure: End-to-end encryption and failure handling
Skip the manual hassle and start moving your data in minutes with Hevo!
Get Started with Hevo for Free
How to Set Up Amazon S3 to Snowflake Integration
Prerequisites
- AWS Account: You have an S3 bucket with data (e.g., s3://my-bucket/datafolder/mydata.csv).
- Snowflake Account: You have a Snowflake account with a user role that has privileges to create databases, schemas, tables, stages, and storage integrations (e.g., ACCOUNTADMIN or a custom role with similar privileges).
- Snowflake Worksheet: Access to the Snowflake web UI (Snowsight) or SnowSQL for running commands.
- IAM Role: Ability to create an IAM role in AWS to grant Snowflake access to your S3 bucket. You can also explore the S3KeySensor in Apache Airflow for monitoring specific files in S3.
Method 1: Manual ETL Process to Set up Amazon S3 to Snowflake Integration
Step 1: Prepare Your S3 Bucket
i) Verify Data in S3:
- Log in to the AWS Management Console.
- Navigate to S3 and locate your bucket (e.g., s3://my-bucket/datafolder/).
- Ensure your data file (e.g., mydata.csv) is uploaded and accessible.
- Note the bucket name and file path.
ii) Check File Format:
Confirm the format of your data (e.g., CSV with comma-separated values, headers optional). For this example, assume mydata.csv has columns: id, name, value.
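For reference, a hypothetical mydata.csv matching that layout might look like this (the rows are invented purely for illustration):
id,name,value
1,Alice,10.5
2,Bob,20.0
3,Carol,7.25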
Step 2: Set Up AWS IAM Role for Snowflake Access
i) Log in to AWS Management Console:
Go to the IAM service.
ii) Create an IAM Policy:
- Navigate to Policies > Create Policy.
- Select the JSON tab and paste the following policy (replace my-bucket with your bucket name):
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:GetObjectVersion",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::my-bucket/*",
        "arn:aws:s3:::my-bucket"
      ]
    }
  ]
}
- Click Next, name it SnowflakeS3AccessPolicy, and create the policy.
iii) Create an IAM Role:
- Go to Roles > Create Role.
- Select AWS account as the trusted entity type, then choose Another AWS account and enter a placeholder account ID for now (you’ll update the trust relationship with Snowflake’s IAM user details in Step 3).
- Attach the SnowflakeS3AccessPolicy policy.
- Name the role SnowflakeS3Role and create it.
- Note the Role ARN (e.g., arn:aws:iam::123456789012:role/SnowflakeS3Role).
Step 3: Configure Snowflake Storage Integration
i) Log in to Snowflake:
Open the Snowflake web UI (Snowsight) and log in with a user that has ACCOUNTADMIN privileges.
ii) Set Role and Warehouse:
In a worksheet, run:
USE ROLE ACCOUNTADMIN;
USE WAREHOUSE COMPUTE_WH; -- Replace with your warehouse name
iii) Create a Storage Integration:
Run the following command (replace my-bucket with your bucket name and use the Role ARN you noted in Step 2):
CREATE STORAGE INTEGRATION s3_integration
TYPE = EXTERNAL_STAGE
STORAGE_PROVIDER = 'S3'
ENABLED = TRUE
STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/SnowflakeS3Role'
STORAGE_ALLOWED_LOCATIONS = ('s3://my-bucket/datafolder/');
This links Snowflake to your S3 bucket securely.
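If you plan to run the remaining steps with a role other than ACCOUNTADMIN, you would typically also grant that role usage on the integration. A minimal sketch, assuming a role named SYSADMIN (substitute your own role name):
-- Optional: allow a non-ACCOUNTADMIN role to reference the integration in CREATE STAGE.
GRANT USAGE ON INTEGRATION s3_integration TO ROLE SYSADMIN;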
iv) Retrieve Snowflake IAM User Details:
DESC INTEGRATION s3_integration;
Note the values for STORAGE_AWS_IAM_USER_ARN (e.g., arn:aws:iam::999999999999:user/abc123) and STORAGE_AWS_EXTERNAL_ID.
v) Update IAM Role Trust Relationship:
- Back in AWS IAM, edit the SnowflakeS3Role trust relationship:
- Go to Roles > SnowflakeS3Role > Trust relationships > Edit trust relationship.
- Paste:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::999999999999:user/abc123"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "sts:ExternalId": "your_external_id"
        }
      }
    }
  ]
}
- Replace the ARN and external ID with values from Step iv.
- Save the changes.
Step 4: Create Snowflake Objects
i) Create Database and Schema:
In the Snowflake worksheet, run:
CREATE DATABASE my_db;
USE DATABASE my_db;
CREATE SCHEMA my_schema;
USE SCHEMA my_schema;
ii) Create a Table:
Define a table matching your CSV structure:
CREATE TABLE my_table (
id INT,
name STRING,
value FLOAT
);
iii) Create a File Format:
Define the format of your S3 files (e.g., CSV):
CREATE FILE FORMAT my_csv_format
TYPE = 'CSV'
FIELD_DELIMITER = ','
SKIP_HEADER = 1; -- Skip header row if present
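If your files were JSON or Parquet instead of CSV, only this step would change. A quick sketch of what those definitions might look like (the format names are illustrative):
-- Illustrative alternatives for semi-structured files:
CREATE FILE FORMAT my_json_format
  TYPE = 'JSON';
CREATE FILE FORMAT my_parquet_format
  TYPE = 'PARQUET';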
iv) Create an External Stage:
Link the stage to your S3 bucket using the storage integration:
CREATE STAGE my_s3_stage
STORAGE_INTEGRATION = s3_integration
URL = 's3://my-bucket/datafolder/'
FILE_FORMAT = my_csv_format;
Step 5: Load Data from S3 to Snowflake
i) Load Data with COPY INTO:
Run the following command to load data from the stage into the table:
COPY INTO my_table
FROM @my_s3_stage
PATTERN = '.*mydata.csv'; -- Matches your file name
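If you expect a few malformed rows and would rather skip them than abort the whole load, one variation (an assumption about your error tolerance, not a requirement) is:
COPY INTO my_table
FROM @my_s3_stage
PATTERN = '.*mydata.csv'
ON_ERROR = 'CONTINUE'; -- Skip bad rows instead of failing the load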
ii) Verify the Load:
Check the results:
SELECT * FROM my_table LIMIT 10;
Quick Checks
If something looks off, validate the load with:
COPY INTO my_table FROM @my_s3_stage VALIDATION_MODE = 'RETURN_ERRORS';
Or check the load history:
SELECT * FROM information_schema.load_history WHERE table_name = 'MY_TABLE';
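Another option, if load_history doesn’t show what you expect, is the COPY_HISTORY table function (the one-hour lookback window here is just an example):
SELECT *
FROM TABLE(information_schema.copy_history(
  TABLE_NAME => 'MY_TABLE',
  START_TIME => DATEADD(hour, -1, CURRENT_TIMESTAMP())
));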
Helpful Tips
- Ensure you replace placeholders (e.g., bucket name, ARN, warehouse name) with your specific values to tailor the setup.
- If your data is in a different format, such as JSON or Parquet, modify the FILE_FORMAT settings accordingly—refer to the Snowflake documentation for guidance.
- For future automation, consider exploring Snowpipe (a minimal sketch follows below). This guide, however, is designed for a straightforward, one-time manual load.
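A minimal Snowpipe sketch, assuming the stage and table created in the steps above (note that AUTO_INGEST = TRUE also requires S3 event notifications to be configured in AWS, which is outside the scope of this guide):
CREATE PIPE my_s3_pipe
  AUTO_INGEST = TRUE
  AS
  COPY INTO my_table
  FROM @my_s3_stage;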
Limitations of Manual ETL Process to Set up Amazon S3 to Snowflake Integration
- Time-Consuming: Manual setup of IAM roles, stages, and tables is slow and error-prone.
- No Real-Time Updates: Data only loads when you run COPY INTO, not continuously.
- Poor Scalability: Struggles with large or growing datasets without constant tweaks.
- Error Risk: Typos or misconfigurations rely on you to catch and fix.
- Weak Monitoring: Limited to basic checks like load_history, no automated oversight.
- Resource Drain: Ties up warehouse compute, potentially raising costs.
- Rigid Transformations: Requires pre-formatted data with no in-flight adjustments.
Method 2: Using Hevo Data to Set up Amazon S3 to Snowflake Integration
Loading data into Snowflake using Hevo is easier, more reliable, and faster.
You can move data from Amazon S3 to Snowflake by following 2 simple steps, without writing a single line of code.
Step 1: Configure Amazon S3 as the Source
- Connect to Amazon S3 source by providing connection settings.
- Select the file format (JSON/CSV/AVRO) and create schema folders.
Step 2: Configure Snowflake as the Destination
- Configure your Snowflake warehouse as the destination by providing its connection settings.
Hevo’s integrations with data warehouses take care of all the groundwork for moving data from Amazon S3 to Snowflake in a secure, consistent, and reliable manner.
Future Trends
Integrating Snowflake with advanced analytics and AI/ML tools helps you draw deeper insights and value from the data in S3, so teams can keep innovating and strengthening their competitive advantage. This aligns with the DataOps methodology, which focuses on improving communication and automating data flows.
Expect increased adoption of hybrid and multi-cloud strategies, which Snowflake supports across multiple cloud providers while maintaining data consistency and access across S3 and other cloud storage solutions. You can also take advantage of Snowflake’s auto-scaling and pay-as-you-go cost model, which adapts to changing business needs and cost strategies.
Conclusion
Moving data from Amazon S3 to Snowflake is a common yet complex data engineering task. While a custom ETL approach offers flexibility, it requires significant effort to set up and maintain. On the other hand, a no-code solution like Hevo simplifies the process, ensuring seamless, automated, and reliable data movement without the operational overhead.
Sign up for a 14-day free trial with Hevo and experience effortless data integration!
Frequently Asked Questions
1. How to send data from S3 to Snowflake?
To send data from S3 to Snowflake, use Snowflake’s COPY INTO command or Snowpipe. Configure a stage in Snowflake linked to the S3 bucket, then execute the command to load data directly into Snowflake tables from the staged files.
2. Does Snowflake use S3?
Yes, Snowflake uses Amazon S3 (Simple Storage Service) as part of its architecture for storing and accessing data.
3. What is S3 stage in Snowflake?
In Snowflake, an S3 stage is a reference to a location in Amazon S3 (Simple Storage Service) where data files are stored. It serves as an intermediary storage area between S3 and Snowflake’s compute resources.
4. Does Snowflake support JSON?
Yes, Snowflake supports JSON (JavaScript Object Notation) data format. Snowflake can efficiently store, query, and process JSON data, making it suitable for handling semi-structured data alongside traditional structured data.
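As a small illustration (the table and column names here are hypothetical), JSON is usually loaded into a VARIANT column and queried with path notation:
-- Hypothetical example: keep raw JSON in a VARIANT column and query it directly.
CREATE TABLE raw_events (payload VARIANT);
SELECT
  payload:customer.name::STRING AS customer_name,
  payload:order_total::FLOAT AS order_total
FROM raw_events;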