Press "Enter" to skip to content

Amazon Aurora to Snowflake ETL: Steps to Move Data Easily

Often, businesses keep one database for transactions (e.g., Amazon Aurora) and a separate warehouse (e.g., Snowflake) for their analytical needs. There are two prime reasons to move data from your transactional DB to a warehouse (e.g., Amazon Aurora to Snowflake):

  1. The transactional database is optimized for fast writes and responses. Running analytics queries with many aggregations and joins on large data sets will slow it down, which might eventually take a toll on the customer experience.
  2. Warehouses are built to handle growing data sets and analytical queries. Moreover, they can host data from multiple sources and aid in deeper analysis.

This post will highlight the steps to move data from Aurora DB to Snowflake.

Understanding Aurora and Snowflake

RDS is the original relational database service from AWS, and it supports most of the popular open-source and proprietary databases. The open-source offerings of RDS, such as MySQL and PostgreSQL, are much more cost-effective than enterprise database solutions like Oracle. However, open-source solutions often require a lot of performance tuning to be on par with enterprise RDBMSs in areas such as performance and concurrent connections.

To overcome these well-known weaknesses, AWS introduced Aurora, a relational database service compatible with MySQL and PostgreSQL that costs much less than enterprise databases. No wonder many organizations are adopting Aurora as their primary transactional database system.

On the other end, Snowflake is arguably the most cost-effective and fast data warehousing solution. Compute resources scale dynamically, and storage is separated from compute and billed independently. Snowflake runs on several cloud vendors, including AWS, so data movement from Aurora to Snowflake can be done at a low cost. Read about Snowflake’s features here.

Methods to load data from Amazon Aurora to Snowflake

Here are two ways that can be used to approach Aurora to Snowflake ETL:

Method 1: Implement a hassle-free, no-code Data Integration Platform like Hevo Data – 14 Day Free Trial (Official Snowflake ETL Partner)

Method 2: Build custom scripts to move data from Aurora to Snowflake

This post discusses Method 2 in detail. It also highlights the limitations of this approach and the workarounds to solve them.

Steps to Move Data from Aurora to Snowflake Using Custom Scripts

The steps to migrate data from Aurora to Snowflake are as follows:

  1. Extract data from Aurora cluster to S3
  2. Convert data types and format them
  3. Stage data files to Snowflake staging area
  4. Import staged files to Snowflake table

1. Extract Data from Aurora Cluster to S3

The SELECT INTO OUTFILE S3 statement can be used to query data from an Aurora MySQL cluster and save the result directly to S3, so the export is fast and efficient and the data does not need to pass through a client. To save data to S3 from an Aurora cluster, proper permissions need to be set. For that:

  1. Create a proper IAM policy to access S3 objects – refer to the AWS documentation here
  2. Create a new IAM role and attach the IAM policy you created in the above step
  3. Set the aurora_select_into_s3_role or aws_default_s3_role cluster parameter to the ARN of the new IAM role
  4. Associate the IAM role you created with the Aurora cluster
  5. Configure the Aurora cluster to allow outbound connections to S3 – read more on this here

Other important points to be noted while exporting data to S3:

  • User Privilege – The user that issues the SELECT INTO OUTFILE S3 statement should have the privilege to do so.
    To grant access –

    GRANT SELECT INTO S3 ON *.* TO 'user'@'domain';

    Note that this privilege is specific to Aurora. RDS doesn’t have such a privilege option.

  • Manifest File – You can set the MANIFEST ON option to create a JSON-format manifest file that lists the output files uploaded to the S3 path. Note that files are listed in the same order in which they are created.
    Eg:

    {
    "entries": [
    {
    "url":"s3-us-east-1://s3_bucket/file_prefix.part_00000"
    },
    {
    "url":"s3-us-east-1://s3_bucket/file_prefix.part_00001"
    },
    {
    "url":"s3-us-east-1://s3_bucket/file_prefix.part_00002"
    }
    ]
    }
  • Output Files – The output is stored as delimited text files. As of now, compressed or encrypted output files are not supported.
  • Overwrite Existing File – Set the OVERWRITE ON option to delete and replace a file with the exact same name if it already exists in S3.
  • The default maximum file size is 6 GB. If the data selected by the statement is smaller than that, a single file is created; otherwise, multiple files are created. No rows are split across file boundaries.
  • If the data volume to be exported is larger than 25 GB, it is recommended to run multiple statements, each exporting a different portion of the data.
  • No metadata, such as the table schema, is uploaded to S3.
  • As of now, there is no direct way to monitor the progress of the data export. One simple workaround is to turn the manifest option on; the manifest file will be the last file created. Examples:
    • The statement below writes to an S3 bucket located in a different region. Each field is terminated by a comma and each row is terminated by ‘\n’.
      SELECT * FROM students INTO OUTFILE S3 's3-us-west-2://aurora-out/sample_students_data'
      FIELDS TERMINATED BY ','
      LINES TERMINATED BY '\n';
    • The example below writes to an S3 bucket located in the same region. A manifest file will also be created.
      SELECT * FROM students INTO OUTFILE S3 's3://aurora-out/sample_students_data'
      FIELDS TERMINATED BY ','
      LINES TERMINATED BY '\n'
      MANIFEST ON;

2. Data Type Conversion and Formatting

Data transformations corresponding to business logic or organizational standards might need to be applied while transferring data from Aurora to Snowflake. Apart from such high-level mappings, some basic things to keep in mind are listed below:

  • All popular character sets, including UTF-8 and UTF-16, are supported by Snowflake. The full list can be found here.
  • Many cloud-based and open-source big data systems compromise on standard relational database constraints such as Primary Key. Note that Snowflake lets you define SQL constraints like UNIQUE, PRIMARY KEY, FOREIGN KEY, and NOT NULL, although only NOT NULL is enforced. This might be helpful when you load data.
  • Data type support in Snowflake is fairly rich, including nested data structures like arrays. Below is the list of Aurora (MySQL) data types and the corresponding Snowflake types:
Aurora (MySQL) Data Type    Snowflake Data Type
TINYINT                     TINYINT
SMALLINT                    SMALLINT
MEDIUMINT                   INTEGER
INT                         INTEGER
BIGINT                      BIGINT
DECIMAL                     DECIMAL
FLOAT                       FLOAT, FLOAT4, FLOAT8
DOUBLE                      DOUBLE, DOUBLE PRECISION, REAL
BIT                         BOOLEAN
CHAR                        CHAR
VARCHAR                     VARCHAR
BINARY                      BINARY
VARBINARY                   VARBINARY
TINYTEXT                    STRING, TEXT
TEXT                        STRING, TEXT
MEDIUMTEXT                  STRING, TEXT
LONGTEXT                    STRING, TEXT
ENUM                        No ENUM type. Use any type that can represent the ENUM values
SET                         No SET type. Use any type that can represent the SET values
DATE                        DATE
TIME                        TIME
DATETIME                    DATETIME
TIMESTAMP                   TIMESTAMP
  • Snowflake is quite flexible with date and time formats. If a custom format is used in your files, it can be explicitly specified using the File Format option while loading data into the table. The complete list of date and time formats can be found here. A sample table definition applying the mappings above is shown below.
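
As an illustration, here is a minimal sketch of a Snowflake table for the students data exported earlier, applying the type mappings and constraint support discussed above. The column names are assumptions made for this example:

create or replace table students (
  id INTEGER NOT NULL PRIMARY KEY,   -- from Aurora INT; only NOT NULL is enforced
  name VARCHAR(255),                 -- from Aurora VARCHAR
  is_active BOOLEAN,                 -- from Aurora BIT
  enrolled_on DATE,                  -- from Aurora DATE
  created_at TIMESTAMP               -- from Aurora TIMESTAMP
);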

3. Stage Data Files

Snowflake requires the data to be uploaded to a temporary location before it is loaded into a table. This temporary location is storage that Snowflake has access to (on AWS, it is backed by S3). This process is called staging. A Snowflake stage can be either internal or external.

(A) Internal Stage

In Snowflake, each user and table is automatically assigned an internal stage for data files. It is also possible to create internal stages explicitly and give them names.

  • The stage assigned to a user is referenced as ‘@~’
  • The stage assigned to a table is referenced as ‘@%’ followed by the table name
  • The default stages assigned to a user or table can’t be altered or dropped
  • The default stages assigned to a user or table do not support setting file format options (see the examples just below)
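
For illustration, here is a hedged sketch of how these default stages can be referenced, assuming a table named students and the sample data file used later in this section:

-- Stage a file in the user stage and list its contents
put file:///tmp/aurora_data/data/students_data.csv @~/aurora_data/;
list @~/aurora_data/;

-- List files in the table stage of a table named students
list @%students;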

As mentioned above, internal stages can also be created explicitly by the user using SQL statements. While creating stages explicitly like this, many data loading options, such as file format and date format, can be assigned to them.

While interacting with Snowflake for data loading or creating tables, SnowSQL is a very handy CLI client available for Linux, Mac, and Windows that can be used to run Snowflake commands. Read more about the tool and its options here.

Below is an example command that creates a named internal stage my_aurora_stage and assigns some default options:

create or replace stage my_aurora_stage
copy_options = (on_error='skip_file')
file_format = (type = 'CSV' field_delimiter = '|' skip_header = 1);

PUT is the command used to stage files to an internal Snowflake stage. The syntax of the PUT command is:

PUT file://path_to_file/filename internal_stage_name

Eg:

Upload a file named students_data.csv from the /tmp/aurora_data/data/ directory to an internal stage named aurora_stage:

put file:///tmp/aurora_data/data/students_data.csv @aurora_stage;

Snowflake provides many options that can be used to improve the performance of the data load, such as the degree of parallelism used while uploading the file, automatic compression, etc. More information and the complete list of options are listed here.
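
For instance, here is a hedged sketch of a PUT command using the PARALLEL and AUTO_COMPRESS options; the values chosen are only illustrative:

put file:///tmp/aurora_data/data/students_data.csv @aurora_stage
parallel = 8
auto_compress = true;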

(B) External Stage

In addition to internal stages, Snowflake supports Amazon S3 and Microsoft Azure as external staging locations. If the data is already uploaded to an external stage that can be accessed from Snowflake, it can be loaded directly into the Snowflake table; there is no need to move the data to an internal stage.

To create an external stage on S3, IAM credentials with proper access permissions need to be provided. If the data is encrypted, the encryption keys should also be provided.

create or replace stage aurora_ext_stage url='s3://snowflake_aurora/data/load/files/'
credentials=(aws_key_id='13311a23344rrb3c' aws_secret_key='abddfgrrcd4kx5y6z')
encryption=(master_key = 'eSxX0jzsdsdYfIjkahsdkjamNNNaaaDwOaO8=');

Data can be uploaded to the external stage using the respective cloud services. Since the data from Amazon Aurora is already exported to S3, that same location can be used as the external staging location, which helps to minimize data movement, as shown below.
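
For example, a minimal sketch of an external stage that points at the S3 path used for the Aurora export in step 1; the bucket name matches the earlier example, while the credentials are placeholders:

create or replace stage aurora_export_stage url='s3://aurora-out/'
credentials=(aws_key_id='<aws_key_id>' aws_secret_key='<aws_secret_key>')
file_format = (type = 'CSV' field_delimiter = ',');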

4. Copy Files that are Staged into Snowflake Table

Now the data is present in an external or internal stage and has to be loaded into a Snowflake table. The command used to do this is COPY INTO. Executing the COPY INTO command requires compute resources in the form of a Snowflake virtual warehouse, which is billed as per consumption.

Eg:

To load from a named internal stage:

copy into aurora_table
from @aurora_stage;

To load data from the external stage (here only a single file is specified):

copy into my_external_stage_table
from @aurora_ext_stage/tutorials/dataloading/students_ext.csv;

You can even copy directly from an external location:

copy into aurora_table
from 's3://mybucket/aurora_snow/data/files'
credentials=(aws_key_id='$AWS_ACCESS_KEY_ID' aws_secret_key='$AWS_SECRET_ACCESS_KEY')
encryption=(master_key = 'eSxX009jhh76jkIuLPH5r4BD09wOaO8=')
file_format = (format_name = csv_format);

Files can be specified using patterns:

copy into aurora_pattern_table
from @aurora_stage
file_format = (type = 'CSV')
pattern='.*/.*/.*[.]csv[.]gz';

Some commonly used options for CSV file loading with the COPY command are listed below; an example combining several of them follows the list.

  • COMPRESSION to specify the compression algorithm used for the files
  • RECORD_DELIMITER to indicate the character separating lines (records) in the file
  • FIELD_DELIMITER to indicate the character separating fields in the file
  • SKIP_HEADER to indicate the number of header lines to be skipped
  • DATE_FORMAT to specify the date format
  • TIME_FORMAT to specify the time format

There are many other options. For the full list, click here.
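
As a hedged illustration, here is a COPY command that combines several of these CSV options. The stage, table, and option values are assumptions carried over from the earlier examples:

copy into aurora_table
from @aurora_stage
file_format = (type = 'CSV' compression = 'GZIP' field_delimiter = ',' record_delimiter = '\n'
               skip_header = 1 date_format = 'YYYY-MM-DD' time_format = 'HH24:MI:SS');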

Update Snowflake Table

So far, the blog has talked about how to extract data from Aurora and simply insert it into a Snowflake table. Next, let’s look deeper into how to handle incremental data uploads to a Snowflake table.

Snowflake’s architecture is unique and not based on any existing big data framework. Snowflake has no limitations on row-level updates, which makes loading delta data into a Snowflake table much easier compared to systems like Hive. The approach is to load the incrementally extracted data into an intermediate table and then, based on the data in the intermediate table, modify the records in the final table.
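
As an illustration, here is a minimal sketch of creating the landing table and loading a delta extract into it; the table and stage names match the examples in this post and are otherwise assumptions:

create or replace table landing_delta_table like aurora_target_table;

copy into landing_delta_table
from @aurora_stage
file_format = (type = 'CSV' field_delimiter = ',');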

Three common methods used to modify the final table once data is loaded into the landing (intermediate) table are described below.

    1. Update the rows in the target table, then insert the new rows from the landing (intermediate) table which are not yet in the final table.
      UPDATE aurora_target_table
      SET value = landing_delta_table.value
      FROM landing_delta_table
      WHERE aurora_target_table.id = landing_delta_table.id;

      INSERT INTO aurora_target_table (id, value)
      SELECT id, value FROM landing_delta_table
      WHERE id NOT IN (SELECT id FROM aurora_target_table);
    2. Delete all records from the target table which are also present in the landing table, then insert all rows from the landing table into the final table.
      DELETE FROM aurora_target_table
      WHERE id IN (SELECT id FROM landing_delta_table);

      INSERT INTO aurora_target_table (id, value)
      SELECT id, value FROM landing_delta_table;
    3. MERGE statement – Inserts and updates are combined in a single MERGE statement, which applies the changes from the landing table to the target table in one SQL statement.

      MERGE INTO aurora_target_table t1 USING landing_delta_table t2 ON t1.id = t2.id
      WHEN MATCHED THEN UPDATE SET value = t2.value
      WHEN NOT MATCHED THEN INSERT (id, value) VALUES (t2.id, t2.value);

Limitations of Writing Custom ETL Code to Move Data from Aurora to Snowflake

While this approach may look very straightforward, it does come with limitations.

  1. You would have to invest precious engineering resources to hand-code the pipeline. This increases the time it takes for the data to be available in Snowflake.
  2. You will have to invest in engineering resources to constantly monitor and maintain the infrastructure. Code breaks, schema changes at the source, destination unavailability – these issues will crop up more often than you would account for at the start of the ETL project.
  3. The above approach fails if you need data to be streamed in near real-time from Aurora to Snowflake. You would need to add additional steps and set up cron jobs to achieve this.

An Easy Way to Move Data from Aurora to Snowflake

On the other hand, a Data Pipeline Platform such as Hevo, an official Snowflake ETL partner, can help you bring data from Aurora to Snowflake in no time. Zero Code, Zero Setup Time, Zero Data Loss. Here are the simple steps to load Aurora DB to Snowflake using Hevo:

  • Authenticate and Connect to your Aurora DB
  • Select the replication mode: (a) Full Dump and Load (b) Incremental load for append-only data (c) Change Data Capture
  • Configure the Snowflake data warehouse for data load

For a next-generation digital organization, there should be seamless data movement between transactional and analytical systems. Using an intuitive and reliable platform like Hevo ensures that accurate and consistent data is available in Snowflake in real-time.

Sign up for a 14-day free trial and experience hassle-free data replication from Aurora to Snowflake.
