Press "Enter" to skip to content

MySQL to Snowflake – Steps to Migrate Data

Snowflake is a relatively young, modern cloud Data Warehousing solution. From day one, Snowflake has been well received by enterprises and by small and medium businesses, as it resolves many of the challenges of traditional Data Warehouse solutions. If you are considering a new cloud-native Data Warehousing solution, Snowflake is worth evaluating. Before diving into the MySQL to Snowflake data migration steps, let’s do a quick walkthrough of Snowflake’s benefits:

Snowflake Benefits

  • Breathtaking Performance – Compute and storage are separated. You can scale up and down on the fly without any disruption.
  • Support for Unlimited Concurrency – Compute can be scaled dynamically, so occasional usage spikes are not a concern.
  • Cloud Vendor Agnostic – Snowflake’s architecture supports multiple cloud vendors, and new vendors keep being added to the list. As a user you have more choices, and even if you work with different cloud vendors, data can be analyzed with the same tools. There is no need to learn vendor-specific solutions.
  • Simplicity – Snowflake takes care of almost everything, from autoscaling of compute to proper encoding of columns. Generally, no indexes need to be defined – Snowflake clusters data automatically. However, particularly when a lot of DML occurs on very large tables, the user can define clustering keys, which Snowflake uses to co-locate the table data (see the example after this list).
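
For example, a clustering key on a hypothetical large table could be defined on the column most often used in query filters. The table and column names below are made up for illustration:

ALTER TABLE big_events_table CLUSTER BY (event_date);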

You can read more about Snowflake Data Warehouse features and Snowflake ETL best practices on our other blogs.

Methods for MySQL to Snowflake replication

There are two popular methods to perform MySQL to Snowflake ETL.

Method 1: Use Hevo, a ready-to-use cloud ETL platform (7-day free trial).

Method 2: Write custom code to move data from MySQL to Snowflake.

Now, let’s look at how to move data from your MySQL database to Snowflake using custom code. To perform MySQL to Snowflake replication, we first have to put data files into a Snowflake staging area (an internal or external stage) and then copy the data from the stage to the target table. The high-level steps to be followed for MySQL to Snowflake migration are:

  1. Extract data from MySQL

  2. Data Types and Formatting

  3. Stage Data Files

  4. Copy staged files to Snowflake table.

1. Extract Data from MySQL

There are mainly two methods to extract data from MySQL: one uses the command-line tool mysqldump, and the other runs a SQL query using the MySQL client and saves the output to files.

Extracting data with mysqldump:

mysqldump is a client utility shipped with a standard MySQL installation. Its main use is to create a logical backup of a database or table. It can be used to extract a single table as shown below:

mysqldump -u <username> -h <host_name> -p database_name my_table > my_table_out.sql

Here, the output file my_table_out.sql will contain INSERT statements like:

INSERT INTO table_name (column1, column2, column3, ...)
VALUES (value1, value2, value3, ...);

To convert this format into a CSV file, you have to write a small script or use an available open-source library. You can refer to the official MySQL documentation for more information.
If mysqldump runs on the same machine as the mysqld server, there is a simpler option to get CSV files directly. Use the command below:

mysqldump -u [username] -p -t -T/path/to/directory [database_name] --fields-terminated-by=,

Extract Data Using SQL Query

SQL commands can be executed using the MySQL client utility, redirecting the output to a file:

mysql -B -u user database -h mysql_host   -e "select * from my_table;" > my_table_data_raw.txt

The output can be transformed using text editing utilities like sed or awk to clean and format data.

Example:

mysql -B -u user database -h mysql_host -e "select * from my_table;" | \
sed "s/'/\'/;s/\t/\",\"/g;s/^/\"/;s/$/\"/;s/\n//g" > my_table_final_data.csv

2. Data Types and Formatting

Apart from business-specific transformations, the following points should be noted while replicating data from MySQL to Snowflake.

  • Snowflake supports a number of character sets, including UTF-8 and UTF-16. To see the full list – click here.
  • Snowflake supports UNIQUE, PRIMARY KEY, FOREIGN KEY, and NOT NULL constraints, unlike many other cloud analytical solutions.
  • Snowflake has a rich set of data types. Here is the list of MySQL data types and the corresponding Snowflake data types.
MySQL Data Type    Snowflake Data Type
TINYINT            TINYINT
SMALLINT           SMALLINT
MEDIUMINT          INTEGER
INT                INTEGER
BIGINT             BIGINT
DECIMAL            DECIMAL
FLOAT              FLOAT, FLOAT4, FLOAT8
DOUBLE             DOUBLE, DOUBLE PRECISION, REAL
BIT                BOOLEAN
CHAR               CHAR
VARCHAR            VARCHAR
BINARY             BINARY
VARBINARY          VARBINARY
TINYTEXT           STRING, TEXT
TEXT               STRING, TEXT
MEDIUMTEXT         STRING, TEXT
LONGTEXT           STRING, TEXT
ENUM               No ENUM type. Use any type that can represent the ENUM values.
SET                No SET type. Use any type that can represent the SET values.
DATE               DATE
TIME               TIME
DATETIME           DATETIME
TIMESTAMP          TIMESTAMP
  • Snowflake accepts most date/time formats, and the format can be explicitly specified while loading data into a table using the File Format option (we will discuss this in detail later; see also the example below). For the complete list of supported formats, please click here.
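
As an illustration of the mapping above, a hypothetical MySQL table with columns (id INT, name VARCHAR(100), is_active BIT, created_at DATETIME) could be created in Snowflake as follows. The table and column names are made up for this example:

CREATE TABLE my_table (
    id         INTEGER,
    name       VARCHAR(100),
    is_active  BOOLEAN,
    created_at DATETIME
);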

3. Stage Data Files

To load MySQL data into a Snowflake table, the data files first need to be uploaded to a temporary location called a stage. Snowflake supports internal and external stages.

Internal Stage

Each user and table is automatically allocated an internal stage for staging data files. You can also create named internal stages.

  • The user stage is referenced using ‘@~’.
  • A table stage has the same name as its table and is referenced using ‘@%table_name’ (see the LIST examples below).
  • User/Table stages can’t be altered or dropped.
  • User/Table stages do not support setting file format options.

Internal named stages are explicitly created by the user with SQL statements. They provide a greater degree of flexibility for loading data: you can assign a file format and other options to a named stage, which makes data loading easier.
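
To illustrate how each kind of internal stage is referenced, the LIST command can be used to show staged files. The table and stage names below are just examples:

LIST @~;              -- files in the current user's stage
LIST @%mysql_table;   -- files in the table stage of the table mysql_table
LIST @my_mysql_stage; -- files in a named internal stage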

While working with Snowflake you will need to run many DDL and DML statements in SQL, as well as specific commands for data loading, as shown below. SnowSQL is a handy CLI client that can be used to run those commands and is available for Linux, Mac, and Windows.

Example:

create or replace stage my_mysql_stage
copy_options = (on_error='skip_file')
file_format = (type = 'CSV' field_delimiter = '|' skip_header = 1);

The PUT command is used to stage data files to an internal stage. The syntax of the command is given below:

PUT file://path/to/file/filename internal_stage_name

Example:

Upload a file named mysql_data.csv from the /tmp/mysql/data directory to an internal stage named mysql_stage:

PUT file:///tmp/mysql/data/mysql_data.csv @mysql_stage;

There are many useful options, such as setting the degree of parallelism for the upload and automatic compression of data files. For more information, click here.
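
For example, the earlier upload could be run with four parallel threads and automatic compression enabled; the option values here are illustrative:

PUT file:///tmp/mysql/data/mysql_data.csv @mysql_stage parallel=4 auto_compress=true;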

External Stage

Currently, Snowflake supports Amazon S3 and Microsoft Azure as external staging locations. You can create an external stage pointing to those locations and load data into a Snowflake table. To create an external stage on S3, you have to provide IAM credentials, and encryption keys if the data is encrypted, as shown in the example below:

create or replace stage mysql_ext_stage url='s3://snoflake/load/files/'
credentials= (aws_key_id='111a222b3c' aws_secret_key='abcd4x5y6z');
encryption= (master_key = 'eSxX0jzYfIamtnBKOEOwq80Au6NDwOaO8=');

Data can be uploaded to the external stage using the respective cloud vendor's interfaces. For S3, you can upload using the web console, an SDK, or third-party tools.

4. Copy Staged Files to Snowflake Table.

The COPY INTO command loads the contents of the staged file(s) into a Snowflake table. This command needs compute resources in the form of a running virtual warehouse.

Example:

To load from a named internal stage:

COPY INTO mysql_table
FROM @mysql_stage;

To load from an external stage (only one file is specified):

COPY INTO mycsvtable
FROM @mysql_ext_stage/tutorials/dataloading/contacts1.csv;

You can even copy directly from an external location:

COPY INTO mysql_table
FROM s3://mybucket/data/files
credentials= (aws_key_id='$AWS_ACCESS_KEY_ID' aws_secret_key='$AWS_SECRET_ACCESS_KEY')
encryption= (master_key = 'eSxX0jzYfIamtnBKOEOwq80Au6NbSgPH5r4BDDwOaO8=')
file_format = (format_name = my_csv_format);

Files can be specified using patterns:

COPY INTO mytable
FROM @mysql_stage
file_format = (type = 'CSV')
pattern='.*/.*/.*[.]csv[.]gz';

Some common format options for CSV files supported by the COPY command are the following:

  • COMPRESSION – Compression algorithm for the data files to be loaded.
  • RECORD_DELIMITER – Character that separates records (lines) in an input CSV file.
  • FIELD_DELIMITER – Character that separates fields in the input file.
  • SKIP_HEADER – Number of header lines to be skipped.
  • DATE_FORMAT – String that specifies the date format.
  • TIME_FORMAT – String that specifies the time format.

For the full list of options available please visit here.
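
For illustration, a single COPY statement combining several of these options might look like the following; the option values are assumptions and should be adjusted to match your files:

COPY INTO mysql_table
FROM @mysql_stage
file_format = (type = 'CSV' compression = 'GZIP' field_delimiter = ','
               skip_header = 1 date_format = 'YYYY-MM-DD' time_format = 'HH24:MI:SS');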

Incremental Data Load

After the initial full data load into the target table, changed data is typically extracted from the source and migrated to the target table at regular intervals. For small tables, a full data dump can sometimes be used even for recurring migrations, but for larger tables we have to follow a delta (incremental) approach.

Increment Extract from MySQL

To get only the records modified after a particular time, run a SQL query with proper predicates against the table and write the output to a file. mysqldump is not useful here, as it always extracts the full data.

Example: Extracting records based on the last_updated_timestamp column and formatting the data using the sed command:

mysql -B -u user database -h mysql_host \
  -e "select * from my_table where last_updated_timestamp < now() and last_updated_timestamp > '#max_updated_ts_in_last_run#'" | \
  sed "s/'/\'/;s/\t/\",\"/g;s/^/\"/;s/$/\"/;s/\n//g" > my_table_data.csv

Any records that were physically deleted from the source will be missed here and will not be reflected in the target.
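
The placeholder #max_updated_ts_in_last_run# has to be tracked by your load process. One simple way, assuming the loaded Snowflake table retains the last_updated_timestamp column, is to query the maximum value after each load and store it for the next run:

SELECT MAX(last_updated_timestamp) FROM mysql_table;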

Update Snowflake Table

Snowflake supports row-level updates, which makes delta data migration much easier. The basic idea is to load the incrementally extracted data into an intermediate table and then modify records in the final table based on the data in the intermediate table, as sketched below.
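
As a sketch, the intermediate table can be created with the same structure as the final table and loaded from the stage; the table and stage names below follow the examples in this post:

CREATE OR REPLACE TABLE intermed_table LIKE final_table;

COPY INTO intermed_table
FROM @mysql_stage
file_format = (type = 'CSV');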

Once data is loaded into the intermediate table, there are three methods to modify the final table:

  • Update the existing rows in the final table and insert new rows from the intermediate table which are not in the final table.
UPDATE final_table t
SET t.value = s.value
FROM intermed_table s
WHERE t.id = s.id;

INSERT INTO final_table (id, value)
SELECT id, value
FROM intermed_table
WHERE id NOT IN (SELECT id FROM final_table);
  • Delete all rows from the final table which are present in the intermediate table. Then insert all rows from the intermediate table to the final table.
DELETE FROM final_table
WHERE id IN (SELECT id FROM intermed_table);

INSERT INTO final_table (id, value)
SELECT id, value
FROM intermed_table;
  • MERGE statement – Inserts and updates can be done with a single MERGE statement, which applies the changes in the intermediate table to the final table.
MERGE INTO final_table t1 USING intermed_table t2 ON t1.id = t2.id
WHEN MATCHED THEN UPDATE SET value = t2.value
WHEN NOT MATCHED THEN INSERT (id, value) VALUES (t2.id, t2.value);

Easier Way to move data from MySQL to Snowflake

Using Hevo, an official Snowflake ETL partner, you can easily load data from MySQL to Snowflake in just 3 simple steps. This does not require you to write any code and provides an error-free, fully managed setup to move data in minutes.

  • Connect to your MySQL database.
  • Select the replication mode: (a) load selected MySQL tables (b) load data via Custom Query (c) load data through Binlog.
  • Configure the Snowflake destination where it should be copied.

Conclusion

Nowadays, a high-performing Data Warehouse solution like Snowflake is an integral part of any analytics pipeline. Moving data to this kind of cloud-native solution requires familiarity with cloud best practices, especially around data security. Third-party tools such as Hevo, a cloud data integration platform, take care of all aspects of data movement, including security and connectivity with other data sources, and might be a better choice.

