Snowflake is a relatively young, modern cloud Data Warehousing solution. From day one, Snowflake has been well accepted by enterprises as well as small and medium businesses because it resolves many of the challenges of traditional Data Warehouse solutions. If you are considering a new cloud-native Data Warehousing solution, you should consider evaluating Snowflake.
This article will give you a brief overview of MySQL and Snowflake and walk you through two methods for setting up MySQL to Snowflake Integration. By the end, you will be in a position to choose the method that best fits your business requirements. Read along to decide which method of connecting MySQL to Snowflake is best for you.
Prerequisites
You will have a much easier time setting up the MySQL to Snowflake Integration if you have the following in place:
- An active MySQL account.
- An active Snowflake account.
- Working knowledge of Databases and Data Warehouses.
- Working knowledge of Structured Query Language (SQL).
- Clear idea regarding the type of data to be transferred.
Introduction to MySQL
Michael Widenius and David Axmark created MySQL in 1994 as an open-source, dependable, and adaptable Relational Database Management System (RDBMS). It is a user-friendly RDBMS that is widely used by both small and large businesses. One of its distinguishing advantages is its versatility: the open-source GPL license allows developers to modify MySQL to fit their own needs.
To know more about MySQL, visit this link.
Introduction to Snowflake
Snowflake is one of the most popular Cloud-based Data Warehouses, bringing simplicity to its users without sacrificing capabilities. It can scale resources up and down dynamically based on data requirements, ensuring that users get an optimal balance of performance and cost. A vast number of businesses currently use Snowflake.
To know more about Snowflake, visit this link.
Benefits of Replicating Data from MySQL to Snowflake
Some of the benefits of replicating data from MySQL to Snowflake include:
- Breathtaking Performance: Compute and storage are separated. You can scale up and down on the fly without any disruption.
- Support for Unlimited Concurrency: You can scale compute dynamically, so there is no need to worry about occasional spikes in usage.
- Cloud Vendor Agnostic: Snowflake's architecture supports multiple cloud vendors, and new vendors keep being added to the list. As a user, you have more choice, and even if you work with different cloud vendors, your data can be analyzed with the same tools. There is no need to learn vendor-specific solutions.
- Simplicity: Snowflake takes care of almost everything, from autoscaling of compute to the proper encoding of columns. Generally, no indexes need to be defined; Snowflake clusters data automatically. However, when a lot of DML occurs on very large tables, the user can define clustering keys, which Snowflake uses to co-locate the table data, as sketched below.
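For illustration, here is a minimal sketch of defining a clustering key on a large table (the table and column names big_events, event_date, and customer_id are hypothetical):
-- co-locate rows that share the same date and customer
ALTER TABLE big_events CLUSTER BY (event_date, customer_id);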
Method 1: Manual ETL Process to Set Up MySQL to Snowflake Integration
This method involves the use of mysqldump or a SQL query to extract data from MySQL. The data is then moved to the Snowflake staging area and finally replicated into the Snowflake Data Warehouse. This method requires knowledge of SQL to set up the connection successfully. Moreover, it is a lengthy process that requires significant engineering bandwidth.
Method 2: Set Up MySQL to Snowflake Integration using Hevo’s no-code data pipeline
Hevo Data, an automated Data Pipeline platform, provides a hassle-free solution to load data from MySQL to Snowflake within minutes, in an effortless manner. Hevo is fully managed and completely automates the process of not only loading data from MySQL but also enriching the data and transforming it into an analysis-ready form without having to write a single line of code.
Hevo’s pre-built integration with MySQL along with 150+ data sources (including 40+ free data sources) such as databases, SaaS applications, analytics engines, etc. gives users the flexibility to bring in data of all different kinds in a way that’s as smooth as possible. Hevo will take full charge of the data transfer process, allowing you to set up MySQL to Snowflake migration seamlessly and focus on key business activities.
Methods to Set Up MySQL to Snowflake Integration
There are many ways of loading data from MySQL to Snowflake. In this article, you will look at two popular methods. By the end, you will have a good understanding of each, which will help you make the right decision based on your use case:
Method 1: Manual ETL Process to Set Up MySQL to Snowflake Integration
Now, let's look at how to move data from MySQL to Snowflake using custom code. To perform MySQL to Snowflake replication, you have to put data files into the Snowflake staging area (an internal or external stage) and then copy the data from the staging area into the table.
Below are the high-level steps to set up MySQL to Snowflake Integration:
Step 1: Extract Data from MySQL
Broadly, two approaches are used to extract data from MySQL. One is using the command-line tool mysqldump, and the other is running a SQL query using the MySQL client and saving the output to files.
Extracting data with mysqldump:
mysqldump is a client utility available by default with a standard MySQL installation. Its main use is to create a logical backup of a database or table. It can be used to extract a single table as shown below:
mysqldump -u <username> -h <host_name> -p database_name my_table > my_table_out.sql
Here, the output file my_table_out.sql will contain INSERT statements like:
INSERT INTO table_name (column1, column2, column3, ...)
VALUES (value1, value2, value3, ...);
To convert this format into a CSV file, you have to write a small script or use an available open-source library. You can refer to the official MySQL documentation for more information.
If mysqldump is run on the same machine where the mysqld server runs, you have another simpler option to get CSV files directly. Use the command below:
mysqldump -u [username] -p -t -T/path/to/directory [database_name] --fields-terminated-by=,
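As a concrete sketch (paths and names below are placeholders), the command writes the table data as CSV into a directory on the database host. Note that the directory must be writable by the mysqld process and may be restricted by the secure_file_priv setting:
# my_table.txt (the data) is written by the server; without -t, a my_table.sql
# file containing the CREATE TABLE statement is also produced by mysqldump.
mysqldump -u dbuser -p -t -T /var/lib/mysql-files mydb my_table --fields-terminated-by=, --fields-enclosed-by='"'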
Extract Data Using SQL Query
SQL commands can be executed using the MySQL client utility and the output redirected to a file.
mysql -B -u user database -h mysql_host -e "select * from my_table;" > my_table_data_raw.txt
The output can be transformed using text editing utilities like sed or awk to clean and format data.
Example:
mysql -B -u user database -h mysql_host -e "select * from my_table;" | sed "s/'/\'/;s/\t/\",\"/g;s/^/\"/;s/$/\"/;s/\n//g" > my_table_final_data.csv
Step 2: Data Type Conversion and Formatting
Other than business-specific transformations, note the following while replicating data from MySQL to Snowflake:
- Snowflake supports several character sets, including UTF-8 and UTF-16. To see the full list, click here.
- Unlike many other cloud analytical solutions, Snowflake allows you to define UNIQUE, PRIMARY KEY, FOREIGN KEY, and NOT NULL constraints (note that only NOT NULL is enforced; the other constraints are informational).
- Snowflake has a rich set of data types. Here is the list of MySQL data types and the corresponding Snowflake data types.
MySQL Data Type | Snowflake Data Type |
TINYINT | TINYINT |
SMALLINT | SMALLINT |
MEDIUMINT | INTEGER |
INT | INTEGER |
BIGINT | BIGINT |
DECIMAL | DECIMAL |
FLOAT | FLOAT, FLOAT4, FLOAT8 |
DOUBLE | DOUBLE, DOUBLE PRECISION, REAL |
BIT | BOOLEAN |
CHAR | CHAR |
VARCHAR | VARCHAR |
BINARY | BINARY |
VARBINARY | VARBINARY |
TINYTEXT | STRING, TEXT |
TEXT | STRING, TEXT |
MEDIUMTEXT | STRING, TEXT |
LONGTEXT | STRING, TEXT |
ENUM | No ENUM type. Use any type that can represent the ENUM values (e.g., VARCHAR). |
SET | No SET type. Use any type that can represent the SET values (e.g., VARCHAR). |
DATE | DATE |
TIME | TIME |
DATETIME | DATETIME |
TIMESTAMP | TIMESTAMP |
- Snowflake supports most date/time formats, and the format can be explicitly specified while loading data into the table using the File Format option (we will discuss this in detail later). For the complete list of supported formats, please click here.
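To make the mapping concrete, here is a hedged sketch of a Snowflake table definition for a hypothetical MySQL table orders (id INT, amount DECIMAL(10,2), status ENUM('new','paid'), note TEXT, created_at DATETIME), following the table above:
CREATE TABLE orders (
  id         INTEGER,
  amount     DECIMAL(10,2),
  status     VARCHAR(10),    -- ENUM has no direct equivalent; store the value as text
  note       TEXT,
  created_at TIMESTAMP_NTZ   -- Snowflake DATETIME is an alias for TIMESTAMP_NTZ
);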
Step 3: Stage Data Files
To insert MySQL data into a Snowflake table, the data files first need to be uploaded to a temporary location, which is called staging. Snowflake supports internal and external stages.
Internal Stage
Each user and table is automatically allocated an internal stage for staging data files. You can also create named internal stages.
- The user stage is referenced using ‘@~’.
- The name of a table stage will be the same as the table name.
- User/Table stages can’t be altered or dropped.
- User/Table stages do not support setting file format options.
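For illustration, here is a minimal sketch of referencing these stages (the file path and the table name mysql_table are hypothetical; the PUT command itself is covered below):
PUT file:///tmp/mysql/data/mysql_data.csv @~/mysql_load;   -- stage a file to the user stage
LIST @%mysql_table;                                        -- list files in the table stage of mysql_table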
Named internal stages are explicitly created by the user using SQL statements. They provide a greater degree of flexibility while loading data. You can assign file format and other options to named stages, which makes data loading easier.
While working with Snowflake, you will need to run a lot of DDL and DML statements in SQL, as well as some data-load-specific commands like the ones shown below. SnowSQL is a very handy CLI client that can be used to run those commands and is available for Linux, macOS, and Windows.
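For example, you can open a SnowSQL session (the account, user, database, schema, and warehouse values below are placeholders) and then run the statements that follow:
snowsql -a <account_identifier> -u <user_name> -d <database> -s <schema> -w <warehouse>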
Example:
create or replace stage my_mysql_stage
copy_options = (on_error='skip_file')
file_format = (type = 'CSV' field_delimiter = '|' skip_header = 1);
The PUT command is used to stage data files to an internal stage. The syntax of the command is given below:
PUT file://<path_to_file>/<filename> <internal_stage_name>
Example:
Upload a file named mysql_data.csv in the /tmp/mysql/data directory to an internal stage named mysql_stage.
PUT file:///tmp/mysql/data/mysql_data.csv @mysql_stage;
There are many useful options, such as setting parallelism for the upload and automatic compression of data files. For more information, click here.
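For instance, a hedged sketch of staging the same file with four parallel upload threads and automatic gzip compression:
PUT file:///tmp/mysql/data/mysql_data.csv @mysql_stage parallel=4 auto_compress=true;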
External Stage
Currently, Snowflake supports Amazon S3 and Microsoft Azure as external staging locations. You can create an external stage pointing to one of those locations and load data into a Snowflake table from it. To create an external stage on S3, you have to provide IAM credentials, and encryption keys if the data is encrypted, as shown in the example below:
create or replace stage mysql_ext_stage url='s3://snoflake/load/files/'
credentials= (aws_key_id='111a222b3c' aws_secret_key='abcd4x5y6z');
encryption= (master_key = 'eSxX0jzYfIamtnBKOEOwq80Au6NDwOaO8=');
Data can be uploaded to the external stage location using the respective cloud vendor's interfaces. For S3, you can upload using the web console, an SDK, or third-party tools.
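For example, with the AWS CLI a file can be copied into the bucket behind the external stage (the bucket name and path below are placeholders):
aws s3 cp /tmp/mysql/data/mysql_data.csv s3://<your-bucket>/load/files/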
Step 4: Copy Staged Files to Snowflake Table
The COPY INTO command loads the contents of the staged file(s) into a Snowflake table. This command needs compute resources, in the form of a virtual warehouse, to run.
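The COPY statements below run on whichever virtual warehouse is active in your session; as a minimal sketch (the warehouse name load_wh is illustrative), you can create and select one like this:
CREATE WAREHOUSE IF NOT EXISTS load_wh WITH WAREHOUSE_SIZE = 'XSMALL' AUTO_SUSPEND = 60;
USE WAREHOUSE load_wh;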
Example:
To load from a named internal stage:
COPY INTO mysql_table
FROM @mysql_stage;
To load from the external stage (here only one file is specified):
COPY INTO mycsvtable
FROM @mysql_ext_stage/tutorials/dataloading/contacts1.csv;
You can even copy directly from an external location:
COPY INTO mysql_table
FROM s3://mybucket/data/files
credentials= (aws_key_id='$AWS_ACCESS_KEY_ID' aws_secret_key='$AWS_SECRET_ACCESS_KEY')
encryption= (master_key = 'eSxX0jzYfIamtnBKOEOwq80Au6NbSgPH5r4BDDwOaO8=')
file_format = (format_name = my_csv_format);
Files can be specified using patterns:
COPY INTO mytable
FROM @mysql_stage
file_format = (type = 'CSV')
pattern='.*/.*/.*[.]csv[.]gz';
Some common format options for CSV files supported in the COPY command are the following:
- COMPRESSION – Compression algorithm for the data files to be loaded.
- RECORD_DELIMITER – Character that separates records (lines) in the input CSV file.
- FIELD_DELIMITER – Character that separates fields in the input file.
- SKIP_HEADER – Number of header lines to skip.
- DATE_FORMAT – String that specifies the date format.
- TIME_FORMAT – String that specifies the time format.
For the full list of options available please visit here.
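Putting a few of these options together, here is a hedged sketch of a COPY command that loads comma-delimited files, skips a header row, and declares the date and time formats:
COPY INTO mysql_table
FROM @mysql_stage
file_format = (type = 'CSV' field_delimiter = ',' skip_header = 1 date_format = 'YYYY-MM-DD' time_format = 'HH24:MI:SS');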
Incremental Data Load
After the initial full data load into the target table, changed data is usually extracted from the source and migrated to the target table at regular intervals. For small tables, a full data dump can sometimes be used even for recurring migrations, but for larger tables we have to take the delta (incremental) approach.
Incremental Extract from MySQL
To get only the records modified after a particular time, run SQL with proper predicates against the table and write the output to a file. mysqldump is not useful here, as it always extracts the full data.
Example: extracting records based on the last_updated_timestamp column and formatting the data using the sed command.
mysql -B -u user database -h mysql_host -e "select * from my_table where last_updated_timestamp < now() and last_updated_timestamp > '#max_updated_ts_in_last_run#'" | sed "s/'/\'/;s/\t/\",\"/g;s/^/\"/;s/$/\"/;s/\n//g" > my_table_data.csv
Any records physically deleted from the source will be missing here and will therefore not be reflected in the target.
Update Snowflake Table
Snowflake supports row-level updates, which makes delta data migration much easier. The basic idea is to load the incrementally extracted data into an intermediate table and then modify records in the final table based on the data in the intermediate table.
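As a minimal sketch (the table and stage names are illustrative), the intermediate table can be created with the same structure as the final table and then loaded from the stage holding the incremental extract:
CREATE OR REPLACE TEMPORARY TABLE intermed_table LIKE final_table;
COPY INTO intermed_table FROM @mysql_stage file_format = (type = 'CSV');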
Once the data is loaded into the intermediate table, we can choose one of three methods to modify the final table:
- Update the existing rows in the final table and insert new rows from the intermediate table which are not in the final table.
UPDATE final_table t
SET t.value = s.value
FROM intermed_table s
WHERE t.id = s.id;
INSERT INTO final_table (id, value)
SELECT id, value
FROM intermed_table
WHERE id NOT IN (SELECT id FROM final_table);
- Delete all rows from the final table which are present in the intermediate table. Then insert all rows from the intermediate table to the final table.
DELETE FROM final_table
WHERE id IN (SELECT id FROM intermed_table);
INSERT INTO final_table (id, value)
SELECT id, value
FROM intermed_table;
- MERGE statement – Inserts and updates can be done with a single MERGE statement, which applies the changes in the intermediate table to the final table.
MERGE INTO final_table t1 USING intermed_table t2 ON t1.id = t2.id
WHEN MATCHED THEN UPDATE SET value = t2.value
WHEN NOT MATCHED THEN INSERT (id, value) VALUES (t2.id, t2.value);
Now, you have successfully set up MySQL to Snowflake Integration using the manual ETL method.
Method 2: Using Hevo Data to Set Up MySQL to Snowflake Integration
Symbolic Diagram of How Hevo Helps for Data Replication
Hevo Data, a No-code Data Pipeline, helps you directly transfer data from MySQL to Snowflake in a completely hassle-free & automated manner. Hevo supports data ingestion for replication from MySQL servers via binary logs (BinLog). You can also connect to Hevo using an SSH tunnel, instead of directly connecting your MySQL database host to Hevo. This provides an additional level of security to your database.
Hevo is fully managed and completely automates the process of not only loading data from your MySQL database but also enriching the data and transforming it into an analysis-ready form in your Snowflake warehouse without having to write a single line of code. Its fault-tolerant architecture ensures that the data is handled in a secure, consistent manner with zero data loss.
The steps to load data from MySQL to Snowflake using Hevo Data are as follows:
- Connect your MySQL account to Hevo’s platform. Hevo has an in-built MySQL Integration that connects to your account within minutes.
Configure MySQL as Source
- Select Snowflake as your destination and start moving your data.
Configure Snowflake as Destination
With this, you have successfully set up MySQL to Snowflake Integration using Hevo Data.
Here are more reasons to try Hevo:
- Secure: Hevo has a fault-tolerant architecture that ensures that the data is handled in a secure, consistent manner with zero data loss.
- Auto Schema Mapping: Hevo takes away the tedious task of schema management & automatically detects the schema of incoming data and maps it to the destination schema.
- Minimal Learning: Hevo, with its simple and interactive UI, is extremely simple for new customers to work on and perform operations.
- Hevo is Built to Scale: As the number of sources and the volume of your data grows, Hevo scales horizontally, handling millions of records per minute with very little latency.
- Incremental Data Load: Hevo allows the transfer of data that has been modified in real-time. This ensures efficient utilization of bandwidth on both ends.
- Live Support: The Hevo team is available round the clock to extend exceptional support to its customers through chat, email, and support calls.
- Live Monitoring: Hevo allows you to monitor the data flow and check where your data is at a particular point in time.
Hevo helps us centralize our data to Snowflake and effectively use it for reporting. Deliverr’s product and engineering teams manage their own set of microservices, each team creates Hevo data pipelines on its own, without needing help from data engineers or DevOps.
Emmet Murphy, Staff Software Engineer, Deliverr
With continuous Real-Time data movement, Hevo allows you to combine MySQL data along with your other data sources and seamlessly load it to Snowflake with a no-code, easy-to-setup interface. Try our 14-day full-feature access free trial!
Get Started with Hevo for Free
Conclusion
This article gave you a comprehensive guide to MySQL and Snowflake and showed how you can set up MySQL to Snowflake Integration using two methods – a manual process using mysqldump/SQL queries, or Hevo's no-code data pipeline.
Hevo seamlessly integrates with MySQL and Snowflake, ensuring that you see no delay in terms of setup and implementation. It helps you directly transfer data from MySQL or any other source of your choice to a Data Warehouse, Business Intelligence tools, or any other desired destination in a fully automated and secure manner, without having to write any code, providing you with a hassle-free experience.
Learn more about Hevo
Want to take Hevo for a spin? Sign up for a 14-day free trial and experience the feature-rich Hevo suite first hand.
Share your experience of loading data from MySQL to Snowflake in the comment section below.