
MariaDB to Snowflake – Steps to Move Data in Minutes

Are you looking to move data from MariaDB to Snowflake for analytics or archival purposes? You have landed on the right post. This post covers two main approaches to move data from MariaDB to Snowflake. Before we dive in, let us briefly understand these individual technologies.

Understanding MariaDB and Snowflake

MariaDB is an open-source RDBMS that was forked from the MySQL source code in 2009. Thanks to shared syntax, data structures, and APIs, MariaDB has maintained a high degree of cross-compatibility with MySQL and is often used as a drop-in replacement for it.

Snowflake is a fast-growing cloud-based Data Warehouse solution that operates on a Software-as-a-Service (SaaS) model. Snowflake can host its data on Amazon Web Services (S3), Microsoft Azure, or Google Cloud Platform.

Snowflake’s cloud-based model offers some clear advantages over both cloud and on-premise Data Warehouses. Some of these are:  

  • Low overhead – Snowflake is based on a SaaS model. This means it does not require software installations, hardware or much infrastructure management from dedicated internal staff. 
  • Advanced Scalability – Snowflake has separated its storage and compute capabilities into distinct layers that can be scaled up or down, independent of each other. This means that it is more resource-efficient than regular data warehouses that scale storage and CPU cycles concurrently.
  • Speed and Efficiency – Snowflake uses an MPP Columnar datastore engine. Additionally, scaling does not impact already running queries or affect processing speeds. 
  • Semi-structured Data Support – Snowflake offers support for semi-structured data in addition to structured data. This eliminates the need to have different pipelines for processing these data types, thus providing added efficiency over competitors.

How to move data from MariaDB to Snowflake

There are two popular methods to perform MariaDB to Snowflake data migration.

Method 1: Build Custom ETL Scripts to move data from MariaDB to Snowflake

Method 2: Implement an Official Snowflake ETL Partner such as Hevo Data

This post covers the first approach in great detail. The blog also highlights the limitations of this approach and discusses the means to overcome them.

MariaDB to Snowflake: Steps to Load Data Using Custom Code

At a high level, the data replication process can generally be thought of in the following steps:

  1. Extracting the data from MariaDB
  2. Data Type Mapping and Preparation
  3. Data Staging
  4. Loading Data into Snowflake

Step 1: Extracting Data from MariaDB

Data should be extracted based on the use case and the size of the data being exported. 

  • If the data is relatively small, then it can be extracted using a SELECT ... INTO OUTFILE statement run from MariaDB’s mysql command-line client. Note that INTO OUTFILE writes the file on the database server’s filesystem.
    Example:

    mysql -u <name> -p <db>

    SELECT <columns> INTO OUTFILE 'path'
    FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
    LINES TERMINATED BY '\n'
    FROM <table>;

    The FIELDS TERMINATED BY, OPTIONALLY ENCLOSED BY, and LINES TERMINATED BY clauses are optional; if omitted, the defaults produce tab-separated output.

  • If a user is looking to export large amounts of data, MariaDB provides another command-line tool, mysqldump, which is better suited to exporting whole tables, a single database, or several databases. mysqldump creates a backup by dumping the database or table definitions and data into a text file, typically as SQL statements, although it can also generate other formats such as CSV or XML. Taking a full backup of a database looks like this:
    mysqldump -h [database host name or IP address] -u [database user name] -p [database name] > db_backup.sql

    The resulting file will consist of SQL statements that recreate the database specified above.

    Example (snippet):

    CREATE TABLE table1 (
      `Column1` bigint(10) .......
    )
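When exports need to run on a schedule, the mysqldump invocation above can be wrapped in a small script. Below is a minimal sketch, assuming placeholder host, user, and database names; the `-p` flag with no attached value makes mysqldump prompt for the password instead of exposing it in the process list:

```python
import subprocess

def build_dump_command(host: str, user: str, database: str, outfile: str) -> list:
    """Assemble a mysqldump command line for a full backup of one database."""
    return [
        "mysqldump",
        "-h", host,          # database host name or IP address
        "-u", user,          # database user name
        "-p",                # prompt for the password interactively
        database,            # database to dump
        f"--result-file={outfile}",  # write the SQL dump here
    ]

def dump_database(host: str, user: str, database: str, outfile: str) -> None:
    # Runs mysqldump and raises CalledProcessError if the export fails.
    subprocess.run(build_dump_command(host, user, database, outfile), check=True)
```

In a real pipeline you would likely supply the password via a credentials file or environment rather than an interactive prompt.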

Step 2: Data Type Mapping and Preparation

Once the data is exported, one has to ensure that the data types in the MariaDB export properly correlate with their corresponding data types in Snowflake. 

Snowflake presents documentation on data preparation before the staging process here.

In general, it should be noted that the BIT data type in MariaDB corresponds to BOOLEAN in Snowflake. Also, Large Object types (both BLOB and CLOB) and ENUM are not supported in Snowflake; such columns are typically carried over as BINARY or VARCHAR instead. The complete documentation on the data types that are not supported by Snowflake can be found here.
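If the migration is scripted, this type translation can be made explicit in code. The sketch below is illustrative only: it covers a handful of common types, and the VARCHAR fallback is an assumption, not an official Snowflake mapping; consult the Snowflake documentation for the full picture:

```python
# Illustrative subset of a MariaDB -> Snowflake type map; extend as needed.
TYPE_MAP = {
    "BIT": "BOOLEAN",          # BIT in MariaDB corresponds to BOOLEAN
    "TINYINT": "NUMBER",
    "BIGINT": "NUMBER",
    "DOUBLE": "FLOAT",
    "DATETIME": "TIMESTAMP_NTZ",
    "BLOB": "BINARY",          # Snowflake has no Large Object types
    "CLOB": "VARCHAR",
    "ENUM": "VARCHAR",         # ENUM is unsupported; store the label as text
}

def to_snowflake_type(mariadb_type: str) -> str:
    """Translate a MariaDB column type, ignoring length qualifiers like (10)."""
    base = mariadb_type.split("(")[0].strip().upper()
    return TYPE_MAP.get(base, "VARCHAR")  # assumed fallback for unmapped types
```

A mapping table like this makes it easy to review every translation decision before any data is staged.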

Step 3: Data Staging

The data is ready to be imported into the staging area after we have ensured that the data types are accurately mapped. 

There are two types of stages that a user can create in Snowflake. These are:

  • Internal Stages 
  • External Stages

Each of these stages can be created using the Snowflake GUI or with SQL code. For the scope of this blog, we have included the steps to do this using SQL code.

  1. Creating an Internal Stage (data files are subsequently uploaded to it using the PUT command):
    CREATE [ OR REPLACE ] [ TEMPORARY ] STAGE [ IF NOT EXISTS ] <internal_stage_name>
      [ FILE_FORMAT = ( { FORMAT_NAME = '<file_format_name>' | TYPE = { CSV | JSON | AVRO | ORC | PARQUET | XML } [ formatTypeOptions ] } ) ]
      [ COPY_OPTIONS = ( copyOptions ) ]
      [ COMMENT = '<string_literal>' ]
  2. Creating an External Stage. Here is the code to create a stage over Amazon S3:
    CREATE STAGE [database_name].[schema_name].[stage_name]
    URL = 's3://<bucket>/<path>/'
    CREDENTIALS = (AWS_KEY_ID = '<your AWS key ID>' AWS_SECRET_KEY = '<your AWS secret key>')
    ENCRYPTION = (MASTER_KEY = '<master key, if required>')
    COMMENT = '<insert comment>'

    In case you are using Microsoft Azure as your external stage, here is how you can create it:

    CREATE STAGE [database_name].[schema_name].[stage_name]
    URL = 'azure://<account>.blob.core.windows.net/<container>'
    CREDENTIALS = (AZURE_SAS_TOKEN = '<your token>')
    ENCRYPTION = (TYPE = 'AZURE_CSE' MASTER_KEY = '<master key, if required>')
    COMMENT = '<insert comment>'

There are two other internal stage types, namely the table stage and the user stage. However, these stages are automatically generated by Snowflake. The table stage is held within a table object and is best suited to cases where the staged data will be used exclusively by that specific table. The user stage is assigned to each user by the system and cannot be altered or dropped; it serves as a personal storage location for that user.
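If stage creation is scripted, generating the statement from configuration helps keep credentials out of source control. A minimal sketch for the S3 case; the function name and parameters are placeholders, and in practice the key values should come from a secrets manager:

```python
def create_s3_stage_sql(database: str, schema: str, stage: str,
                        url: str, key_id: str, secret_key: str) -> str:
    """Build a CREATE STAGE statement for an Amazon S3 external stage."""
    return (
        f"CREATE STAGE {database}.{schema}.{stage} "
        f"URL = '{url}' "
        f"CREDENTIALS = (AWS_KEY_ID = '{key_id}' AWS_SECRET_KEY = '{secret_key}')"
    )
```

The resulting string can then be executed through SnowSQL or any Snowflake client library.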

Step 4: Loading Data to Snowflake

In order to load the staged data into Snowflake, we use the COPY INTO DML statement, for example through Snowflake’s SQL command-line interface, SnowSQL. Note that the FROM clause in the COPY INTO statement can be omitted, in which case Snowflake automatically loads from the table’s own stage.

Loading Data from Internal Stages:

  • User Stage Type:
    COPY INTO TABLE1 FROM @~/staged FILE_FORMAT=(FORMAT_NAME='csv_format')

  • Table Stage Type:
    COPY INTO TABLE1 FILE_FORMAT=(TYPE=CSV FIELD_DELIMITER='|' SKIP_HEADER=1)

  • Named internal stage created in the previous step:
    COPY INTO TABLE1 FROM @stage_name


Loading Data from External Stages:

  • Amazon S3:
    While you can load data directly from an Amazon S3 bucket, the recommended method is to first create an Amazon S3 external stage as described in the Data Staging section of this guide. The same applies to Microsoft Azure containers and GCP buckets.

    COPY INTO TABLE1 FROM 's3://bucket'
    CREDENTIALS=(AWS_KEY_ID='YOUR AWS ACCESS KEY' AWS_SECRET_KEY='YOUR AWS SECRET ACCESS KEY')
    ENCRYPTION=(MASTER_KEY='YOUR MASTER KEY')
    FILE_FORMAT=(FORMAT_NAME=CSV_FORMAT)
  • Microsoft Azure:

    COPY INTO TABLE1 FROM 'azure://<account>.blob.core.windows.net/container'
    STORAGE_INTEGRATION = <integration_name>
    ENCRYPTION=(MASTER_KEY='YOUR MASTER KEY')
    FILE_FORMAT=(FORMAT_NAME=CSV_FORMAT)
  • GCS:

    COPY INTO TABLE1 FROM 'gcs://bucket'
    STORAGE_INTEGRATION = <integration_name>
    ENCRYPTION=(MASTER_KEY='YOUR MASTER KEY')
    FILE_FORMAT=(FORMAT_NAME=CSV_FORMAT)

Snowflake offers and supports many file format options such as Parquet, XML, JSON, and CSV. Additional information can be found here.
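Tying the pieces together, the upload and load steps can also be driven programmatically. The sketch below only assembles the PUT and COPY INTO statements as strings; executing them would require a Snowflake connection (for example via the snowflake-connector-python package), and the table, stage, and format names are placeholders:

```python
def put_sql(local_path: str, stage: str) -> str:
    """Build a PUT command that uploads a local file to an internal stage."""
    return f"PUT file://{local_path} @{stage} AUTO_COMPRESS = TRUE"

def copy_into_sql(table: str, stage: str, file_format: str) -> str:
    """Build a COPY INTO statement that loads a named stage into a table."""
    return (f"COPY INTO {table} FROM @{stage} "
            f"FILE_FORMAT = (FORMAT_NAME = '{file_format}')")
```

Generating the statements this way makes it straightforward to log exactly what was run against the warehouse for each batch.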

This completes the steps to load data from MariaDB to Snowflake. While the method may look fairly straightforward, it is not without its limitations. 

MariaDB to Snowflake: Limitations of Using Custom Code

  • This method shows how to replicate a batch of data from MariaDB. It is a fairly manual process that becomes tedious and repetitive if the source data changes frequently and those changes must be continually replicated to the Snowflake Data Warehouse. In other words, this approach is not advised if you are looking to load data into Snowflake in real-time.
  • In case you need to apply any transformation to this data, you would need to write additional code to achieve that. This can make the entire process even more cumbersome if you are dealing with a large database and a wide array of transformations.
  • Given that your MariaDB database stores data that is important for your business, you would need to set up additional monitoring systems to ensure that nothing breaks. You would also have to deploy resources to frequently check the data for consistency and loss.
  • Lastly, given the complexities involved in building this setup, it would slow down the overall project implementation timeline and is not best suited for a fast-paced organization.

Enter Hevo Data, a simple-to-use Data Integration Platform that overcomes the above limitations and can move data from MariaDB to Snowflake instantly. 

A Simpler Alternative to Load Data from MariaDB to Snowflake:

Using a no-code data integration solution like Hevo (Official Snowflake ETL Partner), you can move data from MariaDB to Snowflake in real-time. Since Hevo is fully managed, the setup and implementation time is next to nothing. You can replicate MariaDB to Snowflake using Hevo’s visual interface in 3 simple steps:

  1. Connect to your MariaDB Database
  2. Select the replication mode: (i) Full Table dump (ii) Load data using custom query (iii) Replicate Data using Binlog
  3. Configure the Snowflake database and watch your data load in real-time

Hevo will now move your MariaDB data to Snowflake in a consistent, secure and reliable fashion. In addition to MariaDB, Hevo can load data from a multitude of other data sources including Databases, Cloud Applications, SDKs, and more. This allows you to scale up on demand and start moving data from all the applications important for your business. 

The complete suite of Hevo’s features can be found here. Sign up for a 14-day free trial to experience Hevo’s simplicity and robustness first-hand. 

ETL Data to Redshift, BigQuery, Snowflake

Move Data from any Source to Warehouse in Real-time

Sign up today to get $500 Free Credits to try Hevo!
Start Free Trial