The need to store, transform, analyze and share data is growing exponentially, with demand for cloud-based data analytics and data warehouse solutions also on the rise. Using the cloud for data processing, analytics, and reporting has now become quite popular mainly due to its convenience and superior performance. In this blog post, we will go over a migration scenario where a fictional business is attempting to migrate their data from an on-prem TokuDB to Snowflake, a cloud-based data warehouse. To this aim, let’s first compare both solutions.

Introduction to TokuDB

TokuDB Logo

TokuDB is a highly scalable MySQL and MariaDB storage engine. It offers high data compression, fast insertions, and deletions, among many other features. This makes it a great solution for use in high-performance and write-intensive environments. It uses a fractal tree data structure and huge data pages to efficiently manage and read the data. However, concurrency, scale, resiliency, and security are some of the bottlenecks that limit TokuDB’s performance. It is available in an open-source version and an enterprise edition. 

Introduction to Snowflake

Snowflake Logo

Snowflake is a cloud data warehouse that came out in 2015. It is primarily available on AWS and Azure. Snowflake is similar to BigQuery in that it stores data separately from where it does its compute. It stores the actual data of your tables in S3 and then it can provision any number of compute nodes to process that data. 

In contrast, Snowflake offers instant access to unlimited resources (compute and storage) on-demand.

Snowflake Benefits:

  1. Snowflake is specifically optimized for analytics workloads. It’s therefore ideal for businesses dealing with very complex data sets.
  2. Snowflake offers better performance both in terms of storage capacity and query performance.
  3. Snowflake also offers better security compared to an on-prem data warehouse. This is because cloud data warehouses are required to meet stringent security requirements.
  4. Migrating your data to the cloud is also cost-effective since there is no huge initial outlay and you don’t have to maintain physical infrastructure.

Methods to Connect TokuDB to Snowflake

Moving Data from TokuDB to Snowflake

Method 1: Using Custom ETL Scripts to Connect TokuDB to Snowflake

This approach would need you to invest in heavy engineering resources. The broad steps in this approach would need you to understand the S3 data source, write code to extract data from S3, prepare the data, and finally copy it into Snowflake. The details and challenges of each step are described in the next sections.

Method 2: Using Hevo to Connect TokuDB to Snowflake

Hevo is a cloud data pipeline platform that seamlessly moves data from TokuDB to Snowflake in real-time without having to write any code. By deploying Hevo, the data transfer can be completely automated and would not need any human intervention. This will allow you to direct your team’s bandwidth in extracting meaningful insights instead of wrangling with code.

Get Started with Hevo for Free

Method 1: Using Custom ETL Scripts to Connect TokuDB to Snowflake

Here are the steps involved in using Custom ETL Scripts to connect TokuDB and Snowflake:

Step 1: Export TokuDB Tables to CSV Format

There are multiple ways to backup a TokuDB database and we will be using a simple SQL command to perform a logical backup.

SELECT * FROM `database_name`.`table_name` INTO OUTFILE 'path_to_folder/filename.csv' 
FIELDS ENCLOSED BY '"' TERMINATED BY ';' ESCAPED BY '"' LINES TERMINATED BY 'rn';"

This command dumps the data into CSV format which can then easily be imported into Snowflake.

Repeat this command for all tables and ensure that your TokuDB server has enough storage space to hold the CSV files.

Integrate CleverTap to Snowflake
Integrate Mailchimp to Snowflake
Integrate Pendo to Snowflake

Step 2: Upload Source Data Files to Amazon S3 

After generating the CSV/TXT file, we need to upload this data to a place where Snowflake can access it. Install the AWS CLI on your system 

How to install the AWS CLI 

After that execute the following command.

aws s3 cp filename.csv  s3://{YOUR_BUCKET_NAME}

Step 3: Create an Amazon S3 Stage

Using the SnowSQL CLI client, run this command:

create or replace stage my_csv_stage
  file_format = mycsvformat
  url = 's3://{YOUR_BUCKET_NAME}';

The example above creates an external stage named my_csv_stage.

Step 4: Create a Table in Snowflake

Create a table with your schema. You will load data into this table in the next step.

create or replace table {YOUR_TABLE_NAME}
  ('$TABLE_SCHEMA')

Step 5: Loading Data to Snowflake

Loading data requires a Snowflake compute cluster. Run the following command in the SnowSQL CLI client:

copy into {YOUR_TABLE_NAME}
  from s3://{YOUR_BUCKET_NAME} credentials=(aws_key_id='$AWS_ACCESS_KEY_ID' aws_secret_key='$AWS_SECRET_ACCESS_KEY')
  file_format = (type = csv field_delimiter = '|' skip_header = 1);

This command will load data from all CSV files in the S3 bucket.

Step 6: Validating the Connection from TokuDB to Snowflake

select * from {YOUR_TABLE_NAME} limit 10;

The above approach is effort-intensive. You would need to hand-code many steps that run coherently to achieve the objective.

Limitations of Using Custom ETL Scripts to Connect TokuDB to Snowflake

  • This method is ideal for a one-time bulk load. In case you are looking to stream data in real-time, you might have to configure cron jobs and write additional code to achieve this. 
  • More often than not, the use case to move data from TokuDB to Snowflake is not this straightforward. You might need to clean, transform and enrich the data to make it analysis-ready. This would not be easy to achieve. 
  • Since the data moved from TokuDB is critical to your business, you will need to constantly monitor the infrastructure to ensure that nothing breaks. Failure at any step would lead you to irretrievable data loss. 

Method 2: Using Hevo to Connect TokuDB to Snowflake

Using Hevo (Official Snowflake ETL partner) — a managed system that simplifies data migration. Hevo is fully managed and completely automates the process of not only loading data from your desired source but also enriching the data and transforming it into an analysis-ready form without having to write a single line of code. Its fault-tolerant architecture ensures that the data is handled in a secure, consistent manner with zero data loss.

Hevo takes care of all your data preprocessing to set up TokuDB Snowflake Integration and lets you focus on key business activities and draw a much powerful insight on how to generate more leads, retain customers, and take your business to new heights of profitability. It provides a consistent & reliable solution to manage data in real-time and always have analysis-ready data in your desired destination. 

Moving data from TokuDB to Snowflake requires just 2 steps:

  • Step 1: Connect to your TokuDB database by providing connection settings.
  • Step 2: Select the mode of replication you want: (a) Load the result set of a Custom Query (b) Full dump of tables (c) Load data via log
  • Step 3: Configure the Snowflake destination by providing the details like Destination Name, Account Name, Account Region, Database User, Database Password, Database Schema, and Database Name.
Snowflake Dest. Config

See how connecting TokuDB to Redshift can enhance your data infrastructure and analytics capabilities.

Conclusion

This blog talks about the two methods you can use to set up TokuDB Snowflake Integration in a seamless fashion: using custom ETL code and a third-party tool, Hevo.

Extracting complex data from a diverse set of data sources can be a challenging task and this is where Hevo saves the day! 

In addition to TokuDB, Hevo can also bring data from a wide array of data sources into Snowflake. Database (MySQL, PostgreSQL, MongoDB and more), Cloud Applications (Google Analytics, Google Ads, Facebook Ads, Salesforce, and more). This allows Hevo to scale on-demand as your data needs grow.

Sign Up for a full-feature free trial (14 days) to see the simplicity of Hevo first-hand. You can also have a look at the unbeatable pricing that will help you choose the right plan for your business needs.

FAQ

How to migrate data from Oracle to Snowflake?

Migrate data from Oracle to Snowflake using the Snowflake Data Transfer Service or third-party ETL tools like Hevo. Export Oracle data to CSV or use the Oracle Connector for Snowflake to directly load data into Snowflake tables.

How to get data from BigQuery to Snowflake?

To transfer data from BigQuery to Snowflake, you can use a tool like Hevo or manual methods such as exporting data from BigQuery to Google Cloud Storage as CSV files, then using Snowflake’s COPY INTO command to load the data.

How to transfer data from MySQL to Snowflake?

Transfer data from MySQL to Snowflake using Snowflake’s MySQL connector or third-party ETL tools like Hevo. You can also export MySQL data to CSV and use the COPY INTO command in Snowflake to load the data.

Talha
Software Developer, Hevo Data

Talha is a seasoned Software Developer, currently driving advancements in data integration at Hevo Data, where he have been instrumental in shaping a cutting-edge data integration platform for the past four years. With a significant tenure at Flipkart prior to their current role, he brought innovative solutions to the space of data connectivity and software development.