The need to store, transform, analyze and share data is growing exponentially, with demand for cloud-based data analytics and data warehouse solutions also on the rise. Using the cloud for data processing, analytics, and reporting has now become quite popular mainly due to its convenience and superior performance. In this blog post, we will go over a migration scenario where a fictional business is attempting to migrate their data from an on-prem TokuDB to Snowflake, a cloud-based data warehouse. To this aim, let’s first compare both solutions.
Introduction to TokuDB
TokuDB is a highly scalable MySQL and MariaDB storage engine. It offers high data compression, fast insertions, and deletions, among many other features. This makes it a great solution for use in high-performance and write-intensive environments. It uses a fractal tree data structure and huge data pages to efficiently manage and read the data. However, concurrency, scale, resiliency, and security are some of the bottlenecks that limit TokuDB’s performance. It is available in an open-source version and an enterprise edition.
Introduction to Snowflake
Snowflake is a cloud data warehouse that came out in 2015. It is primarily available on AWS and Azure. Snowflake is similar to BigQuery in that it stores data separately from where it does its compute. It stores the actual data of your tables in S3 and then it can provision any number of compute nodes to process that data.
In contrast, Snowflake offers instant access to unlimited resources (compute and storage) on-demand.
Snowflake Benefits:
- Snowflake is specifically optimized for analytics workloads. It’s therefore ideal for businesses dealing with very complex data sets.
- Snowflake offers better performance both in terms of storage capacity and query performance.
- Snowflake also offers better security compared to an on-prem data warehouse. This is because cloud data warehouses are required to meet stringent security requirements.
- Migrating your data to the cloud is also cost-effective since there is no huge initial outlay and you don’t have to maintain physical infrastructure.
Method 1: Using Custom ETL Scripts to Connect TokuDB to Snowflake
This approach would need you to invest in heavy engineering resources. The broad steps in this approach would need you to understand the S3 data source, write code to extract data from S3, prepare the data, and finally copy it into Snowflake. The details and challenges of each step are described in the next sections.
Method 2: Using Hevo to Connect TokuDB to Snowflake
Hevo is a cloud data pipeline platform that seamlessly moves data from TokuDB to Snowflake in real-time without having to write any code. By deploying Hevo, the data transfer can be completely automated and would not need any human intervention. This will allow you to direct your team’s bandwidth in extracting meaningful insights instead of wrangling with code.
Get Started with Hevo for Free
Hevo’s pre-built integration with TokuDB (among 100+ Sources) will take full charge of the data transfer process, allowing you to focus on key business activities.
Methods to Connect TokuDB to Snowflake
Here are the methods you can use to establish a connection from TokuDB to Snowflake:
Method 1: Using Custom ETL Scripts to Connect TokuDB to Snowflake
Here are the steps involved in using Custom ETL Scripts to connect TokuDB and Snowflake:
Step 1: Export TokuDB Tables to CSV Format
There are multiple ways to backup a TokuDB database and we will be using a simple SQL command to perform a logical backup.
SELECT * FROM `database_name`.`table_name` INTO OUTFILE 'path_to_folder/filename.csv'
FIELDS ENCLOSED BY '"' TERMINATED BY ';' ESCAPED BY '"' LINES TERMINATED BY 'rn';"
This command dumps the data into CSV format which can then easily be imported into Snowflake.
Repeat this command for all tables and ensure that your TokuDB server has enough storage space to hold the CSV files.
Step 2: Upload Source Data Files to Amazon S3
After generating the CSV/TXT file, we need to upload this data to a place where Snowflake can access it. Install the AWS CLI on your system
How to install the AWS CLI
After that execute the following command.
aws s3 cp filename.csv s3://{YOUR_BUCKET_NAME}
Step 3: Create an Amazon S3 Stage
Using the SnowSQL CLI client, run this command:
create or replace stage my_csv_stage
file_format = mycsvformat
url = 's3://{YOUR_BUCKET_NAME}';
The example above creates an external stage named my_csv_stage.
Step 4: Create a Table in Snowflake
Create a table with your schema. You will load data into this table in the next step.
create or replace table {YOUR_TABLE_NAME}
('$TABLE_SCHEMA')
Step 5: Loading Data to Snowflake
Loading data requires a Snowflake compute cluster. Run the following command in the SnowSQL CLI client:
copy into {YOUR_TABLE_NAME}
from s3://{YOUR_BUCKET_NAME} credentials=(aws_key_id='$AWS_ACCESS_KEY_ID' aws_secret_key='$AWS_SECRET_ACCESS_KEY')
file_format = (type = csv field_delimiter = '|' skip_header = 1);
This command will load data from all CSV files in the S3 bucket.
Step 6: Validating the Connection from TokuDB to Snowflake
select * from {YOUR_TABLE_NAME} limit 10;
The above approach is effort-intensive. You would need to hand-code many steps that run coherently to achieve the objective.
Limitations of Using Custom ETL Scripts to Connect TokuDB to Snowflake
- This method is ideal for a one-time bulk load. In case you are looking to stream data in real-time, you might have to configure cron jobs and write additional code to achieve this.
- More often than not, the use case to move data from TokuDB to Snowflake is not this straightforward. You might need to clean, transform and enrich the data to make it analysis-ready. This would not be easy to achieve.
- Since the data moved from TokuDB is critical to your business, you will need to constantly monitor the infrastructure to ensure that nothing breaks. Failure at any step would lead you to irretrievable data loss.
Method 2: Using Hevo to Connect TokuDB to Snowflake
Using Hevo (Official Snowflake ETL partner) — a managed system that simplifies data migration. Hevo is fully managed and completely automates the process of not only loading data from your desired source but also enriching the data and transforming it into an analysis-ready form without having to write a single line of code. Its fault-tolerant architecture ensures that the data is handled in a secure, consistent manner with zero data loss.
Sign up here for a 14-Day Free Trial!
Hevo takes care of all your data preprocessing to set up TokuDB Snowflake Integration and lets you focus on key business activities and draw a much powerful insight on how to generate more leads, retain customers, and take your business to new heights of profitability. It provides a consistent & reliable solution to manage data in real-time and always have analysis-ready data in your desired destination.
Moving data from TokuDB to Snowflake requires just 2 steps:
- Step 1: Connect to your TokuDB database by providing connection settings.
- Step 2: Select the mode of replication you want: (a) Load the result set of a Custom Query (b) Full dump of tables (c) Load data via log
- Step 3: Configure the Snowflake destination by providing the details like Destination Name, Account Name, Account Region, Database User, Database Password, Database Schema, and Database Name.
Check out what makes Hevo amazing:
- Real-Time Data Transfer: Hevo with its strong Integration with 100+ sources, allows you to transfer data quickly & efficiently. This ensures efficient utilization of bandwidth on both ends.
- Data Transformation: It provides a simple interface to perfect, modify, and enrich the data you want to transfer.
- Secure: Hevo has a fault-tolerant architecture that ensures that the data is handled in a secure, consistent manner with zero data loss.
- Tremendous Connector Availability: Hevo houses a large variety of connectors and lets you bring in data from numerous Marketing & SaaS applications, databases, etc. such as Google Analytics 4, Google Firebase, Airflow, HubSpot, Marketo, MongoDB, Oracle, Salesforce, Redshift, etc. in an integrated and analysis-ready form.
- Simplicity: Using Hevo is easy and intuitive, ensuring that your data is exported in just a few clicks.
- Completely Managed Platform: Hevo is fully managed. You need not invest time and effort to maintain or monitor the infrastructure involved in executing codes.
- Live Support: The Hevo team is available round the clock to extend exceptional support to its customers through chat, email, and support calls.
That is it, Hevo will now take care of reliably loading data from TokuDB to Snowflake in real-time.
See how connecting TokuDB to Redshift can enhance your data infrastructure and analytics capabilities.
Conclusion
This blog talks about the two methods you can use to set up TokuDB Snowflake Integration in a seamless fashion: using custom ETL code and a third-party tool, Hevo.
Extracting complex data from a diverse set of data sources can be a challenging task and this is where Hevo saves the day!
Visit our Website to Explore Hevo
In addition to TokuDB, Hevo can also bring data from a wide array of data sources into Snowflake. Database (MySQL, PostgreSQL, MongoDB and more), Cloud Applications (Google Analytics, Google Ads, Facebook Ads, Salesforce, and more). This allows Hevo to scale on-demand as your data needs grow.
Sign Up for a full-feature free trial (14 days) to see the simplicity of Hevo first-hand. You can also have a look at the unbeatable pricing that will help you choose the right plan for your business needs.
Talha is a seasoned Software Developer, currently driving advancements in data integration at Hevo Data, where he have been instrumental in shaping a cutting-edge data integration platform for the past four years. With a significant tenure at Flipkart prior to their current role, he brought innovative solutions to the space of data connectivity and software development.