In today’s marketplace, you will find that most businesses keep one database, such as Amazon RDS, to store transactions, and a Data Warehouse, such as Snowflake, to carry out Analytics. They do this to avoid impacting query and read performance on their primary database when performing Big Data Analytics. Data Warehouses are an increasingly popular way to store and analyze both structured and unstructured data, and it’s no coincidence that data analysis is touted as the prime reason why organizations are moving data from Amazon RDS to Snowflake. In this blog, we will discover how to set up Amazon RDS to Snowflake Integration.
You see, Data Warehouses are optimized to enable fast, complex queries using Online Analytical Processing (OLAP) across large datasets. In contrast, running analytics queries with many aggregations and joins on large datasets will have a serious performance impact on your transactional database, not to mention a negative impact on your customer experience.
Our focus in this post is to provide you with a step-by-step method of moving data from Amazon RDS to Snowflake. This article introduces Amazon RDS and Snowflake along with their key features. It also discusses the pros and cons of these tools. Read along to learn 2 methods you can use to connect Amazon RDS to Snowflake easily!
What is Amazon RDS?
Relational databases are at the heart of your most critical applications but they can become difficult to manage and operate within your Service Level Objectives (SLOs). Businesses want a simpler way to run relational databases on the cloud with high performance and at less cost. Amazon Relational Database Service or RDS seeks to solve this problem by simplifying database management through the automation of time-consuming administrative functions. With RDS, you only have to concern yourself with the tasks that really bring value to your business such as:
- Schema design
- Query construction
- Query optimization
Amazon RDS gives you the freedom to use your Relational Database of choice including six of the most popular open-source and commercial engines i.e. PostgreSQL, MySQL, MariaDB, Microsoft SQL Server, Oracle, and Amazon Aurora. RDS has placed a management layer around these engines to provide automation for very common tasks that a lot of Engineers who operate databases have to otherwise do manually.
Even highly regulated industries can leverage RDS which meets a broad range of compliance certifications such as ISO, HIPAA, PCI, ITAR, DIACAP, FedRAMP, FISMA, etc.
To learn more about Amazon RDS, visit here.
Key Features
- A transactional database is designed for fast reads and writes on individual records. Running massive analytical or aggregation processes on this database will cause it to slow down, which will damage your customers’ experience.
- A Data Warehouse, on the other hand, can store data from numerous sources, not just transactional data. Third-party sources or data from other areas of the pipeline may be required for analysis or aggregation.
What is Snowflake?
Snowflake is a petabyte-scale fully managed Cloud Data Warehouse. It is available on three different public clouds i.e. AWS, Azure, and GCP. Snowflake enables secure governed access to all of your data while delivering a high level of concurrency, performance, and infinite computing and storage needed to store and analyze all of an organization’s data in one solution. Similar to RDS, you do not have to worry about standard management tasks in Snowflake. Snowflake automates all onerous database administration tasks with a few clicks or API calls.
Snowflake top use cases:
- Data Analysis: Business Intelligence
- Data Replication: Record every action
- Real-time Data Synchronization: Keep all distributed systems Up-To-Date
- Data Auditing and Compliance: Track every User activity
You can read more about Snowflake’s data warehouse here.
Key Features
Let’s take a look at some of the Snowflake data warehouse’s important features:
- Security: For better security and data protection, the Snowflake data warehouse offers Multi-Factor Authentication (MFA), federated authentication, Single Sign-On (SSO), and OAuth. All communication between the client and the server is encrypted using TLS.
- SQL (Standard and Extended) Support: The majority of SQL DDL and DML processes are supported by the Snowflake data warehouse. Other capabilities include advanced DML, transactions, lateral views, stored procedures, and more.
- Client Connectors and Drivers: Python connectors, Spark connectors, Node.js drivers, and .NET drivers are just a few of the client connectors and drivers available for the Snowflake data warehouse.
- Secure Data Sharing: You can securely share your data with other Snowflake accounts.
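As a quick illustration of the SQL support mentioned above, a standard or extended SQL statement can be run non-interactively through the SnowSQL CLI. This is only a sketch: `my_conn` is a placeholder connection profile from your SnowSQL configuration, and the query itself is a toy example over semi-structured data.

```shell
# Query semi-structured data via Snowflake's extended SQL from the SnowSQL CLI.
# "my_conn" is a placeholder connection profile; replace with your own.
snowsql -c my_conn -q \
  "select parse_json('{\"name\": \"iPhone\"}'):name::string as name;"
```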
Which Cloud-Based Data Solution Is Best for You?
You should consider using Amazon RDS when:
- You wish to run traditional databases in the cloud, with the sole requirement being that database maintenance is offloaded.
- Your data volume is measured in terabytes, and you do not expect a significant rise in the near future. RDS reaches its storage limit at 64 TB.
- You have an Online Transaction Processing (OLTP) use case and want immediate results on small volumes of data.
- Your queries do not span millions of rows, and query complexity is kept to a minimum.
- Your analytical and reporting workloads are light and do not interfere with your OLTP demands.
- Your budget is tighter than ever, and you have no intention of blowing it on exorbitant workloads in the future.
You should consider using Snowflake when:
- You’re searching for methods to cut expenses by utilizing the availability of a Cloud Data Warehouse with practically limitless, automated scaling and dependable performance.
- You wish to share data with people outside of your company; Snowflake can be used to rapidly set up data sharing without having to transfer data or construct pipelines.
- You want to rapidly distribute insights to the business without having to worry about warehouse management.
Read along to know how you can connect Amazon RDS to Snowflake with 2 easy methods.
Here are two ways that can be used to approach Amazon RDS to Snowflake ETL:
Method 1: Writing Custom Scripts to Move Data from Amazon RDS to Snowflake
You can use this 2-step method to connect Amazon RDS and Snowflake. You start by exporting Amazon RDS DB snapshot data to Amazon S3, followed by loading it from Amazon S3 into Snowflake. This method of moving data from Amazon RDS to Snowflake has considerable advantages but suffers from a few disadvantages as well.
Method 2: Using Hevo Data to Move Data from Amazon RDS to Snowflake
Hevo Data, an Automated No Code Data Pipeline can transfer data from Amazon RDS to Snowflake and provide you with a hassle-free experience. You can easily ingest data from the Amazon RDS database using Hevo’s Data Pipelines and replicate it to your Snowflake account without writing a single line of code. Hevo’s end-to-end Data Management service automates the process of not only loading data from RDS but also transforming and enriching it into an analysis-ready form when it reaches Snowflake. Once you assign Hevo the required Role Permissions in Snowflake, you can rely on its fully managed Data Pipeline to safely transit your RDS data into Snowflake in a loss-less manner.
Hevo supports direct integrations with RDS and 150+ sources (including 40 free sources) and its Data Mapping feature works continuously to replicate your data to Snowflake and builds a single source of truth for your business. Hevo takes full charge of the data transfer process, allowing you to focus your resources and time on other key business activities.
ETL your data from Amazon RDS to Snowflake using Hevo’s easy-to-set-up, no-code interface. Try our 14-day free trial with full feature access.
Get Started with Hevo for Free
What are the Methods to Connect Amazon RDS to Snowflake?
Now that you have gained a basic understanding of Amazon RDS and Snowflake, let’s discuss the main highlight of this article i.e. how to load data from Amazon RDS to Snowflake. These are the methods you can use to set up a connection from Amazon RDS to Snowflake in a seamless fashion:
Method 1: Writing Custom Scripts to Move Data from Amazon RDS to Snowflake
Prerequisites:
- An active, running virtual Warehouse.
- The following is a representative row of the table we are going to be using:
{
  "brand": "Apple",
  "device": {
    "phone": {
      "model": [
        { "array_element": "iPhone 11 Pro" },
        { "array_element": "iPhone XR" },
        { "array_element": "iPhone XS" },
        { "array_element": "iPhone XS Max" }
      ]
    },
    "name": "iPhone"
  }
}
These are the steps involved in this method:
Step 1: Exporting Amazon RDS DB Snapshot Data to Amazon S3
You can export DB snapshot data to an Amazon S3 bucket. When you export a DB snapshot, Amazon RDS extracts data from a database or table in the snapshot and stores it in an Amazon S3 bucket in your account in an Apache Parquet format that is compressed and consistent.
By default, all data in the snapshot is exported. However, you can choose to export specific sets of databases, schemas, or tables.
To export DB snapshot data to Amazon S3:
- Create a manual snapshot of a DB instance.
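The snapshot can be created from the console or from the AWS CLI. A sketch of the CLI route, assuming a hypothetical instance identifier `mydb`:

```shell
# Create a manual snapshot of the "mydb" instance (identifiers are hypothetical).
aws rds create-db-snapshot \
    --db-instance-identifier mydb \
    --db-snapshot-identifier mydb-manual-snapshot

# Block until the snapshot reaches the "available" state before exporting it.
aws rds wait db-snapshot-available \
    --db-snapshot-identifier mydb-manual-snapshot
```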
- Create an AWS Identity and Access Management (IAM) policy that grants the snapshot export task access to the S3 bucket.
The following AWS CLI command creates an IAM policy named ExportPolicy and grants access to your S3 bucket.
aws iam create-policy --policy-name ExportPolicy --policy-document '{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:GetBucketLocation"
      ],
      "Resource": [
        "arn:aws:s3:::<YOUR_BUCKET_NAME>"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject*",
        "s3:GetObject*",
        "s3:DeleteObject*"
      ],
      "Resource": [
        "arn:aws:s3:::<YOUR_BUCKET_NAME>/*"
      ]
    }
  ]
}'
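Before calling aws iam create-policy, it can be worth validating the policy document locally so that a stray comma does not surface as a cryptic CLI error. A small sketch, assuming python3 is on your PATH and using a placeholder bucket name:

```shell
# Validate the export policy JSON locally; "my-rds-export-bucket" is a placeholder.
POLICY='{
  "Version": "2012-10-17",
  "Statement": [
    { "Effect": "Allow",
      "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
      "Resource": ["arn:aws:s3:::my-rds-export-bucket"] },
    { "Effect": "Allow",
      "Action": ["s3:PutObject*", "s3:GetObject*", "s3:DeleteObject*"],
      "Resource": ["arn:aws:s3:::my-rds-export-bucket/*"] }
  ]
}'
echo "$POLICY" | python3 -m json.tool > /dev/null && echo "policy JSON is valid"
```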
- Create the IAM role which Amazon RDS will assume to access your Amazon S3 buckets.
The following example shows using the AWS CLI command to create a role named rds-s3-export-role.
aws iam create-role --role-name rds-s3-export-role --assume-role-policy-document '{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "export.rds.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}'
- Attach the IAM policy that you created to the IAM role that you created.
aws iam attach-role-policy --policy-arn <YOUR_POLICY_ARN> --role-name rds-s3-export-role
Replace <YOUR_POLICY_ARN> with the ARN of the policy you created in the previous step.
- Open the Amazon RDS console and in the navigation pane, choose Snapshots. In the list of snapshots, choose the snapshot that you want to export and export it to Amazon S3.
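The console export in the step above also has a CLI equivalent, aws rds start-export-task. A sketch with placeholder ARNs and identifiers (note that a KMS key is required for snapshot exports):

```shell
# Start the snapshot export to S3. All ARNs and identifiers below are placeholders.
aws rds start-export-task \
    --export-task-identifier rds-to-snowflake-export \
    --source-arn arn:aws:rds:us-east-1:123456789012:snapshot:mydb-manual-snapshot \
    --s3-bucket-name my-rds-export-bucket \
    --iam-role-arn arn:aws:iam::123456789012:role/rds-s3-export-role \
    --kms-key-id my-kms-key-id

# Optionally restrict the export to specific databases, schemas, or tables:
#   --export-only mydatabase.mytable
```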
NOTE: When you export a DB snapshot to an Amazon S3 bucket, Amazon RDS converts, exports, and stores the data in the compressed Apache Parquet format.
Step 2: Loading data to Snowflake
- Create a file format object that specifies the Parquet file format type.
create or replace file format parquet_format type = 'parquet';
- Create an S3 Stage using the SnowSQL CLI client.
create or replace stage rds_stage file_format = parquet_format credentials = (aws_key_id=...,aws_secret_key=...) url = 's3://<YOUR_BUCKET_NAME>';
- Create a target relational table on Snowflake for the S3 data.
create or replace temporary table amazon_rds_snapshot (
  brand varchar default null,
  device varchar default null,
  phone variant default null
);
- Load the Amazon S3 data into the Snowflake table.
/* Note that all Parquet data is stored in a single column ($1). */
/* Cast element values to the target column data type. */
copy into amazon_rds_snapshot
from (
  select
    $1:brand::varchar,
    $1:device:name::varchar,
    $1:device:phone.model::variant
  from @rds_stage/<filename>.parquet
);
- Verify by running the following query on the relational table:
select * from amazon_rds_snapshot;
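If you want to script the verification rather than run it interactively, the same query can be issued through the SnowSQL CLI. The connection profile name `my_conn` below is a placeholder from your SnowSQL configuration:

```shell
# Run the verification query non-interactively; "my_conn" is a placeholder
# connection profile defined in your SnowSQL config.
snowsql -c my_conn -q "select * from amazon_rds_snapshot;"
```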
Limitations of using Custom Scripts to Connect Amazon RDS to Snowflake
- Exporting snapshots is only supported in the following AWS Regions:
- Asia Pacific (Tokyo)
- Europe (Ireland)
- US East (N. Virginia)
- US East (Ohio)
- US West (Oregon)
- Not all engine versions support exporting snapshot data to Amazon S3. Check the AWS documentation for the versions supported by your database engine.
- The S3 bucket must be in the same AWS Region as the snapshot.
- You would additionally need to set up cron jobs to get real-time data into Snowflake.
- In case you need to clean, transform, and enrich data before loading it to Snowflake, you would need to build additional code to accommodate this.
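To make the last two points concrete, scheduling this manual pipeline typically comes down to a cron entry that re-runs the export and the Snowflake load. A hypothetical example, assuming you have wrapped the steps above in a script at /opt/etl/rds_to_snowflake.sh:

```shell
# Hypothetical crontab entry: re-export the snapshot and reload Snowflake
# every hour; the script path and log path are placeholders.
0 * * * * /opt/etl/rds_to_snowflake.sh >> /var/log/rds_to_snowflake.log 2>&1
```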
Don’t feel like doing this yourself?
Chances are the limitations of this method to load data from RDS to Snowflake will make it a non-starter as a reliable ETL solution. Alternatively, there is a simpler way to get data from RDS to Snowflake.
Method 2: Using Hevo Data to Move Data from Amazon RDS to Snowflake
Hevo Data, a No-code Data Pipeline helps you to directly transfer your Amazon RDS data to Snowflake in real-time in a completely automated manner. With Hevo you can choose exactly what RDS data you wish to replicate to Snowflake and at what frequency. Hevo’s fault-tolerant architecture will enrich and transform your RDS data in a secure and consistent manner and load it to Snowflake without any assistance from your side. Since Hevo is an official Snowflake Data Pipeline Partner, you can entrust us with your data transfer process and enjoy a hassle-free experience. This way, you can focus more on Data Analysis, instead of data consolidation.
Hevo’s RDS MariaDB, MySQL, and PostgreSQL connectors will allow you to get the data into Snowflake using a fully managed setup. You can use Hevo to connect Amazon RDS to Snowflake by following 2 simple steps:
- Step 1) Authenticate Source: Connect and configure your RDS engine as a source for Hevo and select a suitable replication mode from the following:
- Load selected tables only
- Load data via Custom Query
- Load data through logs
- Step 2) Configure Destination: In a Hevo Worksheet, add Snowflake as a destination and assign the required Role Permissions. Next, obtain your Snowflake account and region names. Finally, configure your Snowflake Warehouse.
To learn more about configuring Snowflake as a Hevo destination, visit here.
With this, you have successfully set up Amazon RDS to Snowflake integration using Hevo Data.
Here are more reasons to try Hevo:
- Secure: Hevo has a fault-tolerant architecture that ensures that your RDS data is handled in a secure, consistent manner with zero data loss.
- Schema Management: Hevo takes away the tedious task of schema management & automatically detects the schema of incoming data and maps it to your Snowflake schema.
- Quick Setup: Hevo with its automated features, can be set up in minimal time. Moreover, with its simple and interactive UI, it is extremely easy for new customers to work on and perform operations.
- Transformations: Hevo provides preload transformations through Python code. It also allows you to run transformation code for each event in the Data Pipelines you set up. You need to edit the event object’s properties received in the transform method as a parameter to carry out the transformation. Hevo also offers drag-and-drop transformations like Date and Control Functions, JSON, and Event Manipulation to name a few. These can be configured and tested before putting them to use for aggregation.
- Minimal Learning: Hevo, with its simple and interactive UI, is extremely simple for new customers to work on and perform operations.
- Hevo Is Built To Scale: As the number of sources and the volume of your data grows, Hevo scales horizontally, handling millions of records per minute with very little latency.
- Incremental Data Load: Hevo allows the transfer of data that has been modified in real-time. This ensures efficient utilization of bandwidth on both ends.
- Live Support: The Hevo team is available round the clock to extend exceptional support to its customers through chat, email, and support calls.
- Live Monitoring: Hevo allows you to monitor the data flow and check where your data is at a particular point in time.
With continuous Real-Time data movement, Hevo allows you to combine Amazon RDS data along with your other data sources and seamlessly load it to Snowflake with a no-code, easy-to-setup interface. Try our 14-day full-feature access free trial!
Get Started with Hevo for Free
Pros and Cons of Amazon RDS and Snowflake
Pros of Amazon RDS:
- Supports a variety of databases, including MySQL, PostgreSQL, Aurora, and others.
- The RDS console makes managing all of the multiple instances of relational database services a breeze.
- Provides information on the performance of all database instances.
- It uses read replicas which redirect read-heavy traffic away from the primary database instance, thereby decreasing the load on that single instance.
- RDS separates computing and storage so that administrators can scale them separately.
Cons of Amazon RDS:
- RDS may be restricted in several areas for experienced DBAs. Because there is no direct access to the underlying servers, OS-level changes may be impossible.
- Obtaining logs from a database can be difficult. To gain access logging, other services like CloudWatch may need to be enabled.
- When you launch an RDS instance, you must choose a weekly 30-minute window during which maintenance procedures can be performed on your RDS instance.
- Patching and scaling operations need systems to be taken offline. The time it takes for these procedures to complete varies. Scaling requires a few minutes of downtime on average for computing resources.
Pros of Snowflake:
- Snowflake provides significantly more capacity without the need to upgrade infrastructure. Everything is Cloud-based; the SaaS can be launched at a small scale and then scaled up or down as needed.
- Snowflake replicates data across availability zones or availability domains within a region and between regions automatically. Because of its robust architecture, it can withstand the loss of up to 2 data centers.
- Snowflake databases are simple to use and let users organize their data as they choose. This SaaS is built to be extremely responsive and work well on its own without the need for continual monitoring by a professional.
Cons of Snowflake:
- Bulk data loading is well supported and guided when moving data from files into Snowflake tables. However, users are restricted to Snowpipe if they require continuous loading.
- Snowflake is extremely scalable and lets users pay for exactly what they need, with no data restrictions in both computation and storage. Many companies find it all too simple to overuse their services only to discover the problem when it comes time to bill.
Conclusion
This blog talks about the two methods you can use to set up a connection from Amazon RDS to Snowflake: using custom ETL scripts and with the help of a third-party tool, Hevo. It also gives a brief overview of Amazon RDS and Snowflake, highlighting their key features and benefits, before diving into the setup process. The manual approach of connecting Amazon RDS to Snowflake adds complex overheads in terms of time and resources. Such a solution will require skilled engineers and regular data updates. Furthermore, you will have to build an in-house solution from scratch if you wish to transfer your data from Amazon RDS to Snowflake or another Data Warehouse for analysis.
Hevo Data provides an Automated No-code Data Pipeline that empowers you to overcome the above-mentioned limitations. You can leverage Hevo to seamlessly transfer data from Amazon RDS to Snowflake in real-time without writing a single line of code. Hevo’s Data Pipeline enriches your data and manages the transfer process in a fully automated and secure manner. Hevo caters to 150+ data sources (including 40+ free sources) and can directly transfer data from these sources to Data Warehouses, Business Intelligence Tools, or any other destination of your choice, making data migration hassle-free.
Learn more about Hevo
Want to take Hevo for a spin? Sign up for a 14-day free trial and experience the feature-rich Hevo suite firsthand.
What are your thoughts on moving data from RDS to Snowflake? Let us know in the comments.