Cloud-based data storage and warehousing solutions have become the preferred choice for most modern businesses. This is primarily because a Cloud-based solution lets a Data Warehouse grow along with a business's unique data requirements and scale up or down, on demand or automatically, to accommodate peak-workload periods.

Cloud-based Data Warehouses also enable businesses to efficiently address Data Availability and Security concerns, as they support seamless Database Replication across multiple geographical locations along with numerous backup and data recovery options. Two of the most popular Cloud-based Data Warehousing solutions are Amazon Redshift and Snowflake.

Both Data Warehouses offer their users a wide variety of features. Amazon Redshift provides robust security and compliance across all of its tiers, along with discounts on long-term plans. Snowflake, on the other hand, lets users scale Compute and Storage Nodes separately based on their requirements, which can reduce costs significantly, but ties some security and compliance features to its pricing tiers. Hence, depending on the use case, businesses sometimes need to migrate from one Data Warehouse to another.

This article will provide you with an in-depth understanding of how you can set up Snowflake to Redshift Migration seamlessly.

Steps to Set up Snowflake to Redshift Migration

Users can set up Snowflake to Redshift Migration by implementing the following steps:

Snowflake to Redshift Migration Step 1: Installing Required Packages

The following Python packages are required for this process:

  • Pandas: Pandas for Python can be installed by running the following command:
pip install pandas
  • Snowflake Connector: This package is required to export data from the Snowflake Data Warehouse as a Pandas Dataframe for Snowflake to Redshift Migration. It can be installed by running the following command:
pip install snowflake-connector-python==<version>

<version> in the above command has to be replaced with the version of the Snowflake Connector for Python that you wish to install. Note that the fetch_pandas_all() method used in the next step requires the Pandas-compatible installation of the connector, i.e. pip install "snowflake-connector-python[pandas]".

  • Pandas_redshift: This package is required to load the data extracted from Snowflake as a Pandas Dataframe into Amazon Redshift, completing the Snowflake to Redshift Migration. It can be installed by running the following command:
pip install pandas-redshift

More information on these packages can be found in the Pandas, Snowflake Connector, and Pandas_Redshift documentation.
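
Once these installations are complete, you can optionally run a quick sanity check such as the one below to confirm that all three packages import correctly before moving on. This is only a minimal verification sketch and is not part of the migration itself:

import pandas as pd
import snowflake.connector
import pandas_redshift as pr

# If all three imports succeed, the required packages are installed.
print("pandas version:", pd.__version__)
print("All required packages imported successfully.")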

Snowflake to Redshift Migration Step 2: Extracting Data from Snowflake

In order to set up Snowflake to Redshift Migration, the data first has to be extracted from Snowflake. This can be done using Snowflake’s Python connector. The following code can be used to extract data from Snowflake using Python:

import snowflake.connector
import pandas as pd

# Connect to Snowflake. Replace the parameters below with your own account details.
ctx = snowflake.connector.connect(
          host=host,
          user=user,
          password=password,
          account=account,
          warehouse=warehouse,
          database=database,
          schema=schema,
          protocol='https',
          port=port)

# Create a cursor object.
cur = ctx.cursor()

# Execute a statement that will generate a result set.
sql = "select * from t"
cur.execute(sql)

# Fetch the result set from the cursor and return it as a Pandas DataFrame.
df = cur.fetch_pandas_all()

You will have to make the following changes to your Snowflake to Redshift Migration code based on your unique requirements:

  • The first function call, which connects Python to Snowflake, i.e. snowflake.connector.connect(), should be passed the connection parameters of your Snowflake Data Warehouse.
  • The SQL query being executed should extract the data that you wish to load into Redshift. Information on the queries supported by Snowflake can be found in the Snowflake documentation.
  • You can choose to execute multiple queries and create multiple Pandas Dataframes to suit your data and business needs, as sketched below.
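
For instance, a minimal sketch of extracting several tables into separate Pandas Dataframes might look like the following. The table names orders and customers are placeholders for your own tables, and cur and ctx are the cursor and connection created above:

queries = {
    "orders": "select * from orders",
    "customers": "select * from customers",
}

dataframes = {}
for name, query in queries.items():
    cur.execute(query)
    dataframes[name] = cur.fetch_pandas_all()

# Close the cursor and connection once all data has been extracted.
cur.close()
ctx.close()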

Snowflake to Redshift Migration Step 3: Loading Data into Amazon Redshift

Simplify ETL Using Hevo’s No-code Data Pipeline

Hevo is a No-code Data Pipeline that offers a fully managed solution to set up data integration from 150+ data sources (including 40+ free data sources) to numerous Data Warehouses such as Amazon Redshift and Snowflake, or a destination of your choice. It automates your data flow in minutes without requiring you to write a single line of code. Its fault-tolerant architecture ensures that your data is secure and consistent. Hevo provides you with a truly efficient and fully automated solution to manage data in real time and always have analysis-ready data.


Let’s look at some of the salient features of Hevo:

  • Secure: Hevo has a fault-tolerant architecture that ensures that the data is handled in a secure, consistent manner with zero data loss.
  • Schema Management: Hevo takes away the tedious task of schema management by automatically detecting the schema of incoming data and mapping it to the destination schema.
  • Minimal Learning: Hevo’s simple and interactive UI makes it extremely easy for new customers to get started and perform operations.
  • Built to Scale: As the number of sources and the volume of your data grow, Hevo scales horizontally, handling millions of records per minute with very little latency.
  • Incremental Data Load: Hevo allows the transfer of data that has been modified in real time. This ensures efficient utilization of bandwidth on both ends.
  • Live Support: The Hevo team is available round the clock to extend exceptional support to its customers through chat, email, and support calls.
  • Live Monitoring: Hevo allows you to monitor the data flow and check where your data is at a particular point in time.

Once the required data has been extracted from Snowflake and stored in Pandas Dataframes, it needs to be loaded into Amazon Redshift to complete your Snowflake to Redshift Migration. You can load your Pandas Dataframes into Amazon Redshift by running the following Python script:

import pandas_redshift as pr

# Connect to Amazon Redshift. Replace the placeholders with your cluster's details.
pr.connect_to_redshift(dbname = <dbname>,
                       host = <host>,
                       port = <port>,
                       user = <user>,
                       password = <password>)

# Connect to Amazon S3, which this package uses as a staging area for the load.
pr.connect_to_s3(aws_access_key_id = <aws_access_key_id>,
                 aws_secret_access_key = <aws_secret_access_key>,
                 bucket = <bucket>,
                 subdirectory = <subdirectory>
                 # As of release 1.1.1 you are able to specify an aws_session_token (if necessary):
                 # aws_session_token = <aws_session_token>
                 )

# Write the DataFrame created in Step 2 to S3 and then to Redshift.
pr.pandas_to_redshift(data_frame = df,
                      redshift_table_name = <redshift_table_name>,
                      append = False)

You will have to make the following changes to your Snowflake to Redshift Migration code that you execute based on your unique requirements:

  • The first function call, which connects Python to Amazon Redshift, i.e. pr.connect_to_redshift(), should be passed the connection parameters of your Amazon Redshift Data Warehouse.
  • The second function call, which connects Python to Amazon S3, i.e. pr.connect_to_s3(), should be passed the credentials and bucket details of your Amazon S3 account. This call is required because the package first stages the data in Amazon S3 and then loads it into Amazon Redshift from there.
  • The third function call, pr.pandas_to_redshift(), should be passed the Pandas Dataframe you wish to import into Amazon Redshift along with the name of the table in which it should be stored. If the append parameter is set to False, any existing table with the same name in Amazon Redshift is dropped and a new one is created to store the Pandas Dataframe; if it is set to True, the Pandas Dataframe is appended to the end of the existing table.
  • If multiple Pandas Dataframes were created to extract the necessary data from Snowflake, you will have to make one pr.pandas_to_redshift() call for each of them, as sketched below.
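
For example, if multiple Dataframes were collected in a dictionary such as the dataframes dictionary sketched in Step 2, they can be loaded in a loop. This is only a minimal sketch that reuses the dictionary keys as Redshift table names and then calls pandas_redshift's close_up_shop() helper to close the connections opened above:

# Load each extracted Dataframe into its own Redshift table.
for name, frame in dataframes.items():
    pr.pandas_to_redshift(data_frame = frame,
                          redshift_table_name = name,
                          append = False)

# Close the Redshift and S3 connections once all loads are complete.
pr.close_up_shop()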

Conclusion

This article provided you with a step-by-step guide on how to set up Snowflake to Redshift Migration seamlessly using Python. However, the process can be challenging for someone who does not have sufficient technical expertise in Python, Snowflake, and the Amazon Web Services (AWS) environment.


Most database migration tasks, including Snowflake to Redshift Migration, require immense engineering bandwidth and resources for the development and maintenance of Data Pipelines if done manually. Hence, businesses can instead use existing automated No-code platforms like Hevo.

Hevo is the only real-time ELT No-code Data Pipeline platform that cost-effectively automates data pipelines that are flexible to your needs. With integration with 150+ Data Sources such as PostgreSQL, MySQL, and MS SQL Server, we help you not only export data from sources and load it into destinations, but also transform and enrich your data to make it analysis-ready.

Want to take Hevo for a spin? Sign Up for a 14-day free trial and experience the feature-rich Hevo suite first hand.

Former Research Analyst, Hevo Data

Manik has a keen interest in data and software architecture, and a flair for writing highly technical content. He has experience writing articles on diverse topics related to data integration and infrastructure.
