An organization’s ability to make data-driven decisions is backed by timely and accurate access to information. But as a business scales, so does its data and the challenges of storing, accessing, and analyzing it. Google Cloud Storage is a service for storing objects of any format from text, video, audio, etc., in Google cloud.

The objects are stored in buckets associated with a particular project, and the projects are grouped under an organization account where they can store all the data securely and reliably. For better cloud-based data storage and analytics service, businesses can connect Google Cloud Storage to Snowflake. It is a cloud data platform that offers Warehouse-as-a-Service.

In this blog, you’ll discuss the features of Google Cloud Storage and Snowflake, along with methods to connect Google Cloud Storage to Snowflake

Prerequisites

Understanding of cloud computing

What is Google Cloud Storage?

  • Google Cloud Storage (GCS) is a fully managed service that makes storing and accessing files in the cloud easier. It offers unified, scalable, and durable object storage.
  • GCS is one of the largest public cloud storage. It has several storage classes so enterprises and developers can determine the right pricing that matches their data storage requirements.
  • Google cloud storage is different from Google One, cloud storage for consumers, which comes with Google Drive, Google Photos, and Gmail.  

What is Snowflake?

  • Snowflake is an all-in-one cloud data warehouse with the primary goal of providing complete data management for data scientists and analysts.
  • It can store data in SQL and NoSQL databases and a data analytics engine that helps in data exploration and analysis.
  • It is a unified platform that allows you to access your cloud data, data lake, and in-memory data in a single place.
  • It also has a fully integrated analytical engine for fast data analysis. Snowflake’s flexible architecture enables you to build and scale analytical applications in the cloud without worrying about infrastructure or data management.
  • Snowflake has been designed with ease of administration in mind, and it has a simple user interface that allows quick access to your data.

Why integrate Google Cloud Storage to Snowflake?

Manually storing your company’s critical data on Google Cloud Storage is unsustainable in a fast-paced e-commerce environment. If your company uses Google Cloud Storage to manage a lot of data, integrating that data with Snowflake can give you more flexibility and useful insights.

This will enable you to have a strong data backup solution for the upcoming analytical workload. Additionally, moving your data to Snowflake will enable faster analysis by allowing you to compare this Google Cloud Storage data with other tools and applications.

Solve your data replication problems with Hevo’s reliable, no-code, automated pipelines with 150+ connectors.
Get your free trial right away!

How to Connect Google Cloud Storage to Snowflake?

Connecting Google Cloud Storage data to Snowflake will allow users to store their data in Snowflake as well as access it from GCS. You can create a table in GCS and then migrate that data from GCS to Snowflake. Google Cloud Storage has multiple options for customers to transfer data efficiently. 

Method 1: Using Hevo to Set Up Google Cloud Storage to Snowflake

google cloud storage to snowflake: Hevo Logo

Hevo provides an Automated No-code Data Pipeline that helps you move your Google Cloud Storage data to Snowflake. Hevo is fully-managed and completely automates the process of not only loading data from your 150+ Sources(including 40+ free sources) but also enriching the data and transforming it into an analysis-ready form without having to write a single line of code. Its fault-tolerant architecture ensures that the data is handled in a secure, consistent manner with zero data loss. 

Using Hevo Google Cloud Storage to Snowflake Migration can be done in the following 2 steps:

  • Step 1: Configure Google Cloud Storage as the Source in your Pipeline by following the steps below:
    • Step 1.1: In the Asset Palette, select PIPELINES.
    • Step 1.2: In the Pipelines List View, click + CREATE.
    • Step 1.3: Select Google Cloud Storage on the Select Source Type page.
    • Step 1.4: Click + ADD GCS ACCOUNT on the Configure Your GCS Account page.
    • Step 1.5: To give Hevo permission to access your data in Google Cloud Storage, select the Google account that has access to the GCS bucket you want to connect to and click Allow.
    • Step 1.4: Set the following in the Configure your Google Cloud Storage Source page:
      • Pipeline Name: Give your Pipeline a unique name.
      • Bucket: Name of the bucket where the data will be ingested.
      • Path Prefix: The data directory’s path prefix. The files are ordinarily listed starting at the directory’s root.
      • File Format: Make a file format choice. The formats that Hevo currently supports are AVRO, CSV, JSON, and XML. If your source data is in a different format, get in touch with Hevo Support.

There are some additional settings you must specify depending on the format you choose:

  • CSV: Choose a field delimiter. This is the character that divides the fields in each line. For instance, “t” or “,.” If the Source data file does not have column headers, disable the Treat First Row As Column Headers option. Then, when Hevo is ingesting, these are automatically created. Enabled is the default setting. If the path prefix contains subdirectories that contain files in various formats, turn on the Create Event Types from Folders option. Each subdirectory is read by Hevo as a different Event Type. Prefix path files that are not in a subdirectory are ignored.
  • JSON: If the path prefix contains subdirectories that contain files in various formats, turn on the Create Event Types from Folders option. Each subdirectory is read by Hevo as a different Event Type. Prefix path files that are not in a subdirectory are ignored.
  • XML: If you want to load each node below the root node in the XML file as a separate Event, enable the Create Events from child nodes option.
google cloud storage to snowflake: configure gcs as source
  • Step 1.5: TEST & CONTINUE is the button to click.
  • Step 1.6: Set up the Destination and configure the data ingestion.
  • Step 2: To set up Snowflake as a destination in Hevo, follow these steps:
    • Step 2.1: In the Asset Palette, select DESTINATIONS.
    • Step 2.2: In the Destinations List View, click + CREATE.
    • Step 2.3: Select Snowflake from the Add Destination page.
    • Step 2.4: Set the following parameters on the Configure your Snowflake Destination page:
      • Destination Name: A unique name for your Destination.
      • Snowflake Account URL: This is the account URL that you retrieved.
      • Database User: The Hevo user that you created in the database. In the Snowflake database, this user has a non-administrative role.
      • Database Password: The password of the user.
      • Database Name: The name of the Destination database where data will be loaded.
      • Database Schema: The name of the Destination database schema. Default value: public.
      • Warehouse: SQL queries and DML operations are performed in the Snowflake warehouse associated with your database.
google cloud storage to snowflake: configure snowflake as destination
  • Step 2.5: Click Test Connection to test connectivity with the Snowflake warehouse.
  • Step 2.6: Once the test is successful, click SAVE DESTINATION.

Method 2: Using CREATE STORAGE INTEGRATION Command  to Move Data from Google Cloud Storage to Snowflake

This method creates a new storage integration in the Snowflake account. A storage integration is an object that stores an access management (IAM) entity for external cloud storage. 

Follow these steps to configure a Snowflake Storage Integration:

Step 1: Create a Cloud Storage Integration in Snowflake

CREATE STORAGE INTEGRATION <integration_name>
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = GCS
  ENABLED = TRUE
  STORAGE_ALLOWED_LOCATIONS = ('gcs://<bucket>/<path>/', 'gcs://<bucket>/<path>/')
  [ STORAGE_BLOCKED_LOCATIONS = ('gcs://<bucket>/<path>/', 'gcs://<bucket>/<path>/') ]

Step 2: Retrieve the Cloud Storage Account for your Snowflake Account

DESC STORAGE INTEGRATION <integration_name>;

Step 3: Grant the Service Account Permissions to Access Bucket Objects

3.1 Creating a Custom IAM Role
  • Log into Google Cloud Platform. You should have the role of project editor to access GCP for this step. 
  • Choose IAM & admin > Roles.
  • Click on create role.
  • Enter the name and description for the custom role. 
  • Click on Add Permissions.
  • Add the following permission for data loading, data loading with purge, data loading and unloading, and data unloading.
    • storage.buckets.get
    • storage.objects.create
    • storage.objects.delete
    • storage.objects.list
  • Click on Create.
3.2 Assign the Custom Role to the Cloud Storage Service Account
  • Choose Storage > Browser in GCP.
  • Select a bucket.
  • Click SHOW INFO PANEL.
  • Go to Describe Integration and search for the service account name from Step 2.
  • Select Storage > Custom > Role from Select a Role dropdown options. 
  • Click the Add button.
3.3 Grant the Cloud Storage Service Account Permissions on the Cloud Key Management Service Cryptographic Keys
  • Go to Security > Cryptographic keys in GCP. 
  • Select the key that is assigned to your GCS bucket.
  • Click on the SHOW INFO PANEL.
  • Go to Describe Integration and search for the service account name from Step 2.
  • Select the Cloud KMS CrytoKey Encryptor/Decryptor role. 
  • Click on the Add button.

Step 4: Create an External Stage

Create an external stage in Google Cloud Storage using the CREATE STAGE command.

Method 3: Using COPY INTO Command  to Move Data from Google Cloud Storage to Snowflake

If you use Cloud Storage buckets for storing and managing your data files in your Google Cloud Storage account, you can use your existing buckets for bulk loading into Snowflake. You will use the COPY INTO <table>  command to load data from buckets to tables. 

  • Step 1: Stage the data files in a Cloud Storage bucket. 
  • Step 2: Use the COPY INTO <table> command to load the staged files’ content to a Snowflake database table. 

Limitations of Connecting Google Cloud Storage to Snowflake

The two methods mentioned above of connecting Google cloud storage to Snowflake have limitations users cannot access real-time data or automatically refresh their data. However, You can use a no/low code platform like Hevo for real-time access to Google Cloud Storage data in Snowflake. With a platform like Hevo, users can schedule refreshes and monitor their data in real time in Snowflake.

Conclusion

In this blog, you learn about the features of Google Cloud Storage and Snowflake. You also saw two methods of connecting Google Cloud Storage to Snowflake. Connecting Google Cloud Storage to Snowflake will help businesses better store, manage, and analyze their data to derive meaningful insights from various data sources.

Osheen Jain
Technical Content Writer, Hevo Data

Osheen is a seasoned technical writer with over a decade of experience in the data industry. She specializes in writing about B2B, technology, finance, and SaaS domains. Her passion for simplifying intricate technical concepts has established her as a respected expert in the field, making her an invaluable resource for those looking to deepen their understanding of data science.

No-code Data Pipeline For Snowflake