Connect Google Cloud Storage to Snowflake: 3 Easy Methods

on Data Integration, Data Migration, Data Warehouses, google cloud storage, Snowflake • July 15th, 2022 • Write for Hevo

google cloud storage to snowflake: FI

An organization’s ability to make data-driven decisions is backed by timely and accurate access to information. But as a business scales, so does its data and the challenges of storing, accessing, and analyzing it. Google Cloud Storage is a service for storing objects of any format from text, video, audio, etc., in Google cloud. The objects are stored in buckets associated with a particular project, and the projects are grouped under an organization account where they can store all the data securely and reliably. For better cloud-based data storage and analytics service, businesses can connect Google Cloud Storage to Snowflake. It is a cloud data platform that offers Warehouse-as-a-Service.

In this blog, you’ll discuss the features of Google Cloud Storage and Snowflake, along with methods to connect Google Cloud Storage to Snowflake. 

Table of Contents

Prerequisites

Understanding of cloud computing

What is Google Cloud Storage?

google cloud storage to snowflake: google cloud storage logo
Image Source

Google Cloud Storage (GCS) is a fully managed service that makes storing and accessing files in the cloud easier. It offers unified, scalable, and durable object storage. GCS is one of the largest public cloud storage. It has several storage classes so enterprises and developers can determine the right pricing that matches their data storage requirements. Google cloud storage is different from Google One, cloud storage for consumers, which comes with Google Drive, Google Photos, and Gmail. 

The pricing plans for Google Cloud Storage are a bit complex and determined by the amount of storage the user requires, the number of operations performed, and the amount of data read or moved between storage buckets. However, Google Cloud Free Tier offers 5 GB per month, with a limit on performing actions. Google Workspace offers cloud storage at $6/user/month with unlimited actions. The Basic plan includes 30 GB of Google Drive storage, and the Business plan costs $12/user/month. 

Key Features of Google Cloud Storage

  • Offers low latency and high durability (up to 99.9999%).
  • Paid plans have unlimited storage with no minimum object size.
  • Cloud Storage allows users to assign conditions to a storage bucket to move to a less costly storage class or trigger a data deletion.
  • Object Versioning allows users to track the stored objects and access them if they are overwritten or deleted. This feature will increase storage costs and improve Object Lifecycle Management and remove older versions of the objects.

What is Snowflake?

google cloud storage to snowflake: snowflake logo
Image Source

Snowflake is an all-in-one cloud data warehouse with the primary goal of providing complete data management for data scientists and analysts. It can store data in SQL and NoSQL databases and a data analytics engine that helps in data exploration and analysis.

It is a unified platform that allows you to access your cloud data, data lake, and in-memory data in a single place. It also has a fully integrated analytical engine for fast data analysis. Snowflake’s flexible architecture enables you to build and scale analytical applications in the cloud without worrying about infrastructure or data management. Snowflake has been designed with ease of administration in mind, and it has a simple user interface that allows quick access to your data.

Key Features of Snowflake

Snowflake offers a comprehensive suite of features that allow you to run analytics applications in the cloud:

  • Secure: It protects sensitive information and data from unauthorized access. Users can easily set up and apply advanced security measures. Users can manage network policies by restricting access to their accounts. It supports SSO via federated authentication and two-factor authentication. 
  • Zero-Copy Clones: Cloning data in traditional data warehousing services requires the user to create a separate environment. However, it is not feasible when you have to test changes frequently. Snowflake has a zero-copy feature that allows users to clone any database without creating a new copy. Another benefit of this feature is that users can create multiple clones of the same data without further costs. Additionally, you can also create clones of a table or database from a past point in time. 
  • Support for Semi-Structured Data: One of the biggest features of Snowflake is its ability to combine semi-structured and structured data without complex technologies like Hive or Hadoop. Snowflake supports data ingestion in multiple formats such as Avro, ORC, Parquet, JSON, and XML. It extracts as much data as possible in columnar format and stores the rest in single-column format. 
  • Scalable: It has a shared, multi-cluster data architecture that separates the storage and computes resources. This allows users to scale up/down a resource when necessary while keeping the other resources constant. Snowflake also has auto-scaling and auto-suspend features, which automatically starts or suspend the virtual warehouse according to the clusters. 
  • Cloud-Agnostic: Snowflake is available on all three clouds: Amazon Web Services, Microsoft Azure, and Google Cloud Platform. Customers don’t have to migrate to another cloud and easily fit Snowflake into their current cloud solution.  

Why integrate Google Cloud Storage to Snowflake?

Manually storing your company’s critical data on Google Cloud Storage is unsustainable in a fast-paced e-commerce environment. If your company uses Google Cloud Storage to manage a lot of data, integrating that data with Snowflake can give you more flexibility and useful insights. This will enable you to have a strong data backup solution for the upcoming analytical workload. Additionally, moving your data to Snowflake will enable faster analysis by allowing you to compare this Google Cloud Storage data with other tools and applications.

Reliably integrate data with Hevo’s Fully Automated No Code Data Pipeline

If yours anything like the 1000+ data-driven companies that use Hevo, more than 70% of the business apps you use are SaaS applications Integrating the data from these sources in a timely way is crucial to fuel analytics and the decisions that are taken from it. But given how fast API endpoints etc can change, creating and managing these pipelines can be a soul-sucking exercise.

Hevo’s no-code data pipeline platform lets you connect over 150+ sources in a matter of minutes to deliver data in near real-time to your warehouse. What’s more, the in-built transformation capabilities and the intuitive UI means even non-engineers can set up pipelines and achieve analytics-ready data in minutes. 

All of this combined with transparent pricing and 24×7 support makes us the most loved data pipeline software in terms of user reviews.

Take our 14-day free trial to experience a better way to manage data pipelines.

Get started for Free with Hevo!

Connecting Google Cloud Storage to Snowflake

Connecting Google Cloud Storage data to Snowflake will allow users to store their data in Snowflake as well as access it from GCS. You can create a table in GCS and then migrate that data from GCS to Snowflake. Google Cloud Storage has multiple options for customers to transfer data efficiently. 

Method 1: Using Hevo to Set Up Google Cloud Storage to Snowflake

google cloud storage to snowflake: Hevo Logo
Image Source

Hevo provides an Automated No-code Data Pipeline that helps you move your Google Cloud Storage data to Snowflake. Hevo is fully-managed and completely automates the process of not only loading data from your 150+ Sources(including 40+ free sources) but also enriching the data and transforming it into an analysis-ready form without having to write a single line of code. Its fault-tolerant architecture ensures that the data is handled in a secure, consistent manner with zero data loss. 

Using Hevo Google Cloud Storage to Snowflake Migration can be done in the following 2 steps:

  • Step 1: Configure Google Cloud Storage as the Source in your Pipeline by following the steps below:
    • Step 1.1: In the Asset Palette, select PIPELINES.
    • Step 1.2: In the Pipelines List View, click + CREATE.
    • Step 1.3: Select Google Cloud Storage on the Select Source Type page.
    • Step 1.4: Click + ADD GCS ACCOUNT on the Configure Your GCS Account page.
    • Step 1.5: To give Hevo permission to access your data in Google Cloud Storage, select the Google account that has access to the GCS bucket you want to connect to and click Allow.
    • Step 1.4: Set the following in the Configure your Google Cloud Storage Source page:
      • Pipeline Name: Give your Pipeline a unique name.
      • Bucket: Name of the bucket where the data will be ingested.
      • Path Prefix: The data directory’s path prefix. The files are ordinarily listed starting at the directory’s root.
      • File Format: Make a file format choice. The formats that Hevo currently supports are AVRO, CSV, JSON, and XML. If your source data is in a different format, get in touch with Hevo Support.

There are some additional settings you must specify depending on the format you choose:

  • CSV: Choose a field delimiter. This is the character that divides the fields in each line. For instance, “t” or “,.” If the Source data file does not have column headers, disable the Treat First Row As Column Headers option. Then, when Hevo is ingesting, these are automatically created. Enabled is the default setting. If the path prefix contains subdirectories that contain files in various formats, turn on the Create Event Types from Folders option. Each subdirectory is read by Hevo as a different Event Type. Prefix path files that are not in a subdirectory are ignored.
  • JSON: If the path prefix contains subdirectories that contain files in various formats, turn on the Create Event Types from Folders option. Each subdirectory is read by Hevo as a different Event Type. Prefix path files that are not in a subdirectory are ignored.
  • XML: If you want to load each node below the root node in the XML file as a separate Event, enable the Create Events from child nodes option.
google cloud storage to snowflake: configure gcs as source
Image Source
  • Step 1.5: TEST & CONTINUE is the button to click.
  • Step 1.6: Set up the Destination and configure the data ingestion.
  • Step 2: To set up Snowflake as a destination in Hevo, follow these steps:
    • Step 2.1: In the Asset Palette, select DESTINATIONS.
    • Step 2.2: In the Destinations List View, click + CREATE.
    • Step 2.3: Select Snowflake from the Add Destination page.
    • Step 2.4: Set the following parameters on the Configure your Snowflake Destination page:
      • Destination Name: A unique name for your Destination.
      • Snowflake Account URL: This is the account URL that you retrieved.
      • Database User: The Hevo user that you created in the database. In the Snowflake database, this user has a non-administrative role.
      • Database Password: The password of the user.
      • Database Name: The name of the Destination database where data will be loaded.
      • Database Schema: The name of the Destination database schema. Default value: public.
      • Warehouse: SQL queries and DML operations are performed in the Snowflake warehouse associated with your database.
google cloud storage to snowflake: configure snowflake as destination
Image Source
  • Step 2.5: Click Test Connection to test connectivity with the Snowflake warehouse.
  • Step 2.6: Once the test is successful, click SAVE DESTINATION.

Deliver smarter, faster insights with your unified data

Using manual scripts and custom code to move data into the warehouse is cumbersome. Changing API endpoints and limits, ad-hoc data preparation, and inconsistent schema makes maintaining such a system a nightmare. Hevo’s reliable no-code data pipeline platform enables you to set up zero-maintenance data pipelines that work.

  • Wide Range of Connectors: Instantly connect and read data from 150+ sources including SaaS apps and databases, and precisely control pipeline schedules down to the minute.
  • In-built Transformations: Format your data on the fly with Hevo’s preload transformations using either the drag-and-drop interface or our nifty python interface. Generate analysis-ready data in your warehouse using Hevo’s Postload Transformation 
  • Near Real-Time Replication: Get access to near real-time replication for all database sources with log-based replication. For SaaS applications, near real-time replication is subject to API limits.   
  • Auto-Schema Management: Correcting improper schema after the data is loaded into your warehouse is challenging. Hevo automatically maps source schema with destination warehouse so that you don’t face the pain of schema errors.
  • Transparent Pricing: Say goodbye to complex and hidden pricing models. Hevo’s Transparent Pricing brings complete visibility to your ELT spend. Choose a plan based on your business needs. Stay in control with spend alerts and configurable credit limits for unforeseen spikes in the data flow.
  • 24×7 Customer Support: With Hevo you get more than just a platform, you get a partner for your pipelines. Discover peace with round-the-clock “Live Chat” within the platform. What’s more, you get 24×7 support even during the 14-day free trial.
  • Security: Discover peace with end-to-end encryption and compliance with all major security certifications including HIPAA, GDPR, and SOC-2.

Try Hevo Today!

SIGN UP HERE FOR A 14-DAY FREE TRIAL

Method 2: Using CREATE STORAGE INTEGRATION Command  to Move Data from Google Cloud Storage to Snowflake

google cloud storage to snowflake: using create storage integration command to move data from Google cloud storage to snowflake
Image Source

This method creates a new storage integration in the Snowflake account. A storage integration is an object that stores an access management (IAM) entity for external cloud storage. 

Follow these steps to configure a Snowflake Storage Integration:

Step 1: Create a Cloud Storage Integration in Snowflake

CREATE STORAGE INTEGRATION <integration_name>
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = GCS
  ENABLED = TRUE
  STORAGE_ALLOWED_LOCATIONS = ('gcs://<bucket>/<path>/', 'gcs://<bucket>/<path>/')
  [ STORAGE_BLOCKED_LOCATIONS = ('gcs://<bucket>/<path>/', 'gcs://<bucket>/<path>/') ]

Step 2: Retrieve the Cloud Storage Account for your Snowflake Account

DESC STORAGE INTEGRATION <integration_name>;

Step 3: Grant the Service Account Permissions to Access Bucket Objects

3.1 Creating a Custom IAM Role
  • Log into Google Cloud Platform. You should have the role of project editor to access GCP for this step. 
  • Choose IAM & admin > Roles.
  • Click on create role.
  • Enter the name and description for the custom role. 
  • Click on Add Permissions.
  • Add the following permission for data loading, data loading with purge, data loading and unloading, and data unloading. 
    • storage.buckets.get
    • storage.objects.create
    • storage.objects.delete
    • storage.objects.list
  • Click on Create.
3.2 Assign the Custom Role to the Cloud Storage Service Account
  • Choose Storage > Browser in GCP.
  • Select a bucket.
  • Click SHOW INFO PANEL.
  • Go to Describe Integration and search for the service account name from Step 2.
  • Select Storage > Custom > Role from Select a Role dropdown options. 
  • Click the Add button.
3.3 Grant the Cloud Storage Service Account Permissions on the Cloud Key Management Service Cryptographic Keys
  • Go to Security > Cryptographic keys in GCP. 
  • Select the key that is assigned to your GCS bucket.
  • Click on the SHOW INFO PANEL.
  • Go to Describe Integration and search for the service account name from Step 2.
  • Select the Cloud KMS CrytoKey Encryptor/Decryptor role. 
  • Click on the Add button.

Step 4: Create an External Stage

Create an external stage in Google Cloud Storage using the CREATE STAGE command.

Method 3: Using COPY INTO Command  to Move Data from Google Cloud Storage to Snowflake

google cloud storage to snowflake: using copy into command to move data from Google cloud storage to snowflake
Image Source

If you use Cloud Storage buckets for storing and managing your data files in your Google Cloud Storage account, you can use your existing buckets for bulk loading into Snowflake. You will use the COPY INTO <table>  command to load data from buckets to tables. 

  • Step 1: Stage the data files in a Cloud Storage bucket. 
  • Step 2: Use the COPY INTO <table> command to load the staged files’ content to a Snowflake database table. 

Limitations of Connecting Google Cloud Storage to Snowflake

The two methods mentioned above of connecting Google cloud storage to Snowflake have limitations like users cannot access real-time data or automatically refresh their data. However, You can use a no/low code platform like Hevo for real-time access to Google Cloud Storage data in Snowflake. With a platform like Hevo, users can schedule refreshes and monitor their data in real-time in Snowflake.

Conclusion

In this blog, you learn about the features of Google Cloud Storage and Snowflake. You also saw two methods of connecting Google Cloud Storage to Snowflake. Connecting Google Cloud Storage to Snowflake will help businesses better store, manage, and analyze their data to derive meaningful insights from various data sources.

Visit our Website to Explore Hevo

Hevo offers a No-code Data Pipeline that can automate your data transfer process, hence allowing you to focus on other aspects of your business like Analytics, Marketing, Customer Management, etc.

This platform allows you to transfer data from 150+ sources (including 40+ Free Sources) such as Google Cloud Storage and Cloud-based Data Warehouses like Snowflake, Google BigQuery, etc. It will provide a hassle-free experience and make your work life much easier.

Want to take Hevo for a spin? 

Sign Up for a 14-day free trial and experience the feature-rich Hevo suite first hand. You can also have a look at the unbeatable pricing that will help you choose the right plan for your business needs.

No-code Data Pipeline For Snowflake