Snowflake on GCP Simplified: A Complete Guide 101

on Data Warehouses, Google Cloud Platform, Snowflake, Tutorials • September 28th, 2021 • Write for Hevo

Why is the cloud becoming so popular? It all boils down to the intricate features that come with the platform. For instance, companies no longer have to bear the costs of acquiring expensive and bulky Data Storage and Analysis tools with Cloud Storage. Cloud service providers such as Google Cloud offer virtual machines and hardware and house them in a regional data center. This is extremely cost-effective for companies since data can be shared simultaneously between different individuals.

With the advent of cloud technology, many companies have sprung up to offer related services. However, you cannot talk of cloud technology without mentioning top players like Google Cloud Platform and Snowflake. With this in mind, this post will dive into the nuts and bolts of integrating Snowflake on GCP. Read along to learn more about the steps involved in integrating Snowflake on GCP!

Table of Contents

Introduction to Snowflake

Snowflake Logo
Image Source

In simple terms, Snowflake is a SaaS-based Data Warehouse platform built over AWS Infrastructure. One of the reasons for this software’s popularity among businesses all over the world is its scalability, which makes it cost-effective. The architecture consists of Cloud-only virtual compute instances and efficient storage buckets.

Snowflake is designed in such a way that no hardware or software is required to Install, Configure, or Manage it. Snowflake performs all of the tasks performed by hardware and software, such as Management, Maintenance, and Tuning. Snowflake also enables you to Organize, Discover, and Execute a wide range of data. It provides best-in-class Performance, and Concurrency, allowing you to securely and easily mobilize your data.

Key Features of Snowflake

Features of Snowflake
Image Source

Below are some of the key features of Snowflake:

  • Standard and Extended SQL Support: Since Snowflake is an SQL-based platform, it supports all standard and extended SQL commands. 
  • Web-based Graphical User Interface: Snowflake offers users an interactive dashboard to connect with the cloud. Using the tool, you can monitor system usage and query data. 
  • Command-line Client: This comes as a separate downloadable tool you can install for querying data and other functions. It is built using Python and is a great way to interact with the data warehouse. 
  • Extensive Integration Features: Snowflake supports integration with a wide array of third-party tools such as Google Cloud. Later in this article, you will learn about Snowflake on GCP Integration.

For further information on Snowflake, click here to check out their official website.

Introduction to Google Cloud Platform

GCP logo
Image Source

Google Cloud Platform (GCP) provides computing resources necessary for developing and deploying applications on the web. By creating an application using the platform, Google automatically keeps track of all its resources, including Storage, Processing Power, and Network Connectivity. Instead of leasing a server or a DNS address, as is the case with normal websites with GCP, you pay for the resources used by your application. 

Key Features of Google Cloud Platform

Some of the key features of Google Cloud Platform are as follows:

  • Big Data: GCP offers a dedicated Big Data solution for clients with such needs. Some of the features include BigQuery, which allows users to run SQL-like commands on large chunks of data. 
  • Hosting: GCP offers two hosting solutions for customers: the AppEngine, the Platform-as-a-Service, and Compute Engine that acts as Infrastructure-as-a-Service. 
  • Containers: These come in handy for PaaS applications since they help boost app deployment. 

Simplify Snowflake ETL and Data Integration using Hevo’s No-code Data Pipeline

Hevo Data, a No-code Data Pipeline, helps load data from any data source such as Databases, SaaS applications, Cloud Storage, SDK,s, and Streaming Services and simplifies the ETL process. It supports 100+ Data Sources including (40+ Free Sources). It is a 3-step process by just selecting the data source, providing valid credentials, and choosing the destination. Hevo loads the data onto the desired Data Warehouse/destination like Snowflake and enriches the data and transforms it into an analysis-ready form without having to write a single line of code.

Its completely automated pipeline offers data to be delivered in real-time without any loss from source to destination. Its fault-tolerant and scalable architecture ensure that the data is handled in a secure, consistent manner with zero data loss and supports different forms of data. The solutions provided are consistent and work with different BI tools as well.

Get started with hevo for free

Check out why Hevo is the Best:

  • Secure: Hevo has a fault-tolerant architecture that ensures that the data is handled securely and consistently with zero data loss.
  • Schema Management: Hevo takes away the tedious task of schema management & automatically detects the schema of incoming data and maps it to the destination schema.
  • Minimal Learning: Hevo, with its simple and interactive UI, is extremely simple for new customers to work on and perform operations.
  • Hevo Is Built To Scale: As the number of sources and the volume of your data grows, Hevo scales horizontally, handling millions of records per minute with very little latency.
  • Incremental Data Load: Hevo allows the transfer of data that has been modified in real-time. This ensures efficient utilization of bandwidth on both ends.
  • Live Support: The Hevo team is available round the clock to extend exceptional support to its customers through chat, email, and support calls.
  • Live Monitoring: Hevo allows you to monitor the data flow and check where your data is at a particular point in time.
Sign up here for a 14-day free trial!

Steps to Create Snowflake on GCP Integration

Snowflake on GCP provides the users with best of the both worlds. All the benefits of these robust platforms combined are bound to give you unmatched data processing and analyzing power. Below is a detailed outline of how you integrate Snowflake on GCP: 

Step 1: Create a Cloud Storage Integration

The first step involved to integrate Snowflake on GCP is to allow Snowflake access to GCP data using a GCP bucket. Use the following commands to do so:

CREATE STORAGE INTEGRATION <integration_name>
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = GCS
  ENABLED = TRUE
  STORAGE_ALLOWED_LOCATIONS = ('gcs://<bucket>/<path>/', 'gcs://<bucket>/<path>/')
  [ STORAGE_BLOCKED_LOCATIONS = ('gcs://<bucket>/<path>/', 'gcs://<bucket>/<path>/') ]

Once this is done, Snowflake will create a Cloud Storage account. You can retrieve it by keying the demo_int command. Next, you can locate the value under the property storage_gcp_service account. 

Step 2: Grant Permissions

Snowflake creates a single Cloud Storage service account for each Snowflake account, so all Cloud Storage integrations for a single Snowflake account will use the same service account. Grant the following permissions under your GCP bucket, which will be posted to the service account that you created above: 

storage.buckets.get

storage.objects.create

storage.objects.delete

storage.objects.get

storage.objects.list

Step 3: Setting up the Snowflake Environment

You need a Data Warehouse where the data will be loaded and a database schema that will hold the tables. Below is the code that will serve this purpose.

create warehouse if not exists demo_warehouse
create database if not exists demo_database
use database demo_database
create schema if not exists demo_schema

Next up, you can use the account admin role to set up the database and the schema:

use role accountadmin
use warehouse demo_warehouse
use database demo_database
use schema demo_schema

Finally, you will need to create a table that will populate the data.

create or replace table demo_table (region string, country string, item_type string, sales_channel string, order_priority char(1),
order_date string, order_id integer, ship_date string, 
units_sold integer, unit_price float, unit_cost float, 
total_revenue float, total_cost float, total_profit float);

Step 4: Loading the Data

Now that you have a database and schema for use, you can create a stage that references the CGS bucket and folder paths to load and unload data. It is also important to define the file format as is the case below: 

create or replace stage demo_stage
url = 'gcs://demo_snowflake/sample_sales_data/2020/06/12/small_sample.csv'
storage_integration = demo_int
file_format = demo_csv_format;

Once this is done, you can now load data from the stage to the table. You can do this from external storage or directly from the bucket. That’s it. By following the steps mentioned above, you can load and unload data from GCP to Snowflake and perform Snowflake on GCP Integration. 

Conclusion

The age of Cloud Data Storage is here with us. It is up to you to decide to utilize the full benefits of Cloud Storage. One of the best ways is to integrate Snowflake on GCP since you will be getting the best of these two powerful platforms. In this article, you learned about Snowflake and Google Cloud platform along with the salient features that they offer. You also learned about the steps involved in performing Snowflake on GCP Integration.

Moreover, extracting complex data from a diverse set of data sources can get quite challenging and cumbersome, so a simpler alternative like Hevo Data is the right solution for you! 

visit our website to explore hevo

Hevo Data is a No-Code Data Pipeline that offers a faster way to move data from 100+ Data Sources including (40+ Free Sources), into your Data Warehouse such as Snowflake to be visualized in a BI tool. Hevo is fully automated and hence does not require you to code.

Give Hevo Data a try and sign up for a 14-day free trial today. Hevo offers plans & pricing for different use cases and business needs!

Share your experience of learning about Snowflake on GCP Integration. Tell us in the comments below!

No-Code Data Pipeline for Snowflake