How to Integrate Snowflake PrivateLink: 3 Easy Steps

on AWS, Big Data, Data Integration, Data Warehouse, ETL Tutorials, Snowflake • January 25th, 2022

Integrate Snowflake and AWS PrivateLink

As organizations modernize their infrastructure, many are moving their workloads into VPCs (Virtual Private Clouds). With so much data moving to the Cloud, it is essential to establish secure, private connectivity between VPCs, Data Warehouse services, and SaaS applications. Snowflake is a Data Warehouse that has become an industry-leading Cloud-Based SaaS (Software-as-a-Service) Data Platform. Snowflake integrates with AWS PrivateLink to offer a secure and private connection between a customer’s VPC and their Snowflake account. Snowflake PrivateLink integration allows customers to connect to Snowflake without exposing their data to the public Internet.

Snowflake is always seeking ways to improve its offerings and enhance its data security, making it a Data Warehouse of choice. PrivateLink, offered by AWS, is the newest generation of VPC Endpoints and allows private and secure connectivity between AWS VPCs without passing over the public Internet. This article will take you through the important aspects of Snowflake PrivateLink integration.

Introduction to Snowflake

Snowflake PrivateLink: Snowflake Logo
Image Source: www.commons.wikimedia.org

Snowflake is a Cloud Data Warehousing solution provided as a SaaS offering. It runs on Amazon Web Services, Microsoft Azure, or Google Cloud infrastructure and provides a highly scalable platform for storing and retrieving data. Snowflake Data Warehouse uses a proprietary SQL Database Engine with a unique architecture designed for the cloud.

The architecture of Snowflake separates its “Compute” and “Storage” units, so the two scale independently. This allows customers to use and pay for each service separately. Organizations that have high storage demands but less need for CPU cycles, or vice versa, do not have to pay for an integrated bundle covering both, which makes Snowflake very attractive to companies. Like other popular Data Warehouses, it also uses Columnar Storage for parallel query execution.

With Snowflake, there is no hardware or software to select, install, configure, or manage, which makes it ideal for organizations that do not want to dedicate resources to the setup, maintenance, and support of in-house servers. Snowflake’s security and sharing functionalities make it easy for organizations to quickly share and secure data in real-time using any available ETL solution. Snowflake’s architecture allows flexibility with Big Data. Snowflake is known for its scalability and relative ease of use when compared to other Data Warehouses in the market.

Simplify Snowflake ETL and Data Integration using Hevo’s No-code Data Pipeline

Hevo Data helps you directly transfer data from 100+ data sources (including 30+ free sources) to Snowflake, Business Intelligence tools, Data Warehouses, or a destination of your choice in a completely hassle-free & automated manner. Hevo is fully managed and completely automates the process of not only loading data from your desired source but also enriching the data and transforming it into an analysis-ready form without having to write a single line of code. Its fault-tolerant architecture ensures that the data is handled in a secure, consistent manner with zero data loss.

Let’s look at some of the salient features of Hevo:

  • Fully Managed: It requires no management and maintenance as Hevo is a fully automated platform.
  • Data Transformation: It provides a simple interface to perfect, modify, and enrich the data you want to transfer.
  • Real-Time: Hevo offers real-time data migration. So, your data is always ready for analysis.
  • Schema Management: Hevo can automatically detect the schema of the incoming data and map it to the destination schema.
  • Scalable Infrastructure: Hevo has in-built integrations for hundreds of sources that can help you scale your data infrastructure as required.
  • Live Monitoring: Advanced monitoring gives you a one-stop view to watch all the activities that occur within Data Pipelines.
  • Live Support: The Hevo team is available round the clock to extend exceptional support to its customers through chat, email, and support calls.

What is AWS PrivateLink?

Snowflake PrivateLink: AWS PrivateLink
Image Source: www.aws.amazon.com

Connecting to applications on the Cloud should be easy, and your data and services should also remain secure. AWS PrivateLink is an AWS offering for creating private VPC endpoints. It enables you to directly and securely connect your AWS Virtual Private Clouds (VPCs) to supported AWS services, to services hosted in other VPCs, and to AWS Marketplace partner services such as SaaS applications and Data Warehouses.

AWS PrivateLink routes the traffic between VPCs and other services over the AWS Network, meaning it doesn’t traverse the public Internet. With AWS PrivateLink, your VPC no longer needs an Internet Gateway or Public IP Addresses to reach these services. In addition, you can use AWS Direct Connect in conjunction with AWS PrivateLink to connect all your virtual and physical environments in a single, private network.

Snowflake AWS PrivateLink Integration

On AWS, Snowflake runs inside a VPC, so PrivateLink creates a private and highly secure network between Snowflake and your other AWS VPCs, fully protected from unauthorized external access. Let’s learn more about the Snowflake PrivateLink integration.

Improved Security and Simplified Connectivity

Snowflake is one of the industry-leading Cloud Data Warehouses and a leading SaaS company in the field of data storage. From day one, security has been a central pillar of Snowflake’s architecture. The platform is designed to detect security risks, prevent as many threats as possible, and react to security incidents effectively.

Snowflake is based on a multi-cluster, shared data architecture purpose-built for the cloud. It secures customer data at all stages, protecting it in transit and at rest. Your sensitive information stored in Snowflake is transparently encrypted via a key hierarchy, which provides enhanced security levels by encrypting individual pieces of data using different keys. Snowflake also gives you full control to manage the roles and access rights of users, and it comes with multi-factor authentication. To learn more about Snowflake’s security, check out our piece on Snowflake Security Best Practices.

Snowflake PrivateLink: Snowflake Architecture
Image Source: www.snowflake.com

Snowflake’s multi-tenant service runs inside a Virtual Private Cloud (VPC), which means its internal components are isolated and can’t be accessed directly. Traffic incoming from customer VPCs is routed to the Snowflake VPC through an Elastic Load Balancer (ELB).

However, a key area of concern is how data is transferred from a private subnetwork to Snowflake. Some customers have restrictive policies on their resources accessing the public Internet, so they want to transfer data without allowing unrestricted outbound access to it. This is where Snowflake PrivateLink comes in.

As discussed, it enables direct and secure connectivity between VPCs while keeping network traffic and communication within the AWS private network only. Hence, customers can transfer data to Snowflake without traversing the public Internet, and without setting up proxies between their network and Snowflake.

Snowflake PrivateLink: Multi-tenant Snowflake VPC
Image Source: www.snowflake.com
Snowflake PrivateLink: Dedicated Snowflake VPC
Image Source: www.snowflake.com

With the PrivateLink integration, Snowflake runs its service behind a Network Load Balancer (NLB) and shares the endpoint with customers’ VPCs. This enables direct connectivity to Snowflake via private IP Addresses. Customers have full control over the endpoint, and they can choose which of their VPCs can access Snowflake. The architecture diagrams above show private connectivity from customer VPCs to Snowflake in both the multi-tenant (ESD) and single-tenant (VPS) scenarios.
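Since the point of this design is that Snowflake is reached via private IP Addresses, one simple sanity check after setup is to confirm that the addresses your Snowflake hostname resolves to are private. The following is a minimal sketch using only Python’s standard library; the helper names and the sample addresses are hypothetical, and the `resolve` helper is only meaningful when run from inside the VPC.

```python
import ipaddress
import socket


def all_private(addresses) -> bool:
    """Return True only if at least one address is given and all are private."""
    addresses = list(addresses)
    return bool(addresses) and all(
        ipaddress.ip_address(addr).is_private for addr in addresses
    )


def resolve(hostname: str, port: int = 443) -> set:
    """Resolve a hostname to its set of IP addresses (requires DNS access)."""
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    return {info[4][0] for info in infos}


# Hypothetical addresses, for illustration only.
print(all_private(["10.0.1.25", "10.0.2.31"]))
print(all_private(["10.0.1.25", "52.1.2.3"]))
```

The first call prints True and the second prints False. From a machine inside your VPC, you could pass `resolve(<your PrivateLink account URL>)` to `all_private` to run the same check against real DNS answers.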

Components of Snowflake PrivateLink Integration

There are several components involved in moving traffic from the endpoints to Snowflake via AWS PrivateLink. The VPC Endpoint and the AWS Network Load Balancer are the essential components of this Snowflake PrivateLink integration.

The Snowflake PrivateLink integration starts with configuring PrivateLink between your VPC and Snowflake. You can configure PrivateLink in multiple VPCs in a customer environment. Alternatively, you can set it up in one VPC and then route traffic from the other VPCs to Snowflake through that single VPC.

You can also use AWS Direct Connect to establish direct, private communication channels from your subnetwork to the AWS network. This way, clients outside AWS, such as on-premises systems, can connect directly into a VPC of your choice. Their traffic can then be routed from on-premises to Snowflake without using the public Internet.

In addition to that, AWS also offers AWS VPN to establish a virtual, dedicated network connection into a VPC. Just like AWS Direct Connect, AWS VPN also allows your clients to engage with the VPC’s network and then be routed to Snowflake through the PrivateLink connection.

Enabling AWS PrivateLink for Snowflake

Now that you’re familiar with the various aspects of Snowflake PrivateLink integration, you can enable AWS PrivateLink for your Snowflake account to get started. It may take up to 2 business days to enable PrivateLink. You need to provide the valid AWS Account IDs of the VPCs you want to connect to Snowflake.

Step 1: Provide AWS VPC Account IDs and Account URLs

To enable AWS PrivateLink, you need to contact Snowflake Support and provide them with a list of all of your AWS VPC Account IDs along with the corresponding Account URLs you use to access Snowflake.

Step 2: Snowflake Provides a VPC Endpoint Address for Your Region

Snowflake provides you with a region-specific VPC Endpoint (VPCE) Address after accepting your VPC Account ID:

com.amazonaws.vpce.<region_id>.vpce-svc-xxxxxxxxxxxxxxxxx

Where:

  • <region_id> is the ID for the AWS Region where your VPCs and Snowflake account are located.
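If you want to sanity-check the value you receive, the format above is easy to validate in a script. The following is a minimal sketch in Python; the function name is hypothetical, the sample value is made up, and the assumption that the suffix is a run of lowercase hexadecimal characters is based only on the placeholder shown above.

```python
import re

# Pattern for the service-name format shown above. The hexadecimal suffix
# is an assumption based on the "xxxxxxxxxxxxxxxxx" placeholder.
_VPCE_SERVICE = re.compile(
    r"^com\.amazonaws\.vpce\.(?P<region>[a-z0-9-]+)\.(?P<service>vpce-svc-[0-9a-f]+)$"
)


def parse_vpce_service_name(name: str) -> dict:
    """Split a VPCE service name into its region ID and service ID."""
    match = _VPCE_SERVICE.match(name.strip())
    if match is None:
        raise ValueError(f"not a VPCE service name: {name!r}")
    return {"region": match.group("region"), "service_id": match.group("service")}


# Hypothetical value, for illustration only.
example = parse_vpce_service_name(
    "com.amazonaws.vpce.us-west-2.vpce-svc-0123456789abcdef0"
)
print(example["region"])
```

This prints us-west-2. Checking the region this way can catch a mismatch between the endpoint’s region and the region of your VPC before you try to create the endpoint.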

You can retrieve your AWS PrivateLink configuration, including the endpoint details, by calling the following function in Snowflake:

SELECT SYSTEM$GET_PRIVATELINK_CONFIG();
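The function returns its result as a JSON string, so it is convenient to parse it in a script and pull out the values you need for the DNS setup later in this article. The following is a minimal sketch in Python; the helper name is hypothetical, and the sample output is made up and shows only the two keys this article refers to (the real output contains more fields).

```python
import json

# Hypothetical sample of the function's JSON output. Only the two keys
# named in this article are shown, and the values are made up.
SAMPLE_OUTPUT = """
{
  "privatelink-account-url": "xy12345.us-west-2.privatelink.snowflakecomputing.com",
  "privatelink-ocsp-url": "ocsp.xy12345.us-west-2.privatelink.snowflakecomputing.com"
}
"""


def extract_privatelink_urls(raw: str) -> dict:
    """Parse the JSON string and return the two URLs used for DNS records."""
    config = json.loads(raw)
    wanted = ("privatelink-account-url", "privatelink-ocsp-url")
    missing = [key for key in wanted if key not in config]
    if missing:
        raise KeyError(f"missing keys in PrivateLink config: {missing}")
    return {key: config[key] for key in wanted}


urls = extract_privatelink_urls(SAMPLE_OUTPUT)
for key, value in urls.items():
    print(f"{key}: {value}")
```

Failing loudly on a missing key is a deliberate choice here: if the key names in your account’s output differ from the ones assumed above, you find out immediately rather than creating incomplete DNS records.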

Configuring your AWS VPC Environment

After enabling AWS PrivateLink for your account, you must configure your AWS VPC environment to get started with Snowflake PrivateLink.

Step 1: Create and Configure a VPC Endpoint (VPCE)

In your AWS VPC environment:

  1. You need to create an endpoint for the VPCE Address provided by Snowflake.
  2. Next, you need to authorize a VPCE security group to allow the following ports:
    • 443: Required for general Snowflake traffic.
    • 80: Required for the Snowflake OCSP Cache Server.
  3. Then, you need to authorize the security group of the services that will connect to Snowflake to allow outgoing connections to the VPCE CIDR (Classless Inter-Domain Routing) on ports 443 and 80.
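The port rules in the steps above can be expressed as data before you apply them. The following is a minimal sketch in Python that builds one TCP rule per port, restricted to a CIDR block. The function name and the CIDR value are hypothetical; the dictionary layout follows the `IpPermissions` shape that the AWS EC2 API accepts for security group rules, but the sketch only builds the rules and does not call AWS.

```python
import ipaddress


def build_port_rules(cidr: str, ports=(443, 80)) -> list:
    """Return one TCP rule per port, limited to the given CIDR block."""
    # Raises ValueError if the CIDR is malformed.
    ipaddress.ip_network(cidr)
    return [
        {
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            "IpRanges": [
                {"CidrIp": cidr, "Description": f"Snowflake PrivateLink port {port}"}
            ],
        }
        for port in ports
    ]


# Hypothetical CIDR, for illustration only.
rules = build_port_rules("10.0.0.0/16")
for rule in rules:
    print(rule["FromPort"], rule["IpRanges"][0]["CidrIp"])
```

Keeping the rules as plain data makes them easy to review, and the same list could be passed to whichever tool you use to manage security groups.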

Step 2: Configure Your VPC Network

To access Snowflake through an AWS PrivateLink endpoint, you need to create CNAME records in your DNS that resolve the privatelink-account-url and the privatelink-ocsp-url values from the SYSTEM$GET_PRIVATELINK_CONFIG function to your VPC Endpoint.
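As an illustration, the CNAME records can be generated from the two URL values rather than typed by hand. The following is a minimal sketch in Python, assuming your DNS is hosted in Amazon Route 53 (the article does not specify a DNS provider). The function name, the hostnames, and the endpoint DNS name are hypothetical; the dictionary layout follows the change batch format that Route 53 accepts, but the sketch only builds the data and does not call AWS.

```python
def build_cname_changes(hostnames, vpce_dns_name: str, ttl: int = 300) -> dict:
    """Build a Route 53 style change batch with one CNAME record per hostname."""
    return {
        "Changes": [
            {
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": hostname,
                    "Type": "CNAME",
                    "TTL": ttl,
                    "ResourceRecords": [{"Value": vpce_dns_name}],
                },
            }
            for hostname in hostnames
        ]
    }


# Hypothetical values, for illustration only.
change_batch = build_cname_changes(
    [
        "xy12345.us-west-2.privatelink.snowflakecomputing.com",
        "ocsp.xy12345.us-west-2.privatelink.snowflakecomputing.com",
    ],
    "vpce-0123456789abcdef0-abcdefgh.vpce-svc-0123456789abcdef0.us-west-2.vpce.amazonaws.com",
)
for change in change_batch["Changes"]:
    record = change["ResourceRecordSet"]
    print(record["Name"], "->", record["ResourceRecords"][0]["Value"])
```

Both records point at the same VPC Endpoint DNS name, which is why the helper takes a list of hostnames and a single target.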

Snowflake may require you to create an additional CNAME record in your DNS to use some of the features with AWS PrivateLink. You can either create an additional CNAME record or even combine the following feature URL values with the privatelink-account-url and the privatelink-ocsp-url values from the SYSTEM$GET_PRIVATELINK_CONFIG output.

Snowflake Data Marketplace or Snowsight:

app.<region_id>.privatelink.snowflakecomputing.com

Organizations:

<org_name>-<account_name>.privatelink.snowflakecomputing.com
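The two feature URL patterns above are simple string templates, so they can be filled in programmatically. The following is a minimal sketch in Python; the function names and the sample region, organization, and account values are hypothetical.

```python
PRIVATELINK_SUFFIX = "privatelink.snowflakecomputing.com"


def snowsight_hostname(region_id: str) -> str:
    """Hostname pattern for Snowflake Data Marketplace or Snowsight."""
    return f"app.{region_id}.{PRIVATELINK_SUFFIX}"


def organization_hostname(org_name: str, account_name: str) -> str:
    """Hostname pattern for Organizations."""
    return f"{org_name}-{account_name}.{PRIVATELINK_SUFFIX}"


# Hypothetical values, for illustration only.
print(snowsight_hostname("us-west-2"))
print(organization_hostname("myorg", "myaccount"))
```

The resulting hostnames are the ones that would need the additional CNAME records described above.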

Step 3: Create AWS VPC Interface Endpoints for Amazon S3

Snowflake clients require access to Amazon S3 to perform various runtime operations. As discussed already, the PrivateLink VPC network doesn’t allow unrestricted outbound access to the public Internet, so Amazon S3 cannot be reached that way and you must configure private connectivity to Amazon S3. To do so, you can use either of the following options.

  1. Configure an AWS VPC interface endpoint for internal stages.
  2. Configure an Amazon S3 gateway endpoint.

For more information on Snowflake PrivateLink integration, you can refer to the official documentation.

Conclusion

As companies explore ways of safeguarding their data, they look for direct and secure private connections to various services, SaaS applications, Data Warehouses, etc. AWS PrivateLink allows you to create a one-way private connection between your Virtual Private Cloud (VPC) and Snowflake. It directly connects Snowflake with the customer’s network, keeping the data flow off the public Internet. This article introduced you to AWS PrivateLink and took you through various aspects of Snowflake PrivateLink integration.

Snowflake has a list of tools that can be integrated into it by simply accessing its tools page and selecting the platform you need. Hevo Data is a good data tool to integrate with Snowflake as it helps you to create efficient datasets and transforms your data into insightful actionable leads.

Hevo Data, with its strong integration with 100+ Sources & BI tools, allows you to not only export data from sources & load data in the destinations such as Snowflake, but also transform & enrich your data, & make it analysis-ready so that you can focus only on your key business needs and perform insightful analysis using BI tools. In short, Hevo can help you store your data securely in Snowflake.

Give Hevo Data a try and sign up for a 14-day free trial today. Hevo offers plans & pricing for different use cases and business needs!

Share your experience of working with Snowflake PrivateLink integration in the comments section below.
