As organizations modernize, many are moving their workloads into Virtual Private Clouds (VPCs). With so much data moving to the Cloud, it is essential to establish secure, private connectivity between VPCs, Data Warehouse services, and SaaS applications. Snowflake is a Data Warehouse that has become an industry-leading Cloud-based SaaS (Software-as-a-Service) Data Platform. Snowflake integrates with AWS PrivateLink to offer a secure and private connection between a customer’s VPC and their Snowflake account. This integration allows customers to connect to Snowflake without exposing their data to the public Internet.
Snowflake is always seeking ways to improve its offerings and enhance its data security, which helps make it a Data Warehouse of choice. PrivateLink, offered by AWS, is the newest generation of VPC Endpoints and allows private and secure connectivity between AWS VPCs without passing over the public Internet. This article takes you through the important aspects of Snowflake PrivateLink integration.
Introduction to Snowflake
Snowflake is a Cloud Data Warehousing solution provided as a SaaS offering. It runs on Amazon Web Services, Microsoft Azure, or Google Cloud infrastructure and provides a virtually unbounded platform for storing and retrieving data. Snowflake uses its own proprietary SQL Database Engine with a unique architecture designed for the cloud.
Snowflake’s architecture separates its “Compute” and “Storage” units so that each scales independently. This allows customers to use and pay for both services independently. Organizations that have high storage demands but less need for CPU cycles, or vice versa, do not have to pay for an integrated bundle covering both, which makes Snowflake very attractive to companies. Like other popular Data Warehouses, it also uses Columnar Storage for parallel query execution.
Hevo Data, a No-code Data Pipeline, helps to transfer data from 150+ sources to Snowflake. Hevo is fully managed and automates the process of not only exporting data from your desired source but also enriching the data and transforming it into an analysis-ready form, without your having to write a single line of code. Its fault-tolerant architecture ensures that the data is handled in a secure, consistent manner with zero data loss.
Check out some amazing features of Hevo (Official Snowflake ETL Partner):
- Minimal Learning: Hevo, with its simple and interactive UI, is easy for new customers to learn and operate.
- Live Monitoring: Hevo allows you to monitor the data flow and check where your data is at a particular point in time.
- Data Transformation: It provides a simple interface to perfect, modify, and enrich the data you want to export.
- Secure: Hevo has a fault-tolerant architecture that ensures that the data is handled in a secure, consistent manner with zero data loss.
- Schema Management: Hevo takes away the tedious task of Schema management & automatically detects the Schema of incoming data and maps it to the destination Schema.
What is AWS PrivateLink?
Connecting to applications on the Cloud should be easy, and your data and services should also remain secure. AWS PrivateLink is an AWS offering for creating private VPC endpoints that let you directly and securely connect your AWS Virtual Private Clouds (VPCs) to supported services, such as services on AWS Marketplace, your own VPCs, and other SaaS and Data Warehouse services.
AWS PrivateLink routes the traffic between VPCs and other services over the AWS Network, meaning it doesn’t traverse the public Internet. You no longer need an Internet Gateway or a Public IP Address to access your VPC with AWS PrivateLink. In addition, you can use AWS Direct Connect in conjunction with AWS PrivateLink to connect all your virtual and physical environments in a single, private network.
Snowflake AWS PrivateLink Integration
Snowflake is implemented as a VPC on AWS. Hence, PrivateLink creates a private and highly secure network between Snowflake and your other AWS VPCs, fully protected from unauthorized external access. Let’s learn more about the Snowflake PrivateLink integration.
Part 1: Improved Security and Simplified Connectivity
Snowflake is based on a multi-cluster, shared data architecture purpose-built for the cloud. It secures customer data at all stages, protecting it in transit and at rest. Your sensitive information stored in Snowflake is transparently encrypted via a key hierarchy, which provides enhanced security levels by encrypting individual pieces of data using different keys. Snowflake also gives you full control to manage the roles and access rights of users, and it comes with multi-factor authentication. To learn more about Snowflake’s security, check out our piece on Snowflake Security Best Practices.
Snowflake’s multi-tenant service runs inside a Virtual Private Cloud (VPC), which means its internal components are isolated and can’t be accessed directly. Traffic incoming from customer VPCs is routed to the Snowflake VPC through an Elastic Load Balancer (ELB).
However, a key area of concern lies around how data is being transferred from a private subnetwork to Snowflake. Some of the customers have restrictive policies on their resources accessing the public Internet and hence, they want to transfer data without allowing unrestricted outbound access to the public Internet. This is where Snowflake PrivateLink comes in.
As discussed, it enables direct and secure connectivity between VPCs while keeping network traffic and communication within the AWS private network only. Hence, customers can transfer data to Snowflake without traversing the public Internet, and without setting up proxies between their network and Snowflake.
With the PrivateLink integration, Snowflake runs its service behind a Network Load Balancer (NLB) and shares the endpoint with customers’ VPCs. This enables direct connectivity to Snowflake via private IP Addresses. Customers have full control over the endpoint, and they can choose which of their VPCs can access Snowflake. This private connectivity from customer VPCs to Snowflake applies in both multi-tenant (ESD) and single-tenant (VPS) scenarios.
Part 2: Components of Snowflake PrivateLink Integration
The Snowflake PrivateLink integration consists of a VPC Endpoint and an AWS Network Load Balancer. PrivateLink is established between your VPC and Snowflake, and you can either connect more than one VPC or route traffic through a single VPC.
In addition, AWS Direct Connect can establish a private communication channel from your on-premises network into a selected VPC, creating a direct route to Snowflake that avoids the public Internet. Similarly, AWS VPN provides a dedicated network connection into a VPC, giving a client access to the VPC network, from which traffic is routed on to Snowflake over the PrivateLink connection.
Part 3: Enabling AWS PrivateLink for Snowflake
Now that you’re familiar with the main aspects of Snowflake PrivateLink integration, you can enable AWS PrivateLink for your Snowflake account. Enabling PrivateLink may take up to 2 business days. You need to provide valid Account IDs of the VPCs you want to connect to Snowflake.
Step 1: Provide AWS VPC Account IDs and Account URLs
To enable AWS PrivateLink, you need to contact Snowflake Support and provide them with a list of all of your AWS VPC Account IDs along with the corresponding Account URLs you use to access Snowflake.
Step 2: Snowflake Provides a VPC Endpoint Address for Your Region
Snowflake provides you with a region-specific VPC Endpoint (VPCE) Address after accepting your VPC Account ID:
com.amazonaws.vpce.<region_id>.vpce-svc-xxxxxxxxxxxxxxxxx
Where:
- <region_id> is the ID for the AWS Region where your VPCs and Snowflake account are located.
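As a quick sanity check on the address you receive, the format above can be parsed with a few lines of Python. This is only a sketch: the function name is hypothetical, the sample value is made up, and the assumption that the suffix after vpce-svc- is lowercase hexadecimal is not stated in this article.

```python
import re

# Pattern for the VPCE address format shown above.
# Assumption: the suffix after "vpce-svc-" is lowercase hexadecimal.
VPCE_SERVICE_NAME = re.compile(
    r"^com\.amazonaws\.vpce\.(?P<region_id>[a-z0-9-]+)\.vpce-svc-(?P<suffix>[0-9a-f]+)$"
)


def parse_vpce_service_name(name: str) -> dict:
    """Split a VPCE address into its region ID and service suffix."""
    match = VPCE_SERVICE_NAME.match(name.strip())
    if match is None:
        raise ValueError(f"not a VPCE service name: {name!r}")
    return match.groupdict()


# Hypothetical value, for illustration only.
parsed = parse_vpce_service_name(
    "com.amazonaws.vpce.us-west-2.vpce-svc-0123456789abcdef0"
)
print(parsed["region_id"])
```

Checking the region ID this way helps confirm that the address matches the AWS Region of the VPCs you plan to connect.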
You can retrieve your AWS VPCE details by calling the following system function:
SELECT SYSTEM$GET_PRIVATELINK_CONFIG();
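The configuration is typically returned as a JSON string, so the values you need later for DNS setup can be pulled out programmatically. The sketch below assumes the output is a JSON object whose keys are named as in this article (privatelink-account-url and privatelink-ocsp-url); the helper name and the sample values are hypothetical, and the exact key spelling in your account's output may differ.

```python
import json

# Keys this article refers to; check them against your own output.
REQUIRED_KEYS = ("privatelink-account-url", "privatelink-ocsp-url")


def extract_privatelink_urls(raw_config: str) -> dict:
    """Return the account and OCSP URLs from the config JSON string."""
    config = json.loads(raw_config)
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError(f"missing keys in PrivateLink config: {missing}")
    return {key: config[key] for key in REQUIRED_KEYS}


# Hypothetical output, for illustration only.
sample_output = json.dumps(
    {
        "privatelink-account-url": "xy12345.us-west-2.privatelink.snowflakecomputing.com",
        "privatelink-ocsp-url": "ocsp.xy12345.us-west-2.privatelink.snowflakecomputing.com",
    }
)
urls = extract_privatelink_urls(sample_output)
print(urls["privatelink-account-url"])
```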
Part 4: Configuring your AWS VPC Environment
After enabling AWS PrivateLink for your account, you must configure your AWS VPC environment to get started with Snowflake PrivateLink.
Step 1: Create and Configure a VPC Endpoint (VPCE)
In your AWS VPC environment:
- You need to create an endpoint for the VPCE Address provided by Snowflake.
- Next, you need to authorize a VPCE security group to allow the following ports:
  - 443: Required for general Snowflake traffic.
  - 80: Required for the Snowflake OCSP Cache Server.
- Then, you need to authorize the security group of the services that connect to Snowflake to allow outgoing traffic to ports 443 and 80 of the VPCE CIDR (Classless Inter-Domain Routing) range.
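The port rules above can be written down as data before you apply them. The following sketch builds rule entries in the IpPermissions shape used by the AWS EC2 API; the function name and the CIDR value are hypothetical, and the snippet only constructs the rules without calling AWS.

```python
import ipaddress

# Ports listed above and what each one is used for.
SNOWFLAKE_PORTS = {
    443: "General Snowflake traffic",
    80: "Snowflake OCSP Cache Server",
}


def build_ip_permissions(cidr: str) -> list:
    """Build one TCP rule per Snowflake port for the given CIDR range."""
    # ip_network raises ValueError if the CIDR is malformed.
    network = ipaddress.ip_network(cidr)
    return [
        {
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            "IpRanges": [{"CidrIp": str(network), "Description": description}],
        }
        for port, description in SNOWFLAKE_PORTS.items()
    ]


# Hypothetical CIDR range, for illustration only.
rules = build_ip_permissions("10.0.0.0/16")
print([rule["FromPort"] for rule in rules])
```

If you use boto3, a list like this can be passed as the IpPermissions argument of the EC2 client's authorize_security_group_ingress or authorize_security_group_egress call, together with the GroupId of the security group you want to change.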
Step 2: Configure Your VPC Network
To access Snowflake through an AWS PrivateLink endpoint, you need to create CNAME records in your DNS that resolve the privatelink-account-url and the privatelink-ocsp-url values returned by the SYSTEM$GET_PRIVATELINK_CONFIG function to your VPC endpoint.
Some Snowflake features require an additional CNAME record in your DNS to work with AWS PrivateLink. You can either create a separate CNAME record for each of the following feature URL values or combine them with the privatelink-account-url and the privatelink-ocsp-url values from the SYSTEM$GET_PRIVATELINK_CONFIG output.
Snowflake Data Marketplace or Snowsight:
app.<region_id>.privatelink.snowflakecomputing.com
Organizations:
<org_name>-<account_name>.privatelink.snowflakecomputing.com
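To make the DNS step concrete, the sketch below assembles the hostnames described above into a list of CNAME records that all point at one VPC endpoint DNS name. Everything here is illustrative: the function names, the record dictionary layout, and all sample values (including the endpoint DNS name) are hypothetical, and how you actually create the records depends on your DNS provider.

```python
def feature_hostnames(region_id: str, org_name: str, account_name: str) -> list:
    """Build the feature URL values from the patterns shown above."""
    return [
        f"app.{region_id}.privatelink.snowflakecomputing.com",
        f"{org_name}-{account_name}.privatelink.snowflakecomputing.com",
    ]


def build_cname_records(hostnames: list, vpce_dns_name: str) -> list:
    """Map every Snowflake hostname to the VPC endpoint DNS name."""
    return [
        {"name": hostname, "type": "CNAME", "value": vpce_dns_name}
        for hostname in hostnames
    ]


# Hypothetical values, for illustration only.
hostnames = [
    "xy12345.us-west-2.privatelink.snowflakecomputing.com",
    "ocsp.xy12345.us-west-2.privatelink.snowflakecomputing.com",
] + feature_hostnames("us-west-2", "myorg", "myaccount")

records = build_cname_records(
    hostnames,
    "vpce-0123456789abcdef0-abcd1234.vpce-svc-0123456789abcdef0.us-west-2.vpce.amazonaws.com",
)
for record in records:
    print(record["name"], "->", record["value"])
```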
Step 3: Create AWS VPC Interface Endpoints for Amazon S3
Snowflake clients require access to Amazon S3 to perform various runtime operations. As discussed already, the PrivateLink VPC network doesn’t allow unrestricted outbound access to the public Internet. Hence, Amazon S3 cannot be reached over the public Internet, and you must configure private connectivity to Amazon S3. To do so, you can use one of the following options:
- Configure an AWS VPC interface endpoint for internal stages.
- Configure an Amazon S3 gateway endpoint.
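The two options differ mainly in what the endpoint attaches to: a gateway endpoint is associated with route tables, while an interface endpoint is placed in subnets and protected by security groups. The sketch below builds the request parameters for either type using the parameter names of the EC2 CreateVpcEndpoint API; the helper name and all resource IDs are hypothetical, and nothing is sent to AWS.

```python
def s3_endpoint_params(
    region_id: str,
    vpc_id: str,
    endpoint_type: str,
    *,
    route_table_ids=(),
    subnet_ids=(),
    security_group_ids=(),
) -> dict:
    """Build CreateVpcEndpoint parameters for an Amazon S3 endpoint."""
    params = {
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region_id}.s3",
        "VpcEndpointType": endpoint_type,
    }
    if endpoint_type == "Gateway":
        # Gateway endpoints are attached to route tables.
        if not route_table_ids:
            raise ValueError("a Gateway endpoint needs at least one route table")
        params["RouteTableIds"] = list(route_table_ids)
    elif endpoint_type == "Interface":
        # Interface endpoints live in subnets behind security groups.
        if not subnet_ids or not security_group_ids:
            raise ValueError("an Interface endpoint needs subnets and security groups")
        params["SubnetIds"] = list(subnet_ids)
        params["SecurityGroupIds"] = list(security_group_ids)
    else:
        raise ValueError(f"unsupported endpoint type: {endpoint_type!r}")
    return params


# Hypothetical resource IDs, for illustration only.
gateway = s3_endpoint_params(
    "us-west-2", "vpc-0abc", "Gateway", route_table_ids=["rtb-0abc"]
)
print(gateway["ServiceName"])
```

With boto3, a dictionary like this could be unpacked into the EC2 client's create_vpc_endpoint call, for example create_vpc_endpoint(**gateway).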
Conclusion
As companies explore ways of safeguarding their data, they look for direct and secure private connections to various services, SaaS applications, Data Warehouses, and more. AWS PrivateLink allows you to create a one-way private connection between your Virtual Private Cloud (VPC) and Snowflake.
Hevo Data, with its strong integration with 150+ Sources & BI tools, allows you to not only export data from sources & load data in destinations such as Snowflake, but also transform & enrich your data, & make it analysis-ready so that you can focus only on your key business needs and perform insightful analysis using BI tools. Try a 14-day free trial and experience the feature-rich Hevo suite firsthand. Also, check out our unbeatable pricing to choose the best plan for your organization.
FAQs
1. Is PrivateLink bidirectional?
No, AWS PrivateLink is not bidirectional; it allows one-way traffic only. Clients can access services, but the services cannot initiate a connection back to the clients.
2. Is Snowflake hosted on AWS or Azure?
Snowflake runs on multiple cloud platforms: AWS, Azure, and Google Cloud. Users can choose the cloud provider that best fits the needs of their infrastructure.
3. What is virtual private Snowflake?
Virtual Private Snowflake (VPS) is a deployment option that provides additional security by isolating Snowflake’s environment in a dedicated Virtual Private Cloud. It suits organizations with stricter compliance requirements that need additional controls and security.
Raj, a data analyst with a knack for storytelling, empowers businesses with actionable insights. His experience, from Research Analyst at Hevo to Senior Executive at Disney+ Hotstar, translates complex marketing data into strategies that drive growth. Raj's Master's degree in Design Engineering fuels his problem-solving approach to data analysis.