Snowflake Dynamic Data Masking – A Beginner’s Guide 2022

• February 11th, 2022

Owning a business in the 21st century is not as simple as it was, say, five decades ago. Today, you need to understand several concepts that are critical to your enterprise’s success, and Data Privacy is arguably among the most crucial of them. Accordingly, you need to understand what Data Privacy is and how it is likely to affect your business in the future. In simple terms, Data Privacy is an umbrella concept that covers how a company stores, accesses, and preserves its data. 

With this in mind, this post introduces data privacy in the context of Snowflake. More specifically, you will learn what Snowflake Dynamic Data Masking is and how it can benefit your business. So, let’s dive into Snowflake Dynamic Data Masking policies and learn how you can secure your data.


What is Snowflake?


If you think back to around two decades ago, setting up a data warehouse was a long and complicated process involving many resources. Firstly, you had to spend a lot of money acquiring the hardware to store the data. Let’s not forget the tiresome process of selecting the proper hardware for your company’s needs. This becomes worse if you are not a tech-savvy individual since you will have difficulty understanding the different specs in data storage hardware. Snowflake eliminates the need for all these complications using Cloud-based technology. So, what exactly is it? 

In simple terms, Snowflake is a Software as a Service (SaaS) offering that provides an all-in-one platform for Data Warehousing, Data Lakes, Data Engineering, Application Development, and much more. The company was founded in 2012, and the platform runs on Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP) infrastructure. One of its most significant benefits is that there is no hardware to install or maintain, which makes it suitable for companies that do not want to set up physical data warehouses on site. Instead, everything is handled off-site in the cloud.

Key Features of Snowflake

Let’s explore the key features of Snowflake below:

  • Cloud Provider Agnostic: Snowflake is cloud-agnostic, meaning it runs on all three major cloud providers (AWS, Azure, and GCP). The benefit? You can easily fit Snowflake into your current cloud service plan. 
  • Scalability: Snowflake offers multi-cluster cloud technology, which means users can scale up resources whenever they have large data loads. 
  • Minimal Administration: Snowflake can be set up and run with minimal involvement from the IT team. This is helped by features such as auto-scaling, which adds or removes compute clusters in a multi-cluster warehouse as demand changes. 
  • Top-Tier Security Features: Snowflake offers a wide array of security features, such as multi-factor authentication. 

By now, you should have a rough idea of what Snowflake is and some of its top features. Now, let us get to the core purpose of this post, Snowflake Dynamic Data Masking. What is it? What benefits does it offer your business? Read on to find out. 

What is Data Masking?


Every organization handles sensitive data at some point. Such information is shared on a need-to-know basis with the relevant parties because it is critical to the business, so not all company personnel need access to it. This is where Data Masking comes into play. Data Masking helps you hide or obscure the sensitive data shared across an organization. Its most significant benefit is that it greatly reduces the risk of data exposure. 

Why do Companies need Data Masking?

So why do companies need to mask their data? Read on to find out. 

  • Business Reasons: Business administrators may conclude that exposing certain data, such as financial data, would put the company at significant business risk. 
  • Security Reasons: IT security teams mainly use data masking for security reasons, where exposing the data would amount to a data breach. 
  • Privacy Reasons: A typical example is data that contains Personally Identifiable Information (PII). Here, data masking is critical to maintaining privacy. 
  • Compliance Reasons: Here, data masking is mainly done to comply with security frameworks such as the NIST Cybersecurity Framework. 

Read on to learn how you can use Snowflake Dynamic Data Masking to secure your data.

Simplify Snowflake ETL & Data Analysis with Hevo’s No-code Data Pipeline

Hevo Data, a No-code Data Pipeline, helps load data from any data source, such as Databases, SaaS applications, Cloud Storage, SDKs, and Streaming Services, and simplifies the ETL process. It supports 100+ Data Sources, including 40+ Free Sources. Setting it up is a 3-step process: select the data source, provide valid credentials, and choose the destination. 

Hevo loads the data onto the desired Data Warehouse/destination, such as Snowflake, in real-time, and enriches and transforms it into an analysis-ready form without you having to write a single line of code. Its completely automated pipeline and fault-tolerant, scalable architecture ensure that the data is handled in a secure, consistent manner with zero data loss, and it supports different forms of data. The solutions provided are consistent and work with different BI tools as well.


Check out why Hevo is the Best:

  • Secure: Hevo has a fault-tolerant architecture that ensures that the data is handled securely and consistently with zero data loss.
  • Schema Management: Hevo takes away the tedious task of schema management & automatically detects the schema of incoming data and maps it to the destination schema.
  • Minimal Learning: Hevo, with its simple and interactive UI, is extremely simple for new customers to work on and perform operations.
  • Hevo Is Built To Scale: As the number of sources and the volume of your data grows, Hevo scales horizontally, handling millions of records per minute with very little latency.
  • Incremental Data Load: Hevo allows the transfer of data that has been modified in real-time. This ensures efficient utilization of bandwidth on both ends.
  • Live Support: The Hevo team is available round the clock to extend exceptional support to its customers through chat, email, and support calls.
  • Live Monitoring: Hevo allows you to monitor the data flow and check where your data is at a particular point in time.

Simplify your Data Analysis with Hevo today! 


Example of Data that should be masked

The image below shows what a sample of masked data looks like. In this example, the values in the Credit_Card_Number column are hashed, which protects the sensitive card numbers from unauthorized persons.

Masked Data example
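Hashing is one-way: the original card number cannot be read back from the digest, but the same input always yields the same digest, so masked values can still be compared or joined. The minimal Python sketch below illustrates the idea locally. The card number is a hypothetical test value, the choice of SHA-256 is an assumption (the image does not say which hash is used), and the partial mask that keeps the last four digits is shown only as a common alternative style.

```python
import hashlib

# Hypothetical card number (a well-known test number), used only for illustration.
card_number = "4111111111111111"


def hash_value(value: str) -> str:
    """Return the SHA-256 hex digest of the value; the original cannot be recovered from it."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def partial_mask(value: str, visible: int = 4, mask_char: str = "*") -> str:
    """Replace every character except the last `visible` ones with `mask_char`."""
    return mask_char * (len(value) - visible) + value[-visible:]


hashed = hash_value(card_number)
partially_masked = partial_mask(card_number)

print(hashed)            # 64 hex characters
print(partially_masked)  # ************1111
```

Hashing hides the whole value, while partial masking keeps a small, recognizable fragment for support or verification use cases.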

Origin of Snowflake Dynamic Data Masking: Data Governance

Data governance is a collection of policies, practices, standards, and processes that ensure formal management of data assets across an organization. Data supervision, data quality, and data management all come under the umbrella of data governance. Together, these mechanisms help organizations handle data security, privacy, compliance, integration, and integrity while managing the flow of data (both internal and external) on an organization-wide scale.

Snowflake divides Data Governance into three categories, as shown in the image below:

Snowflake Data Governance

Snowflake Dynamic Data Masking comes under the “protect your data” section.

What is Snowflake Dynamic Data Masking? 


Now that you have a rough idea of data masking, let us look at how it relates to Snowflake and dive into the nuts and bolts of Snowflake Dynamic Data Masking. The feature is available in Snowflake Enterprise Edition (or higher). Snowflake Dynamic Data Masking lets you attach masking policies to selected columns; the policies are evaluated when a query runs, so the data stored in the table itself is not changed. 

Snowflake Dynamic Data Masking - Masked and Unmasked Snowflake Data

Let us have a quick look at how Snowflake Dynamic Data Masking works. The first step is to define the policy, which can be done as shown below: 

CREATE OR REPLACE MASKING POLICY phone_masking AS (val string)
RETURNS string ->
    CASE
        WHEN CURRENT_ROLE() IN ('ADMIN_TEAM', 'ACCOUNTING_TEAM') THEN val
        ELSE '[REDACTED]'
    END;

Now, let us analyze the code above. The policy takes a string value (val) and returns a string. Its CASE expression first checks the role active in the current session, as returned by CURRENT_ROLE(). If that role is ADMIN_TEAM or ACCOUNTING_TEAM, the original value is returned; for every other role, the value is replaced with '[REDACTED]'. This matches our definition of data masking: the data is revealed only to the relevant parties in the organization. 
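The CASE logic can be mimicked outside Snowflake to see which roles get the raw value. The sketch below is a local Python stand-in for phone_masking, not Snowflake code: the role is passed in explicitly where the real policy calls CURRENT_ROLE(), and the phone number is a hypothetical value.

```python
# Roles that see the raw value, mirroring the IN list in phone_masking.
UNMASKED_ROLES = {"ADMIN_TEAM", "ACCOUNTING_TEAM"}


def phone_masking(val: str, current_role: str) -> str:
    """Local stand-in for the policy body: the caller supplies the role
    that CURRENT_ROLE() would return inside Snowflake."""
    if current_role in UNMASKED_ROLES:
        return val
    return "[REDACTED]"


for role in ("ADMIN_TEAM", "ACCOUNTING_TEAM", "PUBLIC"):
    print(role, "->", phone_masking("555-0100", role))
```

Because the comparison is an exact string match, the role names in the policy must be written the way CURRENT_ROLE() returns them (uppercase for roles created without quotes).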

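You can then apply the policy to any column that needs it, provided the column's data type matches the policy's input type. Note that masking policies apply to table and view columns and are evaluated at query time: the stored data is unchanged, and each query result is masked according to the role that runs the query. 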

Let’s look at another example. Suppose you have created a Snowflake table in which some columns contain sensitive information. You will need to mask these columns, right? So, how do you go about it? 

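Assume you created a table named user_info with a CREATE TABLE statement and a view over it named user_info_v with a CREATE VIEW statement. The code below applies an existing masking policy named email_mask to the email column of both. (The body of email_mask, which reveals the email only to the ANALYST role, appears in the DESCRIBE output later in this post.)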

-- apply the masking policy to a table column

alter table if exists user_info modify column email set masking policy email_mask;

-- apply the masking policy to a view column

alter view user_info_v modify column email set masking policy email_mask;

Now, let us look at another situation: different users with different roles in your company want to query the table. How does the masking policy handle this? The policy checks the active role each time the query runs. In the code below, users with the ANALYST role see the full email address, while all other users see the masked value.

-- using the ANALYST role

use role analyst;
select email from user_info; -- should see plain text value

-- using the PUBLIC role

use role public;
select email from user_info; -- should see the masked value
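To see why the same SELECT returns different results, it helps to remember that the stored rows never change; the policy is applied to each value as the result is produced. The Python sketch below simulates that locally. The rows and the POLICIES mapping are hypothetical, and the email_mask function copies the body that DESCRIBE MASKING POLICY reports for email_mask later in this post (ANALYST sees the value, everyone else sees asterisks).

```python
from typing import Callable, Dict, List

# Hypothetical rows standing in for the user_info table; these stored values are never modified.
USER_INFO = [
    {"id": 1, "email": "ana@example.com"},
    {"id": 2, "email": "raj@example.com"},
]


def email_mask(val: str, current_role: str) -> str:
    """Mirrors the email_mask body: the raw value for ANALYST, asterisks otherwise."""
    return val if current_role == "ANALYST" else "*********"


# Column -> policy, playing the role of ALTER TABLE ... SET MASKING POLICY.
POLICIES: Dict[str, Callable[[str, str], str]] = {"email": email_mask}


def select_column(rows: List[dict], column: str, current_role: str) -> list:
    """Return one column, applying its masking policy (if any) while the result is built."""
    policy = POLICIES.get(column)
    return [policy(row[column], current_role) if policy else row[column] for row in rows]


print(select_column(USER_INFO, "email", "ANALYST"))  # plain-text emails
print(select_column(USER_INFO, "email", "PUBLIC"))   # masked emails
```

Columns without a policy, such as id here, are returned unchanged for every role.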

That’s it. There really is nothing much to it. The Snowflake Dynamic Data Masking policy is simple and intuitive to use. Furthermore, Snowflake Dynamic Data Masking offers companies immense data privacy benefits since sensitive information is only revealed to the concerned parties. 

Benefits of Dynamic Data Masking

Dynamic Data Masking (DDM) protects column data by making it visible only to authorized persons or groups of people.

The top 5 key benefits of using Dynamic Data Masking are summarized below:

  1. Ease of use: Dynamic Data Masking lets you write a policy once and apply it to thousands of columns across databases and schemas, instead of writing a separate policy for each column.
  2. Data Administration and Separation of Duties (SoD): The security officer or privacy officer, not the owner of the object, decides which columns to protect. Masking policies are easy to manage and support both centralized and distributed management models.
  3. Data authorization and governance: DDM supports data governance administered by security or privacy officers. Moreover, it prevents privileged users with the ACCOUNTADMIN or SECURITYADMIN role from unnecessarily viewing data.
  4. Data Sharing: DDM ensures that sensitive columns are masked before the data is shared.
  5. Change Management: A masking policy can be changed in one place, without having to reapply it to thousands of columns.
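Benefits 1 and 5 follow from the fact that columns point to a policy by name rather than holding their own copy of the masking logic. The Python sketch below models that relationship with in-memory dictionaries; the registry, helper functions, and table/column names are all hypothetical and only imitate CREATE MASKING POLICY, ALTER TABLE ... SET MASKING POLICY, and ALTER MASKING POLICY ... SET BODY.

```python
from typing import Callable, Dict, Optional, Tuple

Body = Callable[[str, str], str]  # (value, current_role) -> value shown to the user

# Hypothetical in-memory stand-ins for Snowflake objects, for illustration only.
policies: Dict[str, Body] = {}                 # policy name -> body
assignments: Dict[Tuple[str, str], str] = {}   # (table, column) -> policy name


def create_masking_policy(name: str, body: Body) -> None:
    policies[name] = body


def set_masking_policy(table: str, column: str, policy_name: str) -> None:
    assignments[(table, column)] = policy_name


def alter_masking_policy_body(name: str, body: Body) -> None:
    # Only the body changes; every column assignment still refers to the same name.
    policies[name] = body


def read_value(table: str, column: str, val: str, current_role: str) -> str:
    policy_name: Optional[str] = assignments.get((table, column))
    return policies[policy_name](val, current_role) if policy_name else val


# Write the policy once...
create_masking_policy("email_mask", lambda val, role: val if role == "ANALYST" else "*********")

# ...and reuse it on several columns (hypothetical table/column names).
for table, column in [("user_info", "email"), ("customers", "contact_email"), ("leads", "email")]:
    set_masking_policy(table, column, "email_mask")

# A single body change updates the behavior of every assigned column.
alter_masking_policy_body("email_mask", lambda val, role: val if role == "ANALYST" else "[MASKED]")

print(read_value("customers", "contact_email", "ana@example.com", "PUBLIC"))  # [MASKED]
```

After the single body change, every column assigned to email_mask returns [MASKED] for non-analyst roles, and none of the three assignments had to be touched.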

How to View the DDM policy body?

You can use the following command to view the details (signature, return type, and body) of an existing masking policy in your Snowflake account.

Syntax: Describe masking policy <POLICY_NAME>;

Example: 

Describe masking policy salesformask;
View Dynamic Data Masking Policy Example

How to Alter the existing masking policy?

Before altering an existing policy, you may want to view its current definition by calling the GET_DDL function or executing the DESCRIBE MASKING POLICY command. Note that the policy signature cannot be changed (that is, you cannot alter the argument name or the input/output data type of the policy); only its body can be replaced, as shown below.

Example:

describe masking policy email_mask;

-- evaluate output

+-----+------------+---------------+-------------------+-----------------------------------------------------------------------+
| Row | name       | signature     | return_type       | body                                                                  |
+-----+------------+---------------+-------------------+-----------------------------------------------------------------------+
| 1   | EMAIL_MASK | (VAL VARCHAR) | VARCHAR(16777216) | case when current_role() in ('ANALYST') then val else '*********' end |
+-----+------------+---------------+-------------------+-----------------------------------------------------------------------+

-- replace the fixed mask with a SHA-512 hash; the signature stays the same

alter masking policy email_mask set body ->
  case
    when current_role() in ('ANALYST') then val
    else sha2(val, 512)
  end;
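After this change, non-analyst roles see a SHA-512 digest instead of asterisks. The local Python sketch below shows what that output looks like. It assumes that sha2(val, 512) hashes the string's UTF-8 bytes and returns the digest as a 128-character lowercase hex string, which is what hashlib.sha512(...).hexdigest() produces; the function name is hypothetical. The function keeps the same shape as the policy (string in, string out), mirroring the rule that the signature cannot change.

```python
import hashlib


def email_mask_sha512(val: str, current_role: str) -> str:
    """Local stand-in for the altered email_mask body (hypothetical name)."""
    if current_role == "ANALYST":
        return val
    # Assumption: equivalent to Snowflake's sha2(val, 512) for UTF-8 strings.
    return hashlib.sha512(val.encode("utf-8")).hexdigest()


digest = email_mask_sha512("ana@example.com", "PUBLIC")
print(len(digest))                                      # 128
print(email_mask_sha512("ana@example.com", "ANALYST"))  # ana@example.com
```

Unlike a fixed string of asterisks, a deterministic hash still lets non-analyst roles count distinct emails or join on the column without seeing the addresses themselves.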

How to Drop the DDM policy?

The DROP MASKING POLICY command removes an existing policy from the system.

Syntax: DROP MASKING POLICY <name>;

Here, <name> is the name of the policy to drop.

Example:

drop masking policy ssn_mask;

Limitations of Dynamic Data Masking

Shared Objects: If a masking policy applied to a table or view column references an external function, that table or view cannot be shared. 

Drop Database or Drop Schema: To drop a database or schema, the masking policy and its mappings must be self-contained within that database or schema.

Virtual Columns: Masking policies are not supported on virtual columns. 

Future Grant: Future granting of masking policy permissions is not supported.

Conclusion 

In this post, you learned about Snowflake Dynamic Data Masking policies. You began with a short introduction to Snowflake and its features. Next, you were introduced to data masking and its benefits, and then you learned how to apply Snowflake Dynamic Data Masking. With this knowledge, you are better placed to protect your organization’s important data using Snowflake Dynamic Data Masking.

However, as a Developer, moving complex data from a diverse set of data sources into Snowflake can be quite challenging. This is where a simpler alternative like Hevo can save the day! Hevo Data is a No-Code Data Pipeline that offers a faster way to move data from 100+ Data Sources, including 40+ Free Sources, into Snowflake to be visualized in a BI tool.


Want to take Hevo for a spin?

SIGN UP and experience the feature-rich Hevo suite firsthand. You can also have a look at the unbeatable pricing that will help you choose the right plan for your business needs.

Share your experience with Snowflake Dynamic Data Masking in the comments section below!
