The damage that data breaches can do to companies and their reputations is well known. To ensure security, access to data inside and outside an organization must be tightly controlled on a granular level. Sometimes, sensitive information is mixed in with non-sensitive data, and access restrictions need to be enabled for sensitive data based on factors like the location of data or the presence of financial information.
The good news is that modern data warehouses are built with excellent capabilities for controlling precisely who has access to which data. These capabilities are commonly referred to as Identity and Access Management or IAM.
This article introduces Google BigQuery and explains its Column and Row-Level Security in detail. Read along to learn the implementation, use cases, and limitations of these 2 Google Bigquery Security Techniques!
Table of Contents
What is Google BigQuery?
Google BigQuery is a cloud-based scalable Data warehouse that provides fast data processing and huge storage space for your data. It uses a serverless architecture and does not require any infrastructure configuration on your part. It is also cable of automatically replicating your data stored in various locations. This way high availability and reliability of your business data are ensured.
Unlike some cloud-based data warehouses on the market, this tool from Google can run with little to no maintenance. Additionally, Google allows you to access its data warehouse from any location that has Google Cloud availability. Another benefit of using Google BigQuery is that you get smooth and free integrations with other Google-based services like Google Analytics, Google Drive, etc.
What are the Types of Google BigQuery Security?
Google BigQuery Security is available in the following 2 types:
1) Column Level BigQuery Security
Column-level security gives you control over what data is filtered and what columns are made available to a user based on qualifying conditions. At query time, policies can be set up to check whether a user has the appropriate access to read each column using BigQuery Column-level security. This form of access control sits on top of the existing dataset access control. For access to data protected by column-level security, users need dataset and policy tag permission
2) Row-Level BigQuery Security
You can use row-level security to filter data and enable access to specific rows in a table based on conditions that qualify the user. Using policy tags, access controls can be implemented for projects, datasets, tables, and column-level security. By establishing row-level access policies, row-level security extends the principle of least privilege by enabling fine-grained access control to specific subsets of data in BigQuery tables.
Hevo Data, a No-code Data Pipeline helps to Load Data from any data source such as Databases, SaaS applications, Cloud Storage, SDK,s, and Streaming Services and simplifies the ETL process. It supports 100+ data sources and loads the data onto the desired Data Warehouse like Google BigQuery, enriches the data, and transforms it into an analysis-ready form without writing a single line of code.
Its completely automated pipeline offers data to be delivered in real-time without any loss from source to destination. Its fault-tolerant and scalable architecture ensure that the data is handled in a secure, consistent manner with zero data loss and supports different forms of data. The solutions provided are consistent and work with different Business Intelligence (BI) tools as well.
Get Started with Hevo for Free
Check out why Hevo is the Best:
- Secure: Hevo has a fault-tolerant architecture that ensures that the data is handled in a secure, consistent manner with zero data loss.
- Schema Management: Hevo takes away the tedious task of schema management & automatically detects the schema of incoming data and maps it to the destination schema.
- Minimal Learning: Hevo, with its simple and interactive UI, is extremely simple for new customers to work on and perform operations.
- Hevo Is Built To Scale: As the number of sources and the volume of your data grows, Hevo scales horizontally, handling millions of records per minute with very little latency.
- Incremental Data Load: Hevo allows the transfer of data that has been modified in real-time. This ensures efficient utilization of bandwidth on both ends.
- Live Support: The Hevo team is available round the clock to extend exceptional support to its customers through chat, email, and support calls.
- Live Monitoring: Hevo allows you to monitor the data flow and check where your data is at a particular point in time.
Sign up here for a 14-Day Free Trial!
What are the Steps to Implement the Column Level BigQuery Security?
It begins with Google’s Data Catalog, which manages data and metadata. The first step is to define the taxonomy (Collection) and policy tags (Type of individual column) in the data catalog.
Step 1: Create Policy Tags
Using BigQuery, sensitive columns can be easily accessed by choosing policy tags that identify the data type. In addition, column-level security in BigQuery allows you to create policies that check a user’s access rights at query time.
First, click on ‘Create and manage policy tags’ under the ‘Data Catalog’ section.
After that, we must create a Taxonomy, which is nothing more than a collection of tags. Click ‘+CREATE’
Please note that taxonomies and tags can only be applied to tables residing in the exact location as BigQuery. Our Taxonomy and tags will be created in the US location.
We can then create all of the tags under this Taxonomy.
Step 2: Apply Policy Tags to the Bigquery Column
The BigQuery Web UI allows you to view a table’s schema while using schema annotations to assign a policy tag to each column where access is restricted.
You can do this by tagging these fields and clicking ‘Edit Schema.’ After selecting one or more fields, we can click ‘Add policy tag.’ Please note that one policy tag can be associated with each field.
Once again, you will only view the Taxonomy and its tags within one location. Once you click save, you have successfully tagged one field.
Continue to tag all fields that need to be restricted until all are tagged. It is important to note that different tags are used depending on the type of data, which will help separate access based on roles later on.
Step 3: Grant Access to the Policy Tags
To restrict access to policy tags, use Identity and Access Management (IAM) policies. For the user to query the columns that have level security applied, you need to grant “Fine-grained access” to the policy tag you defined on the column to the IAM group.
Navigate to the Catalog section and configure the Taxonomy. For example, selecting the Policy Tag allowed us to add a Member to the ‘Data Catalog’ -> ‘Fine-grained Reader’ role.
The ‘Fine-grained Reader’ permission can be applied at various levels, including taxonomies, policy tags, and child policy tags. In addition, having this capability allows us to archive access control granularity with more flexibility.
How to Work with Column Level BigQuery Security?
Once your Column Level BigQuery Security is in place, you can start your work on its Tables. The following are 2 operations that you can perform:
1) Reading Data From A Table With Column Level BigQuery Security
It is impossible to query columns with column-level security enforced if you don’t access the policy. The following points should be kept in mind while reading from a Table with Column Level Security:
- A user with access to Dataset and Catalog Reader role can see the column data. Then, a query is run as usual.
- If you do not possess the Data Catalog Fine-Grained Reader role but have dataset access, you cannot see the column data. An error will appear if you run SELECT *, which lists the columns you cannot access. The error can be fixed by excluding the columns you cannot access from your query.
- The same is true for views. You can query the view column only if you have access to the table and column. Alternatively, you will have to use the list of columns instead of the SELECT *.
2) Loading Data Into a Table with Column Level BigQuery Security
All data loading/modifying operations will work fine without any problems if you have full access (to both the table and the columns). Some points to keep in mind are:
- In the case of table access but no column clearance, Inserts should work fine. It will not be possible to update, delete, or merge.
- The loading of files is not permitted.
- Streaming inserts are permitted.
What is the Pricing of Column Level BigQuery Security?
The Data Catalog also incurs a cost in addition to the standard price for BigQuery.
Pricing for Data Catalog is two-fold:
- Data Catalog charges you based on the size of your metadata.
- Data Catalog API operations are billed to you.
What are the Best Practices for using the Column Level BigQuery Security?
Among Google’s best practices, you can test-run in monitor-only mode before enforcing access policies for your organization. As the name implies, this model does not implement access control, but it audits what your policy tags do.
Assumptions underlying this best practice include:
- Your data is already accessible by a set of authorized users.
- Ideally, you want to determine whether enforcement of new column-level security changes will prevent those users from accessing the data as expected.
- To operate only in monitor mode, you will need to create a taxonomy and assign policy tags to columns, but you won’t need to enforce access yet. You can then continue to allow your previously authorized users access to the system. Each time they access the system, an audit trail is created.
- The audit logs can be scanned to view access to policy-protected columns. First, check whether any unauthorized access occurred. In other words, whether the access would have resulted in a PERMISSION_DENIED error if the policy had been enforced. Once you are satisfied that column-level security has been set up correctly, implement access control. If access control is not enforced, you will not see PERMISSION_DENIED errors.
What are the Limitations of the Column Level BigQuery Security?
Column Level BigQuery Security comes along with the following limitations:
- The policy tags on the destination table are lost if you overwrite the table.
- Column policies cannot be defined in table DDLs.
- Policy tags can only be applied to a single column.
How to Implement the Row Level BigQuery Security?
In BigQuery, row-level security allows different user personas to access other parts of the same table. Currently, these use cases are based on authorized views, but RLS can simplify its management.
Data owners and administrators can determine which users can view rows by creating a low-level access policy. Administrators or data owners can create row-level access policies for target tables, and these policies will be applied when a query is run on the table. Several policies can be used at the same table at the same time. SESSION_USER() can also specify row-level access policies so that the user running the query can only access rows that belong to them. Anyone without access to any row access policies will not see the data in the table.
BigQuery displays a banner indicating that those queries might be filtered by row-level access policies when a user queries a table with a row-level access policy. No matter whether a user is a member of the *grantee_list*, the notice will appear nonetheless.
What are the Use Cases for the Row Level BigQuery Security?
The Row-Level BigQuery Security can be used as follows:
- To control access to data for filter conditions, row-level access policies are helpful. The filter predicate of row-access policies can support arbitrary SQL expressions. It is conceptually related to a SQL query’s WHERE clause.
- The SESSION_USER() function allows filter predicates to restrict access only to rows belonging to the user running the query. The querying user will not have access to the data in the table if none of the row access policies applies to them. There is currently no support for subqueries in filter expressions to filter data based on lookup tables and calculated values. A future release will support subqueries in filter expressions.
- The DDL statement can be used to create, update, and drop row-level access policies. In addition, the BigQuery schema pane in the Cloud Console simplifies the management of policies per table, and the bq command-line tool allows you to see a list of row-level access policies applied to a table.
Conclusion
This article introduced you to Google BigQuery and its 2 security types, Column Level & Row Level. It further covered the implementation, working, best practices, and limitations of the Column Level BigQuery Security. Furthermore, the article explained the various aspects like implementation and use cases related to the Row Level BigQuery Security. Now, you can decide which security will suit your needs better.
Visit our Website to Explore Hevo
Google BigQuery is a great Data Warehousing service. However, first, you need to transfer data from multiple sources to this Data Warehouse for further analysis. Building an in-house solution for this process could be an expensive and time-consuming task Hevo Data, on the other hand, offers a No-code Data Pipeline that can automate your data transfer process, hence allowing you to focus on other aspects of your business like Analytics, Customer Management, etc. This platform allows you to transfer data from 100+ sources such as QuickBooks to Cloud-based Data Warehouses like Snowflake, Google BigQuery, Amazon Redshift, etc. It will provide you with a hassle-free experience and make your work life much easier.
Want to take Hevo for a spin? Sign Up for a 14-day free trial and experience the feature-rich Hevo suite first hand.
Share your views on Google BigQuery Security in the comments section!