All of us have heard and seen about Data Breaches and the damage they can cause, both financially and reputationally. As more and more of our decisions are based on our data, data is becoming a critical asset for any organization. Therefore, securing your data is as important as protecting any other company asset.
Fortunately, there are many great features built into modern Data Warehouses for precisely managing who has access to what. This is known as Identity and Access Management, or IAM. This post explores how Google controls access to data with BigQuery, a service that Cloud providers commonly offer.
What is Google BigQuery?
Google BigQuery is a Cloud-based Data Warehouse that provides a Big Data Analytic Web Service for processing petabytes of data. It is intended for analyzing data on a large scale. It consists of two distinct components: Storage and Query Processing. It employs the Dremel Query Engine to process queries and is built on the Colossus File System for storage. These two components are decoupled and can be scaled independently and on-demand.
Key Features of Google BigQuery
- Scalable Architecture: BigQuery has a scalable architecture and offers a petabyte scalable system that users can scale up and down as per load.
- Faster Processing: Being a scalable architecture, BigQuery executes petabytes of data within the stipulated time and is more rapid than many conventional systems. BigQuery allows users to run analysis over millions of rows without worrying about scalability.
- Fully-Managed: BigQuery is a product of Google Cloud Platform, and thus it offers fully managed and serverless systems.
- Security: BigQuery has the utmost security level that protects the data at rest and in flight.
- Real-time Data Ingestion: BigQuery can perform real-time data analysis, thereby making it famous across all the IoT and Transaction platforms.
- Fault Tolerance: BigQuery offers replication that replicates data across multiple zones or regions. It ensures consistent data availability when the region/zones go down.
- Pricing Models: The Google BigQuery platform is available in both on-demand and flat-rate subscription models. Although data storage and querying will be charged, exporting, loading, and copying data is free. It has separated computational resources from storage resources. You are only charged when you run queries. The quantity of data processed during searches is billed.
Understanding BigQuery IAM Role Types
BigQuery IAM Policies consists of three types of Roles:
- Predefined Roles: The Google Cloud Access tokens provide granular access to a specific service and are managed by Google. Predefined Roles are designed to support common use cases and Access Control patterns.
- Custom Roles: The Custom Roles grant access according to permissions specified by the user.
- Basic Roles: The Basic Roles include the roles of Owner, Editor, and Viewer.
You can use one of the following methods to find out if given permission is included in a Predefined, Custom, or Basic Role:
As a result of assigning Predefined and Basic Roles to a user, the permissions granted are a union of the roles’ permissions.
Understanding BigQuery IAM Roles and Permissions
A User, Group, or Service account can be granted access to a Google BigQuery resource by assigning one or more roles. Google BigQuery resources can be accessed at the following levels:
- Google Cloud project or Organization Level
- At the Dataset Level
- View or Table Level
A) Roles Applied at Google Cloud Project or Organization Level
You can assign roles at the organization and project level to grant permission for running BigQuery jobs or accessing all BigQuery resources of a project.
B) Roles Applied at a Dataset Level
A dataset can be assigned roles so that users have access to a particular dataset without having complete access to a project’s resources. A BigQuery dataset is a child resource of a project in the BigQuery IAM hierarchy.
Controlling Access to Folders, Projects, and Organizations
Access is managed through Identity and Access Management (IAM). These BigQuery IAM policies are attached to Google Cloud resources. A collection of Role bindings in each policy binds a principal to a BigQuery IAM role, like a user or service account. The role bindings grant the principals the specified roles, both on the policy’s resource and all of its descendants.
Managing access to projects, folders, and organizations can be done using the Google Cloud Console, Google Cloud CLI, REST API, and Resource Manager Client Libraries.
How to View Current Access in BigQuery IAM?
With the Cloud Console, the gcloud CLI, the REST API, or Resource Manager client libraries, you can see who has access to your project, folder, or organization.
- Go to the Google BigQuery IAM page in the Cloud Console.
- Choose a project, folder, or organization. In the Cloud Console, you can see all the principals who have been assigned roles in your project, folder, or organization. Principals who have inherited roles on this resource from parent resources are included in this list.
How to Grant or Revoke a Single Role in BigQuery IAM?
You can quickly grant or revoke a single role for a single principle by using the Cloud Console or the gcloud CLI without directly editing the resource’s IAM policy. The most common types of principals are Google accounts, Google service accounts, Google groups, and domains.
A) Granting a Single Role using BigQuery IAM Policies
To assign a single role to a principal, follow these steps:
- Step 1: Go to the BigQuery IAM page in the Cloud Console.
- Step 2: Choose a project, a folder, or an organization.
- Step 3: Choose a principal to grant a role to:
- For a principal with other roles already on the resource, find the row containing the principal’s email address, click edit Edit principal, and then click add Add another role.
- If you wish to grant a role to a Google-managed service account, select Include Google-provided role grants to see its email address.
- To grant a role to an individual who does not currently have any roles on the resource, click person_add Add, then enter the individual’s email address.
- Step 4: Choose a role to grant from the drop-down list. Choosing a role that contains only the permissions the principal needs is the best security practice.
- Step 5: Add a condition to the role if desired.
- Step 6: Save. The principal is granted access to the resource.
If you want to assign a role to a principal for multiple projects, folders, or organizations, do the following:
- Step 1: Go to Manage resources in the Cloud Console.
- Step 2: You will need to select all resources for which permissions need to be granted.
- Step 3: Click the Show info panel. If the info panel is not visible after that, click Permissions.
- Step 4: Choose a principal to grant a role to:
- Adding another role to an existing principal is as simple as finding the row with the principal’s email address, clicking the Edit principal button, and clicking the Add another role button.
- Adding a principal without other roles requires clicking person_add Add principal, then entering their email address.
- Step 5: From the drop-down menu, choose a role to grant.
- Step 6: Add a condition to the role if desired.
- Step 7: Then click Save. The principal will be granted access to the selected resources.
B) How to Revoke a Single Role using BigQuery IAM?
Follow these steps to revoke a role from a principal:
- Step 1: Go to the BigQuery IAM page in the Cloud Console.
- Step 2: Choose a project, a folder, or an organization.
- Step 3: You want to revoke the principal’s access, so you need to locate the row with his email address. In that row, click Edit principal.
- Step 4: For each role, you want to revoke, click the Delete delete button, and then click Save.
C) How to Grant or Revoke Multiple Roles with BigQuery IAM?
To make large-scale access changes that involve granting and revoking multiple roles, use the read-modify-write pattern to update the resource’s IAM policy:
- Reading the current policy by calling getIamPolicy().
- Editing the returned policy, either by using a text editor or programmatically, adds or removes any principals or role bindings.
- Writing the updated policy by calling setIamPolicy().
To update the policy, you can use the gcloud CLI, the REST API, or the Resource Manager client libraries.
Control Access to Datasets in BigQuery
A dataset’s permissions specify which users, groups, and service accounts may access its tables, views, and table data. You can, for example, grant the bigquery.dataOwner Identity and Access Management (IAM) role to the owner of a dataset, which allows the owner to create, update and delete tables and views.
Access Controls can be applied during dataset creation by using the datasets.insert API method. It is impossible to apply Access Controls during dataset creation in the Google Cloud Console, the bq command-line tool, or data definition language (DDL) statements.
Once a dataset has been created, Access Controls can be applied in the following ways:
- You can use the Cloud Console.
- GRANT and REVOKE DCL statements are used.
- By using the bq update command in the bq command-line tool.
- Using the datasets.patch API method.
- Utilizing the client libraries.
A) Steps to Grant Access to a Dataset in BigQuery IAM
The following steps must be taken to grant access to a dataset:
- Step 1: Expand your project in the Explorer panel and then select a dataset.
- Step 2: Click the Share dataset button in the details panel.
- Step 3: Add the entity you wish to add to the Add members field of the Share dataset panel on the Dataset permissions tab. There are several entities you can select from:
- Using a Google Account Email: Grants access a Google account to access the dataset.
- Google Group: Provides access to a dataset to all members of a Google Group.
- Google Apps Domain: Access to the dataset is available to users and groups within a Google domain.
- Service Account: Allows access to the dataset by a service account.
- Anybody: Select all users to grant access to the general public.
- All Google Accounts: Select allAuthenticatedUsers to grant access to all service accounts and all Internet users with a Google account. For instance, user@gmail.com.
- Step 4: Select a role, choose BigQuery, and a predefined IAM role for the new members.
- Step 5: Click on Done.
B) Steps to Revoke Access to a Dataset in BigQuery
A dataset can be revoked by:
- Step 1: Expand your project in the Explorer panel and then select a dataset.
- Step 2: Click Share dataset in the details panel.
- Step 3: Expand the role whose membership you wish to change in the Dataset permissions tab of the Share dataset panel.
- Step 4: Delete the user account you wish to remove by clicking Delete delete.
- Step 5: In the Remove member? In a dialog box, click the Remove button.
- Step 6: And lastly, click Done.
Conclusion
In a modern Cloud platform, such as Google Cloud, many layers limit data access. Despite the many control layers, knowing the best practices for controlling access and granting permissions can be challenging. We discussed some recommended best practices for BigQuery IAM Management in this post. More so we discussed granting permissions and controlling access to your resources in BigQuery.
Share your experience of learning about BigQuery IAM in the comments section below!
Samuel is a versatile writer specializing in the data industry. With over seven years of experience, he excels in data science, data integration, and data analysis, crafting engaging content on these topics. He is also adept at WordPress development. Samuel holds a Bachelor's degree in Computer Science from Lagos State University.