Google BigQuery Logs & Audit Logging Simplified 101

on API, Cloud Computing, Data Aggregation, Data Analytics, Data Integration, Data Security, Google BigQuery, Marketing Cloud • September 15th, 2021

Google BigQuery is one of the most popular Enterprise Data Warehouse solutions used by businesses to manage and analyze their data. One of the biggest features of Google BigQuery is its Serverless Architecture. Essentially, this allows organizations to use SQL queries without having to worry about Server Management. The Warehouse also has its own analysis engine which makes it easy for organizations to sequence and analyze petabytes of data within a matter of minutes. Google BigQuery Logs, on the other hand, are a set of logs offered by Google Cloud that give you insight into operational issues with your Google Cloud services.

This article explains how to use Google BigQuery to examine logged activity and gives specifics about BigQuery-specific log information, along with some best practices that you must implement to level up your Log Analysis strategies! It will also provide you with an in-depth understanding of why it is essential to carry out Log Analysis and how you can route your logs to Google BigQuery.


Introduction to Google BigQuery


Google BigQuery is a robust Serverless Data Warehouse solution that lets you store and analyze large amounts of data in seconds. It allows you to leverage its powerful querying & processing abilities to analyze data from various sources such as Google Analytics and many more.

It further comes with a plethora of features that allow companies to evaluate their data, including Geospatial Analysis, Machine Learning, and Business Intelligence (BI).

Key features of Google BigQuery

  • Scalability: Google BigQuery offers true scalability and consistent performance using its massively parallel computing and secure storage engine.
  • Data Ingestion Formats: Google BigQuery allows users to load data in various formats such as AVRO, CSV, JSON, etc.
  • Built-in AI & ML: It supports predictive analysis using its AutoML Tables feature, a codeless interface that helps develop models with best-in-class accuracy. Google BigQuery ML is another feature that supports algorithms such as K-Means and Logistic Regression.
  • Parallel Processing: It uses a cloud-based parallel query processing engine that reads data from thousands of disks at the same time.

For further information on Google BigQuery, you can check the official website here.

Understanding Google BigQuery Audit Logs

Google BigQuery Logs are a series of audit logs that are provided by Google Cloud. BigQuery Logs are designed to give businesses a more comprehensive insight into their use of Google Cloud’s services, as well as providing log information specific to Google BigQuery itself.

The message system uses a structured format, with Google BigQuery offering three discrete message options. These include:

  • Audit Log: These are the logs used by Google BigQuery Connections and Reservations for reporting a request.
  • Audit Data: This is the older version of Google BigQuery’s audit logs. These mainly report API invocations.
  • BigQuery Audit Metadata: The newer version of Google BigQuery’s audit logs, used for reporting on resource interactions. For instance, when a query job reads from or writes to a table, it will be reported by these logs. Similarly, the logs will also specify tables that expired because an expiration time was configured for them. 

To put it simply, a log is just a record of a request handled by a server. These records are designed to give you a better understanding of how you use Google Cloud’s services.

3 Key Google BigQuery Log Formats

The Audit Log Message system is based on Structured Logs, and the Google BigQuery service offers three types of Log Formats:

  • AuditData Format: The messages for AuditData are carried in the protoPayload.serviceData sub-message, which can be found in the LogEntry message. 
  • AuditLog Format: The AuditLog format is used by Google BigQuery Reservations and Connections when reporting on a request. Important information can be found in these logs, including:
    • resource.type
    • resource.labels.location
  • AuditMetadata Format: Details for BigQuery Audit Metadata can be found in the protoPayload.metadata sub-message, which is also part of the LogEntry message. In these messages, the protoPayload.serviceData field is not used or set. Additional information about the resource type and the BigQuery dataset can be found in these messages.
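To make the formats above more concrete, here is a minimal Python sketch that pulls the format-specific details out of an audit log entry. The sample entry is a hand-made, truncated illustration of the shape described above, not a real log record, and the project name is a placeholder.

```python
# A hand-made, truncated example of an audit LogEntry (illustrative only).
sample_entry = {
    "logName": "projects/my-project/logs/cloudaudit.googleapis.com%2Fdata_access",
    "resource": {
        "type": "bigquery_dataset",
        "labels": {"location": "US", "project_id": "my-project"},
    },
    "protoPayload": {
        "serviceName": "bigquery.googleapis.com",
        # BigQueryAuditMetadata details live in the metadata sub-message.
        "metadata": {"tableDataRead": {"reason": "JOB"}},
    },
}

def audit_details(entry: dict) -> dict:
    """Return the resource type, location, and the sub-message holding the
    format-specific payload (metadata for BigQueryAuditMetadata,
    serviceData for the older AuditData format)."""
    payload = entry.get("protoPayload", {})
    return {
        "resource_type": entry["resource"]["type"],
        "location": entry["resource"]["labels"].get("location"),
        "details": payload.get("metadata") or payload.get("serviceData"),
    }

print(audit_details(sample_entry)["resource_type"])  # bigquery_dataset
```

Checking protoPayload.metadata before protoPayload.serviceData mirrors the newer-format-first guidance above: a given entry carries one or the other, not both.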

Understanding the Importance of Analyzing BigQuery Logs

Google BigQuery Logs analysis is essential for a variety of reasons. If your company has a website, you should know how it’s performing. Conducting analysis using BigQuery Logs is a great way to identify problems. Log Analysis can help you in many ways:

1) Identifying Bugs

Larger websites usually have a lot of changes taking place on a regular basis. It can become a bit overwhelming to identify and analyze each change individually. In certain situations, this can lead to links breaking. You might end up with 404 links or 302 redirects that lead to nowhere.

If your website is a bit too big to crawl properly, you could conduct a thorough Log Analysis, provided you are using Google BigQuery as your primary database. This will give you a better understanding of different error types and allow you to prioritize certain things over others. 

2) Figuring out Indexing and Crawling Problems

This is perhaps the biggest use of analyzing your Google BigQuery Logs. In case Google is unable to crawl your website properly, it could lead to a myriad of issues. 

However, with Google BigQuery log files, you can easily identify the problem. Logs are generated as Google crawls the website, and they capture the specific areas where the search engine is having issues. Using BigQuery is a convenient way to access and query such logs. SQL is designed to support incredibly complicated queries, so you could easily create one that tells you how long Google takes to crawl specific pages. 
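The kind of analysis described above can be sketched in a few lines of Python on already-parsed server log rows. The field names (page, status, crawl_ms) and the sample data are assumptions for illustration; in practice you would run an equivalent SQL query over the log table in BigQuery.

```python
from collections import Counter, defaultdict

# Hypothetical parsed crawl-log rows (field names are assumptions).
rows = [
    {"page": "/blog/a", "status": 200, "crawl_ms": 120},
    {"page": "/blog/b", "status": 404, "crawl_ms": 35},
    {"page": "/blog/a", "status": 200, "crawl_ms": 180},
    {"page": "/old-url", "status": 404, "crawl_ms": 40},
]

# How many requests ended in each status code (e.g. 404s, 302s)?
status_counts = Counter(r["status"] for r in rows)

# Average crawl time per page -- the metric mentioned above.
per_page = defaultdict(list)
for r in rows:
    per_page[r["page"]].append(r["crawl_ms"])
avg_crawl = {page: sum(ms) / len(ms) for page, ms in per_page.items()}

print(status_counts[404])    # 2
print(avg_crawl["/blog/a"])  # 150.0
```

Grouping and averaging like this maps directly onto a GROUP BY query in BigQuery, which scales the same idea to billions of rows.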

Understanding the Basics of Cloud Audit Logging

If you use the Google Cloud Platform in one way or another for monitoring or managing applications, you should definitely consider using it with Google BigQuery. This will allow you to generate queries and write requests to gain access to very specific information. 

Google Cloud Audit Logs can be used along with Google BigQuery to identify problems and implement solutions. The Google Cloud Platform is capable of emitting three distinct types of Audit logs. These include:

  • System Event: Entries are logged for administrative actions taken by Google systems (rather than by direct user activity) that modify resource configurations.
  • Admin Activity: Entries are logged whenever a user modifies resource configurations or metadata, such as through API calls.
  • Access to Data: Entries are logged for all API calls that read configuration or metadata, as well as calls that read or write user data. 
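Each of the three audit log types above lands in its own named log, which you can select with a Cloud Logging filter. The sketch below builds such filters; the log ID suffixes are the ones Cloud Audit Logs uses, while the project ID is a placeholder.

```python
# Log IDs used by Cloud Audit Logs for the three audit log types.
AUDIT_LOG_IDS = {
    "admin_activity": "cloudaudit.googleapis.com%2Factivity",
    "data_access": "cloudaudit.googleapis.com%2Fdata_access",
    "system_event": "cloudaudit.googleapis.com%2Fsystem_event",
}

def audit_log_filter(project_id: str, log_type: str) -> str:
    """Return a Log Viewer / Cloud Logging API filter that selects
    one audit log type within a project."""
    return f'logName="projects/{project_id}/logs/{AUDIT_LOG_IDS[log_type]}"'

print(audit_log_filter("my-project", "data_access"))
```

Pasting the resulting filter into the Log Viewer (or passing it to the Logging API) narrows the view to just that audit log type.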

It’s fairly easy to view access logs using the Google Cloud Platform console. All you have to do after logging in is select Logging and then open the Log Viewer. 

[Image: Log history shown in the Cloud Logging Log Viewer]

You must understand that while Project-Level logs can be viewed directly in the console, Organization-Level logs are available only through the Cloud Logging API. If you want to view all of your audit logs in a single place, you can move them using a Data Pipeline solution like Hevo Data.

Simplify BigQuery ETL using Hevo’s No-code Data Pipelines

A fully managed No-code Data Pipeline platform like Hevo Data helps you integrate and load data from 100+ Data Sources (including 40+ free sources) to a Data Warehouse such as Google BigQuery or a Destination of your choice in real-time in an effortless manner. Hevo, with its minimal learning curve, can be set up in just a few minutes, allowing users to load data without having to compromise performance. Its strong integration with a multitude of sources allows users to bring in data of different kinds in a smooth fashion without having to write a single line of code. 

Get Started with Hevo for Free

Its completely automated Data Pipeline delivers data in real-time without any loss from source to destination. Its fault-tolerant and scalable architecture ensures that the data is handled in a secure, consistent manner and supports different forms of data. The solutions provided are consistent and work with different Business Intelligence (BI) tools as well.

Check out why Hevo is the Best:

  • Secure: Hevo has a fault-tolerant architecture that ensures that the data is handled in a secure, consistent manner with zero data loss.
  • Schema Management: Hevo takes away the tedious task of schema management & automatically detects the schema of incoming data and maps it to the destination schema.
  • Minimal Learning: Hevo, with its simple and interactive UI, is extremely simple for new customers to work on and perform operations.
  • Hevo Is Built to Scale: As the number of sources and the volume of your data grows, Hevo scales horizontally, handling millions of records per minute with very little latency.
  • Incremental Data Load: Hevo allows the transfer of data that has been modified in real-time. This ensures efficient utilization of bandwidth on both ends.
  • Live Support: The Hevo team is available round the clock to extend exceptional support to its customers through chat, email, and support calls.
  • Live Monitoring: Hevo allows you to monitor the data flow and check where your data is at a particular point in time.
Sign up here for a 14-Day Free Trial!

Routing Log Entries to Google BigQuery

BigQuery Logs can be routed directly using Sinks. A Sink controls the manner in which Cloud Logging routes your logs. With Sinks, you can easily route logs to different destinations, such as Google BigQuery. You must understand that Log Sinks are independent of each other, and every log entry passes through the Log Router. Sinks are usually created at the project level in Google Cloud, but you can also build Aggregated Sinks at the folder or organization level if you prefer.
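A sink that routes BigQuery audit logs into a BigQuery dataset needs two pieces: a destination URI and a log filter. The sketch below assembles both and prints the equivalent gcloud command; the sink name, project ID, and dataset ID are placeholders, and the filter shown is one reasonable choice, not the only one.

```python
def sink_destination(project_id: str, dataset_id: str) -> str:
    """Destination URI a sink uses to write into a BigQuery dataset."""
    return f"bigquery.googleapis.com/projects/{project_id}/datasets/{dataset_id}"

def bigquery_audit_filter() -> str:
    """A filter matching audit log entries emitted by the BigQuery service."""
    return 'protoPayload.serviceName="bigquery.googleapis.com"'

def gcloud_create_command(sink_name: str, project_id: str, dataset_id: str) -> str:
    """Assemble the gcloud command that would create this sink."""
    return (
        f"gcloud logging sinks create {sink_name} "
        f"{sink_destination(project_id, dataset_id)} "
        f"--log-filter='{bigquery_audit_filter()}'"
    )

print(gcloud_create_command("bq-audit-sink", "my-project", "audit_logs"))
```

After the sink is created, Cloud Logging grants it a service account that must be given write access to the destination dataset before entries start flowing.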

Here’s a brief overview of how Cloud Logging works:

[Image: Overview of how Cloud Logging routes logs through the Log Router to sinks]

Best Practices for Cloud Audit Logs

Cloud Audit Logs are designed to improve security and compliance by allowing businesses to maintain an Audit Trail in their Google Cloud.

With Cloud Audit Logs, your business can achieve a higher level of transparency over all administrative resources and activities, allowing your company to function in a smooth and efficient manner. 

1) Best Tips to Configure Cloud Audit Logs for Data Access

  • All Data Access Audit Logs, apart from Google BigQuery, are disabled from the get-go. You must enable them individually and tweak individual aspects of each Log. You can create service-specific configurations and optimize the kind of information that is recorded by the Audit Logs. 
  • Ideally, you will want to run a test Google Cloud project to determine whether your Data Access Audit Logs are being generated properly. Keep in mind that by default, only Google BigQuery has Data Access Audit Logs enabled. 
  • Data Access Audit Logs are also generally quite large. You are likely to spend more on additional storage costs, so it’s best to exclude logging data that you feel won’t be useful. 
  • When setting up Cloud Audit Logs, you also need to apply appropriate internal controls for Identity and Access Management (IAM). Setting up appropriate Cloud Logging rules is essential so access is only granted to cleared individuals for viewing the Logs.

2) Naming Conventions Adopted in Google BigQuery Logs

It’s important to understand that certain naming conventions must be adhered to when receiving BigQuery Logs from the Google Cloud Platform:

  • Log entry field names must not exceed 128 characters.
  • Unsupported characters must be removed from field names and replaced with underscore characters. 
  • All log entry fields that belong to the LogEntry type use the same Google BigQuery field names as their corresponding log entry fields.
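The first two conventions above can be sketched as a small sanitizer. This is an illustrative assumption, not BigQuery's exact algorithm: the character set permitted here (letters, digits, underscores) and the truncation behavior are simplifications.

```python
import re

def to_bigquery_field_name(name: str, max_len: int = 128) -> str:
    """Sanitize a log entry field name along the lines of the conventions
    above: replace characters assumed unsupported in BigQuery column names
    with underscores, then truncate to the maximum length."""
    sanitized = re.sub(r"[^A-Za-z0-9_]", "_", name)
    return sanitized[:max_len]

print(to_bigquery_field_name("jsonPayload.http-status"))  # jsonPayload_http_status
```

Running names through a check like this before loading helps predict how log entry fields will appear as BigQuery columns.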

Conclusion

This article introduced you to Google BigQuery Logs and provided you with an in-depth understanding of Google Cloud Audit Logging. It further provided you with some essential practices you must follow while working with Google BigQuery Logs. Carrying out an insightful Log Analysis would require you to load your Log Data from Google Cloud to BigQuery. This might sound easy, but it can be challenging, especially for beginners, and this is where Hevo Data saves the day!

Visit our Website to Explore Hevo

Hevo Data allows you to transfer data from 100+ sources such as Google Cloud Storage to Cloud-based Data Warehouses like Snowflake, Google BigQuery, Amazon Redshift, Firebolt, etc. It will provide you with a hassle-free experience and make your work life much easier.

Want to take Hevo for a spin? Sign Up for a 14-day free trial and experience the feature-rich Hevo suite first hand.

Share your experience of learning about Google BigQuery Logs and Cloud Audit Logging. Let us know in the comments section below!
