Google BigQuery Aggregate Functions: Simplified 101

on BigQuery Functions, Data Aggregation, Google BigQuery, Google BigQuery ML • September 29th, 2021 • Write for Hevo

Google BigQuery is a Cloud Storage platform provided by Google. It provides its users with an online platform where they can store their data. Google BigQuery offers many benefits to its users. It is very elastic, scaling up and down to meet the changing storage and compute needs of its users. Google BigQuery also uses various mechanisms to ensure that the users’ data is secure. Data stored in Google BigQuery can be accessed anytime and from any location, giving users improved accessibility. 

Google BigQuery organizes the user’s data into tables, which are made up of rows and columns. It makes it easy for the users to understand their data. When using Google BigQuery, you will need to get summaries from your data. For example, you may need to know the total number of rows contained in a table, the average value for a particular table column, and more. 

Instead of calculating these manually, Google BigQuery provides Google BigQuery Aggregate Functions to make your work easier. You simply have to call the Google BigQuery Aggregate Function, and it will return the results to you. In this article, you will learn about Google BigQuery Aggregate Functions in detail. 

Table of Contents

Prerequisites

  • A Google BigQuery account.

Introduction to Google BigQuery

Google BigQuery Logo
Image Source

Google BigQuery is a cost-effective, serverless, and highly scalable multi-cloud data warehouse designed and developed to offer business agility.  Google BigQuery was developed by Google, hence, it uses the processing power of Google’s infrastructure. It comes with built-in machine learning capabilities that can help you to understand your data better. 

You can use Google BigQuery in the following three main ways:

  • Load and Export Data: With Google BigQuery, you can easily and quickly load your data into Google BigQuery. Google BigQuery will then process your data, after which you can export it for further analysis. 
  • Query and View Data: Google BigQuery allows you to run interactive queries. You can also run batch queries and create virtual tables from data. 
  • Manage Data: You can use Google BigQuery to list jobs, datasets, projects, and tables. It’s easy to get information about any of these and patch or update your datasets. Google BigQuery makes it feasible for you to delete and manage any data that you upload. 

Google BigQuery also allows you to create dashboards and reports that you can use to analyze your data and gain meaningful insights from it. It is also a powerful tool for real-time Data Analytics. 

Key Features of Google BigQuery

Google BigQuery helps companies store a large volume of data and run analysis on it. There are many more features for choosing Google BigQuery. A few of them are listed below:

  • Google BigQuery ML: Google BigQuery comes with a Google BigQuery ML feature that allows users to create, train and execute Machine Learning models in Data Warehouse using standard SQL.
  • Integrations: Google BigQuery offers easy integrations with other Google products and its partnered apps. Moreover, developers can easily create integration with its API.
  • Fault-tolerant Structure: It delivers a fault-tolerant structure to prevent data loss and provide real-time logs for any error in an ETL process.

To learn more about Google BigQuery, click here.

Simplify Data Analysis with Hevo’s No-code Data Pipeline

Hevo Data, a No-code Data Pipeline helps to load data from any data source such as Databases, SaaS applications, Cloud Storage, SDK,s, and Streaming Services and simplifies the ETL process. It supports 100+ data sources and is a 3-step process by just selecting the data source, providing valid credentials, and choosing the destination. Hevo not only loads the data onto the desired Data Warehouse/destination but also enriches the data and transforms it into an analysis-ready form without having to write a single line of code.

Get Started with Hevo for Free

Its completely automated pipeline offers data to be delivered in real-time without any loss from source to destination. Its fault-tolerant and scalable architecture ensures that the data is handled in a secure, consistent manner with zero data loss and supports different forms of data. The solutions provided are consistent and work with different BI tools as well.

Check out why Hevo is the Best:

  1. Secure: Hevo has a fault-tolerant architecture that ensures that the data is handled in a secure, consistent manner with zero data loss.
  2. Schema Management: Hevo takes away the tedious task of schema management & automatically detects the schema of incoming data and maps it to the destination schema.
  3. Minimal Learning: Hevo, with its simple and interactive UI, is extremely simple for new customers to work on and perform operations.
  4. Hevo Is Built To Scale: As the number of sources and the volume of your data grows, Hevo scales horizontally, handling millions of records per minute with very little latency.
  5. Incremental Data Load: Hevo allows the transfer of data that has been modified in real-time. This ensures efficient utilization of bandwidth on both ends.
  6. Live Support: The Hevo team is available round the clock to extend exceptional support to its customers through chat, E-Mail, and support calls.
  7. Live Monitoring: Hevo allows you to monitor the data flow and check where your data is at a particular point in time.
Sign up here for a 14-Day Free Trial!

Understanding Google BigQuery Aggregate Functions

An Aggregate Function does calculations on a set of values and returns only one value. They use multiple rows of data as inputs and form a single value with a significant meaning. Most Aggregate Functions ignore null values. They are used together with the SELECT statement and other SQL clauses. 

Google BigQuery supports different types of Aggregate Functions. These can help you to obtain summaries from your data stored in Google BigQuery. In the next section, we will be discussing how to use different Google BigQuery Aggregate Functions.

Google BigQuery Aggregate Functions

Google BigQuery supports different types of Aggregate Functions. The 3 Google BigQuery Aggregate Functions are listed below:

1) Google BigQuery Aggregate Functions: COUNT

The COUNT function returns the total number of rows in the input. It can be used with any of the syntax given below:

COUNT(*) [OVER (...)]

And,

COUNT(
  [DISTINCT]
  expression
)
[OVER (...)]

The first syntax above will return the number of rows found in the input. The second syntax will return the total number of rows with expressions that evaluate to any value other than null, and expression can take any data type. If you use the DISTINCT clause, the expression must belong to a groupable data type. The OVER clause is optional and it specifies a window. 

The COUNT function can help you to return the number of rows contained in a table or the number of distinct values in an expression. For example:

SELECT
  COUNT(*) AS count_all,
  COUNT(DISTINCT x) AS count_distinct
FROM UNNEST([2, 5, 4, 9, 5]) AS x;

In the above query, we are counting the total number of items in the input as well as the total number of distinct items in the input. It should return the following output as shown in the image below:

Google BigQuery Aggregate Functions: COUNT
Image Source

The output shows that there are 5 items in the input, and 4 distinct items in the input (5 have been repeated twice in the input).

2) Google BigQuery Aggregate Functions: AVG

The AVG function returns the average of all non-null values in the input, or NaN if there is a NaN in the input. It supports numeric inputs, like INT64. Note that if the input is a floating-point type, the result will be non-deterministic, meaning that you may get different results each time you use it with the same inputs. 

It takes the following syntax:

AVG(
  [DISTINCT]
  expression
)
[OVER (...)]

The OVER clause is optional and it specifies a window. If you use it with the DISTINCT clause, each distinct value in the expression will only be aggregated once into the result. 

Consider the example given below:

SELECT AVG(x) as avg
FROM UNNEST([0, 4, 6, 9, 1]) as x;

The query will calculate and return the average of the items in the input. Their sum is 20, and there are 5 items, so the average is 4 as as shown in the image below:

Google BigQuery Aggregate Functions: AVG
Image Source

You can combine the function with the DISTINCT clause as shown below:

SELECT AVG(DISTINCT x) AS avg
FROM UNNEST([0, 4, 7, 9, 4]) AS x;

There are two 4s in the input, but only one will be counted. The reason is that we have used the DISTINCT clause. The query will return the following result as shown in the image below:

Google BigQuery Aggregate Functions: AVG with DISTINCT CLAUSE
Image Source

3) Google BigQuery Aggregate Functions: SUM

The SUM function calculates and returns the sum of non-null values. It takes the following syntax:

SUM(
  [DISTINCT]
  expression
)
[OVER (...)]

If the value of the expression is a floating-point value, the output of the function will be non-deterministic. It can also be used together with the optional DISTINCT and OVER clauses.

Consider the following example:

SELECT SUM(x) AS sum
FROM UNNEST([3, 2, 4, 4, 1, 4, 2, 2, 3]) AS x;

The query will return the sum of all items in the input as shown below:

Google BigQuery Aggregate Functions: SUM
Image Source

To avoid summing duplicate values in the input, combine it with the DISTINCT clause as shown below:

SELECT SUM(DISTINCT x) AS sum
FROM UNNEST([3, 2, 4, 4, 1, 4, 2, 2, 3]) AS x;

The query will return the following as shown in the image below:

Google BigQuery Aggregate Functions: SUM with DISTINCT CLAUSE
Image Source

That is how Google BigQuery Aggregate Functions work.

Conclusion

In this article, you read about Google BigQuery and Google BigQuery Aggregate Functions. You also learnt how to use Google BigQuery Aggregate Functions. These functions save a lot of time and manual processes for simple calculations and counting. They are easy to use and help developers get summaries of data to get an overview of data before querying it further.

Visit our Website to Explore Hevo

Companies store valuable data from multiple data sources into Google BigQuery. The manual process to transfer data from source to destination is a tedious task. Hevo Data is a No-code Data Pipeline that can help you transfer data from any data source to desired Google BigQuery. It fully automates the process to load and transform data from 100+ sources to a destination of your choice without writing a single line of code. 

Want to take Hevo for a spin? Sign Up here for a 14-day free trial and experience the feature-rich Hevo suite first hand.

Share your experience of learning about Google BigQuery Aggregate Functions in the comments section below!

No-code Data Pipeline For your Google BigQuery