Google BigQuery is a Cloud Storage platform provided by Google. It provides its users with an online platform where they can store their data. Google BigQuery offers many benefits to its users. It is very elastic, scaling up and down to meet the changing storage and compute needs of its users. Google BigQuery also uses various mechanisms to ensure that the users’ data is secure. Data stored in Google BigQuery can be accessed anytime and from any location, giving users improved accessibility. 

Google BigQuery organizes the user’s data into tables, which are made up of rows and columns. It makes it easy for the users to understand their data. When using Google BigQuery, you will need to get summaries from your data. For example, you may need to know the total number of rows contained in a table, the average value for a particular table column, and more. 

Instead of calculating these manually, Google BigQuery provides Google BigQuery Aggregate Functions to make your work easier. You simply have to call the Google BigQuery Aggregate Function, and it will return the results to you. In this article, you will learn about Google BigQuery Aggregate Functions in detail. 

Prerequisites

  • A Google BigQuery account.

Introduction to Google BigQuery

Google BigQuery

Google BigQuery is a cost-effective, serverless, and highly scalable multi-cloud data warehouse designed and developed to offer business agility.  Google BigQuery was developed by Google, hence, it uses the processing power of Google’s infrastructure. It comes with built-in machine learning capabilities that can help you to understand your data better. 

You can use Google BigQuery in the following three main ways:

  • Load and Export Data: With Google BigQuery, you can easily and quickly load your data into Google BigQuery. Google BigQuery will then process your data, after which you can export it for further analysis. 
  • Query and View Data: Google BigQuery allows you to run interactive queries. You can also run batch queries and create virtual tables from data. 
  • Manage Data: You can use Google BigQuery to list jobs, datasets, projects, and tables. It’s easy to get information about any of these and patch or update your datasets. Google BigQuery makes it feasible for you to delete and manage any data that you upload. 

Google BigQuery also allows you to create dashboards and reports that you can use to analyze your data and gain meaningful insights from it. It is also a powerful tool for real-time Data Analytics. 

Simplify Data Analysis with Hevo’s No-Code Data Pipeline for Google BigQuery

Hevo Data offers a no-code data pipeline that makes it incredibly easy to load data from over 150+ sources—including databases, SaaS applications, cloud storage, and streaming services—into Google BigQuery
Key Benefits of Using Hevo with Google BigQuery:

  • Automated, Real-Time Data Transfer: Hevo ensures real-time data delivery to BigQuery without any loss or delays, allowing for seamless, up-to-date analytics.
  • Fault-Tolerant and Scalable: Hevo’s architecture is designed to scale as your data grows, handling millions of records per minute while maintaining data security and integrity.
  • Schema Management: Hevo automatically detects incoming data schemas and maps them to BigQuery, eliminating manual schema management.

Hevo is a robust and scalable solution, making it an ideal partner for managing and analyzing large-scale datasets in Google BigQuery.

Get Started with Hevo for Free

Key Features of Google BigQuery

Google BigQuery helps companies store a large volume of data and run analysis on it. There are many more features for choosing Google BigQuery. A few of them are listed below:

  • Google BigQuery ML: Google BigQuery comes with a Google BigQuery ML feature that allows users to create, train and execute Machine Learning models in Data Warehouse using standard SQL.
  • Integrations: Google BigQuery offers easy integrations with other Google products and its partnered apps. Moreover, developers can easily create integration with its API.
  • Fault-tolerant Structure: It delivers a fault-tolerant structure to prevent data loss and provide real-time logs for any error in an ETL process.

Understanding Google BigQuery Aggregate Functions

An Aggregate Function does calculations on a set of values and returns only one value. They use multiple rows of data as inputs and form a single value with a significant meaning. Most Aggregate Functions ignore null values. They are used together with the SELECT statement and other SQL clauses. 

Google BigQuery supports different types of Aggregate Functions. These can help you to obtain summaries from your data stored in Google BigQuery. In the next section, we will be discussing how to use different Google BigQuery Aggregate Functions.

Google BigQuery Aggregate Functions

Google BigQuery supports different types of Aggregate Functions. The 3 Google BigQuery Aggregate Functions are listed below:

1) COUNT

The COUNT function returns the total number of rows in the input. It can be used with any of the syntax given below:

COUNT(*) [OVER (...)]

And,

COUNT(
  [DISTINCT]
  expression
)
[OVER (...)]

The first syntax above will return the number of rows found in the input. The second syntax will return the total number of rows with expressions that evaluate to any value other than null, and expression can take any data type. If you use the DISTINCT clause, the expression must belong to a groupable data type. The OVER clause is optional and it specifies a window. 

The COUNT function can help you to return the number of rows contained in a table or the number of distinct values in an expression. For example:

SELECT
  COUNT(*) AS count_all,
  COUNT(DISTINCT x) AS count_distinct
FROM UNNEST([2, 5, 4, 9, 5]) AS x;

In the above query, we are counting the total number of items in the input as well as the total number of distinct items in the input. It should return the following output as shown in the image below:

Google BigQuery Aggregate Functions: COUNT

The output shows that there are 5 items in the input, and 4 distinct items in the input (5 have been repeated twice in the input).

2) AVG

The AVG function returns the average of all non-null values in the input, or NaN if there is a NaN in the input. It supports numeric inputs, like INT64. Note that if the input is a floating-point type, the result will be non-deterministic, meaning that you may get different results each time you use it with the same inputs. 

It takes the following syntax:

AVG(
  [DISTINCT]
  expression
)
[OVER (...)]

The OVER clause is optional and it specifies a window. If you use it with the DISTINCT clause, each distinct value in the expression will only be aggregated once into the result. 

Consider the example given below:

SELECT AVG(x) as avg
FROM UNNEST([0, 4, 6, 9, 1]) as x;

The query will calculate and return the average of the items in the input. Their sum is 20, and there are 5 items, so the average is 4 as as shown in the image below:

Google BigQuery Aggregate Functions: AVG

You can combine the function with the DISTINCT clause as shown below:

SELECT AVG(DISTINCT x) AS avg
FROM UNNEST([0, 4, 7, 9, 4]) AS x;

There are two 4s in the input, but only one will be counted. The reason is that we have used the DISTINCT clause. The query will return the following result as shown in the image below:

Google BigQuery Aggregate Functions: AVG with DISTINCT CLAUSE

3) SUM

The SUM function calculates and returns the sum of non-null values. It takes the following syntax:

SUM(
  [DISTINCT]
  expression
)
[OVER (...)]

If the value of the expression is a floating-point value, the output of the function will be non-deterministic. It can also be used together with the optional DISTINCT and OVER clauses.

Consider the following example:

SELECT SUM(x) AS sum
FROM UNNEST([3, 2, 4, 4, 1, 4, 2, 2, 3]) AS x;

The query will return the sum of all items in the input as shown below:

Google BigQuery Aggregate Functions: SUM

To avoid summing duplicate values in the input, combine it with the DISTINCT clause as shown below:

SELECT SUM(DISTINCT x) AS sum
FROM UNNEST([3, 2, 4, 4, 1, 4, 2, 2, 3]) AS x;

The query will return the following as shown in the image below:

Google BigQuery Aggregate Functions: SUM with DISTINCT CLAUSE

That is how Google BigQuery Aggregate Functions work.

4) MAX

The MAX function returns the maximum value in the input. It works on numeric, date, or string data types, and it ignores NULL values. The syntax is as follows:

MAX(
  expression
)
[OVER (...)]

The OVER clause is optional and specifies a window for performing the aggregation. The function can be used to find the largest value in a dataset.

Example:

SELECT MAX(x) AS max_value
FROM UNNEST([7, 2, 9, 3, 5]) AS x;

In this example, the query will return the maximum value of 9 as shown in the output.

max output

5) MIN

The MIN function returns the minimum value in the input. Like the MAX function, it works on numeric, date, or string data types, and ignores NULL values. The syntax is as follows:

MIN(
  expression
)
[OVER (...)]

For Example:

SELECT MIN(x) AS min_value
FROM UNNEST([8, 5, 2, 9, 6]) AS x;

This query will return 2 as the minimum value in the input dataset.

min output

Conclusion

In this article, you read about Google BigQuery and Google BigQuery Aggregate Functions. You also learnt how to use Google BigQuery Aggregate Functions. These functions save a lot of time and manual processes for simple calculations and counting. They are easy to use and help developers get summaries of data to get an overview of data before querying it further. Master the art of data aggregation in BigQuery with a focus on summing values across multiple columns.

Companies store valuable data from multiple data sources into Google BigQuery. The manual process to transfer data from source to destination is a tedious task. Hevo Data is a No-code Data Pipeline that can help you transfer data from any data source to desired Google BigQuery. It fully automates the process to load and transform data from 100+ sources to a destination of your choice without writing a single line of code. 

Want to take Hevo for a spin? Try Hevo’s 14-day free trial and experience the feature-rich Hevo suite firsthand.

Share your experience of learning about Google BigQuery Aggregate Functions in the comments section below!

FAQs

1. What is an aggregate function in BigQuery?

An aggregate function in BigQuery performs calculations on multiple rows of data and returns a single result. These functions are used to compute summaries of data, like counts, sums, averages, maximums, and minimums.

2. What are the five aggregate functions of SQL?

The five common aggregate functions in SQL are:

COUNT() – Counts the number of rows or non-null values.
SUM() – Returns the sum of numeric values.
AVG() – Calculates the average of non-null values.
MIN() – Finds the minimum value.
MAX() – Finds the maximum value.

3. What is the AVG function in BigQuery?

The AVG function in BigQuery returns the average (mean) of non-null values from a specified column or expression. If any floating-point values are included, the result might be non-deterministic. You can also use the DISTINCT clause to calculate the average of unique values.

Nicholas Samuel
Technical Content Writer, Hevo Data

Nicholas Samuel is a technical writing specialist with a passion for data, having more than 14+ years of experience in the field. With his skills in data analysis, data visualization, and business intelligence, he has delivered over 200 blogs. In his early years as a systems software developer at Airtel Kenya, he developed applications, using Java, Android platform, and web applications with PHP. He also performed Oracle database backups, recovery operations, and performance tuning. Nicholas was also involved in projects that demanded in-depth knowledge of Unix system administration, specifically with HP-UX servers. Through his writing, he intends to share the hands-on experience he gained to make the lives of data practitioners better.