ARRAY_AGG BigQuery Functions 101: Syntax and Usage

on BigQuery Functions, Data Warehouse, Google BigQuery, Tutorials • September 29th, 2021 • Write for Hevo

ARRAY_AGG BigQuery

As businesses and individuals have acknowledged the benefits of keeping their data in Cloud Storage Systems, it has increased the adoption of the Cloud as a Data Storage Platform. As a result, the majority of businesses have migrated their data from On-Premise databases to Cloud Storage Systems. The choice of data storage is critical for every company since it influences the speed with which queries are handled as well as the expenses associated. Analyzing data with pre-built functions such as ARRAY_AGG BigQuery, Max(), Min(), etc allows users to quickly perform data-related tasks.

Since its public availability in 2011, Google BigQuery has been marketed as a one-of-a-kind Analytics Data Warehouse solution. Its serverless design enables it to operate at scale and speed, allowing it to perform analytics on massive datasets in a matter of seconds. Since its debut, numerous features and enhancements have been added to increase speed, security, reliability, and ease of use for users to uncover insights.

In this article, you will delve into the basics of the Cloud Data Warehouse from Google, understand its features, and also learn about the ARRAY_AGG BigQuery Functions with key usage and examples. Read along to learn more about ARRAY_AGG BigQuery Functions.

Table of Contents

Prerequisites

  • Basic understanding of SQL.
  • Basic understanding of databases.

Introduction to Google BigQuery

Google BigQuery Logo
Image Source

Google BigQuery is a Cloud-based Data Warehouse with a Big Data Analytic Web Service that can handle Petabytes of data during analytics. Designed for large-scale Data Analysis, BigQuery is divided into two parts: Storage and Query Processing. Google BigQuery is built on the Colossus File System and uses the Dremel Query Engine over REST interface to process queries. These two parts are separated and rely on Google’s Jupiter Network to transport data incredibly quickly from one location to another. 

Google BigQuery employs a Columnar Storage structure, which facilitates rapid query processing and great data compression capabilities. It may be used in conjunction with other Google products and services, such as Predictive Analytics, Data Imports, and Google Analytics, to enhance your workflow.

Key Features of Google BigQuery

Some of the key features of Google BigQuery are as follows:

  • Flexible Scaling: With BigQuery, you do not have to manually adjust the cluster since computational resources are automatically scaled according to the workload, it can scale storage to Petabytes on-demand with ease. Google BigQuery also handles Patching, Updates, Computing, and Storage Resource Scaling, making this a totally managed service.
  • Partitioning: A partitioned table is a unique table that is separated into parts, known as partitions, to make data management and querying easier. The decoupled Storage and Computation architecture of Google BigQuery uses column-based partitioning to reduce the amount of data read from disc by slot workers. Once slot workers have read their data from the disc, BigQuery can automatically find more optimum data sharding and instantly repartition data using BigQuery’s in-memory shuffle function.
  • Programming Access: Google BigQuery can be readily accessible in applications using Rest API requests, Client Libraries like Java, .Net, Python, the Command-Line Tool, or the GCP Console. It also has built-in query and database administration tools. All query operations are ACID-Compliant, and in the event of a failure, the integrity of the modified data is preserved.
  • Performance: Considering Google BigQuery is designed on a column-based architecture, it provides numerous advantages over traditional row-based storage, including increased storage efficiency and the ability to scan data faster. It supports nested tables for more effective data storage and retrieval, reducing slot utilization, querying time, and data usage.
  • Robust Security: BigQuery, which is based on Google’s Colossus Global Storage System, guarantees no data loss to its users through Timely Backup, Data Replication, and Clever Recovery Techniques. For optimal data durability and service uptime, data is stored across several data centers by default. You may establish access rights to your datasets and projects for both groups and individuals if you have Admin capabilities.
  • Federated Query: Google BigQuery follows a unique approach for sending a query statement to an external database and getting the result as a temporary table. Assuming user data is stored in Bigtable, GCS, or Google Drive, they can query it directly from Google BigQuery by creating a connection with the external database using the Google BigQuery Connection API.
  • Integrated Machine Learning: The in-built BigQuery Machine Learning Capabilities allow users to immediately assess machine learning models with simple SQL queries without needing any machine learning technical skills. This avoids the need to create a separate machine learning solution or send it to a third-party platform.
  • Third-Party Integration: Google BigQuery has a free click-to-connect feature with other Google products and services. It also offers a variety of options for integrating with Google-partnered third-party apps.
  • Flexible Pricing: Google Bigquery has a flexible pricing model that only costs you for the resources you use. Compared to other available Cloud Data Warehouse options, it is an inexpensive alternative for maintaining and analyzing your data, with $0.02 per GB per month for storage, $0.01 per 200MB for streaming inserts, and a free first 1 TB of query data handled each month.

For further information on Google BigQuery, check out their official webite.

Functions in Google BigQuery

BigQuery Functions
Image Source

There are a number of functions offered by BigQuery, viz:

  • Aggregate functions: They offer a summary of a table’s values. E.g., SUM(field) and COUNT(field).
  • Date Functions: These functions let you alter the format of dates, choose the required field (day, month, or year), and move the date by a specified interval. E.g. DATE(Timestamp). You can learn more about the BigQuery Date Functions by clicking here.
  • String Functions: You may use String Functions to produce a string, pick and replace substrings, determine the length of a string and the index sequence of the substring in the original string. E.g. Lower(‘str’) and Upper(‘str’).
  • Window Functions: These functions are quite similar to Aggregate functions. The primary distinction is that Window Functions don’t calculate the whole set of data specified by the query but just on a subset or window of that data. E.g. MAX(), MIN().

Simplify BigQuery ETL and Analysis with Hevo’s No-code Data Pipeline

A fully managed No-code Data Pipeline platform like Hevo Data helps you integrate and load data from 100+ different sources (including 40+ free sources) to a Data Warehouse such as Google BigQuery or Destination of your choice in real-time in an effortless manner. Hevo with its minimal learning curve can be set up in just a few minutes allowing the users to load data without having to compromise performance. Its strong integration with umpteenth sources allows users to bring in data of different kinds in a smooth fashion without having to code a single line. 

Get Started with Hevo for free

Check out some of the cool features of Hevo:

  • Completely Automated: The Hevo platform can be set up in just a few minutes and requires minimal maintenance.
  • Transformations: Hevo provides preload transformations through Python code. It also allows you to run transformation code for each event in the Data Pipelines you set up. You need to edit the event object’s properties received in the transform method as a parameter to carry out the transformation. Hevo also offers drag and drop transformations like Date and Control Functions, JSON, and Event Manipulation to name a few. These can be configured and tested before putting them to use.
  • Connectors: Hevo supports 100+ integrations to SaaS platforms, files, Databases, analytics, and BI tools. It supports various destinations including Google BigQuery, Amazon Redshift, Firebolt, Snowflake Data Warehouses; Amazon S3 Data Lakes; and MySQL, SQL Server, TokuDB, DynamoDB, PostgreSQL Databases to name a few.  
  • Real-Time Data Transfer: Hevo provides real-time data migration, so you can have analysis-ready data always.
  • 100% Complete & Accurate Data Transfer: Hevo’s robust infrastructure ensures reliable data transfer with zero data loss.
  • Scalable Infrastructure: Hevo has in-built integrations for 100+ sources (including 40+ free sources) that can help you scale your data infrastructure as required.
  • 24/7 Live Support: The Hevo team is available round the clock to extend exceptional support to you through chat, email, and support calls.
  • Schema Management: Hevo takes away the tedious task of schema management & automatically detects the schema of incoming data and maps it to the destination schema.
  • Live Monitoring: Hevo allows you to monitor the data flow so you can check where your data is at a particular point in time.
Sign up here for a 14-day Free Trial!

Aggregate Functions in Standard SQL

Functions that summarize the rows of a group into a single value are known as Aggregate Functions. They are as follows:

  • ANY_VALUE(): Returns expression for some row chosen from the group.
  • ARRAY_AGG(): Returns an ARRAY of expression values. Later in this article, you will also learn about ARRAY_AGG BigQuery Functions.
  • ARRAY_CONCAT_AGG(): Concatenates items from an ARRAY expression and returns a single ARRAY as a result. The NULL items in non-NULL input arrays are recognized by this function, while the NULL elements in NULL input arrays are ignored.
  • AVG(): If the input contains a NaN, this function returns the average of non-NULL input values or NaN.
  • BIT_AND(): Returns the result of a bitwise AND operation on expression.
  • BIT_OR(): Returns the result of a bitwise OR operation on expression.
  • BIT_XOR(): Returns the result of a bitwise XOR operation on expression.
  • COUNT: Returns the number of different items in a given expression.
  • COUNTIF: This function returns the number of TRUE values in an expression.
  • LOGICAL_AND(): The logical AND of all non-NULL expressions is returned.
  • LOGIAL_OR(): Returns the logical OR of all non-NULL expressions.
  • MAX: Returns the maximum non-NULL value of an expression.
  • MIN: Returns the minimum non-NULL value of an expression.
  • STRING AGG(): Returns a value (STRING or BYTES) consisting of non-null values concatenated together.
  • SUM(): Returns the total of all non-zero values.

ARRAY_AGG BigQuery Functions

ARRAY_AGG BigQuery Functions
Image Source

Before proceeding to learn about the use of ARRAY_AGG BigQuery Functions, you must know what an array is. In BigQuery, an array is an ordered list consisting of zero or more values of the same data type. An array can also include NULL values.

The ARRAY_AGG BigQuery Functions returns an ARRAY of expression values. It is basically the opposite of UNNEST (used to flatten an array into its components to make it usable for analysis and database operations).

Syntax for ARRAY_AGG BigQuery Functions

ARRAY_AGG(
  [DISTINCT]
  expression
  [{IGNORE|RESPECT} NULLS]
  [ORDER BY key [{ASC|DESC}] [, ... ]]
  [LIMIT n]
)
[OVER (...)]
  • OVER: Used to specify a window, which refers to a group of rows in a table on which a function is applied. All other clauses in ARRAY_AGG BigQuery are presently incompatible with this clause ().
  • DISTINCT: Each distinct expression value is only aggregated once in the result.
  • IGNORE NULLS or RESPECT NULLS: If IGNORE NULLS is provided, the NULL values will be ignored. The NULL values are included in the outcome if RESPECT NULLS is provided. The NULL values are included in the result if neither is supplied. 
  • ORDER BY: This specifies the order in which the values are displayed. For each sort key, the default sort direction is ASC.
  • LIMIT: Limits the number of expression inputs that can be used in the result. The limit n must be an INT64.

Example 1

SELECT FORMAT("%T", ARRAY_AGG(x LIMIT 5)) AS array_agg
FROM UNNEST([2, 1, -2, 3, -2, 1, 2]) AS x;

+-------------------+
| array_agg         |
+-------------------+
| [2, 1, -2, 3, -2] |
+-------------------+

Example 2

SELECT
  x,
  FORMAT("%T", ARRAY_AGG(x) OVER (ORDER BY ABS(x))) AS array_agg
FROM UNNEST([2, 1, -2, 3, -2, 1, 2]) AS x;

+----+-------------------------+
| x  | array_agg               |
+----+-------------------------+
| 1  | [1, 1]                  |
| 1  | [1, 1]                  |
| 2  | [1, 1, 2, -2, -2, 2]    |
| -2 | [1, 1, 2, -2, -2, 2]    |
| -2 | [1, 1, 2, -2, -2, 2]    |
| 2  | [1, 1, 2, -2, -2, 2]    |
| 3  | [1, 1, 2, -2, -2, 2, 3] |
+----+-------------------------+

Conclusion

In this article, you learned about what makes Google BigQuery fast and scalable and its other features. The article also offers a better idea about the ARRAY_AGG BigQuery Functions. This method is useful when we have a denormalized table (or query result) and need to aggregate all data labels into array format to get fewer rows. As a result, query performance is improved, allowing for faster and more efficient analysis of big data stored in pre-joined tables, especially from object-based schemas.

With your Data Warehouse, Google BigQuery live and running, you’ll need to extract data from multiple platforms to carry out your analysis. However, integrating and analyzing your data from a diverse set of data sources can be challenging and this is where Hevo Data comes into the picture.

Visit our Website to Explore Hevo

Hevo Data, a No-code Data Pipeline provides you with a consistent and reliable solution to manage data transfer between a variety of sources and a wide variety of Desired Destinations such as Google BigQuery, with a few clicks. Hevo Data with its strong integration with 100+ sources (including 40+ free sources) allows you to not only export data from your desired data sources & load it to the destination of your choice, but also transform & enrich your data to make it analysis-ready so that you can focus on your key business needs and perform insightful analysis using BI tools.

Want to take Hevo for a spin? Sign Up for a 14-day free trial and experience the feature-rich Hevo suite first hand. You can also have a look at our unbeatable pricing that will help you choose the right plan for your business needs!

Share your experience of learning about ARRAY_AGG BigQuery Functions. Let us know in the comments section below!

No-code Data Pipeline for Google BigQuery