How to Use BigQuery STRING_AGG Function? Explained in 3 Easy Steps

on Data Warehouse, Google BigQuery • December 13th, 2021 • Write for Hevo

Bigquery String_agg- Featured Image

Google BigQuery is a quick and powerful Cloud-based Data Warehousing solution for businesses of any size, with capabilities such as rapid SQL querying and interactive analysis of enormous datasets.

While you are working with a large number of values in Google BigQuery, it’s generally useful to be familiar with some of the regularly used SQL aggregation functions, such as BigQuery STRING_AGG, which can return a group of values in a single pass. This might assist you in better understanding and acting on the data you already have.

This blog will illuminate the basics of using the BigQuery STRING_AGG function along with examples to help you understand better. 

Table of Contents

What is Google BigQuery?

Google BigQuery Logo: BigQuery STRING_AGG
Image Source: Google BigQuery

Google BigQuery has become one of the most popular Cloud-based Data Warehouses over the years. It runs large and complex SQL queries, performs in-depth analysis, stores and visualizes data. It is one of the best platforms for business acceleration. It supports Geospatial Analysis, Business Intelligence, Machine Learning, Data Scanning, and more. 

BigQuery is a fast and serverless enterprise Data Warehouse that can perform large dataset queries on terabytes in seconds and petabytes in minutes. Google BigQuery supports a plethora of functions and operators including array functions, aggregate functions like BigQuery STRING_AGG, Bit functions like BIT_COUNT(), and many more. 

Google manages the serverless design infrastructure and is highly scalable. But, remember it is you who needs to upload the information into the BigQuery Data Warehouse. Later, the platform helps ingest, store, analyze, and visualize the data for business acceleration.

BigQuery also supports Columnar Architecture and different data models to manage and store dynamic data. It is a cost-effective, speedy, and easy to implement Data Warehouse for analyzing large datasets and creating detailed insight reports. Check out some of the key features of Google’s BigQuery, an enterprise Data Warehouse.

Key Features of Google BigQuery

Google BigQuery is a trusted Cloud-based Data Warehouse that is highly beneficial for enterprises and their growth. 

  • Managed Under Google: Google plays a vital role in managing the enterprise Data Warehouse Infrastructure. BigQuery is fully managed and monitored by Google. It stores, updates, and deploys all information from its end so that you can focus on analysis rather than resource management. Also, Google will be the first one to receive an update if a task fails.
  • Rapid Speed: Compared to other platforms, BigQuery can run SQL queries on terabytes and petabytes in seconds and minutes. Also, it can process billions of row expressions in the blink of an eye.
  • Implementation: Google BigQuery is one of the cost-effective cloud-based platforms on Google that require no cluster deployment, virtual machines, software, or additional tools. One can easily implement and upload data in batches on BigQuery and run queries for analysis.
  • Easy Scalability: Google BigQuery provides a Highly Scalable Storage Engine for consistent performance. Google has a complex software stack that manages the infrastructure. You don’t have to pay for the whole thing, but only for the portion, you use. 
  • Integrations: Google BigQuery allows integration with other Google products/services, application APIs, and third-party services not supported by Google.

Simplify Data Analysis with Hevo’s No-code Data Pipeline

Hevo Data, a No-code Data Pipeline helps to load data from any data source such as Databases, SaaS applications, Cloud Storage, SDK’s, and Streaming Services and simplifies the ETL process. It supports 100+ data sources and is a 3-step process by just selecting the data source, providing valid credentials, and choosing the destination. Hevo not only loads the data onto the desired Data Warehouse/destination but also enriches the data and transforms it into an analysis-ready form without having to write a single line of code.

Get Started with Hevo for Free

Its completely automated pipeline offers data to be delivered in real-time without any loss from source to destination. Its fault-tolerant and scalable architecture ensure that the data is handled in a secure, consistent manner with zero data loss and supports different forms of data. The solutions provided are consistent and work with different BI tools as well.

Check out why Hevo is the Best:

  1. Secure: Hevo has a fault-tolerant architecture that ensures that the data is handled in a secure, consistent manner with zero data loss.
  2. Schema Management: Hevo takes away the tedious task of schema management & automatically detects the schema of incoming data and maps it to the destination schema.
  3. Minimal Learning: Hevo, with its simple and interactive UI, is extremely simple for new customers to work on and perform operations.
  4. Hevo Is Built To Scale: As the number of sources and the volume of your data grows, Hevo scales horizontally, handling millions of records per minute with very little latency.
  5. Incremental Data Load: Hevo allows the transfer of data that has been modified in real-time. This ensures efficient utilization of bandwidth on both ends.
  6. Live Support: The Hevo team is available round the clock to extend exceptional support to its customers through chat, E-Mail, and support calls.
  7. Live Monitoring: Hevo allows you to monitor the data flow and check where your data is at a particular point in time.
Sign up here for a 14-Day Free Trial!

What are String Functions?

Strings are crucial functions for any dataset that can be manipulated and transformed in order to create a huge impact on data analysis. The string function is commonly used in computer programming languages. The main purpose of these functions is to manipulate a query and perform operations on character data. 

There are two different values: STRING and BYTES data types on which these strings generally work. When working on STRING types, the result value or returned positions are referred to as characters or bytes. The value 1 describes the first character (or byte), and so on. But the 0 value is counted as an invalid index in the strings function. 

What are String Aggregate Functions?

STRING_AGG is a member of the SQL aggregate function that summarizes all row expressions and concatenates a set of string values. They follow different parameters and signs like a hyphen to separate expressions and generate a concatenated string. For example in BigQuery, the string aggregate function is denoted as BigQuery STRING_AGG function. 

Using BigQuery’s STRING_AGG Function

BigQuery STRING_AGG Function: Description

BigQuery STRING_AGG function returns a value by gathering all expressions in a row and generating a concatenated string of non-null values. The value can be either STRING or BYTES. The result generally depends on the first expression taken by the BigQuery STRING_AGG function. But, it will return NULL only if the row expressions evaluate NULL or zero input rows are specified.

The concatenated expressions are separated through a delimiter if available, or else a comma will be used in its place.

BigQuery STRING_AGG Function: Syntax

The syntax for the BigQuery STRING_AGG function is as follows: 

STRING_AGG(
  [DISTINCT]
  expression [, delimiter]
  [ORDER BY key [{ASC|DESC}] [, ... ]]
  [LIMIT n]
)
[OVER (...)]

BigQuery STRING_AGG Function: Example Queries

Here are a few examples to help you better understand and apply the BigQuery STRING_AGG function.

SELECT STRING_AGG(vegetable) AS string_agg
FROM UNNEST(["Cauliflower", NULL, "Turnip", "Potato", "Turnip"]) AS vegetable;

+----------------------------------------------+
| string_agg                                   |
+----------------------------------------------+
| Cauliflower, Turnip, Potato, Turnip          |
+----------------------------------------------+


SELECT STRING_AGG(vegetable, " & ") AS string_agg
FROM UNNEST(["Cauliflower", "Turnip", "Potato", "Turnip"]) AS vegetable;

+------------------------------------------------------+
| string_agg                                           |
+------------------------------------------------------+
| Cauliflower & Turnip & Potato & Turnip               |
+------------------------------------------------------+

SELECT STRING_AGG(DISTINCT vegetable, " & ") AS string_agg
FROM UNNEST(["Cauliflower", "Turnip", "Potato", "Turnip"]) AS vegetable;

+-----------------------------------------+
| string_agg                              |
+-----------------------------------------+
| Cauliflower & Turnip & Potato           |
+-----------------------------------------+

SELECT STRING_AGG(vegetable, " & " ORDER BY LENGTH(vegetable)) AS string_agg
FROM UNNEST(["Cauliflower", "Turnip", "Potato", "Turnip"]) AS vegetable;

+-----------------------------------------------------+
| string_agg                                          |
+-----------------------------------------------------+
| Turnip & Turnip & Cauliflower & Potato              |
+-----------------------------------------------------+

SELECT STRING_AGG(vegetable, " & " LIMIT 2) AS string_agg
FROM UNNEST(["Cauliflower", "Turnip", "Potato", "Turnip"]) AS vegetable;

+-----------------------------+
| string_agg                  |
+-----------------------------+
| Cauliflower & Turnip        |
+-----------------------------+

SELECT STRING_AGG(DISTINCT vegetable, " & " ORDER BY vegetable DESC LIMIT 2) AS string_agg
FROM UNNEST(["Cauliflower", "Turnip", "Potato", "Turnip"]) AS vegetable;

+-----------------------+
| string_agg            |
+-----------------------+
| Turnip & Potato       |
+-----------------------+

SELECT
  vegetable,
  STRING_AGG(vegetable, " & ") OVER (ORDER BY LENGTH(vegetable)) AS string_agg
FROM UNNEST(["Cauliflower", NULL, "Turnip", "Potato", "Turnip"]) AS vegetable;

+-------------+------------------------------------------------------+
| vegetable   | string_agg                                           |
+-------------+------------------------------------------------------+
| NULL        | NULL                                                 |
| Turnip      | Turnip & Turnip                                      |
| Turnip      | Turnip & Turnip                                      |
| Cauliflower | Turnip & Turnip & Cauliflower                        |
| Potato         | Turnip & Turnip & Cauliflower & Potato            |
+-------------+------------------------------------------------------+

BigQuery STRING_AGG Function: Arguments & Optional Clauses

BigQuery STRING_AGG function’s supported argument types also include STRING BYTES. Check out the order in which clauses are applied:

  1. OVER: This clause describes a window, which currently does not support all other clauses available within STRING_AGG().
  2. DISTINCT: Only once the distinct value of the expression can be aggregated into the result.
  3. ORDER BY: This clause describes the order of the values.
    • ASC is the default sort direction for every sort key.
    • NULLs: NULLs are the minimum possible value in the case of the ORDER BY clause. This implies the value appears firstly in ASC sorts and at last in the DESC sorts.
    • Floating-point data types: These are visible on ordering and grouping.
    • The sort key must be similar to the expression in case DISTINCT is specified.
    • In case the ORDER BY clause has not been specified, it gets difficult to determine the order of the elements in the output array, i.e., chances are that you may receive different values every time you practice the function. 
  4. LIMIT: The clause describes the maximum input expression received in the result. These are the input strings and not the input characters or bytes. An empty string will be valued as 1. A NULL string has no value and is not considered. The limit n must value as a constant INT64.

Conclusion

In this blog, we explored BigQuery and the String Aggregate Functions in SQL. With the help of the BigQuery STRING_AGG function, you can manipulate and transform complex dataset queries and perform successful operations and analysis. 

The String Aggregate Functions in SQL are highly useful as it helps transform all expression rows into a single string. The results generated from the string function can be either STRING or BYTES. Further, we have shared the supported Argument Type and optional clauses of the BigQuery STRING_AGG Function. 

A lot of companies store their valuable data from multiple data sources into Google BigQuery. The manual process to transfer data from source to destination is a tedious task, which is why you should consider our super-cool and light product- Hevo. 

Visit our Website to Explore Hevo

Hevo Data is a No-code Data Pipeline that can help you transfer data from any data source to desired Google BigQuery. It fully automates the process to load and transform data from 100+ sources to a destination of your choice without writing a single line of code. 

Want to take Hevo for a spin? Sign Up here for a 14-day free trial and experience the feature-rich Hevo suite first hand.

Share your experience of learning about the Google BigQuery STRING_AGG function in the comments section below!

No-code Data Pipeline For Google BigQuery