While you are working with a large number of values in Google BigQuery, it’s generally useful to be familiar with some of the regularly used SQL aggregation functions, such as BigQuery STRING_AGG, which can return a group of values in a single pass. This might assist you in better understanding and acting on the data you already have.

6 Best Practices for Optimizing Google BigQuery
Download Your Free EBook Now

This blog will illuminate the basics of using the STRING_AGG BigQuery function, along with examples to help you understand better.

What is Google BigQuery?

Google BigQuery Logo

Google BigQuery has become one of the most popular cloud-based data warehouses in the years. It runs large and complex SQL queries, performs in-depth analysis, and stores and visualizes data. It is a fast and serverless enterprise Data Warehouse that can perform large dataset queries on terabytes in seconds and petabytes in minutes. Google BigQuery supports many functions and operators, including array functions, aggregate functions like BigQuery STRING_AGG, Bit functions like BIT_COUNT(), and many more. 

BigQuery also supports Columnar Architecture and different data models to manage and store dynamic data. It is a cost-effective, speedy, and easy-to-implement data warehouse that analyzes large datasets and creates detailed insight reports. Check out some key features of Google’s BigQuery, an enterprise Data Warehouse.

Seamlessly Migrate to BigQuery with Hevo

Are you having trouble migrating your data into Redshift? With our no-code platform and competitive pricing, Hevo makes the process seamless and cost-effective.

Check out why Hevo is best:

  • Easy Integration: Connect and migrate data into BigQuery without any coding.
  • Auto-Schema Mapping: Automatically map schemas to ensure smooth data transfer.
  • In-Built Transformations: Transform your data on the fly with Hevo’s powerful transformation capabilities.
  • 150+ Data Sources: Access data from over 150 sources, including 60+ free sources.

You can see it for yourselves by looking at our 2000+ happy customers, such as Meesho, Cure.Fit, and Pelago.

Get Started with Hevo for Free

Key Features of Google BigQuery

Google BigQuery is a trusted Cloud-based Data Warehouse that is highly beneficial for enterprises and their growth.

  • Rapid Speed: BigQuery can run SQL queries on terabytes and petabytes compared to other platforms in seconds and minutes. Also, it can process billions of row expressions in the blink of an eye.
  • Implementation: Google BigQuery is one of the cost-effective cloud-based platforms on Google that requires no cluster deployment, virtual machines, software, or additional tools. One can easily implement and upload data in batches on BigQuery and run queries for analysis.
  • Easy Scalability: Google BigQuery provides a Highly Scalable Storage Engine for consistent performance. Google’s complex software stack manages the infrastructure. You don’t have to pay for the whole thing; you pay only for the portion you use. 
  • Integrations: Google BigQuery allows integration with other Google products/services, application APIs, and third-party services not supported by Google.

What are String Functions?

Strings are crucial functions for any dataset that can be manipulated and transformed in order to create a huge impact on data analysis. The string function is commonly used in computer programming languages. The main purpose of these functions is to manipulate a query and perform operations on character data. 

There are two different values: STRING and BYTES data types on which these strings generally work. When working on STRING types, the result value or returned positions are referred to as characters or bytes. The value 1 describes the first character (or byte), and so on. But the 0 value is counted as an invalid index in the strings function. 

What are String Aggregate Functions?

STRING_AGG is a member of the SQL aggregate function that summarizes all row expressions and concatenates a set of string values. They follow different parameters and signs like a hyphen to separate expressions and generate a concatenated string. For example in BigQuery, the string aggregate function is denoted as BigQuery STRING_AGG function

Using BigQuery’s STRING_AGG Function

BigQuery STRING_AGG Function: Description

BigQuery STRING_AGG function returns a value by gathering all expressions in a row and generating a concatenated string of non-null values. The value can be either STRING or BYTES. The result generally depends on the first expression taken by the BigQuery STRING_AGG function. But, it will return NULL only if the row expressions evaluate NULL or zero input rows are specified.

The concatenated expressions are separated through a delimiter if available, or else a comma will be used in its place.

Sync BigQuery to BigQuery
Sync Amazon S3 to BigQuery
Sync AWS Elasticsearch to BigQuery

BigQuery STRING_AGG Function: Syntax

The syntax for the BigQuery STRING_AGG function is as follows: 

STRING_AGG(
  [DISTINCT]
  expression [, delimiter]
  [ORDER BY key [{ASC|DESC}] [, ... ]]
  [LIMIT n]
)
[OVER (...)]

BigQuery STRING_AGG Function: Example Queries

Here are a few examples to help you better understand and apply the BigQuery STRING_AGG function.

SELECT STRING_AGG(vegetable) AS string_agg
FROM UNNEST(["Cauliflower", NULL, "Turnip", "Potato", "Turnip"]) AS vegetable;

+----------------------------------------------+
| string_agg                                   |
+----------------------------------------------+
| Cauliflower, Turnip, Potato, Turnip          |
+----------------------------------------------+


SELECT STRING_AGG(vegetable, " & ") AS string_agg
FROM UNNEST(["Cauliflower", "Turnip", "Potato", "Turnip"]) AS vegetable;

+------------------------------------------------------+
| string_agg                                           |
+------------------------------------------------------+
| Cauliflower & Turnip & Potato & Turnip               |
+------------------------------------------------------+

SELECT STRING_AGG(DISTINCT vegetable, " & ") AS string_agg
FROM UNNEST(["Cauliflower", "Turnip", "Potato", "Turnip"]) AS vegetable;

+-----------------------------------------+
| string_agg                              |
+-----------------------------------------+
| Cauliflower & Turnip & Potato           |
+-----------------------------------------+

SELECT STRING_AGG(vegetable, " & " ORDER BY LENGTH(vegetable)) AS string_agg
FROM UNNEST(["Cauliflower", "Turnip", "Potato", "Turnip"]) AS vegetable;

+-----------------------------------------------------+
| string_agg                                          |
+-----------------------------------------------------+
| Turnip & Turnip & Cauliflower & Potato              |
+-----------------------------------------------------+

SELECT STRING_AGG(vegetable, " & " LIMIT 2) AS string_agg
FROM UNNEST(["Cauliflower", "Turnip", "Potato", "Turnip"]) AS vegetable;

+-----------------------------+
| string_agg                  |
+-----------------------------+
| Cauliflower & Turnip        |
+-----------------------------+

SELECT STRING_AGG(DISTINCT vegetable, " & " ORDER BY vegetable DESC LIMIT 2) AS string_agg
FROM UNNEST(["Cauliflower", "Turnip", "Potato", "Turnip"]) AS vegetable;

+-----------------------+
| string_agg            |
+-----------------------+
| Turnip & Potato       |
+-----------------------+

SELECT
  vegetable,
  STRING_AGG(vegetable, " & ") OVER (ORDER BY LENGTH(vegetable)) AS string_agg
FROM UNNEST(["Cauliflower", NULL, "Turnip", "Potato", "Turnip"]) AS vegetable;

+-------------+------------------------------------------------------+
| vegetable   | string_agg                                           |
+-------------+------------------------------------------------------+
| NULL        | NULL                                                 |
| Turnip      | Turnip & Turnip                                      |
| Turnip      | Turnip & Turnip                                      |
| Cauliflower | Turnip & Turnip & Cauliflower                        |
| Potato         | Turnip & Turnip & Cauliflower & Potato            |
+-------------+------------------------------------------------------+
  • SELECT vegetable: Retrieves each vegetable from the list.
  • STRING_AGG(vegetable, " & ") OVER (ORDER BY LENGTH(vegetable)): Combines all non-NULL vegetables into a single string, separated by ” & “, ordered by the length of the vegetable names.
  • FROM UNNEST(["Cauliflower", NULL, "Turnip", "Potato", "Turnip"]): Treats the list of vegetables as individual rows, ignoring NULL values.
  • Each vegetable is listed with a column that contains the string aggregation of all vegetables processed so far.
  • The result is a table where each row shows a vegetable and the growing combination of vegetables, ordered by name length.
Note: You can refer to Aggregate Functions provided by Google.

BigQuery STRING_AGG Function: Arguments & Optional Clauses

BigQuery STRING_AGG function’s supported argument types also include STRING BYTES. Check out the order in which clauses are applied:

  1. OVER: This clause describes a window, which currently does not support all other clauses available within STRING_AGG().
  2. DISTINCT: Only once the distinct value of the expression can be aggregated into the result.
  3. ORDER BY: This clause describes the order of the values.
    • ASC is the default sort direction for every sort key.
    • NULLs: NULLs are the minimum possible value in the case of the ORDER BY clause. This implies the value appears firstly in ASC sorts and at last in the DESC sorts.
    • Floating-point data types: These are visible on ordering and grouping.
    • The sort key must be similar to the expression in case DISTINCT is specified.
    • In case the ORDER BY clause has not been specified, it gets difficult to determine the order of the elements in the output array, i.e., chances are that you may receive different values every time you practice the function. 
  4. LIMIT: The clause describes the maximum input expression received in the result. These are the input strings and not the input characters or bytes. An empty string will be valued as 1. A NULL string has no value and is not considered. The limit n must value as a constant INT64.

Conclusion

In this blog, we explored BigQuery and the String Aggregate Functions in SQL. With the help of the BigQuery STRING_AGG function, you can manipulate and transform complex dataset queries and perform successful operations and analysis. 

Discover how to use ARRAY_AGG BigQuery functions to aggregate and manage arrays efficiently with our comprehensive guide.

The String Aggregate Functions in SQL are highly useful as it helps transform all expression rows into a single string. The results generated from the string function can be either STRING or BYTES. Further, we have shared the supported Argument Type and optional clauses of the BigQuery STRING_AGG Function. 
Many companies store their valuable data from multiple data sources into Google BigQuery. Transferring data manually from source to destination is tedious, so you should consider, Hevo.

Frequently Asked Questions

1. What is the syntax of String_agg in BigQuery?

STRING_AGG(expression[, delimiter])

2. What does string_agg do?

STRING_AGG is used to aggregate string values from multiple rows into a single string. It is particularly useful for creating comma-separated lists or other delimited lists from query results.

3. What is aggregation in BigQuery?

Aggregation in BigQuery involves performing operations on multiple rows of data to produce a single result. Aggregation functions are typically used in conjunction with the GROUP BY clause to group rows that share a common value and apply the aggregation function to each group.

Hitesh Jethva
Technical Content Writer, Hevo Data

Hitesh is a skilled freelance writer in the data industry, known for his engaging content on data analytics, machine learning, AI, big data, and business intelligence. With a robust Linux and Cloud Computing background, he combines analytical thinking and problem-solving prowess to deliver cutting-edge insights. Hitesh leverages his Docker, Kubernetes, AWS, and Azure expertise to architect scalable data solutions that drive business growth.