Understanding BigQuery Structs with Syntax and Examples

Q: 1. What are structs in BigQuery?

In BigQuery, STRUCTs are complex data types that allow you to group multiple fields into a single record. A STRUCT is like a row in a table, where each field can have a different data type.

Q: 2. What is the difference between an ARRAY and a STRUCT in BigQuery?

ARRAY : An ordered list of values of the same data type (e.g., an array of integers or strings). STRUCT : A record that can contain multiple fields of different data types (e.g., string, int).

Q: 3. Is BigQuery structured or unstructured?

BigQuery is primarily designed to handle structured and semi-structured data (e.g., JSON). It’s optimized for working with tabular data but can also process nested and repeated fields such as JSON and ARRAY/STRUCT types. While it can process some semi-structured data, it is not ideal for handling fully unstructured data like text or images.

If you’ve ever worked with complex, nested data in BigQuery, you know how challenging it can be to manage and query effectively. That’s where BigQuery Structs come in handy! Structs allow you to organize related fields into a single object, making your data more intuitive and easier to work with.

In this blog, I’ll walk you through the basics of BigQuery Structs, why they’re useful, and how to create and query them. By the end, you’ll be ready to simplify your queries and make the most of BigQuery’s powerful data-handling capabilities.

Table of Contents

Prerequisites

It is assumed that you have worked with Google BigQuery in the past and know how to create datasets and tables in Google BigQuery. If you are not familiar with these concepts, it will be worthwhile to look at these helper articles:

Introduction to Google BigQuery

Google BigQuery is a fully-managed Cloud Data Warehouse that lets you use SQL to manage terabytes of data. It enables businesses to evaluate their data more quickly and generate insights using normal SQL queries. Since Google BigQuery is part of the Google Cloud Platform (GCP), it can take advantage of Google Cloud Functions and other Google products to help you save time and get better results. To process read-only data, Google BigQuery is built on Google’s Dremel engine. Users can scale up or down both the Storage and Compute power on their own, depending on their needs.

Best Practices: High Performance ETL to BigQuery

Separating the Processing Engine from the storage space gives Google BigQuery additional flexibility. This capability allows Google BigQuery to Store, Analyze and Evaluate data from any location. Google BigQuery also features advanced Data Analysis and Visualization capabilities, such as the Google BigQuery ML (Machine Learning) and BI (Business Intelligence) Engine.

Also, you can utilize Google BigQuery in three different ways:

Data Loading and Exporting: You may easily and rapidly import your data into BigQuery. After BigQuery has processed your data, you can export it for further study.
Interactive Queries: You may also use data to conduct Batch queries and construct Virtual tables.
Manage Data: Google BigQuery allows you to list jobs, datasets, projects, and tables, as well as update them. Any data you upload to Google BigQuery can be managed.

What is BigQuery Structs?

BigQuery Structs are a powerful data type in Google BigQuery that allows you to group multiple related fields into a single, organized object. According to Google Docs, a Structure is defined as “a Container of ordered fields, each with a type (required) and field name (optional).”

Struct Data type is declared as

STRUCT<T>

Struct types are declared using the angle brackets (< and >). The type of the elements of a struct can be arbitrarily complex.

Examples:

STRUCT<inner_array ARRAY<INT64>> : A struct containing an array named inner_array that holds 64-bit integer elements.

STRUCT<INT64>: Simple struct with a single unnamed 64-bit integer field.

STRUCT<x STRING(10)>: Simple struct with a single parameterized string field named x.

STRUCT<x STRUCT<y INT64, z INT64>>: A struct with a nested struct named x inside it. The struct x has two fields, y, and z, both of which are 64-bit integers.

Working with Google BigQuery Structs

Creating Google BigQuery Structs
Querying Google BigQuery Structs
Performing Operations on Google BigQuery Structs

Leverage BigQuery’s features like machine learning, search, geospatial analysis, and business intelligence by migrating your data to it using Hevo. Skip long and tedious manual setup and choose Hevo’s no-code platform to:

Migrate your data to BigQuery to visualize and analyze it using BigQuery analytics.
Transform and map data easily with drag-and-drop features.
Real-time data migration to leverage AI/ML features of BigQuery.

Try Hevo and discover why 2000+ customers like Ebury have chosen Hevo over tools like Fivetran and Stitch to upgrade to a modern data stack.

Get a 14-Day Free Trial for Data Transfer

How to Create Google BigQuery Structs?

Let’s get started by creating a table with a Struct column. Let’s create a student_records table, containing rollNo as one integer column and info as a Struct column. The info column is itself composed of 3 attributes: name, age and department, with age being an integer, and the other two columns being strings.

You can set an expiration time for your table as shown. Also, replace the ‘my_first_dataset’ with your dataset name.

CREATE TABLE my_first_dataset.student_records
 (
   rollNo INT OPTIONS(description="The identifier for each student"),
   info STRUCT<
     name STRING,
     age INT,
     department STRING
   >
 )
 OPTIONS(
   expiration_timestamp=TIMESTAMP "2021-10-07 00:00:00 UTC",
   description="Table with a struct column"
 )

Now that the table is created, let’s populate it with values. You can run a query like the one below (with more values as required).

INSERT INTO my_first_data.student_records (rollNo, info) VALUES (1,STRUCT("Yash",26,"Mechanical Engineering"))

Now, in case you are using the Google BigQuery Sandbox, then the above query won’t execute, because DML (Data Manipulation Language) queries like INSERT, UPDATE, or DELETE are not supported in Sandbox and you will have to provide billing information. However, that doesn’t mean you can’t have a table populated with data. Run the following query. It will not only create the table but also populate it with data.

CREATE OR REPLACE TABLE my_first_dataset.student_records AS (
SELECT 1 AS rollNo, STRUCT("Isha" as name,22 as age,"Pharmacy" as department) AS info
UNION ALL
SELECT 2 AS rollNo, STRUCT("Yash" as name,22 as age,"Mechanical Engineering" as department) AS info
UNION ALL 
SELECT 3 AS rollNo, STRUCT("Siddhi" as name,18 as age,"Metallurgy" as department) AS info
UNION ALL 
SELECT 4 AS rollNo, STRUCT("Kushal" as name,21 as age,"Electrical Engineering" as department) AS info
);

As you can see, you are creating a table using the result of a query and you are adding multiple rows by performing the UNION ALL operation. Of course, this approach is not scalable (you won’t do this to populate thousands of rows), but it will help you proceed further with this tutorial.

As things stand right now, you have a table student_records containing a column of type struct, populated with 4 rows of data. Now let’s explore further.

First of all, if you click on the student_records table in the explorer, and look at the schema, you will see that the type for the info column is RECORD and the mode is NULLABLE. This combination (RECORD + NULLABLE) identifies a Struct BigQuery.

Integrate Azure Blob Storage to BigQuery

Get a Demo Try it

Integrate Amazon DocumentDB to BigQuery

Get a Demo Try it

Integrate Hub Planner to BigQuery

Get a Demo Try it

Note: If the type is RECORD and the mode is REPEATED, it means that the column contains an Array of Structs.

Querying Google BigQuery Structs

If you directly query a Struct column in Google BigQuery, the result will contain multiple columns, one for each of the attributes within the BigQuery Structs. Here’s an example:

SELECT info from my_first_dataset.student_records

The output contains 3 columns since the info column has 3 attributes.

BigQuery Structs - Querying BigQuery Structs

In case you wish to query specific attributes of the Struct, you can use the Dot Notation. For instance, the following query fetches the roll no, name, and age for each student:

SELECT rollNo, info.name, info.age from my_first_dataset.student_records

BigQuery Structs - Querying BigQuery Structs from Database

How to Perform Operations on Google BigQuery Structs?

Structs support limited operations: Equal (=), Not equal (!= or <>), IN, and NOT IN. An example with Equal is given below.

SELECT * from my_first_dataset.student_records
WHERE info = ("Yash",22,"Mechanical Engineering")

BigQuery Structs - Operations on Structs — Image Source: Self

Note that you did not use the STRUCT keyword before (“Yash”,22, “Mechanical Engineering”) in the above query. This is because Google BigQuery constructs an anonymous Struct when it is given a tuple of values. See the tuple syntax of constructing a struct.

An example with NOT IN is given below:

SELECT * from my_first_dataset.student_records
WHERE info NOT IN unnest([("Siddhi",18,"Metallurgy"), ("Kushal",21,"Electrical Engineering")])

As you can see, we checked if the Struct belongs to an Array of Structs or not, by listing items of the array, using unnest.

Note that the limited set of operations listed above apply only to the Struct as a whole. Individual attributes within the Struct can support other operations (>, <, etc.) depending on the data type of that attribute. An example is given below showing the use of the ‘<’ operator on the age attribute.

SELECT * from my_first_dataset.student_records
WHERE info.age < 22

BigQuery Structs - Where Clause Applied — Image Source: Self

Understanding Nested Structs in Google BigQuery

A Struct having another Struct as one or more of its attributes is known as a Nested Struct. If you have understood the concept of Structs, then creating and querying Nested Structs should be straightforward for you.

Whether it’s a single record or repeated data, Google BigQuery allows nested records within tables. Unlike traditional denormalization methods, records in Google BigQuery are expressed using nested and repeated fields.

Google BigQuery’s inbuilt support for Nested and Repeated structures in JSON on the other hand is the preferred way for denormalizing data. This strategy, rather than flattening attributes into a table, localizes a record’s subattributes into a single table. Employing nested records during analysis eliminates the necessity for repeating data, generating new subtables or using joins in Google BigQuery Structs.

The following example shows the creation and population of a table containing the info column which is a Struct, which contains another BigQuery Struct (subjects) as one of its attributes.

CREATE OR REPLACE TABLE my_first_dataset.student_records_detailed AS (
SELECT 1 AS rollNo, STRUCT("Isha" as name,22 as age,"Pharmacy" as department, STRUCT("Chemistry" as sub1, "Bio" as sub2) as subjects) AS info
 
UNION ALL
 
SELECT 2 AS rollNo, STRUCT("Yash" as name,22 as age,"Mechanical Engineering" as department, STRUCT("Physics" as sub1, "Math" as sub2) as subjects) AS info);

When you query the Nested Struct column, the attributes within the Inner Struct also appear as columns.

SELECT info from my_first_dataset.student_records_detailed

BigQuery Structs - Nested Query — Image Source: Self

As you would have expected, the dot notation can be extended to queries of Nested Structs:

SELECT info.subjects.sub1 from my_first_dataset.student_records_detailed

BigQuery Structs - Dot Notation — Image Source: Self

Finally, operations work on Nested Structs like on normal Google BigQuery Structs. An example is shown below for the equal operator:

SELECT * from my_first_dataset.student_records_detailed
WHERE info = ("Isha",22,"Pharmacy",("Chemistry","Bio"))

BigQuery Structs - Equal Operator — Image Source: Self

I hope you enjoyed reading this article on creating & working with Google BigQuery Structs and found it useful. You can refer to the official documentation for any further reading on structs.

Conclusion

This article introduced you to the concept of Google BigQuery Structs. It also provided you with an in-depth guide with proper syntax and examples of creating, querying, and managing Google BigQuery Structs. It also covered Nested Structs and how it is just an extension of the concept of Google BigQuery Structs.

Companies store valuable data from multiple data sources into Google BigQuery. The manual process to transfer data from source to destination is a tedious task but this is where Hevo saves the day!

Visit our Website to Explore Hevo

Hevo Data is a No-code Data Pipeline that can help you transfer data from any data source to Google BigQuery. It fully automates the process to load and transform data from 150+ Data Sources (Including 60+ Free Sources) to a destination of your choice such as Google BigQuery without writing a single line of code.

Sign up here for a 14-day free trial and experience the feature-rich Hevo suite first hand.

Frequently Asked Questions

1. What are structs in BigQuery?

In BigQuery, STRUCTs are complex data types that allow you to group multiple fields into a single record. A STRUCT is like a row in a table, where each field can have a different data type.

2. What is the difference between an ARRAY and a STRUCT in BigQuery?

ARRAY: An ordered list of values of the same data type (e.g., an array of integers or strings).
STRUCT: A record that can contain multiple fields of different data types (e.g., string, int).

3. Is BigQuery structured or unstructured?

BigQuery is primarily designed to handle structured and semi-structured data (e.g., JSON). It’s optimized for working with tabular data but can also process nested and repeated fields such as JSON and ARRAY/STRUCT types. While it can process some semi-structured data, it is not ideal for handling fully unstructured data like text or images.

Yash Sanghvi Technical Content Writer, Hevo Data

Yash is a trusted expert in the data industry, recognized for his role in driving online success for companies through data integration and analysis. With a Dual Degree (B. Tech + M. Tech) in Mechanical Engineering from IIT Bombay, Yash brings strategic vision and analytical rigor to every project. He is proficient in Python, R, TensorFlow, and Hadoop, using these tools to develop predictive models and optimize data workflows.

Understanding BigQuery Struct with Syntax and Examples

Prerequisites

Introduction to Google BigQuery

What is BigQuery Structs?

Working with Google BigQuery Structs

How to Create Google BigQuery Structs?

Querying Google BigQuery Structs

How to Perform Operations on Google BigQuery Structs?

Understanding Nested Structs in Google BigQuery

Conclusion

Frequently Asked Questions

1. What are structs in BigQuery?

2. What is the difference between an ARRAY and a STRUCT in BigQuery?

3. Is BigQuery structured or unstructured?

Related articles

Understanding BigQuery Struct with Syntax and Examples

Prerequisites

Introduction to Google BigQuery

What is BigQuery Structs?

Working with Google BigQuery Structs

How to Create Google BigQuery Structs?

Querying Google BigQuery Structs

How to Perform Operations on Google BigQuery Structs?

Understanding Nested Structs in Google BigQuery

Conclusion

Frequently Asked Questions

1. What are structs in BigQuery?

2. What is the difference between an ARRAY and a STRUCT in BigQuery?

3. Is BigQuery structured or unstructured?

Optimize your data integration with Hevo!

Related articles