BigQuery Nested Fields Simplified: A Comprehensive Guide 101

BigQuery is a popular Cloud Data Platform developed by Google. It allows its users to store their data in the cloud and enjoy massive scalability. BigQuery also comes with Data Analytics features to allow its users to analyze their data and draw meaningful insights for decision-making.

BigQuery has also added Nested and Repeated Fields for storing nested records, whether a single BigQuery Record or repeated values. These two features provide BigQuery users with different ways to reorganize their data within single tables.

They are also good when joining data from different tables to perform analytic operations. However, users may take some time to get used to BigQuery Nested Fields. In this article, we will be discussing BigQuery Nested Fields & BigQuery Repeated Fields in detail.

Table of Contents

Understanding BigQuery Repeated Fields

Repeated Fields help BigQuery users to approximate a “one-to-many” relationship and give them an opportunity to define a field that can hold many values in a single row.

Consider the table given below:

The table shows that rows 3 and 4 have more than one value in the clinical_trial_ids column, and all these values have been listed in one row. That is how a repeated BigQuery record looks like.

Understanding BigQuery Nested Fields

BigQuery Nested Fields are fields linked together like a single entity, just like an object or a struct. Consider the following table:

The “title” field in the above table is a good example of a BigQuery Nested Field. Instead of having a single string that shows the title, there is a nested field with two strings within it, that is, “original” and “preferred” with the goal of storing titles written in multiple languages.

For you to create a field with nested data, the data type of the field should be set to BigQuery RECORD in the schema. You can access a BigQuery RECORD as a STRUCT in SQL.

How to Create BigQuery Nested and Repeated Fields

The following steps can help you to specify BigQuery Nested and Repeated fields using the BigQuery Cloud Console:

Step 1: Open the BigQuery page in the cloud console.
Step 2: Expand your target project from the BigQuery Explorer panel and select a dataset.
Step 3: Click “Create table” in the “View actions” icon.

Step 4: Select “Empty table” for the “Source”.
Step 5: For “Destination”, choose your project, dataset, and enter the name of the table in the “Table” field.
Step 6: For Schema, click the + icon to add a new field.

Step 7: Type “addresses” for “Field name”, select “RECORD” for “Type”, and “REPEATED” for “Mode”. Remember you have to choose BigQuery RECORD as the data type any time you are creating a field with nested data.

Step 8: Click the + icon to the right of “addresses” to add a nested field.
Step 9: For Name, type “status”, for Type, select “STRING”, and set Mode to “NULLABLE”.
Step 10: Repeat the same steps to add address, city, state, and zip all as NULLABLE STRINGS.
Step 11: Once done, click the “CREATE TABLE” button located at the bottom.

How to Query BigQuery Nested Fields

It is very easy to query a BigQuery Nested Field. To demonstrate this, we will use the publications table:

The “Title” column has been nested with two columns, “original” and “preferred”. If you query the table without specifying the column that you want, all will be returned:

SELECT title
FROM `dimensions-ai.data_analytics.publications`
LIMIT 5;

If you only need to see the “original” part of the title, you can use periods (.) to specify it. Some nested fields have many nested fields, meaning that more entries may be required. In this case, we only need a single period as shown below:

SELECT title.original
FROM `dimensions-ai.data_analytics.publications`
LIMIT 5;

The query will only return the “original” part of the title.

How to Query BigQuery Repeated Fields

To extract information from a repeated field in BigQuery, you must use a more exotic pattern. This is normally done using the UNNEST function, which converts an array of values in a table into rows.
These can then be joined to the original table to be queried.
Let’s use the funder_orgs field of the publications table to demonstrate this. This field shows the GRID IDs of organizations that funded the research in a publication.

Suppose we want to know the list of publications funded by the Brazilian Agricultural Research Corporation. We can use its GRID ID in a WHERE clause a shown below:

SELECT COUNT(id) AS brazilian_funded_pubs
FROM `dimensions-ai.data_analytics.publications`
WHERE 'grid.460200.0' IN UNNEST(funder_orgs)
GROUP BY type;

Only the IDs of the publications funded by the Brazilian Agricultural Research Corporation will be returned.

JOIN Operations using BigQuery Repeated Fields

We can use queries to retrieve the contents of a repeated field. We can use a CROSS JOIN to distribute the information into different rows instead of having arrays inside single rows.

For example, suppose we need to know the GRID IDs of organizations that have funded articles published in the eLife journal. We can use the following query:

SELECT p.id, org_grid
FROM `dimensions-ai.data_analytics.publications` AS p
CROSS JOIN UNNEST(funder_orgs) AS org_grid
WHERE type='article'
  AND journal.id='jour.1046517';

In the above query, we are querying a BigQuery Nested Field in the “journal” field. We are only interested in publications where the “journal” field shows an id that is similar to that of eLife.
A CROSS JOIN operation has also been applied to the funder_orgs field, which will give us the cartesian product of the tables to be joined.
Every value on the left side will be shown with all matching values on the right side of the join. That is how to work with Nested and Repeated BigQuery Records.

Learn More About:

How to rename a column name in Google BigQuery

Conclusion

This article introduced you to the steps required to BigQuery Nested Fields & BigQuery Repeated Fields.

Furthermore, you learned about how to query BigQuery Nested & Repeated Fields.

With your Data Warehouse, Google BigQuery live and running, you’ll need to extract data from multiple platforms to carry out your analysis.

Nicholas Samuel Technical Content Writer, Hevo Data

Nicholas Samuel is a technical writing specialist with a passion for data, having more than 14+ years of experience in the field. With his skills in data analysis, data visualization, and business intelligence, he has delivered over 200 blogs. In his early years as a systems software developer at Airtel Kenya, he developed applications, using Java, Android platform, and web applications with PHP. He also performed Oracle database backups, recovery operations, and performance tuning. Nicholas was also involved in projects that demanded in-depth knowledge of Unix system administration, specifically with HP-UX servers. Through his writing, he intends to share the hands-on experience he gained to make the lives of data practitioners better.