BigQuery is a popular Cloud Data Platform developed by Google. It allows its users to store their data in the cloud and enjoy massive scalability. BigQuery also comes with Data Analytics features to allow its users to analyze their data and draw meaningful insights for decision-making.
BigQuery has also added Nested and Repeated Fields for storing nested records, whether a single BigQuery Record or repeated values. These two features provide BigQuery users with different ways to reorganize their data within single tables. They are also good when joining data from different tables to perform analytic operations. However, users may take some time to get used to BigQuery Nested Fields. In this article, we will be discussing BigQuery Nested Fields & BigQuery Repeated Fields in detail.
Table of Contents
Understanding BigQuery Repeated Fields
Repeated Fields help BigQuery users to approximate a “one-to-many” relationship and give them an opportunity to define a field that can hold many values in a single row.
Consider the table given below:
The table shows that rows 3 and 4 have more than one value in the clinical_trial_ids column, and all these values have been listed in one row. That is how a repeated BigQuery record looks like.
Understanding BigQuery Nested Fields
BigQuery Nested Fields are fields linked together like a single entity, just like an object or a struct. Consider the following table:
The “title” field in the above table is a good example of a BigQuery Nested Field. Instead of having a single string that shows the title, there is a nested field with two strings within it, that is, “original” and “preferred” with the goal of storing titles written in multiple languages.
For you to create a field with nested data, the data type of the field should be set to BigQuery RECORD in the schema. You can access a BigQuery RECORD as a STRUCT in SQL.
How to Create BigQuery Nested and Repeated Fields
The following steps can help you to specify BigQuery Nested and Repeated fields using the BigQuery Cloud Console:
- Step 1: Open the BigQuery page in the cloud console.
- Step 2: Expand your target project from the BigQuery Explorer panel and select a dataset.
- Step 3: Click “Create table” in the “View actions” icon.
- Step 4: Select “Empty table” for the “Source”.
- Step 5: For “Destination”, choose your project, dataset, and enter the name of the table in the “Table” field.
- Step 6: For Schema, click the + icon to add a new field.
- Step 7: Type “addresses” for “Field name”, select “RECORD” for “Type”, and “REPEATED” for “Mode”. Remember you have to choose BigQuery RECORD as the data type any time you are creating a field with nested data.
- Step 8: Click the + icon to the right of “addresses” to add a nested field.
- Step 9: For Name, type “status”, for Type, select “STRING”, and set Mode to “NULLABLE”.
- Step 10: Repeat the same steps to add address, city, state, and zip all as NULLABLE STRINGS.
- Step 11: Once done, click the “CREATE TABLE” button located at the bottom.
A fully managed No-code Data Pipeline platform like Hevo Data helps you integrate and load data from 100+ Different Sources (including 40+ free sources) to a Data Warehouse such as Google BigQuery or Destination of your choice in real-time in an effortless manner. Hevo with its minimal learning curve can be set up in just a few minutes allowing the users to load data without having to compromise performance. Its strong integration with umpteenth sources allows users to bring in data of different kinds in a smooth fashion without having to code a single line.
Get Started with Hevo for free
Check out some of the cool features of Hevo:
Sign up here for a 14-day Free Trial!
- Completely Automated: The Hevo platform can be set up in just a few minutes and requires minimal maintenance.
- Transformations: Hevo provides preload transformations through Python code. It also allows you to run transformation code for each event in the Data Pipelines you set up. You need to edit the event object’s properties received in the transform method as a parameter to carry out the transformation. Hevo also offers drag and drop transformations like Date and Control Functions, JSON, and Event Manipulation to name a few. These can be configured and tested before putting them to use.
- Connectors: Hevo supports 100+ integrations to SaaS platforms, files, Databases, analytics, and BI tools. It supports various destinations including Google BigQuery, Amazon Redshift, Firebolt, Snowflake Data Warehouses; Amazon S3 Data Lakes; and MySQL, SQL Server, TokuDB, DynamoDB, PostgreSQL Databases to name a few.
- Real-Time Data Transfer: Hevo provides real-time data migration, so you can have analysis-ready data always.
- 100% Complete & Accurate Data Transfer: Hevo’s robust infrastructure ensures reliable data transfer with zero data loss.
- Scalable Infrastructure: Hevo has in-built integrations for 100+ sources (including 40+ free sources) that can help you scale your data infrastructure as required.
- 24/7 Live Support: The Hevo team is available round the clock to extend exceptional support to you through chat, email, and support calls.
- Schema Management: Hevo takes away the tedious task of schema management & automatically detects the schema of incoming data and maps it to the destination schema.
- Live Monitoring: Hevo allows you to monitor the data flow so you can check where your data is at a particular point in time.
How to Query BigQuery Nested Fields
It is very easy to query a BigQuery Nested Field. To demonstrate this, we will use the publications table:
The “Title” column has been nested with two columns, “original” and “preferred”. If you query the table without specifying the column that you want, all will be returned:
If you only need to see the “original” part of the title, you can use periods (.) to specify it. Some nested fields have many nested fields, meaning that more entries may be required. In this case, we only need a single period as shown below:
The query will only return the “original” part of the title.
How to Query BigQuery Repeated Fields
To extract information from a repeated field in BigQuery, you must use a more exotic pattern. This is normally done using the UNNEST function, which converts an array of values in a table into rows. These can then be joined to the original table to be queried.
Let’s use the funder_orgs field of the publications table to demonstrate this. This field shows the GRID IDs of organizations that funded the research in a publication.
Suppose we want to know the list of publications funded by the Brazilian Agricultural Research Corporation. We can use its GRID ID in a WHERE clause a shown below:
SELECT COUNT(id) AS brazilian_funded_pubs
WHERE 'grid.460200.0' IN UNNEST(funder_orgs)
GROUP BY type;
Only the IDs of the publications funded by the Brazilian Agricultural Research Corporation will be returned.
JOIN Operations using BigQuery Repeated Fields
We can use queries to retrieve the contents of a repeated field. We can use a CROSS JOIN to distribute the information into different rows instead of having arrays inside single rows.
For example, suppose we need to know the GRID IDs of organizations that have funded articles published in the eLife journal. We can use the following query:
SELECT p.id, org_grid
FROM `dimensions-ai.data_analytics.publications` AS p
CROSS JOIN UNNEST(funder_orgs) AS org_grid
In the above query, we are querying a BigQuery Nested Field in the “journal” field. We are only interested in publications where the “journal” field shows an id that is similar to that of eLife. A CROSS JOIN operation has also been applied to the funder_orgs field, which will give us the cartesian product of the tables to be joined. Every value on the left side will be shown with all matching values on the right side of the join. That is how to work with Nested and Repeated BigQuery Records.
This article introduced you to the steps required to BigQuery Nested Fields & BigQuery Repeated Fields. Furthermore, you learned about how to query BigQuery Nested & Repeated Fields.
With your Data Warehouse, Google BigQuery live and running, you’ll need to extract data from multiple platforms to carry out your analysis. However, integrating and analyzing your data from a diverse set of data sources can be challenging and this is where Hevo Data comes into the picture.
Visit our Website to Explore Hevo
Hevo Data, a No-code Data Pipeline provides you with a consistent and reliable solution to manage data transfer between a variety of sources and a wide variety of Desired Destinations such as Google BigQuery, with a few clicks. Hevo Data with its strong integration with 100+ sources (including 40+ free sources) allows you to not only export data from your desired data sources & load it to the destination of your choice, but also transform & enrich your data to make it analysis-ready so that you can focus on your key business needs and perform insightful analysis using BI tools.
Want to take Hevo for a spin? Sign Up for a 14-day free trial and experience the feature-rich Hevo suite first hand. You can also have a look at our unbeatable pricing that will help you choose the right plan for your business needs!
Share your experiment of learning about BigQuery Nested & Repeated Fields in the comments below!