Setting up BigQuery JSON Extract: Made Easy 101

on BigQuery Functions, BigQuery JSON Functions, Data Warehouse, Google BigQuery, Tutorials, Uncategorized • September 21st, 2021 • Write for Hevo

Venturing into Data Science and deciding on a tool to use to solve a given problem can be challenging at times especially when you have a wide array of choices. In this age of data transformation where organizations are constantly seeking out ways to improve the day to day handling of data being produced and looking for methods to minimize the cost of having these operations, it has become imperative to handle such data transformations in the Cloud as it is a lot easier to manage and is also cost-efficient.

Data Warehousing architectures have rapidly changed over the years and most of the notable service providers are now Cloud-based. Therefore, companies are increasingly on the move to align with such offerings on the Cloud as it provides them with a lower upfront cost, enhances scalability, and performance as opposed to traditional On-premise Data Warehousing systems. Google BigQuery is among one of the well-known and widely accepted Cloud-based Data Warehouse Applications.

In this article, you will gain information about BigQuery JSON Extraction Functions. You will also gain a holistic understanding of Google BigQuery, its key features, and the types of BigQuery JSON Extraction Functions. Read along to find out in-depth information about BigQuery JSON Extraction Functions.

Table of Contents

Introduction to Google BigQuery

Google BigQuery JSON Extract - Google BigQuery
Image Source

Google BigQuery is a Cloud-based Data Warehouse that provides a Big Data Analytic Web Service for processing petabytes of data. It is intended for analyzing data on a large scale. It consists of two distinct components: Storage and Query Processing. It employs the Dremel Query Engine to process queries and is built on the Colossus File System for storage. These two components are decoupled and can be scaled independently and on-demand.

Google BigQuery is fully managed by Cloud service providers. We don’t need to deploy any resources, such as discs or virtual machines. It is designed to process read-only data. Dremel and Google BigQuery use Columnar Storage for quick data scanning, as well as a tree architecture for executing queries using ANSI SQL and aggregating results across massive computer clusters. Furthermore, owing to its short deployment cycle and on-demand pricing, Google BigQuery is serverless and designed to be extremely scalable.

For further information about Google Bigquery, follow the Official Documentation.

Key Features of Google BigQuery

Google BigQuery JSON Extract - Features of BigQuery
Image Source

Some of the key features of Google BigQuery are as follows:

1) Performance

Partitioning is supported by BigQuery, which improves Query performance. The data may be readily queried using SQL or Open Database Connectivity (ODBC)

2) Scalability

Being quite elastic, BigQuery separates computation and storage, allowing customers to scale processing and memory resources according to their needs. The tool has significant vertical and horizontal scalability and runs real-time queries on petabytes of data in a very short period.

3) Security

When a third-party authorization exists, users can utilize OAuth as a standard approach to get the cluster. By default, all data is encrypted and in transit. Cloud Identity and Access Management (IAM) allows for fine-tuning administration.

4) Usability

Google BigQuery is a highly user-friendly platform that requires a basic understanding of SQL commands, ETL tools, etc.

5) Data Types

 It supports JSON and XML file formats.

6) Data Loading

It employs the conventional ELT/ETL Batch Data Loading techniques by employing standard SQL dialect, as well as Data Streaming to load data row by row using Streaming APIs.

7) Integrations

In addition to operational databases, the system supports integration with a wide range of data integration tools, business intelligence (BI), and artificial intelligence (AI) solutions. It also works with Google Workspace and Cloud Platform.

8) Data Recovery

Data backup and disaster recovery are among the services provided by Google BigQuery. Users can query point-in-time snapshots of data changes from the last seven days.

9) Pricing Models

The Google BigQuery platform is available in both on-demand and flat-rate subscription models. Although data storage and querying will be chargedexporting, loading, and copying data is free. It has separated computational resources from storage resources. You are only charged when you run queries. The quantity of data processed during searches is billed.

Simplify BigQuery ETL and Analysis with Hevo’s No-code Data Pipeline

A fully managed No-code Data Pipeline platform like Hevo Data helps you integrate and load data from 100+ different sources (including 30+ free sources) to a Data Warehouse such as Google BigQuery or Destination of your choice in real-time in an effortless manner. Hevo with its minimal learning curve can be set up in just a few minutes allowing the users to load data without having to compromise performance. Its strong integration with umpteenth sources allows users to bring in data of different kinds in a smooth fashion without having to code a single line. 

Get Started with Hevo for Free

Check out some of the cool features of Hevo:

  • Completely Automated: The Hevo platform can be set up in just a few minutes and requires minimal maintenance.
  • Transformations: Hevo provides preload transformations through Python code. It also allows you to run transformation code for each event in the Data Pipelines you set up. You need to edit the event object’s properties received in the transform method as a parameter to carry out the transformation. Hevo also offers drag and drop transformations like Date and Control Functions, JSON, and Event Manipulation to name a few. These can be configured and tested before putting them to use.
  • Connectors: Hevo supports 100+ integrations to SaaS platforms, files, Databases, analytics, and BI tools. It supports various destinations including Google BigQuery, Amazon Redshift, Snowflake Data Warehouses; Amazon S3 Data Lakes; and MySQL, SQL Server, TokuDB, DynamoDB, PostgreSQL Databases to name a few.  
  • Real-Time Data Transfer: Hevo provides real-time data migration, so you can have analysis-ready data always.
  • 100% Complete & Accurate Data Transfer: Hevo’s robust infrastructure ensures reliable data transfer with zero data loss.
  • Scalable Infrastructure: Hevo has in-built integrations for 100+ sources (including 30+ free sources) that can help you scale your data infrastructure as required.
  • 24/7 Live Support: The Hevo team is available round the clock to extend exceptional support to you through chat, email, and support calls.
  • Schema Management: Hevo takes away the tedious task of schema management & automatically detects the schema of incoming data and maps it to the destination schema.
  • Live Monitoring: Hevo allows you to monitor the data flow so you can check where your data is at a particular point in time.
Sign up here for a 14-Day Free Trial!

Understanding BigQuery JSON Extraction Functions

JSON Extraction functions in Google BigQuery helps in retrieving and transforming JSON data into a suitable format. There are majorly 2 categories of BigQuery JSON Extraction Functions. These are as follows:

In the article, we will be majorly focussing on Standard BigQuery JSON Extraction Functions.

Standard BigQuery JSON Extraction Functions

Standard BigQuery Extraction functions use double quotes to escape invalid JSONPath characters: “a.b”.It is consistent with the ANSI standard.

The 4 types of Standard BigQuery JSON Extraction Functions are as follows:

1) JSON_QUERY

A) Syntax

JSON_QUERY(json_string_expr, json_path)

B) Description

JSON_QUERY Function is a Standard BigQuery Extract Function that extracts a JSON value, such as an array or object, or a JSON scalar value, such as a string, number, or boolean. If a JSON key contains invalid JSONPath characters, you can escape them using double-quotes.

  • json_string_expr: A JSON-formatted string. For example:
'{"class" : {"students" : [{"name" : "Jane"}]}}'

extracts a SQL NULL when a JSON-formatted string “NULL” is encountered. For example:

SELECT JSON_QUERY("null", "$") -- Returns a SQL NULL.
  • json_path: The JSONPath identifies the data that you want to obtain from the input. This parameter is optional and if it is not provided, then the JSONPath $ symbol is applied, which indicates that all of the data is analyzed.
SELECT JSON_QUERY('{"a":null}', "$.a"); -- Returns an SQL NULL
SELECT JSON_QUERY('{"a":null}', "$.b"); -- Returns an SQL NULL 

You can use the JSON_QUERY function if you want to include non-scalar values such as arrays in the extraction. But, if you only want to extract scalar values such as strings, numbers, and booleans, then use the JSON_VALUE function.

C) Return type

A JSON-formatted STRING

D) Examples

In the following examples, JSON data is extracted and returned as JSON-formatted strings.

SELECT JSON_QUERY(json_text, '$') AS json_text_string
FROM UNNEST([
  '{"class" : {"students" : [{"name" : "Jane"}]}}',
  '{"class" : {"students" : []}}',
  '{"class" : {"students" : [{"name" : "John"}, {"name": "Jamie"}]}}'
  ]) AS json_text;

Output:

+-----------------------------------------------------------+
| json_text_string                                          |
+-----------------------------------------------------------+
| {"class":{"students":[{"name":"Jane"}]}}                  |
| {"class":{"students":[]}}                                 |
| {"class":{"students":[{"name":"John"},{"name":"Jamie"}]}} |
+-----------------------------------------------------------+
SELECT JSON_QUERY(json_text, '$.class.students[0]') AS first_student
FROM UNNEST([
  '{"class" : {"students" : [{"name" : "Jane"}]}}',
  '{"class" : {"students" : []}}',
  '{"class" : {"students" : [{"name" : "John"}, {"name": "Jamie"}]}}'
  ]) AS json_text;

Output:

+-----------------+
| first_student   |
+-----------------+
| {"name":"Jane"} |
| NULL            |
| {"name":"John"} |
+-----------------+
SELECT JSON_QUERY(json_text, '$.class.students[1].name') AS second_student_name
FROM UNNEST([
  '{"class" : {"students" : [{"name" : "Jane"}]}}',
  '{"class" : {"students" : []}}',
  '{"class" : {"students" : [{"name" : "John"}, {"name" : null}]}}',
  '{"class" : {"students" : [{"name" : "John"}, {"name": "Jamie"}]}}'
  ]) AS json_text;

Output:

+-------------------+
| second_student    |
+-------------------+
| NULL              |
| NULL              |
| NULL              |
| "Jamie"           |
+-------------------+
SELECT JSON_QUERY(json_text, '$.class."students"') AS student_names
FROM UNNEST([
  '{"class" : {"students" : [{"name" : "Jane"}]}}',
  '{"class" : {"students" : []}}',
  '{"class" : {"students" : [{"name" : "John"}, {"name": "Jamie"}]}}'
  ]) AS json_text;

Output:

+------------------------------------+
| student_names                      |
+------------------------------------+
| [{"name":"Jane"}]                  |
| []                                 |
| [{"name":"John"},{"name":"Jamie"}] |
+------------------------------------+

2) JSON_VALUE

A) Syntax

JSON_VALUE(json_string_expr[, json_path])

B) Description

This BigQuery JSON Extraction function extracts a scalar value and returns it as a string. A scalar value can represent a string, number, or boolean. JSON_VALUE Function removes the outermost quotes and unescapes the return values. If a JSON key uses invalid JSONPath characters, then you can escape those characters using double-quotes.

  • json_string_expr: A JSON-formatted string. For example:
'{"class" : {"students" : [{"name" : "Jane"}]}}'
  • json_path: The JSONPath, helps you identify the data that you want to obtain from the input. This is an optional parameter which if not provided, then the JSONPath $ symbol is applied, which indicates that all of the data is analyzed.
    • If json_path returns a JSON null or a non-scalar value (in other words, if json_path refers to an object or an array), then an SQL NULL is returned. This is an optional parameter which if not provided, then the JSONPath $ symbol is applied, which indicates that the entire JSON-formatted string is analyzed.

You can use the JSON_VALUE function if you only want to extract scalar values such as strings, numbers, and booleans. If you want to include non-scalar values such as arrays in the extraction, then use JSON_QUERY.

C) Return type

STRING Data type

D) Examples

The following example illustrated how outputs are returned for the JSON_QUERY and JSON_VALUE functions.

SELECT JSON_QUERY('{ "name" : "Jakob", "age" : "6" }', '$.name') AS json_name,
  JSON_VALUE('{ "name" : "Jakob", "age" : "6" }', '$.name') AS scalar_name,
  JSON_QUERY('{ "name" : "Jakob", "age" : "6" }', '$.age') AS json_age,
  JSON_VALUE('{ "name" : "Jakob", "age" : "6" }', '$.age') AS scalar_age;

Output:

+-----------+-------------+----------+------------+
| json_name | scalar_name | json_age | scalar_age |
+-----------+-------------+----------+------------+
| "Jakob"   | Jakob       | "6"      | 6          |
+-----------+-------------+----------+------------+
SELECT JSON_QUERY('{"fruits": ["apple", "banana"]}', '$.fruits') AS json_query,
  JSON_VALUE('{"fruits": ["apple", "banana"]}', '$.fruits') AS json_value;

Output:

+--------------------+------------+
| json_query         | json_value |
+--------------------+------------+
| ["apple","banana"] | NULL       |
+--------------------+------------+

In cases where a JSON key contains invalid JSONPath characters, you can escape those characters using double-quotes. For example:

SELECT JSON_VALUE('{"a.b": {"c": "world"}}', '$."a.b".c') AS hello;

Output:

+-------+
| hello |
+-------+
| world |
+-------+

3) JSON_QUERY_ARRAY

A) Syntax

JSON_QUERY_ARRAY(json_string_expr[, json_path])

B) Description

This BigQuery JSON Extraction function helps to extract an array of JSON values, such as arrays or objects, and JSON scalar values, such as strings, numbers, and booleans. If a JSON key contains invalid JSONPath characters, then you can escape those characters using double-quotes.

  • json_string_expr: A JSON-formatted string. For example:
'{"class" : {"students" : [{"name" : "Jane"}]}}'
  • json_path: The JSONPath, helps you identify the data that you want to obtain from the input. This is an optional parameter which if not provided, then the JSONPath $ symbol is applied, which indicates that all of the data is analyzed.

C) Return type

ARRAY<JSON-formatted STRING> Data type

D) Examples

The following example extracts the items in a JSON-formatted string to a string array:

SELECT JSON_QUERY_ARRAY('[1,2,3]') AS string_array;

Output:

+--------------+
| string_array |
+--------------+
| [1, 2, 3]    |
+--------------+

The following example extracts a string array and converts it to an integer array:

SELECT ARRAY(  SELECT CAST(integer_element AS INT64)  FROM UNNEST(    JSOSELECT ARRAY(
  SELECT CAST(integer_element AS INT64)
  FROM UNNEST(
    JSON_QUERY_ARRAY('[1,2,3]','$')
  ) AS integer_element
) AS integer_array;N_QUERY_ARRAY('[1,2,3]','$')  ) AS integer_element) AS integer_array;

Output:

+---------------+
| integer_array |
+---------------+
| [1, 2, 3]     |
+---------------+

The following example extracts string values in a JSON-formatted string to an array:

-- Doesn't strip the double quotes
SELECT JSON_QUERY_ARRAY('["apples","oranges","grapes"]', '$') AS string_array;

Output:

+---------------------------------+
| string_array                    |
+---------------------------------+
| ["apples", "oranges", "grapes"] |
+---------------------------------+
-- Strips the double quotes
SELECT ARRAY(
  SELECT JSON_VALUE(string_element, '$')
  FROM UNNEST(JSON_QUERY_ARRAY('["apples","oranges","grapes"]','$')) AS string_element
) AS string_array;

Output:

+---------------------------+
| string_array              |
+---------------------------+
| [apples, oranges, grapes] |
+---------------------------+

The following example extracts only the items in the fruit property to an array:

SELECT JSON_QUERY_ARRAY(
  '{"fruit":[{"apples":5,"oranges":10},{"apples":2,"oranges":4}],"vegetables":[{"lettuce":7,"kale": 8}]}',
  '$.fruit'
) AS string_array;

Output:

+-------------------------------------------------------+
| string_array                                          |
+-------------------------------------------------------+
| [{"apples":5,"oranges":10}, {"apples":2,"oranges":4}] |
+-------------------------------------------------------+

These are equivalent:

SELECT JSON_QUERY_ARRAY('{"fruits":["apples","oranges","grapes"]}','$.fruits') AS string_array;

SELECT JSON_QUERY_ARRAY('{"fruits":["apples","oranges","grapes"]}','$."fruits"') AS string_array;

Output:

-- The queries above produce the following result:
+---------------------------------+
| string_array                    |
+---------------------------------+
| ["apples", "oranges", "grapes"] |
+---------------------------------+

In cases where a JSON key uses invalid JSONPath characters, you can escape those characters using double quotes: ” “. For example:

SELECT JSON_QUERY_ARRAY('{"a.b": {"c": ["world"]}}', '$."a.b".c') AS hello;

Output:

+-----------+
| hello     |
+-----------+
| ["world"] |
+-----------+

The following examples show how invalid requests and empty arrays are handled:

-- An error is ret-- An error is returned if you provide an invalid JSONPath.
SELECT JSON_QUERY_ARRAY('["foo","bar","baz"]','INVALID_JSONPath') AS result;

-- If the JSONPath does not refer to an array, then NULL is returned.
SELECT JSON_QUERY_ARRAY('{"a":"foo"}','$.a') AS result;

Output:

+--------+
| result |
+--------+
| NULL   |
+--------+
-- If a key that does not exist is specified, then the result is NULL.
SELECT JSON_QUERY_ARRAY('{"a":"foo"}','$.b') AS result;

Output:

+--------+
| result |
+--------+
| NULL   |
+--------+
-- Empty arrays in JSON-formatted strings are supported.
SELECT JSON_QUERY_ARRAY('{"a":"foo","b":[]}','$.b') AS result;

Output:

+--------+
| result |
+--------+
| []     |
+--------+

4) JSON_VALUE_ARRAY

A) Syntax

JSON_VALUE_ARRAY(json_string_expr[, json_path])

B) Description

This BigQuery JSON Extraction function extracts an array of scalar values and returns an array of string-formatted scalar values. A scalar value can represent a string, number, or boolean. If a JSON key contains invalid JSONPath characters, you can escape those characters using double-quotes.

  • json_string_expr: A JSON-formatted string. For example:
'{"class" : {"students" : [{"name" : "Jane"}]}}'
  • json_path: The JSONPath helps to identify the data that you want to obtain from the input. This is an optional parameter which if not provided, then the JSONPath $ symbol is applied, which indicates that all of the data is analyzed.

C) Return type

ARRAY<STRING> Data type

D) Examples

The following example compares how results are returned for the JSON_QUERY_ARRAY and JSON_VALUE_ARRAY functions.

SELECT JSON_QUERY_ARRAY('["apples","oranges"]') AS json_array,
       JSON_VALUE_ARRAY('["apples","oranges"]') AS string_array;

Output:

+-----------------------+-------------------+
| json_array            | string_array      |
+-----------------------+-------------------+
| ["apples", "oranges"] | [apples, oranges] |
+-----------------------+-------------------+

The following example extracts the items in a JSON-formatted string to a string array:

-- Strips the double quotes
SELECT JSON_VALUE_ARRAY('["foo","bar","baz"]','$') AS string_array;

Output:

+-----------------+
| string_array    |
+-----------------+
| [foo, bar, baz] |
+-----------------+

The following example extracts a string array and converts it to an integer array:

SELECT ARRAY(  SELECT CAST(integer_element AS INT64) SELECT ARRAY(
  SELECT CAST(integer_element AS INT64)
  FROM UNNEST(
    JSON_VALUE_ARRAY('[1,2,3]','$')
  ) AS integer_element
) AS integer_array; FROM UNNEST(    JSON_VALUE_ARRAY('[1,2,3]','$')  ) AS integer_element) AS integer_array;+---------------+| integer_array |+---------------+| [1, 2, 3]     |+---------------+

Output:

+---------------+
| integer_array |
+---------------+
| [1, 2, 3]     |
+---------------+

These are equivalent:

SELECT JSON_VALUE_ARRAY('{"fruits":["apples","oranges","grapes"]}','$.fruits') AS string_array;
SELECT JSON_VALUE_ARRAY('{"fruits":["apples","oranges","grapes"]}','$."fruits"') AS string_array;

Output:

-- The queries above produce the following result:
+---------------------------+
| string_array              |
+---------------------------+
| [apples, oranges, grapes] |
+---------------------------+

In cases where a JSON key uses invalid JSONPath characters, you can escape those characters using double quotes: " ". For example:

SELECT JSON_VALUE_ARRAY('{"a.b": {"c": ["world"]}}', '$."a.b".c') AS hello;

Output:

+---------+
| hello   |
+---------+
| [world] |
+---------+

The following examples explore how invalid requests and empty arrays are handled:

-- An error is thrown if you provide an invalid JSONPath.
SELECT JSON_VALUE_ARRAY('["foo","bar","baz"]','INVALID_JSONPath') AS result;

-- If the JSON-formatted string is invalid, then NULL is returned.
SELECT JSON_VALUE_ARRAY('}}','$') AS result;

Output:

+--------+
| result |
+--------+
| NULL   |
+--------+
-- If the JSON document is NULL, then NULL is returned.
SELECT JSON_VALUE_ARRAY(NULL,'$') AS result;

Output:

+--------+
| result |
+--------+
| NULL   |
+--------+
-- If a JSONPath does not match anything, then the output is NULL.
SELECT JSON_VALUE_ARRAY('{"a":["foo","bar","baz"]}','$.b') AS result;

Output:

+--------+
| result |
+--------+
| NULL   |
+--------+
-- If a JSONPath matches an object that is not an array, then the output is NULL.
SELECT JSON_VALUE_ARRAY('{"a":"foo"}','$') AS result;

Output:

+--------+
| result |
+--------+
| NULL   |
+--------+
-- If a JSONPath matches an array of non-scalar objects, then the output is NULL.
SELECT JSON_VALUE_ARRAY('{"a":[{"b":"foo","c":1},{"b":"bar","c":2}],"d":"baz"}','$.a') AS result;

Output:

+--------+
| result |
+--------+
| NULL   |
+--------+
-- If a JSONPath matches an array of mixed scalar and non-scalar objects,
-- then the output is NULL.
SELECT JSON_VALUE_ARRAY('{"a":[10, {"b": 20}]','$.a') AS result;

Output:

+--------+
| result |
+--------+
| NULL   |
+--------+
-- If a JSONPath matches an empty JSON array, then the output is an empty array instead of NULL.
SELECT JSON_VALUE_ARRAY('{"a":"foo","b":[]}','$.b') AS result;

Output:

+--------+
| result |
+--------+
| []     |
+--------+
-- If a JSONPath matches an array that contains scalar objects and a JSON null,
-- then the output of the JSON_VALUE_ARRAY function must be transformed
-- because the final output cannot be an array with NULL values. This example
-- uses the UNNEST operator to convert the output array into a table as the final output.
SELECT string_value FROM UNNEST(JSON_VALUE_ARRAY('["world", 1, null]')) AS string_value;

Output:

+--------------+
| string_value |
+--------------+
| world        |
| 1            |
| NULL         |
+--------------+

Legacy BigQuery JSON Extraction functions

The following Legacy BigQuery JSON Extraction functions use single quotes and brackets to escape invalid JSONPath characters: [‘a.b’].

While these functions are supported by BigQuery, it is highly recommended to use the Standard BigQuery JSON Extraction.

JSON functionDescriptionReturn type
JSON_EXTRACTExtracts a JSON value, such as an array or object, or a JSON scalar value, such as a string, number, or boolean.JSON-formatted STRING
JSON_EXTRACT_SCALARExtracts a scalar value. A scalar value can represent a string, number, or boolean. Removes the outermost quotes and unescapes the values. Returns a SQL NULL if a non-scalar value is selected.STRING
JSON_EXTRACT_ARRAYExtracts an array of JSON values, such as arrays or objects, and JSON scalar values, such as strings, numbers, and booleans.ARRAY<JSON-formatted STRING>
JSON_EXTRACT_STRING_ARRAYExtracts an array of scalar values. A scalar value can represent a string, number, or boolean. Removes the outermost quotes and unescapes the values. Returns a SQL NULL if the selected value is not an array or not an array containing only scalar values.ARRAY<STRING>

For further information on Legacy BigQuery JSON Extraction Functions, follow the Official Documentation.

Other JSON functions

JSON functionDescriptionReturn type
TO_JSON_STRINGTakes a SQL value and returns a JSON-formatted string representation of the value.JSON-formatted STRING

Conclusion

In this article, you have learned about the types of BigQuery JSON Extraction functions. This article also provided information on Google BigQuery, its key features, and Standard JSON Extraction Functions.

Hevo Data, a No-code Data Pipeline provides you with a consistent and reliable solution to manage data transfer between a variety of sources and a wide variety of Desired Destinations with a few clicks.

Visit our Website to Explore Hevo

Hevo Data with its strong integration with 100+ data sources (including 30+ free sources) allows you to not only export data from your desired data sources & load it to the destination of your choice such as Google BigQuery, but also transform & enrich your data to make it analysis-ready so that you can focus on your key business needs and perform insightful analysis using BI tools. 

Sign Up for a 14-day free trial and experience the feature-rich Hevo suite first hand. You may also have a look at the amazing price, which will assist you in selecting the best plan for your requirements.

Share your experience of understanding Google BigQuery JSON Extraction Functions in the comment section below! We would love to hear your thoughts.

No-code Data Pipeline for Google BigQuery