Setting up BigQuery JSON Extract: Made Easy 101

By: Published: September 21, 2021

BigQuery JSON Extract - Feature Image | | Hevo Data

Venturing into Data Science and deciding on a tool to use to solve a given problem can be challenging at times especially when you have a wide array of choices. In this age of data transformation where organizations are constantly seeking out ways to improve the day to day handling of data being produced and looking for methods to minimize the cost of having these operations, it has become imperative to handle such data transformations in the Cloud as it is a lot easier to manage and is also cost-efficient.

Data Warehousing architectures have rapidly changed over the years and most of the notable service providers are now Cloud-based. Therefore, companies are increasingly on the move to align with such offerings on the Cloud as it provides them with a lower upfront cost, and enhances scalability, and performance as opposed to traditional On-premise Data Warehousing systems. Google BigQuery is among one the well-known and widely accepted Cloud-based Data Warehouse Applications.

In this article, you will gain information about BigQuery JSON Extraction Functions. You will also gain a holistic understanding of Google BigQuery, its key features, and the types of BigQuery JSON Extraction Functions. Read along to find out in-depth information about BigQuery JSON Extraction Functions.

Table of Contents

What is Google BigQuery?

Google BigQuery is a Cloud-based Data Warehouse that provides a Big Data Analytic Web Service for processing petabytes of data. It is intended for analyzing data on a large scale. It consists of two distinct components: Storage and Query Processing. It employs the Dremel Query Engine to process queries and is built on the Colossus File System for storage. These two components are decoupled and can be scaled independently and on-demand.

Google BigQuery is fully managed by Cloud service providers. We don’t need to deploy any resources, such as discs or virtual machines. It is designed to process read-only data. Dremel and Google BigQuery use Columnar Storage for quick data scanning, as well as a tree architecture for executing queries using ANSI SQL and aggregating results across massive computer clusters. Furthermore, owing to its short deployment cycle and on-demand pricing, Google BigQuery is serverless and designed to be extremely scalable.

For further information about Google Bigquery, follow the Official Documentation.

Key Features of Google BigQuery

Some of the key features of Google BigQuery are as follows:

1) Performance

Partitioning is supported by BigQuery, which improves Query performance. The data may be readily queried using SQL or Open Database Connectivity (ODBC)

2) Scalability

Being quite elastic, BigQuery separates computation and storage, allowing customers to scale processing and memory resources according to their needs. The tool has significant vertical and horizontal scalability and runs real-time queries on petabytes of data in a very short period.

3) Security

When a third-party authorization exists, users can utilize OAuth as a standard approach to get the cluster. By default, all data is encrypted and in transit. Cloud Identity and Access Management (IAM) allows for fine-tuning administration.

4) Usability

Google BigQuery is a highly user-friendly platform that requires a basic understanding of SQL commands, ETL tools, etc.

5) Data Types

 It supports JSON and XML file formats.

6) Data Loading

It employs the conventional ELT/ETL Batch Data Loading techniques by employing standard SQL dialect, as well as Data Streaming to load data row by row using Streaming APIs.

7) Integrations

In addition to operational databases, the system supports integration with a wide range of data integration tools, business intelligence (BI), and artificial intelligence (AI) solutions. It also works with Google Workspace and Cloud Platform.

8) Data Recovery

Data backup and disaster recovery are among the services provided by Google BigQuery. Users can query point-in-time snapshots of data changes from the last seven days.

9) Pricing Models

The Google BigQuery platform is available in both on-demand and flat-rate subscription models. Although data storage and querying will be chargedexporting, loading, and copying data is free. It has separated computational resources from storage resources. You are only charged when you run queries. The quantity of data processed during searches is billed.

What is JSON?

JSON is a widely-used text notation/format used for structured data. This schema-less format operates on ordered lists & uses key-value pairs to store data. JSON, launched as a derivation of JavaScript, is supported by most programming languages today. They have libraries from which you can retrieve the JSON Data Type. The key application of JSON is exchanging data among web clients and web servers. 

Since its inception around 15 years ago, JSON’s popularity has always increased. Today, most Web services that are available in the public domain, deploy JSON for their data exchange operations.JSON operate as a string, and are beneficial if you wish to send data across a network. However, you must first convert it into a JavaScript object. Afterward, you can access the transferred JSON data. JavaScript offers a global JSON Object that can streamline your JSON conversions and allow you to use this format easily. 

Why do we use JSON?

As the popularity of Asynchronous Java Scripts increased more and more websites are using this programming knowledge for functioning. These websites need to load data quickly in an asynchronous manner without causing page delays. Moreover, such websites, and social media platforms like Twitter, Flickr, and others use the RSS Feeds from these websites. Now loading these feeds from one website to other using JavaScript is not feasible. This is where JSON comes in!

JSON enables you to overcome the cross-domain constraints with its JSONP method. It implements a callback function that transfers data in JSON format from one domain to another.

To learn more about JSON, visit here

What is JSON Data Type?

JSON will be supported as a data type in BigQuery, according to the company. First and foremost, BigQuery already supports JSON. BigQuery, for example, has long had JSON querying and generation capabilities. What you can do is:

SELECT 
   JSON_QUERY('{"user_id": 3, "name": "Alice","address": {"street": "A Ave","city": "Aville"}}',
              "$.address.city"),
   TO_JSON_STRING(STRUCT(5 AS user_id, STRUCT('Bob' AS name)))

as a result of which:

Image Source

Consider how Geography is a data type (plus a set of functions to work with it) and how that opened up the entire universe of ST_* functions rather than us having to muck around with latitude and longitude calculations ourselves.

  • You can create a JSON field in one of two ways:* by calling PARSE JSON on a JSON-formatted string. SAFE. If the JSON string contains a syntax error, PARSE JSON instructs BigQuery to use null as the object.
  • Using the TO JSON function on a STRUCT.

This exemplifies both strategies:

WITH input_table AS (
   SELECT SAFE.PARSE_JSON('{"user_id": 3, "name": "Alice","address": {"street": "A Ave","city": "Aville"}}') AS user
   UNION ALL
   SELECT TO_JSON(STRUCT(5 AS user_id, 'Bob' as name,
                         STRUCT('B Blvd' AS street, 'Bville' AS city) AS address)) AS user
)
SELECT 
   user, 
   user['name'], 
   INT64(user.user_id),
   user.address.street
FROM input_table
  • User is a column in the JSON data type, as shown in the code above. As a result, you can select it and get the entire object as a result.
  • The syntax user[‘name’] can be used to extract parts of the JSON.
  • Dot syntax is another way to extract data. A JSON object is returned. It can be converted to the following formats: INT64 (user.user id)
  • To go into nested structures, you can use multiple dots. Up to 500 feet.
  • [0], [1], and so on can be used to access elements in arrays.

*There’s a third way to make a JSON data type: if you have a table with a JSON-type column, you can INSERT a string into that column to get auto-conversion:

INSERT INTO NEW_TABLE_WITH_JSON_TYPE
SELECT * FROM OLD_TABLE_WITH_STRINGS;

The data is stored in parsed form when you create a column of type JSON. As a result, you have access to all of the advantages of a columnar database.
Let’s make a new BigQuery dataset and simulate a table with JSON values to see how this works. JSON will be stored as a string in one column and as a JSON type in the other:

-- START Random generation function
CREATE TEMP FUNCTION MapChar(c INT64) AS (
  CASE
    WHEN c BETWEEN 0 AND 9 THEN 48 + c -- 0 to 9
    WHEN c BETWEEN 10 AND 35 THEN 55 + c -- A to Z
    ELSE 61 + c -- a to z
  END
);
CREATE TEMP FUNCTION random_string(id INTEGER) AS ((
  SELECT CODE_POINTS_TO_STRING(ARRAY_AGG(MapChar(MOD(c, 62))))
  FROM UNNEST(TO_CODE_POINTS(MD5(CAST(RAND() + id AS STRING)))) AS c
));
-- END Random generation function
CREATE OR REPLACE TABLE advdata.simulated_json AS
WITH input_data AS (
  SELECT STRUCT(id AS user_id,
                 STRUCT(random_string(id) AS name, random_string(id) AS city) AS address) AS user
  FROM UNNEST(GENERATE_ARRAY(1, 1000*1000)) AS id
)
SELECT 
  TO_JSON(user) AS user_json,
  TO_JSON_STRING(user) AS user_json_string

The above query creates one million STRUCT named users with random values. This is how they appear:

Let’s try writing a query the “traditional” way:

CREATE TEMPORARY FUNCTION INT64(x STRING) AS
(
  CAST(JSON_VALUE(x) AS INT64)
);
SELECT AVG(INT64(JSON_QUERY(user_json_string, "$.user_id")))
FROM advdata.simulated_json

The result is 500,000.5, but the amount of data processing is even more intriguing: 80 MB.

This is the total length of the JSON string.
Let’s rewrite the query this time, utilizing the JSON data type and dot syntax to parse it:

SELECT AVG(INT64(user_json.user_id))
FROM advdata.simulated_json

The query will only pull the user id “sub-column” this time, resulting in only 7.6 MB of data being handled.

So, storing log or other data that comes as JSON strings as a column whose type is JSON makes querying much more efficient (and not string).
It’s also less expensive because you’re dealing with less data (if you are paying on-demand). When you pay flat-rate, you use your slots for a shorter period of time, allowing you to do more with the time you have.

Simplify BigQuery ETL and Analysis with Hevo’s No-code Data Pipeline

A fully managed No-code Data Pipeline platform like Hevo Data helps you integrate and load data from 100+ different sources (including 30+ free sources) to a Data Warehouse such as Google BigQuery or Destination of your choice in real-time in an effortless manner. Hevo with its minimal learning curve can be set up in just a few minutes allowing the users to load data without having to compromise performance. Its strong integration with umpteenth sources allows users to bring in data of different kinds in a smooth fashion without having to code a single line. 

Get Started with Hevo for Free

Check out some of the cool features of Hevo:

  • Completely Automated: The Hevo platform can be set up in just a few minutes and requires minimal maintenance.
  • Transformations: Hevo provides preload transformations through Python code. It also allows you to run transformation code for each event in the Data Pipelines you set up. You need to edit the event object’s properties received in the transform method as a parameter to carry out the transformation. Hevo also offers drag and drop transformations like Date and Control Functions, JSON, and Event Manipulation to name a few. These can be configured and tested before putting them to use.
  • Connectors: Hevo supports 100+ integrations to SaaS platforms, files, Databases, analytics, and BI tools. It supports various destinations including Google BigQuery, Amazon Redshift, Snowflake Data Warehouses; Amazon S3 Data Lakes; and MySQL, SQL Server, TokuDB, DynamoDB, PostgreSQL Databases to name a few.  
  • Real-Time Data Transfer: Hevo provides real-time data migration, so you can have analysis-ready data always.
  • 100% Complete & Accurate Data Transfer: Hevo’s robust infrastructure ensures reliable data transfer with zero data loss.
  • Scalable Infrastructure: Hevo has in-built integrations for 100+ sources (including 40+ free sources) that can help you scale your data infrastructure as required.
  • 24/7 Live Support: The Hevo team is available round the clock to extend exceptional support to you through chat, email, and support calls.
  • Schema Management: Hevo takes away the tedious task of schema management & automatically detects the schema of incoming data and maps it to the destination schema.
  • Live Monitoring: Hevo allows you to monitor the data flow so you can check where your data is at a particular point in time.
Sign up here for a 14-Day Free Trial!

What is BigQuery JSON Extraction Function?

JSON Extraction functions in Google BigQuery help in retrieving and transforming JSON data into a suitable format. There are majorly 2 categories of BigQuery JSON Extraction Functions. These are as follows:

In the article, we will be majorly focussing on Standard BigQuery JSON Extraction Functions.

Standard BigQuery JSON Extraction Functions

Standard BigQuery Extraction functions use double quotes to escape invalid JSONPath characters: “a.b”.It is consistent with the ANSI standard.

The 4 types of Standard BigQuery JSON Extraction Functions are as follows:

1) JSON_QUERY

A) Syntax

JSON_QUERY(json_string_expr, json_path)

B) Description

JSON_QUERY Function is a Standard BigQuery Extract Function that extracts a JSON value, such as an array or object, or a JSON scalar value, such as a string, number, or boolean. If a JSON key contains invalid JSONPath characters, you can escape them using double-quotes.

  • json_string_expr: A JSON-formatted string. For example:
'{"class" : {"students" : [{"name" : "Jane"}]}}'

extracts a SQL NULL when a JSON-formatted string “NULL” is encountered. For example:

SELECT JSON_QUERY("null", "$") -- Returns a SQL NULL.
  • json_path: The JSONPath identifies the data that you want to obtain from the input. This parameter is optional and if it is not provided, then the JSONPath $ symbol is applied, which indicates that all of the data is analyzed.
SELECT JSON_QUERY('{"a":null}', "$.a"); -- Returns an SQL NULL
SELECT JSON_QUERY('{"a":null}', "$.b"); -- Returns an SQL NULL 

You can use the JSON_QUERY function if you want to include non-scalar values such as arrays in the extraction. But, if you only want to extract scalar values such as strings, numbers, and booleans, then use the JSON_VALUE function.

C) Return type

A JSON-formatted STRING

D) Examples

In the following examples, JSON data is extracted and returned as JSON-formatted strings.

SELECT JSON_QUERY(json_text, '$') AS json_text_string
FROM UNNEST([
  '{"class" : {"students" : [{"name" : "Jane"}]}}',
  '{"class" : {"students" : []}}',
  '{"class" : {"students" : [{"name" : "John"}, {"name": "Jamie"}]}}'
  ]) AS json_text;

Output:

+-----------------------------------------------------------+
| json_text_string                                          |
+-----------------------------------------------------------+
| {"class":{"students":[{"name":"Jane"}]}}                  |
| {"class":{"students":[]}}                                 |
| {"class":{"students":[{"name":"John"},{"name":"Jamie"}]}} |
+-----------------------------------------------------------+
SELECT JSON_QUERY(json_text, '$.class.students[0]') AS first_student
FROM UNNEST([
  '{"class" : {"students" : [{"name" : "Jane"}]}}',
  '{"class" : {"students" : []}}',
  '{"class" : {"students" : [{"name" : "John"}, {"name": "Jamie"}]}}'
  ]) AS json_text;

Output:

+-----------------+
| first_student   |
+-----------------+
| {"name":"Jane"} |
| NULL            |
| {"name":"John"} |
+-----------------+
SELECT JSON_QUERY(json_text, '$.class.students[1].name') AS second_student_name
FROM UNNEST([
  '{"class" : {"students" : [{"name" : "Jane"}]}}',
  '{"class" : {"students" : []}}',
  '{"class" : {"students" : [{"name" : "John"}, {"name" : null}]}}',
  '{"class" : {"students" : [{"name" : "John"}, {"name": "Jamie"}]}}'
  ]) AS json_text;

Output:

+-------------------+
| second_student    |
+-------------------+
| NULL              |
| NULL              |
| NULL              |
| "Jamie"           |
+-------------------+
SELECT JSON_QUERY(json_text, '$.class."students"') AS student_names
FROM UNNEST([
  '{"class" : {"students" : [{"name" : "Jane"}]}}',
  '{"class" : {"students" : []}}',
  '{"class" : {"students" : [{"name" : "John"}, {"name": "Jamie"}]}}'
  ]) AS json_text;

Output:

+------------------------------------+
| student_names                      |
+------------------------------------+
| [{"name":"Jane"}]                  |
| []                                 |
| [{"name":"John"},{"name":"Jamie"}] |
+------------------------------------+

2) JSON_VALUE

A) Syntax

JSON_VALUE(json_string_expr[, json_path])

B) Description

This BigQuery JSON Extraction function extracts a scalar value and returns it as a string. A scalar value can represent a string, number, or boolean. JSON_VALUE Function removes the outermost quotes and unescapes the return values. If a JSON key uses invalid JSONPath characters, then you can escape those characters using double-quotes.

  • json_string_expr: A JSON-formatted string. For example:
'{"class" : {"students" : [{"name" : "Jane"}]}}'
  • json_path: The JSONPath, helps you identify the data that you want to obtain from the input. This is an optional parameter that if not provided, then the JSONPath $ symbol is applied, which indicates that all of the data is analyzed.
    • If json_path returns a JSON null or a non-scalar value (in other words, if json_path refers to an object or an array), then an SQL NULL is returned. This is an optional parameter that if not provided, then the JSONPath $ symbol is applied, which indicates that the entire JSON-formatted string is analyzed.

You can use the JSON_VALUE function if you only want to extract scalar values such as strings, numbers, and booleans. If you want to include non-scalar values such as arrays in the extraction, then use JSON_QUERY.

C) Return type

STRING Data type

D) Examples

The following example illustrated how outputs are returned for the JSON_QUERY and JSON_VALUE functions.

SELECT JSON_QUERY('{ "name" : "Jakob", "age" : "6" }', '$.name') AS json_name,
  JSON_VALUE('{ "name" : "Jakob", "age" : "6" }', '$.name') AS scalar_name,
  JSON_QUERY('{ "name" : "Jakob", "age" : "6" }', '$.age') AS json_age,
  JSON_VALUE('{ "name" : "Jakob", "age" : "6" }', '$.age') AS scalar_age;

Output:

+-----------+-------------+----------+------------+
| json_name | scalar_name | json_age | scalar_age |
+-----------+-------------+----------+------------+
| "Jakob"   | Jakob       | "6"      | 6          |
+-----------+-------------+----------+------------+
SELECT JSON_QUERY('{"fruits": ["apple", "banana"]}', '$.fruits') AS json_query,
  JSON_VALUE('{"fruits": ["apple", "banana"]}', '$.fruits') AS json_value;

Output:

+--------------------+------------+
| json_query         | json_value |
+--------------------+------------+
| ["apple","banana"] | NULL       |
+--------------------+------------+

In cases where a JSON key contains invalid JSONPath characters, you can escape those characters using double quotes. For example:

SELECT JSON_VALUE('{"a.b": {"c": "world"}}', '$."a.b".c') AS hello;

Output:

+-------+
| hello |
+-------+
| world |
+-------+

3) JSON_QUERY_ARRAY

A) Syntax

JSON_QUERY_ARRAY(json_string_expr[, json_path])

B) Description

This BigQuery JSON Extraction function helps to extract an array of JSON values, such as arrays or objects, and JSON scalar values, such as strings, numbers, and booleans. If a JSON key contains invalid JSONPath characters, then you can escape those characters using double quotes.

  • json_string_expr: A JSON-formatted string. For example:
'{"class" : {"students" : [{"name" : "Jane"}]}}'
  • json_path: The JSONPath, helps you identify the data that you want to obtain from the input. This is an optional parameter that if not provided, then the JSONPath $ symbol is applied, which indicates that all of the data is analyzed.

C) Return type

ARRAY<JSON-formatted STRING> Data type

D) Examples

The following example extracts the items in a JSON-formatted string to a string array:

SELECT JSON_QUERY_ARRAY('[1,2,3]') AS string_array;

Output:

+--------------+
| string_array |
+--------------+
| [1, 2, 3]    |
+--------------+

The following example extracts a string array and converts it to an integer array:

SELECT ARRAY(  SELECT CAST(integer_element AS INT64)  FROM UNNEST(    JSOSELECT ARRAY(
  SELECT CAST(integer_element AS INT64)
  FROM UNNEST(
    JSON_QUERY_ARRAY('[1,2,3]','$')
  ) AS integer_element
) AS integer_array;N_QUERY_ARRAY('[1,2,3]','$')  ) AS integer_element) AS integer_array;

Output:

+---------------+
| integer_array |
+---------------+
| [1, 2, 3]     |
+---------------+

The following example extracts string values in a JSON-formatted string to an array:

-- Doesn't strip the double quotes
SELECT JSON_QUERY_ARRAY('["apples","oranges","grapes"]', '$') AS string_array;

Output:

+---------------------------------+
| string_array                    |
+---------------------------------+
| ["apples", "oranges", "grapes"] |
+---------------------------------+
-- Strips the double quotes
SELECT ARRAY(
  SELECT JSON_VALUE(string_element, '$')
  FROM UNNEST(JSON_QUERY_ARRAY('["apples","oranges","grapes"]','$')) AS string_element
) AS string_array;

Output:

+---------------------------+
| string_array              |
+---------------------------+
| [apples, oranges, grapes] |
+---------------------------+

The following example extracts only the items in the fruit property to an array:

SELECT JSON_QUERY_ARRAY(
  '{"fruit":[{"apples":5,"oranges":10},{"apples":2,"oranges":4}],"vegetables":[{"lettuce":7,"kale": 8}]}',
  '$.fruit'
) AS string_array;

Output:

+-------------------------------------------------------+
| string_array                                          |
+-------------------------------------------------------+
| [{"apples":5,"oranges":10}, {"apples":2,"oranges":4}] |
+-------------------------------------------------------+

These are equivalent:

SELECT JSON_QUERY_ARRAY('{"fruits":["apples","oranges","grapes"]}','$.fruits') AS string_array;
SELECT JSON_QUERY_ARRAY('{"fruits":["apples","oranges","grapes"]}','$."fruits"') AS string_array;

Output:

-- The queries above produce the following result:
+---------------------------------+
| string_array                    |
+---------------------------------+
| ["apples", "oranges", "grapes"] |
+---------------------------------+

In cases where a JSON key uses invalid JSONPath characters, you can escape those characters using double quotes: ” “. For example:

SELECT JSON_QUERY_ARRAY('{"a.b": {"c": ["world"]}}', '$."a.b".c') AS hello;

Output:

+-----------+
| hello     |
+-----------+
| ["world"] |
+-----------+

The following examples show how invalid requests and empty arrays are handled:

-- An error is ret-- An error is returned if you provide an invalid JSONPath.
SELECT JSON_QUERY_ARRAY('["foo","bar","baz"]','INVALID_JSONPath') AS result;
-- If the JSONPath does not refer to an array, then NULL is returned.
SELECT JSON_QUERY_ARRAY('{"a":"foo"}','$.a') AS result;

Output:

+--------+
| result |
+--------+
| NULL   |
+--------+
-- If a key that does not exist is specified, then the result is NULL.
SELECT JSON_QUERY_ARRAY('{"a":"foo"}','$.b') AS result;

Output:

+--------+
| result |
+--------+
| NULL   |
+--------+
-- Empty arrays in JSON-formatted strings are supported.
SELECT JSON_QUERY_ARRAY('{"a":"foo","b":[]}','$.b') AS result;

Output:

+--------+
| result |
+--------+
| []     |
+--------+

4) JSON_VALUE_ARRAY

A) Syntax

JSON_VALUE_ARRAY(json_string_expr[, json_path])

B) Description

This BigQuery JSON Extraction function extracts an array of scalar values and returns an array of string-formatted scalar values. A scalar value can represent a string, number, or boolean. If a JSON key contains invalid JSONPath characters, you can escape those characters using double quotes.

  • json_string_expr: A JSON-formatted string. For example:
'{"class" : {"students" : [{"name" : "Jane"}]}}'
  • json_path: The JSONPath helps to identify the data that you want to obtain from the input. This is an optional parameter which if not provided, then the JSONPath $ symbol is applied, which indicates that all of the data is analyzed.

C) Return type

ARRAY<STRING> Data type

D) Examples

The following example compares how results are returned for the JSON_QUERY_ARRAY and JSON_VALUE_ARRAY functions.

SELECT JSON_QUERY_ARRAY('["apples","oranges"]') AS json_array,
       JSON_VALUE_ARRAY('["apples","oranges"]') AS string_array;

Output:

+-----------------------+-------------------+
| json_array            | string_array      |
+-----------------------+-------------------+
| ["apples", "oranges"] | [apples, oranges] |
+-----------------------+-------------------+

The following example extracts the items in a JSON-formatted string to a string array:

-- Strips the double quotes
SELECT JSON_VALUE_ARRAY('["foo","bar","baz"]','$') AS string_array;

Output:

+-----------------+
| string_array    |
+-----------------+
| [foo, bar, baz] |
+-----------------+

The following example extracts a string array and converts it to an integer array:

SELECT ARRAY(  SELECT CAST(integer_element AS INT64) SELECT ARRAY(
  SELECT CAST(integer_element AS INT64)
  FROM UNNEST(
    JSON_VALUE_ARRAY('[1,2,3]','$')
  ) AS integer_element
) AS integer_array; FROM UNNEST(    JSON_VALUE_ARRAY('[1,2,3]','$')  ) AS integer_element) AS integer_array;+---------------+| integer_array |+---------------+| [1, 2, 3]     |+---------------+

Output:

+---------------+
| integer_array |
+---------------+
| [1, 2, 3]     |
+---------------+

These are equivalent:

SELECT JSON_VALUE_ARRAY('{"fruits":["apples","oranges","grapes"]}','$.fruits') AS string_array;
SELECT JSON_VALUE_ARRAY('{"fruits":["apples","oranges","grapes"]}','$."fruits"') AS string_array;

Output:

-- The queries above produce the following result:
+---------------------------+
| string_array              |
+---------------------------+
| [apples, oranges, grapes] |
+---------------------------+

In cases where a JSON key uses invalid JSONPath characters, you can escape those characters using double quotes: " ". For example:

SELECT JSON_VALUE_ARRAY('{"a.b": {"c": ["world"]}}', '$."a.b".c') AS hello;

Output:

+---------+
| hello   |
+---------+
| [world] |
+---------+

The following examples explore how invalid requests and empty arrays are handled:

-- An error is thrown if you provide an invalid JSONPath.
SELECT JSON_VALUE_ARRAY('["foo","bar","baz"]','INVALID_JSONPath') AS result;
-- If the JSON-formatted string is invalid, then NULL is returned.
SELECT JSON_VALUE_ARRAY('}}','$') AS result;

Output:

+--------+
| result |
+--------+
| NULL   |
+--------+
-- If the JSON document is NULL, then NULL is returned.
SELECT JSON_VALUE_ARRAY(NULL,'$') AS result;

Output:

+--------+
| result |
+--------+
| NULL   |
+--------+
-- If a JSONPath does not match anything, then the output is NULL.
SELECT JSON_VALUE_ARRAY('{"a":["foo","bar","baz"]}','$.b') AS result;

Output:

+--------+
| result |
+--------+
| NULL   |
+--------+
-- If a JSONPath matches an object that is not an array, then the output is NULL.
SELECT JSON_VALUE_ARRAY('{"a":"foo"}','$') AS result;

Output:

+--------+
| result |
+--------+
| NULL   |
+--------+
-- If a JSONPath matches an array of non-scalar objects, then the output is NULL.
SELECT JSON_VALUE_ARRAY('{"a":[{"b":"foo","c":1},{"b":"bar","c":2}],"d":"baz"}','$.a') AS result;

Output:

+--------+
| result |
+--------+
| NULL   |
+--------+
-- If a JSONPath matches an array of mixed scalar and non-scalar objects,
-- then the output is NULL.
SELECT JSON_VALUE_ARRAY('{"a":[10, {"b": 20}]','$.a') AS result;

Output:

+--------+
| result |
+--------+
| NULL   |
+--------+
-- If a JSONPath matches an empty JSON array, then the output is an empty array instead of NULL.
SELECT JSON_VALUE_ARRAY('{"a":"foo","b":[]}','$.b') AS result;

Output:

+--------+
| result |
+--------+
| []     |
+--------+
-- If a JSONPath matches an array that contains scalar objects and a JSON null,
-- then the output of the JSON_VALUE_ARRAY function must be transformed
-- because the final output cannot be an array with NULL values. This example
-- uses the UNNEST operator to convert the output array into a table as the final output.
SELECT string_value FROM UNNEST(JSON_VALUE_ARRAY('["world", 1, null]')) AS string_value;

Output:

+--------------+
| string_value |
+--------------+
| world        |
| 1            |
| NULL         |
+--------------+

Legacy BigQuery JSON Extraction functions

The following Legacy BigQuery JSON Extraction functions use single quotes and brackets to escape invalid JSONPath characters: [‘a.b’].

While these functions are supported by BigQuery, it is highly recommended to use the Standard BigQuery JSON Extraction.

JSON functionDescriptionReturn type
JSON_EXTRACTExtracts a JSON value, such as an array or object, or a JSON scalar value, such as a string, number, or boolean.JSON-formatted STRING
JSON_EXTRACT_SCALARExtracts a scalar value. A scalar value can represent a string, number, or boolean. Removes the outermost quotes and unescapes the values. Returns a SQL NULL if a non-scalar value is selected.STRING
JSON_EXTRACT_ARRAYExtracts an array of JSON values, such as arrays or objects, and JSON scalar values, such as strings, numbers, and booleans.ARRAY<JSON-formatted STRING>
JSON_EXTRACT_STRING_ARRAYExtracts an array of scalar values. A scalar value can represent a string, number, or boolean. Removes the outermost quotes and unescapes the values. Returns a SQL NULL if the selected value is not an array or not an array containing only scalar values.ARRAY<STRING>

For further information on Legacy BigQuery JSON Extraction Functions, follow the official documentation.

  • Legacy BigQuery JSON Extraction Function : JSON_EXTRACT
  • Legacy BigQuery JSON Extraction Function : JSON_EXTRACT_SCALAR
  • Legacy BigQuery JSON Extraction Function : JSON_EXTRACT_ARRAY
  • Legacy BigQuery JSON Extraction Function : JSON_EXTRACT_STRING_ARRAY

Legacy BigQuery JSON Extraction Function : JSON_EXTRACT

Extracts a JSON scalar value, such as a string, number, or boolean value, or a JSON value, such as an array or object. If a JSON key contains invalid JSONPath characters, you can use single quotes and brackets to escape them.

JSON_EXTRACT(json_string_expr, json_path)
JSON_EXTRACT(json_expr, json_path)

json_string_expr: A JSON-formatted string. For example:

'{"class" : {"students" : [{"name" : "Jane"}]}}'

When a JSON-formatted string “null” is encountered, this function extracts a SQL NULL. Consider the following scenario:

SELECT JSON_EXTRACT("null", "$") -- Returns a SQL NULL

json_expr: JSON. For example:

JSON '{"class" : {"students" : [{"name" : "Jane"}]}}'

When a JSON null is encountered, this method extracts a JSON null value.

SELECT JSON_EXTRACT(JSON 'null', "$") -- Returns a JSON 'null'

json_path: The JSONPath is a class that represents a path in JSON format. This identifies the data you’re looking for in the input. If this parameter is omitted, the JSONPath $ symbol is used, which means that the entire dataset is analyzed.

SELECT JSON_EXTRACT('{"a":null}', "$.a"); -- Returns a SQL NULL
SELECT JSON_EXTRACT('{"a":null}', "$.b"); -- Returns a SQL NULL
SELECT JSON_EXTRACT(JSON '{"a":null}', "$.a"); -- Returns a JSON 'null'
SELECT JSON_EXTRACT(JSON '{"a":null}', "$.b"); -- Returns a SQL NULL

Use JSON EXTRACT if you want to include non-scalar values like arrays in the extraction. Use JSON EXTRACT SCALAR if you only want to extract scalar values like strings, numbers, and booleans.

Return type

  • json_string_expr: A JSON-formatted STRING
  • json_exprJSON

JSON data is extracted and returned as JSON in the following example.

Legacy BigQuery JSON Extraction Function : JSON_EXTRACT_SCALAR

JSON_EXTRACT_SCALAR(json_string_expr[, json_path])
JSON_EXTRACT_SCALAR(json_expr[, json_path])

Returns a string after extracting a scalar value. A string, number, or boolean value can all be represented by a scalar value. The outermost quotes are removed, and the return values are unescaped. If a JSON key contains invalid JSONPath characters, single quotes and brackets can be used to escape those characters.

json_string_expr: A JSON-formatted string. For example:

'{"class" : {"students" : [{"name" : "Jane"}]}}'

json_expr: JSON. For example:

JSON '{"class" : {"students" : [{"name" : "Jane"}]}}'

json path: The JSONPath . This identifies the data you’re looking for in the input. If this parameter is omitted, the JSONPath $ symbol is used, which means that the entire dataset is analysed.

A SQL NULL is returned if json path returns a JSON null or a non-scalar value (in other words, if json path refers to an object or an array). If this parameter is not specified, the JSONPath $ symbol is used, which means the entire JSON-formatted string is analysed.

Use JSON EXTRACT SCALAR if you only want to extract scalar values like strings, numbers, and booleans. Use JSON EXTRACT if you want to include non-scalar values like arrays in the extraction.

Return type

STRING

Examples

Age is extracted in the following example.

SELECT JSON_EXTRACT_SCALAR(JSON '{ "name" : "Jakob", "age" : "6" }', '$.age') AS scalar_age;

+------------+
| scalar_age |
+------------+
| 6          |
+------------+

The JSON EXTRACT and JSON EXTRACT SCALAR functions return different results, as shown in the following example.

SELECT JSON_EXTRACT('{ "name" : "Jakob", "age" : "6" }', '$.name') AS json_name,
  JSON_EXTRACT_SCALAR('{ "name" : "Jakob", "age" : "6" }', '$.name') AS scalar_name,
  JSON_EXTRACT('{ "name" : "Jakob", "age" : "6" }', '$.age') AS json_age,
  JSON_EXTRACT_SCALAR('{ "name" : "Jakob", "age" : "6" }', '$.age') AS scalar_age;

+-----------+-------------+----------+------------+
| json_name | scalar_name | json_age | scalar_age |
+-----------+-------------+----------+------------+
| "Jakob"   | Jakob       | "6"      | 6          |
+-----------+-------------+----------+------------+
SELECT JSON_EXTRACT('{"fruits": ["apple", "banana"]}', '$.fruits') AS json_extract,
  JSON_EXTRACT_SCALAR('{"fruits": ["apple", "banana"]}', '$.fruits') AS json_extract_scalar;

+--------------------+---------------------+
| json_extract       | json_extract_scalar |
+--------------------+---------------------+
| ["apple","banana"] | NULL                |
+--------------------+---------------------+

If a JSON key contains invalid JSONPath characters, you can use single quotes and brackets, [‘ ‘], to escape those characters. Consider the following scenario:

SELECT JSON_EXTRACT_SCALAR('{"a.b": {"c": "world"}}', "$['a.b'].c") AS hello;

+-------+
| hello |
+-------+
| world |
+-------+

Legacy BigQuery JSON Extraction Function : JSON_EXTRACT_ARRAY

JSON_EXTRACT_ARRAY(json_string_expr[, json_path])
JSON_EXTRACT_ARRAY(json_expr[, json_path])

Extracts a list of JSON values, such as arrays or objects, as well as JSON scalar values, like strings, numbers, and booleans. If a JSON key contains invalid JSONPath characters, you can use single quotes and brackets to escape them.

son_string_expr: A JSON-formatted string. For example:

'{"class" : {"students" : [{"name" : "Jane"}]}}'

json_expr: JSON. For example:

JSON '{"class" : {"students" : [{"name" : "Jane"}]}}'

json_path: The JSONPath.

This identifies the data from the input that you want to extract. If this parameter is omitted, the JSONPath $ symbol is used, which means that all data is analysed.

If this parameter is omitted, the JSONPath $ symbol is used, which means the entire JSON-formatted string is examined.

Return type

  • json_string_exprARRAY<JSON-formatted STRING>
  • json_exprARRAY<JSON>

Examples

This converts JSON items into an array of JSON values:

SELECT JSON_EXTRACT_ARRAY(
  JSON '{"fruits":["apples","oranges","grapes"]}','$.fruits'
  ) AS json_array;

+---------------------------------+
| json_array                      |
+---------------------------------+
| ["apples", "oranges", "grapes"] |
+---------------------------------+

This creates a string array from the items in a JSON-formatted string:

SELECT JSON_EXTRACT_ARRAY('[1,2,3]') AS string_array;

+--------------+
| string_array |
+--------------+
| [1, 2, 3]    |
+--------------+

This function converts a string array to an integer array:

This extracts string values in a JSON-formatted string to an array:

-- Doesn't strip the double quotes
SELECT JSON_EXTRACT_ARRAY('["apples","oranges","grapes"]', '$') AS string_array;

+---------------------------------+
| string_array                    |
+---------------------------------+
| ["apples", "oranges", "grapes"] |
+---------------------------------+

-- Strips the double quotes
SELECT ARRAY(
  SELECT JSON_EXTRACT_SCALAR(string_element, '$')
  FROM UNNEST(JSON_EXTRACT_ARRAY('["apples","oranges","grapes"]','$')) AS string_element
) AS string_array;

+---------------------------+
| string_array              |
+---------------------------+
| [apples, oranges, grapes] |
+---------------------------+

Only the items in the fruit property are extracted to an array:

SELECT JSON_EXTRACT_ARRAY(
  '{"fruit":[{"apples":5,"oranges":10},{"apples":2,"oranges":4}],"vegetables":[{"lettuce":7,"kale": 8}]}',
  '$.fruit'
) AS string_array;

+-------------------------------------------------------+
| string_array                                          |
+-------------------------------------------------------+
| [{"apples":5,"oranges":10}, {"apples":2,"oranges":4}] |
+-------------------------------------------------------+

These are equivalent:

SELECT JSON_EXTRACT_ARRAY('{"fruits":["apples","oranges","grapes"]}','$[fruits]') AS string_array;

SELECT JSON_EXTRACT_ARRAY('{"fruits":["apples","oranges","grapes"]}','$.fruits') AS string_array;

-- The queries above produce the following result:
+---------------------------------+
| string_array                    |
+---------------------------------+
| ["apples", "oranges", "grapes"] |
+---------------------------------+

Invalid requests and empty arrays are handled in the following examples:

An error is thrown if a JSONPath isn’t valid.
The output is NULL if the JSON formatted string is invalid.
It’s fine if the JSON-formatted string includes empty arrays.

-- An error is thrown if you provide an invalid JSONPath.
SELECT JSON_EXTRACT_ARRAY('["foo","bar","baz"]','INVALID_JSONPath') AS result;

-- If the JSONPath does not refer to an array, then NULL is returned.
SELECT JSON_EXTRACT_ARRAY('{"a":"foo"}','$.a') AS result;

+--------+
| result |
+--------+
| NULL   |
+--------+

-- If a key that does not exist is specified, then the result is NULL.
SELECT JSON_EXTRACT_ARRAY('{"a":"foo"}','$.b') AS result;

+--------+
| result |
+--------+
| NULL   |
+--------+

-- Empty arrays in JSON-formatted strings are supported.
SELECT JSON_EXTRACT_ARRAY('{"a":"foo","b":[]}','$.b') AS result;

+--------+
| result |
+--------+
| []     |
+--------+

Legacy BigQuery JSON Extraction Function : JSON_EXTRACT_STRING_ARRAY

JSON_EXTRACT_STRING_ARRAY(json_string_expr[, json_path])
JSON_EXTRACT_STRING_ARRAY(json_expr[, json_path])

Returns an array of string-formatted scalar values from an array of scalar values extracted. A string, number, or boolean can all be represented by a scalar value. You can use single quotes and brackets to escape invalid JSONPath characters in a JSON key.

json_string_expr: A JSON-formatted string. For example:

'{"class" : {"students" : [{"name" : "Jane"}]}}'

json_expr: JSON. For example:

The JSONPath is represented by json path. This identifies the data you’re looking for in the input. If this parameter is omitted, the JSONPath $ symbol is used, which means that the entire dataset is analysed.

Return type

ARRAY<STRING>

Examples

This extracts items in JSON to a string array:

SELECT JSON_EXTRACT_STRING_ARRAY(
  JSON '{"fruits":["apples","oranges","grapes"]}','$.fruits'
  ) AS string_array;

+---------------------------+
| string_array              |
+---------------------------+
| [apples, oranges, grapes] |
+---------------------------+

Other JSON functions

JSON FunctionDescriptionReturn Type
PARSE_JSONTakes a JSON-formatted string and returns a JSON value.JSON
TO_JSONTakes a SQL value and returns a JSON value.JSON
TO_JSON_STRINGTakes a SQL value and returns a JSON-formatted string representation of the value.JSON-formatted STRING
STRINGExtracts a string from JSON.STRING
BOOLExtracts a boolean from JSON.BOOL
INT64Extracts a 64-bit integer from JSON.INT64
FLOAT64Extracts a 64-bit floating-point number from JSON.FLOAT64
JSON_TYPEReturns the type of the outermost JSON value as a string.STRING

Conclusion

In this article, you have learned about the types of BigQuery JSON Extraction functions. This article also provided information on Google BigQuery, its key features, and Standard JSON Extraction Functions.

Hevo Data, a No-code Data Pipeline provides you with a consistent and reliable solution to manage data transfer between a variety of sources and a wide variety of Desired Destinations with a few clicks.

Visit our Website to Explore Hevo

Hevo Data with its strong integration with 100+ data sources (including 40+ free sources) allows you to not only export data from your desired data sources & load it to the destination of your choice such as Google BigQuery, but also transform & enrich your data to make it analysis-ready so that you can focus on your key business needs and perform insightful analysis using BI tools. 

Sign Up for a 14-day free trial and experience the feature-rich Hevo suite first hand. You may also have a look at the amazing pricing, which will assist you in selecting the best plan for your requirements.

Share your experience of understanding Google BigQuery JSON Extraction Functions in the comment section below! We would love to hear your thoughts.

mm
Former Research Analyst, Hevo Data

Manisha is a data analyst with experience in diverse data tools like Snowflake, Google BigQuery, SQL, and Looker. She has hadns on experience in using data analytics stack for various problem solving through analysis. Manisha has written more than 100 articles on diverse topics related to data industry. Her quest for creative problem solving through technical content writing and the chance to help data practitioners with their day to day challenges keep her write more.

No-code Data Pipeline for Google BigQuery