One of Snowflake’s greatest strengths is how fast it can ingest both structured and unstructured data. But, understanding different aspects of handling Snowflake requires effort. You need a guide to figure out how Snowflake handles its JSON objects, and arrays in JSON data. This is exactly what we are going to cover in this blog.

In this post, you will look at how you can work with JSON files in Snowflake, how to directly query the Snowflake JSON data and copy it into a Snowflake table. Let’s get started!

Basic JSON Syntax

Snowflake JSON - JSON Structure
Image Source

Rules:

  • <KEY>:<VALUE> Data is in key/value pairs
  • Data is separated by commas
  • Objects are enclosed in curly braces ({})
  • Arrays are enclosed in square brackets ([])
"movies":[
    {"movie":"Iron Man", "producer":"Avi Arad"},
    {"movie":"Sherlock Holmes", "producer":"Joel Silver"},
    {"movie":"Dolittle", "producer":"Joe Roth"}
]

JSON Syntax Rewrite : 

  1. Key/Value Pairs: JSON data is written as key/value pairs. It consists of a field name (in double quotes), followed by a colon, then the value. For example: “firstName”:”John”.
  2. Objects: In JSON, objects are a collection of name/value pairs and are enclosed in curly braces {}. An object can contain multiple name/value pairs. Each name is followed by a colon and the name/value pairs are separated by commas. For example: {“firstName”:”John”, “lastName”:”Doe”}.
  3. Arrays: Arrays in JSON are ordered collections and are enclosed in square brackets []. An array can contain multiple values, which can be strings, numbers, objects, arrays, boolean, null, etc. The values in an array are separated by commas. For example:
"movies":[
    {"movie":"Iron Man", "producer":"Avi Arad"},
    {"movie":"Sherlock Holmes", "producer":"Joel Silver"},
    {"movie":"Dolittle", "producer":"Joe Roth"}
]

4. Values: JSON values can be:

  • A number (integer or floating point). For example: 100, 3.14.
  • A string (must be in double quotes). For example: “Hello”.
  • A Boolean (true or false). For example: true, false.
  • An array (enclosed in square brackets). For example: [“apple”, “banana”, “cherry”].
  • An object (enclosed in curly braces). For example: {“name”:”John”, “age”:30}.

null , which represents an empty value.

Why Worry About JSON?

Over the years, Javascript has risen in popularity to become the most popular server-side scripting language. This has led to there being lots of JSON data out there. 

  • There are lots of applications and websites that use JSON for tracking web logs.
  • Massive amounts of JSON data is being outputted by IoT and wMobile devices as they communicate with REST APIs in JSON format.
  • Lots of open datasets are publicly published in JSON, for example;
    • data.gov
    • datasf.org
    • opendata.cityofnewyork.us
    • data.austintexas.gov

Data engineers and analysts often use the ‘snowflake parse json’ feature in Snowflake to extract valuable insights from semi-structured JSON data. This shows that there is a lot of data out there that can be incorporated into the analysis processes that companies are doing in their Data Warehouses.

Why Query Snowflake JSON Data?

In Snowflake, you can natively ingest semi-structured data not only in JSON but also in XML, Parquet, Avro, ORC, and other formats. This means that in Snowflake, you can efficiently store JSON data and then access it using SQL.

Snowflake JSON allows you to parse json into columns. You can then query this data with SQL and join it to other structured data without having to do any transformations. This allows organizations to reduce the complexity of their data pipelines as well as increase the speed at which this data becomes available for analysis. The ease of Snowflake JSON parsing makes it a preferred choice for organizations seeking streamlined data extraction and analysis from JSON files.

How does Snowflake handle Snowflake JSON Objects?

Snowflake does this using a data type they call a VARIANT. This data type allows semi-structured data to be loaded, as is, into a column called VARIANT in a relational table as you shall see later. As this data is ingested, Snowflake automatically discovers the attributes, keys, and structure that exists in the JSON document. Statistics are also collected in Snowflake’s metadata repository which enables optimization.

Download the Cheatsheet on How to Set Up ETL to Snowflake
Download the Cheatsheet on How to Set Up ETL to Snowflake
Download the Cheatsheet on How to Set Up ETL to Snowflake
Learn the best practices and considerations for setting up high-performance ETL to Snowflake

How to Parse JSON in Snowflake?

When working with complex JSON data in Snowflake, the ‘snowflake parse json into columns’ functionality simplifies the process of extracting structured information. Parse_json Snowflake function is a powerful tool that allows you to effortlessly extract and manipulate JSON data within your data warehouse. Follow the steps below to parse Snowflake JSON data in Snowflake:

Hevo Data: A Smart Alternative to Load Data into Snowflake

Looking for an easy way to centralize all your data to Snowflake? Hevo Data, a No-Code Data Pipeline platform, helps to transfer data from 150+ Data sources to Snowflake. Hevo completely automates the process of not only exporting data from your desired source but also enriching the data and transforming it into an analysis-ready form without having to write a single line of code.

Get started with Hevo today!

Sign up here for a 14-Day Free Trial!

Step 1: Create a Table

Execute a simple create statement.

create or replace table json_table (v variant);

This command creates a single-column table with a column “v”.

Step 2: Load JSON Data

Load a sample JSON document using a simple INSERT statement and Snowflake’s PARSE_JSON function.

insert into json_table
select
parse_json(
'{
      "fullName":"Robert Downey",
      "age":55,
      "gender":"Male",
      "address": {
            "areaCode":"91506",
          "suite":"916 West Burbank Blvd"
                },
      "movies": [
          {"name":"Iron Man","budget":"$150M","producer":"Avi Arad"},
          {"name":"Sherlock Holmes","budget":"$200M","producer":"Joel Silver"},
          {"name":"Dolittle","budget":"$175M","producer":"Joe Roth"}
                ]
}');

In this example, you have an array of highly nested objects. In a production environment, you would use snowpipe or a copy command e.g.

copy into json_table from '@demo_stage/path_to_file/file1.json;
OR
copy into json_table from 's3://demo_bucket/file1.json;

Snowflake parses the JSON into sub-columns based on the key-value pairs at load time. These keys are recorded as pointers in metadata for optimization. Structural information is dynamically derived based on the schema definition embedded in the JSON string (Schema-on-Read).

Step 3: Start Pulling Data

Let’s get the data from the sub-column “fullName”.

select v:fullName from json_table;
1 Row Produced
row #V:FULLNAME
1“Robert Downey”

A colon is used to reference the JSON sub-columns where:

v = the column name in the json_table table

fullName = attribute in the JSON schema

v:fullName = notation to indicate which attribute in column “v” we want to select

Step 4: Casting the Data

The table that was returned in the previous step came back with double quotes around it and now you are going to cast that data into an actual data type.

select v:fullName::string as full_name from json_table;
1 Row Produced
row #V:FULL_NAME
1Robert Downey

You used :: to cast to a string data type and added an alias with “as” just like in regular SQL. 

You can now query JSON data without learning a new programming language.

select
    v:fullName::string as full_name,
    v:age::int as age,
    v:gender::string as gender
from json_table
1 Row Produced
row #FULL_NAMEAGEGENDER
1Robert Downey55Male

Handling More Snowflake JSON Complexity

The sample Snowflake JSON document that you used contains some nested data. “address” is a sub-column and “areaCode” and “suite” are sub-columns of the sub-column. You can access these columns using a very familiar table.column dot notation.

select
    v:address.areaCode::string as area_code,
    v:address.suite::string as suite
from json_table;
1 Row Produced
row #AREA_CODESUITE
191506916 West Burbank Blvd

If the structure changes and a new attribute-like state is added to the Snowflake JSON document, any SQL you’ve written already will work. It is also fairly easy to modify the query and easily adapt the SQL, for example;

select
    v:address.areaCode::string as area_code,
    v:address.suite::string as suite,    
    v:address.state::string as state
from json_table;

If the reverse happens and an attribute is dropped, the query will not fail. It will simply return a NULL value.

Handling Arrays in Snowflake JSON

You can embed arrays inside Snowflake JSON documents, for example;

"movies": [
          {"name": "Iron Man", "budget": "$150", "producer": "Avi Arad"},
          {"name": "Sherlock Homes", "budget": "$200", "producer": "Joel Silver"},
          {"name": "Dolittle", "budget": "$175", "producer": "Joe Roth"}
          ]

“movies” is a sub-column with an array. There are 3 rows in the array and each row has 3 sub-columns (name, budget, and producer). To pull the data of each row, use FLATTEN with the FROM clause and give it a table alias. FLATTEN takes an array and returns a row for each element in the array. It selects all the data in the array as though it were rows in the table.

select
    f.value:name::string as movie_name,
    f.value:budget::string as movie_budget,
    f.value:producer::string as movie_producer
from json_table, table(flatten(v:movies)) f;

f = table alias for the flattened array

name = sub-column in the array

value = returns the value of the sub-column

3 Rows Produced
row #MOVIE_NAMEMOVIE_BUDGETMOVIE_PRODUCER
1Iron Man$150MAvi Arad
2Sherlock Holmes$200MJoel Silver
3Dolittle$175MJoe Roth

Method 2: Parsing JSON Arrays Directly from a Staged JSON File

You can access the first entry in the JSON object array within the JSON data file uploaded as json_sample_data2 like so:

SELECT 'The First Employee Record is '||
 S.$1:root[0].employees[0].firstName||
 ' '||S.$1:root[0].employees[0].lastName
FROM @~/json/json_sample_data2 (file_format => 'json') as S;

The returned output would be: The First Employee Record Is: John Doe.

Utilizing the FLATTEN Function for JSON Array Parsing

The FLATTEN function is utilized to parse JSON arrays. FLATTEN is a table function that transforms a repeated field into a set of rows. It takes a single scalar value containing multiple values for a repeated field and unrolls it into multiple records, generating one record for each value of the previously repeated field. Any non-repeated fields are duplicated to populate each of the new records that are created. FLATTEN effectively removes one level of nesting.

In the context of uploaded JSON data stored in a single column (S.$1), the provided query parses the array within the json_sample_data file. When the query applies the LATERAL FLATTEN function, it results in three values for the fullName field.

select t.value 
from @~/json/json_sample_data.gz (file_format => 'json') as S
, table(flatten(S.$1,'root')) t;
select t.value:fullName 
from @~/json/json_sample_data.gz (file_format => 'json') as S, 
table(flatten(S.$1,'root')) t;

Method 3: Using the PARSE_JSON Function

This method parses text as a JSON document and returns a VARIANT value.  If the input is NULL, the output is also NULL. If the input text is ‘null’, it is processed as a JSON null value, therefore the output is not a SQL NULL, but a legal VARIANT value containing null (the distinction is obvious when displaying this VARIANT value).

SELECT t.value:children[0],
       t.value:children[0].age,
       parse_json(t.value:children[0].age)
FROM @~/json/json_sample_data.gz (file_format => 'json') as S, table(flatten(S.$1,'root')) t;

Method 4: Using the GET_PATH Function

The GET_PATH method extracts a value from structured data based on a path name. It is a GET variation that accepts a variation, OBJECT, or ARRAY value as the first parameter and then extracts the VARIANT value of the field or element using the path name as the second argument.

Path name notation is a typical JavaScript notation consisting of a concatenation of field names (identifiers) followed by dots and index operators [index]. The leading dot in the first field name is optional. The index operations accept non-negative decimal integers (for arrays) or single or double-quoted text literals (for objects).

GET_PATH represents a chain of GET functions.  If the path name matches any element, it returns NULL.

SELECT GET_PATH(S.$1, 'root[0].fullName')
FROM @~/json/json_sample_data.gz (file_format => 'json') as S;
SELECT t.value:fullName
FROM @~/json/json_sample_data.gz (file_format => 'json') as S
, table(flatten(S.$1,'root')) t;
SELECT 
t.value[0]:children as First_person_children,  
t.value[1]:children as Second_person_children,
t.value[2]:children as Third_Person_children                           
FROM @~/json/json_sample_data.gz (file_format => 'json') as S
, table(flatten(S.$1,'')) t;

Output

The SELECT command returns the whole contents of the json_sample_data file (some results are shown):):

+-----------------------------------+
| VALUE                             |
|-----------------------------------|
| [                                 |
|   {                               |
|     "age": 22,                    |
|     "children": [                 |
|       {                           |
|         "age": "6",               |
|         "gender": "Female",       |
|         "name": "Jane"            |
|       },                          |
|       {                           |
|         "age": "15",              |
|         "gender": "Male",         |
|         "name": "John"            |
|       }                           |
|     ],                            |
|     "citiesLived": [              |
|       {                           |
|         "place": "Seattle",       |
|         "yearsLived": [           |
|           "1995"                  |
|         ]                         |
|       },                          |
|       {                           |
|         "place": "Stockholm",     |
|         "yearsLived": [           |
|           "2005"                  |
|         ]                         |
|       }                           |
|     ],                            |
|     "fullName": "John Doe",       |
|     "gender": "Male",             |
|     "kind": "person",             |
|     "phoneNumber": {              |
|       "areaCode": "206",          |
|       "number": "1234567"         |
|     }                             |
|   },                              |
..
+-----------------------------------
SELECT t.value
FROM @~/json/json_sample_data.gz (file_format => 'json') as S
, table(flatten(S.$1,'root')) t;

Output

The three data arrays representing the three workers’ records and their dependant data are returned.

SELECT
array_size(t.value)
FROM @~/json/json_sample_data.gz (file_format => 'json') as S
, table(flatten(S.$1,'')) t;

Output

In the JSON example file used in this lesson, there are three employee records and all of their dependent entries (children’s names, ages, and so on).

SELECT emps.value:fullName, citliv.value:place, citliv.value:yearsLived , yrliv.value 
FROM @~/json/json_sample_data.gz (file_format => 'json') as S,
table(flatten(S.$1,'root')) emps,
table(flatten(emps.value:citiesLived,'')) citliv ,
table(flatten(citliv.value:yearsLived,'')) yrliv;

Output:

SELECT emps.value:fullName, citliv.value:place, count(yrliv.value) as Years_Count 
FROM @~/json/json_sample_data.gz (file_format => 'json') as S,
table(flatten(S.$1,'root')) emps,
table(flatten(emps.value:citiesLived,'')) citliv,
table(flatten(citliv.value:yearsLived,'')) yrliv
group by emps.value:fullName, citliv.value:place
order by 1,2;

Output

Conclusion

In summary, you can see that the VARIANT data type offers native support for querying Snowflake JSON without the need to analyze the structure ahead of time or perform complex transformations. Querying Snowflake JSON data provides the same performance as all the standard relational data types.

Hevo is the only real-time ELT No-code Data Pipeline platform that cost-effectively automates data pipelines that are flexible to your needs. With integration with 150+ Data Sources (40+ free sources), we help you not only export data from sources & load data to the destinations but also transform & enrich your data, & make it analysis-ready. 

Try Hevo today, a fully managed data pipeline for all your data needs.

Visit our Website to Explore Hevo

Want to take Hevo for a spin?

Sign Up and experience the feature-rich Hevo suite first hand. You can also have a look at the unbeatable Hevo pricing that will help you choose the right plan for your business needs.

Share your experience with Snowflake JSON in the comments section below!

mm
Former Director of Product Management, Hevo Data

Vivek Sinha has extensive experience in real-time analytics and cloud-native technologies. With a focus on Apache Pinot, he was a driving force in shaping innovation and defensible differentiators, including enhanced query processing, data mutability support, and cost-effective tiered storage solutions at Hevo. He also demonstrates a passion for exploring and implementing innovative trends within the dynamic data industry landscape.

No-Code Data Pipeline for Snowflake