Working with Snowflake JSON Parse Made Easy 101

Vivek Sinha • Last Modified: August 29th, 2023

Snowflake JSON - Featured Image

One of Snowflake’s greatest strengths is how fast it can ingest both structured and unstructured data. In this post, you will look at how you can work with JSON files in Snowflake, how to directly query the Snowflake JSON data and copy it into a Snowflake table.

Table of Contents

What is JSON (JavaScript Object Notation)?

Snowflake JSON - JSON
Image Source

JSON pronounced as Jay-Sawn or Jason is a popular data serialization format that is both easy for humans to read and write and also easy for machines to parse and generate. The JSON file format is derived from the JavaScript Programming Language, Standard ECMA-262 3rd edition. It is used primarily to transmit data between a server and a web application and was originally designed as an alternative to XML.

Basic JSON Syntax

Snowflake JSON - JSON Structure
Image Source

Rules:

  • <KEY>:<VALUE> Data is in key/value pairs
  • Data is separated by commas
  • Objects are enclosed in curly braces ({})
  • Arrays are enclosed in square brackets ([])
"movies":[
    {"movie":"Iron Man", "producer":"Avi Arad"},
    {"movie":"Sherlock Holmes", "producer":"Joel Silver"},
    {"movie":"Dolittle", "producer":"Joe Roth"}
]

JSON values can be:

  • A number
  • A string 
  • A Boolean
  • An array
  • A JSON object
  • Null

Why Worry About JSON?

Over the years, Javascript has risen in popularity to become the most popular server-side scripting language. This has led to there being lots of JSON data out there. 

  • There are lots of applications and websites that use JSON for tracking web logs.
  • Massive amounts of JSON data is being outputted by IoT and wMobile devices as they communicate with REST APIs in JSON format.
  • Lots of open datasets are publicly published in JSON, for example;
    • data.gov
    • datasf.org
    • opendata.cityofnewyork.us
    • data.austintexas.gov

This shows that there is a lot of data out there that can be incorporated into the analysis processes that companies are doing in their Data Warehouses.

What is Snowflake?

Snowflake JSON - Snowflake Logo
Image Source

Snowflake is a Cloud-based Warehousing platform that provides you with a framework that is easy to use, faster, and much more adaptable than traditional Data Warehouses. Since Snowflake is completely Cloud-based, it features a robust SaaS (Software as a Service) architecture. It simplifies data processing by letting users do operations such as data blending, analysis, and transformations on a range of data formats using SQL. Snowflake’s multi-tenant architecture enables real-time data sharing throughout your organization.

Snowflake is a SQL-based Data Warehouse that can handle both structured and semi-structured data types. The separation of Storage, Compute, and Metadata Services is its key benefit. Snowflake Primitives allows Database Administrators (DBAs) to manage logical assets from a central location. Snowflake can spin up new nodes (CPU and Storage) in seconds while the Data Warehouse is active. It also saves a lot of money when compared to typical Data Warehouse solutions.

Why Query Snowflake JSON Data?

In Snowflake, you can natively ingest semi-structured data not only in JSON but also in XML, Parquet, Avro, ORC, and other formats. This means that in Snowflake, you can efficiently store JSON data and then access it using SQL.

Snowflake JSON allows you to load JSON  data directly into relational tables. You can then query this data with SQL and join it to other structured data without having to do any transformations. This allows organizations to reduce the complexity of their data pipelines as well as increase the speed at which this data becomes available for analysis.

Hevo Data: A Smart Alternative to Load Data into Snowflake

Hevo Data, a No-Code Data Pipeline helps to transfer data from 100+ Data sources to Snowflake. Hevo is fully-managed and completely automates the process of not only exporting data from your desired source but also enriching the data and transforming it into an analysis-ready form without having to write a single line of code.

Get Started with Hevo for Free

Check out some amazing features of Hevo:

  • Completely Managed Platform: Hevo is fully managed. You need not invest time and effort to maintain or monitor the infrastructure involved in executing codes.
  • Live Monitoring: Hevo allows you to monitor the data flow and check where your data is at a particular point in time.
  • Data Transformation: It provides a simple interface to perfect, modify, and enrich the data you want to export. 
  • Secure: Hevo has a fault-tolerant architecture that ensures that the data is handled in a secure, consistent manner with zero data loss.
  • Schema Management: Hevo takes away the tedious task of Schema management & automatically detects the Schema of incoming data and maps it to the destination Schema.
  • Minimal Learning: Hevo with its simple and interactive UI, is extremely simple for new customers to work on and perform operations.

Get started with Hevo today!

Sign up here for a 14-Day Free Trial!

How does Snowflake handle Snowflake JSON Objects?

Snowflake does this using a data type they call a VARIANT. This data type allows semi-structured data to be loaded, as is, into a column called VARIANT in a relational table as you shall see later. As this data is ingested, Snowflake automatically discovers the attributes, keys, and structure that exists in the JSON document. Statistics are also collected in Snowflake’s metadata repository which enables optimization.

Download the Cheatsheet on How to Set Up ETL to Snowflake
Download the Cheatsheet on How to Set Up ETL to Snowflake
Download the Cheatsheet on How to Set Up ETL to Snowflake
Learn the best practices and considerations for setting up high-performance ETL to Snowflake

Working with Snowflake JSON

Prerequisites

  1. The instructions assume you have some familiarity with Snowflake already.
  2. You will need a Snowflake account to execute the commands in these examples.
  3. You will also need to install and configure the SnowSQL command-line tool properly. You can find instructions on installing this here.

Follow the steps below to parse Snowflake JSON data in Snowflake:

Step 1: Create a Table

Execute a simple create statement.

create or replace table json_table (v variant);

This command creates a single-column table with a column “v”.

Step 2: Load JSON Data

Load a sample JSON document using a simple INSERT statement and Snowflake’s PARSE_JSON function.

insert into json_table
select
parse_json(
'{
      "fullName":"Robert Downey",
      "age":55,
      "gender":"Male",
      "address": {
            "areaCode":"91506",
          "suite":"916 West Burbank Blvd"
                },
      "movies": [
          {"name":"Iron Man","budget":"$150M","producer":"Avi Arad"},
          {"name":"Sherlock Holmes","budget":"$200M","producer":"Joel Silver"},
          {"name":"Dolittle","budget":"$175M","producer":"Joe Roth"}
                ]
}');

In this example, you have an array of highly nested objects. In a production environment, you would use snowpipe or a copy command e.g.

copy into json_table from '@demo_stage/path_to_file/file1.json;
OR
copy into json_table from 's3://demo_bucket/file1.json;

Snowflake parses the JSON into sub-columns based on the key-value pairs at load time. These keys are recorded as pointers in metadata for optimization. Structural information is dynamically derived based on the schema definition embedded in the JSON string (Schema-on-Read).

Step 3: Start Pulling Data

Let’s get the data from the sub-column “fullName”.

select v:fullName from json_table;
1 Row Produced
row #V:FULLNAME
1“Robert Downey”

A colon is used to reference the JSON sub-columns where:

v = the column name in the json_table table

fullName = attribute in the JSON schema

v:fullName = notation to indicate which attribute in column “v” we want to select

Step 4: Casting the Data

The table that was returned in the previous step came back with double quotes around it and now you are going to cast that data into an actual data type.

select v:fullName::string as full_name from json_table;
1 Row Produced
row #V:FULL_NAME
1Robert Downey

You used :: to cast to a string data type and added an alias with “as” just like in regular SQL. 

You can now query JSON data without learning a new programming language.

select
    v:fullName::string as full_name,
    v:age::int as age,
    v:gender::string as gender
from json_table
1 Row Produced
row #FULL_NAMEAGEGENDER
1Robert Downey55Male

Handling More Snowflake JSON Complexity

The sample Snowflake JSON document that you used contains some nested data. “address” is a sub-column and “areaCode” and “suite” are sub-columns of the sub-column. You can access these columns using a very familiar table.column dot notation.

select
    v:address.areaCode::string as area_code,
    v:address.suite::string as suite
from json_table;
1 Row Produced
row #AREA_CODESUITE
191506916 West Burbank Blvd

If the structure changes and a new attribute-like state is added to the Snowflake JSON document, any SQL you’ve written already will work. It is also fairly easy to modify the query and easily adapt the SQL, for example;

select
    v:address.areaCode::string as area_code,
    v:address.suite::string as suite,    
    v:address.state::string as state
from json_table;

If the reverse happens and an attribute is dropped, the query will not fail. It will simply return a NULL value.

Handling Arrays in Snowflake JSON

You can embed arrays inside Snowflake JSON documents, for example;

"movies": [
          {"name": "Iron Man", "budget": "$150", "producer": "Avi Arad"},
          {"name": "Sherlock Homes", "budget": "$200", "producer": "Joel Silver"},
          {"name": "Dolittle", "budget": "$175", "producer": "Joe Roth"}
          ]

“movies” is a sub-column with an array. There are 3 rows in the array and each row has 3 sub-columns (name, budget, and producer). To pull the data of each row, use FLATTEN with the FROM clause and give it a table alias. FLATTEN takes an array and returns a row for each element in the array. It selects all the data in the array as though it were rows in the table.

select
    f.value:name::string as movie_name,
    f.value:budget::string as movie_budget,
    f.value:producer::string as movie_producer
from json_table, table(flatten(v:movies)) f;

f = table alias for the flattened array

name = sub-column in the array

value = returns the value of the sub-column

3 Rows Produced
row #MOVIE_NAMEMOVIE_BUDGETMOVIE_PRODUCER
1Iron Man$150MAvi Arad
2Sherlock Holmes$200MJoel Silver
3Dolittle$175MJoe Roth

Conclusion

In summary, you can see that the VARIANT data type offers native support for querying Snowflake JSON without the need to analyze the structure ahead of time or perform complex transformations. Querying Snowflake JSON data provides the same performance as all the standard relational data types.

Want to get rid of the hassle of working with different data types? Try Hevo today, a fully managed data pipeline for all your data needs.

Visit our Website to Explore Hevo

Want to take Hevo for a spin?

Sign Up and experience the feature-rich Hevo suite first hand. You can also have a look at the unbeatable pricing that will help you choose the right plan for your business needs.

Share your experience with Snowflake JSON in the comments section below!

No-Code Data Pipeline for Snowflake