Snowflake is a more Cost-Effective and Instantly Scalable data warehouse with industry-leading Query Performance. It’s a one-stop-shop for Cloud Data Warehousing and Analytics, with complete SQL support for Data Analysis and Transformations. Snowflake can access data saved in other Cloud Storage Systems as well as store data locally.

When it comes to Semi-Structured data storage, the data is usually stored in either VARIANT Columns or Flattening Nested Structure. In this article, you will learn everything about Snowflake Flatten, along with the process that you might want to carry out while using it with simple SQL code to make the process run smoothly.

Understanding Snowflake

Snowflake is a cloud data warehouse having capabilities to store and query your data better. The following are its key features:

  • Query Optimization: By using Clustering and Partitioning, Snowflake may optimize a query on its own. With Snowflake, Query Optimization isn’t something to be concerned about.
  • Secure Data Sharing: Data can be exchanged securely from one account to another using Snowflake Database Tables, Views, and UDFs.
  • Support for File Formats: JSON, Avro, ORC, Parquet, and XML are all Semi-Structured data formats that Snowflake can import. It has a VARIANT column type that lets you store Semi-Structured data.
  • Caching: Snowflake has a caching strategy that allows the results of the same query to be quickly returned from the cache when the query is repeated. Snowflake uses permanent (during the session) query results to avoid regenerating the report when nothing has changed.
  • SQL and Standard Support: Snowflake offers both standard and extended SQL support, as well as Advanced SQL features such as Merge, Lateral Views, Statistical Functions, and many others.
  • Fault Resistant: Snowflake provides exceptional fault-tolerant capabilities to recover the Snowflake object in the event of a failure (tables, views, database, schema, and so on).

To get further information check out the official website here

What is a Snowflake Lateral Join?

The Lateral keyword in Snowflake allows an in-line view to refer to Columns from a table expression that comes before it, although it may not refer to columns from the left table in some instances. Only the rows generated by the in-line View or Subquery are included in the Snowflake Lateral Joins Output.

A) Syntax

The LATERAL keyword can be used to perform a Lateral Join. The Lateral Connect keyword is next.

SELECT column-1, column-2,... column-n
FROM <left_hand_table_expression>, LATERAL ( <in_line_view> )
…
  • left_hand_table_expression: A User Table, View, Subquery, Table Function, or the result of an earlier join can be left_hand_table_expression.
  • in_line_view: A Subquery, or table functions (FLATTEN or a user defined table functions (UDTF)) are all examples of in_line_view.

B) Example Queries

The examples below show how to use Lateral Joins in Snowflake.

E_EMP
+-------+----------+-----+-----------------+
| EMPID | LASTNAME | DID | PROJECTNAMES    |
|-------+----------+-----+-----------------|
|   101 | ABC      |   1 | [               |
|       |          |     |   "IT",         |
|       |          |     |   "PROD"        |
|       |          |     | ]               |
|   102 | BCD      |   1 | [               |
|       |          |     |   "PS",         |
|       |          |     |   "PRODSupport" |
|       |          |     | ]               |
|   103 | CDE      |   2 | NULL            |
+-------+----------+-----+-----------------+
E_DEPT
+-----+-------------+
| DID | NAME        |
|-----+-------------|
|   1 | Engineering |
|   2 | Support     |
+-----+-------------+

For example, consider Lateral Join:

select * 
from e_dept as d, lateral (select * from e_emp as e where e.did = d.did) as LV 
order by empid;

+-----+-------------+-------+----------+-----------------+
| DID | NAME        | EMPID | LASTNAME | PNAME           |
|-----+-------------+-------+----------+-----------------|
|   1 | Engineering |   101 | ABC      | [               |
|     |             |       |          |   "IT",         |
|     |             |       |          |   "PROD"        |
|     |             |       |          | ]               |
|   1 | Engineering |   102 | BCD      | [               |
|     |             |       |          |   "PS",         |
|     |             |       |          |   "PRODSupport" |
|     |             |       |          | ]               |
|   2 | Support     |   103 | CDE      | NULL            |
+-----+-------------+-------+----------+-----------------+

Explode Array to Rows: Using Snowflake Flatten Function & Lateral

The FLATTEN function is a table function that explores the values of an object or array object into rows. A lateral perspective is created by using the flatten function. When converting array data to table rows, the flatten function is most typically employed.

Consider the following example of producing rows from the e_emp table’s PROJECTNAMES columns. The projectnames column is an array build column, so keep that in mind.

select emp.empid, emp.lastName, index as array_index, value as projectNames 
from e_emp as emp, lateral flatten(input => emp.projectNames) as proj_names 
order by empid;	

+-------+----------+-------------+---------------+
| EMPID | LASTNAME | ARRAY_INDEX | PROJECTNAMES  |
|-------+----------+-------------+---------------|
|   101 | ABC      |           0 | "IT"          |
|   101 | ABC      |           1 | "PROD"        |
|   102 | BCD      |           0 | "PS"          |
|   102 | BCD      |           1 | "PRODSupport" |
+-------+----------+-------------+---------------+

Flatten Array SQL Example

create or replace transient table emp01(
   id number,
   first_name varchar,
   last_name varchar,
   designation varchar,
   certifications array
);
-- 1st records with one certification
insert into emp01
select 1, 'Alexander', 'Kostas','Snowflake Developer',array_construct('SnowPro Core');
-- 2nd recocrd with two certification
insert into emp01
select 2,'Pierre', 'Dupont','Sr. Snowflake Developer',array_construct('SnowPro Core','SnowPro Adv DE');
-- 3rd recocrd with three certification
insert into emp01
select 3,'Isabella', 'Rossi','Snowflake Architect',array_construct('SnowPro Core','SnowPro Adv DE','SnowPro Architect');
-- check the table data
select * from emp01 ;
select flatten_tbl.value::varchar as certification from table(flatten (input => array_construct('SnowPro Core','SnowPro Adv DE','SnowPro Architect'))) flatten_tbl
order by 1;
select emp.first_name, emp.last_name, cert.value::varchar as cert_name from
   emp01 emp,
   lateral flatten (input => emp.certifications) cert;

Flatten Object SQL Example

create or replace transient table emp02(
   id number,
   first_name varchar,
   last_name varchar,
   designation varchar,
   certifications object
);
-- 1st records with one certification
insert into emp02
select  1, 'Alexander', 'Kostas','Snowflake Developer',
       object_construct('SnowPro Core','790');
-- 2nd recocrd with two certification
insert into emp02
select  2,'Pierre', 'Dupont','Sr. Snowflake Developer',
       object_construct('SnowPro Core','890','SnowPro Adv DE','810');
-- 3rd recocrd with three certification
insert into emp02
select  3,'Isabella', 'Rossi','Snowflake Architect',       object_construct('SnowPro Core','950','SnowPro Adv DE','780','SnowPro Architect','900');
-- check the table data
select * from emp02 ;
-- first, let’s understand the flatten table function..
select flatten_tbl.value::varchar as certification from table(flatten (input => array_construct('SnowPro Core','SnowPro Adv DE','SnowPro Architect'))) flatten_tbl
order by 1;
select emp.first_name, emp.last_name, cert.value::varchar as cert_name from
   emp02 emp,
   lateral flatten (input => emp.certifications) cert;
select emp.first_name, emp.last_name, cert.key::varchar as cert_name,
cert.value::varchar as cert_marks from
   emp02 emp,
   lateral flatten (input => emp.certifications) cert;

Flatten Array & Object Together SQL Example

create or replace transient table camera(
   name varchar,
   brand varchar,
   front_camera varchar,
   rear_camera object,
   dim_lwh array  
);
-- 1st records with one certification
insert into camera
select  'iPhone 12', 'Apple','12Mp',
       object_construct('Primary/Std','14MP', 'Wide Angle','16MP'),
       array_construct('16.49','8.96','2.82')
       ;
     select
   camera.name,
   camera.brand,
   camera.front_camera,
   rear.key::varchar as rear_camera_type,
   rear.value::varchar as rear_camera_px,
   dim.value::number as dimension
from
   camera,
  lateral flatten (input => camera.rear_camera) rear,
   lateral flatten (input => camera.dim_lwh) dim
   ;
  select
   camera.name,
   camera.brand,
   dim.value::number as dimension
from
   camera,
   lateral flatten (input => camera.dim_lwh) dim
   ;

Array values are burst into rows as a result of the flatten tables function. The array index of Project Names is also visible.

Revolutionizing Cloud Data Warehousing with a Web-Based IDE and ANSI Standard Compliance

Snowflake is the world’s first Cloud Data Warehouse solution, built on the customer’s preferred Cloud Provider’s infrastructure (AWS, Azure, or GCP). Snowflake (SnowSQL) adheres to the ANSI Standard and includes typical Analytics and Windowing Capabilities. There are some differences in Snowflake’s syntax, but there are also some parallels. 

Snowflake’s Integrated Development Environment (IDE) is totally Web-based. Visit XXXXXXXX.us-east-1.snowflakecomputing.com. You’ll be sent to the primary Online GUI, which works as an IDE, where you can begin interacting with your Data Assets after logging in. Each query tab in the Snowflake interface is referred to as a “Worksheet” for simplicity. These “Worksheets,” like the tab history function, are automatically saved and can be viewed at any time.

How does Snowflake Handle Semi-structured Data?

The Semi-structured data is stored in two forms:

  • VARIANT Columns
  • Flattening Nested Structure

When Semi-structured data is inserted into a VARIANT column, Snowflake extracts as much data as possible into the column format based on specific rules. The rest is stored as a single column of Parsed Semi-Structured structure. Currently, items with the following characteristics are not extracted into columns. 

  • Items that contain at least one “Null” value are not extracted into the column. Note that this applies to items with “null” values, not to items with missing values ​​displayed in column format. This rule ensures that no information is lost. The difference between a VARIANT “NULL” value and a SQL NULL value is not confusing. 
  • Elements with multiple data types. Example:
    • The foo element on line contains the following numbers: 
{"foo":1} 
  • The same element on another line contains the string:
{"foo":"1"}

When a Semi-Structured element is queried: 

  • If the element was extracted into a column, Snowflake`s Execution Engine (which is columnar) scans only the extracted column. 
  • If the element was not extracted into a column, the Execution Engine must scan the entire JSON structure, and then for each row traverse the structure to Output Values, impacting Performance.

 To avoid this Performance impact: 

  • Extract Semi-structured data elements containing “null” values into relational columns before loading them. 
  • Alternatively, if the “null” values in your files indicate missing values and have no other special meaning, it’s recommended to set the file format option STRIP_NULL_VALUES to TRUE when loading the Semi-Structured data files. This option removes object elements or array elements containing “null” values. 
  • Make sure that each unique element stores a value of a single native data type (string or number).

For getting more information on how Snowflake handles Semi-structured data, check out the official documentation here.

Simplify Snowflake Data Transfer with Hevo’s No-code Pipeline

Looking for centralizing data to your Snowflake? Hevo Data is a No-code Data Pipeline that helps you transfer data from 150+ sources (including 40+ Free Data Sources) to Snowflake in real-time in an effortless manner. Using Hevo, you can easily carry out Snowflake Create Users Tasks.

Sign up here for a 14-Day Free Trial!

How to Use the Snowflake Flatten Command? 

With the help of Snowflake Flatten Command, compound values are flattened (exploded) into many rows.

Snowflake FLATTEN is a table function that creates a Lateral View from a VARIANT, OBJECT, or ARRAY column (i.e. an inline view that contains correlation referring to other tables that precede it in the FROM clause).

Snowflake FLATTEN is a tool for converting Semi-structured data into a relational format.

A) Snowflake Flatten Command: Syntax

FLATTEN( INPUT => <expr> [ , PATH => <constant_expr> ]
                         [ , OUTER => TRUE | FALSE ]
                         [ , RECURSIVE => TRUE | FALSE ]
                         [ , MODE => 'OBJECT' | 'ARRAY' | 'BOTH' ] )

B) Snowflake Flatten Command: Key Arguments

INPUT => <expr>: The expression that will be sitting in rows will be unseated. The data type of the expression must be VARIANT, OBJECT, or ARRAY.

C) Snowflake Flatten Command: Output

The columns in the returned rows are all the same:

+-----+------+------+-------+-------+------+
| SEQ |  KEY | PATH | INDEX | VALUE | THIS |
|-----+------+------+-------+-------+------|
  • SEQ: The input record has a unique sequence number; the sequence is not guaranteed to be gap-free or organized in any specific way.
  • KEY: This column contains the key to the exploded value for Maps or Objects.
  • PATH: The path to the element that needs to be flattened within a Data Structure.
  • INDEX: If the element is an array, INDEX is the index; otherwise, NULL.
  • VALUE: The value of the flattened array/object.
  • THIS: It is the element that has been Flattened (useful in Recursive Flattening).

D) Snowflake Flatten Command: Example Queries

The following is a simple example of Flattening one record (notice that the Snowflake flatten array’s middle member is missing):

select * from table(flatten(input => parse_json('[1, ,77]'))) f;

+-----+------+------+-------+-------+------+
| SEQ |  KEY | PATH | INDEX | VALUE | THIS |
|-----+------+------+-------+-------+------|
|   1 | NULL | [0]  |     0 |     1 | [    |
|     |      |      |       |       |   1, |
|     |      |      |       |       |   ,  |
|     |      |      |       |       |   77 |
|     |      |      |       |       | ]    |
|   1 | NULL | [2]  |     2 |    77 | [    |
|     |      |      |       |       |   1, |
|     |      |      |       |       |   ,  |
|     |      |      |       |       |   77 |
|     |      |      |       |       | ]    |
+-----+------+------+-------+-------+------+

The OUTER parameter has the following effect:

select * from table(flatten(input => parse_json('[]'))) f;

+-----+-----+------+-------+-------+------+
| SEQ | KEY | PATH | INDEX | VALUE | THIS |
|-----+-----+------+-------+-------+------|
+-----+-----+------+-------+-------+------+

select * from table(flatten(input => parse_json('[]'), outer => true)) f;

+-----+------+------+-------+-------+------+
| SEQ |  KEY | PATH | INDEX | VALUE | THIS |
|-----+------+------+-------+-------+------|
|   1 | NULL |      |  NULL |  NULL | []   |
+-----+------+------+-------+-------+------+

How to Flatten JSON Files/Data in Snowflake? 

Due to its compactness, the JSON file type is a popular choice for data transferred online. It is also the standard format for accessing API data. The information is saved as a string in a specific structure.

Rules: 

  • <NAME>: The data is organized into name/value pairs. A name/value pair is made up of a field name (in double quotes), a colon, and then a value: “firstName”:”John”
  • Commas are used to separate the data.
  • Curly Braces () are used to enclose objects.
  • Multiple Name/Value pairs can be found in an object: {“firstName”:”John”, “lastName”:”Doe”}.
  • Square Brackets are used to surround arrays ([]).

Multiple objects can be stored in an array:

"employees":[
   {"firstName":"John", "lastName":"Doe"},
   {"firstName":"Anna", "lastName":"Smith"},
   {"firstName":"Peter", "lastName":"Jones"}
]

Steps: 

You are required to download Data for a Snowflake Project via an API from an Alteryx connection, then load it directly to a Snowflake Server and manipulate it within the Snowflake environment. The goal is to show how to switch from Extract, Transform, and Load (ETL) to ELT mode. Let’s understand this with a help of an example. Consider you are making a Snowflake Project on Bike Rental Data.

Step 1: To begin, you’ll work with a single file.

Flatten JSON Data in Snowflake 1
Flatten JSON Data in Snowflake 2
Image Source

Let’s look at the code in a Code Editor to better comprehend the structure.

Flatten JSON Data in Snowflake 3
Image Source

Step 2: The next thing is to notice is the external ‘[]‘ on the entire entry, which denotes an array that needs to be flattened, followed by another ‘[]‘ for additionalProperties, which denotes a Nested Array. In Snowflake, there is a function called LATERAL FLATTEN that flattens JSON

Let’s demonstrate this function with specific cases in this example.

Flatten JSON Data in Snowflake 4

Step 3: From the Project_BikePoint Data table, you have a table with a single column BikePoint_JSON, as shown in the first image. The Lateral Flatten function is applied to the column that holds the JSON file (need a common in between). Each object within the Name and Value Pair will be moved as independent rows once the array has been flattened. Most of the time, you’re only concerned with the output’s value, so we may use .value: followed by the appropriate columns.

Flatten JSON Data in Snowflake 5
Image Source

Step 4: You notice the first column still contains the array which you need to Flatten as well, this is a Nested Structure, but the Lateral flatten function is easy to use where you can just repeatedly apply this process. 

Flatten JSON Data in Snowflake 6
Image Source

Step 5: You can set up a similar structure as the above code, with an additional name for results from the inner array where you called t1 here. To get the results from the inner array can be called with .value on t1 instead, while results from the outer array are called with .value on t0.

Flatten JSON Data in Snowflake 7
Image Source
Flatten JSON Data in Snowflake 8
Image Source

Since every Object (782) from the outer array has 9 Objects, the resultant has 7038 rows, so the Flatten process has taken place efficiently and created an Appended Table

Now you may simply upload additional calls to get any of the desired columns from the JSON Report in Snowflake in a query, as opposed to a few complex RegEx codes. 

How to Partition JSON Data using Snowflake Flatten JSON Command?

Snowflake FLATTEN is a table function that displays a VARIANT, OBJECT, or ARRAY column in a Lateral View. This phase uses the function to generate two tables with varying flattening levels.

1) Store Object Data in Snowflake Variant Columns

To divide events into individual JSON objects while keeping the global data, utilize a LATERAL JOIN and the Snowflake FLATTEN function. The LATERAL modifier connects the data with any information outside of the object, in this case, the device type, and version, and returns a row for each object. The value of the flattened object is output in a VALUE column by the function.

Step A: Perform a data query for each event:

select src:device_type::string
, src:version::string
, value
from
  raw_source
, lateral flatten( input => src:events );

The following is the output of Snowflake:

+-------------------------+---------------------+-------------------------------------------------------------------------------+
| SRC:DEVICE_TYPE::STRING | SRC:VERSION::STRING | VALUE                                                                         |
|-------------------------+---------------------+-------------------------------------------------------------------------------|
| server                  | 2.6                 | {                                                                             |
|                         |                     |   "f": 83,                                                                    |
|                         |                     |   "rv": "15219.64,783.63,48674.48,84679.52,27499.78,2178.83,0.42,74900.19",   |
|                         |                     |   "t": 1437560931139,                                                         |
|                         |                     |   "v": {                                                                      |
|                         |                     |     "ACHZ": 42869,                                                            |
|                         |                     |     "ACV": 709489,                                                            |
|                         |                     |     "DCA": 232,                                                               |
|                         |                     |     "DCV": 62287,                                                             |
|                         |                     |     "ENJR": 2599,                                                             |
|                         |                     |     "ERRS": 205,                                                              |
|                         |                     |     "MXEC": 487,                                                              |
|                         |                     |     "TMPI": 9                                                                 |
|                         |                     |   },                                                                          |
|                         |                     |   "vd": 54,                                                                   |
|                         |                     |   "z": 1437644222811                                                          |
|                         |                     | }                                                                             |
| server                  | 2.6                 | {                                                                             |
|                         |                     |   "f": 1000083,                                                               |
|                         |                     |   "rv": "8070.52,54470.71,85331.27,9.10,70825.85,65191.82,46564.53,29422.22", |
|                         |                     |   "t": 1437036965027,                                                         |
|                         |                     |   "v": {                                                                      |
|                         |                     |     "ACHZ": 6953,                                                             |
|                         |                     |     "ACV": 346795,                                                            |
|                         |                     |     "DCA": 250,                                                               |
|                         |                     |     "DCV": 46066,                                                             |
|                         |                     |     "ENJR": 9033,                                                             |
|                         |                     |     "ERRS": 615,                                                              |
|                         |                     |     "MXEC": 0,                                                                |
|                         |                     |     "TMPI": 112                                                               |
|                         |                     |   },                                                                          |
|                         |                     |   "vd": 626,                                                                  |
|                         |                     |   "z": 1437660796958                                                          |
|                         |                     | }                                                                             |
+-------------------------+---------------------+-------------------------------------------------------------------------------+

Step B: To save the aforementioned results in a table, use the CREATE TABLE AS statement:

create or replace table flattened_source as
  select
    src:device_type::string as device_type
  , src:version::string     as version
  , value                   as src
  from
    raw_source
  , lateral flatten( input => src:events );

  select * from flattened_source;

+-------------+---------+-------------------------------------------------------------------------------+
| DEVICE_TYPE | VERSION | SRC                                                                           |
|-------------+---------+-------------------------------------------------------------------------------|
| server      | 2.6     | {                                                                             |
|             |         |   "f": 83,                                                                    |
|             |         |   "rv": "15219.64,783.63,48674.48,84679.52,27499.78,2178.83,0.42,74900.19",   |
|             |         |   "t": 1437560931139,                                                         |
|             |         |   "v": {                                                                      |
|             |         |     "ACHZ": 42869,                                                            |
|             |         |     "ACV": 709489,                                                            |
|             |         |     "DCA": 232,                                                               |
|             |         |     "DCV": 62287,                                                             |
|             |         |     "ENJR": 2599,                                                             |
|             |         |     "ERRS": 205,                                                              |
|             |         |     "MXEC": 487,                                                              |
|             |         |     "TMPI": 9                                                                 |
|             |         |   },                                                                          |
|             |         |   "vd": 54,                                                                   |
|             |         |   "z": 1437644222811                                                          |
|             |         | }                                                                             |
| server      | 2.6     | {                                                                             |
|             |         |   "f": 1000083,                                                               |
|             |         |   "rv": "8070.52,54470.71,85331.27,9.10,70825.85,65191.82,46564.53,29422.22", |
|             |         |   "t": 1437036965027,                                                         |
|             |         |   "v": {                                                                      |
|             |         |     "ACHZ": 6953,                                                             |
|             |         |     "ACV": 346795,                                                            |
|             |         |     "DCA": 250,                                                               |
|             |         |     "DCV": 46066,                                                             |
|             |         |     "ENJR": 9033,                                                             |
|             |         |     "ERRS": 615,                                                              |
|             |         |     "MXEC": 0,                                                                |
|             |         |     "TMPI": 112                                                               |
|             |         |   },                                                                          |
|             |         |   "vd": 626,                                                                  |
|             |         |   "z": 1437660796958                                                          |
|             |         | }                                                                             |
+-------------+---------+-------------------------------------------------------------------------------+

2) Extract JSON Key-value Pairs across Separate Columns

Snowflake allows you to query JSON directly, as demonstrated in this tutorial. When formats change, you don’t have to recreate and re-populate massive tables, which is one advantage of using the JSON structure. You can, however, copy JSON Key Values into Typed Columns if you like.

The following statement creates a new table, EVENTS, in which the values from the RAW SOURCE table are copied into distinct columns. Using a double-colon (::) followed by the type, each value is cast to a data type that is appropriate for the value. If you don’t cast the column, it takes on the data type VARIANT, which can carry any value:

create or replace table events as
  select
    src:device_type::string                             as device_type
  , src:version::string                                 as version
  , value:f::number                                     as f
  , value:rv::variant                                   as rv
  , value:t::number                                     as t
  , value:v.ACHZ::number                                as achz
  , value:v.ACV::number                                 as acv
  , value:v.DCA::number                                 as dca
  , value:v.DCV::number                                 as dcv
  , value:v.ENJR::number                                as enjr
  , value:v.ERRS::number                                as errs
  , value:v.MXEC::number                                as mxec
  , value:v.TMPI::number                                as tmpi
  , value:vd::number                                    as vd
  , value:z::number                                     as z
  from
    raw_source
  , lateral flatten ( input => SRC:events );

The statement flattens the EVENTS’ layered data. Add a separate column for each value to the SRC:V key. For each key/value pair, the statement returns a row. The first two records in the new EVENTS table are shown in the output below:

select * from events;

+-------------+---------+---------+----------------------------------------------------------------------+---------------+-------+--------+-----+-------+------+------+------+------+-----+---------------+
| DEVICE_TYPE | VERSION |       F | RV                                                                   |             T |  ACHZ |    ACV | DCA |   DCV | ENJR | ERRS | MXEC | TMPI |  VD |             Z |
|-------------+---------+---------+----------------------------------------------------------------------+---------------+-------+--------+-----+-------+------+------+------+------+-----+---------------|
| server      | 2.6     |      83 | "15219.64,783.63,48674.48,84679.52,27499.78,2178.83,0.42,74900.19"   | 1437560931139 | 42869 | 709489 | 232 | 62287 | 2599 |  205 |  487 |    9 |  54 | 1437644222811 |
| server      | 2.6     | 1000083 | "8070.52,54470.71,85331.27,9.10,70825.85,65191.82,46564.53,29422.22" | 1437036965027 |  6953 | 346795 | 250 | 46066 | 9033 |  615 |    0 |  112 | 626 | 1437660796958 |
+-------------+---------+---------+----------------------------------------------------------------------+---------------+-------+--------+-----+-------+------+------+------+------+-----+---------------+

How to Extract Nested JSON Data using Lateral and Flatten? 

Internal Objects with one or more fields and no set structure are sometimes seen in JSON Objects. To extract a Nested Variation, object, or array from JSON data, utilize the (LATERAL) Snowflake FLATTEN function.

Let’s say you want to construct a table VNT with a single JSON field:

CREATE OR REPLACE TABLE vnt
(
 src variant
)
AS SELECT parse_json(column1) as src
FROM values
('{ 
 "topleveldate" : "2017-04-28", 
 "toplevelname" : "somename", 
 "extraFields": [ 
 { 
 "value": "somevalue1", 
 "key": "somekey1", 
 "type": "sometype1", 
 "booleanflag": false 
 }, 
 { 
 "value": "", 
 "key": "somekey2"
 }]}');

Let’s further assume that the topleveldate and toplevelname fields are well-known and that the extraFields field contains an array of fields that can vary from record to record, making it impossible to extract them using the standard syntax.

However, you can still use the LATERAL FLATTEN to convert this JSON data into a collection of rows containing the appropriate Field Names and Values:

WITH a as
(
  select 
    src:topleveldate::string as topleveldate
  , src:toplevelname::string as toplevelname
  , value as val
  from vnt, lateral flatten( input => src:extraFields )
)
select topleveldate, toplevelname, key, value from a, lateral flatten( input => val );
;

;

Using Snowflake Flatten with Lateral Joins

To divide events into distinct JSON objects, utilize the LATERAL FLATTEN method:

However, you now see how that snippet of documentation is incorrect and deceptive. The term “LATERAL FLATTEN function” does not exist. Although there is a Snowflake FLATTEN function and a LATERAL join technique, the phrase “LATERAL FLATTEN function” is incorrect and misleading. The Snowflake tutorial should instead say:

To divide events into distinct JSON objects, utilize a LATERAL join to a FLATTEN subquery:

The reasoning of this query makes it more obvious now that you know what a LATERAL Join is. The following are crucial elements:

  • The FLATTEN function is not Scalar; it returns rows and columns and functions similarly to the subqueries we saw earlier. The SRC column from the outer table RAW SOURCE is supplied to the FLATTEN subquery as a function argument.
  • The ‘Event’ key is used to begin JSON parsing (see the tutorial for its example JSON). This is similar to the AVG() FLATTEN aggregate technique used in the preceding instances. The Join has no condition; it is Cartesian. But don’t worry: while each row of the outer table is effectively processed one at a time, each invocation of FLATTEN results in only one row being linked to the single outer calling row.
select src:device_type::string
, src:version::string
, value
from
raw_source
, lateral flatten( input => src:events );

Conclusion

This article has exposed you to the various Snowflake Flatten to help you improve your overall decision-making and experience when trying to make the most out of your data. In case you want to export data from a source of your choice into your desired Database/destination like Snowflake, then Hevo Data is the right choice for you! 

Hevo is the only real-time ELT No-code Data Pipeline platform that cost-effectively automates data pipelines that are flexible to your needs. With integration with 150+ Data Sources (40+ free sources), we help you not only export data from sources & load data to the destinations but also transform & enrich your data, & make it analysis-ready.

Visit our Website to Explore Hevo

Want to take Hevo for a spin? Sign Up for a 14-day free trial and experience the feature-rich Hevo suite first hand. You can also have a look at our unbeatable Hevo Pricing that will help you choose the right plan for your business needs!

Share your experience of learning about Snowflake Flatten! Let us know in the comments section below!

References

  1. Medium article on flattening function in Snowflake
mm
Former Research Analyst, Hevo Data

Harsh comes with experience in performing research analysis who has a passion for data, software architecture, and writing technical content. He has written more than 100 articles on data integration and infrastructure.

No-code Data Pipeline for Snowflake