To empower your business decisions with data, you need Real-Time High-Quality data from all of your data sources in a central repository. Traditional On-Premise Data Warehouse solutions have limited Scalability and Performance, and they require constant maintenance. Snowflake is a more Cost-Effective and Instantly Scalable solution with industry-leading Query Performance. It’s a one-stop-shop for Cloud Data Warehousing and Analytics, with complete SQL support for Data Analysis and Transformations.
Data Storage is a critical component for every Snowflake Database. Snowflake can access data saved in other Cloud Storage Systems as well as store data locally. When it comes to Semi-Structured data storage, the data is usually stored in either of the two formats:
- VARIANT Columns
- Flattening Nested Structure
In this article, you will learn about Snowflake Flatten, the scenarios in which you might use it, and the simple SQL code that makes the process run smoothly.
What is Snowflake?
Snowflake is one of the first Data Warehouse solutions built entirely for the cloud, running on the customer’s preferred Cloud Provider’s infrastructure (AWS, Azure, or GCP). Snowflake’s SQL dialect (SnowSQL) adheres to the ANSI Standard and includes typical Analytics and Windowing Capabilities. Snowflake’s syntax has some differences from other SQL dialects, but also many parallels.
Snowflake’s Integrated Development Environment (IDE) is totally Web-based. Visit XXXXXXXX.us-east-1.snowflakecomputing.com. You’ll be sent to the primary Online GUI, which works as an IDE, where you can begin interacting with your Data Assets after logging in. Each query tab in the Snowflake interface is referred to as a “Worksheet” for simplicity. These “Worksheets,” like the tab history function, are automatically saved and can be viewed at any time.
Key Features of Snowflake
- Query Optimization: By using Clustering and Partitioning, Snowflake optimizes queries on its own. With Snowflake, Query Optimization isn’t something you need to be concerned about.
- Secure Data Sharing: Data can be exchanged securely from one account to another using Snowflake Database Tables, Views, and UDFs.
- Support for File Formats: JSON, Avro, ORC, Parquet, and XML are all Semi-Structured data formats that Snowflake can import. It has a VARIANT column type that lets you store Semi-Structured data.
- Caching: Snowflake has a caching strategy that allows the results of the same query to be quickly returned from the cache when the query is repeated. Snowflake persists query results (for 24 hours by default) to avoid regenerating them when nothing has changed.
- SQL and Standard Support: Snowflake offers both standard and extended SQL support, as well as Advanced SQL features such as Merge, Lateral Views, Statistical Functions, and many others.
- Fault Resistant: Snowflake provides exceptional fault-tolerant capabilities to recover the Snowflake object in the event of a failure (tables, views, database, schema, and so on).
To get further information check out the official website here.
How does Snowflake Handle Semi-structured Data?
The Semi-structured data is stored in two forms:
- VARIANT Columns
- Flattening Nested Structure
When Semi-structured data is inserted into a VARIANT column, Snowflake extracts as much data as possible into a columnar format based on specific rules. The rest is stored as a single column containing the parsed semi-structured structure. Currently, elements with the following characteristics are not extracted into columns:
- Elements that contain even a single “null” value are not extracted into a column. Note that this applies to elements with explicit “null” values, not to elements whose values are missing in the columnar representation. This rule ensures that no information is lost, i.e., that the distinction between a VARIANT “null” value and a SQL NULL value is preserved.
- Elements with multiple data types. For example:
- The foo element in one row contains a number:
{"foo":1}
- The same element in another row contains a string:
{"foo":"1"}
When a Semi-Structured element is queried:
- If the element was extracted into a column, Snowflake’s Execution Engine (which is columnar) scans only the extracted column.
- If the element was not extracted into a column, the Execution Engine must scan the entire JSON structure, and then for each row traverse the structure to Output Values, impacting Performance.
To avoid this Performance impact:
- Extract Semi-structured data elements containing “null” values into relational columns before loading them.
- Alternatively, if the “null” values in your files indicate missing values and have no other special meaning, it’s recommended to set the file format option STRIP_NULL_VALUES to TRUE when loading the Semi-Structured data files. This option removes object elements or array elements containing “null” values.
- Make sure that each unique element stores a value of a single native data type (string or number).
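The effect of the STRIP_NULL_VALUES file format option described above can be sketched in plain Python. This is an illustrative toy, not Snowflake's actual loader: it recursively drops object fields and array elements whose value is JSON null before the data would be loaded.

```python
import json

def strip_null_values(node):
    """Recursively drop object fields and array elements whose value is
    JSON null, mimicking (in simplified form) the effect of Snowflake's
    STRIP_NULL_VALUES file format option."""
    if isinstance(node, dict):
        return {k: strip_null_values(v) for k, v in node.items() if v is not None}
    if isinstance(node, list):
        return [strip_null_values(v) for v in node if v is not None]
    return node

record = json.loads('{"device": "server", "temp": null, "tags": ["a", null, "b"]}')
clean = strip_null_values(record)
print(clean)  # {'device': 'server', 'tags': ['a', 'b']}
```

With the nulls removed before loading, every remaining element can be extracted into Snowflake's columnar format without losing the null/missing distinction.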
For getting more information on how Snowflake handles Semi-structured data, check out the official documentation here.
Hevo Data is a No-code Data Pipeline that helps you transfer data from 100+ sources (including 40+ Free Data Sources) to Snowflake in real-time in an effortless manner. After using Hevo, you can easily carry out tasks such as creating Snowflake users.
Get Started with Hevo for Free
Key Features of Hevo Data:
- Fully Managed: Hevo Data is a fully managed service and is straightforward to set up.
- Transformations: Hevo provides preload transformations through Python code. It also allows you to run transformation code for each event in the Data Pipelines you set up. You need to edit the event object’s properties received in the transform method as a parameter to carry out the transformation. Hevo also offers drag and drop transformations like Date and Control Functions, JSON, and Event Manipulation to name a few. These can be configured and tested before putting them to use.
- Connectors: Hevo supports 100+ integrations to SaaS platforms, files, databases, analytics, and BI tools. It supports various destinations including Amazon Redshift, Snowflake Data Warehouses; Amazon S3 Data Lakes; and MySQL, SQL Server, TokuDB, DynamoDB, PostgreSQL databases to name a few.
- Ensure Unique Records: Hevo Data helps you ensure that only unique records are present in the tables if Primary Keys are defined.
- Multiple Sources: Hevo Data has various connectors incorporated with it, which can connect to multiple sources with ease.
- Automatic Mapping: Hevo Data automatically maps the source schema to perform analysis without worrying about the changes in the schema.
- Real-time Data Transfer: Hevo Data works on both batch as well as real-time data transfer.
- Resume from Point of Failure: Hevo Data can resume the ingestion from the point of failure if it occurs.
- Advanced Monitoring: Advanced monitoring gives you a one-stop view to watch all the activity that occurs within pipelines.
- 24/7 Support: With 24/7 Support, Hevo provides customer-centric solutions to the business use case.
Steps to load Snowflake data using Hevo Data:
- Sign up on the Hevo Data, and select Snowflake as the destination.
- Provide the user credentials and connect to the server.
- Select the database, and schema to load the data.
Sign up here for a 14-Day Free Trial!
How to Use the Snowflake Flatten Command?
With the help of Snowflake Flatten Command, compound values are flattened (exploded) into many rows.
Snowflake FLATTEN is a table function that creates a Lateral View from a VARIANT, OBJECT, or ARRAY column (i.e. an inline view that contains correlation referring to other tables that precede it in the FROM clause).
Snowflake FLATTEN is a tool for converting Semi-structured data into a relational format.
A) Snowflake Flatten Command: Syntax
FLATTEN( INPUT => <expr> [ , PATH => <constant_expr> ]
[ , OUTER => TRUE | FALSE ]
[ , RECURSIVE => TRUE | FALSE ]
[ , MODE => 'OBJECT' | 'ARRAY' | 'BOTH' ] )
B) Snowflake Flatten Command: Key Arguments
- INPUT => <expr>: The expression that will be flattened into rows. The data type of the expression must be VARIANT, OBJECT, or ARRAY.
- PATH => <constant_expr>: The path to the element within the VARIANT data structure that should be flattened (default: empty, i.e., flatten the top level).
- OUTER => TRUE | FALSE: If TRUE, a single row with NULL in the KEY, INDEX, and VALUE columns is generated for zero-row expansions (empty or NULL inputs); if FALSE (default), such rows are omitted.
- RECURSIVE => TRUE | FALSE: If TRUE, flattening is performed recursively for all sub-elements; if FALSE (default), only the element referenced by PATH is expanded.
- MODE => 'OBJECT' | 'ARRAY' | 'BOTH': Specifies whether only objects, only arrays, or both (default) should be flattened.
C) Snowflake Flatten Command: Output
The columns in the returned rows are all the same:
+-----+------+------+-------+-------+------+
| SEQ | KEY | PATH | INDEX | VALUE | THIS |
|-----+------+------+-------+-------+------|
- SEQ: A unique sequence number associated with the input record; the sequence is not guaranteed to be gap-free or ordered in any particular way.
- KEY: For maps or objects, this column contains the key to the exploded value.
- PATH: The path to the element within the data structure that was flattened.
- INDEX: The index of the element, if it is an array; otherwise NULL.
- VALUE: The value of the flattened array/object element.
- THIS: The element being flattened (useful in Recursive Flattening).
D) Snowflake Flatten Command: Example Queries
The following is a simple example of Flattening one record (notice that the array’s middle member is missing):
select * from table(flatten(input => parse_json('[1, ,77]'))) f;
+-----+------+------+-------+-------+------+
| SEQ | KEY | PATH | INDEX | VALUE | THIS |
|-----+------+------+-------+-------+------|
| 1 | NULL | [0] | 0 | 1 | [ |
| | | | | | 1, |
| | | | | | , |
| | | | | | 77 |
| | | | | | ] |
| 1 | NULL | [2] | 2 | 77 | [ |
| | | | | | 1, |
| | | | | | , |
| | | | | | 77 |
| | | | | | ] |
+-----+------+------+-------+-------+------+
The OUTER parameter has the following effect:
select * from table(flatten(input => parse_json('[]'))) f;
+-----+-----+------+-------+-------+------+
| SEQ | KEY | PATH | INDEX | VALUE | THIS |
|-----+-----+------+-------+-------+------|
+-----+-----+------+-------+-------+------+
select * from table(flatten(input => parse_json('[]'), outer => true)) f;
+-----+------+------+-------+-------+------+
| SEQ | KEY | PATH | INDEX | VALUE | THIS |
|-----+------+------+-------+-------+------|
| 1 | NULL | | NULL | NULL | [] |
+-----+------+------+-------+-------+------+
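The two example outputs above can be mimicked in plain Python. The sketch below is an illustrative toy, not Snowflake's actual engine: it produces one dict per output row with the same columns (SEQ, KEY, PATH, INDEX, VALUE, THIS), skips None entries to mirror how the missing middle element of '[1, ,77]' is skipped, and emits a single all-NULL row for empty input when outer is set.

```python
def flatten(value, outer=False):
    """Toy re-implementation of FLATTEN's row output for a single input
    value. Illustrative only -- real FLATTEN handles VARIANT data, PATH,
    RECURSIVE, and MODE as well."""
    rows = []
    if isinstance(value, list):
        for i, item in enumerate(value):
            if item is None:          # mirror how missing array elements are skipped
                continue
            rows.append({"SEQ": 1, "KEY": None, "PATH": f"[{i}]",
                         "INDEX": i, "VALUE": item, "THIS": value})
    elif isinstance(value, dict):
        for k, v in value.items():
            rows.append({"SEQ": 1, "KEY": k, "PATH": k,
                         "INDEX": None, "VALUE": v, "THIS": value})
    if not rows and outer:            # OUTER => TRUE emits one all-NULL row
        rows.append({"SEQ": 1, "KEY": None, "PATH": "", "INDEX": None,
                     "VALUE": None, "THIS": value})
    return rows

print(flatten([1, None, 77]))   # two rows, at indexes 0 and 2
print(flatten([], outer=True))  # one row of NULLs
```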
How to Flatten JSON Files/Data in Snowflake?
Due to its compactness, the JSON file type is a popular choice for data transferred online. It is also the standard format for accessing API data. The information is saved as a string in a specific structure.
Rules:
- The data is organized into name/value pairs. A name/value pair consists of a field name (in double quotes), a colon, and a value: “firstName”:”John”
- Commas are used to separate the data.
- Curly braces {} are used to enclose objects.
- An object can contain multiple name/value pairs: {“firstName”:”John”, “lastName”:”Doe”}.
- Square brackets [] are used to enclose arrays.
Multiple objects can be stored in an array:
"employees":[
{"firstName":"John", "lastName":"Doe"},
{"firstName":"Anna", "lastName":"Smith"},
{"firstName":"Peter", "lastName":"Jones"}
]
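Any JSON-aware language parses this structure the same way Snowflake does. A minimal Python sketch (wrapping the employees array above in a complete JSON document) shows how each object in the array maps to one record, just as FLATTEN turns each one into a row:

```python
import json

doc = '''{"employees":[
    {"firstName":"John",  "lastName":"Doe"},
    {"firstName":"Anna",  "lastName":"Smith"},
    {"firstName":"Peter", "lastName":"Jones"}
]}'''

data = json.loads(doc)
# Each object in the array becomes one record -- the row-per-object
# behaviour FLATTEN reproduces in SQL.
names = [f'{e["firstName"]} {e["lastName"]}' for e in data["employees"]]
print(names)  # ['John Doe', 'Anna Smith', 'Peter Jones']
```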
Steps:
You are required to download data for a Snowflake Project via an API from an Alteryx connection, then load it directly to a Snowflake Server and manipulate it within the Snowflake environment. The goal is to show how to switch from Extract, Transform, and Load (ETL) to ELT mode. Let’s understand this with the help of an example. Consider you are making a Snowflake Project on Bike Rental Data.
Step 1: To begin, you’ll work with a single file.
Let’s look at the code in a Code Editor to better comprehend the structure.
Step 2: The next thing to notice is the external ‘[]‘ wrapping the entire entry, which denotes an array that needs to be flattened, followed by another ‘[]‘ for additionalProperties, which denotes a nested array. In Snowflake, JSON is flattened with the FLATTEN function used in a LATERAL join.
Let’s demonstrate this function with specific cases in this example.
Step 3: From the Project_BikePoint Data table, you have a table with a single column BikePoint_JSON. The Lateral Flatten function is applied to the column that holds the JSON file (note the comma between the table and the LATERAL clause). Once the array is flattened, each object within it becomes an independent row. Most of the time you’re only concerned with the output’s value, so you can reference value: followed by the appropriate keys.
Step 4: Notice that the first column still contains an array which you need to flatten as well. This is a nested structure, but the Lateral Flatten function is easy to apply repeatedly to each level.
Step 5: You can set up a similar structure as the above code, with an additional alias for the inner array’s results, called t1 here. Results from the inner array are accessed with .value on t1, while results from the outer array are accessed with .value on t0.
Since each of the 782 objects in the outer array contains 9 objects, the result has 7,038 rows, so the flattening has taken place correctly and produced an appended table.
Now you can simply add further expressions to pull any of the desired columns from the JSON report in a Snowflake query, as opposed to writing complex RegEx expressions.
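The two chained flattens above map directly onto two nested loops. The sketch below is illustrative only: the field names (id, additionalProperties, key, value) are assumed from the TfL BikePoint feed's general shape, not taken from the actual file used in the screenshots.

```python
import json

# Hypothetical shape modeled on the BikePoint feed described above:
# an outer array of stations, each holding a nested additionalProperties array.
doc = json.loads('''[
  {"id": "BikePoints_1", "additionalProperties": [
      {"key": "NbBikes", "value": "12"},
      {"key": "NbDocks", "value": "20"}]},
  {"id": "BikePoints_2", "additionalProperties": [
      {"key": "NbBikes", "value": "5"},
      {"key": "NbDocks", "value": "15"}]}
]''')

# Two nested loops play the role of the two chained LATERAL FLATTEN calls:
# the outer loop is t0, the inner loop is t1.
rows = [(station["id"], prop["key"], prop["value"])
        for station in doc
        for prop in station["additionalProperties"]]
for r in rows:
    print(r)
# 2 outer objects x 2 inner objects = 4 rows, the same arithmetic
# as the 782 x 9 = 7038 rows described above.
```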
How to Partition JSON Data using Snowflake Flatten Command?
Snowflake FLATTEN is a table function that presents a VARIANT, OBJECT, or ARRAY column as a Lateral View. This section uses the function to generate two tables with different levels of flattening.
1) Store Object Data in Snowflake Variant Columns
To divide events into individual JSON objects while keeping the global data, use a LATERAL JOIN with the Snowflake FLATTEN function. The LATERAL modifier joins the data with any information outside of the object (in this case, the device type and version) and returns a row for each object. The function outputs the value of the flattened object in a VALUE column.
Step A: Perform a data query for each event:
select src:device_type::string
, src:version::string
, value
from
raw_source
, lateral flatten( input => src:events );
The following is the output of Snowflake:
+-------------------------+---------------------+-------------------------------------------------------------------------------+
| SRC:DEVICE_TYPE::STRING | SRC:VERSION::STRING | VALUE |
|-------------------------+---------------------+-------------------------------------------------------------------------------|
| server | 2.6 | { |
| | | "f": 83, |
| | | "rv": "15219.64,783.63,48674.48,84679.52,27499.78,2178.83,0.42,74900.19", |
| | | "t": 1437560931139, |
| | | "v": { |
| | | "ACHZ": 42869, |
| | | "ACV": 709489, |
| | | "DCA": 232, |
| | | "DCV": 62287, |
| | | "ENJR": 2599, |
| | | "ERRS": 205, |
| | | "MXEC": 487, |
| | | "TMPI": 9 |
| | | }, |
| | | "vd": 54, |
| | | "z": 1437644222811 |
| | | } |
| server | 2.6 | { |
| | | "f": 1000083, |
| | | "rv": "8070.52,54470.71,85331.27,9.10,70825.85,65191.82,46564.53,29422.22", |
| | | "t": 1437036965027, |
| | | "v": { |
| | | "ACHZ": 6953, |
| | | "ACV": 346795, |
| | | "DCA": 250, |
| | | "DCV": 46066, |
| | | "ENJR": 9033, |
| | | "ERRS": 615, |
| | | "MXEC": 0, |
| | | "TMPI": 112 |
| | | }, |
| | | "vd": 626, |
| | | "z": 1437660796958 |
| | | } |
+-------------------------+---------------------+-------------------------------------------------------------------------------+
Step B: To save the aforementioned results in a table, use the CREATE TABLE AS statement:
create or replace table flattened_source as
select
src:device_type::string as device_type
, src:version::string as version
, value as src
from
raw_source
, lateral flatten( input => src:events );
select * from flattened_source;
+-------------+---------+-------------------------------------------------------------------------------+
| DEVICE_TYPE | VERSION | SRC |
|-------------+---------+-------------------------------------------------------------------------------|
| server | 2.6 | { |
| | | "f": 83, |
| | | "rv": "15219.64,783.63,48674.48,84679.52,27499.78,2178.83,0.42,74900.19", |
| | | "t": 1437560931139, |
| | | "v": { |
| | | "ACHZ": 42869, |
| | | "ACV": 709489, |
| | | "DCA": 232, |
| | | "DCV": 62287, |
| | | "ENJR": 2599, |
| | | "ERRS": 205, |
| | | "MXEC": 487, |
| | | "TMPI": 9 |
| | | }, |
| | | "vd": 54, |
| | | "z": 1437644222811 |
| | | } |
| server | 2.6 | { |
| | | "f": 1000083, |
| | | "rv": "8070.52,54470.71,85331.27,9.10,70825.85,65191.82,46564.53,29422.22", |
| | | "t": 1437036965027, |
| | | "v": { |
| | | "ACHZ": 6953, |
| | | "ACV": 346795, |
| | | "DCA": 250, |
| | | "DCV": 46066, |
| | | "ENJR": 9033, |
| | | "ERRS": 615, |
| | | "MXEC": 0, |
| | | "TMPI": 112 |
| | | }, |
| | | "vd": 626, |
| | | "z": 1437660796958 |
| | | } |
+-------------+---------+-------------------------------------------------------------------------------+
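The denormalization this step performs (repeating the global fields next to every flattened event) can be sketched in a few lines of Python. This is an illustrative analogy under assumed data, not Snowflake's implementation; the event payloads are shortened to two keys for readability.

```python
# One raw record shaped like the tutorial's src column: global fields
# plus an events array (event payloads trimmed for readability).
src = {
    "device_type": "server",
    "version": "2.6",
    "events": [
        {"f": 83,      "t": 1437560931139},
        {"f": 1000083, "t": 1437036965027},
    ],
}

# The LATERAL join repeats the outer columns next to every element
# that FLATTEN produces from the events array:
flattened_source = [
    {"device_type": src["device_type"],
     "version": src["version"],
     "src": event}
    for event in src["events"]
]
print(flattened_source[0]["device_type"], flattened_source[0]["src"]["f"])
```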
2) Extract JSON Key-value Pairs across Separate Columns
Snowflake allows you to query JSON directly, as demonstrated in this tutorial. When formats change, you don’t have to recreate and re-populate massive tables, which is one advantage of using the JSON structure. You can, however, copy JSON Key Values into Typed Columns if you like.
The following statement creates a new table, EVENTS, in which the values from the RAW SOURCE table are copied into distinct columns. Using a double-colon (::) followed by the type, each value is cast to a data type that is appropriate for the value. If you don’t cast the column, it takes on the data type VARIANT, which can carry any value:
create or replace table events as
select
src:device_type::string as device_type
, src:version::string as version
, value:f::number as f
, value:rv::variant as rv
, value:t::number as t
, value:v.ACHZ::number as achz
, value:v.ACV::number as acv
, value:v.DCA::number as dca
, value:v.DCV::number as dcv
, value:v.ENJR::number as enjr
, value:v.ERRS::number as errs
, value:v.MXEC::number as mxec
, value:v.TMPI::number as tmpi
, value:vd::number as vd
, value:z::number as z
from
raw_source
, lateral flatten ( input => SRC:events );
The statement flattens the nested data under the src:v key, adding a separate typed column for each of its values, and returns one row per event. The first two records in the new EVENTS table are shown in the output below:
select * from events;
+-------------+---------+---------+----------------------------------------------------------------------+---------------+-------+--------+-----+-------+------+------+------+------+-----+---------------+
| DEVICE_TYPE | VERSION | F | RV | T | ACHZ | ACV | DCA | DCV | ENJR | ERRS | MXEC | TMPI | VD | Z |
|-------------+---------+---------+----------------------------------------------------------------------+---------------+-------+--------+-----+-------+------+------+------+------+-----+---------------|
| server | 2.6 | 83 | "15219.64,783.63,48674.48,84679.52,27499.78,2178.83,0.42,74900.19" | 1437560931139 | 42869 | 709489 | 232 | 62287 | 2599 | 205 | 487 | 9 | 54 | 1437644222811 |
| server | 2.6 | 1000083 | "8070.52,54470.71,85331.27,9.10,70825.85,65191.82,46564.53,29422.22" | 1437036965027 | 6953 | 346795 | 250 | 46066 | 9033 | 615 | 0 | 112 | 626 | 1437660796958 |
+-------------+---------+---------+----------------------------------------------------------------------+---------------+-------+--------+-----+-------+------+------+------+------+-----+---------------+
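The ::number casts in the CREATE TABLE statement correspond to explicit type conversions in a general-purpose language. The sketch below is illustrative only (the event payload is trimmed to a few keys, and string inputs stand in for VARIANT values): un-cast values would simply keep their original, loosely-typed form, which is the VARIANT behaviour.

```python
# One flattened event value, trimmed to a few keys for readability.
value = {"f": "83", "t": "1437560931139", "v": {"ACHZ": "42869", "TMPI": "9"}}

# Each int() call plays the role of a ::number cast; nested keys are
# reached the same way value:v.ACHZ reaches them in SQL.
event = {
    "f":    int(value["f"]),
    "t":    int(value["t"]),
    "achz": int(value["v"]["ACHZ"]),
    "tmpi": int(value["v"]["TMPI"]),
}
print(event)  # {'f': 83, 't': 1437560931139, 'achz': 42869, 'tmpi': 9}
```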
How to Extract Nested JSON Data using Lateral and Flatten?
JSON objects sometimes contain internal objects with one or more fields and no fixed structure. To extract a nested variant, object, or array from JSON data, use the Snowflake FLATTEN function in a LATERAL join.
Let’s say you want to construct a table VNT with a single JSON field:
CREATE OR REPLACE TABLE vnt
(
src variant
)
AS SELECT parse_json(column1) as src
FROM values
('{
"topleveldate" : "2017-04-28",
"toplevelname" : "somename",
"extraFields": [
{
"value": "somevalue1",
"key": "somekey1",
"type": "sometype1",
"booleanflag": false
},
{
"value": "",
"key": "somekey2"
}]}');
Let’s further assume that the topleveldate and toplevelname fields are well-known and that the extraFields field contains an array of fields that can vary from record to record, making it impossible to extract them using the standard syntax.
However, you can still use the LATERAL FLATTEN to convert this JSON data into a collection of rows containing the appropriate Field Names and Values:
WITH a as
(
select
src:topleveldate::string as topleveldate
, src:toplevelname::string as toplevelname
, value as val
from vnt, lateral flatten( input => src:extraFields )
)
select topleveldate, toplevelname, key, value from a, lateral flatten( input => val );
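The two-level flatten above maps onto two nested iterations: first over the objects in extraFields, then over the key/value pairs inside each object. This Python sketch (using the same sample document) is an illustrative analogy, not the SQL semantics themselves:

```python
import json

src = json.loads('''{
  "topleveldate": "2017-04-28",
  "toplevelname": "somename",
  "extraFields": [
    {"value": "somevalue1", "key": "somekey1",
     "type": "sometype1", "booleanflag": false},
    {"value": "", "key": "somekey2"}
  ]
}''')

# First flatten: one iteration per object in extraFields.
# Second flatten: one row per key/value pair inside each object.
rows = [(src["topleveldate"], src["toplevelname"], k, v)
        for field in src["extraFields"]
        for k, v in field.items()]
print(len(rows))  # 6: four pairs from the first object + two from the second
```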
Using Snowflake Flatten with Lateral Joins
To divide events into distinct JSON objects, utilize the LATERAL FLATTEN method:
However, you can now see why that snippet of documentation is imprecise: strictly speaking, there is no “LATERAL FLATTEN function.” There is a FLATTEN table function and a LATERAL join technique, and the tutorial would be clearer if it said:
To divide events into distinct JSON objects, utilize a LATERAL join to a FLATTEN subquery:
The reasoning of this query is more obvious now that you know what a LATERAL join is. The crucial elements are:
- The FLATTEN function is not scalar; it returns rows and columns and behaves like the subqueries seen earlier. The SRC column from the outer table RAW_SOURCE is supplied to the FLATTEN subquery as a function argument.
- JSON parsing begins at the events key (see the tutorial for its example JSON). The join has no explicit condition, so it is technically Cartesian. But don’t worry: each row of the outer table is effectively processed one at a time, and the rows produced by each invocation of FLATTEN are linked only to the single outer row that called it.
select src:device_type::string
, src:version::string
, value
from
raw_source
, lateral flatten( input => src:events );
What is a Snowflake Lateral Join?
The LATERAL keyword in Snowflake allows an in-line view to refer to columns from a table expression that precedes it in the FROM clause (a normal in-line view cannot refer to columns of other tables in the same FROM clause). Only the rows generated by the in-line view or subquery are included in the Snowflake Lateral Join’s output.
A) Syntax
A Lateral Join is performed with the LATERAL keyword, which is followed by the in-line view:
SELECT column-1, column-2,... column-n
FROM <left_hand_table_expression>, LATERAL ( <in_line_view> )
…
- left_hand_table_expression: A user table, view, subquery, table function, or the result of an earlier join.
- in_line_view: A subquery or a table function, i.e. FLATTEN or a user-defined table function (UDTF).
B) Example Queries
The examples below show how to use Lateral Joins in Snowflake.
E_EMP
+-------+----------+-----+-----------------+
| EMPID | LASTNAME | DID | PROJECTNAMES |
|-------+----------+-----+-----------------|
| 101 | ABC | 1 | [ |
| | | | "IT", |
| | | | "PROD" |
| | | | ] |
| 102 | BCD | 1 | [ |
| | | | "PS", |
| | | | "PRODSupport" |
| | | | ] |
| 103 | CDE | 2 | NULL |
+-------+----------+-----+-----------------+
E_DEPT
+-----+-------------+
| DID | NAME |
|-----+-------------|
| 1 | Engineering |
| 2 | Support |
+-----+-------------+
For example, consider Lateral Join:
select *
from e_dept as d, lateral (select * from e_emp as e where e.did = d.did) as LV
order by empid;
+-----+-------------+-------+----------+-----------------+
| DID | NAME        | EMPID | LASTNAME | PROJECTNAMES    |
|-----+-------------+-------+----------+-----------------|
| 1 | Engineering | 101 | ABC | [ |
| | | | | "IT", |
| | | | | "PROD" |
| | | | | ] |
| 1 | Engineering | 102 | BCD | [ |
| | | | | "PS", |
| | | | | "PRODSupport" |
| | | | | ] |
| 2 | Support | 103 | CDE | NULL |
+-----+-------------+-------+----------+-----------------+
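The essence of the LATERAL join above is that the in-line view is re-evaluated for every row of the left-hand table, with that row's columns in scope. A toy Python sketch of the same E_EMP/E_DEPT example (illustrative only, not Snowflake's executor):

```python
e_emp = [
    {"empid": 101, "lastname": "ABC", "did": 1, "projectnames": ["IT", "PROD"]},
    {"empid": 102, "lastname": "BCD", "did": 1, "projectnames": ["PS", "PRODSupport"]},
    {"empid": 103, "lastname": "CDE", "did": 2, "projectnames": None},
]
e_dept = [{"did": 1, "name": "Engineering"}, {"did": 2, "name": "Support"}]

# For every row of the left table, re-evaluate the in-line view with
# that row's columns in scope -- the essence of a LATERAL join.
result = [
    {**d, **e}
    for d in e_dept
    for e in e_emp if e["did"] == d["did"]   # the correlated predicate
]
result.sort(key=lambda r: r["empid"])
print([(r["did"], r["name"], r["empid"]) for r in result])
# [(1, 'Engineering', 101), (1, 'Engineering', 102), (2, 'Support', 103)]
```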
Explode Array to Rows: Using Snowflake Flatten Function & Lateral
The FLATTEN function is a table function that explodes the values of an object or array into rows, producing a lateral view. It is most typically employed when converting array data to table rows.
Consider the following example of producing rows from the e_emp table’s PROJECTNAMES column. Keep in mind that projectnames is an ARRAY column.
select emp.empid, emp.lastName, index as array_index, value as projectNames
from e_emp as emp, lateral flatten(input => emp.projectNames) as proj_names
order by empid;
+-------+----------+-------------+---------------+
| EMPID | LASTNAME | ARRAY_INDEX | PROJECTNAMES |
|-------+----------+-------------+---------------|
| 101 | ABC | 0 | "IT" |
| 101 | ABC | 1 | "PROD" |
| 102 | BCD | 0 | "PS" |
| 102 | BCD | 1 | "PRODSupport" |
+-------+----------+-------------+---------------+
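This array-to-rows explosion corresponds to iterating each employee's array with its index. In the Python sketch below (illustrative only), enumerate() supplies the same INDEX column FLATTEN exposes, and rows whose array is NULL disappear, just like employee 103 in the output above:

```python
e_emp = [
    {"empid": 101, "lastname": "ABC", "projectnames": ["IT", "PROD"]},
    {"empid": 102, "lastname": "BCD", "projectnames": ["PS", "PRODSupport"]},
    {"empid": 103, "lastname": "CDE", "projectnames": None},
]

# enumerate() plays the role of FLATTEN's INDEX column; the truthiness
# check drops NULL arrays, mirroring the default (non-OUTER) behaviour.
rows = [(e["empid"], e["lastname"], i, name)
        for e in e_emp if e["projectnames"]
        for i, name in enumerate(e["projectnames"])]
for r in rows:
    print(r)
```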
The FLATTEN table function bursts the array values into rows. The array index of each project name is also visible.
Conclusion
This article has walked you through the Snowflake FLATTEN function and its various uses, to help you improve your overall decision-making and experience when trying to make the most out of your data. In case you want to export data from a source of your choice into your desired Database/destination like Snowflake, then Hevo Data is the right choice for you!
Visit our Website to Explore Hevo
Hevo Data provides its users with a simpler platform for integrating data from 100+ sources for Analysis. It is a No-code Data Pipeline that can help you combine data from multiple sources. You can use it to transfer data from multiple data sources into your Data Warehouses such as Snowflake, Database, or a destination of your choice. It provides you with a consistent and reliable solution to managing data in real-time, ensuring that you always have Analysis-ready data in your desired destination.
Want to take Hevo for a spin? Sign Up for a 14-day free trial and experience the feature-rich Hevo suite first hand. You can also have a look at our unbeatable pricing that will help you choose the right plan for your business needs!
Share your experience of learning about Snowflake Flatten! Let us know in the comments section below!