Querying Snowflake Metadata for Staged File: 2 Easy Examples


As businesses grow, they find it difficult to keep up with rising Data Storage and Real-Time Analytics demands. Traditional On-Premise solutions fail to provide On-Demand Scaling and often require huge upfront investments as well as continuous maintenance. A more economical and efficient choice is a Cloud Data Warehousing and Analytics platform such as Snowflake. Owing to its architecture, Snowflake can automatically scale Storage and Compute resources independently according to your fluctuating workloads.

One of the most common scenarios in any data architecture is querying data from its source. Metadata services are a critical part of Snowflake’s architecture. By querying Snowflake Metadata from Staged Files via Standard SQL commands, you can get additional information about the files, such as the filename and row numbers. Essentially, the Snowflake Metadata Repository contains a road map for locating the contents of a Data Warehouse.

In this article, you will learn how to effectively Query Snowflake Metadata from staged files.


What is Snowflake?


Snowflake is a Revolutionary Cloud Data Platform that is changing the landscape of Data Engineering, Data Warehouses, Data Lakes, Data Science, etc. The Snowflake Cloud Data Platform is capable of running any workload, enabling secure and governed access to all data, and delivering the performance and scalability that modern enterprises need, all in an easy-to-use Platform Delivered as a Service.

Snowflake makes it easy to load Structured and Semi-Structured data using Standard SQL, enabling Data Engineers and Business Analysts to collaborate and fast-track the discovery of new business insights. Snowflake’s SQL dialect is simple and feature-rich, which makes it really easy for analysts to use. Snowflake also has a great Data Marketplace that can help you enrich your organization’s data with free or inexpensive external data sources without having to move it into your Data Warehouse.

Snowflake’s Decoupling of Compute and Storage means that organizations can provision independent warehouses of different sizes for Ingesting Data, Transforming Data, Business Intelligence, and running Ad-hoc Queries. This eliminates the problem of resource contention that you might encounter in other Data Warehouse vendors. You can also set up separate Auto Scaling rules for any of those warehouses for greater Cost Management.

What is a Snowflake Stage?

A Stage in Snowflake is an intermediate space where you can upload files before using the COPY command to load them into tables (or where Snowflake writes files when unloading data from tables). In other words, a Snowflake Stage is a Named Database Object that specifies where data files are held, streamlining the bulk loading and unloading of data into and out of database tables.

Snowflake supports two types of stages for storing data files. These are:

  • Internal Stages: The internal staging environment stores data files internally within Snowflake.
  • External Stages: The external staging environment references data files stored in an external location (Azure Blob Storage, Amazon S3 Buckets, or Google Cloud Storage Buckets). External tables built over these stages are read-only, which means you cannot perform DML operations on them.
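As a rough sketch, an external stage over an S3 bucket can be created with CREATE STAGE; the bucket URL and credential placeholders below are illustrative, not values from this article:

```sql
-- Hypothetical example: create an external stage that points at an S3 bucket.
-- Replace the URL and credentials with your own values.
create or replace stage my_s3_stage
  url = 's3://mybucket/data/'
  credentials = (aws_key_id = '<your_key_id>' aws_secret_key = '<your_secret_key>');
```

In production, a Storage Integration object is generally preferred over embedding credentials directly in the stage definition.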

What is Snowflake Metadata?

Snowflake automatically generates Metadata for files in both internal (i.e. Snowflake) stages and external (Amazon S3, Google Cloud Storage, or Microsoft Azure Blob) stages. Snowflake Metadata includes File Names, Version IDs, and Associated Properties.

This Metadata is ‘stored’ in virtual columns that can be:

  • Queried using a standard SELECT statement.
  • Loaded into a table, along with the regular data columns using the COPY INTO <table> statement.
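For instance, the metadata columns can be persisted alongside the regular data during a load. A minimal sketch, assuming a staged CSV file with three columns (the table, stage, and file format names here are illustrative):

```sql
-- Hypothetical example: copy a staged file's metadata columns
-- into a table together with its data columns.
create or replace table cars_with_meta (
  source_file     varchar,
  file_row_number number,
  id              number,
  name            varchar,
  location        varchar
);

copy into cars_with_meta
  from (
    select metadata$filename, metadata$file_row_number, t.$1, t.$2, t.$3
    from @my_stage t
  )
  file_format = (format_name = 'my_csv_format');
```

Capturing the source filename this way makes it easy to trace each row in the target table back to the staged file it came from.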

The following Snowflake Metadata columns can be queried or copied into tables:

  • METADATA$FILENAME – This Snowflake Metadata column returns the name of the staged file that each row belongs to, including the file’s path in the stage.
  • METADATA$FILE_ROW_NUMBER – This will return the row number for each record in the staged file.
Supercharge Snowflake ETL and Analysis Using Hevo’s No-code Data Pipeline

Hevo Data is a No-code Data Pipeline that offers a fully managed solution to set up Data Integration for 150+ Data Sources (Including 40+ Free sources) and will let you directly load data to a Data Warehouse such as Snowflake or the Destination of your choice. It will automate your data flow in minutes without writing any line of code. Its fault-tolerant architecture makes sure that your data is secure and consistent. Hevo provides you with a truly efficient and fully automated solution to manage data in real-time and always have analysis-ready data.

Get Started with Hevo for Free

Let’s look at some of the salient features of Hevo:

  • Fully Managed: It requires no management and maintenance as Hevo is a fully automated platform.
  • Data Transformation: It provides a simple interface to perfect, modify, and enrich the data you want to transfer. 
  • Real-Time: Hevo offers real-time data migration. So, your data is always ready for analysis.
  • Schema Management: Hevo can automatically detect the schema of the incoming data and map it to the destination schema.
  • Connectors: Hevo supports 100+ Integrations to SaaS platforms such as WordPress, FTP/SFTP, Files, Databases, BI tools, and Native REST API & Webhooks Connectors. It supports various destinations, including Google BigQuery, Amazon Redshift, Snowflake, and Firebolt Data Warehouses; Amazon S3 Data Lakes; Databricks; and MySQL, SQL Server, TokuDB, DynamoDB, and PostgreSQL Databases, to name a few.
  • Secure: Hevo has a fault-tolerant architecture that ensures that the data is handled in a secure, consistent manner with zero data loss.
  • Hevo Is Built To Scale: As the number of sources and the volume of your data grows, Hevo scales horizontally, handling millions of records per minute with very little latency.
  • Live Monitoring: Advanced monitoring gives you a one-stop view to watch all the activities that occur within Data Pipelines.
  • Live Support: Hevo team is available round the clock to extend exceptional support to its customers through chat, email, and support calls.
Sign up here for a 14-Day Free Trial!

How to Query Snowflake Metadata from Staged Files?

To easily understand the Query process for Snowflake Metadata from both CSV & JSON Staged Files, let’s go through the following stepwise Query Examples:

  • Query 1: Querying the Snowflake Metadata Columns for a Staged CSV File
  • Query 2: Querying the Metadata Columns for a Staged JSON File

Query 1: Querying the Snowflake Metadata Columns for a Staged CSV File

In this example, you will first stage a CSV file into a Snowflake table and then run a query that will return the Snowflake Metadata columns, as well as the regular data columns of the CSV file.

The CSV file to be staged is called Cars. It has three columns (i.e. id, name, and location), and has the following structure:

1,Tesla,USA    
2,Toyota,Japan
3,TATA,India  

To Query Snowflake Metadata Columns for a Staged CSV file, you can follow the simple steps given below:

  • Step 1: Create a Snowflake stage by executing the command given below.
create or replace stage cars_stage;
  • Step 2: Using the FILE FORMAT command given below, create a file format to describe the format of the CSV file to be staged.
create or replace file format cars_format type = 'csv' field_delimiter = ',';
  • Step 3: Use the PUT command to stage your CSV file from your local folder to a Snowflake internal stage. 

For Windows

put file://C:\Users\YOURUSERNAME\Desktop\cars.csv @cars_stage;

For Linux/Mac

put file:///tmp/data/cars.csv @cars_stage;
  • Step 4: Query the filename and row number metadata columns and the regular data columns in the staged file.

select metadata$filename, metadata$file_row_number, c.$1, c.$2, c.$3 from @cars_stage (file_format => 'cars_format') c;

METADATA$FILENAME | METADATA$FILE_ROW_NUMBER | $1 | $2     | $3
cars.csv.gz       | 1                        | 1  | Tesla  | USA
cars.csv.gz       | 2                        | 2  | Toyota | Japan
cars.csv.gz       | 3                        | 3  | TATA   | India

Query 2: Querying the Metadata Columns for a Staged JSON File

In this example, you will first stage a JavaScript Object Notation (JSON) file into a Snowflake stage and then run a query that will return the metadata columns, as well as the objects in the JSON file.

The JSON file to be staged is called movies and it has the following structure:

{"movie": {"title": "Roma","year": "2018"}},
{"movie": {"title": "Okja","year": "2017"}}

To Query Snowflake Metadata Columns for a Staged JSON file, you can follow the simple steps given below:

  • Step 1: Create an internal Snowflake stage using the command given below.
create or replace stage movies_stage;
  • Step 2: Using the FILE FORMAT command, create a file format to describe the format of the JSON file to be staged.
create or replace file format movies_format type = 'json';
  • Step 3: Use the PUT command to stage the JSON file from your local directory to the Snowflake stage. 

For Windows

put file://C:\Users\YOURUSERNAME\Desktop\movies.json @movies_stage;

For Linux/Mac

put file:///tmp/data/movies.json @movies_stage;

  • Step 4: Query the filename and row number metadata columns and parse the JSON objects in the staged file.

select metadata$filename, metadata$file_row_number, parse_json($1) from @movies_stage/movies.json.gz;

METADATA$FILENAME | METADATA$FILE_ROW_NUMBER | PARSE_JSON($1)
movies.json.gz    | 1                        | {"movie": {"title": "Roma", "year": "2018"}}
movies.json.gz    | 2                        | {"movie": {"title": "Okja", "year": "2017"}}

Query Limitations on Snowflake Metadata

There are a few challenges that you might experience while Querying Snowflake Metadata:

  • The metadata columns are read-only. This means that you cannot insert metadata into existing table rows.
  • Since you must reference metadata columns by name when running queries, they are not included in the output of any of the following statements:
    • DESCRIBE <object>
    • SELECT *
    • SHOW <object>
    • Queries on INFORMATION_SCHEMA views
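To illustrate the second limitation, compare a wildcard query against a stage with one that names the metadata columns explicitly (the stage name here is illustrative):

```sql
-- Returns only the data columns ($1, $2, ...) of the staged files;
-- the metadata columns are omitted from SELECT * output.
select * from @my_stage;

-- The metadata columns appear only when referenced by name.
select metadata$filename, metadata$file_row_number, $1, $2
from @my_stage;
```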

Conclusion

In this article, you have learned how to effectively query Snowflake Metadata for files stored in a stage. Snowflake creates and stores Metadata for files in both Internal & External stages, and exposes it through the METADATA$FILENAME and METADATA$FILE_ROW_NUMBER columns. Using Standard SQL commands, you can query Snowflake Metadata columns for both CSV & JSON Staged Files.

To empower your Business Decisions with Insights from High-Quality Real-Time Data, it is crucial to consolidate data from all the applications used across your business for a complete view of your business health. As your firm grows, it becomes a Time-Consuming & Resource-Intensive task to efficiently handle the exponentially growing data. Ultimately, you would need to invest a portion of your engineering bandwidth to Integrate Data from all sources, Clean & Transform it, and finally Load it to a Cloud Data Warehouse such as Snowflake for further business analytics. All of these challenges can be comfortably solved by a Cloud-Based ETL tool such as Hevo Data.

Visit our Website to Explore Hevo

Hevo Data, a No-code Data Pipeline can seamlessly transfer data from a vast sea of 150+ sources to a Data Warehouse like Snowflake or a Destination of your choice to be visualized in a BI Tool. It is a reliable, completely automated, and secure service that doesn’t require you to write any code!  

If you are using Snowflake as a Data Warehousing and Analytics Platform and searching for a Stress-Free Alternative to Manual Data Integration, then Hevo can effortlessly automate this for you. Hevo, with its strong integration with 100+ sources & BI tools (including 40+ Free Sources), allows you to not only export & load data but also transform & enrich your data & make it analysis-ready in a jiffy.

Want to take Hevo for a ride? Sign Up for a 14-day free trial and simplify your Data Integration process. Do check out the pricing details to understand which plan fulfills all your business needs.

Tell us about your experience of Querying Snowflake Metadata for Staged Files! Share your thoughts with us in the comments section below.

Jeremiah
Technical Content Writer, Hevo Data

Jeremiah is a specialist in crafting insightful content for the data industry, and his writing offers informative and simplified material on the complexities of data integration and analysis. He enjoys staying updated on the latest trends and uses his knowledge to help businesses succeed.

No-code Data Pipeline for Snowflake