As a business grows, firms find it difficult to keep up with rising Data Storage and Real-Time Analytics demands. Traditional On-Premise Solutions fail to provide On-demand Scaling and often require huge investments as well as continuous maintenance. A more Economical and Efficient choice is employing Cloud Data Warehousing and Analytics platforms such as Snowflake. Owing to its architecture, Snowflake can automatically scale both Storage and Compute resources Independently according to your fluctuating workloads.

One of the most common scenarios in any sort of data architecture is querying data from its source. Metadata services are a critical part of Snowflake’s architecture. By Querying Snowflake Metadata from Staged files via the Standard SQL commands, you can get additional information about the files, such as filename and row numbers. Essentially, Snowflake Metadata Repository contains Road Map to locate the contents of a data warehouse.

In this article, you will learn how to effectively Query Snowflake Metadata from staged files.

What is Snowflake?

Snowflake Metadata - Snowflake Logo | Hevo Data

Snowflake is a Revolutionary Cloud Data Platform that is changing the landscape of Data Engineering, Data Warehouses, Data Lakes, Data Science, etc. The Snowflake Cloud Data Platform is capable of running any workload, enabling secure and governed access to all data, and delivering the performance and scalability that modern enterprises need, all in an easy-to-use Platform Delivered as a Service.

Snowflake makes it easy to load Structured and Semi-Structured data with the use of Standard SQL thus enabling Data Engineers and Business Analysts to collaborate and fast track the discovery of new business insights. The SnowSQL dialect is simple and feature-rich, and this makes it really easy for analysts to use.

Snowflake’s Decoupling of Compute and Storage means that organizations can provision independent warehouses of different sizes for Ingesting Data, Transforming Data, Business Intelligence, and running Ad-hoc Queries. This eliminates the problem of resource contention that you might encounter in other Data Warehouse vendors. You can also set up separate Auto Scaling rules for any of those warehouses for greater Cost Management.

What is a Snowflake Stage?

A stage in Snowflake is an intermediate space where you can upload files before using the COPY command to load and unload them into tables. In other words, a Snowflake Stage is a Named Database Object that specifies where data files are staged to streamline the bulk loading and unloading of data in & out of database tables respectively.

Snowflake supports two types of stages for storing data files. These are:

Snowflake Metadata - Internal Stage | Hevo Data
Image Source
  • Internal Stages: The internal staging environment stores data internally in Snowflake tables.
Snowflake Metadata - External Stage | Hevo Data
Image Source
  • External Stages: The external staging environment stores data in an external location (Azure Blob, Amazon S3 Buckets, and GCP Storage Buckets). These tables are read-only which means you cannot perform DML operations on them.
Supercharge Snowflake ETL and Analysis Using Hevo’s No-code Data Pipeline

Hevo Data is now available on Snowflake Partner Connect, making it easier than ever to integrate your data seamlessly. With Hevo’s powerful data integration capabilities, Snowflake users can connect to Hevo directly from their Snowflake environment and streamline their data pipelines effortlessly. Hevo offers: 

  • More than 150 source connectors from databases, SaaS applications, etc.
  • A simple Python-based drag-and-drop data transformation technique that allows you to transform your data for analysis.
  • Automatic schema mapping to match the destination schema with the incoming data. You can also choose between Full and Incremental Mapping.
  • Proper bandwidth utilization on both the source and destination allows for the real-time transfer of modified data.
  • Transparent pricing with no hidden fees allows you to budget effectively while scaling your data integration needs.

Try Hevo today to seamlessly integrate data into Snowflake. 

Get Started with Hevo for Free

What is Snowflake Metadata?

Snowflake automatically generates Metadata for files in both internal (i.e. Snowflake) stages and external (Amazon S3, Google Cloud Storage, or Microsoft Azure Blob) stages. Snowflake Metadata includes File Names, Version IDs, and Associated Properties.

This Metadata is ‘stored’ in virtual columns that can be:

  • Queried using a standard SELECT statement.
  • Loaded into a table, along with the regular data columns using the COPY INTO <table> statement.

The following Snowflake Metadata columns can be queried or copied into tables:

  • METADATA$FILENAME – This Snowflake Metadata query returns the name of the staged file, the row it belongs to, as well as the file path.
  • METADATA$FILE_ROW_NUMBER – This will return the row number for each record in the staged file.

How to Query Snowflake Metadata from Staged Files?

To easily understand the Query process for Snowflake Metadata from both CSV & JSON Staged Files, let’s go through the following stepwise Query Examples:

  • Query 1: Querying the Snowflake Metadata Columns for a Staged CSV File
  • Query 2: Querying the Metadata Columns for a Staged JSON File

Query 1: Querying the Snowflake Metadata Columns for a Staged CSV File

In this example, you will first stage a CSV file into a Snowflake table and then run a query that will return the Snowflake Metadata columns, as well as the regular data columns of the CSV file.

The CSV file to be staged is called Cars. It has three columns (i.e. id, name, and location), and has the following structure:

1,Tesla,USA    
2,Toyota,Japan
3,TATA,India  

To Query Snowflake Metadata Columns for a Staged CSV file, you can follow the simple steps given below:

  • Step 1: Create a Snowflake stage by executing the command given below.
create or replace stage cars_stage;
  • Step 2: Using the FILE FORMAT command given below, create a file format to describe the format of the CSV file to be staged.
create or replace file format cars_format type = 'csv' field_delimiter = ',';
  • Step 3: Use the PUT command to stage your CSV file from your local folder to a Snowflake internal stage. 

For Windows

put file://C:UsersYOURUSERNAMEDesktopcars.csv @cars_stage;

For Linux/Mac

put file:///tmp/data/cars.csv @cars_stage;
  • Step 4: Query the filename and row number metadata columns and the regular data columns in the staged file.

select metadata$filename, metadata$file_row_number, c.$1, c.$2, c.$3 from @cars_stage (file_format => myformat) c;

METADATA$FILENAMEMETADATA$FILE_ROW_NUMBER$1$2
cars.csv.gz1TeslaUSA
cars.csv.gz2ToyotaJapan
cars.csv.gz3TATAIndia
Integrate Salesforce to Snowflake
Integrate MongoDB to Snowflake
Integrate Google Analytics to Snowflake

Query 2: Querying the Metadata Columns for a Staged JSON File

In this example, you will first stage a Javascript Structured Object Notation (JSON) file into a Snowflake table and then run a query that will return the metadata columns, as well as the objects in the JSON file.

The JSON file to be staged is called movies and it has the following structure:

{"movie": {"title": "Roma","year": "2018"}},
{"movie": {"title": "Okja","year": "2017"}}

To Query Snowflake Metadata Columns for a Staged JSON file, you can follow the simple steps given below:

  • Step 1: Create an internal Snowflake stage using the command given below.
create or replace stage movies_stage;
  • Step 2: Using the FILE FORMAT command, create a file format to describe the format of the JSON file to be staged.
create or replace file format movies_format type = 'json';
  • Step 3: Use the PUT command to stage the JSON file from your local directory to the Snowflake stage. 

For Windows

put file://C:UsersYOURUSERNAMEDesktopmovies.json @movies_stage;

For Linux/Mac

put file:///tmp/data/movies.json @movies_stage;

select metadata$filename, metadata$file_row_number, parse_json($1) from @movies_stage/movies.json.gz;

METADATA$FILENAMEMETADATA$FILE_ROW_NUMBERPARSE_JSON($1)
movies.json.gz1{  “movie”: {    “title”:”Roma”,    “year”: “2018”  }}
movies.json.gz2{  “movie”: {    “title”:”Okja”,    “year”: “2017”  }}

Query Limitations on Snowflake Metadata

There are a few challenges that you might experience while Querying Snowflake Metadata:

  • The metadata files are read-only. This means that you cannot insert metadata into existing table rows.
  • Since you must reference metadata columns by name when running queries, they are not included in the output of any of the following statements:
    • DESCRIBE <object>
    • SELECT*
    • SHOW <object>
    • Queries on INFORMATION_SCHEMA views

Learn More About:

ROW_NUMBER function in snowflake

Conclusion

In this article, you have learned how to effectively query Snowflake Metadata files stored in a stage. Snowflake creates and stores metadata for Internal & External files. Snowflake stores Metadata in the METADATA$FILENAME and METADATA$FILE_ROW_NUMBER columns Using the Standard SQL commands, you can Query Snowflake Metadata Columns for both CSV & JSON Staged Files.

To empower your Business Decisions with Insights from High-Quality Real-Time Data, it is crucial to consolidate data from all the applications used across your business for a complete view of your business health. Ultimately, you would be required to invest a portion of your engineering bandwidth to Integrate Data from all sources, Clean & Transform it, and finally Load it to a Cloud Data Warehouse such as Snowflake for further business analytics. All of these challenges can be comfortably solved by a Cloud-Based ETL tool such as HevoData.  Connect with us today to improve your data management experience and achieve more with your data.

FAQs

1. What is metadata in Snowflake?

Metadata in Snowflake refers to data about the structure, organization, and properties of data within the system, such as table definitions, schemas, columns, file formats, and query history.

2. Where is Snowflake’s metadata stored?

Snowflake stores metadata in its centralized cloud services layer, separate from user data. This layer manages metadata for queries, security, optimization, and storage, ensuring high availability and performance.

3. What is load metadata in Snowflake?

Load metadata in Snowflake refers to information generated during data loading, such as file names, load times, row counts, errors, and status.

Jeremiah
Technical Content Writer, Hevo Data

Jeremiah is a specialist in crafting insightful content for the data industry, and his writing offers informative and simplified material on the complexities of data integration and analysis. He enjoys staying updated on the latest trends and uses his knowledge to help businesses succeed.