Snowpark Snowflake: A Comprehensive 101 Guide

on Data Analytics, Data Processing, Data Warehouses, Database Management Systems, Snowflake, SQL • September 28th, 2021

As the volume of data produced every day continues to explode, Organizations increasingly expect Data Service Providers to expand the capabilities of their existing tools, or build new ones, to serve a wide variety of needs.

Snowflake, an industry-leading Data Warehousing Platform, has been at the forefront of transforming the way we see and use data, and has become a favourite solution for Data Warehouses and Data Lakes.

To meet the demand for a Data Warehouse that goes beyond SQL and allows further coding within its system, Snowflake introduced Snowpark. Snowflake describes Snowpark as a new way to program data in Snowflake through a set of optimized APIs, with native support for multiple Programming Languages so that Developers, Data Engineers, and Data Scientists can write code in the language of their choice.

This article introduces Snowpark Snowflake and discusses its capabilities and its support for other systems.


An Overview of Snowpark Snowflake

[Image: Snowpark Snowflake Interface (Source: Snowflake)]

Snowpark Snowflake is a recent Product Offering from Snowflake. Snowpark is a new experience that lets Developers easily extend Snowflake's capabilities by writing code that uses DataFrame-Style Programming rather than SQL Statements to query and manipulate data with their favourite tools, and by deploying that code in a Serverless manner to Snowflake's Virtual Warehouse Compute Engine.

Before the introduction of Snowpark, Snowflake, founded in 2012, offered Cloud-based Data Warehousing as a Software as a Service (SaaS) Platform for loading, analyzing, and reporting on large data volumes. It did all of this without requiring users to deploy hardware or install and configure software, since this was handled automatically, providing a Reliable, Secure, High-Performance, and Scalable Data-Processing System. This made it ideal for Organizations running on a tight budget and not wanting to maintain On-premise infrastructure.

But with Data Warehousing, SQL is the language of choice, which limits Developers who prefer writing in other Programming Languages or who want to perform operations that Data Warehousing Systems constrain. This forced Developers to pull data into other systems to perform certain tasks before bringing it back into Snowflake, increasing cost, run time, and complexity.

So, Snowflake built the Snowpark Snowflake capability to give Developers Deeply Integrated, DataFrame-Style Programming in the Languages they prefer, all in one place.

Snowpark Snowflake is designed to make building complex Data Pipelines easy for Developers and to simplify interacting with the Snowflake Data Warehouse without having to move data, as was done before. This works seamlessly because Snowpark uploads and runs your code inside Snowflake. With this development, you can now consolidate your Data Analytics, Data Engineering, and Machine Learning Applications on one Data Platform, running in Snowflake Virtual Warehouses.

Snowpark Snowflake, also known as the Developer Tool, helps Software Engineers deploy custom code on Snowflake's Data Warehouse to perform various Information Management Tasks. Java, Scala, and Python are the initial Programming Languages that Snowpark supports.

For more information on Snowflake Database usage and examples, visit our other comprehensive article on Snowflake Database here.

[Image: Snowpark Supported Programming Languages (Source: Miro Medium)]

Simplify Data Analysis with Hevo’s No-code Data Pipeline

Hevo Data, a No-code Data Pipeline, helps load data from any data source such as Databases, SaaS Applications, Cloud Storage, SDKs, and Streaming Services, and simplifies the ETL process. It supports 100+ Data Sources, including 30+ Free Sources. Setup is a 3-step process: select the data source, provide valid credentials, and choose a destination such as Snowflake. Hevo loads the data onto the desired Data Warehouse/Destination, enriches it, and transforms it into an analysis-ready form without you having to write a single line of code.

Its completely automated pipeline delivers data in real-time without any loss from source to destination. Its fault-tolerant and scalable architecture ensures that data is handled in a secure, consistent manner with zero data loss, and it supports different forms of data. The solutions provided are consistent and work with different BI tools as well.

Get Started with Hevo for Free

Check out why Hevo is the Best:

Secure: Hevo has a fault-tolerant architecture that ensures that the data is handled securely and consistently with zero data loss.

Schema Management: Hevo takes away the tedious task of schema management & automatically detects the schema of incoming data and maps it to the destination schema.

Minimal Learning: Hevo, with its simple and interactive UI, is extremely easy for new customers to work with and perform operations on.

Hevo Is Built To Scale: As the number of sources and the volume of your data grows, Hevo scales horizontally, handling millions of records per minute with very little latency.

Incremental Data Load: Hevo allows the real-time transfer of data that has been modified. This ensures efficient utilization of bandwidth on both ends.

Live Support: The Hevo team is available round the clock to extend exceptional support to its customers through chat, email, and support calls.

Live Monitoring: Hevo allows you to monitor the data flow and check where your data is at a particular point in time.

Simplify your Data Analysis with Hevo today! 

Sign up here for a 14-Day Free Trial!

Snowpark Snowflake Capabilities

The capabilities of Snowpark Snowflake are considerable: it offers Developers the ability to use Java, Scala, and other Programming Languages to write code that accesses data programmatically through an Industry-Standard DataFrame API, letting Businesses work with their data where it lives.

Snowpark Snowflake provides an avenue for building Applications that link with Snowflake natively, unlike before, when Code Development and Deployment required separate infrastructure and maintenance.

Snowpark gives Developers the tools to calculate, report on, and store computations that can be served in real-time, making it possible to read Internet of Things (IoT) Sensors or streaming financial data and to incorporate ML Models, Web Session Analysis, and more.

Snowpark Snowflake also reduces the long startup time of distributed resources in systems that require Clusters of Nodes to carry out specific operations; Snowpark avoids this by connecting directly to Snowflake's Virtual Data Warehouse.

Below is a Snowpark Snowflake example snippet that prints the count and names of the tables in the PUBLIC schema of the current Database:

import com.snowflake.snowpark._
import com.snowflake.snowpark.functions._

object Main {
  def main(args: Array[String]): Unit = {

    // Create a Session, specifying the properties used to
    // connect to the Snowflake database.
    val builder = Session.builder.configs(Map(
      "URL" -> "https://<account_identifier>.snowflakecomputing.com",
      "USER" -> "<username>",
      "PASSWORD" -> "<password>",
      "ROLE" -> "<role_name_with_access_to_public_schema>",
      "WAREHOUSE" -> "<warehouse_name>",
      "DB" -> "<database_name>",
      "SCHEMA" -> "<schema_name>"
    ))
    val session = builder.create

    // Get the number of tables in the PUBLIC schema.
    val dfTables = session.table("INFORMATION_SCHEMA.TABLES").filter(col("TABLE_SCHEMA") === "PUBLIC")
    val tableCount = dfTables.count()
    val currentDb = session.getCurrentDatabase.getOrElse("<no current database>")
    println(s"Number of tables in the $currentDb database: $tableCount")

    // Get the list of tables in the PUBLIC schema.
    val dfPublicSchemaTables = session.table("INFORMATION_SCHEMA.TABLES").filter(col("TABLE_SCHEMA") === "PUBLIC").select(col("TABLE_NAME"))
    dfPublicSchemaTables.show()
  }
}

Upon execution, it prints out the number of tables and the list of tables in the schema as displayed below:

Number of tables in the "MY_DB" database: 8
...
---------------------
|"TABLE_NAME"       |
---------------------
|A_TABLE            |
...

Snowpark API

The Snowpark Library contains APIs for querying and processing data in a Data Pipeline, so you can build Applications that operate on data in Snowflake without having to move the data to where the Application code runs.

The Snowpark API provides Programming Language support for building SQL Statements that are executed on the Server, thereby reducing the amount of data transferred between the Client and the Snowflake Database. For example, the API provides a select method that you can use to specify the column names to return, rather than writing "select column_name" as a string; a short sketch is shown below.
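As a minimal illustration (assuming the session created in the earlier example and a hypothetical SALES table with AMOUNT and REGION columns), the select call takes Column objects and Snowpark generates the SQL for you:

import com.snowflake.snowpark.functions.col

// Build the projection with Column objects; Snowpark translates this into SQL.
val dfSales = session.table("SALES").select(col("AMOUNT"), col("REGION"))

// The generated SQL runs in Snowflake only when an action such as show() is called.
dfSales.show()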

To get more information on Snowflake APIs, you can visit Snowflake’s official API Reference Guide here.

Snowpark Snowflake DataFrame

The core of Snowpark Snowflake is the DataFrame, a set of data that provides methods to operate on that data. A DataFrame in Snowpark Snowflake is lazily evaluated, which means that statements are not executed until you call a method that performs an action. You can also create a User-Defined Function (UDF) in your code, and Snowpark will transfer the code to the server, where it can operate on the data; see the sketch after the next example.

With Snowpark, you can build queries using DataFrames in your code without having to create and pass along SQL Strings. An example is shown below:

import com.snowflake.snowpark._

// Create a session (using the same connection properties as in the earlier example).
val sess = Session.builder.configFile("snowflake_connection.properties").create

val sales: DataFrame = sess.table("sales")
val line_items: DataFrame = sess.table("sales_details")

// The query is only built here; it runs in Snowflake when an action such as show() is called.
val query = sales.join(line_items, sales("id") === line_items("sid"))
                 .groupBy(line_items("product_id"))
                 .count()
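As mentioned above, a UDF can also be defined inline and pushed to Snowflake. The following is a minimal sketch under assumptions: it reuses the sess session and sales table from the example above and assumes a hypothetical quantity column. The anonymous function is serialized by Snowpark and executed inside Snowflake, next to the data:

import com.snowflake.snowpark.functions.{col, udf}

// Register an anonymous UDF; Snowpark uploads it so it can run server-side.
val doubleQty = udf((qty: Int) => qty * 2)

// Apply the UDF in a DataFrame query; the lambda executes inside Snowflake.
val withDoubled = sales.select(col("id"), doubleQty(col("quantity")).as("double_qty"))
withDoubled.show()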

Java Functions

Java Functions allow Developers to build complex logic behind a simple function interface, with the code running in a Secured, Sandboxed Java Virtual Machine (JVM) hosted in Snowflake’s Data Warehouse. When building these functions, you can use your existing toolsets and bring in External Libraries as well. To expose the function to SQL, you build a Java Archive (JAR, or several JARs), load it into Snowflake, and register a function. An example snippet is shown below.

create function sentiment(txt string) returns float
language java
imports = ('@jars/Sentiment.jar')
handler = 'Sentiment.score'; 
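Once registered, the function can be called like any other SQL function. As a hedged sketch (assuming the session from the earlier Snowpark examples and a hypothetical REVIEWS table with a REVIEW_TEXT column), it can also be invoked from a Snowpark DataFrame through callUDF:

import com.snowflake.snowpark.functions.{callUDF, col}

// Call the registered SENTIMENT function from a DataFrame query;
// the Java handler executes inside Snowflake, next to the data.
val scored = session.table("REVIEWS")
  .select(col("REVIEW_TEXT"), callUDF("sentiment", col("REVIEW_TEXT")).as("SENTIMENT_SCORE"))
scored.show()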

Conclusion

This write-up has shown that Snowpark Snowflake is an exciting step into the Data Application world. With it, Developers, Data Engineers, and Data Scientists can build complex Data Pipelines that the Snowflake Engine executes after conversion to SQL, while leveraging Snowflake's Cloud-native Elasticity and Unlimited Scalability.

To have a smooth coding experience using Snowpark on the Snowflake Data Warehouse, Hevo Data can act as a go-between for these two powerful tools, as it supports integration between them and your other data sources.

Hevo Data provides its users with a simpler platform for integrating data from 100+ sources for Data Analysis. It is a No-code Data Pipeline that can help you combine data from multiple sources. You can use it to transfer data from multiple data sources into your Data Warehouse, Database, or a destination of your choice like Snowflake. It provides you with a consistent and reliable solution to managing data in real-time, ensuring that you always have Analysis-ready data in your desired destination.

Visit our Website to Explore Hevo

Want to take Hevo for a spin? Sign Up for a 14-day free trial and experience the feature-rich Hevo suite first hand.

Share your experience of understanding Snowpark Snowflake in the comments section below!
