Snowpark Snowflake: A Comprehensive 101 Guide

By: Ofem Eteng | Published: September 28, 2021

Snowflake, an industry-leading Data Warehousing Platform, has been at the forefront of transforming the way we see and use data, and as such has become a favourite solution for Data Warehousing and Data Lakes.

To meet the demand for a Data Warehouse that is not limited to SQL but also allows further coding within its system, Snowflake came up with Snowpark. Snowflake describes Snowpark as a new way to program data in Snowflake through a set of optimized APIs, with native support for multiple Programming Languages so that Developers, Data Engineers, and Data Scientists can write code in the language of their choice.

This article introduces Snowpark Snowflake and discusses its capabilities and its support for other systems.

What is Snowpark Snowflake?

Snowpark Snowflake Interface (Image Source: snowflake.com)

Snowpark Snowflake is a recent Product Offering from Snowflake. Snowpark is a new experience that allows Developers to extend Snowflake’s capabilities by writing code that uses DataFrame-style programming rather than SQL Statements to query and manipulate data with their favourite tools, and to deploy that code in a Serverless manner to Snowflake’s Virtual Warehouse Compute Engine.

History of Snowflake Snowpark

Before the introduction of Snowpark, Snowflake, founded in 2012, offered Cloud-based Data Warehousing to users in the form of a Software as a Service (SaaS) Platform to load, analyze, and create reports on large data volumes. It did all of this without the need to deploy hardware or install and configure any software, as this was handled automatically, providing a Reliable, Secure, High-Performance, and Scalable Data-Processing System. This made it ideal for Organizations running on a tight budget that did not want On-premise Support.

But with Data Warehousing, SQL is the language of choice, which limited Developers who prefer writing in other Programming Languages and wanted to perform operations that were constrained in Data Warehousing Systems. This led Developers to pull data into other systems to perform certain tasks before bringing it back to Snowflake, thereby increasing cost, run time, and complexity.

So, Snowflake came up with the Snowpark capability to enable Developers to bring deeply integrated DataFrame programming to the languages they prefer, all in one place.

Snowpark Snowflake is designed to make building complex Data Pipelines easy for Developers and to make it simple to interact with the Snowflake Data Warehouse without having to move data as was done before. All of this happens seamlessly because Snowpark uploads and runs your code in Snowflake. With this development, you can now bring your Data Analytics, Data Engineering, and Machine Learning Applications together on one Data Platform in Snowflake Virtual Warehouses.

Snowpark Snowflake, known as the Developer Tool, helps Software Engineers deploy custom code on Snowflake’s Data Warehouse to perform various Information Management Tasks. Java, Scala, and Python are the initial programming languages Snowpark supports.

For more information on Snowflake Database usage and examples, visit our other comprehensive article on Snowflake Database here.

Snowpark Supported Programming Languages (Image Source: medium.com)

Simplify Data Analysis with Hevo’s No-code Data Pipeline

Hevo Data, a No-code Data Pipeline, helps load data from any data source such as Databases, SaaS applications, Cloud Storage, SDKs, and Streaming Services and simplifies the ETL process. It supports 100+ Data Sources, including 30+ Free Sources. Setup is a 3-step process: select the data source, provide valid credentials, and choose a destination such as Snowflake. Hevo loads the data onto the desired Data Warehouse/Destination, enriches it, and transforms it into an analysis-ready form without your having to write a single line of code.

Its completely automated pipeline delivers data in real-time without any loss from source to destination. Its fault-tolerant and scalable architecture ensures that data is handled in a secure, consistent manner with zero data loss and supports different forms of data. The solutions provided are consistent and work with different BI tools as well.

Get Started with Hevo for Free

Capabilities of Snowpark Snowflake

Customers can quickly transition their business logic thanks to the ability to reuse established code bases and write in Java or Scala (with other languages on the roadmap). Accessing data programmatically is also made significantly easier by the ability to work with the widely used DataFrame API.

Snowflake is also working on:

  • Consistent Ingestion and Integration System: Another project in progress, this will let users integrate various data kinds, performance measurements, and computations.
  • Standardized Approach to Data Engineering: Since Data Pipelines can be tested, real CI/CD and unit testing become possible, and the pipelines are easier to understand and interpret.
  • Access to Libraries from Third Parties: This includes processing using Machine Learning and Data Science.
  • Machine Learning: Unlocking the ability to store, track, and serve your models will enable you to operationalize your MLOps platform.

What are the Potential Gaps Snowpark Solves?

The capabilities of Snowpark Snowflake are enormous, which allows it to solve numerous user issues. Some of the key potential gaps that Snowpark solves are:

  • It lets Developers use Java, Scala, and other Programming Languages to build code, allowing Businesses to migrate data by interacting with the industry-standard DataFrame API to access data programmatically.
  • Snowpark Snowflake provides an avenue for building Applications that link with Snowflake natively, unlike before, when Code Development and Deployment required separate infrastructure and maintenance.
  • Snowpark gives Developers the tools to calculate, report, and store computations that can be served in real-time, making it possible to read Internet of Things (IoT) Sensors or stream financial data into ML Models, Web Session Analysis, etc.
  • Snowpark Snowflake also reduces the long startup time of distributed resources: where other systems require Clusters of Nodes to carry out specific operations, Snowpark connects directly with Snowflake’s Virtual Data Warehouse.

Below is a Snowpark Snowflake example snippet that prints the count and names of tables in the current Database:

import com.snowflake.snowpark._
import com.snowflake.snowpark.functions._

object Main {
  def main(args: Array[String]): Unit = {

    // Create a Session, specifying the properties used to
    // connect to the Snowflake database.
    val builder = Session.builder.configs(Map(
      "URL" -> "https://<account_identifier>.snowflakecomputing.com",
      "USER" -> "<username>",
      "PASSWORD" -> "<password>",
      "ROLE" -> "<role_name_with_access_to_public_schema>",
      "WAREHOUSE" -> "<warehouse_name>",
      "DB" -> "<database_name>",
      "SCHEMA" -> "<schema_name>"
    ))
    val session = builder.create

    // Get the number of tables in the PUBLIC schema.
    val dfTables = session.table("INFORMATION_SCHEMA.TABLES").filter(col("TABLE_SCHEMA") === "PUBLIC")
    val tableCount = dfTables.count()
    val currentDb = session.getCurrentDatabase.getOrElse("<no current database>")
    println(s"Number of tables in the $currentDb database: $tableCount")

    // Get the list of tables in the PUBLIC schema.
    val dfPublicSchemaTables = session.table("INFORMATION_SCHEMA.TABLES").filter(col("TABLE_SCHEMA") === "PUBLIC").select(col("TABLE_NAME"))
    dfPublicSchemaTables.show()
  }
}

Upon execution, it prints out the number of tables and the list of tables in the schema as displayed below:

Number of tables in the "MY_DB" database: 8
...
---------------------
|"TABLE_NAME"       |
---------------------
|A_TABLE            |
...

Sign up here for a 14-Day Free Trial!

Why is Snowpark Snowflake Exciting?

Now that you have understood the benefits of Snowpark Snowflake, you can appreciate the importance of this software tool. Snowpark is exciting because it provides a collection of diverse tools on a single platform. The following 3 tools are the most popular among Snowpark users:

1) Snowpark API

The Snowpark Snowflake Library contains APIs for querying and processing data in a Data Pipeline, letting you build Applications that operate on data in Snowflake without having to move the data to where the application code runs.

The Snowpark API provides Programming Language support for building SQL Statements that are executed on the Server, thereby reducing the amount of data transferred between the Client and the Snowflake Database. For example, the API provides a select method that you can use to specify the column names to return, rather than writing "select column_name" as a string.
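As a minimal sketch (assuming the session from the earlier example and a hypothetical PRODUCTS table with NAME and PRICE columns), the call below builds the same SQL server-side without any handwritten query string:

import com.snowflake.snowpark.functions._

// Equivalent to session.sql("select NAME, PRICE from PRODUCTS"), but built
// with DataFrame methods; Snowpark generates the SQL for the server.
val dfProducts = session.table("PRODUCTS").select(col("NAME"), col("PRICE"))
dfProducts.show() // the generated SQL executes in Snowflake only here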

To get more information on Snowflake APIs, you can visit Snowflake’s official API Reference Guide here.

2) Snowpark Snowflake DataFrame

The core of Snowpark Snowflake is the DataFrame, a set of data that provides methods to operate on that data. A DataFrame in Snowpark is lazily evaluated, which means that statements are not executed until you call a method that performs an action. You can also create a User-Defined Function (UDF) in your code, and Snowpark will transfer the code to the server, where it can operate on the data.

With Snowpark, you can build queries using DataFrames in your code without having to create and pass along SQL Strings. An example is shown below:

import com.snowflake.snowpark._

// Reuse a Session built as shown earlier; connectionProperties stands for
// the same Map of connection settings used in the previous example.
val sess: Session = Session.builder.configs(connectionProperties).create

val sales: DataFrame = sess.table("sales")
val line_items: DataFrame = sess.table("sales_details")

// Join sales to their line items and count line items per product.
val query = sales.join(line_items, sales("id") === line_items("sid"))
                 .groupBy(line_items("product_id"))
                 .count()
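Because of lazy evaluation, nothing in the snippet above has been sent to Snowflake yet; the join and aggregation only execute when an action method is called. A minimal sketch of triggering execution:

// Only now does Snowpark generate the SQL and run it in Snowflake.
query.show()               // print the first rows of the result
val rows = query.collect() // or materialize all result rows locally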

3) Java Functions

Java functions let Developers build complex logic behind a simple function interface, with the code running in a Secured, Sandboxed Java Virtual Machine (JVM) hosted in Snowflake’s Data Warehouse. In building these functions, you can use your existing Toolsets and bring in External Libraries as well. To call a function from SQL, you build a Java Archive (JAR or JARs), load it into Snowflake, and register a function. An example snippet is shown below.

create function sentiment(txt string) returns float
language java
imports = ('@jars/Sentiment.jar')
handler = 'Sentiment.score'; 
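If you are working in Snowpark itself, you can also register a UDF directly from your client code rather than hand-packaging a JAR; Snowpark uploads the compiled code and runs it in the same sandboxed JVM. Below is a minimal sketch, assuming the session from the earlier examples, a hypothetical REVIEWS table with a REVIEW_TEXT column, and a toy lambda standing in for a real sentiment model:

import com.snowflake.snowpark.functions._

// A toy stand-in for a real sentiment model (hypothetical logic).
val sentimentUdf = udf((txt: String) => if (txt.contains("great")) 1.0f else 0.0f)

// Snowpark pushes the UDF to Snowflake and applies it server-side.
val scored = session.table("REVIEWS")
                    .select(col("REVIEW_TEXT"), sentimentUdf(col("REVIEW_TEXT")))
scored.show()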

Additional Snowpark Snowflake Enhancements

Snowpark has recently enhanced its services by launching the following products:

  • New Table Function: You can now use Snowpark’s Java functions to implement table functions, opening up new use cases within Snowpark; the public preview covers all supported cloud providers. Earlier, Snowpark was limited to scalar functions, which process a single row at a time, but Developers can now implement table functions that return multiple rows per input row or maintain state across rows.
  • Logging Framework: You can call Snowflake’s logging framework from Snowpark (in private preview). This will enhance development work by streamlining tasks like monitoring and debugging. Moreover, Snowpark has shifted the boundaries of the Data Cloud with this tool, inspiring the next generation of Data Engineers and Data Scientists by providing them better opportunities in Cloud technology. Businesses can now easily collaborate on resources stored in the cloud, and Data Professionals can develop pipelines in a hassle-free manner.
  • File Processing (Unstructured): Unstructured Data is now within your Developers’ reach directly from Snowpark (in private preview). Combined with table functions, this lets you process unstructured data for tasks such as parsing PDFs, extracting metadata from DICOM files, etc.

Conclusion

This write-up has shown that Snowpark Snowflake is an exciting step into the Data Application world: with it, Developers, Data Engineers, and Data Scientists can build complex Data Pipelines that the Snowflake Engine executes after conversion to SQL, while leveraging Snowflake’s Cloud-native Elasticity and Unlimited Scalability.

To have a smooth coding experience using Snowpark on the Snowflake Data Warehouse, Hevo Data can act as a go-between for these two powerful Tools, as it supports integration between them and your other data sources.

Hevo Data provides its users with a simpler platform for integrating data from 100+ sources for Data Analysis. It is a No-code Data Pipeline that can help you combine data from multiple sources. You can use it to transfer data from multiple data sources into your Data Warehouse, Database, or a destination of your choice, such as Snowflake. It provides you with a consistent and reliable solution for managing data in real-time, ensuring that you always have Analysis-ready data in your desired destination.

Visit our Website to Explore Hevo

Want to take Hevo for a spin? Sign Up for a 14-day free trial and experience the feature-rich Hevo suite first hand.

Share your experience of understanding Snowpark Snowflake in the comments section below!

Ofem Eteng
Freelance Technical Content Writer, Hevo Data

Ofem is a freelance writer specializing in data-related topics, with expertise in translating complex concepts and a focus on data science, analytics, and emerging technologies.
