Snowpark Snowflake: A Comprehensive 101 Guide

on Data Analytics, Data Processing, Data Warehouses, Database Management Systems, Snowflake, SQL • September 28th, 2021

As the volume of data produced every day continues to explode, Organizations increasingly expect Data Service Providers to expand the capabilities of their existing tools, or build new ones, to serve a wide variety of needs.

Snowflake, an industry-leading Data Warehousing Platform, has been at the forefront of transforming the way we see and use data, and has become a favourite solution for Data Warehouses and Data Lakes.

To meet the demand for a Data Warehouse that goes beyond SQL and allows further coding within its system, Snowflake introduced Snowpark. Snowflake describes Snowpark as a new way to program data in Snowflake through a set of optimized APIs, with native support for multiple Programming Languages so that Developers, Data Engineers, and Data Scientists can write code in the language of their choice.

This article introduces Snowpark Snowflake and discusses its capabilities and its support for other systems.


An Overview of Snowpark Snowflake

[Image: Snowpark Snowflake Interface (Source: Snowflake)]

Snowpark Snowflake is a recent Product Offering from Snowflake. Snowpark is a new experience that lets Developers easily extend Snowflake's capabilities by writing code that uses DataFrame-Style Programming rather than SQL Statements to query and manipulate data with their favourite tools, and by deploying that code in a Serverless manner to Snowflake's Virtual Warehouse Compute Engine.

Before the introduction of Snowpark, Snowflake, founded in 2012, offered Cloud-based Data Warehousing as a Software as a Service (SaaS) Platform for loading, analyzing, and reporting on large data volumes. It did all of this without requiring users to deploy hardware or install and configure software, since this was handled automatically, providing a Reliable, Secure, High-Performance, and Scalable Data-Processing System. This made it ideal for Organizations running on a tight budget and not wanting to maintain On-premise infrastructure.

But with Data Warehousing, SQL is the language of choice, which limits Developers who prefer writing in other Programming Languages or who want to perform operations that Data Warehousing Systems constrain. This forced Developers to pull data into other systems to perform certain tasks before bringing it back into Snowflake, increasing cost, run time, and complexity.

So, Snowflake built the Snowpark Snowflake capability to give Developers Deeply Integrated, DataFrame-Style Programming in the Languages they prefer, all in one place.

Snowpark Snowflake is designed to make building complex Data Pipelines easy for Developers and to simplify interacting with the Snowflake Data Warehouse without having to move data, as was done before. This works seamlessly because Snowpark uploads and runs your code inside Snowflake. With this development, you can now consolidate your Data Analytics, Data Engineering, and Machine Learning Applications on one Data Platform, running in Snowflake Virtual Warehouses.

Snowpark Snowflake, also known as the Developer Tool, helps Software Engineers deploy custom code on Snowflake's Data Warehouse to perform various Information Management Tasks. Java, Scala, and Python are the initial Programming Languages that Snowpark supports.

For more information on Snowflake Database usage and examples, visit our other comprehensive article on Snowflake Database here.

[Image: Snowpark Supported Programming Languages (Source: Miro Medium)]

Simplify Data Analysis with Hevo’s No-code Data Pipeline

Hevo Data, a No-code Data Pipeline, helps load data from any data source such as Databases, SaaS Applications, Cloud Storage, SDKs, and Streaming Services, and simplifies the ETL process. It supports 100+ Data Sources, including 30+ Free Sources. Setup is a 3-step process: select the data source, provide valid credentials, and choose a destination such as Snowflake. Hevo loads the data onto the desired Data Warehouse/Destination, enriches it, and transforms it into an analysis-ready form without you having to write a single line of code.

Its completely automated pipeline delivers data in real-time without any loss from source to destination. Its fault-tolerant and scalable architecture ensures that data is handled in a secure, consistent manner with zero data loss, and it supports different forms of data. The solutions provided are consistent and work with different BI tools as well.

Get Started with Hevo for Free

Check out why Hevo is the Best:

Secure: Hevo has a fault-tolerant architecture that ensures that the data is handled securely and consistently with zero data loss.

Schema Management: Hevo takes away the tedious task of schema management & automatically detects the schema of incoming data and maps it to the destination schema.

Minimal Learning: Hevo, with its simple and interactive UI, is extremely easy for new customers to work with and perform operations on.

Hevo Is Built To Scale: As the number of sources and the volume of your data grows, Hevo scales horizontally, handling millions of records per minute with very little latency.

Incremental Data Load: Hevo allows the real-time transfer of data that has been modified. This ensures efficient utilization of bandwidth on both ends.

Live Support: The Hevo team is available round the clock to extend exceptional support to its customers through chat, email, and support calls.

Live Monitoring: Hevo allows you to monitor the data flow and check where your data is at a particular point in time.

Simplify your Data Analysis with Hevo today! 

Sign up here for a 14-Day Free Trial!

Snowpark Snowflake Capabilities

The capabilities of Snowpark Snowflake are considerable: it offers Developers the ability to use Java, Scala, and other Programming Languages to write code that accesses data programmatically through an Industry-Standard DataFrame API, letting Businesses work with their data where it lives.

Snowpark Snowflake provides an avenue for building Applications that link with Snowflake natively, unlike before, when Code Development and Deployment required separate infrastructure and maintenance.

Snowpark gives Developers the tools to calculate, report on, and store computations that can be served in real-time, making it possible to read Internet of Things (IoT) Sensors or streaming financial data and to incorporate ML Models, Web Session Analysis, and more.

Snowpark Snowflake also reduces the long startup time of distributed resources in systems that require Clusters of Nodes to carry out specific operations; Snowpark avoids this by connecting directly to Snowflake's Virtual Data Warehouse.

Below is a Snowpark Snowflake example snippet that prints the count and names of the tables in the PUBLIC schema of the current Database:

import com.snowflake.snowpark._
import com.snowflake.snowpark.functions._

object Main {
  def main(args: Array[String]): Unit = {

    // Create a Session, specifying the properties used to
    // connect to the Snowflake database.
    val builder = Session.builder.configs(Map(
      "URL" -> "https://<account_identifier>.snowflakecomputing.com",
      "USER" -> "<username>",
      "PASSWORD" -> "<password>",
      "ROLE" -> "<role_name_with_access_to_public_schema>",
      "WAREHOUSE" -> "<warehouse_name>",
      "DB" -> "<database_name>",
      "SCHEMA" -> "<schema_name>"
    ))
    val session = builder.create

    // Get the number of tables in the PUBLIC schema.
    val dfTables = session.table("INFORMATION_SCHEMA.TABLES").filter(col("TABLE_SCHEMA") === "PUBLIC")
    val tableCount = dfTables.count()
    val currentDb = session.getCurrentDatabase.getOrElse("<no current database>")
    println(s"Number of tables in the $currentDb database: $tableCount")

    // Get the list of tables in the PUBLIC schema.
    val dfPublicSchemaTables = session.table("INFORMATION_SCHEMA.TABLES").filter(col("TABLE_SCHEMA") === "PUBLIC").select(col("TABLE_NAME"))
    dfPublicSchemaTables.show()
  }
}

Upon execution, it prints out the number of tables and the list of tables in the schema as displayed below:

Number of tables in the "MY_DB" database: 8
...
---------------------
|"TABLE_NAME"       |
---------------------
|A_TABLE            |
...

Snowpark API

The Snowpark Library contains APIs for querying and processing data in a Data Pipeline, so you can build Applications that operate on data in Snowflake without having to move the data to where the Application code runs.

The Snowpark API provides Programming Language support for building SQL Statements that are executed on the Server, thereby reducing the amount of data transferred between the Client and the Snowflake Database. For example, the API provides a select method that you can use to specify the column names to return, rather than writing "select column_name" as a string; a short sketch is shown below.
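As a minimal illustration (assuming the session created in the earlier example and a hypothetical SALES table with AMOUNT and REGION columns), the select call takes Column objects and Snowpark generates the SQL for you:

import com.snowflake.snowpark.functions.col

// Build the projection with Column objects; Snowpark translates this into SQL.
val dfSales = session.table("SALES").select(col("AMOUNT"), col("REGION"))

// The generated SQL runs in Snowflake only when an action such as show() is called.
dfSales.show()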

To get more information on Snowflake APIs, you can visit Snowflake’s official API Reference Guide here.

Snowpark Snowflake DataFrame

The core of Snowpark Snowflake is the DataFrame, a set of data that provides methods to operate on that data. A DataFrame in Snowpark Snowflake is lazily evaluated, which means that statements are not executed until you call a method that performs an action. You can also create a User-Defined Function (UDF) in your code, and Snowpark will transfer the code to the server, where it can operate on the data; see the sketch after the next example.

With Snowpark, you can build queries using DataFrames in your code without having to create and pass along SQL Strings. An example is shown below:

import com.snowflake.snowpark._

// Create a session (using the same connection properties as in the earlier example).
val sess = Session.builder.configFile("snowflake_connection.properties").create

val sales: DataFrame = sess.table("sales")
val line_items: DataFrame = sess.table("sales_details")

// The query is only built here; it runs in Snowflake when an action such as show() is called.
val query = sales.join(line_items, sales("id") === line_items("sid"))
                 .groupBy(line_items("product_id"))
                 .count()
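As mentioned above, a UDF can also be defined inline and pushed to Snowflake. The following is a minimal sketch under assumptions: it reuses the sess session and sales table from the example above and assumes a hypothetical quantity column. The anonymous function is serialized by Snowpark and executed inside Snowflake, next to the data:

import com.snowflake.snowpark.functions.{col, udf}

// Register an anonymous UDF; Snowpark uploads it so it can run server-side.
val doubleQty = udf((qty: Int) => qty * 2)

// Apply the UDF in a DataFrame query; the lambda executes inside Snowflake.
val withDoubled = sales.select(col("id"), doubleQty(col("quantity")).as("double_qty"))
withDoubled.show()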

Java Functions

Java Functions allow Developers to build complex logic behind a simple function interface, with the code running in a Secured, Sandboxed Java Virtual Machine (JVM) hosted in Snowflake’s Data Warehouse. When building these functions, you can use your existing toolsets and bring in External Libraries as well. To expose the function to SQL, you build a Java Archive (JAR, or several JARs), load it into Snowflake, and register a function. An example snippet is shown below.

create function sentiment(txt string) returns float
language java
imports = ('@jars/Sentiment.jar')
handler = 'Sentiment.score'; 
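Once registered, the function can be called like any other SQL function. As a hedged sketch (assuming the session from the earlier Snowpark examples and a hypothetical REVIEWS table with a REVIEW_TEXT column), it can also be invoked from a Snowpark DataFrame through callUDF:

import com.snowflake.snowpark.functions.{callUDF, col}

// Call the registered SENTIMENT function from a DataFrame query;
// the Java handler executes inside Snowflake, next to the data.
val scored = session.table("REVIEWS")
  .select(col("REVIEW_TEXT"), callUDF("sentiment", col("REVIEW_TEXT")).as("SENTIMENT_SCORE"))
scored.show()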

Conclusion

This write-up has shown that Snowpark Snowflake is an exciting step into the Data Application world. With it, Developers, Data Engineers, and Data Scientists can build complex Data Pipelines that the Snowflake Engine executes after conversion to SQL, while leveraging Snowflake's Cloud-native Elasticity and Unlimited Scalability.

To have a smooth coding experience using Snowpark on the Snowflake Data Warehouse, Hevo Data can act as a go-between for these two powerful tools, as it supports integration between them and your other data sources.

Hevo Data provides its users with a simpler platform for integrating data from 100+ sources for Data Analysis. It is a No-code Data Pipeline that can help you combine data from multiple sources. You can use it to transfer data from multiple data sources into your Data Warehouse, Database, or a destination of your choice like Snowflake. It provides you with a consistent and reliable solution to managing data in real-time, ensuring that you always have Analysis-ready data in your desired destination.

Visit our Website to Explore Hevo

Want to take Hevo for a spin? Sign Up for a 14-day free trial and experience the feature-rich Hevo suite first hand.

Share your experience of understanding Snowpark Snowflake in the comments section below!
