Setting up Snowflake Streaming: 2 Easy Methods

Last Modified: December 29th, 2022


With data requirements growing at breakneck speed, the ability to not only process data in real time but also scale along with it is a must-have for most organizations around the world. Snowflake is a data warehouse that allows users to store data, run compute at scale, and carry out analytics to draw crucial, actionable insights about metrics such as customer interactions, traffic, and more, helping them measure their performance.

Snowflake is the go-to option for many organizations around the world, allowing them to leverage its robust architecture and data streaming support to stream their data into Snowflake with ease. This article aims to provide you with a step-by-step guide and the in-depth knowledge needed to set up Snowflake Streaming in a matter of minutes using various methods.

Through this article, you will gain a deep understanding of the tools and techniques involved, which will help you hone your skills further and build a customized ETL pipeline for your organization.


What is Snowflake?


Snowflake is one of the most robust cloud data platforms available in the market, providing users with immensely scalable cloud data storage. It allows them to leverage its robust architecture to not only store but also share their data in real time.

It further houses a wide range of features such as advanced analytics, business intelligence, seamless & secure data transfer, and a lot more.

What is Data Streaming & Its Applications?

Data streaming is the continuous flow and transmission of data that is generated, processed, analyzed, and stored in real time using stream processing technology.

With wide-ranging applications such as real-time tracking, multiplayer games, managing retail & warehouse inventory, etc., data streaming has become an instrumental capability for numerous businesses.

Like other platforms, Snowflake also allows users to set up data streaming and stream data from various data sources, such as Kafka, using its continuous data loading service known as Snowpipe.


Seamlessly Stream Your Data to Snowflake Using Hevo’s No-code Data Pipelines

Hevo Data, a No-code Data Pipeline, helps you stream data from 100+ sources to Snowflake and visualize it in a BI Tool with ease. Hevo is fully-managed and completely automates the process of not only loading data from your desired source but also enriching the data and transforming it into an analysis-ready form without having to write a single line of code. Its fault-tolerant architecture ensures that the data is handled in a secure, consistent manner with zero data loss.

Get started with Hevo for free

It provides a consistent & reliable solution to manage data in real-time and ensures that you always have analysis-ready data in your desired destination. It allows you to focus on essential business needs and perform insightful analysis using BI tools such as Power BI and Tableau.

Check out what makes Hevo amazing:

  • Secure: Hevo has a fault-tolerant architecture that ensures that the data is handled in a secure, consistent manner with zero data loss.
  • Schema Management: Hevo takes away the tedious task of schema management & automatically detects the schema of incoming data and maps it to the destination schema.
  • Minimal Learning: Hevo, with its simple and interactive UI, is extremely easy for new customers to work on and perform operations.
  • Hevo Is Built To Scale: As the number of sources and the volume of your data grows, Hevo scales horizontally, handling millions of records per minute with very little latency.
  • Incremental Data Load: Hevo allows the transfer of data that has been modified in real-time. This ensures efficient utilization of bandwidth on both ends.
  • Live Support: The Hevo team is available round the clock to extend exceptional support to its customers through chat, email, and support calls.
  • Live Monitoring: Hevo allows you to monitor the data flow and check where your data is at a particular point in time.
Sign up here for a 14-day free trial!

Prerequisites

  • Working knowledge of Snowflake.
  • Working knowledge of Kafka.
  • A general idea about APIs.
  • A general idea about Python and its libraries.
  • An active Snowflake account.

What are the Methods to Set up Snowflake Streaming?

There are two easy methods you can use to set up Snowflake Streaming:

  • Method 1: Using Custom Scripts to Set up Snowflake Streaming
  • Method 2: Using Kafka to Set up Snowflake Streaming

Method 1: Using Custom Scripts to Set up Snowflake Streaming

Snowflake houses robust support for efficient data streaming and allows users to make use of either APIs or drivers such as the Node.js driver, along with custom code & scripts, to establish a connection and stream their data in a matter of minutes.

You can learn more about setting up Snowflake Streaming using APIs and the Node.js driver in the following sections:

Streaming Data to Snowflake using APIs

Snowflake allows users to leverage APIs provided by a data source of their choice such as Twitter, Stripe, etc., with the help of custom-code scripts to stream & map data to a table of their choice in their Snowflake database.

For example, if you want to stream data from Twitter, you first need to install the Tweepy Python library to access data by leveraging the Twitter API using a Python script. You can install the Tweepy library using the pip command as follows:

pip install tweepy

Once you’ve installed the Tweepy library, you need to write a Python script that pulls data from Twitter so that it can be forwarded to your Snowflake database. The listener below uses the Tweepy 3.x streaming API (StreamListener was removed in Tweepy 4.0) and simply prints each incoming tweet; the on_data callback is where you would hand each record off to Snowflake:

from tweepy import OAuthHandler, Stream, StreamListener

# Twitter API credentials (fill these in with your own keys and tokens).
consumer_key = ""
consumer_secret = ""
access_token = ""
access_token_secret = ""

class StdOutListener(StreamListener):

    def on_data(self, data):
        # Each incoming tweet arrives here as a JSON string.
        # Replace print() with a call that loads the record into Snowflake.
        print(data)
        return True

    def on_error(self, status):
        # Stop streaming if Twitter signals rate limiting (HTTP 420).
        if status == 420:
            return False

# Authenticate and start streaming tweets that match the given keyword.
l = StdOutListener()
auth = OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)

stream = Stream(auth=auth, listener=l)
stream.filter(track=['basketball'])
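
The listener above only prints each tweet; to actually land the data in Snowflake, the on_data callback needs to hand each record off to your database. Below is a minimal sketch of that hand-off using the snowflake-connector-python package; the connection parameters and the tweets table with a single VARIANT column named raw are illustrative assumptions, not part of the original script:

import snowflake.connector

# Connect to Snowflake (all connection parameters below are placeholders).
conn = snowflake.connector.connect(
    account="<your_account_identifier>",
    user="<your_user>",
    password="<your_password>",
    warehouse="<your_warehouse>",
    database="<your_database>",
    schema="<your_schema>",
)

def load_tweet(data):
    # Insert one raw tweet (a JSON string) into the VARIANT column named "raw".
    cur = conn.cursor()
    try:
        cur.execute("insert into tweets(raw) select parse_json(%s)", (data,))
    finally:
        cur.close()

With a helper like this in place, you would call load_tweet(data) from the listener's on_data method instead of print(data).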

This is how you can set up Snowflake Streaming to start streaming data from a source of your choice, such as Twitter, by leveraging its APIs.

Streaming Data to Snowflake using Node.js Driver

Snowflake further allows users to start streaming data to their Snowflake tables by leveraging its Node.js driver. To do this, you will first have to establish a connection with your Snowflake database using the Snowflake SDK. You can use the following lines of code to connect with your desired Snowflake database:

// Load the Snowflake Node.js driver.
var snowflake = require('snowflake-sdk');
 
// Create a Connection object that we can use later to connect.
var connection = snowflake.createConnection( {
    account: account,
    username: user,
    password: password
    }
    );
 
// Try to connect to Snowflake, and check whether the connection was successful.
connection.connect( 
    function(err, conn) {
        if (err) {
            console.error('Unable to connect: ' + err.message);
            } 
        else {
            console.log('Successfully connected to Snowflake.');
            // Optional: store the connection ID.
            connection_ID = conn.getId();
            }
        }
    );

With your connection now up and running, you need to call the connection.execute() method to run the necessary statements. Once a statement has finished executing, its complete callback is invoked. You can use the following lines of code to implement this:

var statement = connection.execute({
  sqlText: 'create database testdb',
  complete: function(err, stmt, rows) {
    if (err) {
      console.error('Failed to execute statement due to the following error: ' + err.message);
    } else {
      console.log('Successfully executed statement: ' + stmt.getSqlText());
    }
  }
});

Once you’ve called the connection.execute() method, you can start pulling data in the form of rows. You can either consume all rows at once in the complete callback or stream them one at a time using the statement.streamRows() function, as follows:

// Option 1: fetch the full result set at once via the complete callback.
connection.execute({
  sqlText: 'select * from sometable',
  complete: function(err, stmt, rows) {
    if (err) {
      console.error('Failed to execute statement due to the following error: ' + err.message);
    } else {
      console.log('Number of rows produced: ' + rows.length);
    }
  }
});

// Option 2: stream the rows one at a time instead of materializing them all in memory.
var statement = connection.execute({
  sqlText: 'select * from sometable'
});

var stream = statement.streamRows();

stream.on('error', function(err) {
  console.error('Unable to consume all rows');
});

stream.on('data', function(row) {
  // consume result row...
});

stream.on('end', function() {
  console.log('All rows consumed');
});

This is how you can make use of custom scripts to leverage the Snowflake Node.js driver to stream your data to Snowflake.

Limitations of Streaming Data to Snowflake using Custom Scripts

  • Using custom scripts to set up data streaming for Snowflake requires you to have strong technical knowledge of APIs, drivers such as the Node.js driver, etc.
  • Streaming data to Snowflake manually can often result in various inconsistencies such as data redundancy, streaming issues, programming-language incompatibilities, etc.

Method 2: Using Kafka to Set up Snowflake Streaming


Snowflake provides users with an in-built connector for Kafka, allowing them to pull in data from numerous Kafka topics in the form of data streams. You can make use of Snowflake’s in-built Kafka connector to start streaming your data using the following steps:

Step 1: Installing the Kafka Connector for Snowflake

To start streaming data from Kafka to your desired Snowflake database and tables, you first need to install the Kafka connector, either for Apache Kafka or for Confluent. The connector package is distributed through Maven Central (for Apache Kafka) and Confluent Hub (for Confluent), so download the version that matches your Kafka installation.

Step 2: Creating a Snowflake Schema, Database, and Custom Role

With your desired Kafka connector now installed, you need to create a Snowflake database and schema where you'll stream and store the data coming from your Kafka topics. To do this, you can make use of the following lines of code:

create database kafka_db;
use database kafka_db;
create schema kafka_schema;

Once you’ve created your Snowflake database, you now need to create a custom role for your Kafka connector and then assign it the necessary privileges as follows:

-- Use a role that can create and manage roles and privileges.
use role securityadmin;
 
-- Create a Snowflake role with the privileges to work with the connector.
create role kafka_connector_role_1;
 
-- Grant privileges on the database.
grant usage on database kafka_db to role kafka_connector_role_1;
 
-- Grant privileges on the schema.
grant usage on schema kafka_schema to role kafka_connector_role_1;
grant create table on schema kafka_schema to role kafka_connector_role_1;
grant create stage on schema kafka_schema to role kafka_connector_role_1;
grant create pipe on schema kafka_schema to role kafka_connector_role_1;
 
-- Grant the custom role to an existing user.
grant role kafka_connector_role_1 to user kafka_connector_user_1;
 
-- Set the custom role as the default role for the user.
alter user kafka_connector_user_1 set default_role = kafka_connector_role_1;

Step 3: Configuring the Kafka Connector for Snowflake

Once you’ve created a custom role for your Kafka connector, you need to configure the connector in either distributed or standalone mode, depending on how Kafka Connect is installed on your system. Note that the connector authenticates to Snowflake using key pair authentication, so the user referenced in the configuration must have an RSA public key registered in Snowflake, and the matching private key is supplied through the snowflake.private.key property.

If you want to configure your Kafka connector in distributed mode, you can do so using a JSON configuration file like the following:

{
  "name":"XYZCompanySensorData",
  "config":{
    "connector.class":"com.snowflake.kafka.connector.SnowflakeSinkConnector",
    "tasks.max":"8",
    "topics":"topic1,topic2",
    "snowflake.topic2table.map": "topic1:table1,topic2:table2",
    "buffer.count.records":"10000",
    "buffer.flush.time":"60",
    "buffer.size.bytes":"5000000",
    "snowflake.url.name":"myaccount.us-west-2.snowflakecomputing.com:443",
    "snowflake.user.name":"jane.smith",
    "snowflake.private.key":"xyz123",
    "snowflake.private.key.passphrase":"jkladu098jfd089adsq4r",
    "snowflake.database.name":"mydb",
    "snowflake.schema.name":"myschema",
    "key.converter":"org.apache.kafka.connect.storage.StringConverter",
    "value.converter":"com.snowflake.kafka.connector.records.SnowflakeAvroConverter",
    "value.converter.schema.registry.url":"http://localhost:8081",
    "value.converter.basic.auth.credentials.source":"USER_INFO",
    "value.converter.basic.auth.user.info":"jane.smith:MyStrongPassword"
  }
}

If you’ve installed Kafka in standalone mode, you can use a properties file like the following to configure your Kafka connector:

connector.class=com.snowflake.kafka.connector.SnowflakeSinkConnector
tasks.max=8
topics=topic1,topic2
snowflake.topic2table.map=topic1:table1,topic2:table2
buffer.count.records=10000
buffer.flush.time=60
buffer.size.bytes=5000000
snowflake.url.name=myaccount.us-west-2.snowflakecomputing.com:443
snowflake.user.name=jane.smith
snowflake.private.key=xyz123
snowflake.private.key.passphrase=jkladu098jfd089adsq4r
snowflake.database.name=mydb
snowflake.schema.name=myschema
key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=com.snowflake.kafka.connector.records.SnowflakeAvroConverter
value.converter.schema.registry.url=http://localhost:8081
value.converter.basic.auth.credentials.source=USER_INFO
value.converter.basic.auth.user.info=jane.smith:MyStrongPassword

Step 4: Enabling the Snowflake Kafka Connector

Once you’ve configured your Kafka connector, you can enable it and start streaming data from your Kafka topics to your desired Snowflake tables.

To enable the Kafka connector, you can either make a curl-based POST request to the Kafka Connect REST API (when running Kafka Connect in distributed mode) or start the connector with the Kafka Connect standalone script (when running in standalone mode).

To do this, you can use either of the following commands:

Using the Curl Command

curl -X POST -H "Content-Type: application/json" --data @<config_file>.json http://localhost:8083/connectors 
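
If you prefer to script this call, the same POST request can be made from Python using the requests library. This is only a convenience sketch: the endpoint http://localhost:8083/connectors comes from the curl command above, while the file name connector_config.json is an assumed placeholder for the JSON configuration created in Step 3:

import json
import requests

# Load the connector configuration created in Step 3 (file name is a placeholder).
with open("connector_config.json") as f:
    config = json.load(f)

# Register the connector with the Kafka Connect REST API.
response = requests.post(
    "http://localhost:8083/connectors",
    headers={"Content-Type": "application/json"},
    data=json.dumps(config),
)

print(response.status_code, response.text)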

Using the Kafka Directory Commands

<kafka_dir>/bin/connect-standalone.sh <kafka_dir>/config/connect-standalone.properties <kafka_dir>/config/<connector_config_file>.properties

With your Snowflake Kafka connector now up and running, the data will start streaming from your desired Kafka topics to a Snowflake table of your choice.
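
To confirm that records are arriving, you can query the target table from Python. By default, the Kafka connector writes each message into two VARIANT columns, RECORD_METADATA and RECORD_CONTENT. The sketch below assumes the snowflake-connector-python package and reuses the database, schema, and table names from the example configuration above; in practice, replace them with your own values:

import snowflake.connector

# Connection details mirror the example configuration above (use your own in practice).
conn = snowflake.connector.connect(
    account="myaccount",
    user="jane.smith",
    password="<your_password>",
    database="mydb",
    schema="myschema",
    warehouse="<your_warehouse>",
)

cur = conn.cursor()
try:
    # Each Kafka message lands as RECORD_METADATA / RECORD_CONTENT in the target table.
    cur.execute("select record_metadata, record_content from table1 limit 10")
    for metadata, content in cur:
        print(metadata, content)
finally:
    cur.close()
    conn.close()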

This is how you can set up Snowflake Streaming using the in-built Kafka connector.

Conclusion

This article teaches you how to set up Snowflake Streaming with ease. It provides in-depth knowledge about the concepts behind every step to help you understand and implement them efficiently. These methods, however, can be challenging, especially for a beginner, and this is where Hevo saves the day.

Visit our website to explore Hevo

Hevo Data, a No-code Data Pipeline, helps you transfer data from a source of your choice in a fully automated and secure manner without having to write code repeatedly. Hevo, with its strong integration with 100+ sources & BI tools, allows you to not only export & load data but also transform & enrich your data and make it analysis-ready in a jiffy.

Want to give Hevo a spin? Get started by signing up for a 14-day free trial and experience the feature-rich Hevo suite firsthand. You can also have a look at our unbeatable pricing, which will help you choose the right plan for your needs!

Why don’t you share your experience of setting up Snowflake Streaming in the comments? We would love to hear from you!

Aman Sharma
Freelance Technical Content Writer, Hevo Data

Driven by a problem-solving approach and guided by analytical thinking, Aman loves to help data practitioners solve problems related to data integration and analysis through his extensively researched content pieces.
