Setting Up Snowflake Docker: 4 Easy Steps

on Big Data, Data Driven, Data Driven Strategies, Data Integration, Data Storage, Data Streaming, Data Warehouses, Kafka, Snowflake, Tutorials • September 28th, 2021

Docker is a containerization engine used to build, ship, and run cross-platform applications on any machine. Snowflake provides connectors that allow you to interact with it from your local machine, including connectors for Python, SQL, and Kafka Connect.

You can interact with Snowflake using the Kafka connector. Kafka itself is a framework for handling real-time data feeds, and the Kafka connector is one of the available connectors for pushing data into the Snowflake database.

In this article, you will be introduced to the Snowflake Kafka connector. In addition, you will learn how to set up a Docker container with Kafka for real-time data transmission. So, read along to understand more about Snowflake Docker.

Introduction to Apache Kafka Connector


Apache Kafka is a messaging queue tool that works in a publish-subscribe manner. It is an event streaming platform that logs events in real-time. Kafka Connect allows you to reliably stream data between Apache Kafka and other data systems. Working with the Kafka connector requires setting up Confluent.


Confluent is a stream processing platform. It is necessary for handling the data streaming between Snowflake and Docker. Before any connection can take place, we need to set up Docker on our local machine if it isn't already installed.

To read more about Apache Kafka, refer to this link.

In this next section, you will understand the steps to set up Snowflake Docker using Apache Kafka.

Simplify Data Analysis with Hevo’s No-code Data Pipeline

Hevo Data, a No-code Data Pipeline, helps load data from any data source such as Databases, SaaS applications, Cloud Storage, SDKs, and Streaming Services and simplifies the ETL process. It supports 100+ Data Sources including 30+ Free Sources. It is a 3-step process: just select the data source, provide valid credentials, and choose the destination. Hevo loads the data onto the desired Data Warehouse/destination like Snowflake, enriches the data, and transforms it into an analysis-ready form without your having to write a single line of code.

Its completely automated pipeline allows data to be delivered in real-time without any loss from source to destination. Its fault-tolerant and scalable architecture ensures that the data is handled in a secure, consistent manner with zero data loss and supports different forms of data. The solutions provided are consistent and work with different BI tools as well.

GET STARTED WITH HEVO FOR FREE

Check out why Hevo is the Best:

  • Secure: Hevo has a fault-tolerant architecture that ensures that the data is handled securely and consistently with zero data loss.
  • Schema Management: Hevo takes away the tedious task of schema management & automatically detects the schema of incoming data and maps it to the destination schema.
  • Minimal Learning: Hevo, with its simple and interactive UI, is extremely simple for new customers to work on and perform operations.
  • Hevo Is Built To Scale: As the number of sources and the volume of your data grows, Hevo scales horizontally, handling millions of records per minute with very little latency.
  • Incremental Data Load: Hevo allows the transfer of data that has been modified in real-time. This ensures efficient utilization of bandwidth on both ends.
  • Live Support: The Hevo team is available round the clock to extend exceptional support to its customers through chat, email, and support calls.
  • Live Monitoring: Hevo allows you to monitor the data flow and check where your data is at a particular point in time.

Simplify your Data Analysis with Hevo today! 

SIGN UP HERE FOR A 14-DAY FREE TRIAL!

Steps to Connect Snowflake Docker using Kafka Connector

In this section, you will learn how you can set up the Snowflake Docker using Apache Kafka and Confluent. So, follow the steps below to successfully set up Snowflake Docker.

Step 1: Setup Docker on your Local Machine


You can install Docker on your desired machine by following the official installation guide for the Operating System installed on your machine.

Once the installation is completed, test if Docker is running by typing the following command:

docker ps
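If you want a slightly more forgiving check than a bare docker ps, a small shell snippet along these lines reports whether Docker is installed and whether the daemon is reachable (the status messages here are my own wording, not Docker's output):

```shell
# Report whether the Docker CLI is installed and whether the daemon responds.
# The status strings are illustrative, not Docker's own output.
if ! command -v docker >/dev/null 2>&1; then
  docker_status="not installed"
elif docker ps >/dev/null 2>&1; then
  docker_status="running"
else
  docker_status="installed, but the daemon is not reachable"
fi
echo "Docker status: $docker_status"
```

Note that docker ps only succeeds when the daemon is up, so it doubles as a daemon health check.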

Step 2: Set Up Confluent for Data Streaming

As mentioned earlier, Confluent is necessary to handle the data streaming between Snowflake and Docker. Snowflake supports two versions of Kafka: the Confluent package and the Open Source Software (OSS) Apache Kafka package.

To get the Snowflake Docker connection in place, we need a config file containing relevant credentials for connection to your Snowflake database. So, follow the steps below:

  •  Go to this link to download the JAR file.

Note: The alternative is downloading from the Maven Central repository. The difference is that the package from the Confluent hub includes the dependencies required to encrypt and decrypt the Private Key used for Key Pair Authentication.

  • To install, use the Docker image of the needed connector from Docker Hub. Example:
docker pull confluentinc/cp-kafka-connect

Step 3: Configure the Snowflake Docker Kafka Connectors

Kafka Connect can be configured in two modes: Distributed and Standalone.


1) Distributed Mode

The Distributed mode is recommended, as it distributes the workload/running connectors across clusters. If one node fails or leaves the cluster, the process can be transferred to another node. It is best suited for production use and gives room for scalability.

2) Standalone Mode

A Standalone mode, on the other hand, is used for development and testing on a local machine.

So enter the below commands to create a new directory and create a config file inside it.

Distributed mode:

mkdir <new dir>
cd <new dir>
vi sfdata.json

The config file is sfdata.json. Copy the configuration below into it and fill in the required fields with your connector config information, all in JSON format.

{
"name":"sfTest",
"config":{
   "connector.class":"com.snowflake.kafka.connector.SnowflakeSinkConnector",
   "tasks.max":"1",
   "topics":"<Your Topic Name>",
   "snowflake.topic2table.map": "<Your Topic Name>:<Your Table Name>",
   "buffer.count.records":"1000",
   "buffer.size.bytes":"1048576",
   "snowflake.url.name":"ACCOUNT.snowflakecomputing.com",
   "snowflake.user.name":"<Your User Name>",
   "snowflake.private.key":"<Your private Key>",
   "snowflake.private.key.passphrase":"<Your Pass passphrase>",
   "snowflake.database.name":"<Your DB Name>",
   "snowflake.schema.name":"PUBLIC",
   "key.converter":"org.apache.kafka.connect.storage.StringConverter",
   "value.converter":"com.snowflake.kafka.connector.records.SnowflakeJsonConverter",
   "value.converter.schema.registry.url":"http://localhost:8081",
   "value.converter.basic.auth.credentials.source":"USER_INFO",
   "value.converter.basic.auth.user.info":"<Username>:<password>"
}
}
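The snowflake.private.key and snowflake.private.key.passphrase fields above rely on Snowflake's key-pair authentication. If you don't have a key pair yet, one can be generated with OpenSSL roughly as follows (the passphrase and file names here are placeholders to replace with your own):

```shell
# Generate an encrypted private key for Snowflake key-pair authentication.
openssl genrsa 2048 | openssl pkcs8 -topk8 -v2 aes-256-cbc -inform PEM \
  -passout pass:MyPassphrase -out rsa_key.p8

# Derive the matching public key, which is registered against your Snowflake
# user (e.g. ALTER USER <user> SET RSA_PUBLIC_KEY='<contents of rsa_key.pub>';).
openssl rsa -in rsa_key.p8 -passin pass:MyPassphrase -pubout -out rsa_key.pub

# snowflake.private.key expects just the base64 body on a single line,
# without the BEGIN/END header and footer:
grep -v "PRIVATE KEY" rsa_key.p8 | tr -d '\n' > rsa_key_oneline.txt
```

The passphrase you chose then goes into snowflake.private.key.passphrase so the connector can decrypt the key.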

Standalone mode:

mkdir -p <new dir>/config
touch <new dir>/config/SF_connect.properties

The above commands create a config file inside a config folder. Note that this config file is in the Java properties format, not JSON. The config file is shown below:

connector.class=com.snowflake.kafka.connector.SnowflakeSinkConnector
tasks.max=8
topics=<YOUR TOPICS>
snowflake.topic2table.map= <Your Topic Name>:<Your Table Name>
buffer.count.records=10000
buffer.flush.time=60
buffer.size.bytes=5000000
snowflake.url.name=ACCOUNT.snowflakecomputing.com:443
snowflake.user.name=<YOUR USERNAME>
snowflake.private.key=<YOUR PRIVATE KEY>
snowflake.private.key.passphrase=<YOUR PASSPHRASE>
snowflake.database.name=<YOUR DB_NAME>
snowflake.schema.name=PUBLIC
key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=com.snowflake.kafka.connector.records.SnowflakeAvroConverter
value.converter.schema.registry.url=http://localhost:8081
value.converter.basic.auth.credentials.source=USER_INFO
value.converter.basic.auth.user.info=<USERNAME>:<PASSWORD>

The required fields in all of these include:

  • connector.class: always constant, as seen above.
  • topics: A topic is like a category under which Kafka organizes messages. By default, Snowflake expects the topic name to be the same as the name of the table it writes to. If that is not the case, the optional snowflake.topic2table.map parameter can be used to map topic names to table names.
  • snowflake.url.name: The URL used to access your Snowflake account.
  • snowflake.user.name: Your Snowflake username.
  • snowflake.private.key: The private key required to authenticate the user. If the private key is encrypted, make sure to set the optional snowflake.private.key.passphrase to enable decryption; it should be set only if the private key is encrypted.
  • snowflake.database.name: Name of the database containing the table to be filled.
  • snowflake.schema.name: Name of the schema containing the table to be filled.

With the configuration all set, Docker Compose can build and start the services.

docker-compose build
docker-compose up -d
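The docker-compose commands assume a docker-compose.yml describing the services. The article does not prescribe one, so the following is only a minimal sketch of such a file for a single-node ZooKeeper + Kafka + Kafka Connect stack; the image tags, ports, and settings are assumptions to adapt to your environment (written out via a heredoc for convenience):

```shell
# Write a hypothetical minimal docker-compose.yml. The image names are real
# Confluent images; the tags, ports, and settings are assumptions.
cat > docker-compose.yml <<'EOF'
version: "3"
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:7.3.0
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
  kafka:
    image: confluentinc/cp-kafka:7.3.0
    depends_on: [zookeeper]
    ports: ["9092:9092"]
    environment:
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:9092
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
  connect:
    image: confluentinc/cp-kafka-connect:7.3.0
    depends_on: [kafka]
    ports: ["8083:8083"]
    environment:
      CONNECT_BOOTSTRAP_SERVERS: kafka:9092
      CONNECT_REST_ADVERTISED_HOST_NAME: connect
      CONNECT_GROUP_ID: sf-connect
      CONNECT_CONFIG_STORAGE_TOPIC: _connect-configs
      CONNECT_OFFSET_STORAGE_TOPIC: _connect-offsets
      CONNECT_STATUS_STORAGE_TOPIC: _connect-status
      CONNECT_CONFIG_STORAGE_REPLICATION_FACTOR: 1
      CONNECT_OFFSET_STORAGE_REPLICATION_FACTOR: 1
      CONNECT_STATUS_STORAGE_REPLICATION_FACTOR: 1
      CONNECT_KEY_CONVERTER: org.apache.kafka.connect.storage.StringConverter
      CONNECT_VALUE_CONVERTER: org.apache.kafka.connect.json.JsonConverter
      CONNECT_PLUGIN_PATH: /usr/share/java,/usr/share/confluent-hub-components
EOF
echo "wrote docker-compose.yml"
```

Once the stack is up, the Kafka Connect REST API should be listening on port 8083.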

All the services should now be up and running. You can view them in the terminal with:

docker ps

Step 4: Start the Snowflake Docker Connectors

With the services up and spinning, you can go ahead and start the connectors.

Distributed mode:

curl -X POST -H "Content-Type: application/json" --data @<new dir>/sfdata.json http://localhost:8083/connectors

Standalone mode:

<new_dir>/bin/connect-standalone.sh <new_dir>/<path>/connect-standalone.properties <new_dir>/config/SF_connect.properties
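In either mode, the Connect worker exposes a REST API (on port 8083 by default) that you can use to confirm the connector was registered and is running. A small guarded sketch, where the connector name sfTest comes from the distributed config above:

```shell
# Query the Kafka Connect REST API for the connector's state.
# Guarded so this only fires when a Connect worker is actually reachable.
CONNECT_URL="http://localhost:8083"
if curl -s -o /dev/null --max-time 2 "$CONNECT_URL"; then
  curl -s "$CONNECT_URL/connectors"               # all registered connectors
  curl -s "$CONNECT_URL/connectors/sfTest/status" # state of connector and tasks
else
  echo "Kafka Connect is not reachable at $CONNECT_URL"
fi
```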

Now, you can publish a sample set of data to your Kafka topic for testing. It should be reflected in your Snowflake table after insertion.
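One way to publish that sample data is with the console producer inside a Dockerized broker. Here the container name kafka and the topic test_topic are assumptions to replace with your own names:

```shell
# Send one JSON record into the topic via the console producer inside the
# broker container. Container and topic names are placeholders.
TOPIC="test_topic"
MSG='{"id": 1, "name": "sample"}'
if command -v docker >/dev/null 2>&1 \
   && docker ps --format '{{.Names}}' | grep -q '^kafka$'; then
  echo "$MSG" | docker exec -i kafka \
    kafka-console-producer --bootstrap-server localhost:9092 --topic "$TOPIC"
else
  echo "Kafka container not running; would have produced: $MSG"
fi
```

On the Snowflake side, you can then verify the arrival of the record with a simple SELECT * FROM <Your Table Name>; query.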

Bravo! You have successfully connected Snowflake and Docker using the Apache Kafka connector.

Conclusion

In this article, you learned the steps to set up Snowflake Docker using the Apache Kafka connector and Confluent for data streaming. Connecting Snowflake and Docker can be done in several ways, but they all require data streaming. For the data streaming to be effective, these connectors have to be robust enough to handle all queries. Using the Kafka connector will help you keep tabs on every publish made while managing the clusters with Confluent.

Moreover, extracting complex data from a diverse set of data sources can get quite challenging and cumbersome, so a simpler alternative like Hevo is the right solution for you! 

Hevo Data is a No-Code Data Pipeline that offers a faster way to move data from 100+ Data Sources including 30+ Free Sources, into your Data Warehouse such as Snowflake to be visualized in a BI tool. Hevo is fully automated and hence does not require you to code.

VISIT OUR WEBSITE TO EXPLORE HEVO

Want to take Hevo for a spin?

SIGN UP and experience the feature-rich Hevo suite first hand. You can also have a look at the unbeatable pricing that will help you choose the right plan for your business needs.

Share your experience of setting up Snowflake Docker in the comments section below!
