Snowflake R/RStudio Integration: How to Connect & Analyze Data?

• February 4th, 2022

Snowflake R Feature Image

Businesses today are overflowing with data, and most of the companies are modernizing and moving into the Cloud. Snowflake is a Data Warehouse that has become an industry-leading Cloud-Based SaaS (Software-as-a-service) Data Platform. However, in order to analyze Snowflake data and create data visualizations, it is essential to establish connections between Snowflake and Statistical Computing Programming Languages like R. This article will help you establish a Snowflake R connection.

R is a free software environment and programming language for statistical computing and graphics. R is used by Data Miners and Statisticians for Data Analysis and developing graphical representations. Snowflake R integration allows users to use high-performance R statistical functions to analyze Snowflake data. It will also help you with a lot of Data Engineering tasks such as Data Wrangling, Data Processing, Data Visualization, and more. Before getting into the Snowflake R integration, let’s discuss this robust Data Warehouse in brief.

Table of Contents

Introduction to Snowflake

Snowflake R: Snowflake
Image Source: www.en.m.wikipedia.org

Snowflake is a Cloud Data Warehousing solution provided as a SaaS offering. It is built on Amazon Web Service, Microsoft Azure, or Google Cloud infrastructure that provides an unbounded platform for storing and retrieving data. Snowflake Data Warehouse uses a different proprietary SQL Database Engine with a unique architecture designed for the cloud.

The architecture of Snowflake separates its “Compute” and “Storage” units, thereby scaling differently. This allows the customers to use and pay for both services independently. It means organizations that have high storage demands but less need for CPU cycles, or vice versa, do not have to pay for an integrated bundle that requires payment for both, making it very attractive to companies. Like other popular Data Warehouses, it also uses Columnar Storage for parallel query execution.

With Snowflake, there is no hardware or software to select, install, configure, or manage, therefore, making it ideal for organizations that do not want to have dedicated resources for setup, maintenance, and support for in-house servers. Snowflake security and sharing functionalities make it easy for organizations to quickly share and secure data in real-time using any available ETL solution. Snowflake is known for its scalability and relative ease of use.

Simplify Snowflake ETL and Data Integration using Hevo’s No-code Data Pipeline

Hevo Data helps you directly transfer data from 100+ data sources (including 30+ free sources) to Snowflake, Business Intelligence tools, Data Warehouses, or a destination of your choice in a completely hassle-free & automated manner. Hevo is fully managed and completely automates the process of not only loading data from your desired source but also enriching the data and transforming it into an analysis-ready form without having to write a single line of code.

Get started with hevo for free

Let’s look at some of the salient features of Hevo:

  • Fully Managed: It requires no management and maintenance as Hevo is a fully automated platform.
  • Data Transformation: It provides a simple interface to perfect, modify, and enrich the data you want to transfer.
  • Real-Time: Hevo offers real-time data migration. So, your data is always ready for analysis.
  • Schema Management: Hevo can automatically detect the schema of the incoming data and map it to the destination schema.
  • Scalable Infrastructure: Hevo has in-built integrations for 100’s of sources that can help you scale your data infrastructure as required.
  • Live Monitoring: Advanced monitoring gives you a one-stop view to watch all the activities that occur within Data Pipelines.
  • Live Support: Hevo team is available round the clock to extend exceptional support to its customers through chat, email, and support calls.
Sign up here for a 14-day free trial!

How to Connect R to Snowflake using ODBC Driver?

Snowflake is always seeking ways to improve its offerings and enhance its data sharing and analysis thereby, making it a Data Warehouse of choice. Snowflake provides a driver allowing ODBC-based client applications to connect to Snowflake.

The Snowflake R ODBC Driver is a powerful tool that allows you to connect the popular open-source R language with live Snowflake Data Warehouse. It allows you to access remote Snowflake data with a pure R script. Using R with Snowflake gives you finer control over your queries. Let’s discuss how you can connect to the Snowflake Data Warehouse using R to query the data.

Install R/RStudio

To establish a Snowflake R connection, you must install R/RStudio first. Follow the below-mentioned steps to install R on your Windows platform.

Snowflake R: Download R
Image Source: www.r-project.org
  • Select a CRAN location (a mirror site) and click the corresponding link.
  • Click on the “Download R for Windows” link located at the top.
  • Click on the “install R for the first time” link located at the top.
  • Click “Download R for Windows” to save and download the file.  
  • After downloading the file, open the .exe file and follow the installation instructions.

Now that R is installed, you need to download and install RStudio. Follow the below-mentioned steps to install RStudio on your Windows platform.

  • Go to www.rstudio.com and click on the “Download RStudio” button.
  • Click on “Download RStudio Desktop“.
  • Select the recommended version for your system and save the file.
  • After downloading the file, open the .exe file and follow the installation instructions. 

You can follow similar steps to install R on your macOS platform.

Downloading and Installing the ODBC Driver

You can download the installer for the latest version of the Snowflake ODBC Driver from the Snowflake Client Repository.

After downloading the appropriate installer for the ODBC Driver, double-click on the downloaded .msi file. The driver will be installed in C:Program Files.

Configuring the ODBC Driver

To configure the ODBC driver on a Windows platform, you need to create a DSN for the driver. To do so, follow the instructions.

  • The first step is to launch the Windows Data Source Administration Tool. You can run a search for “ODBC Data Sources” on your Windows platform.
Snowflake R: ODBC Data Source
Image Source: www.docs.snowflake.com
  • Click on “ODBC Data Sources” to launch the ODBC Data Source Administration Tool.
  • From the setup window, navigate to the “Drivers” tab and make sure that the Snowflake ODBC Driver (SnowflakeDSIIDriver) is installed.
Snowflake R: ODBC Driver
Image Source: www.docs.snowflake.com
  • Now, to create a new DSN, navigate to the “User DSN” or “System DSN” tab and click on the “Add” button.
Snowflake R: Create DSN
Image Source: www.docs.snowflake.com
  • From the list of installed drivers, you can select “SnowflakeDSIIDriver”.
  • Now, you need to specify the connection parameters for the ODBC Driver. You can enter the parameters for the DSN in the fields provided in the “Snowflake Configuration” dialog box.
Snowflake R: Parameters
Image Source: www.docs.snowflake.com
  • To create a DSN, only “Data Source”, “User” and “Server” parameters are required. Apart from these, all other parameters in the dialog box are optional. If you’re using a proxy, you also need to specify the proxy-related parameters. You can refer to the ODBC Configuration and Connection Parameters guide for detailed information on ODBC Data Source parameters.
  • After specifying the required parameters, click on “OK” to create the DSN.

Similarly, you can refer to the official guide in order to configure the ODBC Driver on macOS and Linux.

Connect to Snowflake from R

After setting up the connection you will be able to connect to Snowflake data in R. You can use the below sample code on RStudio to establish the Snowflake R connection.

install.packages(c("DBI", "dplyr","dbplyr","odbc"))
library(DBI)
library(dplyr)
library(dbplyr)
library(odbc)
myconn <- DBI::dbConnect(odbc::odbc(), "SNOWFLAKE_DSN_NAME", uid="USERNAME", pwd='Snowflak123')
mydata <- DBI::dbGetQuery(myconn,"SELECT * FROM Products")
head(mydata)

Make sure to replace “SNOWFLAKE_DSN_NAME”, “uid”, and “pwd” with the values you entered while configuring the ODBC Driver. A sample connection object called “myconn” is created to feed the queries. 

Data Analysis with Snowflake R

You can now use the Snowflake R ODBC driver to execute SQL queries on Snowflake data and visualize Snowflake data in R. Follow the below-mentioned examples to get started.

Schema Discovery

The ODBC Driver models Snowflake APIs as Relational Tables, Views, and Stored Procedures. You can use the following command to retrieve the list of tables.

sqlTables(myconn)

Execute SQL Queries

You can use the “sqlQuery” function in order to execute any SQL query supported by the Snowflake API.

products <- sqlQuery(myconn, "SELECT Id, ProductName FROM Products", believeNRows=FALSE, rows_at_time=1)

You can also use the “view” command to view the results in a data viewer window.

View(products)

Plot Snowflake Data

As discussed, Snowflake R integration lets you visualize Snowflake Data with graphical representations in R. You can analyze Snowflake data with any of the Data Visualization packages available in the CRAN repository. You can use the built-in “barplot” function to create simple bar plots.

par(las=2,ps=10,mar=c(5,15,4,2))
barplot(products$ProductName, main="Snowflake Products", names.arg = products$Id, horiz=TRUE)
Snowflake R: Plot
Image Source: www.cdata.com

Conclusion

Snowflake is a virtual Data Warehouse that has become an industry-leading Cloud-Based SaaS Data Platform. There are so many ways to interact with Snowflake, and using R to interact with Snowflake gives you extended functionality for statistical computing and graphics.

This article provided you with a step-by-step guide to establish a Snowflake R connection for enhanced Data Analysis and Visualization. You can leverage the Snowflake R integration for a variety of tasks such as Data Cleaning, Data Analysis, Data Reporting, etc.

Snowflake supports many more data sources and allows loading data from other traditional and Cloud-based applications, SaaSs, CRMs, BI tools, etc. You may require a Data Integration tool to do extract and load data from multiple sources into Snowflake, and this is where Hevo comes in.

visit our website to explore hevo

Hevo Data, with its strong integration with 100+ Sources & BI tools, allows you to not only export data from sources & load data in the destinations such as Snowflake, but also transform & enrich your data, & make it analysis-ready so that you can focus only on your key business needs and perform insightful analysis using BI tools. In short, Hevo can help you store your data securely in Snowflake.

Give Hevo Data a try and sign up for a 14-day free trial today. Hevo offers plans & pricing for different use cases and business needs!

Share your experience of working with the Snowflake R connector in the comments section below.

No-code Data Pipeline for Snowflake