Snowflake is a Data Warehouse that has become an industry-leading Cloud-Based SaaS (Software-as-a-service) Data Platform.

  1. However, in order to analyze Snowflake data and create data visualizations, it is essential to establish connections between Snowflake and Statistical Computing Programming Languages like R.
  2. R is a free software environment and programming language for statistical computing and graphics. R is used by Data Miners and Statisticians for Data Analysis and developing graphical representations.
  3. Snowflake R integration allows users to use high-performance R statistical functions to analyze Snowflake data.

Introduction to Snowflake

  • Snowflake is a Cloud Data Warehousing solution provided as a SaaS offering. It is built on Amazon Web Service, Microsoft Azure, or Google Cloud infrastructure that provides an unbounded platform for storing and retrieving data.
  • Snowflake Data Warehouse uses a different proprietary SQL Database Engine with a unique architecture designed for the cloud.

How to Connect R to Snowflake using ODBC Driver?

  • Step 1: Install R/RStudio
  • Step 2: Downloading and Installing the ODBC Driver
  • Step 3: Configuring the ODBC Driver
  • Step 4: Connect to Snowflake from R

Step 1 – Install R/RStudio

To establish a Snowflake R connection, you must install R/RStudio first. Follow the below-mentioned steps to install R on your Windows platform.

  • Go to www.r-project.org and click on the “download R” link.

Pro Tip: Build Your Single Source of Truth with Snowflake in Minutes using Snowflake ETL

Integrate Oracle to Snowflake
Integrate PostgreSQL to Snowflake
Integrate MongoDB to Snowflake
Integrate Salesforce to Snowflake
Snowflake R: Download R
  • Select a CRAN location (a mirror site) and click the corresponding link.
  • Click on the “Download R for Windows” link located at the top.
  • Click on the “install R for the first time” link located at the top.
  • Click “Download R for Windows” to save and download the file.  
  • After downloading the file, open the .exe file and follow the installation instructions.

Now that R is installed, you need to download and install RStudio. Follow the below-mentioned steps to install RStudio on your Windows platform.

  • Go to www.rstudio.com and click on the “Download RStudio” button.
  • Click on “Download RStudio Desktop”.
  • Select the recommended version for your system and save the file.
  • After downloading the file, open the .exe file and follow the installation instructions. 

Step 2 – Downloading and Installing the ODBC Driver

You can download the installer for the latest version of the Snowflake ODBC Driver from the Snowflake Client Repository.

After downloading the appropriate installer for the ODBC Driver, double-click on the downloaded .msi file. The driver will be installed in C:Program Files.

Step 3 – Configuring the ODBC Driver

To configure the ODBC driver on a Windows platform, you need to create a DSN for the driver. To do so, follow the instructions.

  • The first step is to launch the Windows Data Source Administration Tool. You can run a search for “ODBC Data Sources” on your Windows platform.
Snowflake R: ODBC Data Source
  • Click on “ODBC Data Sources” to launch the ODBC Data Source Administration Tool.
  • From the setup window, navigate to the “Drivers” tab and make sure that the Snowflake ODBC Driver (SnowflakeDSIIDriver) is installed.
Snowflake R: ODBC Driver
  • Now, to create a new DSN, navigate to the “User DSN” or “System DSN” tab and click on the “Add” button.
Snowflake R: Create DSN
  • From the list of installed drivers, you can select “SnowflakeDSIIDriver”.
  • Now, you need to specify the connection parameters for the ODBC Driver. You can enter the parameters for the DSN in the fields provided in the “Snowflake Configuration” dialog box.
Snowflake R: Parameters
  • To create a DSN, only “Data Source”, “User” and “Server” parameters are required. Apart from these, all other parameters in the dialog box are optional. If you’re using a proxy, you also need to specify the proxy-related parameters. You can refer to the ODBC Configuration and Connection Parameters guide for detailed information on ODBC Data Source parameters.
  • After specifying the required parameters, click on “OK” to create the DSN.

Similarly, you can refer to the official guide in order to configure the ODBC Driver on macOS and Linux.

Step 4 – Connect to Snowflake from R

After setting up the connection you will be able to connect to Snowflake data in R. You can use the below sample code on RStudio to establish the Snowflake R connection.

install.packages(c("DBI", "dplyr","dbplyr","odbc"))
library(DBI)
library(dplyr)
library(dbplyr)
library(odbc)
myconn <- DBI::dbConnect(odbc::odbc(), "SNOWFLAKE_DSN_NAME", uid="USERNAME", pwd='Snowflak123')
mydata <- DBI::dbGetQuery(myconn,"SELECT * FROM Products")
head(mydata)

Make sure to replace “SNOWFLAKE_DSN_NAME”, “uid”, and “pwd” with the values you entered while configuring the ODBC Driver. A sample connection object called “myconn” is created to feed the queries. 

Data Analysis with Snowflake R

1. Schema Discovery

The ODBC Driver models Snowflake APIs as Relational Tables, Views, and Stored Procedures. You can use the following command to retrieve the list of tables.

sqlTables(myconn)

2. Execute SQL Queries

You can use the “sqlQuery” function in order to execute any SQL query supported by the Snowflake API.

products <- sqlQuery(myconn, "SELECT Id, ProductName FROM Products", believeNRows=FALSE, rows_at_time=1)

You can also use the “view” command to view the results in a data viewer window.

View(products)

3. Plot Snowflake Data

As discussed, Snowflake R integration lets you visualize Snowflake Data with graphical representations in R. You can analyze Snowflake data with any of the Data Visualization packages available in the CRAN repository. You can use the built-in “barplot” function to create simple bar plots.

par(las=2,ps=10,mar=c(5,15,4,2))
barplot(products$ProductName, main="Snowflake Products", names.arg = products$Id, horiz=TRUE)
Snowflake R: Plot

Conclusion

This article provided you with a step-by-step guide to establish a Snowflake R connection for enhanced Data Analysis and Visualization.

You can leverage the Snowflake R integration for a variety of tasks such as Data Cleaning, Data Analysis, Data Reporting, etc.

FAQ on Connect Snowflake R/RStudio Integration

Can you use R for data mining?

Yes, you can use R for data mining and a wide range of data analysis tasks.

Can we write function in Snowflake?

Yes, you can write user-defined functions (UDFs) in Snowflake using JavaScript or SQL.

What is the best file format for Snowflake?

In Snowflake, the choice of file format depends on your specific use case, data characteristics, and performance requirements.

Can you use R on Snowflake?

Yes, you can use R with Snowflake for data analysis, data mining, and other statistical computations.

Is Snowflake a tool or database?

Snowflake is more than just a traditional database or tool; it is a comprehensive data platform designed to meet the evolving needs of modern data analytics.

Raj Verma
Business Analyst, Hevo Data

Raj, a data analyst with a knack for storytelling, empowers businesses with actionable insights. His experience, from Research Analyst at Hevo to Senior Executive at Disney+ Hotstar, translates complex marketing data into strategies that drive growth. Raj's Master's degree in Design Engineering fuels his problem-solving approach to data analysis.

No-code Data Pipeline for Snowflake