BigQuery R: Querying and Analyzing Data Simplified 101

on BigQuery Functions, Data Analytics, Data Warehouse, Google BigQuery • December 14th, 2021 • Write for Hevo

Bigquery R - Featured Image

You’ve just decided to use Google BigQuery to store, query & analyze your large datasets. But you’re confused about which is the best programming language for this Data Warehouse to run an effective analysis. If this is the question you have on your mind, then you’re at the right place! R is one of the best Programming Languages that you can leverage for Statistical Analysis. There is a ready-made Google Bigquery R package that reduces the complications involved in analyzing Big Data.

In this article, you’ll learn about Google BigQuery, its key features and gain an in-depth knowledge of the R Programming Language & how you can use it with Google BigQuery. Continue reading to discover how to use the Google BigQuery R package to analyze your datasets!

Table of Contents

Prerequisite

To analyze data that resides in Google BigQuery as part of the R workflow you need a basic knowledge of R programming language and a Google Account to use the Google BigQuery Console. The bigrquery R package helps simplify and streamline the analysis process while leveraging the R language.

If you have not set up a Google account yet, you can create one with this link.

After creating a Google account, you now need to head to Google Cloud Console

Create a NEW PROJECT in the Google Cloud Platform.

Google Cloud Platform - New Project

You’ll see the option to create a new project on the left side of the Google Cloud top navigation bar. Click the menu present to the right of “Google Cloud Platform” (it may say “select a project” if you do not have any already). Name your project. If billing is already enabled in your Google Account, you’ll need to choose a payments account; if you don’t, it probably won’t appear as an option. You may then click “Create”.

Aside tip: Pin BigQuery is: Easy way to find 

After finishing the setup of the new Project, you might feel like the Google Cloud Platform dashboard features a larger number of options available, but don’t worry, you will find the BigQuery option on the left side navigation menu(if the navbar is not present, click on hamburger button present on the left-hand side).

Google Cloud Platform - Pin BigQuery

One way to easily find the BigQuery option in the list is to “pin” it. This will ensure it appears top of the navigation menu.

What is Google BigQuery?

 BigQuery R - Google BigQuery Logo
Image Source

BigQuery is a Google-owned large Data Warehouse that allows programmers to analyze massive datasets on the Google Cloud. With Google BigQuery, programmers do not have to install any software before they can query Big Data. 

Many software developers enjoy using this service because they don’t need to understand the underlying architecture before running data on the system. This is because Google BigQuery offers simple interfaces through which newbie programmers can query their Big Data. 

Other key benefits of Google BigQuery are:

  • Affordable Pricing: Many Analytics Software services offer Flat Storage and Query Rates to users. However, Google BiQuery only charges programmers based on the number of queries they make and the volume of data they store. The service only increases a user’s charges when they request expansion.
  • User-friendly Storage Engine: Google BigQuery’s storage engine allows programmers to develop and optimize their storage as they deem fit.
  • Large Processing Capacity: Google BigQuery can process millions of data rows at a single time.
  • Data Sharing: Once you store your data on Google BigQuery, you can share it with other organizations. In such cases, the system will only charge you for the storage while the recipients pay for the queries they make with your data. 
  • Effective Scheduling: Google BigQuery contains the Dremel Execution Engine, which lets you schedule data analysis and execute pipelines.

What is R?

 BigQuery R - R Programming Logo
Image Source

R is a programming language that enables efficient Statistical Analysis and Graphical Computation. Due to the vast supply of data collection and programming tools on its network, the R Programming Language is very popular in the Software Development world. 

That said, there are particular benefits that R offers which make the programming language suitable for analyzing Big Data with Google BigQuery. Some of these advantages are:

  • Big Data Libraries: Certain R libraries, like Dplyr, lets users analyze large data sets.
  • Database Interaction: R supports interaction with databases such as R-MySQL and Open Database.
  • Cross-platform: It works perfectly on various operating systems.
  • Multi-Format Support: R can analyze a variety of data forms, including vectors and matrices.
  • Open-source: Big Data Analysts have several costs to cover. Since R is free to use, analyzing your data with the programming language will help you save money.
  • Timely: Since R is an interpreted language, you don’t need a compiler to run data when using it. Once you input a code, R will interpret it automatically. This reduces the time spent on basic tasks.
  • Educative: If you analyze Big Data regularly, you probably work in a competitive organization. R has an active network of developers who are willing to help beginners. This network offers knowledge updates and trends in the programming world.

Simplify Google BigQuery ETL & Analysis with Hevo’s No-code Data Pipeline

A fully managed No-code Data Pipeline platform like Hevo Data helps you integrate and load data from 100+ Different Sources (including 40+ free sources) to a Data Warehouse such as Google BigQuery or Destination of your choice in real-time in an effortless manner. Hevo with its minimal learning curve can be set up in just a few minutes allowing the users to load data without having to compromise performance. Its strong integration with umpteenth sources allows users to bring in data of different kinds in a smooth fashion without having to code a single line. 

Get Started with Hevo for free

Check out some of the cool features of Hevo:

  • Completely Automated: The Hevo platform can be set up in just a few minutes and requires minimal maintenance.
  • Transformations: Hevo provides preload transformations through Python code. It also allows you to run transformation code for each event in the Data Pipelines you set up. You need to edit the event object’s properties received in the transform method as a parameter to carry out the transformation. Hevo also offers drag and drop transformations like Date and Control Functions, JSON, and Event Manipulation to name a few. These can be configured and tested before putting them to use.
  • Connectors: Hevo supports 100+ integrations to SaaS platforms, files, Databases, analytics, and BI tools. It supports various destinations including Google BigQuery, Amazon Redshift, Firebolt, Snowflake Data Warehouses; Amazon S3, Databricks Data Lakes; and MySQL, SQL Server, TokuDB, DynamoDB, PostgreSQL Databases to name a few.  
  • Real-Time Data Transfer: Hevo provides real-time data migration, so you can have analysis-ready data always.
  • 100% Complete & Accurate Data Transfer: Hevo’s robust infrastructure ensures reliable data transfer with zero data loss.
  • Scalable Infrastructure: Hevo has in-built integrations for 100+ sources (including 40+ free sources) that can help you scale your data infrastructure as required.
  • 24/7 Live Support: The Hevo team is available round the clock to extend exceptional support to you through chat, email, and support calls.
  • Schema Management: Hevo takes away the tedious task of schema management & automatically detects the schema of incoming data and maps it to the destination schema.
  • Live Monitoring: Hevo allows you to monitor the data flow so you can check where your data is at a particular point in time.
Sign up here for a 14-day Free Trial!

Two methods for connecting R with BigQuery

Connecting R with BigQuery has two methods
1. odbc driver
2. bigrquery package 

There are two methods to connect to Google BigQuery: 

  • Use the odbc package with the database driver.  
  • Use the bigrquery package.

In this article, you will learn how to use the bigrquery package to connect R with BigQuery. To leverage the odbc package for connecting BigQuery and R you can click here.

How to Connect to BigQuery dataset in R?

We will be using the bigrquery package to connect the BigQuery Data set with R. But before connecting, you need to install the bigrquery package.

bigrquery works as a database interface for R. 

Command to install from CRAN:

install.packages("bigrquery")

The development version: To install the development version from GitHub:

# install.packages("remotes")
remotes::install_github("r-dbi/DBI")
remotes::install_github("r-dbi/bigrquery")

Now to query a BigQuery dataset with R using the bigrquery package, you need to establish a connection with the dataset. To accomplish this, follow the syntax given below.

Now to query a BigQuery dataset with R using the bigrquery package, you need to establish a connection with the dataset. To accomplish this, follow the syntax given below.

library(bigrquery) con <- dbConnect(
  bigquery(),  
  project = project_id_containing_the_data,
  dataset = database_name
  billing = your_project_id_with_the_billing_source
)

Here,  bigquery()is a function from the bigrquery package, signifying dbConnect that you want to connect to a BigQuery data source. The other arguments are quite self-explanatory outlining the project ID, dataset name, and billing project ID.

How to Query a BigQuery table in R?

To query one particular BigQuery table in R, you can use dplyr’s tbl() function. The tbl() function creates a table object that references the table.

For example, Command given below will create a brand new table called “schedules” using the created connection to the data set.

skeds <- tbl(con, "schedules")

After a one-time initial setup, parsing BigQuery data in R is as easy as running dplyr code on a local dataframe. Just keep in mind your query cost. If you run dozens of queries on a 10GB data set, you won’t reach your free 1TB monthly quota. But if you work on larger datasets on a daily basis, you should find a way to streamline your code.

How to use the Google BigQuery R Package?

Having understood the basics of Google BigQuery and R, you’re ready to learn how to combine both services when analyzing big data.

Google Cloud offers a BigQuery Package that enables developers to use the R Programming Language with Google BigQuery effectively. This Google BigQuery R package works by letting users transfer data from Google BigQuery into an R Tibble, where the Google BigQuery R package enables users to analyze the data. 

Before you can use the Google Bigrquery R package, you must take the following steps:

1) Load your Project

  • Log in to your Google Cloud account. You can also click here to reach the login page.
  • Go to the left navigation bar, and choose ‘Select Project’ to access the project that contains the data you want to analyze. The system will load all the data in the project.
BigQuery R - Loading Project into Google Cloud
Image Source

2) Creating an R-framework User-managed Notebook Instance

  • Select ‘Dashboard’ at the top of your Google Cloud Homepage. 
  • Click on the sidebar on the dashboard page and select ‘Notebooks’.
  • Once you’ve loaded the Notebooks page, find the ‘User-managed notebooks’ tab.
  • Select an instance.
  • Next, choose ‘Open JupyterLab’.
  • On the JupyterLab page, click on ‘File’, select ‘New’, then choose ‘Notebook’ to create a new notebook within the instance.
  • Finally, click on the R Kernel to establish the new notebook instance under an R framework.

3) Activate the Bigrquery Package

Here’s how to run the BigQuery R package on your Google Cloud account:

  • Go to the first code cell of the new notebook file and type:
# Load the package
 library(bigrquery)
  • Select the ‘Run’ symbol to run the code in the cell.
BigQuery - The Run the selected cells and advance button.
Image Source
  • R will load the Bigrquery package. 

Load Data from Google BigQuery into R

Once you’ve activated the Bigrquery R package, you can now load your Big Data from Google BigQuery into an R table.

Follow these steps to move your Google BigQuery data into R:

  • Go to the notebook’s Home tab and select ‘+’ to add a new code cell.
  • Enter the following query in the new code cell:
# Store the project id
projectid = "PROJECT_ID"

# Set your query
sql <- "SELECT * FROM `bigquery-public-data.usa_names.usa_1910_current` LIMIT 10"

# Run the query and store the data in a tibble
tb <- bq_project_query(projectid, sql)

# Print 10 rows of the data
bq_table_download(tb, max_results = 10)
  • In the part where ‘PROJECT ID is quoted, input the title of your project.
  • Run the code in the cell to start using the BigQuery R package.
  • To analyze a BigQuery table in R, enter the following in a code cell.
keds <- tblr (con, “table name”)

This is how you can use Google BigQuery R Package to start analyzing data with ease!

BigQuery Pricing Model

With BigQuery you have to pay for both data storage and data queries. You can leverage 10 GB of free data storage space per month in BigQuery. The data storage rate will not change with the number of people using that data i.e if you create and store your data in BigQuery and pay – the charges for paying will remain the same if only you are using it, or you share it with a group of people or make it public. 

The price charged for a query is based on how much data is processed by the query not how much data is returned by it. It is crucial to note that if your query returns only the top 5 results after analyzing a 10 GB data set, the query will still use 10 GB of your data analysis quota, not just the small amount related to your 5 rows of results.

Google recommends avoiding executing the SELECT * command, which iterates through all available columns. Instead, select only the specific columns that you need to reduce data to process.

Advantages of using bigrquery package

The bigrquery package makes working with data stored in Google BigQuery easier by allowing you to query BigQuery tables and retrieve metadata associated with your projects, datasets, tables, and tasks. Along with BigQuery, the bigrquery package comes with three levels of abstraction: 

  •  The low-level API provides thin wrappers on top of the underlying REST API. All low-level functions start with” bq_ “ and most have the form “bq_noun_verb()”. This level of abstraction is most appropriate if you are familiar with REST APIs and want to do something that is not supported in the higher-level APIs. 
  •  The DBI interface encapsulates the low-level API and allows working with BigQuery as with any other database system. This is the most convenient class if you want to run SQL queries in BigQuery or import smaller amounts of data (i.e. <100MB). 
  • The dplyr interface allows you to treat BigQuery tables as if they were blocks of data in memory. This is the most convenient class if you don’t want to write SQL but instead want dbplyr to write it for you.

Conclusion

The BigQuery R is a helpful tool for programmers who want to enjoy a variety of functionalities while analyzing Big Data. This article provided you with directions to load your Bigrquery package, and analyze large datasets effortlessly.

With your Google BigQuery Data Warehouse live and running, you’ll need to extract data from multiple platforms to carry out your analysis. However, integrating and analyzing your data from a diverse set of data sources can be challenging and this is where Hevo Data comes into the picture.

Visit our Website to Explore Hevo

Hevo Data, a No-code Data Pipeline provides you with a consistent and reliable solution to manage data transfer between a variety of sources and a wide variety of Desired Destinations such as Google BigQuery, with a few clicks. Hevo Data with its strong integration with 100+ Data Sources (including 40+ Free Sources) allows you to not only export data from your desired data sources & load it to the destination of your choice, but also transform & enrich your data to make it analysis-ready so that you can focus on your key business needs and perform insightful analysis using BI tools.

Want to take Hevo for a spin? Sign Up for a 14-day free trial and experience the feature-rich Hevo suite first hand. You can also have a look at our unbeatable pricing that will help you choose the right plan for your business needs!

Share your experience of learning about using the BigQuery R package. Let us know in the comments section below!

No-code Data Pipeline for Google BigQuery