• Nowadays, Machine Learning plays a vital role in making calculated decisions and predictions on data. Terabytes of data are generated daily and stored in Data Warehouses and Data Lakes to feed it to Business Intelligence tools, run Data Analytics, and create Machine Learning models.
  • Companies tend to hire developers and Data Analysts with expertise in Python or R to handle Machine Learning models tasks this restricts the SQL experts to leverage Machine Learning in their analysis.
  • It is a toolset of the Google BigQuery Data Warehouse that allows SQL experts to train and execute Machine Learning models. So, if you don’t know any Machine Learning supported language or you are a Data Analyst with only expertise in SQL. You can also build Machine Learning models for your data with the help of Google BigQuery ML. 
  • It allows SQL users to utilize their skills without learning any other language to use Machine Learning in their daily tasks and eliminates any dependency on other team members.

In this article, you will learn about Google BigQuery ML and its features. You will also read about different Machine Learning models supported in it. 

Introduction to Google BigQuery ML

It is a new feature of Google BigQuery that is currently in the beta phase. It is a set of tools and extensions that enables users to create, train and execute Machine Learning models in Google BigQuery using standard SQL queries.

It eliminates the need for data movement by allowing users to directly create Machine Learning models into the Data Warehouse. Google BigQuery Machine Learning (BQML) makes Data Analytics easy as it doesn’t require Data Analysts to learn other programming languages like Python or R to train Machine Learning models.

It supports many Machine Learning models, and users can train, test, evaluate and deploy models in Google BigQuery itself. users can use Google it in the following ways listed below:

  • The Google Cloud Console.
  • By using the “bq” command-line tool.
  • One can access it through BigQuery REST API.
  • Users can use external Bussiness Intelligence (BI) tools and Jupyter Notebook. 

Key Features of Google BigQuery ML

Google BigQuery ML allows SQL practitioners to train Machine Learning models without learning any Machine Learning supported languages like Python or R. It comes with many other features. A few of them are listed below:

  • Automatically Generate ML Models: Users who don’t have expertise in choosing the right Machine learning model for their data set. It uses AutoML to provide users a user-friendly graphical interface and allows you to choose the best Machine Learning model for your requirements.
  • Eliminates Data Transfer: Before Google BigQuery ML, users needed to move data from Google BigQuery to other platforms or isolated environments to train Machine Learning models. It consumes unnecessary time for large datasets. Now, users can train and execute models directly from Google BigQuery.
  • Encrypted Models: It allows users to encrypt Machine Learning models with the Customer-Managed Encryption Keys (CMEK).
Simplify Data Analysis with Hevo’s No-code Data Pipeline

Hevo is the only real-time ELT No-code Data Pipeline platform that cost-effectively automates data pipelines that are flexible to your needs. With integration with 150+ Data Sources (40+ free sources), we help you not only export data from sources & load data to the destinations but also transform & enrich your data, & make it analysis-ready.

Start for free now!

Get Started with Hevo for Free

Different Models Supported in Google BigQuery ML

Various Machine Learning models are used to train data and predict them based on the datasets fed to them. It ensures the availability of Machine Learning models to train any kind of dataset. All the Machine Learning models supported by Google BigQuery ML are listed below:

  • Linear Regression: Linear Regression is a fundamental machine learning model used for making predictions based on statistical calculations. It’s utilized for forecasting trends using real-valued labels.
  • Logistic Regression: Unlike Linear Regression, Logistic Regression is employed for classification tasks where the output is categorical (e.g., YES or NO). It supports two types:
  • Binary Logistic Regression: Used when there are only two possible outcomes (e.g., 1 or 0).
  • Multiclass Logistic Regression: Supports multiple outcomes, enabling classification into more than two categories using a multinomial classifier and cross-entropy loss function.
  • K-Means Clustering: This model segments data into clusters based on similarities without predefined labels. It’s implemented using the KMEANS model type within the CREATE MODEL statement.
  • Matrix Factorization: Designed for building recommendation systems using customer behavior and product purchase data. It enables users to create these systems efficiently using SQL.
  • Time Series: Ideal for forecasting future values based on historical time-stamped data. It simplifies handling data anomalies such as holidays and seasonality. It supports creating, testing, and deploying various time series models for accurate forecasting.
  • These models empower users to solve diverse real-world problems efficiently within the Google BigQuery ML environment, leveraging SQL for seamless integration with existing data workflows.

Steps to Create Models using Google BigQuery ML

Now that you have learned about Google BigQuery ML. In this section, you will learn the steps to create a simple Logistic Regression model in Google BigQuery ML and make learn to make predictions from a dataset. The steps to create a simple model are listed below:

Step 1: Setting Up the Environment

  • Sign in to your Google Cloud Platform account.
  • Now, create a New Project and fill out all the required details for billing.
  • Now, go to Google Developer Console and click on the sidebar menu.
  • Now click on the “BigQuery” option, and it will open up the Google BigQuery Console in the next tab, shown in the image below.
Sidebar Menu of Google Cloud Platform to Open Google BigQuery
Sidebar Menu of Google Cloud Platform to Open Google BigQuery
  • The project that you provided earlier will be reflected in the menu on the left side of the screen.
  • Under the project name, click on the next arrow button and then select the “Create New Dataset” option, shown in the image below.
Creating New Dataset for Google BigQuery ML
Creating New Dataset
  • Provide a name to your dataset as bqml_codelab. If you want to give a different name, then make sure to replace the name with code in the process.

Step 2: Creating Machine Learning Model

  • Now, open Google Cloud Console.
  • In this example, you will learn about creating a model to predict whether a visitor will make a transaction. The code is given below.
#standardSQL
CREATE OR REPLACE MODEL `bqml_codelab.sample_model` 
OPTIONS(model_type='logistic_reg') AS
SELECT
  IF(totals.transactions IS NULL, 0, 1) AS label,
  IFNULL(device.operatingSystem, "") AS os,
  device.isMobile AS is_mobile,
  IFNULL(geoNetwork.country, "") AS country,
  IFNULL(totals.pageviews, 0) AS pageviews
FROM
  `bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE
  _TABLE_SUFFIX BETWEEN '20160801' AND '20170631'
LIMIT 100000;
  • In the following code, the name of the dataset is “bqml_codelab” and “sample_model” is the name of the model. The model used here is Binary Logistic Regression.

Step 3: Evaluating the Model

  • Now the following code will return the evaluation metrics for the model. The metrics are “precision, “recall”, “accuracy”, “f1_score”, “log_loss”, “roc_auc”. The code is given below.
#standardSQL
SELECT
  *
FROM
  ml.EVALUATE(MODEL `bqml_codelab.sample_model`, (
SELECT
  IF(totals.transactions IS NULL, 0, 1) AS label,
  IFNULL(device.operatingSystem, "") AS os,
  device.isMobile AS is_mobile,
  IFNULL(geoNetwork.country, "") AS country,
  IFNULL(totals.pageviews, 0) AS pageviews
FROM
  `bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE
  _TABLE_SUFFIX BETWEEN '20170701' AND '20170801'));

Step 4: Predicting Using the Model

  • Now, the model will predict the number of transactions made by visitors for each country and the code will sort the result to the top 10 countries. The code is given below.
#standardSQL
SELECT
  country,
  SUM(predicted_label) as total_predicted_purchases
FROM
  ml.PREDICT(MODEL `bqml_codelab.sample_model`, (
SELECT
  IFNULL(device.operatingSystem, "") AS os,
  device.isMobile AS is_mobile,
  IFNULL(totals.pageviews, 0) AS pageviews,
  IFNULL(geoNetwork.country, "") AS country
FROM
  `bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE
  _TABLE_SUFFIX BETWEEN '20170701' AND '20170801'))
GROUP BY country
ORDER BY total_predicted_purchases DESC
LIMIT 10;
  • The output is shown in the image below.
Output Table of Prediction Using BigQuery ML
Output Table of Prediction
  • The difference between the previous code and this block is that previously it used “m1.EVALUATE” and now it’s “m1.PREDICT”. The other difference between both the codes is the use of the GROUP BY command to group country columns, sorting the results and limiting it to 10 only.

That’s it! You have created a Logistic Regression model, evaluated it, and used the model to make predictions. You can try the same example for the Linear Regression model and compare the accuracies of both models.

Benefits of Google BigQuery ML

Google BigQuery ML is fast to create, evaluate and execute various Machine Learning models easily using standard SQL queries. It offers many benefits over other Cloud Data Warehouses. Some of the benefits are listed below.

  • It eliminates the need to know Python or any other language for managing Machine Learning models. Data Analysts with expertise in SQL only can now train models and make predictions.
  • The data export involves many steps, and it’s a time-consuming process. It saves time and resources by letting users use Machine Learning models in Google BigQuery.
  • It allows users to run Machine Learning models on large datasets within minutes as it uses computation resources of Google BigQuery Data Warehouse.
  • It features some automated Machine Learning models that reduce the workload to manipulate data manually. It saves time and allows users to quickly train and test models on the dataset.

Conclusion

In this article, you learnt about Google BigQuery ML, and how easy it is to make predictions using SQL. You also read about different Machine Learning models that Google BigQuery ML supports. It eliminates the restriction Data Analysts face for using Machine Learning models in their analysis. Companies can deploy models directly on their datasets to make better business decisions. 

Google BigQuery Data Warehouse loads data from multiple data sources, and it becomes a tedious task for manually managing all the ETL processes. Hevo Data is a No-code Data pipeline solution that can help you transfer data from multiple data sources to Google BigQuery. It fully automates the process of loading and transforming data from 150+ data sources to your desired Data Warehouse without writing a single line of code.

Want to take Hevo for a spin? Sign Up or a 14-day free trial and experience the feature-rich Hevo suite firsthand. Also checkout our unbeatable pricing to choose the best plan for your organization.

Share your experience of learning in the comments section below!

Suraj Poddar
Principal Frontend Engineer, Hevo Data

Suraj has over a decade of experience in the tech industry, with a significant focus on architecting and developing scalable front-end solutions. As a Principal Frontend Engineer at Hevo, he has played a key role in building core frontend modules, driving innovation, and contributing to the open-source community. Suraj's expertise includes creating reusable UI libraries, collaborating across teams, and enhancing user experience and interface design.

No-code Data Pipeline For your Google BigQuery