- Nowadays, Machine Learning plays a vital role in making calculated decisions and predictions on data. Terabytes of data are generated daily and stored in Data Warehouses and Data Lakes to feed it to Business Intelligence tools, run Data Analytics, and create Machine Learning models.
- Companies tend to hire developers and Data Analysts with expertise in Python or R to handle Machine Learning models tasks this restricts the SQL experts to leverage Machine Learning in their analysis.
- It is a toolset of the Google BigQuery Data Warehouse that allows SQL experts to train and execute Machine Learning models. So, if you don’t know any Machine Learning supported language or you are a Data Analyst with only expertise in SQL. You can also build Machine Learning models for your data with the help of Google BigQuery ML.
- It allows SQL users to utilize their skills without learning any other language to use Machine Learning in their daily tasks and eliminates any dependency on other team members.
In this article, you will learn about Google BigQuery ML and its features. You will also read about different Machine Learning models supported in it.
Introduction to Google BigQuery ML
It is a new feature of Google BigQuery. It is a set of tools and extensions that enables users to create, train and execute Machine Learning models in Google BigQuery using standard SQL queries.
It eliminates the need for data movement by allowing users to directly create Machine Learning models into the Data Warehouse. Google BigQuery Machine Learning (BQML) makes Data Analytics easy as it doesn’t require Data Analysts to learn other programming languages like Python or R to train Machine Learning models.
It supports many Machine Learning models, and users can train, test, evaluate and deploy models in Google BigQuery itself. users can use Google it in the following ways listed below:
- The Google Cloud Console.
- By using the “bq” command-line tool.
- One can access it through BigQuery REST API.
- Users can use external Bussiness Intelligence (BI) tools and Jupyter Notebook.
Key Features of Google BigQuery ML
Google BigQuery ML allows SQL practitioners to train Machine Learning models without learning any Machine Learning supported languages like Python or R. It comes with many other features. A few of them are listed below:
- Automatically Generate ML Models: Users who don’t have expertise in choosing the right Machine learning model for their data set. It uses AutoML to provide users a user-friendly graphical interface and allows you to choose the best Machine Learning model for your requirements.
- Eliminates Data Transfer: Before Google BigQuery ML, users needed to move data from Google BigQuery to other platforms or isolated environments to train Machine Learning models. It consumes unnecessary time for large datasets. Now, users can train and execute models directly from Google BigQuery.
- Encrypted Models: It allows users to encrypt Machine Learning models with the Customer-Managed Encryption Keys (CMEK).
Hevo is a fully managed, no-code data pipeline platform that effortlessly integrates data from more than 150 sources into a data warehouse such as BigQuery. With its minimal learning curve, Hevo can be set up in just a few minutes. Its features include:
- Connectors: Hevo supports 150+ integrations to SaaS platforms, files, Databases, analytics, and BI tools. It supports various destinations, including Google BigQuery, Amazon Redshift, and Snowflake.
- Transformations: A simple Python-based drag-and-drop data transformation technique that allows you to transform your data for analysis.
- Schema Management: Hevo eliminates the tedious task of schema management. It automatically detects the schema of incoming data and maps it to the destination schema.
- 24/7 Live Support: The Hevo team is available 24/7 to provide exceptional support through chat, email, and support calls.
Try Hevo today to experience seamless data transformation and migration.
Get Started with Hevo for Free
Different Models Supported in Google BigQuery ML
Various Machine Learning models are used to train data and predict them based on the datasets fed to them. It ensures the availability of Machine Learning models to train any kind of dataset. All the Machine Learning models supported by Google BigQuery ML are listed below:
- Linear Regression: Linear Regression is a fundamental machine learning model used for making predictions based on statistical calculations. It’s utilized for forecasting trends using real-valued labels.
- Logistic Regression: Unlike Linear Regression, Logistic Regression is employed for classification tasks where the output is categorical (e.g., YES or NO). It supports two types:
- Binary Logistic Regression: Used when there are only two possible outcomes (e.g., 1 or 0).
- Multiclass Logistic Regression: Supports multiple outcomes, enabling classification into more than two categories using a multinomial classifier and cross-entropy loss function.
- K-Means Clustering: This model segments data into clusters based on similarities without predefined labels. It’s implemented using the KMEANS model type within the CREATE MODEL statement.
- Matrix Factorization: Designed for building recommendation systems using customer behavior and product purchase data. It enables users to create these systems efficiently using SQL.
- Time Series: Ideal for forecasting future values based on historical time-stamped data. It simplifies handling data anomalies such as holidays and seasonality. It supports creating, testing, and deploying various time series models for accurate forecasting.
- These models empower users to solve diverse real-world problems efficiently within the Google BigQuery ML environment, leveraging SQL for seamless integration with existing data workflows.
Steps to Create Models using Google BigQuery ML
Now that you have learned about Google BigQuery ML. In this section, you will learn the steps to create a simple Logistic Regression model in Google BigQuery ML and make learn to make predictions from a dataset. The steps to create a simple model are listed below:
Step 1: Setting Up the Environment
- Sign in to your Google Cloud Platform account.
- Now, create a New Project and fill out all the required details for billing.
- Now, go to Google Developer Console and click on the sidebar menu.
- Now click on the “BigQuery” option, and it will open up the Google BigQuery Console in the next tab, shown in the image below.
- The project that you provided earlier will be reflected in the menu on the left side of the screen.
- Under the project name, click on the next arrow button and then select the “Create New Dataset” option, shown in the image below.
- Provide a name to your dataset as bqml_codelab. If you want to give a different name, then make sure to replace the name with code in the process.
Integrate Salesforce to BigQuery
Integrate MongoDB to BigQuery
Integrate Oracle to BigQuery
Step 2: Creating Machine Learning Model
- Now, open Google Cloud Console.
- In this example, you will learn about creating a model to predict whether a visitor will make a transaction. The code is given below.
#standardSQL
CREATE OR REPLACE MODEL `bqml_codelab.sample_model`
OPTIONS(model_type='logistic_reg') AS
SELECT
IF(totals.transactions IS NULL, 0, 1) AS label,
IFNULL(device.operatingSystem, "") AS os,
device.isMobile AS is_mobile,
IFNULL(geoNetwork.country, "") AS country,
IFNULL(totals.pageviews, 0) AS pageviews
FROM
`bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE
_TABLE_SUFFIX BETWEEN '20160801' AND '20170631'
LIMIT 100000;
- In the following code, the name of the dataset is “bqml_codelab” and “sample_model” is the name of the model. The model used here is Binary Logistic Regression.
Step 3: Evaluating the Model
- Now the following code will return the evaluation metrics for the model. The metrics are “precision, “recall”, “accuracy”, “f1_score”, “log_loss”, “roc_auc”. The code is given below.
#standardSQL
SELECT
*
FROM
ml.EVALUATE(MODEL `bqml_codelab.sample_model`, (
SELECT
IF(totals.transactions IS NULL, 0, 1) AS label,
IFNULL(device.operatingSystem, "") AS os,
device.isMobile AS is_mobile,
IFNULL(geoNetwork.country, "") AS country,
IFNULL(totals.pageviews, 0) AS pageviews
FROM
`bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE
_TABLE_SUFFIX BETWEEN '20170701' AND '20170801'));
Step 4: Predicting Using the Model
- Now, the model will predict the number of transactions made by visitors for each country and the code will sort the result to the top 10 countries. The code is given below.
#standardSQL
SELECT
country,
SUM(predicted_label) as total_predicted_purchases
FROM
ml.PREDICT(MODEL `bqml_codelab.sample_model`, (
SELECT
IFNULL(device.operatingSystem, "") AS os,
device.isMobile AS is_mobile,
IFNULL(totals.pageviews, 0) AS pageviews,
IFNULL(geoNetwork.country, "") AS country
FROM
`bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE
_TABLE_SUFFIX BETWEEN '20170701' AND '20170801'))
GROUP BY country
ORDER BY total_predicted_purchases DESC
LIMIT 10;
- The output is shown in the image below.
- The difference between the previous code and this block is that previously it used “m1.EVALUATE” and now it’s “m1.PREDICT”. The other difference between both the codes is the use of the GROUP BY command to group country columns, sorting the results and limiting it to 10 only.
That’s it! You have created a Logistic Regression model, evaluated it, and used the model to make predictions. You can try the same example for the Linear Regression model and compare the accuracies of both models.
Seamlessly Migrate Data to BigQuery with Hevo
No credit card required
Benefits of Google BigQuery ML
Google BigQuery ML is fast to create, evaluate and execute various Machine Learning models easily using standard SQL queries. It offers many benefits over other Cloud Data Warehouses. Some of the benefits are listed below.
- It eliminates the need to know Python or any other language for managing Machine Learning models. Data Analysts with expertise in SQL only can now train models and make predictions.
- The data export involves many steps, and it’s a time-consuming process. It saves time and resources by letting users use Machine Learning models in Google BigQuery.
- It allows users to run Machine Learning models on large datasets within minutes as it uses computation resources of Google BigQuery Data Warehouse.
- It features some automated Machine Learning models that reduce the workload to manipulate data manually. It saves time and allows users to quickly train and test models on the dataset.
Conclusion
In this article, you learnt about Google BigQuery ML, and how easy it is to make predictions using SQL. You also read about different Machine Learning models that Google BigQuery ML supports. It eliminates the restriction Data Analysts face for using Machine Learning models in their analysis. Companies can deploy models directly on their datasets to make better business decisions.
Google BigQuery Data Warehouse loads data from multiple data sources, and it becomes a tedious task for manually managing all the ETL processes. Hevo Data is a No-code Data pipeline solution that can help you transfer data from multiple data sources to Google BigQuery. It fully automates the process of loading and transforming data from 150+ data sources to your desired Data Warehouse without writing a single line of code. Sign up for a 14-day free trial and experience the feature-rich Hevo suite firsthand.
FAQs
1. What does BigQuery ML do?
BigQuery ML allows data analysts and data scientists to build and deploy machine learning models directly within BigQuery using SQL.
2. What is BigQuery AutoML?
BigQuery AutoML automates the creation of custom machine learning models without requiring extensive coding, offering an easy-to-use interface for model building.
3. Which of the following models are supported by BigQuery ML?
BigQuery ML supports models like linear regression, logistic regression, k-means clustering, ARIMA for time series forecasting, and deep neural networks.
Suraj has over a decade of experience in the tech industry, with a significant focus on architecting and developing scalable front-end solutions. As a Principal Frontend Engineer at Hevo, he has played a key role in building core frontend modules, driving innovation, and contributing to the open-source community. Suraj's expertise includes creating reusable UI libraries, collaborating across teams, and enhancing user experience and interface design.