Explore the Google BigQuery SQL syntax, functions, and key benefits of utilizing it in further sections. Let’s get started.

  • One of the popular Cloud-based Data Warehouses is Google BigQuery which comes with many features such as BigQuery SQL, BigQuery ML, integrations, Cloud Functions, etc.
  • Google BigQuery is one of the popular Cloud-based Data Warehouses that offers its users great scalability, simplicity, and abstraction.
  • Querying commands in Google BigQuery is based on Structured Query Language (SQL).
  • Which is a domain-specific querying language designed specifically for Database Management Systems and is especially helpful when dealing with structured data.

What is Google BigQuery?

  • Built-in functionalities like Machine Learning, Geospatial Analysis, and Business Intelligence to help you manage and analyze your data.
  • It provides a serverless architecture that enables you to utilize SQL queries to solve your organization’s biggest needs while requiring no infrastructure management.

What is Google BigQuery SQL?

Google BigQuery SQL  (Structured Query Language) is a domain-specific querying language for managing data in RDBMS. There are four sublanguages in SQL:, Data Definition Language (DDL)Data Query Language (DQL) , Data Manipulation Language (DML), Data Control Language (DCL)

Dialects of Google BigQuery SQL

Step1 – Google BigQuery Legacy SQL

  • Before Google BigQuery v2.0, the Data Warehouse used a non-standard SQL dialect called Legacy SQL to run queries.
  • However, this SQL version is currently optional and can be activated by adding the “legacy_sql: true” parameter in your query. 

For example, the below query depicts the Legacy SQL in Google BigQuery:

require "google/cloud/bigquery"
bigquery = Google::Cloud::Bigquery.new
sql = "SELECT word_id FROM [bigquery-public-data:samples.shakespeare]"
data = bigquery.query sql, legacy_sql: true

Step 2 – Google BigQuery Standard SQL

  • Google BigQuery Standard SQL is the most frequently used SQL dialect for querying data stored in Google BigQuery.
  • It follows the SQL 2011 standard and also supports nested queries and repeated data. It offers a few additional data types and has a strict grammar syntax.

For example, the below query depicts the Standard SQL in Google BigQuery:

require "google/cloud/bigquery"
bigquery = Google::Cloud::Bigquery.new
sql = "SELECT word_id FROM [bigquery-public-data:samples.shakespeare]"
data = bigquery.query sql

Google BigQuery SQL Syntax

  • Google BigQuery SQL follows a set of rules and standards called query syntax. The syntax for SQL queries in Google BigQuery is listed below.
query_statement:
    query_expr

query_expr:
    [ WITH cte[, ...] ]
    { select | ( query_expr ) | set_operation }
    [ ORDER BY expression [{ ASC | DESC }] [, ...] ]
    [ LIMIT count [ OFFSET skip_rows ] ]

select:
    SELECT [ AS { STRUCT | VALUE } ] [{ ALL | DISTINCT }]
        { [ expression. ]* [ EXCEPT ( column_name [, ...] ) ]
            [ REPLACE ( expression [ AS ] column_name [, ...] ) ]
        | expression [ [ AS ] alias ] } [, ...]
    [ FROM from_clause[, ...] ]
    [ WHERE bool_expression ]
    [ GROUP BY { expression [, ...] | ROLLUP ( expression [, ...] ) } ]
    [ HAVING bool_expression ]
    [ QUALIFY bool_expression ]
    [ WINDOW window_clause ]

The following are the notation rules for the above mentioned Google BigQuery SQL syntax:

  1. Optional clauses are denoted by square brackets “[ ]”.
  2. Literal parentheses are denoted by the “( )”.
  3. A logical OR is indicated by the vertical bar “|”.
  4. A set of options is denoted by “{ }”.
  5. In a comma-separated list “[,… ]”, a comma followed by an ellipsis within square brackets indicates that the preceding item can be repeated.

Let’s take a look at some of the SQL syntax for widely used clauses in Google BigQuery.

1. Google BigQuery SQL Syntax: SELECT Clause

The SELECT clause is most widely used in SQL. It is used to fetch data from a database. You can retrieve the entire table or a subset of it based on some rules.

Basic Syntax:

SELECT column1, column2… columnN 
FROM table_name;

2. Google BigQuery SQL Syntax: FROM Clause

The FROM clause specifies the source from which you want to extract information. This might be a single table or a group of tables.

Basic Syntax:

SELECT * FROM table_name;

3. Google BigQuery SQL Syntax: WHERE Clause

Filtering the records in a table is done using the WHERE clause. It is used to retrieve only the records that meet a specific set of criteria.

Basic Syntax:

SELECT column1, column2, … columnN
FROM table_name
WHERE condition; 

4. Google BigQuery SQL Syntax: GROUP BY Clause

The GROUP BY clause is used to group identical data into groups using specified methods. Few things to be noted while using the GROUP BY clause:

  • The GROUP BY clause is used only while using the SELECT statement.
  • The GROUP BY clause comes after the WHERE clause in the query.
  • If an ORDER BY clause is used in the query, GROUP BY is placed before them.

Basic Syntax:

SELECT column1, function_name(column2)
FROM table_name
WHERE condition
GROUP BY column1, column2
ORDER BY column1, column2;

5. Google BigQuery SQL Syntax: HAVING Clause

The HAVING clause is identical to the WHERE clause, however, it focuses on the results generated by the GROUP BY clause. 

Basic Syntax:

SELECT column1, function_name(column2)
FROM table_name
WHERE condition
GROUP BY column1, column2
HAVING condition

6. Google BigQuery SQL Syntax: ORDER BY Clause

The ORDER BY clause sorts the obtained data in ascending or descending order based on one or more columns. The data is sorted in ascending order by default.

Basic Syntax:

SELECT * FROM table_name 
ORDER BY column_name ASC|DESC;

Google BigQuery SQL Functions

  • Provides a wide range of built-in functions, such as Logical and Statistical aggregate functions, Mathematical functions, String functions, Date functions, and many more.

1. Aggregation Functions

  • Aggregation functions are among the most often used SQL functions. An aggregate function combines all of a group’s rows into a single value. Some of  the commonly used aggregation functions include:
Aggregate FunctionsDescription
COUNT()Counts the number of rows in a table or view.
SUM()Computes the sum of all values.
AVG()Computes the average of a set of values.
MIN()Returns the minimum value.
MAX()Returns the maximum value.
Aggregate Functions

2. Window (Analytical) Functions

  • Window functions evaluate values for a certain window, or a selection of rows that you’re interested in, rather than the full table. You must use an OVER() clause to specify the Window.
Window FunctionsDescription
RANK()It is used to assign each record a unique rank based on a set of criteria.
ROW_NUMBER()Each record is given a unique row number by this function.
NTILE()It aids in determining which percentile a particular row belongs to.
LAG()It enables you to get data from the preceding row in the same result set without having to use SQL joins.
LEAD()It enables you to get data from the next row in the same result set without having to use SQL joins.
Window Functions

3. String Functions

  • When you need to alter text, format text in a line, or glue the values of columns, they are helpful. Some  of the commonly used string functions include:
String FunctionsDescription
CONCAT()It helps to join two words or strings together.
CONCAT_WS()It is used to join two words or strings together by using a symbol as a concatenating symbol.
FORMAT()It is used to display the number in a specified format.
RTRIM()It is used to separate the given substring from the rest of the string.
SUBSTR()It is used to extract a substring from a string at a certain position.
REVERSE()It is used for reversing a string.
String Functions

4. Date Functions

  • When you need to deal with dates in your dataset, date functions prove useful. Some of the frequently used Date functions include:
Date FunctionsDescription
DATE()It is responsible for creating a DATE field from integer values for the year, month, and day.
PARSE_DATE()It creates a DATE object from a string representation of a date.
DATE_DIFF()It counts the number of days, weeks, months, or years that have elapsed between two dates.
CURRENT_DATE()The current date in the given or default timezone is returned.
FORMAT_DATE()It formats the DATE field to the given format string.
Date Functions

Key Benefits of Google BigQuery SQL

  • Extracting the most information from data through deep segmentation, marketing KPI research, and the discovery of unfair contractors in CPA (Cost per Action) networks.
  • Advanced analysis with BigQuery SQL, such as user segmentation and determining your site’s best-performing audiences.
  • Can be used in ROPO (Research Online, Purchase Offline) analysis. It aids in determining the impact of online campaigns on offline sales.

Conclusion

  1. This article gave you a brief introduction to Google BigQuery SQL along with its key features.
  2. It also provided in-depth knowledge about Google BigQuery SQL syntax, functions, and their key benefits.
  3. You may now create your queries and examine your data to improve your analytics capabilities and make better decisions.
Ayush Poddar
Research Analyst, Hevo Data

Ayush is a Software Engineer with a strong focus on data analysis and technical writing. As a Research Analyst at Hevo Data, he authors articles on data integration and infrastructure using his proficiency in SQL, Python, and data visualization tools like Tableau and Power BI. Ayush's Bachelor's degree in Game and Interactive Media Design complements his technical expertise, enabling him to integrate cutting-edge technologies into his analytical workflows.

No-code Data Pipeline for your Data Warehouse