Working with BigQuery Select Statement: 4 Critical Aspects

• September 24th, 2021

Companies unlock new business insights by analyzing their data. However, traditional data warehouses cannot deliver the same results with rising data and suffer a considerable lag in performance. Google’s enterprise serverless data warehouse — BigQuery — facilitates large-scale data analysis for businesses of all sizes. It not only eliminates data silos to reduce significant friction among project teams but also secures data by restricting users from accessing sensitive data. As BigQuery stores data in a structured table format, it enables analysts to use standard structured query language (SQL) while working with data.

This article gives an overview of the BigQuery Select statement. It introduces you to BigQuery and the understanding of SQL dialects. Moreover, key types, syntax, and example queries of standard SQL SELECT statements are also discussed.

Title of Content:

Prerequisites

  • Understanding of SQL

What is BigQuery?

BigQuery Select: BigQuery Logo
Image Source

Google rolled out the Google cloud platform (GCP) — a suite of cloud computing services that offers infrastructure as a service, platform as a service, and serverless computing environments. BigQuery is one such product primarily released in 2011 to provide a serverless, fully managed, and cost-effective data warehouse solution. It helps analysts collect and analyze data with built-in features like machine learning, geospatial analysis, and business intelligence. The serverless architecture enables processing SQL queries at a blazing-fast speed to give a highly scalable multi-cloud data warehouse solution for business agility.

Google BigQuery offers up to $300 in free credits on Google Cloud for the first 3 months. On top of this, all customers get 1 TB of queries/month and 10 GB of storage for free.

Simplify your Data Analysis with Hevo’s No-code Data Pipeline

A fully managed No-code Data Pipeline platform like Hevo Data helps you integrate data from 100+ data sources (including 30+ Free Data Sources) to a destination of your choice like Google BigQuery in real-time in an effortless manner. Hevo with its minimal learning curve can be set up in just a few minutes allowing the users to load data without having to compromise performance. Its strong integration with umpteenth sources allows users to bring in data of different kinds in a smooth fashion without having to code a single line. 

Get Started with Hevo for Free

Check out some of the cool features of Hevo:

  • Completely Automated: The Hevo platform can be set up in just a few minutes and requires minimal maintenance.
  • Transformations: Hevo provides preload transformations through Python code. It also allows you to run transformation code for each event in the Data Pipelines you set up. You need to edit the event object’s properties received in the transform method as a parameter to carry out the transformation. Hevo also offers drag and drop transformations like Date and Control Functions, JSON, and Event Manipulation to name a few. These can be configured and tested before putting them to use.
  • Connectors: Hevo supports 100+ integrations to SaaS platforms, files, databases, analytics, and BI tools. It supports various destinations including Salesforce CRM, Google BigQuery, Amazon Redshift, Firebolt, Snowflake Data Warehouses; Amazon S3 Data Lakes; and MySQL, MongoDB, TokuDB, DynamoDB, PostgreSQL databases to name a few.  
  • Real-Time Data Transfer: Hevo provides real-time data migration, so you can have analysis-ready data always.
  • 100% Complete & Accurate Data Transfer: Hevo’s robust infrastructure ensures reliable data transfer with zero data loss.
  • Scalable Infrastructure: Hevo has in-built integrations that can help you scale your data infrastructure as required.
  • 24/7 Live Support: The Hevo team is available round the clock to extend exceptional support to you through chat, email, and support calls.
  • Schema Management: Hevo takes away the tedious task of schema management & automatically detects the schema of incoming data and maps it to the destination schema.
  • Live Monitoring: Hevo allows you to monitor the data flow so you can check where your data is at a particular point in time.
Sign up here for a 14-Day Free Trial!

Understanding SQL in BigQuery

Need for SQL

Many programming languages deal with large datasets, however, SQL assists in querying data from databases swiftly. 

  • Easy to Learn and Use: Compared to other languages which require high-level conceptual understanding and memorization of steps to perform a task, SQL is a simple English-like query language that is easy to understand.
  • Understanding Dataset: As SQL helps analysts investigate and identify the structure of data distributed within datasets, it lays the foundation for other data-driven teams to make profitable decisions.
  • Manage Data: As organizations deal with huge volumes of data, SQL facilitates valuable insights from pools of data. It not only integrates with other programming languages (Python/R) but also is used by BI tools to generate reports.

Dialect of SQL in BigQuery

BigQuery supports two dialects: legacy SQL and standard SQL that can be configured in the query settings. One can opt for any of the SQL dialects as per the way businesses interface their data. The following steps will assist you to switch SQL dialects:

Legacy SQL

Legacy SQL (a non-standard SQL) is the default dialect if you use the bq command-line tool and the REST API. However, you can switch to standard SQL dialect while using the bq tool as shown below:

BigQuery Select: Legacy Sql
Image Source

Standard SQL

Standard SQL is the default dialect if you use cloud console and client libraries. However, you can switch from standard SQL to legacy SQL using the following steps:

  • Step 1: Click on ‘compose new query’ to open the query editor in the cloud console.
  • Step 2: Below the query editor, click on the ‘more’ option in ‘query settings.’
  • Step 3: Click on the advanced options to open SQL dialect.
  • Step 4: Select ‘legacy,’ to complete the additional setting section.

Apart from the steps mentioned above, one can also set SQL dialect as a prefix in cloud console by making it a part of the query as shown below:

BigQuery Select: Standard SQL
Image Source

The default setting for Cloud Console is standard SQL. For instance, if you wish to run legacy SQL in the cloud console, BigQuery runs the query ignoring the default setting. When using either #legacySQL or #standardSQL prefix, consider the following:

  • They are case insensitive.
  • They must precede the query.
  • They must be separated from the query by a newline character as shown below.
BigQuery Select: #LegacySQL prefix
Image Source

Understanding the SQL SELECT Statement

Companies store exabytes of data that keeps updating timely to observe underlying trends and patterns for making confident business decisions. To store such large files in traditional CSV or text files would require greater processing speeds and storage security. Hence organizations prefer a database management system (DBMS) to perform desired operations on a huge number of records using SQL.

Depending on the type of task, SQL queries are broadly divided into five parts —  data definition language (DDL), data control language (DCL), data manipulation language (DML), data query language (DQL), and transaction control language (TCL).

SQL ‘SELECT’ statement is a DQL command used to fetch data from a database. A query can also involve clauses (conditions) to filter and analyze data quickly, it helps to apply constraints and arrange data to get a better understanding of data.

Key Types of BigQuery Select Statement

There are a few notations rules to understand queries in BigQuery as given below:

  • Square brackets indicate optional clauses.
  • Parenthesis indicates literals.
  • The vertical bar indicates a logical OR.
  • Curly brackets enclose a set of options.
  • A comma followed by an ellipsis within square brackets indicates that the preceding item can be repeated in a comma-separated list.

The below query can help understand a broader aspect of SELECT statement in standard SQL:

BigQuery Select: Select Statement in Standard SQL
Image Source

Suppose you have a ‘PlayerStats’ table that includes a list of player names, a unique ID assigned to the opponent they played in a given game, and the number of points scored by an athlete, as shown below.

BigQuery Select: Table
Image Source

If you want to emulate temporary table a name using the ‘WITH’ clause, use the below query:

BigQuery Select: WITH Clause
Image Source

1. SELECT List

The SELECT command returns the desired columns, whereas expressions in the SELECT command refer to columns in any from_items in its corresponding FROM clause. Typically a SELECT command consists of the following combinations — ‘*,’ ‘expression,’ ‘expression.*.’

2. Modifiers for * Operator

Standard SQL provides two modifiers for * operator — EXCEPT and REPLACE. The ‘EXCEPT’ modifier excludes one or multiple columns, whereas the ‘REPLACE’ modifier changes values in a specified column.

3. Duplicate Row Handling

A queried result contains a mixture of records with duplicate values, which can be synthesized to get unique values. The default behavior of the SELECT statement returns all rows, including duplicate values. To discard duplicates, you can use ‘SELECT DISTINCT.’ However, it cannot return struct and array types columns.

4. Value Tables

A regular table consists of rows having a predefined name and type of columns. However, in BigQuery, a value table consists of rows with a single value and no column names. A value table can be used for queries that return exactly one column, it can return a value table with STRUCT data type.

Syntax and Example Queries of SELECT Statement

1. BigQuery SELECT list

BigQuery Select: BigQuery Select List
Image Source

The BigQuery SELECT list syntax consists of various methods to return results that rely on corresponding FROM clause, modifiers, and distinct values, as shown below:

BigQuery SELECT *

A SELECT *, often referred to as select star, returns one output column for each column that is visible after executing the entire query, as shown below:

BigQuery Select: BigQuery Select *
Image Source

BigQuery SELECT expression

Expressions are conditions defined after a clause (for eg. WHERE). It returns one or more output columns, which may have a single value with an optional explicit alias. If the expression does not have an explicit alias, it receives an implicit alias.

BigQuery SELECT expression.*

An item can also take the form of expression.* in a SELECT list. It produces one output column for each column of expression. The expression must be either a table alias or evaluate a single value data type such as STRUCT. For instance, to produce one output column for each column in the table groceries, aliased as g, use the below query:

BigQuery Select: BigQuery Select expression.*
Image Source

2. Modifiers for * operator

A query can return desired columns using the modifier — EXCEPT and REPLACE. The syntax, along with an example, is discussed below:

Except

The EXCEPT modifier omits the matching column name(s) from a table. For instance, if you want to exclude order_id from the orders table, use the below query:

BigQuery Select: Except Modifier
Image Source

Replace

The REPLACE modifier matches specified columns and replaces identified columns with an expression. One should note that the REPLACE modifier neither replaces unnamed columns nor changes the names or order of columns. However, using the replace modifier can change the value and value type.

3. Value Tables

A query would produce a value table if your Biguery uses SELECT AS, using one of the syntaxes below:

Struct

A STRUCT in BigQuery is a data type with attributes in key-value pairs. It produces a value table with a STRUCT row type, where the STRUCT field names and types match the column names and types produced in the SELECT list.

Syntax:

BigQuery Select: Struct Syntax
Image Source

To return a single STRUCT type grouping multiple values together, you can use STRUCT in a scalar or array subquery.

Syntax:

BigQuery Select: Return STRUCT
Image Source

Value

To produce a value table having exactly one column from any SELECT list, you can use the SELECT AS VALUE command. A value table discards any alias name given to a column. 

Syntax:

BigQuery Select: Value Syntax
Image Source

Conclusion

With traditional data warehouse technology, organizations suffer querying rising data and experience frequent performance delays. However, Google BigQuery adopts a simple SQL at the backend while connecting with BI tools to analyze critical parameters on a dashboard interface. This blog talks in detail about the Select statement in BigQuery.

Extracting complex data from a diverse set of data sources to carry out an insightful analysis can be challenging, and this is where Hevo saves the day! Hevo offers a faster way to load data to Google BigQuery to be visualized in a BI tool for free. Hevo is fully automated and hence does not require you to code.

Visit our Website to Explore Hevo

Want to take Hevo for a spin? Sign Up for the 14-day free trial and experience the feature-rich Hevo suite first hand. You can also have a look at the unbeatable pricing that will help you choose the right plan for your business needs.