BigQuery WITH & CTE Statements: Syntax & Usage Simplified 101

By: Osheen Jain | Published: September 28, 2021


Common table expressions, commonly known as CTEs, are a tool that allows users to design and organize queries, enabling faster development, easier troubleshooting, and performance improvements. CTEs are a functional SQL feature that lets you perform multi-step, complex transformations in a single easy-to-read query. Because of their readability and flexibility, CTEs are a useful tool for beginners and experts alike.

CTEs provide users with a convenient way to query data faster and make queries more readable, similar to constructs like temp tables and sub-queries. With CTEs (also called SQL WITH clauses), you have quick troubleshooting options, as you can create multiple statements and run SELECT statements inside the CTE. Read on to find out more about BigQuery WITH statements, their syntax, benefits, and tips for writing them.


Prerequisites

  • Understanding of SQL

Introduction to BigQuery


Google’s BigQuery is a cloud-native, enterprise-grade data warehouse that was first launched in 2012. Since its inception, BigQuery has evolved into a more economical and fully managed data warehouse that enables users to run blazing fast, interactive, ad hoc queries on petabyte-scale datasets. In addition, BigQuery now integrates with various Google Cloud Platform (GCP) services and third-party tools, making it more valuable.

Since BigQuery is a serverless platform, there are no servers to manage or database software to install. The BigQuery service manages the underlying software and infrastructure, providing high availability, flexible pricing, and scalability. 

BigQuery exposes a simple client interface where users can run interactive queries even if they know little about its underlying architecture. To get started, you import your data into BigQuery and then write queries using the SQL dialects BigQuery offers. A good understanding of BigQuery's architecture is helpful when implementing various queries, controlling costs, optimizing query performance, and optimizing storage. 

Introduction to Common Table Expressions (CTEs)

A common table expression, or CTE, was introduced in the fourth SQL revision, SQL:1999, making it a relatively new SQL feature. SQL Server first introduced CTEs in 2005; PostgreSQL made them available starting with Version 8.4 in 2009, and MySQL with Version 8.0 in 2018. 

A CTE is a temporary result set returned by a query, which is then used by another query. It is temporary because the system does not store the results anywhere; it exists only while the query runs. Common Table Expressions are used in standard SQL to simplify various classes of SQL queries for which a derived table is unsuitable. You can reference a CTE within an INSERT, SELECT, UPDATE, CREATE, or DELETE statement. As of SQL Server 2008, you can also add a CTE to the MERGE statement.

Here is a standard CTE syntax:

[Image: standard CTE syntax]
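As a sketch of that standard syntax, the example below defines one CTE and references it from a following SELECT. The table and column names are hypothetical, and the query is run here against an in-memory SQLite database, whose WITH clause follows the same SQL standard as BigQuery's:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, 10.0), (2, 25.0), (3, 40.0)])

# Standard CTE syntax: WITH cte_name AS (query), followed by a
# statement that references the CTE by name.
query = """
WITH big_orders AS (
    SELECT id, amount
    FROM orders
    WHERE amount > 20
)
SELECT COUNT(*) FROM big_orders
"""
count = conn.execute(query).fetchone()[0]
print(count)  # 2 orders exceed 20
```

The CTE `big_orders` exists only for the duration of this one statement; nothing is stored in the database.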

There are two types of CTEs:

1. Non-recursive CTEs

Non-recursive CTEs do not use recursion or repeated processing in a sub-routine. The following example creates a simple non-recursive CTE to display the row numbers from 1 to 10:

[Image: non-recursive CTE example]

2. Recursive CTEs

Recursive CTEs use repeated procedural loops: the recursive query calls itself until the termination condition is satisfied. A recursive CTE must therefore include a WHERE condition (or a similar check) to terminate the recursion:

[Image: recursive CTE example]
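A minimal recursive CTE of this shape generates the numbers 1 through 10: the anchor query seeds the result with 1, and the recursive part keeps adding rows until the WHERE condition stops it. It is run here against SQLite for illustration; the `WITH RECURSIVE` syntax is shared by BigQuery's standard SQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Anchor member (SELECT 1) seeds the result; the recursive member
# references the CTE itself and stops once n reaches 10.
query = """
WITH RECURSIVE counter(n) AS (
    SELECT 1
    UNION ALL
    SELECT n + 1 FROM counter WHERE n < 10
)
SELECT n FROM counter
"""
rows = [r[0] for r in conn.execute(query)]
print(rows)  # [1, 2, 3, ..., 10]
```

Without the `WHERE n < 10` condition, the recursion would never terminate.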

Simplify your Data Analysis with Hevo’s No-code Data Pipeline

A fully managed No-code Data Pipeline platform like Hevo Data helps you integrate data from 100+ data sources (including 40+ Free Data Sources) to a destination of your choice, such as Google BigQuery, in real-time in an effortless manner. With its minimal learning curve, Hevo can be set up in just a few minutes, allowing users to load data without having to compromise performance. Its strong integration with numerous sources allows users to bring in data of different kinds smoothly without having to write a single line of code. 

Get Started with Hevo for Free

Check out some of the cool features of Hevo:

  • Completely Automated: The Hevo platform can be set up in just a few minutes and requires minimal maintenance.
  • Transformations: Hevo provides preload transformations through Python code. It also allows you to run transformation code for each event in the Data Pipelines you set up. To carry out a transformation, you edit the properties of the event object received as a parameter by the transform method. Hevo also offers drag-and-drop transformations such as Date and Control Functions, JSON, and Event Manipulation, to name a few. These can be configured and tested before putting them to use.
  • Connectors: Hevo supports 100+ integrations to SaaS platforms, files, databases, analytics, and BI tools. It supports various destinations including Salesforce CRM, Google BigQuery, Amazon Redshift, Firebolt, Snowflake Data Warehouses; Amazon S3 Data Lakes; and MySQL, MongoDB, TokuDB, DynamoDB, PostgreSQL databases to name a few.  
  • Real-Time Data Transfer: Hevo provides real-time data migration, so you can have analysis-ready data always.
  • 100% Complete & Accurate Data Transfer: Hevo’s robust infrastructure ensures reliable data transfer with zero data loss.
  • Scalable Infrastructure: Hevo has in-built integrations that can help you scale your data infrastructure as required.
  • 24/7 Live Support: The Hevo team is available round the clock to extend exceptional support to you through chat, email, and support calls.
  • Schema Management: Hevo takes away the tedious task of schema management & automatically detects the schema of incoming data and maps it to the destination schema.
  • Live Monitoring: Hevo allows you to monitor the data flow so you can check where your data is at a particular point in time.
Sign up here for a 14-Day Free Trial!

Introduction to SQL

Structured Query Language (SQL) is the database language that allows users to perform operations on an existing database; you can also use SQL to create a database. It uses commands like CREATE, DROP, and INSERT to carry out the required tasks. 

Various SQL commands can be categorized as follows:

  • DDL: Data Definition Language
  • DQL: Data Query Language
  • DML: Data Manipulation Language
  • DCL: Data Control Language
  • TCL: Transaction Control Language

Data manipulation in SQL

Data Manipulation Language (DML) commands in SQL manipulate the records stored in the tables of an existing database. DML commands do not deal with changes to the objects and structure of the database. Common DML commands are INSERT, UPDATE, and DELETE. The SELECT command can also be considered part of DML, even though it strictly belongs to the Data Query Language (DQL); it is also used while writing CTEs. 

1. INSERT

You can insert rows into a table with the INSERT statement. There are two ways to add rows with the same INSERT statement. In the first method, you specify only the values to be inserted, without column names; the values must then follow the table's column order. In the second method, you specify the columns you want to fill along with their corresponding values. 

2. UPDATE

The UPDATE statement in SQL is used to update the data of an existing table in the database. You can update single columns and multiple columns using the UPDATE statements.

3. DELETE

The DELETE statement in SQL is used to delete existing records from a table. You can delete a single record or multiple records depending on the condition you specify in the WHERE clause.

4. SELECT

SELECT is one of the most common SQL statements. It is used to fetch or retrieve data from the database. You can either retrieve the entire table or only the rows matching specified conditions. Returned data is stored in a results table, also called the result-set.

With the SELECT command statement, you can specify the columns you want to display in the query result. Optionally, you can also choose which column headings should appear above the result table.
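The four DML statements described above can be exercised together in one short session. The table and data below are hypothetical, run against an in-memory SQLite database for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER, name TEXT, salary REAL)")

# INSERT, method 1: values only, following the table's column order.
conn.execute("INSERT INTO employees VALUES (1, 'Ada', 50000)")
# INSERT, method 2: named columns with their corresponding values.
conn.execute("INSERT INTO employees (id, name, salary) VALUES (2, 'Grace', 60000)")

# UPDATE the data of an existing row.
conn.execute("UPDATE employees SET salary = 55000 WHERE id = 1")

# DELETE the records matching the WHERE condition.
conn.execute("DELETE FROM employees WHERE id = 2")

# SELECT specific columns; the alias controls the result's column heading.
rows = conn.execute("SELECT name AS employee_name, salary FROM employees").fetchall()
print(rows)  # [('Ada', 55000.0)]
```

Only Ada's (updated) row survives: Grace's row was removed by the DELETE.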

Understanding the Importance of CTEs

Here is why CTEs can play a vital role in your business workflow:

  • Increase readability: Using a CTE can increase the readability of a SQL query. This is especially useful in large queries and in those with complicated column logic. You can move the complex logic into a temporary CTE to keep the overall query simple.
  • Use it like a SQL View: Even though a CTE is temporary, it is a named result set and has columns like the original table. Unlike SQL Views, which are saved in the database, CTEs let you work with a view-like result set without creating unnecessary metadata objects. 
  • Use recursion or hierarchical queries: The SQL WITH clause allows you to write a recursive or hierarchical query, i.e., a query that refers to previous rows of the same result set. 

Understanding How to Write a BigQuery WITH Statement

CTEs are also called SQL WITH clauses because a CTE necessarily uses the WITH keyword to define its sub-queries. The technique is sometimes called sub-query factoring because it improves the design of a SQL query by factoring sub-queries out of the main statement.

Common table expressions (CTEs) help you break up complex queries in BigQuery. Here’s a simple query to illustrate how to write a BigQuery WITH statement:

[Image: BigQuery WITH query example]
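A query in that style, with two CTEs in one WITH clause where the second builds on the first, might look like the following. The table and names are hypothetical; the query is executed against SQLite for illustration, though the WITH syntax is the same in BigQuery's standard SQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("east", 100.0), ("east", 200.0), ("west", 50.0)])

# Two CTEs in one WITH clause; top_regions references region_totals,
# breaking the aggregation and the filter into readable steps.
query = """
WITH region_totals AS (
    SELECT region, SUM(amount) AS total
    FROM sales
    GROUP BY region
),
top_regions AS (
    SELECT region FROM region_totals WHERE total > 100
)
SELECT region FROM top_regions
"""
top = [r[0] for r in conn.execute(query)]
print(top)  # ['east']
```

Each step could have been inlined as a nested sub-query, but splitting it into named CTEs keeps the final SELECT trivial to read.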

Tips for writing BigQuery WITH Statements

Here are a few tips you can use to write BigQuery WITH statements:

  • A single SELECT, UPDATE, DELETE, or INSERT statement that references some or all of the CTE's columns must follow the BigQuery WITH statement. You can also specify a BigQuery WITH statement in a CREATE VIEW statement, as part of the view's defining SELECT statement.
  • You can define multiple query definitions in a non-recursive BigQuery WITH clause. The definitions must be combined by one of these set operators: UNION ALL, UNION, INTERSECT, or EXCEPT.
  • A BigQuery WITH statement can reference itself and previously defined CTEs in the same BigQuery WITH clause. However, forward referencing is not allowed for a BigQuery WITH statement.
  • You can’t specify more than one BigQuery WITH clause in a statement. For example, if a CTE_query_definition contains a sub-query, that subquery cannot contain a nested BigQuery WITH clause that defines another CTE.
  • The statement of a recursive CTE must contain at least two query definitions: an anchor member and a recursive member. Defining a BigQuery WITH clause simply means writing a SELECT query whose result you want to use within another query. Multiple anchor members and recursive members can be defined; however, all anchor member query definitions must be placed before the first recursive member definition. A query definition is an anchor member unless it references the BigQuery WITH statement itself.
  • One of the set operators (UNION ALL, UNION, INTERSECT, or EXCEPT) must combine anchor members. UNION ALL is the only set operator permitted between the last anchor member and the first recursive member, and when combining multiple recursive members in a BigQuery WITH statement.
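The anchor member / recursive member structure described in these tips can be sketched with a hypothetical employee hierarchy, where UNION ALL joins the anchor to the recursive member. The example is run against SQLite for illustration; the `WITH RECURSIVE` structure is the same in BigQuery's standard SQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER, manager_id INTEGER, name TEXT)")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?)",
                 [(1, None, "CEO"), (2, 1, "VP"), (3, 2, "Engineer")])

query = """
WITH RECURSIVE org_chart(id, name, depth) AS (
    -- Anchor member: the top of the hierarchy (no manager).
    SELECT id, name, 0 FROM employees WHERE manager_id IS NULL
    UNION ALL
    -- Recursive member: joins back to the CTE itself, one level deeper.
    SELECT e.id, e.name, o.depth + 1
    FROM employees e
    JOIN org_chart o ON e.manager_id = o.id
)
SELECT name, depth FROM org_chart ORDER BY depth
"""
chart = conn.execute(query).fetchall()
print(chart)  # [('CEO', 0), ('VP', 1), ('Engineer', 2)]
```

Note how the anchor member appears before the recursive member, and UNION ALL is the operator connecting them, exactly as the rules above require.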

Conclusion

CTEs in SQL allow users to create a temporary named query that helps increase query readability. CTEs are implemented using the WITH clause. As you can see, whether recursive or non-recursive, a CTE can be a useful tool when you need to produce a temporary result set that can be referenced in a SELECT, UPDATE, DELETE, or INSERT statement, including within JOINs. A CTE is like a derived table in a certain sense: it is not stored as an object and is only valid while the main statement executes. However, unlike derived tables, a CTE can be referenced several times in the same query.

In addition, unlike a sub-query, a CTE can reference itself. And best of all, CTEs are relatively easy to implement.

Extracting complex data from a diverse set of data sources to carry out an insightful analysis can be challenging, and this is where Hevo saves the day! Hevo offers a faster way to move data from Databases or SaaS applications to be visualized in a BI tool for free. Hevo is fully automated and hence does not require you to code.

Visit our Website to Explore Hevo

Want to take Hevo for a spin? Sign Up for the 14-day free trial and experience the feature-rich Hevo suite first hand. You can also have a look at the unbeatable pricing that will help you choose the right plan for your business needs.

Osheen Jain
Freelance Technical Content Writer, Hevo Data

Osheen has extensive experience in freelance writing within the data industry and is passionate about simplifying the complexities of data integration and data analysis through informative content for those delving deeper into these subjects.
