BigQuery Insert and Update Statements: A Comprehensive Guide

on Data Warehouse • October 20th, 2020

Introduction 

This post is a tutorial on BigQuery INSERT and UPDATE statements.

Google BigQuery is a fully managed data warehouse service. It has a comprehensive querying layer with state-of-the-art processing ability and fast response times. BigQuery lets customers experience a powerful data warehouse without having to spend money developing and maintaining one, and it is offered on a pay-as-you-go model. Beyond typical data warehouse features, BigQuery also offers many supporting capabilities: BigQuery ML lets users run machine learning models on BigQuery data using SQL queries, and Connected Sheets lets users analyze BigQuery data from Google Sheets. Besides data residing in its own storage, BigQuery can also query data in various other locations, including Google Drive, Cloud SQL, etc.

Here is a broad outline of what we will cover in this blog:

  • Understanding BigQuery SQL
  • BigQuery INSERT Statements and Usage Examples
  • BigQuery UPDATE Statements and Usage Examples
  • Using Hevo

Hevo, A Simpler Alternative to Integrate your Data for Analysis

Hevo offers a faster way to move data from databases or SaaS applications into your data warehouse to be visualized in a BI tool. Hevo is fully automated and hence does not require you to code.

Check out some of the cool features of Hevo:

  • Completely Automated: The Hevo platform can be set up in just a few minutes and requires minimal maintenance.
  • Real-time Data Transfer: Hevo provides real-time data migration, so you can have analysis-ready data always.
  • 100% Complete & Accurate Data Transfer: Hevo’s robust infrastructure ensures reliable data transfer with zero data loss.
  • Scalable Infrastructure: Hevo has in-built integrations for 100+ sources that can help you scale your data infrastructure as required.
  • 24/7 Live Support: The Hevo team is available round the clock to extend exceptional support to you through chat, email, and support call.
  • Schema Management: Hevo takes away the tedious task of schema management & automatically detects the schema of incoming data and maps it to the destination schema.
  • Live Monitoring: Hevo allows you to monitor the data flow so you can check where your data is at a particular point in time.

You can try Hevo for free by signing up for a 14-day free trial.

Prerequisites

  • A Google Cloud account with access to BigQuery
  • Working knowledge of SQL

Understanding BigQuery SQL

Since BigQuery is a data warehouse service, its querying layer plays a big role in its suitability for different use cases. Data Manipulation Language (DML) statements in BigQuery are implicit transactions, that is, each statement is committed automatically. There is no support for multi-statement transactions in BigQuery. Another behavior worth noting concerns concurrently executed DML statements: BigQuery allows them, but concurrent statements that conflict with each other can raise errors and fail. Now that you have learned the basics of BigQuery DML, let us define a schema in BigQuery to use in the rest of the tutorial.

Consider the following three tables for this tutorial.

The Outlet table

[
  {"name": "outlet_name", "type": "string"},
  {"name": "location", "type": "string"}
]

The Stock table

[
  {"name": "product", "type": "string"},
  {"name": "quantity", "type": "integer"},
  {"name": "outlet", "type": "string"}
]

The StockDetails table

[
  {"name": "product", "type": "string"},
  {"name": "color", "type": "string"},
  {"name": "price", "type": "string"},
  {"name": "expiry_months", "type": "integer"}
]
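As a sketch (assuming a dataset named `dataset` already exists), the three schemas above could also be created with BigQuery DDL along these lines:

```sql
-- Hypothetical DDL for the tutorial tables; the dataset name `dataset` is an assumption.
CREATE TABLE dataset.Outlet (
  outlet_name STRING,
  location STRING
);

CREATE TABLE dataset.Stock (
  product STRING,
  quantity INT64,
  outlet STRING
);

CREATE TABLE dataset.StockDetails (
  product STRING,
  color STRING,
  price STRING,        -- kept as STRING to match the JSON schema above
  expiry_months INT64
);
```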

BigQuery INSERT Statements and Usage Examples

You will now learn the basic INSERT statement constructs to interact with the above table definitions.

The INSERT query follows standard SQL syntax: values must be listed in the same order as the columns they correspond to.

You can execute a basic INSERT query with columns specified as below.

INSERT INTO dataset.StockDetails(product, price) VALUES('Samsung Galaxy Note 10', '999')

An INSERT query without specifying columns can be executed as below.

INSERT INTO dataset.StockDetails VALUES('SG Note 10', 'Mystic Black', '999', 24)

An INSERT query using a subquery can be executed as follows.

INSERT INTO dataset.StockDetails(product, color) VALUES('SG Note 10', (SELECT color FROM dataset.StockDetails WHERE product='Samsung Galaxy Note 20'))

BigQuery also supports the WITH clause in INSERT statements.
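As a sketch of that, using the tutorial tables above, an INSERT driven by a WITH clause might look like the following (the copied product names are a hypothetical illustration):

```sql
-- Hypothetical example: duplicate the colors of all 'Note' products into new rows.
INSERT INTO dataset.StockDetails (product, color)
WITH note_products AS (
  SELECT product, color
  FROM dataset.StockDetails
  WHERE product LIKE '%Note%'
)
SELECT CONCAT(product, ' (copy)'), color
FROM note_products;
```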

A point to note here is the implicit data type coercion that BigQuery performs for INSERT statements. This can lead to unexpected data if the user is not careful: BigQuery raises a data type mismatch error only when an incoming value cannot be coerced to the target column's data type.
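As a hedged illustration of those coercion rules with the tutorial table: an INT64 literal fits the integer expiry_months column, but the same literal aimed at the STRING price column should be rejected, because BigQuery does not implicitly convert integers to strings.

```sql
-- Succeeds: 24 is an INT64 literal and expiry_months is an integer column.
INSERT INTO dataset.StockDetails (product, expiry_months) VALUES ('SG Note 10', 24);

-- Fails with a type mismatch: price is STRING, and INT64 is not
-- implicitly coerced to STRING.
INSERT INTO dataset.StockDetails (product, price) VALUES ('SG Note 10', 999);
```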

BigQuery UPDATE Statements and Usage Examples

In BigQuery, every UPDATE statement must include a WHERE clause. This offers protection against erroneous bulk updates if the developer accidentally omits the WHERE clause. To update the entire table, you must explicitly specify WHERE TRUE.
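As a sketch, a full-table update on the tutorial table would therefore be written like this (the color value is a hypothetical placeholder):

```sql
-- Hypothetical example: reset the color of every row; WHERE TRUE is mandatory.
UPDATE dataset.StockDetails SET color = 'unknown' WHERE TRUE;
```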

The most basic UPDATE example will look like below.

UPDATE dataset.StockDetails SET color = 'Mystic Green' WHERE product='SG Note 20 Ultra'

To update records by using a string match operation, the query will be as follows.

UPDATE dataset.StockDetails SET color = 'Mystic Green' WHERE product LIKE '%Note%'

The above statement will update the color of all records where the product contains 'Note' in its name.

An UPDATE statement using a join can be executed as below.

UPDATE dataset.StockDetails a SET color = 'black' FROM dataset.Stock b WHERE a.product = b.product AND b.outlet = 'central_park'

The above query updates the color of all records belonging to a particular outlet.

An explicit JOIN keyword has to be used when the join is between tables other than the one being updated. An example is as follows.

UPDATE dataset.StockDetails SET price = '899' FROM dataset.Stock INNER JOIN dataset.Outlet ON Stock.outlet = Outlet.outlet_name WHERE StockDetails.product = Stock.product AND Outlet.location = 'New York'

The above command sets the price to '899' for all products stocked in outlets located in 'New York'. Note that the column in the SET clause must belong to the table being updated: StockDetails has no quantity column, so quantities would be updated through the Stock table instead.

That covers the basics of INSERT and UPDATE statements in BigQuery. When implementing an ETL pipeline to synchronize data between your transactional databases and your data warehouse, an automated, scheduled script built from the above constructs can be used. Implementing such a script carries a lot of additional complexity because of factors like duplicate rows, data type translations, etc.
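For such synchronization scripts, BigQuery's MERGE statement (also part of its DML) combines the UPDATE and INSERT steps and avoids duplicate rows. As a hedged sketch using the tutorial table and a hypothetical staging table `dataset.StockDetails_staging`:

```sql
-- Hypothetical upsert: MERGE updates matched rows and inserts new ones
-- in a single atomic statement, keyed on product.
MERGE dataset.StockDetails AS target
USING dataset.StockDetails_staging AS source
ON target.product = source.product
WHEN MATCHED THEN
  UPDATE SET color = source.color,
             price = source.price,
             expiry_months = source.expiry_months
WHEN NOT MATCHED THEN
  INSERT (product, color, price, expiry_months)
  VALUES (source.product, source.color, source.price, source.expiry_months);
```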

But there is an alternative: a completely managed ETL tool like Hevo, which can easily accomplish data transfer between various sources and destinations.

Using Hevo

Hevo abstracts all the pain points and limitations mentioned above. Additionally, Hevo’s fault-tolerant algorithms automatically handle any errors in the data flow, thereby ensuring that your warehouse always has up-to-date data. 

Hevo rids you of the added complexity of writing and maintaining custom scripts to move data from a multitude of data sources.

Hevo natively integrates with Sales and Marketing Applications, Analytics Applications, Databases, and so on enabling your organization to efficiently assemble an enterprise BI data warehouse.

By using a data integration tool rather than writing the code yourself, you can save your engineers' time and money, allowing them to focus on other value-added activities.

Sign up for a 14-day risk-free trial with Hevo and experience a seamless data migration to Google BigQuery.


Share your thoughts on BigQuery INSERT and UPDATE queries. 
