BigQuery is scalable, allowing organizations to scale their storage up and down based on their changing needs. When using BigQuery, you will want to delete data from some of your tables, probably to free up storage. There are different ways to do this- some of which are free while others where you will incur a cost. In this article, we will be discussing how to delete data from a BigQuery table using the BigQuery TRUNCATE TABLE command. 

Prerequisites 

This is what you need for this article:

  • A Google BigQuery account. 

What is Google BigQuery?

Google BigQuery is a Highly Scalable & Serverless Data Warehouse with a built-in Query Engine. It was developed by Google, hence, it uses the processing power of Google’s infrastructure. The Query Engine can run SQL queries on terabytes of data within seconds, and petabytes within minutes. 

Key Features:

  • Machine Learning: BigQuery ML allows users to train and run machine learning models in BigQuery using only SQL syntax.
  • Serverless Architecture: BigQuery manages servers and storage in the background, so a user does not need to.
  • High Scalability: It scales seamlessly to handle petabytes of data.
  • SQL Compatibility: It supports ANSI SQL, which is useful for people who already know SQL and want to write and run queries. This also allows a user to combine various BI tools for data visualization.

BigQuery DML Commands

INSERT:

Adds new rows to the table.

INSERT INTO <table_name> (<column1>, <column2>, ...) 
VALUES (<value1>, <value2>, ...);

    DELETE:

    This statement removes rows from a table that meet specific conditions.

    DELETE FROM <table_name> 
    WHERE <condition>;

      UPDATE

      This statement modifies existing data within a table.

      UPDATE <table_name> 
      SET <column1> = <value1>, <column2> = <value2>, ... 
      WHERE <condition>;

        MERGE:

        This statement combines INSERT and UPDATE operations into a single statement. It allows you to insert new rows or update existing rows based on a matching condition.

        MERGE INTO T
        USING S
        ON T. = S.
        WHEN MATCHED THEN
        UPDATE SET = S.
        WHEN NOT MATCHED THEN
        INSERT (, , …)
        VALUES (S., S., …);

          BigQuery TRUNCATE TABLE Command

          • Initially, BigQuery did not support anything other than the SELECT statement.
          • However, a lot of improvements have been made to this Data Warehousing Solution to make it support additional statements. 
          • BigQuery now supports the BigQuery TRUNCATE TABLE command, which is good for those coming from an on-premise background.
          • The statement is used to delete data from a BigQuery table. When used, the BigQuery TRUNCATE TABLE command removes all data from a table but leaves the table’s metadata intact, even the table schema, labels, and description. 
          • When you use BigQuery’s DELETE DML statement to delete data from a table, you will incur a scan cost.
          • The good news is that the BigQuery TRUNCATE TABLE statement falls under the category of BigQuery free operations, so you don’t incur any costs when using this statement. Thus, if you need to reduce scan costs, use the BigQuery TRUNCATE TABLE command over DELETE when you want to delete all table contents. 

          Pro Tip : The BigQuery TRUNCATE TABLE statement removes all rows from a table but leaves the table metadata intact, including the table schema, description, and labels. Let’s try integration tips

          BigQuery TRUNCATE TABLE Command Syntax

          The BigQuery TRUNCATE TABLE command takes the following syntax:

          TRUNCATE TABLE [[project_name.]dataset_name.]table_name

          The parameters in the above syntax are described below:

          • project_name: This is the name of the project that contains the table. The project that runs this DDL (Data Definition Language) query is used as the default. 
          • dataset_name: This is the name of the dataset that contains the table. 
          • table_name: This is the name of the table that is to be truncated. 

          Note: TRUNCATE TABLE BigQuery command does not allow you to truncate models, views, materialized views, and external tables. However, the command allows you to apply Quotas and LIMITs to your statements. 

          BigQuery TRUNCATE TABLE Example Queries

          In this section, we will be giving practical examples of how to use the BigQuery TRUNCATE TABLE command. First, let’s create the table we will use to demonstrate this. 

          We will create a table named customer that stores data about the details of customers. We will use the SQL `CREATE TABLE` command for this as shown below:

          CREATE TABLE `project.dataset.customer` ( 
              id INT64,
              name STRING,
              country STRING,
              ts TIMESTAMP
          );
          • The CREATE TABLE statement creates a new table named customer in the project.dataset.
          • The table has four columns: id, name, country, and ts.
          • id is of type INT64 (a 64-bit integer), storing unique identifiers for customers.
          • name and country are STRING data types, storing text values for customer names and countries.
          • ts is a TIMESTAMP, storing date and time information.

          The table will store details like customer id, name, country, and the time they made a purchase. 

          Now that the table is ready, let us insert some data into it. We will use the SQL INSERT statement for this as shown below:

          INSERT `project.dataset.customer` (id, name, country, ts)
          VALUES(2347, 'John Doe', 'Scotland', TIMESTAMP("2021-12-01"));
          • The INSERT statement adds a new record into the customer table in the project.dataset.
          • The columns specified are id, name, country, and ts.
          • The values being inserted are 2347 for the id, 'John Doe' for the name, 'Scotland' for the country, and a timestamp of 2021-12-01.
          • TIMESTAMP("2021-12-01") converts the date string into a proper timestamp format.
          • This operation adds one new row of data to the customer table with the provided information.

          The above statement will insert one row into the table. Let us insert 3 other rows into the table:

          INSERT `project.dataset.customer` (customer_id, name, location, ts)
          VALUES
                (23348, 'Mercy Michaels',  'Netherlands',  TIMESTAMP("2021-12-01")),
                (23349, 'Joel Pogba',  'USA',  TIMESTAMP("2021-12-03")),
                (23350, 'Kate Drogba',  'Nigeria',  TIMESTAMP("2021-12-05"));

          We now have 4 rows in the table. 

          To delete the table data, we can use the BigQuery TRUNCATE TABLE statement. This is demonstrated below:

          TRUNCATE TABLE `project.dataset.customer`

          The command will delete all rows from the table.

          The distinction between TRUNCATE TABLE and DELETE

          • Speed: TRUNCATE TABLE is significantly faster than DELETE for removing all rows from a table. DELETE operates row-by-row, which can be time-consuming for large datasets.
          • Storage: TRUNCATE TABLE immediately releases the storage space occupied by the table’s data. DELETE may require additional processing to reclaim the storage space.
          • Logging: TRUNCATE TABLE generates less logging activity than DELETE, which can improve performance and reduce costs.
          • Use Cases:
            • TRUNCATE TABLE: Ideal for quickly clearing large tables, such as staging tables before loading new data, or when you need to start with a completely empty table.
            • DELETE: Suitable for removing specific rows based on conditions, such as removing outdated or invalid data.

          Best Practices for DML Commands

          • Data Validation: Always validate your data before executing DML operations to ensure accuracy and prevent unintended consequences.
          • Backups: Always maintain regular backups of your data before executing any TRUNCATE TABLE operation to mitigate the risk of data loss.
          • Testing: Test TRUNCATE TABLE operations on a development or staging environment before executing them on production data.
          • Monitoring: Monitor project-level quotas and resource consumption to ensure that TRUNCATE TABLE operations do not impact other activities within your project.
          Google BigQuery Performance Optimization: Best Practices Guide
          Download Your Free EBook Now

          FAQ on BigQuery TRUNCATE TABLE Command

          What is truncate table in BigQuery?

          The TRUNCATE TABLE statement removes all rows from a table but leaves the table metadata intact, including the table schema, description, and labels.

          What is the difference between truncate and DELETE tables?

          DELETE- removes one or multiple rows from a table using conditions.
          TRUNCATE- removes all the rows from a table without using any condition.

          What does truncate () do?

          Truncate () file method allows the user to resize the file to a given number of bytes when the file is accessed through the append mode.

          How do I truncate all data in a table?

          To truncate a table named, let’s say, ‘categories,’ use command:
          TRUNCATE TABLE Categories;

          What is the difference between append and truncate?

          While the append simply adds new data into a table with the original data present, truncate removes all the existing data before adding new data.

          Conclusion

          • The BigQuery TRUNCATE TABLE command is an essential operation in BigQuery to reduce query scan costs.
          • We recommend using TRUNCATE over DELETE whenever you need to delete the entire contents of a table.
          • Quickly delete all rows from a table in Snowflake using the TRUNCATE TABLE command while keeping the table’s structure intact. Learn more about its usage at Snowflake Truncate Table.
          Nicholas Samuel
          Technical Content Writer, Hevo Data

          Nicholas Samuel is a technical writing specialist with a passion for data, having more than 14+ years of experience in the field. With his skills in data analysis, data visualization, and business intelligence, he has delivered over 200 blogs. In his early years as a systems software developer at Airtel Kenya, he developed applications, using Java, Android platform, and web applications with PHP. He also performed Oracle database backups, recovery operations, and performance tuning. Nicholas was also involved in projects that demanded in-depth knowledge of Unix system administration, specifically with HP-UX servers. Through his writing, he intends to share the hands-on experience he gained to make the lives of data practitioners better.