BigQuery Snapshots: How to Create, Use & Query Table Snapshots Simplified 101

on Data Warehouse, Google BigQuery, SQL, Tutorials • March 24th, 2022 • Write for Hevo

Venturing into Data Science and deciding on a tool to use to solve a given problem can be challenging at times especially when you have a wide array of choices. In this age of data transformation where organizations are constantly seeking out ways to improve the day to day handling of data being produced and looking for methods to minimize the cost of having these operations, it has become imperative to handle such data transformations in the Cloud as it is a lot easier to manage and is also cost-efficient.

Data Warehousing architectures have rapidly changed over the years and most of the notable service providers are now Cloud-based. Therefore, companies are increasingly on the move to align with such offerings on the Cloud as it provides them with a lower upfront cost, enhances scalability, and performance as opposed to traditional On-premise Data Warehousing systems. Google BigQuery is among one of the well-known and widely accepted Cloud-based Data Warehouse Applications.

In this article, you will gain information about Google BigQuery Table Snapshots. You will also gain a holistic understanding of Google BigQuery, its key features, usage of BigQuery Table Snapshots, example use cases of BigQuery Table Snapshots and their limitations. Read along to find out in-depth information about undergoing Google BigQuery Table Snapshots.

Table of Contents

Prerequisites

To run these example queries, you will need: 

  • A Google Cloud project.
  • A browser, such as GoogleChrome or Safari.
  • Familiarity with the Google Cloud Console.
  • Basic understanding of Standard SQL statements.
  • Any of the following Identity and Access Management (IAM) roles that grant you the necessary permissions to perform each task on this page:
    • bigquery.dataEditor
    • bigquery.dataOwner
    • bigquery.admin

What is Google BigQuery?

BigQuery Snapshots - Google BigQuery
Image Source

Google BigQuery is a Cloud-based Data Warehouse that provides a Big Data Analytic Web Service for processing petabytes of data. It is intended for analyzing data on a large scale. It consists of two distinct components: Storage and Query Processing. It employs the Dremel Query Engine to process queries and is built on the Colossus File System for storage. These two components are decoupled and can be scaled independently and on-demand.

Google BigQuery is fully managed by Cloud service providers. We don’t need to deploy any resources, such as discs or virtual machines. It is designed to process read-only data. Dremel and Google BigQuery use Columnar Storage for quick data scanning, as well as a tree architecture for executing queries using ANSI SQL and aggregating results across massive computer clusters. Furthermore, owing to its short deployment cycle and on-demand pricing, Google BigQuery is serverless and designed to be extremely scalable.

For further information about Google Bigquery, follow the Official Documentation.

Key Features of Google BigQuery

BigQuery Snapshots - Key Features
Image Source

Some of the key features of Google BigQuery are as follows:

1) Scalable Architecture

BigQuery has a scalable architecture and offers a petabyte scalable system that users can scale up and down as per load.

2) Faster Processing

Being a scalable architecture, BigQuery executes petabytes of data within the stipulated time and is more rapid than many conventional systems. BigQuery allows users to run analysis over millions of rows without worrying about scalability.

3) Fully-Managed

BigQuery is a product of Google Cloud Platform, and thus it offers fully managed and serverless systems.

4) Security

BigQuery has the utmost security level that protects the data at rest and in flight. 

5) Real-time Data Ingestion

BigQuery can perform real-time data analysis, thereby making it famous across all the IoT and Transaction platforms.

6) Fault Tolerance

BigQuery offers replication that replicates data across multiple zones or regions. It ensures consistent data availability when the region/zones go down.

7) Pricing Models

The Google BigQuery platform is available in both on-demand and flat-rate subscription models. Although data storage and querying will be charged, exporting, loading, and copying data is free. It has separated computational resources from storage resources. You are only charged when you run queries. The quantity of data processed during searches is billed.

Simplify BigQuery ETL with Hevo’s No-code Data Pipeline

A fully managed No-code Data Pipeline platform like Hevo Data helps you integrate and load data from 100+ Data Sources (including 40+ Free Sources) to a Data Warehouse such as Google BigQuery or Destination of your choice in real-time in an effortless manner. Hevo with its minimal learning curve can be set up in just a few minutes allowing the users to load data without having to compromise performance. In addition to the 100+ data sources, Hevo also allows integrating data from non-native sources using Hevo’s in-built Webhooks Connector.

Get Started with Hevo for Free

Its strong integration with umpteenth sources allows users to bring in data of different kinds in a smooth fashion without having to code a single line. 

Check out some of the cool features of Hevo:

  • Completely Automated: The Hevo platform can be set up in just a few minutes and requires minimal maintenance.
  • Connectors: Hevo supports 100+ Integrations to SaaS platforms, Files, Databases, BI tools such as Tableau, and Native REST API & Webhooks Connectors. It supports various destinations including Google BigQuery, Amazon Redshift, Snowflake Data Warehouses; Amazon S3 Data Lakes; Databricks (Connector Live Soon!); and MySQL, SQL Server, TokuDB, DynamoDB, PostgreSQL Databases to name a few.  
  • Real-Time Data Transfer: Hevo provides real-time data migration, so you can have analysis-ready data always.
  • 100% Complete & Accurate Data Transfer: Hevo’s robust infrastructure ensures reliable data transfer with zero data loss.
  • Scalable Infrastructure: Hevo has in-built integrations for 100+ sources (Including 40+ Free Sources) that can help you scale your data infrastructure as required.
  • 24/7 Live Support: The Hevo team is available round the clock to extend exceptional support to you through chat, email, and support calls.
  • Schema Management: Hevo takes away the tedious task of schema management & automatically detects the schema of incoming data and maps it to the destination schema.
  • Live Monitoring: Hevo allows you to monitor the data flow so you can check where your data is at a particular point in time.
Sign up here for a 14-Day Free Trial!

What is a BigQuery Table Snapshot?

Image Source

A BigQuery Table Snapshot is a read-only, static view of a BigQuery table. It provides a convenient way to quickly create a derived copy of any table in BigQuery. The BigQuery Snapshot represents the state of a particular source table as it existed at a particular point in time. It is transactionally consistent with the derived table as of the moment of the BigQuery snapshot’s creation and it is stored as a separate table in BigQuery. 

You can capture a BigQuery snapshot of a current table or any table as it was at a specific point in time within the past 7 days. You can also set a retention period for the BigQuery snapshot. When this period elapses from the time the BigQuery snapshot was created, the table gets deleted.

You can also query a BigQuery table snapshot in the same way as you would any other BigQuery table. However, you cannot modify the contents of the BigQuery snapshot because the snapshot table is immutable. To achieve this, you would need to create a standard table from the table snapshot.

Why Use BigQuery Table Snapshots?

There are several benefits of using BigQuery snapshots which include:

  • Extending user access to data for longer than the default seven days window: With BigQuery’s time travel feature, you can only access a table’s historical data within a seven days window from the time it was created. However, with table snapshots, you can preserve a table’s data from a specified point in time for as long as you want.
  • Capturing SCD (Slowly Changing Dimension): BigQuery table snapshots can be used to capture and store changes in data over time. This enables historical reporting based on how the data would’ve appeared in the past. For example, you can create a table snapshot at the end of a given time period (such as at the end of the day) for auditing or regulatory compliance. You can also maintain end-of-period snapshots (such as at the end of a financial quarter) indefinitely. This allows you to run queries against this data; for example, to investigate organizational performance.
  • Enabling PITR (Point-in time recovery): In the event of a user error either by mistake or due to lack of knowledge; for example, when loading new data, you can revert the source database to the state it was in when a given table snapshot was created. Any data loss is therefore confined to updates to the table since the BigQuery snapshot’s creation.
  • For CDP (Continuous Data Protection): By creating database snapshots at regular intervals, you can mitigate the impact of critical database errors. When data loses for minutes or a few hours is unacceptable, you can create a series of database snapshots spanning enough time to respond to events such as data corruption, a human-caused event, such as a malicious attack, or accidental deletion of data. For instance, you might schedule 12 to 24 rolling BigQuery snapshots spanning a 24-hour interval.
  • Minimize storage cost: BigQuery Snapshots can be used to maintain instant backups that do not incur additional charges. Table snapshots have minimal impact on storage costs since when a table snapshot is created, there is initially no storage cost for the table snapshot up until when changes are made to the base table. The table snapshot shares its underlying storage with the base table. As such, a table snapshot typically uses less storage than a full copy of the table.

The following sections contain examples of creating, updating, querying, deleting, and restoring snapshots. These actions will be performed using the BigQuery CLI, the Google Cloud Console web interface and SQL client.

Example Use Cases of BigQuery Snapshots

This section contain examples of creating, updating, querying, deleting, and restoring snapshots. These actions will be performed using the BigQuery CLI, the Google Cloud Console web interface and SQL client.

The different example use cases of BigQuery Snapshots are as follows:

1) Create a Table Snapshot in BigQuery

When creating a table snapshot, you specify the table you want to snapshot and a unique name for the table snapshot. You can optionally capture the time of the snapshot, set the table snapshot’s expiration date, include a description, as well as a label.

Ideally, you should create a BigQuery snapshot inside a different dataset from the base table. This decision is optional and by no means a requirement, but by doing so, you will be able to restore the source table from its snapshot even if the source table’s dataset gets accidentally deleted.

The following is the syntax for creating a BigQuery table snapshot using a DDL statement:

CREATE SNAPSHOT TABLE
  my_project.backup_dataset.backup_table_snapshot
  CLONE my_project.source_dataset.base_table

Where:

  • project is the name of your GCP project.
  • backup_dataset is the name of the dataset that will hold your table snapshot.
  • backup_table_snapshot is the name of the table snapshot.
  • source_dataset is the name of the dataset that will store your source or derived table.
  • base_table is the name of the table that you’re snapshotting.

The following example creates a table snapshot of the ga_sessions table from the Google Analytics 360 dataset. This dataset is part of the BigQuery public dataset program. BigQuery Public Datasets are datasets that Google BigQuery hosts for you. You can access them via your Google Cloud project and integrate them into your applications. 

To run this example, you can carry out the following steps:

  • Step 1: In your browser, open the BigQuery page using the following link: https://console.cloud.google.com/bigquery
  • Step 2: In your current project, you can create a new dataset named my_dataset.
  • Step 3: Select the “Compose new query” option in the Cloud Console.
  • Step 4: Type the following CREATE SNAPSHOT TABLE DDL statement into the Query editor text area.
CREATE SNAPSHOT TABLE
  my_dataset.ga_sessions_backup
  CLONE 'bigquery-public-data.google_analytics_sample.ga_sessions_20170301'
  OPTIONS(
  expiration_timestamp = TIMESTAMP "2023-01-01 00:00:00 UTC",
  description = "GA Sessions table snapshot that expires in 2023",
  labels = [("org_unit", "development")]
  )
  •  Step 5: Click the “Run” button.

This query will create a snapshot of the ga_sessions table named ga_sessions_backup which is set to expire on January 1, 2023. The table will be stored in the my_dataset dataset. The query statement also includes an optional description and label.

2) List Table Snapshots in a BigQuery Dataset

Table snapshots in BigQuery come under the type SNAPSHOT. BigQuery maintains a table named INFORMATION_SCHEMA.TABLE_SNAPSHOTS which stores information about the table snapshots that are created in a dataset.

To get a list of all the table snapshots stored in a dataset you can specify the type as follows:

SELECT *
  FROM my_dataset.INFORMATION_SCHEMA.TABLE_SNAPSHOTS;

You can also list the table snapshots of a specific table by running the following query:

SELECT * FROM my_dataset.INFORMATION_SCHEMA.TABLE_SNAPSHOTS
WHERE base_table_name="my_table";

3) Copy a Table Snapshot

You can create a copy of a BigQuery table snapshot using a SQL statement with the same syntax you used when creating a snapshot.

For example:

CREATE SNAPSHOT TABLE
  my_project.my_dataset.my_snapshot
  CLONE my_project.my_backup_dataset.my_snapshot_backup

4) Update Table Snapshot Properties

BigQuery table snapshots are immutable and you cannot change their data or schema. However, it’s possible to update the metadata of a table snapshot. You can update the following:

  • Description
  • Expiration time
  • Access policy

A) Update Description

To update the description of the ga_sessions_backup snapshot table that you just created, the steps to be carried out are as follows:

  • Step 1: In your browser, you can open the BigQuery page using the following link: https://console.cloud.google.com/bigquery
  • Step 2: In the Explorer panel, expand the project and dataset nodes of the my_dataset.ga_sessions_backup table snapshot.
  • Step 3: Click on the name of your table snapshot. In this example, the table name is ga_sessions_backup.
  • Step 4: Next, click the “Details” tab, and then click on “Edit Details“.
  • Step 5: Update the description of your table snapshot inside the Description field and then click the “Save” button.

B) Update Expiration Time

To update the expiration table of the ga_sessions_backup snapshot table, the steps to be carried out are as follows:

  • Step 1: Open the BigQuery page using the following link: https://console.cloud.google.com/bigquery
  • Step 2: In the Explorer panel, expand the project and dataset nodes of the my_dataset.ga_sessions_backup table snapshot.
  • Step 3: Click the name of your table snapshot. In this example, the table name is ga_sessions_backup.
  • Step 4: Next, click the “Details” tab, and then click on “Edit Details“.
  • Step 5: Update the expiration time of your table snapshot inside the Expiration time field and then click the “Save” button.

C) Update Access Policy

You can also grant or revoke access to your table snapshot in the BigQuery console. For example, to grant viewer access to your table snapshot to a user with the email johndoe@gmail.com, you can carry the following steps:

  • Step 1: Open the BigQuery page using the following link: https://console.cloud.google.com/bigquery
  • Step 2: In the Explorer panel, expand the project and dataset nodes of the my_dataset.ga_sessions_backup table snapshot.
  • Step 3: Click the name of your table snapshot which in this example is ga_sessions_backup.
  • Step 4: In the ga_sessions_backup panel that opens, click the “Share” button.
  • Step 5: Next, click on “Add principal” option.
  • Step 6: Enter the email address of the user that you wish to give access to, which in this example is johndoe@gmail.com.
  • Step 7: In the “Select a role” dropdown menu, select the “BigQuery Data Viewer” option and then click on the “Save” button.

5) Query a Table Snapshot

Querying a BigQuery snapshot table is no different from querying other types of BigQuery tables.

For example, let’s run a query on the my_dataset.ga_sessions_backup table to see the total pageviews the website received on 1st March 2017:

SELECT SUM(totals.pageviews) AS TotalPageviews
FROM my_dataset.ga_sessions_backup

6) Delete a Table Snapshot

You can delete a table snapshot using the DROP SNAPSHOT TABLE Standard SQL statement.

For example, to delete the ga_sessions_backup table snapshot, on the BigQuery page, you can run the following SQL statement in the query editor:

DROP SNAPSHOT TABLE
  my_dataset.ga_sessions_backup

BigQuery also provides the functionality to automatically clean up table snapshots after a certain time. You can either specify the expiration time when creating the table snapshot or set a default table expiration time for the dataset.

7) Restore Deleted Table Snapshot

You can restore a table snapshot that was deleted either maliciously or accidentally as long as it’s within seven days of deletion using the BigQuery CLI.

For example, you enter the following command to copy the old_dataset.old_table table at the time March 23th 2022 (EPOCH:1647968405), into a new table new_dataset.new_table.

bq cp my_project.old_dataset.old_table@1647968405 my_project.new_dataset.new_table

Limitations of BigQuery Snapshots

  • BigQuery Snapshots are designated as read-only meaning that you cannot perform DML operations (insert/update/delete/etc.) as well as certain DDL operations.
  • You can only capture a BigQuery snapshot of a table’s data as it was seven days ago or more recently, due to BigQuery’s seven-day default retention period for tables.
  • A table snapshot should have a unique name because you cannot overwrite existing tables within a dataset.
  • BigQuery Snapshots of external tables are not supported.

Conclusion

In this article, you have learned about Google BigQuery Table Snapshots. This article also provided information on Google BigQuery, its key features, usage of BigQuery Table Snapshots, example use cases of BigQuery Table Snapshots and their limitations.

Hevo Data, a No-code Data Pipeline provides you with a consistent and reliable solution to manage data transfer between a variety of sources and a wide variety of Desired Destinations with a few clicks.

Visit our Website to Explore Hevo

Hevo Data with its strong integration with 100+ Data Sources (including 40+ Free Sources) allows you to not only export data from your desired data sources & load it to the destination of your choice such as Google BigQuery but also transform & enrich your data to make it analysis-ready. Hevo also allows integrating data from non-native sources using Hevo’s in-built REST API & Webhooks Connector. You can then focus on your key business needs and perform insightful analysis using BI tools. 

Want to give Hevo a try? Sign Up for a 14-day free trial and experience the feature-rich Hevo suite first hand. You may also have a look at the amazing price, which will assist you in selecting the best plan for your requirements.

Share your experience of understanding BigQuery Snapshots in the comment section below! We would love to hear your thoughts.

No-code Data Pipeline for Google BigQuery