In this article, you will gain information about Google BigQuery Table Snapshots. You will also gain a holistic understanding of Google BigQuery, its key features, usage of BigQuery Table Snapshots, example use cases of BigQuery Table Snapshots and their limitations. Read along to find out in-depth information about undergoing Google BigQuery Snapshots.
In this article, you will gain information about Google BigQuery Table Snapshots. You will also gain a holistic understanding of Google BigQuery, its key features, usage of BigQuery Table Snapshots, example use cases of BigQuery Table Snapshots and their limitations. Read along to find out in-depth information about undergoing Google BigQuery Table Snapshots.
What is Google BigQuery?
Google BigQuery is a cloud-based data warehouse that provides a big data analytics web service that processes petabytes of data. It is intended to analyze data on a large scale. It consists of two distinct components: Storage and Query Processing.
It employs the Dremel Query Engine to process queries and is built on the Colossus File System for storage. These two components are decoupled and can be scaled independently and on demand. Google BigQuery is fully managed by Cloud service providers. We don’t need to deploy any resources, such as discs or virtual machines.
Key Features of Google BigQuery
Some of the key features of Google BigQuery are as follows:
1) Scalable Architecture
BigQuery has a scalable architecture and offers a petabyte scalable system that users can scale up and down as per load.
2) Faster Processing
Being a scalable architecture, BigQuery executes petabytes of data within the stipulated time and is more rapid than many conventional systems. BigQuery allows users to run analysis over millions of rows without worrying about scalability.
3) Fully-Managed
BigQuery is a product of the Google Cloud Platform, and thus, it offers fully managed and serverless systems.
4) Security
BigQuery has the utmost security level, protecting the data while resting and in flight.
5) Real-time Data Ingestion
BigQuery can perform real-time data analysis, thereby making it famous across all the IoT and Transaction platforms.
6) Fault Tolerance
BigQuery offers replication that replicates data across multiple zones or regions. It ensures consistent data availability when the region/zones go down.
7) Pricing Models
The Google BigQuery platform is available in both on-demand and flat-rate subscription models. Although data storage and querying will be charged, exporting, loading, and copying data is free. It has separated computational resources from storage resources. You are only charged when you run queries. The quantity of data processed during searches is billed.
Facing challenges migrating your data to BigQuery? Migrating your data can become seamless with Hevo’s no-code intuitive platform. With Hevo, you can:
- Automate Data Extraction: Effortlessly pull data from 150+ connectors(and other 60+ free sources).
- Transform Data effortlessly: Use Hevo’s drag-and-drop feature to transform data with just a few clicks.
- Seamless Data Loading: Quickly load your transformed data into your desired destinations, such as BigQuery.
Try Hevo and join a growing community of 2000+ data professionals who rely on us for seamless and efficient migrations.
Get Started with Hevo for Free
What is a BigQuery Table Snapshot?
A BigQuery Table Snapshot is a read-only, static view of a BigQuery table. It provides a convenient way to quickly create a derived copy of any table in BigQuery. The BigQuery Snapshot represents the state of a particular source table as it existed at a particular point in time. It is transactionally consistent with the derived table as of the moment of the BigQuery snapshot’s creation and it is stored as a separate table in BigQuery.
You can capture a BigQuery snapshot of a current table or any table as it was at a specific point in time within the past 7 days. You can also set a retention period. When this period elapses from the time the BigQuery snapshot was created, the table gets deleted.
You can also query a BigQuery table snapshot in the same way as you would any other BigQuery table. However, you cannot modify the contents of the BigQuery snapshot because the snapshot table is immutable. To achieve this, you would need to create a standard table from the table snapshot.
Why Use BigQuery Table Snapshots?
There are several benefits of using BigQuery snapshots which include:
- Extending user access to data for longer than the default seven days window: With BigQuery’s time travel feature, you can only access a table’s historical data within a seven days window from the time it was created. However, with table snapshots, you can preserve a table’s data from a specified point in time for as long as you want.
- Capturing SCD (Slowly Changing Dimension): BigQuery table snapshots can be used to capture and store changes in data over time. This enables historical reporting based on how the data would’ve appeared in the past. For example, you can create a table snapshot at the end of a given time period (such as at the end of the day) for auditing or regulatory compliance. You can also maintain end-of-period snapshots (such as at the end of a financial quarter) indefinitely. This allows you to run queries against this data; for example, to investigate organizational performance.
- Enabling PITR (Point-in time recovery): In the event of a user error either by mistake or due to lack of knowledge; for example, when loading new data, you can revert the source database to the state it was in when a given table snapshot was created. Any data loss is therefore confined to updates to the table since the BigQuery snapshot’s creation.
- For CDP (Continuous Data Protection): By creating database snapshots at regular intervals, you can mitigate the impact of critical database errors. When data loses for minutes or a few hours is unacceptable, you can create a series of database snapshots spanning enough time to respond to events such as data corruption, a human-caused event, such as a malicious attack, or accidental deletion of data. For instance, you might schedule 12 to 24 rolling BigQuery snapshots spanning a 24-hour interval.
- Minimize storage cost: It can be used to maintain instant backups that do not incur additional charges. Table snapshots have minimal impact on storage costs since when a table snapshot is created, there is initially no storage cost for the table snapshot up until when changes are made to the base table. The table snapshot shares its underlying storage with the base table. As such, a table snapshot typically uses less storage than a full copy of the table.
The following sections contain examples of creating, updating, querying, deleting, and restoring snapshots. These actions will be performed using the BigQuery CLI, the Google Cloud Console web interface and SQL client.
Prerequisites
- To run these example queries, you will need:
- A Google Cloud project.
- Familiarity with the Google Cloud Console.
- Basic understanding of Standard SQL statements.
- Any of the following Identity and Access Management (IAM) roles that grant you the necessary permissions to perform each task on this page:
- bigquery.dataEditor
- bigquery.dataOwner
- bigquery.admin
Load your Data Seamlessly from any Source to BigQuery!
No credit card required
Example Use Cases of BigQuery Snapshots
This section contain examples of creating, updating, querying, deleting, and restoring snapshots. These actions will be performed using the BigQuery CLI, the Google Cloud Console web interface and SQL client.
1) Create a Table Snapshot in BigQuery
When creating a table snapshot, you specify the table you want to snapshot and a unique name for the table snapshot. You can optionally capture the time of the snapshot, set the table snapshot’s expiration date, include a description, as well as a label.
Ideally, you should create a BigQuery snapshot inside a different dataset from the base table. This decision is optional and by no means a requirement, but by doing so, you will be able to restore the source table from its snapshot even if the source table’s dataset gets accidentally deleted.
The following is the syntax for creating a BigQuery table snapshot using a DDL statement:
CREATE SNAPSHOT TABLE
my_project.backup_dataset.backup_table_snapshot
CLONE my_project.source_dataset.base_table
Note:
– project is the name of your GCP project.backup_dataset is the name of the dataset that will hold your table snapshot. – backup_table_snapshot is the name of the table snapshot. – source_dataset is the name of the dataset that will store your source or derived table. – base_table is the name of the table that you’re snapshotting. |
The following example creates a table snapshot of the ga_sessions table from the Google Analytics 360 dataset. This dataset is part of the BigQuery public dataset program. BigQuery Public Datasets are datasets that Google BigQuery hosts for you. You can access them via your Google Cloud project and integrate them into your applications.
CREATE SNAPSHOT TABLE
my_dataset.ga_sessions_backup
CLONE 'bigquery-public-data.google_analytics_sample.ga_sessions_20170301'
OPTIONS(
expiration_timestamp = TIMESTAMP "2023-01-01 00:00:00 UTC",
description = "GA Sessions table snapshot that expires in 2023",
labels = [("org_unit", "development")]
)
To run this example, you can carry out the following steps:
- Step 1: In your browser, open the BigQuery page using the following link: https://console.cloud.google.com/bigquery
- Step 2: In your current project, you can create a new dataset named my_dataset.
- Step 3: Select the “Compose new query” option in the Cloud Console.
- Step 4: Type the following CREATE SNAPSHOT TABLE DDL statement into the Query editor text area.
CREATE SNAPSHOT TABLE SNAPSHOT_PROJECT_ID.SNAPSHOT_DATASET_NAME.SNAPSHOT_NAME
CLONE TABLE_PROJECT_ID.TABLE_DATASET_NAME.TABLE_NAME
OPTIONS (
expiration_timestamp = TIMESTAMP 'TIMESTAMP_VALUE');
- Step 5: Click the “Run” button.
This query will create a snapshot of the ga_sessions table named ga_sessions_backup which is set to expire on January 1, 2023. The table will be stored in the my_dataset dataset. The query statement also includes an optional description and label.
2) List Table Snapshots in a BigQuery Dataset
Table snapshots in BigQuery come under the type SNAPSHOT. BigQuery maintains a table named INFORMATION_SCHEMA.TABLE_SNAPSHOTS which stores information about the table snapshots that are created in a dataset.
To get a list of all the table snapshots stored in a dataset you can specify the type as follows:
SELECT *
FROM my_dataset.INFORMATION_SCHEMA.TABLE_SNAPSHOTS;
You can also list the table snapshots of a specific table by running the following query:
SELECT * FROM my_dataset.INFORMATION_SCHEMA.TABLE_SNAPSHOTS
WHERE base_table_name="my_table";
3) Copy a Table Snapshot
You can create a copy of a BigQuery table snapshot using a SQL statement with the same syntax you used when creating a snapshot.
For example:
CREATE SNAPSHOT TABLE
my_project.my_dataset.my_snapshot
CLONE my_project.my_backup_dataset.my_snapshot_backup
4) Update Table Snapshot Properties
BigQuery table snapshots are immutable and you cannot change their data or schema. However, it’s possible to update the metadata of a table snapshot. You can update the following:
- Description
- Expiration time
- Access policy
A) Update Description
To update the description of the ga_sessions_backup snapshot table that you just created, the steps to be carried out are as follows:
- Step 1: In your browser, you can open the BigQuery page using the following link: https://console.cloud.google.com/bigquery
- Step 2: In the Explorer panel, expand the project and dataset nodes of the my_dataset.ga_sessions_backup table snapshot.
- Step 3: Click on the name of your table snapshot. In this example, the table name is ga_sessions_backup.
- Step 4: Next, click the “Details” tab, and then click on “Edit Details“.
- Step 5: Update the description of your table snapshot inside the Description field and then click the “Save” button.
B) Update Expiration Time
To update the expiration table of the ga_sessions_backup snapshot table, the steps to be carried out are as follows:
- Step 1: Open the BigQuery page using the following link: https://console.cloud.google.com/bigquery
- Step 2: In the Explorer panel, expand the project and dataset nodes of the my_dataset.ga_sessions_backup table snapshot.
- Step 3: Click the name of your table snapshot. In this example, the table name is ga_sessions_backup.
- Step 4: Next, click the “Details” tab, and then click on “Edit Details“.
- Step 5: Update the expiration time of your table snapshot inside the Expiration time field and then click the “Save” button.
C) Update Access Policy
You can also grant or revoke access to your table snapshot in the BigQuery console. For example, to grant viewer access to your table snapshot to a user with the email johndoe@gmail.com, you can carry the following steps:
- Step 1: Open the BigQuery page using the following link: https://console.cloud.google.com/bigquery
- Step 2: In the Explorer panel, expand the project and dataset nodes of the my_dataset.ga_sessions_backup table snapshot.
- Step 3: Click the name of your table snapshot which in this example is ga_sessions_backup.
- Step 4: In the ga_sessions_backup panel that opens, click the “Share” button.
- Step 5: Next, click on “Add principal” option.
- Step 6: Enter the email address of the user that you wish to give access to, which in this example is johndoe@gmail.com.
- Step 7: In the “Select a role” dropdown menu, select the “BigQuery Data Viewer” option and then click on the “Save” button.
5) Query a Table Snapshot
Querying a BigQuery snapshot table is no different from querying other types of BigQuery tables.
For example, let’s run a query on the my_dataset.ga_sessions_backup table to see the total pageviews the website received on 1st March 2017:
SELECT SUM(totals.pageviews) AS TotalPageviews
FROM my_dataset.ga_sessions_backup
6) Delete a Table Snapshot
You can delete a table snapshot using the DROP SNAPSHOT TABLE Standard SQL statement.
For example, to delete the ga_sessions_backup table snapshot, on the BigQuery page, you can run the following SQL statement in the query editor:
DROP SNAPSHOT TABLE
my_dataset.ga_sessions_backup
BigQuery also provides the functionality to automatically clean up table snapshots after a certain time. You can either specify the expiration time when creating the table snapshot or set a default table expiration time for the dataset.
7) Restore Deleted Table Snapshot
You can restore a table snapshot that was deleted either maliciously or accidentally as long as it’s within seven days of deletion using the BigQuery CLI. Learn to restore deleted data using time travel in BigQuery.
For example, you enter the following command to copy the old_dataset.old_table table at the time March 23th 2022 (EPOCH:1647968405), into a new table new_dataset.new_table.
bq cp my_project.old_dataset.old_table@1647968405 my_project.new_dataset.new_table
Limitations of BigQuery Snapshots
- BigQuery Snapshots are designated as read-only, meaning that you cannot perform DML operations (insert/update/delete/etc.) or perform certain DDL operations.
- You can only capture a table’s data as it was seven days ago or more recently due to BigQuery’s seven-day default retention period for tables.
- A table snapshot should have a unique name because you cannot overwrite existing tables within a dataset.
- The external tables are not supported.
Integrate Aftership to BigQuery
Integrate Drip to BigQuery
Integrate Facebook Ads to BigQuery
Explore the comprehensive guide on BigQuery dataset to understand how to effectively organize and utilize your data.
Conclusion
In this article, you have learned about Google BigQuery Table Snapshots. This article also provided information on Google BigQuery, its key features, usage of BigQuery Table Snapshots, example use cases of BigQuery Table Snapshots, and their limitations.
Hevo Data, with its strong integration with 150+ Data Sources (including 40+ Free Sources), allows you to not only export data from your desired data sources & load it to the destination of your choice, such as Google BigQuery, but also transform & enrich your data to make it analysis-ready.
Want to give Hevo a try? Sign Up for a 14-day free trial and experience the feature-rich Hevo suite firsthand. You may also have a look at the amazing Hevo Price, which will assist you in selecting the best plan for your requirements.Share your experience of understanding BigQuery Snapshots in the comment section below! We would love to hear your thoughts.
FAQs
1. What is the snapshot function in BigQuery?
The snapshot function in BigQuery creates a point-in-time copy of a table, preserving its data as it existed at that specific moment. This allows users to query historical data without affecting the current live table, useful for auditing and time-based analysis.
2. What is the difference between a GCP snapshot and a backup?
In GCP, a snapshot captures the state of a disk at a specific point in time and is typically used for fast recovery or cloning. A backup, on the other hand, is a more comprehensive data protection strategy that involves storing copies of data (files, databases) to ensure recovery from broader data loss, often over longer periods.
3. What is the difference between a snapshot and a table in BigQuery?
In BigQuery, a table is a dynamic dataset that can be queried and updated, while a snapshot is a read-only, point-in-time copy of a table. Unlike tables, snapshots preserve the data as it existed at a specific time and cannot be modified or updated.
Skand is a dedicated Customer Experience Engineer at Hevo Data, specializing in MySQL, Postgres, and REST APIs. With three years of experience, he efficiently troubleshoots customer issues, contributes to the knowledge base and SOPs, and assists customers in achieving their use cases through Hevo's platform.