BigQuery is one of the most popular and highly efficient analytics platforms that allow you to store, process, and analyze Big Data. In addition, BigQuery can process over 100 trillion rows and run more than 10,000 queries at the same time across the organization. It not only has advanced processing capabilities but also is highly scalable and fault-tolerant, which enables users to seamlessly implement advanced Analytics. One such advanced fault-tolerant feature is BigQuery Time Travel, which allows you to travel back in time for retrieving the previously deleted or modified data.

Google BigQuery Performance Optimization: Best Practices Guide
Download Your Free EBook Now

In this article, you will learn about BigQuery, BigQuery Time Travel, and how to time travel to retrieve the deleted BigQuery data.

Prerequisites

Fundamental knowledge of data analysis using SQL.

What is BigQuery?

BigQuery Logo
Image Source

Developed by Google in 2010, BigQuery is a Cloud-based data warehousing platform that allows you to store, manage, process, and analyze Big Data. BigQuery’s fully managed Serverless Architecture will enable you to implement high-end Analytics operations like Geospatial Analysis and Business Intelligence. It not only allows you to implement Big Data Analytics but also empowers you to build end-to-end Machine Learning Models. 

Since BigQuery has a Serverless Infrastructure, you can focus on analyzing Big Data and building Machine Learning models instead of concentrating on resource and infrastructure management. Furthermore, developers and data professionals can use client libraries of popular programming languages like Python, Java, and JavaScript, as well as BigQuery’s REST API and RPC API for transforming, managing, and analyzing BigQuery data.

Key Features of Google BigQuery

The serverless Data Warehouse has various built-in capabilities that help with data analysis and provide deep insights. Take a look at some of BigQuery’s most important features.

  • Fully Managed by Google: The Data Warehouse infrastructure is managed by Google. It keeps track of, updates, monitors, and distributes all of your data and information. Google will be notified if your task fails.
  • Easy Implementation: BigQuery is simple to implement because it doesn’t require any additional software, cluster deployment, virtual machines, or tools. BigQuery is a serverless Data Warehouse that is very cost-effective. To evaluate and solve queries, all you have to do is upload or directly stream your data and run SQL.
  • Speed: BigQuery can process a large number of rows in a matter of seconds. It can also perform terabyte-scale inquiries in seconds and petabyte-scale queries in minutes.
Seamlessly Migrate Data to BigQuery Using Hevo

Hevo Data provides a fully automated, no-code platform for migrating data to Google BigQuery, enabling real-time data integration from over 150 sources without the need for manual coding or infrastructure management.

What does Hevo Offer?

  • Automated Schema Mapping: Automatically adjusts schema changes between the source and BigQuery, simplifying complex data migration.
  • 150+ Source Integrations: Supports data migration from databases, SaaS applications, cloud storage, and more, including MySQL, PostgreSQL, Salesforce, MongoDB, and Kafka.
  • Real-time Data Sync: Hevo enables real-time data transfer, ensuring your data in BigQuery is always up-to-date for real-time analytics.
Get Started with Hevo for Free

What is BigQuery Time Travel? 

Users can use BigQuery to process and analyze large amounts of data in order to implement end-to-end Cloud Analytics. However, there is a possibility that data can be deleted accidentally or due to some man-made errors while working on BigQuery. As a result of such unpredictable errors, users will permanently lose their data if they haven’t backed up or kept a copy of the deleted data. 

To eliminate this complication, BigQuery allows users to retrieve and restore the deleted data within seven days of deletion. In other words, BigQuery allows users to travel back in time to retrieve or fetch the historical data that was deleted before seven days. With a simple SQL query, users can effectively retrieve the deleted data by providing the table name and time for returning results from a given table according to the specific Timestamp.

Integrate BigQuery to BigQuery
Integrate Magento via MySQL to BigQuery

How to access Historical data using BigQuery Time Travel? 

  • With the BigQuery Time Travel feature, you can easily query and fetch historical data that has been stored, updated, or deleted before. The BigQuery’s SQL clause named FOR SYSTEM TIME AS OF helps you to time travel for a maximum of seven days in the past to access the modified or deleted data. 
  • However, there are some limitations of the FOR SYSTEM TIME AS OF clause. 
  1. The deleted table that you want to retrieve must have previously been saved or stored in BigQuery; it cannot be an external table.
  2. To use the FOR SYSTEM TIME AS OF  clause for retrieving historical table data, there is no limit on table size, and the deleted BigQuery table can have an unlimited number of rows and columns. In other words, the FOR SYSTEM TIME AS OF fetches the deleted data from BigQuery irrespective of the table size.
  3. The source table of the FROM clause that includes FOR SYSTEM TIME AS OF in the BigQuery Time Travel query syntax should not be an ARRAY scan, including a flattened array or even the output of the UNNEST operator. The syntax should not include a common table expression defined by a WITH clause.
  4. The timestamp_expression in a BigQuery Time Travel query syntax must always be a constant expression and should not be in the form of subqueries and UDFs (User Defined Functions). The timestamp_expression should not be in the form of correlated references, which refers to columns that appear at a higher level of the query statement, like in the SELECT clause.
  5. Finally, the timestamp_expression in SQL syntax should not fall within a specific time range that includes the start and end times for retrieving data between the given range. The time period specified in the timestamp expression should never be longer than seven days.  
  • The FOR SYSTEM TIME AS OF clause allows you to switch to the previous versions of the specific table definition and rows with respect to the given timestamp or time expression. On providing the particular timestamp in the syntax of the SQL query, you can easily travel back to the given time for fetching the historical data. The below-given command is the sample SQL query to fetch and return historical data in BigQuery.
SELECT *
FROM `mydataset.mytable`
FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR);
  • On executing the query given above, you can fetch or return deleted data of the table named from exactly one hour before. You can only return a historical version of the table before one hour since you gave the FOR SYSTEM_TIME AS OF parameter as INTERVAL 1 HOUR. In the above SQL query, we just passed the fetching interval to return data from the particular table based on the given interval. However, you can also set a specific timestamp to point back in time for fetching historical data precisely according to the given timestamp.
  • For returning historical values in a table at an absolute or specific point in time, you can follow the sample SQL query given below. 
SELECT
count (distinct device_id) as devices,
count (*) as events,
FROM table_a 
FOR SYSTEM_TIME AS OF “point_in_time_timestmap”;
)
  • By following the above command, you are able to fetch the aggregate values from the table named table_a with respect to the given timestamp in the point_in_time_timestmap parameter. You should always make sure that the parameter is given in the form of yyyy-mm-dd hh:mm:ss. For example, the point_in_time_timestmap should resemble 2022-03-17 10:30:00
  • If the timestamp you provide is greater than seven days earlier or before the table was created, the SQL query will fail to execute and return an error message given below. 
Invalid snapshot time 1601168925462 for table
myproject:mydataset.table1@1601168925462. Cannot read before 1601573410026.
  • If you use the CREATE OR REPLACE TABLE statement for replacing the active and existing table, you could still use the FOR SYSTEM TIME AS OF clause to query the previous version of the table based on the recent interval command. 
  • When the existing table is accidentally deleted, the query fails and returns an error message, as given below.
Not found: Table myproject:mydataset.table was not found in location US
  • However, you can restore an active table by retrieving data from a specific point in time based on the timestamp, as explained in the above steps.
  • Instead of just retrieving deleted data, you can use the BigQuery Time Travel feature to directly copy the modified or deleted historical data into a new table. This direct copying of deleted data into a new table works even if the table was deleted or expired, as long as you retrieve and restore historical data within seven days of deletion.
  • For copying deleted historical data into the new table, you should use decorators in the SQL query. 
  • There are three types of decorators: “tableid@TIME,” “tableid@-TIME_OFFSET,“ and “tableid@0.“ 
  • The tableid@TIME decorator retrieves historical data based on a specific time in milliseconds, while the tableid@-TIME OFFSET decorator fetches data from the deleted table based on a specific time interval. For fetching the oldest available historical data (within a time period of seven days), you can use the tableid@0 decorator. 
  • Execute the code given below to fetch the historical data from a deleted table and copy it into the new table. 
bq cp mydataset.table1@-3600000 mydataset.table1_restored
  • In the above code, but is the load command in BigQuery, which allows you to transfer or load data from one table to another table. With the SQL query, you are fetching data from a table named “table1“ and copying it into the table called “table1_restored.“ Consequently, the query retrieves the one-hour old data from the table1 and stores it in the “table1_restored” table. The query treated the time as one hour because you provided the time in milliseconds, i.e.,3600000, which equals one hour. Note: (3600000=1 hour).

By following the above-given steps, you can easily travel back in time to retrieve the deleted or modified data in BigQuery.

Learn More About:

BigQuery Snapshots

Conclusion

In this article, you learned about BigQuery and how to time travel to retrieve the deleted BigQuery data. The BigQuery Time Travel feature is one of the unique and advanced features of BigQuery that allows users to recover the lost data by traveling back in time and bringing them back. However, BigQuery also has more advanced features like teleportation, miniaturization, and multiverse for implementing cloud analytics, which you can explore later. In case you want to export data from a source of your choice into your desired Database/destination like BigQuery then Hevo Data is the right choice for you! 

FAQs

1. Does BigQuery support time travel?

Yes, BigQuery supports time travel through its feature called “BigQuery History.” It allows users to query historical data from up to 7 days prior by specifying a timestamp in the FOR SYSTEM_TIME AS OF clause, enabling recovery and analysis of data as it existed at a specific point in time.

2. What is the difference between snapshot and time travel in BigQuery?

Time travel provides temporary, point-in-time views(up to 7 days old), while snapshots create a permanent, static copy of a particular moment.

3. What is time partitioning in BigQuery?

Time partitioning in BigQuery organizes tables into segments based on a timestamp column, improving query performance and management.

Ishwarya M
Technical Content Writer, Hevo Data

Ishwarya is a skilled technical writer with over 5 years of experience. She has extensive experience working with B2B SaaS companies in the data industry, she channels her passion for data science into producing informative content that helps individuals understand the complexities of data integration and analysis.