BigQuery Time Travel: How to access Historical Data? | Easy Steps

on BigQuery Functions, Google BigQuery • March 21st, 2022 • Write for Hevo

BigQuery Time Travel FI

BigQuery is one of the most popular and highly efficient analytics platforms that allow you to store, process, and analyze Big Data. In addition, BigQuery can process over 100 trillion rows and run more than 10,000 queries at the same time across the organization. It not only has advanced processing capabilities but also is highly scalable and fault-tolerant, which enables users to seamlessly implement advanced Analytics. One such advanced fault-tolerant feature is BigQuery Time Travel, which allows you to travel back in time for retrieving the previously deleted or modified data.

In this article, you will learn about BigQuery, BigQuery Time Travel, and how to time travel to retrieve the deleted BigQuery data.

Table of Contents

Prerequisites

Fundamental knowledge of data analysis using SQL.

What is BigQuery?

BigQuery Time Travel: logo
Image Source

Developed by Google in 2010, BigQuery is a Cloud-based data warehousing platform that allows you to store, manage, process, and analyze Big Data. BigQuery’s fully managed Serverless Architecture will enable you to implement high-end Analytics operations like Geospatial Analysis and Business Intelligence. It not only allows you to implement Big Data Analytics but also empowers you to build end-to-end Machine Learning Models. 

Since BigQuery has a Serverless Infrastructure, you can focus on analyzing Big Data and building Machine Learning models instead of concentrating on resource and infrastructure management. Furthermore, developers and data professionals can use client libraries of popular programming languages like Python, Java, and JavaScript, as well as BigQuery’s REST API and RPC API for transforming, managing, and analyzing BigQuery data.

Key Features of Google BigQuery

The serverless Data Warehouse has various built-in capabilities that help with data analysis and provide deep insights. Take a look at some of BigQuery’s most important features.

  • Fully Managed by Google: The Data Warehouse infrastructure is managed by Google. It keeps track of, updates, monitors, and distributes all of your data and information. Google will be notified if your task fails.
  • Easy Implementation: BigQuery is simple to implement because it doesn’t require any additional software, cluster deployment, virtual machines, or tools. BigQuery is a serverless Data Warehouse that is very cost-effective. To evaluate and solve queries, all you have to do is upload or directly stream your data and run SQL.
  • Speed: BigQuery can process a large number of rows in a matter of seconds. It can also perform terabyte-scale inquiries in seconds and petabyte-scale queries in minutes.

Simplify BigQuery ETL with Hevo’s No-code Data Pipeline

Hevo Data is a No-code Data Pipeline that offers a fully managed solution to set up Data Integration for 100+ Data Sources (including 40+ Free sources) and will let you directly load data from sources like Google Data Studio to a Data Warehouse or the Destination of your choice. It will automate your data flow in minutes without writing any line of code. Its fault-tolerant architecture makes sure that your data is secure and consistent. Hevo provides you with a truly efficient and fully automated solution to manage data in real-time and always have analysis-ready data. 

Get Started with Hevo for Free

Let’s look at some of the salient features of Hevo:

  • Fully Managed: It requires no management and maintenance as Hevo is a fully automated platform.
  • Data Transformation: It provides a simple interface to perfect, modify, and enrich the data you want to transfer. 
  • Real-Time: Hevo offers real-time data migration. So, your data is always ready for analysis.
  • Schema Management: Hevo can automatically detect the schema of the incoming data and map it to the destination schema.
  • Connectors: Hevo supports 100+ Integrations to SaaS platforms FTP/SFTP, Files, Databases, BI tools, and Native REST API & Webhooks Connectors. It supports various destinations including Google BigQuery, Amazon Redshift, Snowflake, Firebolt, Data Warehouses; Amazon S3 Data Lakes; Databricks; and MySQL, SQL Server, TokuDB, MongoDB, PostgreSQL Databases to name a few.  
  • Secure: Hevo has a fault-tolerant architecture that ensures that the data is handled in a secure, consistent manner with zero data loss.
  • Hevo Is Built To Scale: As the number of sources and the volume of your data grows, Hevo scales horizontally, handling millions of records per minute with very little latency.
  • Live Monitoring: Advanced monitoring gives you a one-stop view to watch all the activities that occur within Data Pipelines.
  • Live Support: Hevo team is available round the clock to extend exceptional support to its customers through chat, email, and support calls.
Sign up here for a 14-Day Free Trial!

What is BigQuery Time Travel? 

Users can use BigQuery to process and analyze large amounts of data in order to implement end-to-end Cloud Analytics. However, there is a possibility that data can be deleted accidentally or due to some man-made errors while working on BigQuery. As a result of such unpredictable errors, users will permanently lose their data if they haven’t backed up or kept a copy of the deleted data. 

To eliminate this complication, BigQuery allows users to retrieve and restore the deleted data within seven days of deletion. In other words, BigQuery allows users to travel back in time to retrieve or fetch the historical data that was deleted before seven days. With a simple SQL query, users can effectively retrieve the deleted data by providing the table name and time for returning results from a given table according to the specific Timestamp.

How to access Historical data using BigQuery Time Travel? 

  • With the BigQuery Time Travel feature, you can easily query and fetch historical data that has been stored, updated, or deleted before. The BigQuery’s SQL clause named FOR SYSTEM TIME AS OF helps you to time travel for a maximum of seven days in the past to access the modified or deleted data. 
  • However, there are some limitations of the FOR SYSTEM TIME AS OF clause. 
  1. The deleted table that you want to retrieve must have previously been saved or stored in BigQuery; it cannot be an external table.
  2. To use the FOR SYSTEM TIME AS OF  clause for retrieving historical table data, there is no limit on table size, and the deleted BigQuery table can have an unlimited number of rows and columns. In other words, the FOR SYSTEM TIME AS OF fetches the deleted data from BigQuery irrespective of the table size.
  3. The source table of the FROM clause that includes FOR SYSTEM TIME AS OF in the BigQuery Time Travel query syntax should not be an ARRAY scan, including a flattened array or even the output of the UNNEST operator. The syntax should not include a common table expression defined by a WITH clause.
  4. The timestamp_expression in a BigQuery Time Travel query syntax must always be a constant expression and should not be in the form of subqueries and UDFs (User Defined Functions). The timestamp_expression should not be in the form of correlated references, which refers to columns that appear at a higher level of the query statement, like in the SELECT clause.
  5. Finally, the timestamp_expression in SQL syntax should not fall within a specific time range that includes the start and end times for retrieving data between the given range. The time period specified in the timestamp expression should never be longer than seven days.  
  • The FOR SYSTEM TIME AS OF clause allows you to switch to the previous versions of the specific table definition and rows with respect to the given timestamp or time expression. On providing the particular timestamp in the syntax of the SQL query, you can easily travel back to the given time for fetching the historical data. The below-given command is the sample SQL query to fetch and return historical data in BigQuery.
SELECT *
FROM `mydataset.mytable`
FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR);
  • On executing the query given above, you can fetch or return deleted data of the table named from exactly one hour before. You can only return a historical version of the table before one hour since you gave the FOR SYSTEM_TIME AS OF parameter as INTERVAL 1 HOUR. In the above SQL query, we just passed the fetching interval to return data from the particular table based on the given interval. However, you can also set a specific timestamp to point back in time for fetching historical data precisely according to the given timestamp.
  • For returning historical values in a table at an absolute or specific point in time, you can follow the sample SQL query given below. 
SELECT
count (distinct device_id) as devices,
count (*) as events,
FROM table_a 
FOR SYSTEM_TIME AS OF “point_in_time_timestmap”;
)
  • By following the above command, you are able to fetch the aggregate values from the table named table_a with respect to the given timestamp in the point_in_time_timestmap parameter. You should always make sure that the parameter is given in the form of yyyy-mm-dd hh:mm:ss. For example, the point_in_time_timestmap should resemble 2022-03-17 10:30:00
  • If the timestamp you provide is greater than seven days earlier or before the table was created, the SQL query will fail to execute and return an error message given below. 
Invalid snapshot time 1601168925462 for table
myproject:mydataset.table1@1601168925462. Cannot read before 1601573410026.
  • If you use the CREATE OR REPLACE TABLE statement for replacing the active and existing table, you could still use the FOR SYSTEM TIME AS OF clause to query the previous version of the table based on the recent interval command. 
  • When the existing table is accidentally deleted, the query fails and returns an error message, as given below.
Not found: Table myproject:mydataset.table was not found in location US
  • However, you can restore an active table by retrieving data from a specific point in time based on the timestamp, as explained in the above steps.
  • Instead of just retrieving deleted data, you can use the BigQuery Time Travel feature to directly copy the modified or deleted historical data into a new table. This direct copying of deleted data into a new table works even if the table was deleted or expired, as long as you retrieve and restore historical data within seven days of deletion.
  • For copying deleted historical data into the new table, you should use decorators in the SQL query. 
  • There are three types of decorators: “tableid@TIME,” “tableid@-TIME_OFFSET,“ and “tableid@0.“ 
  • The tableid@TIME decorator retrieves historical data based on a specific time in milliseconds, while the tableid@-TIME OFFSET decorator fetches data from the deleted table based on a specific time interval. For fetching the oldest available historical data (within a time period of seven days), you can use the tableid@0 decorator. 
  • Execute the code given below to fetch the historical data from a deleted table and copy it into the new table. 
bq cp mydataset.table1@-3600000 mydataset.table1_restored
  • In the above code, but is the load command in BigQuery, which allows you to transfer or load data from one table to another table. With the SQL query, you are fetching data from a table named “table1“ and copying it into the table called “table1_restored.“ Consequently, the query retrieves the one-hour old data from the table1 and stores it in the “table1_restored” table. The query treated the time as one hour because you provided the time in milliseconds, i.e.,3600000, which equals one hour. Note: (3600000=1 hour).

By following the above-given steps, you can easily travel back in time to retrieve the deleted or modified data in BigQuery.

Conclusion

In this article, you learned about BigQuery and how to time travel to retrieve the deleted BigQuery data. The BigQuery Time Travel feature is one of the unique and advanced features of BigQuery that allows users to recover the lost data by traveling back in time and bringing them back. However, BigQuery also has more advanced features like teleportation, miniaturization, and multiverse for implementing cloud analytics, which you can explore later. In case you want to export data from a source of your choice into your desired Database/destination like BigQuery then Hevo Data is the right choice for you! 

Visit our Website to Explore Hevo

Hevo Data, a No-code Data Pipeline provides you with a consistent and reliable solution to manage data transfer between a variety of sources and a wide variety of Desired Destinations, with a few clicks. Hevo Data with its strong integration with 100+ sources (including 40+ free sources) allows you to not only export data from your desired data sources & load it to the destination of your choice like BigQuery, but also transform & enrich your data to make it analysis-ready so that you can focus on your key business needs and perform insightful analysis using BI tools.

Want to take Hevo for a spin? Sign Up for a 14-day free trial and experience the feature-rich Hevo suite first hand. You can also have a look at the unbeatable pricing that will help you choose the right plan for your business needs.

Share your experience of learning about BigQuery Time Travel! Let us know in the comments section below!

No-code Data Pipeline For BigQuery