BigQuery is a cloud platform used by individuals and organizations for the storage of data. BigQuery organizes data into tables. The good thing with BigQuery is that it can scale well to store huge volumes of data. Huge datasets normally have many tables, and you may get stuck on which table to query.

You will also need to get details like which tables are updated most, when each table was created, the columns contained in each table, and more. You can get all this information from the table metadata. 

BigQuery Information_Schema can help you to obtain the table metadata across many datasets using SQL queries from the BigQuery interface. In this article, you will understand in-depth about the BigQuery Information_Schema and its working. Before this, you will also be introduced to BigQuery and get to know the ways in which you can use it.

Prerequisite

This is what you need for this blog:

  • A Google BigQuery Account.

Part 1: Understanding BigQuery

BigQuery Information_Schema: BigQuery logo

BigQuery is a cost-effective, serverless, and highly scalable multi-cloud data warehouse designed and developed to offer business agility. BigQuery was developed by Google, hence, it uses the processing power of Google’s infrastructure. It comes with built-in machine learning capabilities that can help you to understand your data better. 

Key Features of Google BigQuery

Some of the key features of Google BigQuery are as follows:

  • Scalability: To provide consumers with true Scalability and consistent Performance, Google BigQuery leverages Massively Parallel Processing(MPP) and a Highly Scalable Secure Storage Engine. The entire Infrastructure with over a thousand machines is managed by a complex software stack.
  • Serverless: The Google BigQuery Serverless model automatically distributes processing across a large number of machines running in parallel, so any organization using Google BigQuery can focus on extracting insights from data rather than configuring and maintaining the Infrastructure/server. 
  • Storage: Google BigQuery uses a Columnar architecture to store mammoth scales of datasets. Column-based Storage has several advantages, including better Memory Utilization and the ability to scan data faster than typical Row-based Storage.
  • Integrations: Google BigQuery as a part of the Google Cloud Platform (GCP) supports seamless integration with all Google products and services. Google also offers a variety of Integrations with numerous third-party services, as well as the functionality to integrate with application APIs that are not directly supported by Google.

Part 2: Understanding BigQuery Information_Schema

BigQuery information_schema refers to a series of views that give BigQuery users access to metadata about tables, views, datasets, routines, reservations, jobs, and streaming data. The BigQuery information_schema views provide comprehensive information about the tables in your BigQuery database.

They can give you information such as when different tables were created, how they are updated, and the columns contained in different tables. The views can also give you information about consumption and performance throughout the organization. This can help you to investigate windows and patterns of low and high consumption against capacity so as to optimize slot capacity distribution. 

You can also identify any queries that bring trouble and know the average query time in the database. BigQuery information_schema can help you to monitor the performance and the condition of your BigQuery environment, giving you a central hub from which you can administer your BigQuery database. You can also break down your queries according to their execution patterns. 

Part 3: How to use the BigQuery Information_Schema

You can run queries against the BigQuery information_schema and get answers to some of the questions that you have. Note that the queries you run against the BigQuery information_schema information_schema views are not cached, and they will incur data processing costs or use the BigQuery slots based on the pricing of your project. Also, the schema query should be written in the standard SQL syntax. 

To demonstrate how to use the BigQuery information_schema, we will explore a database named MyDatabase. 

Let us list the tables contained in the dataset. We also want to know the creation time of each table. We simply have to run the following query in BigQuery:

# List all tables and the times they were created from a single dataset
#standardSQL 

SELECT * 
FROM `MyDatabase`.INFORMATION_SCHEMA.TABLES;

The returned results show the details of each table in a single row, and the number of rows will be determined by the number of tables contained in the database. Some of the metadata returned include the project ID of the project that contains the dataset, the name of the dataset that contains the tables or views, the names of all tables that belong to the dataset, the type of the table (whether a BASE table, view, materialized view, or references an external data source), an indication whether the table supports the SQL INSERT statements, and the datetime when the table was created. 

We can also filter the results using a WHERE clause. Suppose we only need to know the table name and the date of creation of each table in the dataset. We can use a WHERE clause to filter the information that is returned. The following query demonstrates this:

# List metadata from the selected tables using the WHERE clause.
#standardSQL 

SELECT table_name, creation_time 
FROM `MyDatabase`.INFORMATION_SCHEMA.TABLES 
WHERE table_type = "BASE TABLE" 
AND table_name LIKE "%table%";

In the above query, we are only interested in getting metadata about the names of tables in the database named “MyDatabase” and the times each of the tables were created. That is why we have only queried for the columns table_name and creation_time of the BigQuery information_schema. 

We have also added other conditions to the tables. The type of the table or the value of the table_type column must be “BASE TABLE” for a table to be returned. We have also used a pattern to add a condition to the name of the table. We have used SQL’s LIKE statement to ensure that tables to be returned have the “table” phrase in their names. The % symbol matches one or more characters. 

That is how to use the BigQuery information_schema.

Conclusion

From this blog, you have learned more about BigQuery and the ways you can use BigQuery. You would have now understood how to use the BigQuery Information_Schema.

Extracting complex data from a diverse set of data sources to carry out an insightful analysis can be challenging, and this is where Hevo saves the day! Hevo offers a faster way to load data to Google BigQuery to be visualized in a BI tool for free. Hevo is fully automated and hence does not require you to code.

Want to take Hevo for a spin? Sign Up for the 14-day free trial and experience the feature-rich Hevo suite first hand. You can also have a look at the unbeatable pricing that will help you choose the right plan for your business needs.

Share your experience of learning about BigQuery Information_Schema. Let us know in the comments section below!

Arsalan Mohammed
Research Analyst, Hevo Data

Arsalan is a research analyst at Hevo and a data science enthusiast with over two years of experience in the field. He completed his B.tech in computer science with a specialization in Artificial Intelligence and finds joy in sharing the knowledge acquired with data practitioners. His interest in data analysis and architecture drives him to write nearly a hundred articles on various topics related to the data industry.

No-code Data Pipeline for Google BigQuery