9 Google BigQuery Data Types: A Comprehensive Guide

By: Published: February 8, 2022 Updated: January 19, 2024

Google BigQuery Datatypes

Understanding the multitude of data types provided by BigQuery and effectively mapping data from source data types to BigQuery data types can be a daunting task. These data types help you store, structure, and process your data. To fully harness the potential capabilities and on-demand offerings of this warehousing tool, it is imperative to have a solid grasp of Google BigQuery Data types.

In this blog post, you will explore various BigQuery data types with a detailed explanation of their unique set of functions, operations, and specific procedures. As you move ahead through this detailed description, you will also gain valuable insights into how to leverage BigQuery’s data types effectively for your analytical journey.

Understanding the Different Google BigQuery Data Types

BigQuery Data Types
Image Source

Here is a list of the different Data Types you can find in Google BigQuery:

Simplify your Data Analysis with Hevo’s No-code Data Pipeline

A fully managed No-code Data Pipeline platform like Hevo helps you integrate and load data from 150+ different sources to a destination of your choice in real-time like Google BigQuery effortlessly.

Check out some of the cool features of Hevo:

  • 100% Complete & Accurate Data Transfer: Hevo’s robust infrastructure ensures reliable data transfer with zero data loss.
  • Scalable Infrastructure: Hevo has in-built integrations for 150+ Data sources, that can help you scale your data infrastructure as required.
Sign up here for a 14-Day Free Trial!

BigQuery Data Types: NUMERIC

BigQuery supports the following numeric data types:

  • Integer(INT 64): Represents numbers within the range of  -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 and numbers that do not have fractional components
  • Float (Float 64): Numbers with approximate numeric values and fractional components
  • Numeric: There is a data type called ‘NUMERIC’ which is similar to ‘Decimal’ which can store values with 38 decimal digits of precision and nine decimal digits of scale. Suitable for exact calculations

Functions supported by numeric BigQuery data types include:

Arithmetic/ Mathematical  Functions

Numeric BigQuery data types support basic arithmetic operations such as addition, subtraction, multiplication, etc.  The columns and rows that contain numeric type data can always be operated with mathematical functions. An SQL query that involves a mathematical operation is shown below:

With new_data AS (
SELECT 'Male' as Gender, 181  AS Height, 80.2 As Weight
UNION ALL SELECT ‘Female', 168, 74.2 
)
SELECT *, (height/weight) AS height_to_weight_ratio 
FROM new_data

In our query above, we performed an arithmetic operation, which is a division on an integer and a float type. The query yields: 

RowGenderHeight WeightHeight_to_weight_ratio
1Male18180.22.256857855
2Female16874.22.264150943

Standard Compliant Floating Point Division

Applying the division operator when the denominator is zero will return a divide by zero error. Also, infinite inputs that would yield a non-finite output will return an overflow error. To avoid these errors from crashing your SQL queries, so you won’t have to be checking for zero values or infinite values beforehand, it is advised to use specific functions to execute the operation.

Specific functions like the IEEE_divide returns NaN (not-a-number) when a division by zero is attempted, this will enable your query to run without returning any errors. The example  below gives an illustration of how you can use the IEEE_divide function:

With new_data AS (
SELECT 'Male' as Gender, 181  AS height, 80.2 As weight
UNION ALL SELECT 'Female', 168, 74.2 
UNION ALL SELECT 'Female', 170, 0)
SELECT *,(IEEE_Divide (height,weight)) AS height_to_weight_ratio 
FROM new_data

The output of the above query will be:

RowGenderHeight WeightHeight_to_weight_ratio
1Male18180.22.256857855
2Female16874.22.264150943
3Female1700.0Infinity

Safe Functions

Safe functions enable you to return NULL instead of returning an error for mathematical operations that are undefined. You can apply the Safe Function by prefixing your operation with the SAFE command. In the example below, we try the logarithm to a negative number which would have otherwise returned an error because the Log of a negative number is undefined, but with the SAFE command, it returns NULL. 

SELECT SAFE.LOG(10, -5) AS Log_1, 

SAFE.LOG(10, 5) AS Log_2

The query  yields:

Row 1Log_1Log_2
1null1.431

Comparisons

Numerical data types support comparison operators. Comparison operators  (<, <=, >, >=, != ,<>) are used to obtain the results of comparison. The sorting order of values in BigQuery from the least to the greatest is:

  1. Null
  2. NaN
  3. -inf (negative infinity)
  4. Negative numbers 
  5. 0 and -0 (all zero numbers are considered equal)
  6. Positive numbers 
  7. +inf (positive infinity)

The table below gives an example of comparisons and their results.

Left TermOperatorRight TermReturns
NaN=Any ValueFalse
NaN<Any ValueFalse
Any Value>NaNFalse
-0.0!=0.0False

These comparisons come in handy when we want to use the WHERE clause to filter out some values. The query below gives an illustration:

With new_data AS(
SELECT 'Male' AS Gender,  181 as Height,  80.2 as Weight
UNION ALL SELECT 'Female', 168, 74.2
UNION ALL SELECT 'Female', NULL, 84
UNION ALL SELECT 'Male', IEEE_Divide(-3,0), 0)
SELECT * FROM new_data 
WHERE Height < 100

The query yields:

RowGenderHeightWeight
1Male-inf80.0

The WHERE clause returns only those rows for which the result is TRUE. 

Precise Decimal Calculations with Numeric 

Financial and accounting applications usually require exact values for calculations; this is why the numeric data type is an ideal type to utilize. Numeric data types in BigQuery use 16 Bytes for storage and can represent decimal figures accurately; this is the reason why it is an ideal type for financial calculations. 

For illustration, let’s assume we want to carry out some financial calculations, and we need the results to be exact. The query below shows how vital Numeric data types are in our financial applications:

WITH new_data AS ( 
SELECT NUMERIC '5.42' AS Cost
UNION ALL SELECT NUMERIC '7.43'
UNION ALL SELECT NUMERIC '8.94'
)
SELECT 
SUM(Cost) AS Total_cost
FROM new_data

The query yields:

RowTotal_cost
121.79

From our query above if we had used the default Numerical datatype which is Float 64 to carry out our calculation, the tiny differences between how each number is represented in memory and the way we entered it will add up and made our final calculations inexact. Using the Numeric data type saves us from that problem. 

BigQuery Data Types: BOOLEAN

Boolean data types are variables that are represented by the keyword True or Force.  Boolean data type supports the following operations:

Logical Operations

Logical operators are commands that help to filter a result based on some set of conditions. Filtering is done using a WHERE clause and the logical operators set the conditions. Logical operators on BigQuery include  AND, OR, and NOT. The result returned when using a WHERE clause and a logical Operator is based on if the WHERE condition is (TRUE or FALSE).

Conditional Operations

Booleans are also very useful in Conditional clauses. Conditional clauses simplify queries by providing an option and also providing an alternative if the clause is not satisfied. In the Query below, we want to compute a new column, which is the sum of the height and weight, but some instances have Null values. We can use a conditional clause like IF to enter a new value for all cases that have NULL values. 

WITH new_data AS(
SELECT 181 AS Height, 80 AS Weight
UNION ALL SELECT NULL, 74
UNION ALL SELECT 174, NULL)
SELECT *, (IF(Height is NULL, 170,Height) +
IF(Weight is NULL, 80,Weight)) AS sum_height_weight
FROM new_data

The query yields:

RowHeightWeightsum_height_weight
118180261
2null74244
3174null254

The IF clause works in such a way that if the first parameter in the condition is TRUE, it returns the second parameter, and if FALSE, it returns the third parameter.

Download the Cheatsheet on How to Set Up High-performance ETL to BigQuery
Download the Cheatsheet on How to Set Up High-performance ETL to BigQuery
Download the Cheatsheet on How to Set Up High-performance ETL to BigQuery
Learn the best practices and considerations for setting up high-performance ETL to BigQuery

BigQuery Data Types: JSON

Google BigQuery supports the JSON data types for handling JSON-formatted or semi-structured data, which represents data in key:value pairs. JSON supports varying and flexible data structures, allowing you to store and manipulate data with different fields and nested structures without a fixed schema within your BigQuery tables.

Here’s an example of how you can create a column in BigQuery with the JSON data type:

CREATE TABLE table_name (
Id INT64;
json-data JSON;
);

BigQuery JSON also provides a set of functions and operators such as JSON_VALUE, JSON_QUERY, and JSON_ARRAY. This enables you to query and extract specific values from JSON data in your SQL queries.

BigQuery Data Types: STRINGS

Strings are Unicode characters with variable lengths. String manipulation is an essential requisite for working with data. BigQuery provides an arsenal of built-in functions that helps enhance your string manipulation tasks. Strings must be quoted either with single, double, or triple quotation marks. The functions supported by strings on BigQuery include:

Casting and Coercion

You can CAST a string as an INT 64 or FLOAT 64 to parse it; you can also cast an integer or float back a string. The CAST Command helps us convert from one data type to another. 

SELECT 
CAST (42 AS STRING),
CAST('42' AS INT64)


In some instances, the CAST operation fails if the data conversion is unsuccessful, SAFE_CAST will return NULL, and the query will run successfully. The type of conversion done using SAFE_CAST is called coercion. In the query above,  a numeric type is converted to a string and vice versa. 

SELECT CAST('true' AS bool),
SAFE_CAST ('new' AS bool)

The query above will fail if we don’t use the SAFE_CAST command as it’s not possible to convert a string like ‘new’ to a boolean data type. With the SAFE_CAST command, NULL is returned instead.

The format of the data we are converting to can also be customized using the FORMAT command. The query below gives an illustration.

SELECT 
FORMAT ('%3d', 500)AS A, 
FORMAT ('%5.3f', 55.5555) AS B,
FORMAT ('%03d', 78) AS C

The query yields:

RowBC
150055.556078

With the FORMAT command, we have customized how the String data converted will appear. 

Concatenation

The CONCAT command is a convenient tool used to manipulate strings. It is used to concatenate separate parts of different strings into one. 

WITH example AS (
SELECT 'John@abc.com' AS email, 'India' as country
UNION ALL SELECT 'Sam@bca.com', 'Canada'
UNION ALL SELECT 'Lucy@cab.com', 'Germany'
)
SELECT email, country,
CONCAT(
SUBSTR(email, 1, STRPOS(email, '@')-1),
' from ', country) AS callers
FROM example

The query yields:

Rowemailcountrycallers
1John@abc.comIndiaJohn from India
2Sam@bac.comCanadaSam from Canada
3Lucy@cab.comGermanyLucy from Germany

From the query above, we concatenated the email and country columns and also added conjunction between the strings from the two columns. We used the SUBSTR command to extract the substrings and  STRPOS to find the ‘@’ separator character before concatenating them.

Regular Expressions

Regular expressions enable you to find specific characters, patterns and perform

powerful searches on your String data type. BigQuery supports various Regex operations; they include:

REGEXP_CONTAINS: Find a regex pattern or character

REGEXP_EXTRACT: Remove the regex pattern 

REGEXP_REPLACE: Replace the regex pattern.

The query below gives an illustration of a Regex operation onBigQuery.

SELECT column,
 REGEXP_CONTAINS (column, r'd{5}') present
 FROM (SELECT *
 from unnest(['12345', '1234', '12345-9876']) AS column
 )

The query yields:

RowColumnPresent
112345true
21234false
312345-9876true

In the query above, we used a regex to match values that had five digits or more. The Values that matched the regex returned True and those that didn’t return False.

BigQuery Data Types: BYTES

Bytes are distinct from strings and should not be interchanged, though most functions supported on strings are also supported on bytes. Bytes are also represented using either single, double, or triple quotation marks, but for bytes, you must ensure it begins with the prefix, letter B(b). The query below gives an illustration on working with Byte data type:

SELECT * FROM new_data 
WHERE  Height = b'16D6M7PN3w7Cn8mJyrmrUSZY9ummMf5QCGEMuiSmSlw

BigQuery Data Types: TIME AND DATE

BigQuery supports various data types for Time and Date instances they include:

  • DATE: refers to a valid calendar date independent of a time zone. So a particular DATE will represent a different period when recorded in different time zones
  • TIME: refers to a time, independent of a specific date.
  • DATETIME: This refers to a point in time; it is time you see when you look at your watch. It is usually when an event happens, any other precision is irrelevant
  • TIMESTAMP refers to an exact point in time with a microsecond precision regardless of location. Datetime is part of Timestamp with just a lower level of precision. 

These data types are all quite interchangeable, just with different levels of precision. So we can extract DateTime from a timestamp and cast a timestamp to a DateTime. The query below gives an illustration:

SELECT
EXTRACT(TIME FROM CURRENT_TIMESTAMP()) as ct
EXTRACT(DATETIME FROM CURRENT_TIMESTAMP()) as dt, CAST(CURRENT_DATETIME() AS TIMESTAMP) as ts

After executing the above query, it’ll extract the Time and Datetime from the current Timestamp and CAST the current Datetime as Timestamp. This illustration gives an overview of how you can apply BigQuery’s Datetime and Timestamp modules in diverse scenarios.

BigQuery Data Types: TIMEZONE

It is also pertinent that we understand BigQuery’s timezone format. Timezones are used when parsing timestamps or formatting timestamps for display. Timezones are represented using these formats:

  • Offset from Coordinated Universal Time(UTC): When using this format, no spaces are allowed between the time zone and the rest of the Timestamp. An example is shown below:
    2019-08-21 12:30:00.45+1:00 #where +1 is the time zone
  • Time zone name from tz database: When using this format, space is required between the name of the timezone and the rest of the time stamp. An example is shown below:
     2019-08-21 12:30:00.45 India/New_Delhi 

BigQuery Data Types: GEOGRAPHY (GIS)

The Geography data type is used to represent points, lines, and polygons on the surface of the earth.  In BigQuery, all the geographical positions are represented in  WGS84 reference ellipsoid which is the same ellipsoid used in the Global Positioning System (GPS), hence the longitude and latitude can be used directly in BigQuery to find the location of a place. Geography Functions on BigQuery are peculiar and are prefixed with ST_. signature. Functions supported by Geography data type on BigQuery are classified based on their use-cases. The classifications include:

  • Constructors: These functions build new Geography values from coordinates or existing Geography data. An example is:
     
     ST_GEOGPOINT(longitude, latitude) 
  • Parsers: These functions create Geography data types from external formats like  WKT and GeoJSONAn example is:
    ST_GEOGFROMTEXT(wkt_string)
  • Formatters: These are the opposite of Parsers, they extract Geography data types to an external format such as WKT and GeoJSON
    ST_ASTEXT(geography_expression)
  • Transformations: These functions are used to generate a new Geography data type from other Geography data types. Examples include:
    ST_INTERSECTION(geography_1, geography_2) ST_BOUNDARY(geography_expression)
  • Predicates: are used to filter clauses in Geography data types, they return TRUE or FALSE for relationships between Geographies or properties of Geography data types. For example, the ST_DWITHIN command is a predicate.
    ST_DWITHIN(geography_1, geography_2),range )

    The query above returns TRUE if the distance between at least one point in geography_1 and one point in geography_2 is less than or equal to the distance given by the range argument.

  • Accessors: These functions provide unrestricted access to the property of geography. An example is ST_NUMPOINTS
    ST_NUMPOINTS(geography_expression)

    The query above returns the number of points, the number of line string vertices, and the number of polygon vertices from the geography_expression argument.

  • Measures: These functions compute the measurement of one or more Geography data types. For example the ST_Distance command:
    ST_DISTANCE(geography_1, geography_2) 

    The query above returns the shortest distance in meters between geograpphy_1 and geography_2 so far they are non-empty.

  • Aggregate Functions: These functions carry out Geography- specific aggregate functions. For example, the ST_UNION_AGG command:
    ST_UNION_AGG(geography)

    The query above returns a Geography the represents the point set Union of all input Geography data types.

BigQuery Data Types: ARRAY

ARRAY data types are ordered lists of zero or more elements of any none-ARRAY type. ARRAYs are represented using the angle brackets (< and >). The table below shows declarations that are  supported and unsupported by ARRAYs on BigQuery:

ARRAY DeclarationMeaning
ARRAY<Float 64>, ARRAY<Int 64>ARRAYs support majorly all data types.
ARRAY<NULL>BigQuery raises an error if a query is supposed to return an ARRAY that contains NULL, but you can use such ARRAYs inside your query.
ARRAY<ARRAY<INT 64>>ARRAYs of ARRAYs are not supported on BigQuery
ARRAY<Struct<ARRAY<INT 64>>>The way to declare an ARRAY of An ARRAY is by juxtaposing a Struct in between the ARRAY.

BigQuery Data Types: STRUCT

STRUCT types are containers of ordered fields with a data type that is required and an optional field name. Structs are also declared using the (< and >) angle brackets. The table below gives illustrations of possible declarations that are possible with Structs

STRUCT DeclarationMeaning
STRUCT<INT 64>STRUCT  support majorly all data types
STRUCT <x STRUCT <’y’ String, z INT 64>>A STRUCT nested inside another STRUCT named x. The STRUCT x has two fields y which is a string and z which is an integer.
STRUCT <ARRAY<INT 64>>A STRUCT that contains an ARRAY and the ARRAY holds an integer
STRUCT<NULL>STRUCT supports NULL values or fields that have NULL values.

Conclusion

Hope this blog gives a detailed overview of Google BigQuery data types, variants, and usage. Do you have any questions for us? Let us know in the comments. Understanding the intricacies of data types is crucial for optimizing queries, improving performance, and ensuring accurate analysis within the BigQuery ecosystem.

We at Hevo understand the complexities faced by organizations during the setup and use of their BigQuery data warehouse. Hevo is the only real-time ELT No-code Data Pipeline platform that cost-effectively automates data pipelines that are flexible to your needs. With integration with 150+ Data Sources (40+ free sources), we help you not only export data from sources & load data to the destinations but also transform & enrich your data, & make it analysis-ready.

Hevo offers a faster way to move data from Databases or SaaS applications such as Jira into your Data Warehouse to be visualized in a BI tool. Hevo is fully automated and hence does not require you to code.

Visit our Website to Explore Hevo

You can have a look at the unbeatable Hevo pricing that will help you choose the right plan for your business needs.

Sign Up for a 14 day free trial.

mm
Freelance Technical Content Writer, Hevo Data

Bukunmi is curious about learning on complex concepts and latest trends in data science and combines his flair for writing to curate content for data teams to help them solve business challenges.

No-code Data Pipeline for BigQuery