Press "Enter" to skip to content

Google BigQuery Data Types: A Comprehensive Guide

Having a firm understanding of Google BigQuery Data types is necessary if you are to take full advantage of the warehousing tool’s on-demand offerings and capabilities.

We at Hevo Data (Hevo is a unified data integration platform that helps customers bring data from 100s of sources to Google BigQuery in real-time without writing any code) often come across customers who are in the process of setting up their BigQuery Warehouse for analytics. Many find it overwhelming to understand the different data types provided by BigQuery and how to map the data from their source data types to BigQuery data types. This comprehensive blog is an attempt to help simplify BigQuery data types for all those who might need help. Let us dive in.

BigQuery supports various data types. Each data type has several functions and operations that apply to them and procedures that are restricted to them. 

Google BigQuery Data types

In this blog post, we will be going through a thorough explanation of the various BigQuery data types. It also covers details on multiple functions and operations permitted and restricted to each data type. The prominent BigQuery data types are:

  1. NUMERIC
  2. BOOLEAN
  3. STRING
  4. BYTES
  5. TIME
  6. GEOGRAPHY
  7. ARRAY
  8. STRUCT

BigQuery Data Types: NUMERIC

Numeric data types supported on BigQuery are:

  • Integer(INT 64): Represents numbers within the range of  -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 and numbers that do not have fractional components
  • Float (Float 64): Numbers with approximate numeric values and fractional components
  • Numeric: There is a data type called ‘NUMERIC’ which is similar to ‘Decimal’ which can store values with 38 decimal digits of precision and nine decimal digits of scale. Suitable for exact calculations

Functions supported by numeric data types include:

Arithmetic/ Mathematical  Functions

Numeric data types support basic arithmetic operations such as addition, subtraction, multiplication, etc.  The columns and rows that contain numeric type data can always be operated with mathematical functions. An SQL query that involves a mathematical operation is shown below:

With new_data AS (
SELECT 'Male' as Gender, 181  AS Height, 80.2 As Weight
UNION ALL SELECT ‘Female', 168, 74.2 
)
SELECT *, (height/weight) AS height_to_weight_ratio 
FROM new_data

In our query above, we performed an arithmetic operation, which is a division on an integer and a float. The query yields: 

Row Gender Height  Weight Height_to_weight_ratio
1 Male 181 80.2 2.256857855
2 Female 168 74.2 2.264150943

Standard Compliant Floating Point Division

Applying the division operator when the denominator is zero will return a divide by zero error. Also, infinite inputs that would yield a non-finite output will return an overflow error. To avoid these errors from crashing your SQL queries, so you won’t have to be checking for zero values or infinite values beforehand, it is advised to use specific functions to execute the operation. Specific functions like the IEEE_divide returns NaN (not-a-number) when a division by zero is attempted, this will enable your query to run without returning any errors. The example  below gives an illustration on how you can use the IEEE_divide function:

With new_data AS (
SELECT 'Male' as Gender, 181  AS height, 80.2 As weight
UNION ALL SELECT 'Female', 168, 74.2 
UNION ALL SELECT 'Female', 170, 0)
SELECT *,(IEEE_Divide (height,weight)) AS height_to_weight_ratio 
FROM new_data

The Query yields:

Row Gender Height  Weight Height_to_weight_ratio
1 Male 181 80.2 2.256857855
2 Female 168 74.2 2.264150943
3 Female 170 0.0 Infinity

Safe Functions

Safe functions enable you to return NULL instead of returning an error for mathematical operations that are undefined. You can apply the Safe Function by prefixing your operation with the SAFE command. In the example below, we try the logarithm to a negative number which would have otherwise returned an error because the Log of a negative number is undefined, but with the SAFE command, it returns NULL. 

SELECT SAFE.LOG(10, -5) AS Log_1, 

SAFE.LOG(10, 5) AS Log_2

The query  yields:

Row 1 Log_1 Log_2
1 null 1.431

Comparisons

Numerical data types support comparison operators. Comparison operators  (<, <=, >, >=, != ,<>) are used to obtain the results of comparison. The sorting order of values in BigQuery from the least to the greatest is:

  1. Null
  2. NaN
  3. -inf (negative infinity)
  4. Negative numbers 
  5. 0 and -0 (all zero numbers are considered equal)
  6. Positive numbers 
  7. +inf (positive infinity)

The table below gives an example of comparisons and their results.

Left Term Operator Right Term Returns
NaN = Any Value False
NaN < Any Value False
Any Value > NaN False
-0.0 != 0.0 False

These comparisons come in handy when we want to use the WHERE clause to filter out some values. The query below gives an illustration:

With new_data AS(
SELECT 'Male' AS Gender,  181 as Height,  80.2 as Weight
UNION ALL SELECT 'Female', 168, 74.2
UNION ALL SELECT 'Female', NULL, 84
UNION ALL SELECT 'Male', IEEE_Divide(-3,0), 0)


SELECT * FROM new_data 

WHERE Height < 100

The query yields:

Row Gender Height Weight
1 Male -inf 80.0

The WHERE clause returns only those rows for which the result is TRUE. 

Precise Decimal Calculations with Numeric 

Financial and accounting applications usually require exact values for calculations; this why the numeric data type is an ideal type to utilize. Numeric data types in BigQuery uses 16 Bytes for storage and can represent decimal figures accurately; this is the reason why it is an ideal type for financial calculations. 

For illustration, let’s assume we want to carry out some financial calculations, and we need the results to be exact. The query below shows how vital Numeric data types are in our financial applications:

WITH new_data AS ( 
SELECT NUMERIC '5.42' AS Cost
UNION ALL SELECT NUMERIC '7.43'
UNION ALL SELECT NUMERIC '8.94'
)

SELECT 
SUM(Cost) AS Total_cost
FROM new_data

The query yields:

Row Total_cost
1 21.79

From our query above if we had used the default Numerical datatype which is Float 64 to carry out our calculation, the tiny differences between how each number is represented in memory and the way we entered it will add up and made our final calculations inexact. Using the Numeric data type saves us from that problem. 

BigQuery Data Types: BOOLEAN

Boolean data types are variables that are represented by the keyword True or Force.  Boolean data type supports the following operations:

Logical Operations

Logical operators are commands that help to filter a result based on some set of conditions. Filtering is done using a WHERE clause and the logical operators set the conditions. Logical operators on BigQuery include  AND, OR and NOT. The result returned when using a WHERE clause and a logical Operator is based on if the WHERE condition is (TRUE or FALSE).

Conditional Operations

Booleans are also very useful in Conditional clauses. Conditional clauses simplify queries by providing an option and also providing an alternative if the clause is not satisfied. In the Query below, we want to compute a new column, which is the sum of the height and weight, but some instances have Null values. We can use a conditional clause like IF to enter a new value for all cases that have NULL values. 

WITH new_data AS(
SELECT 181 AS Height, 80 AS Weight
UNION ALL SELECT NULL, 74
UNION ALL SELECT 174, NULL)

SELECT *, (IF(Height is NULL, 170,Height) +
IF(Weight is NULL, 80,Weight)) AS sum_height_weight
FROM new_data

The query yields:

Row Height Weight sum_height_weight
1 181 80 261
2 null 74 244
3 174 null 254

The IF clause works in such a way that if the first parameter in the condition is TRUE, it returns the second parameter, and if FALSE, it returns the third parameter.

BigQuery Data Types: STRINGS

Strings are Unicode characters with variable length. String manipulation is an essential requisite for working with data. BigQuery provides an arsenal of built-in functions that helps enhance your string manipulation tasks. Strings must be quoted either with single, double or triple quotation marks. The functions supported by strings on BigQuery include:

Casting and Coercion

You can CAST a string as an INT 64 or FLOAT 64 to parse it; you can also cast an integer or float back a string. The CAST Command helps us convert from one data type to another. 

SELECT 
CAST (42 AS STRING),
CAST('42' AS INT64)


In some instances, the CAST operation fails if the data conversion is unsuccessful, SAFE_CAST will return NULL, and the query will run successfully. The type of conversion done using SAFE_CAST is called coercion. In the query above,  a numeric type is converted to a string and vice versa. 

SELECT CAST('true' AS bool),
SAFE_CAST ('new' AS bool)

The query above will fail if we don’t use the SAFE_CAST command as it’s not possible to convert a string like ‘new’ to a boolean data type. With SAFE_CAST command NULL is returned instead.

The format of the data we are converting to can also be customized using the FORMAT command. The query below gives an illustration.

SELECT 
FORMAT ('%3d', 500)AS A, 
FORMAT ('%5.3f', 55.5555) AS B,
FORMAT ('%03d', 78) AS C

The query yields:

Row B C
1 500 55.556 078

With the FORMAT command, we have customized how the String data converted will appear. 

Concatenation

The CONCAT command is a convenient tool used to manipulate strings. It is used to concatenate separate parts of different strings into one. 

WITH example AS (
SELECT 'John@abc.com' AS email, 'India' as country
UNION ALL SELECT 'Sam@bca.com', 'Canada'
UNION ALL SELECT 'Lucy@cab.com', 'Germany'
)
SELECT email, country,
 CONCAT(
SUBSTR(email, 1, STRPOS(email, '@')-1),
' from ', country) AS callers
FROM example

The query yields:

Row email country callers
1 John@abc.com India John from India
2 Sam@bac.com Canada Sam from Canada
3 Lucy@cab.com Germany Lucy from Germany

From the query above, we concatenated the email and country columns and also added conjunction between the strings form the two columns. We used the SUBSTR command to extract the substrings and  STRPOS to find the ‘@’ separator character before concatenating them.

Regular Expressions

Regular expressions enable you to find specific characters, patterns and perform

powerful searches on your String data type. BigQuery supports various Regex operations; they include:

REGEXP_CONTAINS: Find a regex pattern or character

REGEXP_EXTRACT: Remove the regex pattern 

REGEXP_REPLACE: Replace the regex pattern.

The query below gives an illustration on a Regex operation onBigQuery.

SELECT column,
 REGEXP_CONTAINS (column, r'\d{5}') present
 FROM (SELECT *
 from unnest(['12345', '1234', '12345-9876']) AS column
 )

The query yields:

Row Column Present
1 12345 true
2 1234 false
3 12345-9876 true

In the query above, we used a regex to match values that had five digits or more. The Values that matched the regex returned True and those that didn’t return False.

BigQuery Data Types: BYTES

Bytes are distinct from strings and should not be interchanged, though most functions supported on strings are also supported on bytes. Bytes are also represented using either single, double, or triple quotation marks, but for bytes, you must ensure it begins the prefix, letter B(b). The query below gives an illustration on working with Byte data type:

SELECT * FROM new_data 
WHERE  Height = b'16D6M7PN3w7Cn8mJyrmrUSZY9ummMf5QCGEMuiSmSlw

BigQuery Data Types: TIME AND DATE

BigQuery supports various data types for Time and Date instances they include:

  • DATE: refers to a valid calendar date independent of a time zone. So a particular DATE will represent a different period when recorded in different time zones
  • TIME: refers to a time, independent of a specific date.
  • DATETIME: This refers to a point in time; it is time you see when you look at your watch. It is usually when an event happens, any other precision is irrelevant
  • TIMESTAMP refers to an exact point in time with a microsecond precision regardless of location. Datetime is part of Timestamp with just with a lower level of precision. 

These data types are all quite interchangeable, just with different levels of precision. So we can extract DateTime from a timestamp and cast a timestamp to a DateTime. The query below gives an illustration:

SELECT
EXTRACT(TIME FROM CURRENT_TIMESTAMP()) as ct
EXTRACT(DATETIME FROM CURRENT_TIMESTAMP()) as dt, CAST(CURRENT_DATETIME() AS TIMESTAMP) as ts

 In the query above, we extracted the Time and Datetime from the current Timestamp,

and we also CAST the current Datetime as Timestamp. It gives us an overview of how we can apply BigQuery’s Datetime and Timestamp modules to our applications.

BigQuery Data Types: TIMEZONE

It is also pertinent that we understand BigQuery’s timezone format. Timezones are used when

parsing timestamps or formatting timestamps for display. Timezones are represented using these formats:

  • Offset from Coordinated Universal Time(UTC): When using this format, no spaces are allowed between the time zone and the rest of the Timestamp. An example is shown below:

    2019-08-21 12:30:00.45+1:00
    
    #where +1 is the time zone
  • Time zone name from tz database: When using this format, space is required between the name of the timezone and the rest of the time stamp. An example is shown below:

    
    2019-08-21 12:30:00.45 India/New_Delhi
    

BigQuery Data Types: GEOGRAPHY (GIS)

The Geography data type is used to represent points, lines, and polygons on the surface of the earth.  In BigQuery, all the geographical positions are represented in  WGS84 reference ellipsoid which is the same ellipsoid used in the Global Positioning System (GPS), hence the longitude and latitude can be used directly in BigQuery to find the location of a place. Geography Functions on BigQuery are peculiar and are prefixed with ST_. signature. Functions supported by Geography data type on BigQuery are classified based on their use-cases. The classifications include:

  • Constructors: These functions build new Geography values from co-ordinates or existing Geography data. An example is:

    
    ST_GEOGPOINT(longitude, latitude)
    
  • Parsers: These functions create Geography data types from external formats like  WKT and GeoJSONAn example is:
    ST_GEOGFROMTEXT(wkt_string)
  • Formatters: These are the opposite of Parsers, they extract Geography data types to an external format such as WKT and GeoJSON
    ST_ASTEXT(geography_expression)
  • Transformations: These functions are used to generate a new Geography data type from other Geography data type. Examples include:
    ST_INTERSECTION(geography_1, geography_2) ST_BOUNDARY(geography_expression)
  • Predicates: are used to filter clauses in Geography data types, they return TRUE or FALSE for relationships between Geographies or properties of Geography data types. For example, the ST_DWITHIN command is a predicate.

    ST_DWITHIN(geography_1, geography_2),range )

    The query above returns TRUE if the distance between at least one point in geography_1 and one point in geography_2 is less than or equal to the distance given by the range argument.

  • Accessors: These functions provide unrestricted access to the property of geography. An example is ST_NUMPOINTS
    ST_NUMPOINTS(geography_expression)

    The query above returns the number of points, the number of line string vertices and the number of polygon vertices from the geography_expression argument.

  • Measures: These functions compute the measurement of one or more Geography data type. For example the ST_Distance command:
    ST_DISTANCE(geography_1, geography_2)
    

    The query above returns the shortest distance in meters between geograpphy_1 and geography_2 so far they are non-empty.

  • Aggregate Functions: These functions carry out Geography- specific aggregate functions. For example, the ST_UNION_AGG command:
    ST_UNION_AGG(geography)

    The query above returns a Geography the represents the point set Union of all input Geography data type.

BigQuery Data Types: ARRAY

ARRAY data types are ordered list of zero or more elements of any none-ARRAY type. ARRAYs are represented using the angle brackets (< and >). The table below shows declarations that are  supported and unsupported by ARRAYs on BigQuery:

ARRAY Declaration Meaning
ARRAY<Float 64>, ARRAY<Int 64> ARRAYs support majorly all data types.
ARRAY<NULL> BigQuery raises an error if a query is supposed to return an ARRAY that contains NULL, but you can use such ARRAYs inside your query.
ARRAY<ARRAY<INT 64>> ARRAYs of ARRAYs are not supported on BigQuery
ARRAY<Struct<ARRAY<INT 64>>> The way to declare an ARRAY of An ARRAY by juxtaposing a Struct in between the ARRAY.

BigQuery Data Types: STRUCT

STRUCT types are containers of ordered fields with a data type that is required and a field name which is optional. Structs are also declared using the (< and >) angle brackets. The table below gives illustrations of possible declarations that are possible with Structs

STRUCT Declaration Meaning
STRUCT<INT 64> STRUCT  support majorly all data types
STRUCT <x STRUCT <’y’ String, z INT 64>> A STRUCT nested inside another STRUCT named x. The STRUCT x has two fields y which is a string and z which is an integer.
STRUCT <ARRAY<INT 64>> A STRUCT that contains an ARRAY and the ARRAY holds an integer
STRUCT<NULL> STRUCT supports NULL values or fields that have NULL values.

Hope this blog gives a detailed overview of Google BigQuery data types, the variants and usage. Do you have any questions for us? Let us know in the comments.

ETL Data to Redshift, Bigquery, Snowflake

Move Data from any Source to Warehouse in Real-time

Sign up today to get $500 Free Credits to try Hevo!
Start Free Trial