8 Google BigQuery Data Types: A Comprehensive Guide

Bukunmi I • Last Modified: September 4th, 2023


Having a firm understanding of Google BigQuery Data types is necessary if you are to take full advantage of the warehousing tool’s on-demand offerings and capabilities.

We at Hevo Data (Hevo is a unified data integration platform that helps customers bring data from 100s of sources to Google BigQuery in real-time without writing any code) often come across customers who are in the process of setting up their BigQuery Warehouse for analytics. Many find it overwhelming to understand the different data types provided by BigQuery and how to map the data from their source data types to BigQuery data types. This comprehensive blog is an attempt to help simplify BigQuery data types for all those who might need help. Let us dive in.

BigQuery supports a rich set of data types. Each data type has functions and operators that apply to it, as well as operations that are restricted to it.

In this blog post, we will go through a thorough explanation of the various BigQuery data types, covering the functions and operations permitted and restricted for each of them.


Download the Cheatsheet on How to Set Up High-performance ETL to BigQuery
Learn the best practices and considerations for setting up high-performance ETL to BigQuery

Introduction to Google BigQuery


BigQuery is Google’s fully managed, highly scalable analytics data warehouse, offered as a RESTful web service. Companies use BigQuery to store, ingest, query, and retrieve insights from their datasets.

With BigQuery, the solution is delivered as a serverless Software as a Service (SaaS) which means your Data professionals can spend most of their time analyzing data to find meaningful insights, using standard SQL. BigQuery’s per-second billing approach is also very attractive to most businesses. You can read about the features of Google BigQuery here.

Simplify your Data Analysis with Hevo’s No-code Data Pipeline

A fully managed No-code Data Pipeline platform like Hevo helps you integrate and load data from 100+ different sources to a destination of your choice in real-time like Google BigQuery in an effortless manner.

Get Started with Hevo for Free

Hevo, with its minimal learning curve, can be set up in just a few minutes, allowing users to load data without having to compromise performance. Its strong integration with a wide range of sources allows users to bring in data of different kinds in a smooth fashion without having to write a single line of code. 

Check out some of the cool features of Hevo:

  • Completely Automated: The Hevo platform can be set up in just a few minutes and requires minimal maintenance.
  • Transformations: Hevo provides preload transformations through Python code. It also allows you to run transformation code for each event in the pipelines you set up. You need to edit the event object’s properties received in the transform method as a parameter to carry out the transformation. Hevo also offers drag and drop transformations like Date and Control Functions, JSON, and Event Manipulation to name a few. These can be configured and tested before putting them to use.
  • Connectors: Hevo supports 100+ integrations to SaaS platforms, files, databases, analytics, and BI tools. It supports various destinations including Google BigQuery, Amazon Redshift, Snowflake Data Warehouses; Amazon S3 Data Lakes; and MySQL, SQL Server, TokuDB, DynamoDB, PostgreSQL databases to name a few.  
  • Real-Time Data Transfer: Hevo provides real-time data migration, so you can have analysis-ready data always.
  • 100% Complete & Accurate Data Transfer: Hevo’s robust infrastructure ensures reliable data transfer with zero data loss.
  • Scalable Infrastructure: Hevo has in-built integrations for 100+ sources, that can help you scale your data infrastructure as required.
  • 24/7 Live Support: The Hevo team is available round the clock to extend exceptional support to you through chat, email, and support calls.
  • Schema Management: Hevo takes away the tedious task of schema management & automatically detects the schema of incoming data and maps it to the destination schema.
  • Live Monitoring: Hevo allows you to monitor the data flow so you can check where your data is at a particular point in time.
Sign up here for a 14-Day Free Trial!

Understanding the Different Google BigQuery Data Types


Here is a list of the different Data Types you can find in Google BigQuery:


BigQuery Data Types: NUMERIC

Numeric data types supported on BigQuery are:

  • Integer (INT64): Represents whole numbers, i.e., numbers without fractional components, within the range of -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807.
  • Float (FLOAT64): Represents approximate numeric values with fractional components.
  • Numeric (NUMERIC): A decimal type, similar to ‘Decimal’ in other systems, that can store values with 38 digits of precision and 9 decimal digits of scale. Suitable for exact calculations.
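A quick way to see the three numeric types side by side is to select one literal of each; the column names below are just illustrative:

```sql
SELECT
  42 AS int_value,                   -- INT64: whole numbers only
  42.5 AS float_value,               -- FLOAT64: approximate fractional values
  NUMERIC '42.50' AS numeric_value   -- NUMERIC: exact decimal values
```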

Functions supported by numeric BigQuery data types include:

Arithmetic/ Mathematical  Functions

Numeric BigQuery data types support basic arithmetic operations such as addition, subtraction, multiplication, etc.  The columns and rows that contain numeric type data can always be operated with mathematical functions. An SQL query that involves a mathematical operation is shown below:

WITH new_data AS (
SELECT 'Male' AS Gender, 181 AS Height, 80.2 AS Weight
UNION ALL SELECT 'Female', 168, 74.2
)
SELECT *, (Height/Weight) AS height_to_weight_ratio
FROM new_data

In our query above, we performed an arithmetic operation, which is a division on an integer and a float type. The query yields: 

Row | Gender | Height | Weight | height_to_weight_ratio
1 | Male | 181 | 80.2 | 2.256857855
2 | Female | 168 | 74.2 | 2.264150943

Standard Compliant Floating Point Division

Applying the division operator when the denominator is zero returns a divide-by-zero error. Likewise, finite inputs that would yield a non-finite output return an overflow error. To keep these errors from crashing your SQL queries, so you don’t have to check for zero or non-finite values beforehand, it is advised to use specific functions to execute the operation. IEEE_DIVIDE, for example, returns Infinity when a non-zero value is divided by zero, and NaN (not-a-number) when zero is divided by zero, which enables your query to run without returning any errors. The example below illustrates how you can use the IEEE_DIVIDE function:

WITH new_data AS (
SELECT 'Male' AS Gender, 181 AS Height, 80.2 AS Weight
UNION ALL SELECT 'Female', 168, 74.2
UNION ALL SELECT 'Female', 170, 0)
SELECT *, IEEE_DIVIDE(Height, Weight) AS height_to_weight_ratio
FROM new_data

The Query yields:

Row | Gender | Height | Weight | height_to_weight_ratio
1 | Male | 181 | 80.2 | 2.256857855
2 | Female | 168 | 74.2 | 2.264150943
3 | Female | 170 | 0.0 | Infinity

Safe Functions

Safe functions enable you to return NULL instead of an error for mathematical operations that are undefined. You apply them by prefixing your function with SAFE. In the example below, we take a logarithm with a negative base, which is undefined and would otherwise have returned an error; with the SAFE prefix, it returns NULL instead. 

SELECT SAFE.LOG(10, -5) AS Log_1,
SAFE.LOG(10, 5) AS Log_2

The query  yields:

Row | Log_1 | Log_2
1 | null | 1.431

Comparisons

Numerical data types support comparison operators. Comparison operators (=, <, <=, >, >=, !=, <>) are used to compare operands and obtain the result of the comparison. The sorting order of values in BigQuery, from least to greatest, is:

  1. Null
  2. NaN
  3. -inf (negative infinity)
  4. Negative numbers 
  5. 0 and -0 (all zero numbers are considered equal)
  6. Positive numbers 
  7. +inf (positive infinity)
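A sketch of this ordering: sorting a column that contains NULL, NaN, infinities, and ordinary numbers with ORDER BY should return the rows in exactly the sequence listed above (IEEE_DIVIDE is used here to produce the non-finite values).

```sql
SELECT value
FROM UNNEST([1.5, -2.5, 0.0,
             IEEE_DIVIDE(0, 0),     -- NaN
             IEEE_DIVIDE(-1, 0),    -- -inf
             IEEE_DIVIDE(1, 0),     -- +inf
             CAST(NULL AS FLOAT64)]) AS value
ORDER BY value
```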

The table below gives an example of comparisons and their results.

Left Term | Operator | Right Term | Returns
NaN | = | Any Value | FALSE
NaN | < | Any Value | FALSE
Any Value | > | NaN | FALSE
-0.0 | != | 0.0 | FALSE

These comparisons come in handy when we want to use the WHERE clause to filter out some values. The query below gives an illustration:

WITH new_data AS (
SELECT 'Male' AS Gender, 181 AS Height, 80.2 AS Weight
UNION ALL SELECT 'Female', 168, 74.2
UNION ALL SELECT 'Female', NULL, 84
UNION ALL SELECT 'Male', IEEE_DIVIDE(-3, 0), 0)
SELECT * FROM new_data
WHERE Height < 100

The query yields:

Row | Gender | Height | Weight
1 | Male | -inf | 0

The WHERE clause returns only those rows for which the result is TRUE. 

Precise Decimal Calculations with Numeric 

Financial and accounting applications usually require exact values for calculations, which is why NUMERIC is the ideal data type to use for them. NUMERIC values in BigQuery use 16 bytes of storage and represent decimal figures exactly. 

For illustration, let’s assume we want to carry out some financial calculations, and we need the results to be exact. The query below shows how vital Numeric data types are in our financial applications:

WITH new_data AS ( 
SELECT NUMERIC '5.42' AS Cost
UNION ALL SELECT NUMERIC '7.43'
UNION ALL SELECT NUMERIC '8.94'
)
SELECT 
SUM(Cost) AS Total_cost
FROM new_data

The query yields:

Row | Total_cost
1 | 21.79

From our query above, if we had used the default floating-point type, FLOAT64, to carry out our calculation, the tiny differences between how each number is represented in memory and the value we entered would add up and make our final result inexact. Using the NUMERIC data type saves us from that problem. 

BigQuery Data Types: BOOLEAN

Boolean data types are variables represented by the keywords TRUE or FALSE. The Boolean data type supports the following operations:

Logical Operations

Logical operators are commands that help to filter a result based on some set of conditions. Filtering is done using a WHERE clause, and the logical operators set the conditions. Logical operators on BigQuery include AND, OR, and NOT. The rows returned when using a WHERE clause with a logical operator depend on whether the WHERE condition evaluates to TRUE or FALSE.
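A minimal sketch, reusing the running height/weight example: the WHERE clause below keeps only the rows where both conditions hold, and NOT inverts the parenthesized test.

```sql
WITH new_data AS (
  SELECT 'Male' AS Gender, 181 AS Height, 80.2 AS Weight
  UNION ALL SELECT 'Female', 168, 74.2
  UNION ALL SELECT 'Female', 155, 51.0
)
SELECT *
FROM new_data
-- Keep females who are NOT either very short or heavier than 80
WHERE Gender = 'Female' AND NOT (Height < 160 OR Weight > 80)
```

Only the row (Female, 168, 74.2) satisfies all the conditions.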

Conditional Operations

Booleans are also very useful in Conditional clauses. Conditional clauses simplify queries by providing an option and also providing an alternative if the clause is not satisfied. In the Query below, we want to compute a new column, which is the sum of the height and weight, but some instances have Null values. We can use a conditional clause like IF to enter a new value for all cases that have NULL values. 

WITH new_data AS(
SELECT 181 AS Height, 80 AS Weight
UNION ALL SELECT NULL, 74
UNION ALL SELECT 174, NULL)
SELECT *, (IF(Height is NULL, 170,Height) +
IF(Weight is NULL, 80,Weight)) AS sum_height_weight
FROM new_data

The query yields:

Row | Height | Weight | sum_height_weight
1 | 181 | 80 | 261
2 | null | 74 | 244
3 | 174 | null | 254

The IF clause works in such a way that if the first parameter in the condition is TRUE, it returns the second parameter, and if FALSE, it returns the third parameter.

BigQuery Data Types: STRINGS

Strings are Unicode characters with variable lengths. String manipulation is an essential requisite for working with data. BigQuery provides an arsenal of built-in functions that helps enhance your string manipulation tasks. Strings must be quoted either with single, double, or triple quotation marks. The functions supported by strings on BigQuery include:

Casting and Coercion

You can CAST a string as an INT64 or FLOAT64 to parse it; you can also cast an integer or float back to a string. The CAST command helps us convert a value from one data type to another. 

SELECT 
CAST (42 AS STRING),
CAST('42' AS INT64)


In some instances, the CAST operation fails if the data conversion is unsuccessful; SAFE_CAST, by contrast, returns NULL in that case and the query runs successfully. Coercion is the related implicit conversion BigQuery performs automatically when an expression needs a different type, without an explicit CAST. In the query above, a numeric type is converted to a string and vice versa. 

SELECT CAST('true' AS bool),
SAFE_CAST ('new' AS bool)

The query above will fail if we don’t use the SAFE_CAST command as it’s not possible to convert a string like ‘new’ to a boolean data type. With the SAFE_CAST command, NULL is returned instead.

The format of the data we are converting to can also be customized using the FORMAT command. The query below gives an illustration.

SELECT 
FORMAT ('%3d', 500)AS A, 
FORMAT ('%5.3f', 55.5555) AS B,
FORMAT ('%03d', 78) AS C

The query yields:

Row | A | B | C
1 | 500 | 55.556 | 078

With the FORMAT command, we have customized how the String data converted will appear. 

Concatenation

The CONCAT command is a convenient tool used to manipulate strings. It is used to concatenate separate parts of different strings into one. 

WITH example AS (
SELECT 'John@abc.com' AS email, 'India' as country
UNION ALL SELECT 'Sam@bca.com', 'Canada'
UNION ALL SELECT 'Lucy@cab.com', 'Germany'
)
SELECT email, country,
CONCAT(
SUBSTR(email, 1, STRPOS(email, '@')-1),
' from ', country) AS callers
FROM example

The query yields:

Row | email | country | callers
1 | John@abc.com | India | John from India
2 | Sam@bca.com | Canada | Sam from Canada
3 | Lucy@cab.com | Germany | Lucy from Germany

From the query above, we concatenated the email and country columns and also added conjunction between the strings from the two columns. We used the SUBSTR command to extract the substrings and  STRPOS to find the ‘@’ separator character before concatenating them.

Regular Expressions

Regular expressions enable you to find specific characters and patterns and perform powerful searches on your String data. BigQuery supports various regex functions; they include:

REGEXP_CONTAINS: Returns TRUE if the string contains a match for the regex pattern.

REGEXP_EXTRACT: Returns the substring matched by the regex pattern.

REGEXP_REPLACE: Replaces all matches of the regex pattern with a replacement string.

The query below gives an illustration of a regex operation on BigQuery.

SELECT column,
 REGEXP_CONTAINS(column, r'\d{5}') AS present
 FROM (SELECT *
 FROM UNNEST(['12345', '1234', '12345-9876']) AS column
 )

The query yields:

Row | column | present
1 | 12345 | true
2 | 1234 | false
3 | 12345-9876 | true

In the query above, we used a regex to match values containing a run of at least five consecutive digits. The values that matched the regex returned TRUE, and those that didn’t returned FALSE.
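The other two regex functions behave analogously; a quick sketch with illustrative input strings:

```sql
SELECT
  REGEXP_EXTRACT('12345-9876', r'\d{5}') AS first_match,  -- returns '12345'
  REGEXP_REPLACE('12345-9876', r'\d', 'x') AS masked      -- returns 'xxxxx-xxxx'
```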

BigQuery Data Types: BYTES

Bytes are distinct from strings and should not be interchanged, though most functions supported on strings are also supported on bytes. Bytes are also represented using either single, double, or triple quotation marks, but a bytes literal must begin with the prefix letter B (b). The query below gives an illustration of working with the BYTES data type:

SELECT * FROM new_data
WHERE Height = b'16D6M7PN3w7Cn8mJyrmrUSZY9ummMf5QCGEMuiSmSlw'
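A self-contained sketch of moving between strings and bytes (the values here are illustrative):

```sql
SELECT
  b'hello' AS byte_literal,               -- BYTES literal, note the b prefix
  CAST('hello' AS BYTES) AS from_string,  -- STRING to BYTES
  CAST(b'hello' AS STRING) AS to_string   -- BYTES back to STRING
```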

BigQuery Data Types: TIME AND DATE

BigQuery supports various data types for Time and Date instances they include:

  • DATE: refers to a valid calendar date, independent of any time zone. A particular DATE can therefore represent a different period when observed in different time zones.
  • TIME: refers to a time of day, independent of a specific date.
  • DATETIME: refers to a point in time; it is the date and time you see when you look at your watch, with no time zone attached.
  • TIMESTAMP: refers to an exact point in time with microsecond precision, regardless of location. A DATETIME can be thought of as a TIMESTAMP without the time zone information. 
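Each of these types has its own literal syntax, which is often the easiest way to tell them apart; the values below are illustrative:

```sql
SELECT
  DATE '2019-08-21' AS d,                         -- calendar date only
  TIME '12:30:00.45' AS t,                        -- time of day only
  DATETIME '2019-08-21 12:30:00.45' AS dt,        -- date + time, no zone
  TIMESTAMP '2019-08-21 12:30:00.45+01:00' AS ts  -- absolute point in time
```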

These data types are largely interconvertible. So we can extract a DATETIME from a TIMESTAMP and cast a TIMESTAMP to a DATETIME. The query below gives an illustration:

SELECT
EXTRACT(TIME FROM CURRENT_TIMESTAMP()) AS ct,
EXTRACT(DATETIME FROM CURRENT_TIMESTAMP()) AS dt,
CAST(CURRENT_DATETIME() AS TIMESTAMP) AS ts

In the query above, we extracted the Time and Datetime from the current Timestamp, and we also cast the current Datetime as a Timestamp. It gives us an overview of how we can apply BigQuery’s Datetime and Timestamp functions in our applications.

BigQuery Data Types: TIMEZONE

It is also pertinent that we understand BigQuery’s timezone format. Timezones are used when parsing timestamps or formatting timestamps for display. Timezones are represented using these formats:

  • Offset from Coordinated Universal Time (UTC): When using this format, no spaces are allowed between the time zone offset and the rest of the timestamp. An example is shown below:

    2019-08-21 12:30:00.45+01:00  -- where +01:00 is the time zone offset

  • Time zone name from the tz database: When using this format, a space is required between the name of the time zone and the rest of the timestamp. An example is shown below:

    2019-08-21 12:30:00.45 Asia/Kolkata
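Both formats can be supplied when parsing or formatting a timestamp; a sketch (the zone choices here are illustrative):

```sql
SELECT
  -- Parse a civil datetime string as a timestamp in a named time zone
  TIMESTAMP('2019-08-21 12:30:00.45', 'Asia/Kolkata') AS ts_from_name,
  -- Format a timestamp literal back out in UTC
  FORMAT_TIMESTAMP('%F %T%Ez',
      TIMESTAMP '2019-08-21 12:30:00.45+01:00', 'UTC') AS ts_in_utc
```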

BigQuery Data Types: GEOGRAPHY (GIS)

The Geography data type is used to represent points, lines, and polygons on the surface of the earth. In BigQuery, all geographical positions are represented on the WGS84 reference ellipsoid, the same ellipsoid used by the Global Positioning System (GPS); hence longitude and latitude can be used directly in BigQuery to find the location of a place. Geography functions on BigQuery are distinctive in that they are prefixed with ST_. The functions supported by the Geography data type are classified based on their use cases. The classifications include:

  • Constructors: These functions build new Geography values from coordinates or existing Geography data. An example is:
     
     ST_GEOGPOINT(longitude, latitude) 
  • Parsers: These functions create Geography data types from external formats like WKT and GeoJSON. An example is:
    ST_GEOGFROMTEXT(wkt_string)
  • Formatters: These are the opposite of Parsers; they export Geography data types to an external format such as WKT and GeoJSON
    ST_ASTEXT(geography_expression)
  • Transformations: These functions are used to generate a new Geography data type from other Geography data types. Examples include:
    ST_INTERSECTION(geography_1, geography_2) ST_BOUNDARY(geography_expression)
  • Predicates: are used to filter clauses in Geography data types, they return TRUE or FALSE for relationships between Geographies or properties of Geography data types. For example, the ST_DWITHIN command is a predicate.
    ST_DWITHIN(geography_1, geography_2, range)

    The query above returns TRUE if the distance between at least one point in geography_1 and one point in geography_2 is less than or equal to the distance given by the range argument.

  • Accessors: These functions provide access to the properties of a Geography without side effects. An example is ST_NUMPOINTS:
    ST_NUMPOINTS(geography_expression)

    The query above returns the number of points, the number of line string vertices, and the number of polygon vertices from the geography_expression argument.

  • Measures: These functions compute the measurement of one or more Geography data types. For example the ST_Distance command:
    ST_DISTANCE(geography_1, geography_2) 

    The query above returns the shortest distance in meters between geography_1 and geography_2, provided they are non-empty.

  • Aggregate Functions: These functions carry out Geography- specific aggregate functions. For example, the ST_UNION_AGG command:
    ST_UNION_AGG(geography)

    The query above returns a Geography that represents the point-set union of all input Geography values.
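Putting a constructor and a measure together, a minimal sketch (the coordinates are approximate city locations, given as longitude then latitude):

```sql
SELECT
  ST_DISTANCE(
    ST_GEOGPOINT(77.21, 28.61),   -- New Delhi
    ST_GEOGPOINT(72.88, 19.08)    -- Mumbai
  ) AS distance_in_meters
```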

BigQuery Data Types: ARRAY

ARRAY data types are ordered lists of zero or more elements of any non-ARRAY type. ARRAYs are declared using the angle brackets (< and >). The table below shows declarations that are supported and unsupported by ARRAYs on BigQuery:

ARRAY Declaration | Meaning
ARRAY<FLOAT64>, ARRAY<INT64> | ARRAYs support most data types.
ARRAY<NULL> | BigQuery raises an error if a query returns an ARRAY that contains NULL, but you can use such ARRAYs inside your query.
ARRAY<ARRAY<INT64>> | ARRAYs of ARRAYs are not supported on BigQuery.
ARRAY<STRUCT<ARRAY<INT64>>> | The way to declare an ARRAY of ARRAYs is to interpose a STRUCT between the two ARRAYs.
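These declarations can be exercised directly in a query; note how the last column wraps the inner ARRAY in a STRUCT, since ARRAYs of ARRAYs are not supported directly:

```sql
SELECT
  [1, 2, 3] AS int_array,
  ARRAY_LENGTH([1, 2, 3]) AS len,                -- returns 3
  [STRUCT([1, 2] AS inner)] AS wrapped_arrays    -- ARRAY<STRUCT<ARRAY<INT64>>>
```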

BigQuery Data Types: STRUCT

STRUCT types are containers of ordered fields, each with a required data type and an optional field name. STRUCTs are also declared using the angle brackets (< and >). The table below illustrates declarations that are possible with STRUCTs:

STRUCT Declaration | Meaning
STRUCT<INT64> | STRUCTs support most data types.
STRUCT<x STRUCT<y STRING, z INT64>> | A STRUCT named x nested inside another STRUCT. The STRUCT x has two fields: y, a string, and z, an integer.
STRUCT<ARRAY<INT64>> | A STRUCT that contains an ARRAY of integers.
STRUCT<NULL> | STRUCTs support NULL values or fields that have NULL values.
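A short sketch of declaring a STRUCT and accessing its fields with dot notation (the field names are illustrative):

```sql
WITH example AS (
  SELECT STRUCT('John' AS name, 181 AS height) AS person
)
SELECT
  person.name,    -- 'John'
  person.height   -- 181
FROM example
```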

Conclusion

We hope this blog gives you a detailed overview of Google BigQuery data types, their variants, and their usage. Do you have any questions for us? Let us know in the comments.

Hevo offers a faster way to move data from Databases or SaaS applications such as Jira into your Data Warehouse to be visualized in a BI tool. Hevo is fully automated and hence does not require you to code.

Visit our Website to Explore Hevo

You can have a look at the unbeatable pricing that will help you choose the right plan for your business needs.

Sign Up for a 14-day free trial.

No-code Data Pipeline for BigQuery