Understanding the multitude of data types provided by BigQuery and effectively mapping data from source data types to BigQuery data types can be a daunting task. These data types help you store, structure, and process your data. To fully harness the potential capabilities and on-demand offerings of this warehousing tool, it is imperative to have a solid grasp of Google BigQuery Data types.
In this blog post, you will explore various BigQuery data types with a detailed explanation of their unique set of functions, operations, and specific procedures. As you move ahead through this detailed description, you will also gain valuable insights into how to leverage BigQuery’s data types effectively for your analytical journey.
Understanding the Different Google BigQuery Data Types
Here is a list of the different Data Types you can find in Google BigQuery:
A fully managed No-code Data Pipeline platform like Hevo helps you integrate and load data from 150+ different sources to a destination of your choice in real-time like Google BigQuery effortlessly.
Check out some of the cool features of Hevo:
- 100% Complete & Accurate Data Transfer: Hevo’s robust infrastructure ensures reliable data transfer with zero data loss.
- Scalable Infrastructure: Hevo has in-built integrations for 150+ Data sources, that can help you scale your data infrastructure as required.
Sign up here for a 14-Day Free Trial!
BigQuery Data Types: NUMERIC
BigQuery supports the following numeric data types:
- Integer(INT 64): Represents numbers within the range of -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 and numbers that do not have fractional components
- Float (Float 64): Numbers with approximate numeric values and fractional components
- Numeric: There is a data type called ‘NUMERIC’ which is similar to ‘Decimal’ which can store values with 38 decimal digits of precision and nine decimal digits of scale. Suitable for exact calculations
Functions supported by numeric BigQuery data types include:
Arithmetic/ Mathematical Functions
Numeric BigQuery data types support basic arithmetic operations such as addition, subtraction, multiplication, etc. The columns and rows that contain numeric type data can always be operated with mathematical functions. An SQL query that involves a mathematical operation is shown below:
With new_data AS (
SELECT 'Male' as Gender, 181 AS Height, 80.2 As Weight
UNION ALL SELECT ‘Female', 168, 74.2
)
SELECT *, (height/weight) AS height_to_weight_ratio
FROM new_data
In our query above, we performed an arithmetic operation, which is a division on an integer and a float type. The query yields:
Row | Gender | Height | Weight | Height_to_weight_ratio |
1 | Male | 181 | 80.2 | 2.256857855 |
2 | Female | 168 | 74.2 | 2.264150943 |
Standard Compliant Floating Point Division
Applying the division operator when the denominator is zero will return a divide by zero error. Also, infinite inputs that would yield a non-finite output will return an overflow error. To avoid these errors from crashing your SQL queries, so you won’t have to be checking for zero values or infinite values beforehand, it is advised to use specific functions to execute the operation.
Specific functions like the IEEE_divide returns NaN (not-a-number) when a division by zero is attempted, this will enable your query to run without returning any errors. The example below gives an illustration of how you can use the IEEE_divide function:
With new_data AS (
SELECT 'Male' as Gender, 181 AS height, 80.2 As weight
UNION ALL SELECT 'Female', 168, 74.2
UNION ALL SELECT 'Female', 170, 0)
SELECT *,(IEEE_Divide (height,weight)) AS height_to_weight_ratio
FROM new_data
The output of the above query will be:
Row | Gender | Height | Weight | Height_to_weight_ratio |
1 | Male | 181 | 80.2 | 2.256857855 |
2 | Female | 168 | 74.2 | 2.264150943 |
3 | Female | 170 | 0.0 | Infinity |
Safe Functions
Safe functions enable you to return NULL instead of returning an error for mathematical operations that are undefined. You can apply the Safe Function by prefixing your operation with the SAFE command. In the example below, we try the logarithm to a negative number which would have otherwise returned an error because the Log of a negative number is undefined, but with the SAFE command, it returns NULL.
SELECT SAFE.LOG(10, -5) AS Log_1,
SAFE.LOG(10, 5) AS Log_2
The query yields:
Row 1 | Log_1 | Log_2 |
1 | null | 1.431 |
Precise Decimal Calculations with Numeric
Financial and accounting applications usually require exact values for calculations; this is why the numeric data type is an ideal type to utilize. Numeric data types in BigQuery use 16 Bytes for storage and can represent decimal figures accurately; this is the reason why it is an ideal type for financial calculations.
For illustration, let’s assume we want to carry out some financial calculations, and we need the results to be exact. The query below shows how vital Numeric data types are in our financial applications:
WITH new_data AS (
SELECT NUMERIC '5.42' AS Cost
UNION ALL SELECT NUMERIC '7.43'
UNION ALL SELECT NUMERIC '8.94'
)
SELECT
SUM(Cost) AS Total_cost
FROM new_data
The query yields:
From our query above if we had used the default Numerical datatype which is Float 64 to carry out our calculation, the tiny differences between how each number is represented in memory and the way we entered it will add up and made our final calculations inexact. Using the Numeric data type saves us from that problem.
Integrate your Source to BigQuery Effortlessly!
No credit card required
BigQuery Data Types: BOOLEAN
Boolean data types are variables that are represented by the keyword True or Force. Boolean data type supports the following operations:
Logical Operations
Logical operators are commands that help to filter a result based on some set of conditions. Filtering is done using a WHERE clause and the logical operators set the conditions. Logical operators on BigQuery include AND, OR, and NOT. The result returned when using a WHERE clause and a logical Operator is based on if the WHERE condition is (TRUE or FALSE).
Conditional Operations
Booleans are also very useful in Conditional clauses. Conditional clauses simplify queries by providing an option and also providing an alternative if the clause is not satisfied. In the Query below, we want to compute a new column, which is the sum of the height and weight, but some instances have Null values. We can use a conditional clause like IF to enter a new value for all cases that have NULL values.
WITH new_data AS(
SELECT 181 AS Height, 80 AS Weight
UNION ALL SELECT NULL, 74
UNION ALL SELECT 174, NULL)
SELECT *, (IF(Height is NULL, 170,Height) +
IF(Weight is NULL, 80,Weight)) AS sum_height_weight
FROM new_data
The query yields:
Row | Height | Weight | sum_height_weight |
1 | 181 | 80 | 261 |
2 | null | 74 | 244 |
3 | 174 | null | 254 |
The IF clause works in such a way that if the first parameter in the condition is TRUE, it returns the second parameter, and if FALSE, it returns the third parameter.
BigQuery Data Types: JSON
Google BigQuery supports the JSON data types for handling JSON-formatted or semi-structured data, which represents data in key:value pairs. JSON supports varying and flexible data structures, allowing you to store and manipulate data with different fields and nested structures without a fixed schema within your BigQuery tables.
Here’s an example of how you can create a column in BigQuery with the JSON data type:
CREATE TABLE table_name (
Id INT64;
json-data JSON;
);
BigQuery JSON also provides a set of functions and operators such as JSON_VALUE, JSON_QUERY, and JSON_ARRAY. This enables you to query and extract specific values from JSON data in your SQL queries.
BigQuery Data Types: STRINGS
Strings are Unicode characters with variable lengths. String manipulation is an essential requisite for working with data. BigQuery provides an arsenal of built-in functions that helps enhance your string manipulation tasks. Strings must be quoted either with single, double, or triple quotation marks. The functions supported by strings on BigQuery include:
Casting and Coercion
You can CAST a string as an INT 64 or FLOAT 64 to parse it; you can also cast an integer or float back a string. The CAST Command helps us convert from one data type to another.
SELECT
CAST (42 AS STRING),
CAST('42' AS INT64)
In some instances, the CAST operation fails if the data conversion is unsuccessful, SAFE_CAST will return NULL, and the query will run successfully. The type of conversion done using SAFE_CAST is called coercion. In the query above, a numeric type is converted to a string and vice versa.
SELECT CAST('true' AS bool),
SAFE_CAST ('new' AS bool)
The query above will fail if we don’t use the SAFE_CAST command as it’s not possible to convert a string like ‘new’ to a boolean data type. With the SAFE_CAST command, NULL is returned instead.
The format of the data we are converting to can also be customized using the FORMAT command. The query below gives an illustration.
SELECT
FORMAT ('%3d', 500)AS A,
FORMAT ('%5.3f', 55.5555) AS B,
FORMAT ('%03d', 78) AS C
The query yields:
With the FORMAT command, we have customized how the String data converted will appear.
Concatenation
The CONCAT command is a convenient tool used to manipulate strings. It is used to concatenate separate parts of different strings into one.
WITH example AS (
SELECT 'John@abc.com' AS email, 'India' as country
UNION ALL SELECT 'Sam@bca.com', 'Canada'
UNION ALL SELECT 'Lucy@cab.com', 'Germany'
)
SELECT email, country,
CONCAT(
SUBSTR(email, 1, STRPOS(email, '@')-1),
' from ', country) AS callers
FROM example
The query yields:
Row | email | country | callers |
1 | John@abc.com | India | John from India |
2 | Sam@bac.com | Canada | Sam from Canada |
3 | Lucy@cab.com | Germany | Lucy from Germany |
From the query above, we concatenated the email and country columns and also added conjunction between the strings from the two columns. We used the SUBSTR command to extract the substrings and STRPOS to find the ‘@’ separator character before concatenating them.
BigQuery Data Types: BYTES
Bytes are distinct from strings and should not be interchanged, though most functions supported on strings are also supported on bytes. Bytes are also represented using either single, double, or triple quotation marks, but for bytes, you must ensure it begins with the prefix, letter B(b). The query below gives an illustration on working with Byte data type:
SELECT * FROM new_data
WHERE Height = b'16D6M7PN3w7Cn8mJyrmrUSZY9ummMf5QCGEMuiSmSlw
BigQuery Data Types: TIME AND DATE
BigQuery supports various data types for Time and Date instances they include:
- DATE: refers to a valid calendar date independent of a time zone. So a particular DATE will represent a different period when recorded in different time zones
- TIME: refers to a time, independent of a specific date.
- DATETIME: This refers to a point in time; it is time you see when you look at your watch. It is usually when an event happens, any other precision is irrelevant
- TIMESTAMP refers to an exact point in time with a microsecond precision regardless of location. Datetime is part of Timestamp with just a lower level of precision.
These data types are all quite interchangeable, just with different levels of precision. So we can extract DateTime from a timestamp and cast a timestamp to a DateTime. The query below gives an illustration:
SELECT
EXTRACT(TIME FROM CURRENT_TIMESTAMP()) as ct
EXTRACT(DATETIME FROM CURRENT_TIMESTAMP()) as dt, CAST(CURRENT_DATETIME() AS TIMESTAMP) as ts
After executing the above query, it’ll extract the Time and Datetime from the current Timestamp and CAST the current Datetime as Timestamp. This illustration gives an overview of how you can apply BigQuery’s Datetime and Timestamp modules in diverse scenarios.
BigQuery Data Types: TIMEZONE
It is also pertinent that we understand BigQuery’s timezone format. Timezones are used when parsing timestamps or formatting timestamps for display. Timezones are represented using these formats:
BigQuery Data Types: GEOGRAPHY (GIS)
The Geography data type is used to represent points, lines, and polygons on the surface of the earth. In BigQuery, all the geographical positions are represented in WGS84 reference ellipsoid which is the same ellipsoid used in the Global Positioning System (GPS), hence the longitude and latitude can be used directly in BigQuery to find the location of a place. Geography Functions on BigQuery are peculiar and are prefixed with ST_. signature. Functions supported by Geography data type on BigQuery are classified based on their use-cases. The classifications include:
- Constructors: These functions build new Geography values from coordinates or existing Geography data. An example is:
ST_GEOGPOINT(longitude, latitude)
- Parsers: These functions create Geography data types from external formats like WKT and GeoJSON. An example is:
ST_GEOGFROMTEXT(wkt_string)
- Formatters: These are the opposite of Parsers, they extract Geography data types to an external format such as WKT and GeoJSON
ST_ASTEXT(geography_expression)
- Transformations: These functions are used to generate a new Geography data type from other Geography data types. Examples include:
ST_INTERSECTION(geography_1, geography_2) ST_BOUNDARY(geography_expression)
- Predicates: are used to filter clauses in Geography data types, they return TRUE or FALSE for relationships between Geographies or properties of Geography data types. For example, the ST_DWITHIN command is a predicate.
ST_DWITHIN(geography_1, geography_2),rangeThe query above returns TRUE if the distance between at least one point in geography_1 and one point in geography_2 is less than or equal to the distance given by the range argument.
- Accessors: These functions provide unrestricted access to the property of geography. An example is ST_NUMPOINTS
ST_NUMPOINTS(geography_expression)
The query above returns the number of points, the number of line string vertices, and the number of polygon vertices from the geography_expression argument.
- Measures: These functions compute the measurement of one or more Geography data types. For example the ST_Distance command:
ST_DISTANCE(geography_1, geography_2)
The query above returns the shortest distance in meters between geograpphy_1 and geography_2 so far they are non-empty.
- Aggregate Functions: These functions carry out Geography- specific aggregate functions. For example, the ST_UNION_AGG command:
ST_UNION_AGG(geography)The query above returns a Geography the represents the point set Union of all input Geography data types.
Integrate Adroll to BigQuery
Integrate Amazon DocumentDB to BigQuery
Integrate Amazon Ads to BigQuery
BigQuery Data Types: ARRAY
ARRAY data types are ordered lists of zero or more elements of any none-ARRAY type. ARRAYs are represented using the angle brackets (< and >). The table below shows declarations that are supported and unsupported by ARRAYs on BigQuery:
ARRAY Declaration | Meaning |
ARRAY<Float 64>, ARRAY<Int 64> | ARRAYs support majorly all data types. |
ARRAY<NULL> | BigQuery raises an error if a query is supposed to return an ARRAY that contains NULL, but you can use such ARRAYs inside your query. |
ARRAY<ARRAY<INT 64>> | ARRAYs of ARRAYs are not supported on BigQuery |
ARRAY<Struct<ARRAY<INT 64>>> | The way to declare an ARRAY of An ARRAY is by juxtaposing a Struct in between the ARRAY. |
BigQuery Data Types: STRUCT
STRUCT types are containers of ordered fields with a data type that is required and an optional field name. Structs are also declared using the (< and >) angle brackets. The table below gives illustrations of possible declarations that are possible with Structs
STRUCT Declaration | Meaning |
STRUCT<INT 64> | STRUCT support majorly all data types |
STRUCT <x STRUCT <’y’ String, z INT 64>> | A STRUCT nested inside another STRUCT named x. The STRUCT x has two fields y which is a string and z which is an integer. |
STRUCT <ARRAY<INT 64>> | A STRUCT that contains an ARRAY and the ARRAY holds an integer |
STRUCT<NULL> | STRUCT supports NULL values or fields that have NULL values. |
Conclusion
Hope this blog gives a detailed overview of Google BigQuery data types, variants, and usage. Understanding the intricacies of data types is crucial for optimizing queries, improving performance, and ensuring accurate analysis within the BigQuery ecosystem.
With integration with 150+ Data Sources (60+ free sources), Hevo helps you not only export data from sources & load data to the destinations but also transform & enrich your data, & make it analysis-ready.
You can have a look at the unbeatable Hevo pricing that will help you choose the right plan for your business needs. Sign Up for a 14-day free trial. Do you have any questions? Let us know in the comments.
FAQs
1. What type of data can be stored in BigQuery?
BigQuery can store various types of data, including structured data (like tables with rows and columns), semi-structured data (such as JSON and Avro), and unstructured data (like text files). It’s designed to handle large datasets efficiently for analysis and reporting.
2. What are the different types of tables in BigQuery?
BigQuery has several types of tables: Native Tables (standard tables for storing data), External Tables (linked to data stored outside BigQuery, like Google Cloud Storage), and Temporary Tables (used for session-specific queries that don’t need to be saved).
3. What is the preferred data format for BigQuery?
The preferred data formats for BigQuery are Avro and Parquet because they support efficient storage and compression, making them ideal for big data analytics. Additionally, CSV and JSON are also supported but may not be as efficient for large datasets.
Bukunmi is curious about learning on complex concepts and latest trends in data science and combines his flair for writing to curate content for data teams to help them solve business challenges.