BigQuery SUBSTR Function Simplified 101

on Data Warehouse, Google BigQuery, Tutorials • September 13th, 2021 • Write for Hevo

Venturing into Data Science and deciding on a tool to use to solve a given problem can be challenging at times especially when you have a wide array of choices. In this age of data transformation where organizations are constantly seeking out ways to improve the day to day handling of data being produced and looking for methods to minimize the cost of having these operations, it has become imperative to handle such data transformations in the Cloud as it is a lot easier to manage and is also cost-efficient.

Data Warehousing architectures have rapidly changed over the years and most of the notable service providers are now Cloud-based. Therefore, companies are increasingly on the move to align with such offerings on the Cloud as it provides them with a lower upfront cost, enhances scalability, and performance as opposed to traditional On-premise Data Warehousing systems. Google BigQuery is among one of the well-known and widely accepted Cloud-based Data Warehouse Applications.

With data being the driving force behind all the strategic as well as operational decisions in the modern-day enterprise, providing the right information, at the right time, to the right person is the key. With the ever-increasing flowing information, both relevant as well as irrelevant, grasping the crux is of utmost importance. Google BigQuery provides its users the autonomy to extract, visualize and analyze only the relevant data in any given dataset.

In this article, you will gain information about BigQuery SUBSTR function. You will also gain a holistic understanding of Google BigQuery, its key features, substrings, and the significance of using the BigQuery SUBSTR function. Read along to find out in-depth information about the BigQuery SUBSTR function.

Table of Contents

Introduction to Google BigQuery

BigQuery SUBSTR function - Google BigQuery
Image Source

Google BigQuery is a Cloud-based Data Warehouse that provides a Big Data Analytic Web Service for processing petabytes of data. It is intended for analyzing data on a large scale. It consists of two distinct components: Storage and Query Processing. It employs the Dremel Query Engine to process queries and is built on the Colossus File System for storage. These two components are decoupled and can be scaled independently and on-demand.

Google BigQuery is fully managed by Cloud service providers. We don’t need to deploy any resources, such as discs or virtual machines. It is designed to process read-only data. Dremel and Google BigQuery use Columnar Storage for quick data scanning, as well as a tree architecture for executing queries using ANSI SQL and aggregating results across massive computer clusters. Furthermore, owing to its short deployment cycle and on-demand pricing, Google BigQuery is serverless and designed to be extremely scalable.

For further information about Google Bigquery, follow the Official Documentation.

Key Features of Google BigQuery

BigQuery SUBSTR function - Features of BigQuery
Image Source

Some of the key features of Google BigQuery are as follows:

1) Performance

Partitioning is supported by BigQuery, which improves Query performance. The data may be readily queried using SQL or Open Database Connectivity (ODBC)

2) Scalability

Being quite elastic, BigQuery separates computation and storage, allowing customers to scale processing and memory resources according to their needs. The tool has significant vertical and horizontal scalability and runs real-time queries on petabytes of data in a very short period.

3) Security

When a third-party authorization exists, users can utilize OAuth as a standard approach to get the cluster. By default, all data is encrypted and in transit. Cloud Identity and Access Management (IAM) allows for fine-tuning administration.

4) Usability

Google BigQuery is a highly user-friendly platform that requires a basic understanding of SQL commands, ETL tools, etc.

5) Data Types

 It supports JSON and XML file formats.

6) Data Loading

It employs the conventional ELT/ETL Batch Data Loading techniques by employing standard SQL dialect, as well as Data Streaming to load data row by row using Streaming APIs.

7) Integrations

In addition to operational databases, the system supports integration with a wide range of data integration tools, business intelligence (BI), and artificial intelligence (AI) solutions. It also works with Google Workspace and Cloud Platform.

8) Data Recovery

Data backup and disaster recovery are among the services provided by Google BigQuery. Users can query point-in-time snapshots of data changes from the last seven days.

9) Pricing Models

The Google BigQuery platform is available in both on-demand and flat-rate subscription models. Although data storage and querying will be chargedexporting, loading, and copying data is free. It has separated computational resources from storage resources. You are only charged when you run queries. The quantity of data processed during searches is billed.

Understanding Substrings 

With the ever-growing digitization, enterprises are investing big in Analysis and Insights Extraction. Powered by Google, BigQuery is a high-performance managed data house following the Platform as a Service (PaaS) business model. It essentially lays its foundations on the serverless architecture and utilizes the cloud-based network/storage for enabling the enterprise to take data-driven optimal decisions. 

As the data increases in its scale and scope, the aspects of business that aren’t required as such for planning and strategy also get recorded, resulting in data clutter. Data clutter, though doesn’t sound much, can overshadow some critical insights and subsequently can result in considerable opportunity costs. Major components of any dataset can be strings, integers, floats, constants, booleans, and special characters, etc.; Google BigQuery provides users with the functionality to cut the clutter and display only the requisite information. Not only clutter, sometimes selective data is required to get better insights out of it. For the string and bytes data, users can visualize, update and modify certain parts of it using the BigQuery SUBSTR function.

Simplify BigQuery ETL and Analysis with Hevo’s No-code Data Pipeline

A fully managed No-code Data Pipeline platform like Hevo Data helps you integrate and load data from 100+ different sources (including 30+ free sources) to a Data Warehouse such as Google BigQuery or Destination of your choice in real-time in an effortless manner. Hevo with its minimal learning curve can be set up in just a few minutes allowing the users to load data without having to compromise performance. Its strong integration with umpteenth sources allows users to bring in data of different kinds in a smooth fashion without having to code a single line. 

Get Started with Hevo for Free

Check out some of the cool features of Hevo:

  • Completely Automated: The Hevo platform can be set up in just a few minutes and requires minimal maintenance.
  • Transformations: Hevo provides preload transformations through Python code. It also allows you to run transformation code for each event in the Data Pipelines you set up. You need to edit the event object’s properties received in the transform method as a parameter to carry out the transformation. Hevo also offers drag and drop transformations like Date and Control Functions, JSON, and Event Manipulation to name a few. These can be configured and tested before putting them to use.
  • Connectors: Hevo supports 100+ integrations to SaaS platforms, files, Databases, analytics, and BI tools. It supports various destinations including Google BigQuery, Amazon Redshift, Snowflake Data Warehouses; Amazon S3 Data Lakes; and MySQL, SQL Server, TokuDB, DynamoDB, PostgreSQL Databases to name a few.  
  • Real-Time Data Transfer: Hevo provides real-time data migration, so you can have analysis-ready data always.
  • 100% Complete & Accurate Data Transfer: Hevo’s robust infrastructure ensures reliable data transfer with zero data loss.
  • Scalable Infrastructure: Hevo has in-built integrations for 100+ sources (including 30+ free sources) that can help you scale your data infrastructure as required.
  • 24/7 Live Support: The Hevo team is available round the clock to extend exceptional support to you through chat, email, and support calls.
  • Schema Management: Hevo takes away the tedious task of schema management & automatically detects the schema of incoming data and maps it to the destination schema.
  • Live Monitoring: Hevo allows you to monitor the data flow so you can check where your data is at a particular point in time.
Sign up here for a 14-Day Free Trial!

Understanding the BigQuery SUBSTR Function

Strings and bytes data types can be sorted by many techniques in Google BigQuery but the most widely used parameter in the ANSI SQL model is the BigQuery SUBSTR function.

A) Syntax

The syntax for the BigQuery SUBSTR function is very user-intuitive and is given as follows:

SUBSTR(value, position[, length])

It will return the substring component of the given value. The position argument belongs to the integer family and specifies the starting point of the substring with the value of position = 1 indicating the first character or byte. If the position is entered as -1, the substring will display the results from the last character of the given string or byte value.

The length argument indicates the maximum number of characters for STRING arguments or Bytes for BYTE arguments in the specified Substring parameter.

If position is a position at the left end of the STRING (position = 0 or position < -LENGTH(value), the function begins at position = 1. It returns less than length characters if the length exceeds the length of the value. The BigQuery SUBSTR function returns an error if the length is less than 0.

B) Usage 

The use of the built-in BigQuery SUBSTR function can make complex calculations and visualizations very easy for the end-users. Not only it helps reduce the clutter during the analytics but also takes a smaller toll on the processing capacity of the system. Primarily used by the developers, the BigQuery SUBSTR function can be utilized in conjunction with other BigQuery parameters to enhance the performance. 

C) Examples Code/Queries 

The following example illustrates the usage of the BigQuery SUBSTR function:

SUBSTR('contact@yuichiotsuka.com', 9, 6)

Result: yuichi

BigQuery SUBSTR Function - SUBSTR Result
Image Source

Working with BigQuery Left and Right Functions

In normal SQL syntax, LEFT and RIGHT functions are supported which return specific values on the left or right side of the input parameter. 

BigQuery SUBSTR function - Left SQL
Image Source
BigQuery SUBSTR function - Right SQL
Image Source

BigQuery LEFT Function

The LEFT function returns the value (can be STRING or BYTES type) consisting of leftmost characters from the given content or parameters. 

A) Syntax

LEFT(value, length)

B) Usage 

The subject function is used to trim the characters from the complex entries in the dataset from the left side. The usefulness of this function becomes pretty evident while dealing with specific data types e.g., the email addresses, IP addresses, etc.  

C) Example Queries

A very easy-to-understand example of the use of the LEFT function, quoted from Google BigQuery’s Official Library, is given as follows:

WITH examples AS
(SELECT 'apple' as example
UNION ALL
SELECT 'banana' as example
UNION ALL
SELECT 'абвгд' as example
)
SELECT example, LEFT(example, 3) AS left_example
FROM examples;

Output:
+———+————–+
| example | left_example |
+———+————–+
| apple   | app          |
| banana  | ban          |
| абвгд   | абв          |
+———+————–+

The RIGHT function returns the value (can be STRING or BYTES type) consisting of leftmost characters from the given content or parameters. 

A) Syntax

RIGHT(value, length)

B) Usage

The subject function is used to trim the characters from the complex entries in the dataset from the right side. The usefulness of this function becomes pretty evident while dealing with specific data types e.g., alphanumeric values, the ones involving special characters, etc. 

C) Examples Queries

A very easy-to-understand example of the use of the RIGHT function, quoted from Google BigQuery’s Official Library, is given as follows:

WITH examples AS
(SELECT 'apple' as example
UNION ALL
SELECT 'banana' as example
UNION ALL
SELECT 'абвгд' as example
)
SELECT example, RIGHT(example, 3) AS right_example
FROM examples;


Output:
+———+—————+
| example | right_example |
+———+—————+
| apple   | ple           |
| banana  | ana           |
| абвгд   | вгд           |
+———+—————+

Key Consideration in using LEFT and RIGHT Functions

While working with Google BigQuery, unfortunately, some versions of SQL don’t support simple LEFT and RIGHT functions, rather they are used via the BigQuery SUBSTR function which in many aspects enhances the applicability of the parameters. The queries can be updated by making some minor changes in the syntax by replacing the LEFT function with BigQuery SUBSTR function as discussed below:

FunctionBigQuery Use with BigQuery SUBSTRDescription
LEFT (manuscript, A)SUBSTR (manuscript, 1, A)The 2nd parameter in the function will always be 1. The requisite length of the output can be entered in the 3rd argument i.e., A.
RIGHT (manuscript, B)SUBSTR (text, –BB)The 2nd parameter in the function is the negative of the number of values the user wants in the output followed by the actual number. 

The use-cases of the aforementioned syntax i.e., RIGHT and LEFT functions with BigQuery SUBSTR function are given as follows:

FunctionBigQuery Use with BigQuery SUBSTROutput
LEFT (“Commands”, 3)SUBSTR (“Commands”,1,3)Com
RIGHT (“Commands”, 4)SUBSTR (“Commands”,-4,4)ands

Conclusion

In this article, you have learned about Google BigQuery, its key features, and Substrings. This article also provided information on the BigQuery SUBSTR function, their significance, and how they can replace LEFT and RIGHT functions.

Hevo Data, a No-code Data Pipeline provides you with a consistent and reliable solution to manage data transfer between a variety of sources and a wide variety of Desired Destinations with a few clicks.

Visit our Website to Explore Hevo

Hevo Data with its strong integration with 100+ data sources (including 30+ free sources) allows you to not only export data from your desired data sources & load it to the destination of your choice such as Google BigQuery, but also transform & enrich your data to make it analysis-ready so that you can focus on your key business needs and perform insightful analysis using BI tools. 

Sign Up for a 14-day free trial and experience the feature-rich Hevo suite first hand. You may also have a look at the amazing price, which will assist you in selecting the best plan for your requirements.

Share your experience of understanding the BigQuery SUBSTR function in the comment section below! We would love to hear your thoughts.

No-code Data Pipeline for your Google BigQuery