The snowflake substring function is a must-have in your arsenal. In this tutorial, you will learn how to use the Snowflake substring or substr function, the different arguments that can be used with it, and also some examples which will help you understand it better. First, let’s take a look at the Snowflake data warehouse.

Snowflake – An Overview

Snowflake is a cloud-hosted data warehouse platform that allows you to store, share and analyse your data. It has a  multi-cluster architecture that provides support for different clouds and also lets you mix and match between these different cloud platforms. It is available on a variety of platforms including AWS, Azure as well as Google Cloud Platform. Its in-built performance eliminates the need to manage any infrastructure. It is also economical and feasible due to its pay-for-what-you-use strategy. 

Snowflake enables easy integration with customized and packaged tools and applications. Snowflake string is one of the most important and extensively used data types in Snowflake. You can perform different operations on this data type. These include contains, replace, position and many more.

Snowflake Substring Blog: Snowflake Logo

What is Snowflake Substring Function?

This function can be used with either a VARCHAR or BINARY data type. It will return the part of the string or binary value specified by base_expr, starting from the character/byte determined by start_expr. The length of the substring can also be limited by using another argument.

The substr and substring functions are synonymous and can be used interchangeably in Snowflake.

Syntax of Snowflake Substring

Given below is the syntax for using Snowflake substring function:

SUBSTR( <base_expr>, <start_expr> [ , <length_expr> ] )
SUBSTRING( <base_expr>, <start_expr> [ , <length_expr> ] )

Where,

base_expr: This is the value from which you want to derive your substring. It must be either a VARCHAR or a BINARY value.

start_expr: This argument determines the starting point for the substring. You can either directly use an integer or an expression that evaluates to an integer. The start_expr help you specify the offset from which the substring starts. This offset is measured as follows:

  • In terms of the number of UTF-8 characters if the base_expr is of VARCHAR type.
  • In terms of the number of bytes if the base_expr is of BINARY type.

NOTE: The start position for the substr function starts from 1,  not 0. SUBSTR(‘xyz’, 1, 1) returns ‘x’, not ‘y’.

length_expr: This is an optional argument. It is used to determine the number of characters/bytes after the start_expr you want in the substring. It can either be an integer value or an expression that evaluates to an integer. It specifies the following:

  • The number of UTF-8 characters to return if the input is of VARCHAR type.
  • The number of bytes to return if the input is of BINARY type.

The length must be greater than or equal to zero. If the length is negative, the function returns an empty string.

return: The data type of the return value is the same as the input data type.

How to use Snowflake Substring Function?

The SUBSTRING () function returns the position of a string or binary value from the complete string, starting with the character specified by substring_start_index. If any input is null, null is returned.

Examples for Snowflake Substring

Example 1: Get the substring from a specific string in Snowflake

select substr('testing 1 2 3 4', 9, 7);
 
+---------------------------------+
| SUBSTR('TESTING 1 2 3 4', 9, 7) |
|---------------------------------|
| 1 2 3 4                         |
+---------------------------------+

Example 2: Get the substring from a specific string by using table data

select ID,name,substring(name,0,2) from emp;
 
+----+-------------+---------------------+
| ID | NAME        | SUBSTRING(NAME,0,2) |
|----+-------------+---------------------|
|  1 | Ram         | Ra                  |
|  2 | RAJU        | RA                  |
|  3 | Sameer      | Sa                  |
|  4 | Sai Kumar   | Sa                  |
|  5 | Naveen Sai  | Na                  |
|  6 | Seshi Kumar | Se                  |
|  7 | Rajiv Ram   | Ra                  |
+----+-------------+---------------------+

Example 3: Full Sample of Snowflake Substring function

-- Create a Table
 
create or replace TABLE EMP (
    ID NUMBER(38,0),
    NAME VARCHAR(30) COMMENT 'NAME OF THE EMPLOYEE',
    INFO VARCHAR(20) DEFAULT 'NOT-NULL'
);
 
-- Insert Data
 
insert into emp(id,name) values
(1,'Ram'),
(2,'RAJU'),
(3,'Sameer'),
(4,'Sai Kumar'),
(5,'Naveen Sai'),
(6,'Seshi Kumar'),
(7,'Rajiv Ram'),
(8,NULL)
 
-- Apply substring function
 
select id,name,substring(name,0,4) from emp;
 
-- Output
 
+----+-------------+---------------------+
| ID | NAME        | SUBSTRING(NAME,0,4) |
|----+-------------+---------------------|
|  1 | Ram         | Ram                 |
|  2 | RAJU        | RAJU                |
|  3 | Sameer      | Same                |
|  4 | Sai Kumar   | Sai                 |
|  5 | Naveen Sai  | Nave                |
|  6 | Seshi Kumar | Sesh                |
|  7 | Rajiv Ram   | Raji                |
|  8 | NULL        | NULL                |
+----+-------------+---------------------+

When you should use the SUBSTRING Function in Snowflake?

There are certain usage scenarios where it is recommended to use the Substring() function within Snowflake Cloud Data Warehouse which are:

  • To get a specific substring from an expression or string. You can also use the substring function if you want to get the substrings in reverse order from the strings. For example:
select id,name,substring(name,-2,2) from emp;
 
+----+-------------+----------------------+
| ID | NAME        | SUBSTRING(NAME,-2,2) |
|----+-------------+----------------------|
|  1 | Ram         | am                   |
|  2 | RAJU        | JU                   |
|  3 | Sameer      | er                   |
|  4 | Sai Kumar   | ar                   |
|  5 | Naveen Sai  | ai                   |
|  6 | Seshi Kumar | ar                   |
|  7 | Rajiv Ram   | am                   |
|  8 | NULL        | NULL                 |
+----+-------------+----------------------+

If you use the substrings in reverse order, use the starting index as a negative value.

Collation Details

  • Matching is applied to VARCHAR inputs. No collation is applied if the input data type of the first parameter is BINARY.
  • No effect. Matches are syntactically acceptable, but matches do not affect the process. For example, a language that contains two or three digits (such as “dzs” in Hungarian or “ch” in Czech) counts these as two or three characters (not one character) as length arguments.
  • Sorting the results corresponds to sorting the inputs. This is useful when you want to pass the return value to another function as part of a nested function call.

Conclusion

In this blog post, you have seen the different benefits that Snowflake provides. You have also learned how to use the Snowflake substring function.

Optimize your MongoDB queries by effectively using substring aggregation. Explore methods to extract string segments in the MongoDB substring aggregation article.

Hevo Data, an Alternative you Can Depend on to Load Data Efficiently

Hevo is the only real-time ELT No-code Data Pipeline platform that cost-effectively automates data pipelines that are flexible to your needs. With integration with 150+ Data Sources (40+ free sources), we help you not only export data from sources & load data to the destinations but also transform & enrich your data, & make it analysis-ready.

Start for free now!

Get Started with Hevo for Free

Share your experience of learning about Snowflake Substring! Let us know in the comments section below!

Skand Agrawal
Customer Experience Engineer, Hevo Data

Skand is a dedicated Customer Experience Engineer at Hevo Data, specializing in MySQL, Postgres, and REST APIs. With three years of experience, he efficiently troubleshoots customer issues, contributes to the knowledge base and SOPs, and assists customers in achieving their use cases through Hevo's platform.