If you have been scouting for the right article to learn about Regex Postgres, you have come to the right place. When there is no requirement for an exact match in the query, but you still want to see all the entries that fit the criteria, pattern matching can be helpful. The LIKE operator works flawlessly for standard SQL operations, but it appears to have some performance concerns when filtering a vast database.

Additionally, the LIKE operator’s filtering condition is restricted to using only wildcards (percent) to discover patterns. PostgreSQL employs Regular Expressions to get around pattern matching. In this article, we will learn about PostgreSQL Regex. PostgreSQL uses POSIX regular expressions, which are better than LIKE and SIMILAR TO operators used for pattern matching.

PostgreSQL Regex Match Operators

PostgreSQL Regex Match Operators
Match Operators

How does Regex PostgreSQL work?

PostgreSQL Regex Use Cases
Image Source: Self

Sample Table & Use Cases

We will cover the following use cases to understand the working of PostgreSQL Regex.

Scale your data integration effortlessly with Hevo’s Fault-Tolerant No Code Data Pipeline

As you master PostgreSQL, data replication from PostgreSQL is also important to know. 1000+ data teams rely on Hevo’s Data Pipeline Platform to integrate data from over 150+ sources in a matter of minutes.

Hevo puts complete control in the hands of data teams with intuitive dashboards for pipeline monitoring, auto-schema management, custom ingestion/loading schedules. All of this combined with transparent pricing and 24×7 support makes us the most loved data pipeline software on review sites.

Take our 14-day free trial to experience a better way to manage data pipelines.


Sign up here for a 14-Day Free Trial!

Before we deep dive to learn about the working of Regex, let us create our sample table using the following lines of code.

CREATE TABLE

CREATE TABLE Email(
ID SERIAL PRIMARY KEY,
Name TEXT NOT NULL,
Email TEXT);

INSERT VALUES INTO THE TABLE

INSERT INTO Email(Name,Email) VALUES('Pratibha','pratibha67@sqlguide.edu');
INSERT INTO Email(Name,Email) VALUES('Bhavya','bhavyaa@pythonguide.com');
INSERT INTO Email(Name,Email) VALUES('Disha','disha@sqlguide.edu');
INSERT INTO Email(Name,Email) VALUES('Divanshi','divanshi23@spguide.com');
INSERT INTO Email(Name,Email) VALUES('Srishti','srishti@sqlguide.edu');
INSERT INTO Email(Name,Email) VALUES('Kartik','kartik@pythonguide.com');
INSERT INTO Email(Name,Email) VALUES('Rytham','rytham6@sqlguide.edu');
INSERT INTO Email(Name,Email) VALUES('Madhav','madhav@tsinfo.edu');
INSERT INTO Email(Name,Email) VALUES('Tanisha','tanisha@pythonguide.com');
INSERT INTO Email(Name,Email) VALUES('Radhika','radhika41@spguide.com');

OUTPUT

SELECT *
FROM Email;

—--OUTPUT—--

id |   name   |          email          
----+----------+-------------------------
  1 | Pratibha | pratibha67@sqlguide.edu
  2 | Bhavya   | bhavyaa@pythonguide.com
  3 | Disha    | disha@sqlguide.edu
  4 | Divanshi | divanshi23@spguide.com
  5 | Srishti  | srishti@sqlguide.edu
  6 | Kartik   | kartik@pythonguide.com
  7 | Rytham   | rytham6@sqlguide.edu
  8 | Madhav   | madhav@tsinfo.edu
  9 | Tanisha  | tanisha@pythonguide.com
 10 | Radhika  | radhika41@spguide.com
(10 rows)

Let us understand the syntax & basic working of PostgreSQL Regex by the following example:

SELECT *
FROM Email
WHERE Email ~ '^.*$';

—--OUTPUT—--

id |   name   |          email          
----+----------+-------------------------
  1 | Pratibha | pratibha67@sqlguide.edu
  2 | Bhavya   | bhavyaa@pythonguide.com
  3 | Disha    | disha@sqlguide.edu
  4 | Divanshi | divanshi23@spguide.com
  5 | Srishti  | srishti@sqlguide.edu
  6 | Kartik   | kartik@pythonguide.com
  7 | Rytham   | rytham6@sqlguide.edu
  8 | Madhav   | madhav@tsinfo.edu
  9 | Tanisha  | tanisha@pythonguide.com
 10 | Radhika  | radhika41@spguide.com
(10 rows)

As shown above, we have used PostgreSQL Regex using the TILDE (~) operator and the wildcard ‘.*’. This query will select all the records from the Email table with a valid email. Since the pattern condition is only the wildcard, it will fetch all the records from the table. Also, another essential point to note while writing PostgreSQL Regular expressions is that the pattern matching statement always starts with a ‘^’ operator and ends with a ‘$’ sign. These two operators mark the regular expression statement’s beginning and end. Summing up the figure, the expression can be implemented as “^.*$” while using wildcards to filter all records.

Match Regular Expression (Case Sensitive)

SELECT *
FROM Email
WHERE Email ~ 'sqlguide';
—--OUTPUT—--
id |   name   |          email          
----+----------+-------------------------
  1 | Pratibha | pratibha67@sqlguide.edu
  3 | Disha    | disha@sqlguide.edu
  5 | Srishti  | srishti@sqlguide.edu
  7 | Rytham   | rytham6@sqlguide.edu
(4 rows)

In this example, we matched the Email field with the regular expression “sqlguide” using the match operator “~ ” and obtained the results with all the rows matching the pattern.

Match Regular Expression (Case Insensitive)

SELECT *
FROM Email
WHERE Email ~* 'SQLGUIDE';
—--OUTPUT—--
id |   name   |          email          
----+----------+-------------------------
  1 | Pratibha | pratibha67@sqlguide.edu
  3 | Disha    | disha@sqlguide.edu
  5 | Srishti  | srishti@sqlguide.edu
  7 | Rytham   | rytham6@sqlguide.edu
(4 rows)

In this example, we used the match operator ” ~* ” to match the regular expression “SQLGUIDE” with the Email column. As a result, all the rows containing the pattern “SQLGUIDE” with CASE Insensitivity are returned.

Does not Match Regular Expression (Case Sensitive)

SELECT *
FROM Email
WHERE Email !~ 'sqlguide';
—--OUTPUT—--
id |   name   |          email          
----+----------+-------------------------
  2 | Bhavya   | bhavyaa@pythonguide.com
  4 | Divanshi | divanshi23@spguide.com
  6 | Kartik   | kartik@pythonguide.com
  8 | Madhav   | madhav@tsinfo.edu
  9 | Tanisha  | tanisha@pythonguide.com
 10 | Radhika  | radhika41@spguide.com
(6 rows)

The regular expression “sqlguide” was matched with the Email field in this example using the match operator  ”!~ ” The results showed that all the rows lacked the pattern.

Does not Match Regular Expression (Case Insensitive)

SELECT *
FROM Email
WHERE Email !~* 'sqlguide';
—--OUTPUT—--
id |   name   |          email          
----+----------+-------------------------
  2 | Bhavya   | bhavyaa@pythonguide.com
  4 | Divanshi | divanshi23@spguide.com
  6 | Kartik   | kartik@pythonguide.com
  8 | Madhav   | madhav@tsinfo.edu
  9 | Tanisha  | tanisha@pythonguide.com
 10 | Radhika  | radhika41@spguide.com
(6 rows)

In this illustration, we used the match operator ” ~* ” to match the regular expression “sqlguide” with the Email column. As a result, all the rows that did not contain the pattern “sqlguide” with CASE Insensitivity were obtained.

Matches the beginning of the string

SELECT *
FROM Email
WHERE Email ~ '^d';
—--OUTPUT—--
id |   name   |         email          
----+----------+------------------------
  3 | Disha    | disha@sqlguide.edu
  4 | Divanshi | divanshi23@spguide.com
(2 rows)

 For example, if we wanted to search for names that begin with the letter “d,” we would use the ” ^ ” symbol, which indicates the beginning/start of the regex pattern.

Matches the End of the String

SELECT *
FROM Email
WHERE Email ~ 'com$';
—--OUTPUT—--
id |   name   |          email          
----+----------+-------------------------
  2 | Bhavya   | bhavyaa@pythonguide.com
  4 | Divanshi | divanshi23@spguide.com
  6 | Kartik   | kartik@pythonguide.com
  9 | Tanisha  | tanisha@pythonguide.com
 10 | Radhika  | radhika41@spguide.com
(5 rows)

In this example, the “$” sign, which indicates the end of the regex pattern, was used in this case to indicate that we were searching for email addresses that ended in “com.”

Matches Numeric-type Data

SELECT *
FROM Email
WHERE Email ~ '[0-9]';
—--OUTPUT—--
id |   name   |          email          
----+----------+-------------------------
  1 | Pratibha | pratibha67@sqlguide.edu
  4 | Divanshi | divanshi23@spguide.com
  7 | Rytham   | rytham6@sqlguide.edu
 10 | Radhika  | radhika41@spguide.com
(4 rows)

Using the “[0-9]” regular expression, we have shown the email addresses in this example with at least one number.

Matches Numeric-type Data (Double Digit)

SELECT *
FROM Email
WHERE Email ~ '[0-9][0-9]';
—--OUTPUT—--
id |   name   |          email          
----+----------+-------------------------
  1 | Pratibha | pratibha67@sqlguide.edu
  4 | Divanshi | divanshi23@spguide.com
 10 | Radhika  | radhika41@spguide.com
(3 rows)

Using the regular expression “[0-9][0-9],” we have shown the results for those email addresses that include at least two numeric characters in this example.

PostgreSQL Regex Functions & Usage

PostgreSQL Regex Functions
Image Source: Self

We will discuss the following functions along with PostgreSQL regex example in the following sections. Let’s begin!

Postgres regexp_replace

To replace an occurrence of a numeric data type with a symbol, let’s utilize the REGEXP_REPLACE() function:

SELECT REGEXP_REPLACE(Email,'[0-9]','*') 
From Email;
—--OUTPUT—--
    regexp_replace      
-------------------------
 pratibha*7@sqlguide.edu
 bhavyaa@pythonguide.com
 disha@sqlguide.edu
 divanshi*3@spguide.com
 srishti@sqlguide.edu
 kartik@pythonguide.com
 rytham*@sqlguide.edu
 madhav@tsinfo.edu
 tanisha@pythonguide.com
 radhika*1@spguide.com
(10 rows)

The output makes it clear that the REGEXP_REPLACE() function inserts a ” * ” symbol in place of any numeric data type that appears in the ” Email ” field. This function replaces the first instance of the numeric data type, as seen.

Postgres regexp_replace All Occurrences

SELECT REGEXP_REPLACE(Email,'[0-9]','*','g') 
From Email;
—--OUTPUT—--
  regexp_replace      
-------------------------
 pratibha**@sqlguide.edu
 bhavyaa@pythonguide.com
 disha@sqlguide.edu
 divanshi**@spguide.com
 srishti@sqlguide.edu
 kartik@pythonguide.com
 rytham*@sqlguide.edu
 madhav@tsinfo.edu
 tanisha@pythonguide.com
 radhika**@spguide.com
(10 rows)

The output clearly shows that the REGEXP_REPLACE() function is used to replace every instance of the ” Numeric ” type data with the symbol ” * “. We provided a flag “g” that searches for every instance of the specified pattern.

Postgres regexp_matches in WHERE Clause

When a Regex is run against a string, the REGEXP_MATCHES() function compares the two and returns the string that matches the pattern as a set. We’ll look at REGEXP_MATCHES() function example for a better understanding.

SELECT REGEXP_MATCHES(Email,'.+@(.*)$')
FROM Email;
—--OUTPUT—--
regexp_matches   
-------------------
 {sqlguide.edu}
 {pythonguide.com}
 {sqlguide.edu}
 {spguide.com}
 {sqlguide.edu}
 {pythonguide.com}
 {sqlguide.edu}
 {tsinfo.edu}
 {pythonguide.com}
 {spguide.com}
(10 rows)

In this case, we searched for the email address domain name using the REGEXP_MATCHES() function. The REGEXP_MATCHES() function returned the outcomes as a set.

Postgres Regex Substring

Let us understand this better in the following example. It extracts substrings from a given field using the PostgreSQL Regex. For instance, let us extract some substring from the Email table using the SUBSTRING() function to match it against the Regular expression.

SELECT DISTINCT SUBSTRING(Email FROM '.+@(.*)$') 
FROM Email;
—--OUTPUT—--
 substring    
-----------------
 sqlguide.edu
 spguide.com
 pythonguide.com
 tsinfo.edu
(4 rows)

Here we’ve used the SUBSTRING() function to extract the Domain name of the Email address from the table Email using the DISTINCT keyword, which only displays the distinct names.

Postgres Regex Numbers Only

Here we will use the REGEXP_REPLACE() function to extract only the numbers from a string in PostgreSQL.

Let us look at an example to extract numeric data types’ occurrences.

SELECT REGEXP_REPLACE(Email,'\D','','g')
FROM Email;
—--OUTPUT—--
regexp_replace 
----------------
 67
 23
 6
 41
(10 rows)

From the output obtained, it can be seen that we have extracted only the numeric type data from the field Email in PostgreSQL using the REGEXP_REPLACE() function. Here “\D” is class shorthand for “Not a Digit.”

Postgres Regex Split

In PostgreSQL, the SPLIT_PART() function can split a string into many parts. To divide a string into several pieces, we must pass the String, the Delimiter, and the Filed Number. The PostgreSQL SPLIT_PART() function’s syntax is as follows:

SPLIT_PART(STRING,DELIMITER,FIELD_NUMBER)

Case#1

SELECT SPLIT_PART(Email,'@',1)
FROM Email;
—--OUTPUT—--
split_part 
------------
 pratibha67
 bhavyaa
 disha
 divanshi23
 srishti
 kartik
 rytham6
 madhav
 tanisha
 radhika41
(10 rows)

Case#2

SELECT SPLIT_PART(Email,'@',2)
FROM Email;
—--OUTPUT—--
  split_part    
-----------------
 sqlguide.edu
 pythonguide.com
 sqlguide.edu
 spguide.com
 sqlguide.edu
 pythonguide.com
 sqlguide.edu
 tsinfo.edu
 pythonguide.com
 spguide.com
(10 rows)

The Email table’s Email field was divided into two pieces in this example using the ” @ ” delimiter and the SPLIT_PART() function. It returns the text before the delimiter when we supply the FIELD NUMBER as “1,” in Case1, and the text after the delimiter when we pass the FIELD NUMBER as “2”, in Case2.

Postgres Regex Remove Special Characters

Remove Special Characters utilizing PostgreSQL Regex. Using the REGEXP_REPLACE() function, all Special Characters from a supplied text can be eliminated. We’ll examine an example where we utilize PostgreSQL’s REGEXP_REPLACE() function to eliminate every instance of a Special Character from a table’s column.

SELECT REGEXP_REPLACE(Email, '[^\w]+','','g')
FROM Email;
—--OUTPUT—--
   regexp_replace     
-----------------------
 pratibha67sqlguideedu
 bhavyaapythonguidecom
 dishasqlguideedu
 divanshi23spguidecom
 srishtisqlguideedu
 kartikpythonguidecom
 rytham6sqlguideedu
 madhavtsinfoedu
 tanishapythonguidecom
 radhika41spguidecom
(10 rows)

This example shows that the REGEXP_REPLACE() has been used to eliminate any instances of special characters.

Advantages of PostgreSQL Regex

The advantages of working with PostgreSQL Regex are listed below:

  • PostgreSQL Regex functions support numerous flags. For instance: REGEXP_MATCHES() supports
    • flag ‘i’: match case-insensitively.
    • Flag ‘g’: search globally for each occurrence.
  • The PostgreSQL Regex function returns no row, one row, or multiple rows per defined pattern.
  • The PostgreSQL Regex function may return zero, one, or several rows depending on the pattern specified.
  • The PostgreSQL Regex functions allow us to search for any word in the needed string at any position, or we can search for simply the first occurrence.
  • We may also extract a table’s column values using PostgreSQL Regex.
  • The PostgreSQL Regex functions can be used for validity purposes.
  • It allows us to perform challenging operations like string searches using random characters.
  • Compared to the complex mix of case statements and substrings, PostgreSQL Regex is more versatile and performs better.

Before we wrap up, let’s learn about the operators in Pattern Matching.

Functions and Operators in Pattern Matching 

LIKE, SIMILAR TO Regular Expressions, and POSIX Regular Expressions are the main three approaches in pattern matching. 

1. LIKE

string LIKE pattern [ESCAPE escape-character]
string NOT LIKE pattern [ESCAPE escape-character]
  • The LIKE expression returns true if the string matches the given pattern. False will be returned by the NOT LIKE expression when LIKE returns true. This is true reversely as well. NOT (string LIKE pattern) is an equivalent expression.
  • If the pattern does not contain percent signs or underscores, then the pattern stands for the string itself; in that case, LIKE would be the equals operator. An underscore (_) in pattern represents any single character; % matches any pattern of zero or more characters.

Some examples:

'abc' LIKE 'abc'    true
'abc' LIKE 'a%'     true
'abc' LIKE '_b_'    true
'abc' LIKE 'c'      false

2. SIMILAR TO Regular Expressions

string SIMILAR TO pattern [ESCAPE escape-character]
string NOT SIMILAR TO pattern [ESCAPE escape-character]
  • True or false returned by the SIMILAR TO operator is based on whether its pattern matches the provided string. It differs from LIKE in such a way that it reads the pattern using the SQL standard’s definition of a regular expression. SQL regular expressions are a cross between LIKE and common (POSIX) regular expression notation.
  • SIMILAR TO operator would be helpful only if its pattern is matching with the full string. In common regular expression behavior, the pattern need to match only a part of the string.
  • _ and % as wildcard characters are used by SIMILAR TO that notes any single character and any string, in the respective order. These are similar to . and .* in POSIX regular expressions.
'abc' SIMILAR TO 'abc'          true
'abc' SIMILAR TO 'a'            false
'abc' SIMILAR TO '%(b|d)%'      true
'abc' SIMILAR TO '(b|c)%'       false
'-abc-' SIMILAR TO '%\mabc\M%'  true
'xabcy' SIMILAR TO '%\mabc\M%'  false

3. POSIX Regular Expressions

POSIX Regular Expressions

Source: 

  • These type of operators are powerful compared to the ones above. The abbreviated definition of a regular set of strings is called a regular expression. If a string belongs to the regular set defined by the regular expression, that matches with a regular expression.
  • Regular expressions use various special characters compared to LIKE. A regular expression can match anywhere within a string, if the regular expression is not precisely anchored to the starting or end of the string.

Some examples:

'abcd' ~ 'bc'     true
'abcd' ~ 'a.c'    true — dot matches any character
'abcd' ~ 'a.*d'   true — * repeats the preceding pattern item
'abcd' ~ '(b|x)'  true — | means OR, parentheses group
'abcd' ~ '^a'     true — ^ anchors to start of string
'abcd' ~ '^(b|c)' false — would match except for anchoring

Conclusion

In this tutorial, we went in-depth on how to use regular expressions to write SQL statements in PostgreSQL. We can create dynamic SQL queries that use regular expressions to match patterns from a database column. The SQL LIKE operator in PostgreSQL can also match patterns, but the searches are more restricted. Greater flexibility and the ability to dynamically alter the size of the pattern to be matched are provided by regular expressions.

Want to know more? Check out these fantastic Hevo’s articles:

Copying data into a warehouse using ETL for Data Analysis may be time-consuming if you frequently utilize PostgreSQL.

Hevo is the only real-time ELT No-code Data Pipeline platform that cost-effectively automates data pipelines that are flexible to your needs. With integration with 150+ Data Sources (40+ free sources), we help you not only export data from sources & load data to the destinations but also transform & enrich your data, & make it analysis-ready.

Want to give Hevo a try? Sign Up for a 14-day free trial and experience the feature-rich Hevo suite first hand. You may also look at the amazing price, which will assist you while selecting the best plan for your requirements.

Share your experience understanding the PostgreSQL Regex & Pattern Matching in the comments below! We would love to hear your thoughts.

Pratibha Sarin
Former Marketing Analyst, Hevo Data

With a background in marketing research at Hevo Data, Pratibha is a data science enthusiast who has a flair for writing in-depth article in data industry. She has curated technical content on various topics related to data integration and infrastructure.

Try Hevo’s No-Code Automated Data Pipeline For PostgreSQL