If you have been scouting for the right article to learn about Regex Postgres, you have come to the right place. Hold on tight while we shed light on the concept. So far, we have usually known to utilize the WHERE clause to filter searches. When there is no requirement for an exact match in the query, but you still want to see all the entries that fit the criteria, pattern matching can be helpful.
The LIKE operator works flawlessly for standard SQL operations, but it appears to have some performance concerns when filtering a vast database. Additionally, the LIKE operator’s filtering condition is restricted to using only wildcards (percent) to discover patterns. PostgreSQL employs Regular Expressions to get around pattern matching.
In this article, we will learn about PostgreSQL Regex. PostgreSQL uses POSIX or “Portable Operating System Interface for Unix” regular expressions, which are better than LIKE and SIMILAR TO operators used for pattern matching.
Table of Contents
What is PostgreSQL Regex?
Image Source: Self
Regex stands for Regular Expressions. Regex is a sequence of characters that defines a pattern that can filter data in PostgreSQL. The TILDE (~) operator and the wildcard operator “.*” is used to implement PostgreSQL’s regular expressions.
Regular Expressions have long been widely used in programming languages, but utilizing them in a SQL statement makes the query highly dynamic and improves performance in massive databases.
We’ll look at regular expressions and how to work with them using several functions, in addition to the tilde operator family, which matches regular expressions in case-sensitive and case-insensitive circumstances.
The functionality of the LIKE and SIMILAR TO operators is essentially the same. Let’s discuss these operators now and examine how to apply them to regex.
PostgreSQL Regex Match Operators
Image Source: Self
As the ability of businesses to collect data explodes, data teams have a crucial role in fueling data-driven decisions. Yet, they struggle to consolidate the data scattered across sources into their warehouse to build a single source of truth. Broken pipelines, data quality issues, bugs and errors, and lack of control and visibility over the data flow make data integration a nightmare.
1000+ data teams rely on Hevo’s Data Pipeline Platform to integrate data from over 150+ sources in a matter of minutes. Billions of data events from sources as varied as SaaS apps, Databases, File Storage and Streaming sources can be replicated in near real-time with Hevo’s fault-tolerant architecture. What’s more – Hevo puts complete control in the hands of data teams with intuitive dashboards for pipeline monitoring, auto-schema management, custom ingestion/loading schedules.
All of this combined with transparent pricing and 24×7 support makes us the most loved data pipeline software on review sites.
Take our 14-day free trial to experience a better way to manage data pipelines.
How does Regex PostgreSQL work?
Image Source: Self
Sample Table & Use Cases
We will cover the following use cases to understand the working of PostgreSQL Regex.
Before we deep dive to learn about the working of Regex, let us create our sample table using the following lines of code.
CREATE TABLE
CREATE TABLE Email(
ID SERIAL PRIMARY KEY,
Name TEXT NOT NULL,
Email TEXT);
INSERT VALUES INTO THE TABLE
INSERT INTO Email(Name,Email) VALUES('Pratibha','pratibha67@sqlguide.edu');
INSERT INTO Email(Name,Email) VALUES('Bhavya','bhavyaa@pythonguide.com');
INSERT INTO Email(Name,Email) VALUES('Disha','disha@sqlguide.edu');
INSERT INTO Email(Name,Email) VALUES('Divanshi','divanshi23@spguide.com');
INSERT INTO Email(Name,Email) VALUES('Srishti','srishti@sqlguide.edu');
INSERT INTO Email(Name,Email) VALUES('Kartik','kartik@pythonguide.com');
INSERT INTO Email(Name,Email) VALUES('Rytham','rytham6@sqlguide.edu');
INSERT INTO Email(Name,Email) VALUES('Madhav','madhav@tsinfo.edu');
INSERT INTO Email(Name,Email) VALUES('Tanisha','tanisha@pythonguide.com');
INSERT INTO Email(Name,Email) VALUES('Radhika','radhika41@spguide.com');
OUTPUT
SELECT *
FROM Email;
—--OUTPUT—--
id | name | email
----+----------+-------------------------
1 | Pratibha | pratibha67@sqlguide.edu
2 | Bhavya | bhavyaa@pythonguide.com
3 | Disha | disha@sqlguide.edu
4 | Divanshi | divanshi23@spguide.com
5 | Srishti | srishti@sqlguide.edu
6 | Kartik | kartik@pythonguide.com
7 | Rytham | rytham6@sqlguide.edu
8 | Madhav | madhav@tsinfo.edu
9 | Tanisha | tanisha@pythonguide.com
10 | Radhika | radhika41@spguide.com
(10 rows)
Let us understand the syntax & basic working of PostgreSQL Regex by the following example:
SELECT *
FROM Email
WHERE Email ~ '^.*$';
—--OUTPUT—--
id | name | email
----+----------+-------------------------
1 | Pratibha | pratibha67@sqlguide.edu
2 | Bhavya | bhavyaa@pythonguide.com
3 | Disha | disha@sqlguide.edu
4 | Divanshi | divanshi23@spguide.com
5 | Srishti | srishti@sqlguide.edu
6 | Kartik | kartik@pythonguide.com
7 | Rytham | rytham6@sqlguide.edu
8 | Madhav | madhav@tsinfo.edu
9 | Tanisha | tanisha@pythonguide.com
10 | Radhika | radhika41@spguide.com
(10 rows)
As shown above, we have used PostgreSQL Regex using the TILDE (~) operator and the wildcard ‘.*’. This query will select all the records from the Email table with a valid email. Since the pattern condition is only the wildcard, it will fetch all the records from the table. Also, another essential point to note while writing PostgreSQL Regular expressions is that the pattern matching statement always starts with a ‘^’ operator and ends with a ‘$’ sign. These two operators mark the regular expression statement’s beginning and end. Summing up the figure, the expression can be implemented as “^.*$” while using wildcards to filter all records.
Match Regular Expression (Case Sensitive)
SELECT *
FROM Email
WHERE Email ~ 'sqlguide';
—--OUTPUT—--
id | name | email
----+----------+-------------------------
1 | Pratibha | pratibha67@sqlguide.edu
3 | Disha | disha@sqlguide.edu
5 | Srishti | srishti@sqlguide.edu
7 | Rytham | rytham6@sqlguide.edu
(4 rows)
In this example, we matched the Email field with the regular expression “sqlguide” using the match operator “~ ” and obtained the results with all the rows matching the pattern.
Match Regular Expression (Case Insensitive)
SELECT *
FROM Email
WHERE Email ~* 'SQLGUIDE';
—--OUTPUT—--
id | name | email
----+----------+-------------------------
1 | Pratibha | pratibha67@sqlguide.edu
3 | Disha | disha@sqlguide.edu
5 | Srishti | srishti@sqlguide.edu
7 | Rytham | rytham6@sqlguide.edu
(4 rows)
In this example, we used the match operator ” ~* ” to match the regular expression “SQLGUIDE” with the Email column. As a result, all the rows containing the pattern “SQLGUIDE” with CASE Insensitivity are returned.
Does not Match Regular Expression (Case Sensitive)
SELECT *
FROM Email
WHERE Email !~ 'sqlguide';
—--OUTPUT—--
id | name | email
----+----------+-------------------------
2 | Bhavya | bhavyaa@pythonguide.com
4 | Divanshi | divanshi23@spguide.com
6 | Kartik | kartik@pythonguide.com
8 | Madhav | madhav@tsinfo.edu
9 | Tanisha | tanisha@pythonguide.com
10 | Radhika | radhika41@spguide.com
(6 rows)
The regular expression “sqlguide” was matched with the Email field in this example using the match operator ”!~ ” The results showed that all the rows lacked the pattern.
Does not Match Regular Expression (Case Insensitive)
SELECT *
FROM Email
WHERE Email !~* 'sqlguide';
—--OUTPUT—--
id | name | email
----+----------+-------------------------
2 | Bhavya | bhavyaa@pythonguide.com
4 | Divanshi | divanshi23@spguide.com
6 | Kartik | kartik@pythonguide.com
8 | Madhav | madhav@tsinfo.edu
9 | Tanisha | tanisha@pythonguide.com
10 | Radhika | radhika41@spguide.com
(6 rows)
In this illustration, we used the match operator ” ~* ” to match the regular expression “sqlguide” with the Email column. As a result, all the rows that did not contain the pattern “sqlguide” with CASE Insensitivity were obtained.
Matches the beginning of the string
SELECT *
FROM Email
WHERE Email ~ '^d';
—--OUTPUT—--
id | name | email
----+----------+------------------------
3 | Disha | disha@sqlguide.edu
4 | Divanshi | divanshi23@spguide.com
(2 rows)
For example, if we wanted to search for names that begin with the letter “d,” we would use the ” ^ ” symbol, which indicates the beginning/start of the regex pattern.
Matches the End of the String
SELECT *
FROM Email
WHERE Email ~ 'com$';
—--OUTPUT—--
id | name | email
----+----------+-------------------------
2 | Bhavya | bhavyaa@pythonguide.com
4 | Divanshi | divanshi23@spguide.com
6 | Kartik | kartik@pythonguide.com
9 | Tanisha | tanisha@pythonguide.com
10 | Radhika | radhika41@spguide.com
(5 rows)
In this example, the “$” sign, which indicates the end of the regex pattern, was used in this case to indicate that we were searching for email addresses that ended in “com.”
Matches Numeric-type Data
SELECT *
FROM Email
WHERE Email ~ '[0-9]';
—--OUTPUT—--
id | name | email
----+----------+-------------------------
1 | Pratibha | pratibha67@sqlguide.edu
4 | Divanshi | divanshi23@spguide.com
7 | Rytham | rytham6@sqlguide.edu
10 | Radhika | radhika41@spguide.com
(4 rows)
Using the “[0-9]” regular expression, we have shown the email addresses in this example with at least one number.
Matches Numeric-type Data (Double Digit)
SELECT *
FROM Email
WHERE Email ~ '[0-9][0-9]';
—--OUTPUT—--
id | name | email
----+----------+-------------------------
1 | Pratibha | pratibha67@sqlguide.edu
4 | Divanshi | divanshi23@spguide.com
10 | Radhika | radhika41@spguide.com
(3 rows)
Using the regular expression “[0-9][0-9],” we have shown the results for those email addresses that include at least two numeric characters in this example.
PostgreSQL Regex Functions & Usage
Image Source: Self
We will discuss the following functions & their usage in the following article. Let’s begin!
Postgres regexp_replace
To replace an occurrence of a numeric data type with a symbol, let’s utilize the REGEXP_REPLACE() function:
SELECT REGEXP_REPLACE(Email,'[0-9]','*')
From Email;
—--OUTPUT—--
regexp_replace
-------------------------
pratibha*7@sqlguide.edu
bhavyaa@pythonguide.com
disha@sqlguide.edu
divanshi*3@spguide.com
srishti@sqlguide.edu
kartik@pythonguide.com
rytham*@sqlguide.edu
madhav@tsinfo.edu
tanisha@pythonguide.com
radhika*1@spguide.com
(10 rows)
The output makes it clear that the REGEXP_REPLACE() function inserts a ” * ” symbol in place of any numeric data type that appears in the ” Email ” field. This function replaces the first instance of the numeric data type, as seen.
Postgres regexp_replace All Occurrences
SELECT REGEXP_REPLACE(Email,'[0-9]','*','g')
From Email;
—--OUTPUT—--
regexp_replace
-------------------------
pratibha**@sqlguide.edu
bhavyaa@pythonguide.com
disha@sqlguide.edu
divanshi**@spguide.com
srishti@sqlguide.edu
kartik@pythonguide.com
rytham*@sqlguide.edu
madhav@tsinfo.edu
tanisha@pythonguide.com
radhika**@spguide.com
(10 rows)
The output clearly shows that the REGEXP_REPLACE() function is used to replace every instance of the ” Numeric ” type data with the symbol ” * “. We provided a flag “g” that searches for every instance of the specified pattern.
Postgres regexp_matches in WHERE Clause
When a Regex is run against a string, the REGEXP_MATCHES() function compares the two and returns the string that matches the pattern as a set. We’ll look at REGEXP_MATCHES() function example for a better understanding.
SELECT REGEXP_MATCHES(Email,'.+@(.*)$')
FROM Email;
—--OUTPUT—--
regexp_matches
-------------------
{sqlguide.edu}
{pythonguide.com}
{sqlguide.edu}
{spguide.com}
{sqlguide.edu}
{pythonguide.com}
{sqlguide.edu}
{tsinfo.edu}
{pythonguide.com}
{spguide.com}
(10 rows)
In this case, we searched for the email address domain name using the REGEXP_MATCHES() function. The REGEXP_MATCHES() function returned the outcomes as a set.
Postgres Regex Substring
Let us understand this better in the following example. It extracts substrings from a given field using the PostgreSQL Regex. For instance, let us extract some substring from the Email table using the SUBSTRING() function to match it against the Regular expression.
SELECT DISTINCT SUBSTRING(Email FROM '.+@(.*)$')
FROM Email;
—--OUTPUT—--
substring
-----------------
sqlguide.edu
spguide.com
pythonguide.com
tsinfo.edu
(4 rows)
Here we’ve used the SUBSTRING() function to extract the Domain name of the Email address from the table Email using the DISTINCT keyword, which only displays the distinct names.
Postgres Regex Numbers Only
Here we will use the REGEXP_REPLACE() function to extract only the numbers from a string in PostgreSQL.
Let us look at an example to extract numeric data types’ occurrences.
SELECT REGEXP_REPLACE(Email,'\D','','g')
FROM Email;
—--OUTPUT—--
regexp_replace
----------------
67
23
6
41
(10 rows)
From the output obtained, it can be seen that we have extracted only the numeric type data from the field Email in PostgreSQL using the REGEXP_REPLACE() function. Here “\D” is class shorthand for “Not a Digit.”
Postgres Regex Split
In PostgreSQL, the SPLIT_PART() function can split a string into many parts. To divide a string into several pieces, we must pass the String, the Delimiter, and the Filed Number. The PostgreSQL SPLIT_PART() function’s syntax is as follows:
SPLIT_PART(STRING,DELIMITER,FIELD_NUMBER)
Case#1
SELECT SPLIT_PART(Email,'@',1)
FROM Email;
—--OUTPUT—--
split_part
------------
pratibha67
bhavyaa
disha
divanshi23
srishti
kartik
rytham6
madhav
tanisha
radhika41
(10 rows)
Case#2
SELECT SPLIT_PART(Email,'@',2)
FROM Email;
—--OUTPUT—--
split_part
-----------------
sqlguide.edu
pythonguide.com
sqlguide.edu
spguide.com
sqlguide.edu
pythonguide.com
sqlguide.edu
tsinfo.edu
pythonguide.com
spguide.com
(10 rows)
The Email table’s Email field was divided into two pieces in this example using the ” @ ” delimiter and the SPLIT_PART() function. It returns the text before the delimiter when we supply the FIELD NUMBER as “1,” in Case1, and the text after the delimiter when we pass the FIELD NUMBER as “2”, in Case2.
Postgres Regex Remove Special Characters
Remove Special Characters utilizing PostgreSQL Regex. Using the REGEXP_REPLACE() function, all Special Characters from a supplied text can be eliminated. We’ll examine an example where we utilize PostgreSQL’s REGEXP_REPLACE() function to eliminate every instance of a Special Character from a table’s column.
SELECT REGEXP_REPLACE(Email, '[^\w]+','','g')
FROM Email;
—--OUTPUT—--
regexp_replace
-----------------------
pratibha67sqlguideedu
bhavyaapythonguidecom
dishasqlguideedu
divanshi23spguidecom
srishtisqlguideedu
kartikpythonguidecom
rytham6sqlguideedu
madhavtsinfoedu
tanishapythonguidecom
radhika41spguidecom
(10 rows)
This example shows that the REGEXP_REPLACE() has been used to eliminate any instances of special characters.
Advantages of PostgreSQL Regex
The advantages of working with PostgreSQL Regex are listed below:
- PostgreSQL Regex functions support numerous flags. For instance: REGEXP_MATCHES() supports
- flag ‘i’: match case-insensitively.
- Flag ‘g’: search globally for each occurrence.
- The PostgreSQL Regex function returns no row, one row, or multiple rows per defined pattern.
- The PostgreSQL Regex function may return zero, one, or several rows depending on the pattern specified.
- The PostgreSQL Regex functions allow us to search for any word in the needed string at any position, or we can search for simply the first occurrence.
- We may also extract a table’s column values using PostgreSQL Regex.
- The PostgreSQL Regex functions can be used for validity purposes.
- It allows us to perform challenging operations like string searches using random characters.
- Compared to the complex mix of case statements and substrings, PostgreSQL Regex is more versatile and performs better.
Conclusion
In this tutorial, we went in-depth on how to use regular expressions to write SQL statements in PostgreSQL. We can create dynamic SQL queries that use regular expressions to match patterns from a database column. The SQL LIKE operator in PostgreSQL can also match patterns, but the searches are more restricted. Greater flexibility and the ability to dynamically alter the size of the pattern to be matched are provided by regular expressions.
Hungry for more? Check out these fantastic articles at Hevo:
Copying data into a warehouse using ETL for Data Analysis may be time-consuming if you frequently utilize PostgreSQL.
Hevo, No-Code Data Pipeline, is at your disposal to rescue you. You can save your engineering bandwidth by establishing a Data Pipeline and start to replicate your data from PostgreSQL to the desired warehouse using Hevo within minutes.
VISIT OUR WEBSITE TO EXPLORE HEVO
Want to give Hevo a try? Sign Up for a 14-day free trial and experience the feature-rich Hevo suite first hand. You may also look at the amazing price, which will assist you while selecting the best plan for your requirements.
Share your experience understanding the PostgreSQL Regex & Pattern Matching in the comments below! We would love to hear your thoughts.