Understanding PostgreSQL Full Text Search: 5 Critical Aspects

on Database, PostgreSQL, Tutorial • August 28th, 2017 • Write for Hevo

Setting up PostgreSQL Full Text Search (FTS) helps users bring up a fully-fledged search engine without a steep learning curve and new software to install. It can be used to set up anything from a small catalog website to a huge news website.

You might wonder what’s the big deal with Full-Text Search. you can use the LIKE or ILIKE clause any day in a query to perform a search on any given string. The problem with the traditional LIKE clause is that it has a very high-performance penalty. That means, it takes a long time to return results as the words inside the text are not indexed. In this blog, you will learn about the PostgreSQL Full Text Search and the PostgreSQL Phrased search. Moroever, you will understand these with examples and will also learn about the Vector technique.

Table of Contents

Before getting started you need to have a working copy of PostgreSQL installed on your favorite OS.

PostgreSQL Full Text Search Basics for the Uninitiated

PostgreSQL Full Text Search refers to a technique in which you search for a single computer-stored document or a collection in your full-text database. It provides you with the capability to identify natural languages documents that satisfy a query. PostgreSQL Full Text Search allows indexing of documents which helps in the rapid search of documents. PostgreSQL Full Text Search also supports dictionaries that allow fine-grained control over token normalization. Learn more about PostgreSQL Full Text Search from here.

PostgreSQL Full Text Search
Image Source
  • During a PostgreSQL Full Text search, Postgres will not include “stop words” like “the” or “a” while finding a match.
  • It implements “Stemming” to find derivatives of a root word. For Example, stemming will map “runs” and “running” to the root word run.
  • PostgreSQL will rank your search matches and will place the best matches at the top of the results.

PostgreSQL Full Text Search Entities

To understand the PostgreSQL Full Text Search, you need to be familiar with the following terms first:

  • Document: A set of data on which your full-text search will take place is called Document. In PostgreSQL, you can create a document using either a single column or multiple columns or even using multiple tables.
  • Lexemes: The PostgreSQL document during processing is parsed into multiple tokens, which consist of words, sentences, etc. of the text present in the document. These tokens are modified to form more meaningful entities called lexemes.
  • Dictionaries: The conversion of Tokens into Lexemes is done using dictionaries. PostgreSQL offers both options of either using the built-in dictionaries or creating your own dictionaries. These dictionaries are responsible to determine which stop words are not important and whether differently derived words have the same base language.

Simplify your Data Analysis with Hevo’s No-code Data Pipeline

Hevo Data, a No-code Data Pipeline helps to load data from any data source such as PostgreSQL, SaaS applications, Cloud Storage, SDK,s, and Streaming Services and simplifies the ETL process. It supports 100+ data sources and loads the data onto the desired Data Warehouse, enriches the data, and transforms it into an analysis-ready form without writing a single line of code.

Its completely automated pipeline offers data to be delivered in real-time without any loss from source to destination. Its fault-tolerant and scalable architecture ensure that the data is handled in a secure, consistent manner with zero data loss and supports different forms of data. The solutions provided are consistent and work with different Business Intelligence (BI) tools as well.

Get Started with Hevo for Free

Check out why Hevo is the Best:

  • Secure: Hevo has a fault-tolerant architecture that ensures that the data is handled in a secure, consistent manner with zero data loss.
  • Schema Management: Hevo takes away the tedious task of schema management & automatically detects the schema of incoming data and maps it to the destination schema.
  • Minimal Learning: Hevo, with its simple and interactive UI, is extremely simple for new customers to work on and perform operations.
  • Hevo Is Built To Scale: As the number of sources and the volume of your data grows, Hevo scales horizontally, handling millions of records per minute with very little latency.
  • Incremental Data Load: Hevo allows the transfer of data that has been modified in real-time. This ensures efficient utilization of bandwidth on both ends.
  • Live Support: The Hevo team is available round the clock to extend exceptional support to its customers through chat, email, and support calls.
  • Live Monitoring: Hevo allows you to monitor the data flow and check where your data is at a particular point in time.
Sign up here for a 14-Day Free Trial!

Example: Searching Storm Event Details

This section will explain the implementation of PostgreSQL Full Text Search and the search tools involved with it.

You can create a table containing data related to storm events gathered by the U.S. National Weather Service. Also, add a small portion of the storm data into our table using the below code.

CREATE TABLE se_details (
    episode_id int,
    event_id int primary key,
    state text,
    event_type text,
    begin_date_time timestamp,
    episode_narrative text,
    event_narrative text,

In this example, you can carry out a PostgreSQL Full Text Search on the event_narrative column. Moreover, insert a new column to the table to store the preprocessed search document (i.e. the list of lexemes):

ALTER TABLE se_details ADD COLUMN ts tsvector 
    GENERATED ALWAYS AS (to_tsvector('english', event_narrative)) STORED;

In the new versions of Postgres, an automatic column “ts” is synced with our source data. After that, you can then generate a GIN index using the ts column:

SELECT state, begin_date_time, event_type, event_narrative
FROM se_details
WHERE ts @@ to_tsquery('english', 'tornado');

Another Postgres full-text search data type is “tsquery”. It contains search terms that are also present as lexemes. You should pass the input search term to the to_tsquery function to optimize the query during the Postgres Full Text Search.

This query will provide records as output. These records will contain the search term “tornado” somewhere in the text string. Moreover, it will also provide records in the result set which have “tornado” as lexeme (“tornado-like” and “tornadoes”):

state           | WISCONSIN
begin_date_time | 2018-08-28 15:30:00
event_type      | Thunderstorm Wind
event_narrative | A swath of widespread tree and crop damage across the southern portion of the county. Sections of trees and crops compl
etely flattened, and some structural damage from fallen trees or due to the strong downburst winds. Various roads closed due to fallen tre
es. Two semi-trucks were overturned on highway 57 in Waldo. The widespread wind damage and tornadoes caused structural damage to many home
s with 70 homes sustaining affected damage, 3 homes with minor damage, 2 homes with major damage, one home destroyed, and 2 businesses wit
h minor damage.

Mastering PostgreSQL Tools: PostgreSQL Full Text Search and Phrase Search

PostgreSQL provides you with the following 2 key searching techniques:

1) PostgreSQL Full-Text Search

You can search for words in the “Full Text” of the document. Search terms can either be individual words or phrases. The search terms are south directly “word-for_word” in the text or a selected part of the text, and are highlighted if found. You can further use Boolean Operators like AND, OR, NOT, for added functionality.

You will face another issue while searching for a word like “query”, which may be present in its plural form “queries”. In this situation, you won’t get search matches by performing a simple pattern search using LIKE. Moreover, using Regular Expression in this search will also not be the ideal solution as it consumes a lot of time.

A more feasible solution for this issue is using a semantic vector for all of the words present in a document. This way, when you search for a word like “query”, your results will provide all instances of the word (and its tenses), even if you searched for “queried” or “queries”.Furthermore, it will give you faster results as you don’t have to search the full document itself.

2) PostgreSQL Phrase Search

The full text search is helpful if you wish to find a word (or a set of words) in a document. However, if you want to seek a phrase in that document, you can’t rely on full text search.
For example, if you wish to search for “walking fast” or “to walk very fast”, you can use words like walk & fast in the above-mentioned techniques. This will give you documents containing these words, but the results will be in any random configuration. To overcome this issue, PostgreSQL 9.6 has the operator that you can use with tsquery to search via word proximity.

The following example will help you understand Phrase search better. You will notice that the “to_tsquery” with the “AND” operator doesn’t create any distinction when it comes to the words’ location in the documents:

SELECT document_id, document_text FROM documents  
WHERE document_tokens @@ to_tsquery('jump & quick');

Output of the this code is shown below:

 document_id |             document_text
-------------+---------------------------------------
           3 | The five boxing wizards jump quickly.
           4 | How vexingly quick daft zebras jump!
(2 rows)

Now you can use the proximity operator for phrase search in the following manner:

SELECT document_id, document_text FROM documents  
WHERE document_tokens @@ to_tsquery('jump <-> quick');

It will generate the following output:

 document_id |             document_text
-------------+---------------------------------------
           3 | The five boxing wizards jump quickly.
(1 row)

Searching Against the Vector

Now, to use the tsvector you need to perform a lot of pre-processing to simplify your search process. You will need to use a function called tsquery if you want to search against the tsvector column.

PostgreSQL Full Text: Phrase Search
Image Source

The @@ operator is used when you need direct matches. Postgres also provide other operators to perform more sophisticated matches. The output that you will get will be in a true or false format. You can see in the above image that the search result for the word scares, is true. Preprocessing played a crucial role in generating this output.

Limitations of PostgreSQL Full Text Search

Some of the limitations of PostgreSQL Full Text Search (FTS) are listed below:

  • The length of each lexeme must be less than 2K bytes.
  • You should not have more than 256 positions per lexeme.
  • The length of tsvector should be less than 1 MB.
  • The total number of nodes in a tsquery should be less than 32,768.

We were able to successfully query all the documents which had the word Breaking Bad in it with just the word ‘break’.

For more such posts, visit our blog page.

Conclusion

The article introduced you to PostgreSQL Full Text Search and it’s variants. It also presented and example showcasing the implementation the PostgreSQL Full Text Search. Thae article also discussed the Phrased Search with an example and introduced the Vector technique. Furthermore, the article listed down the limitations of using the PostgreSQL Full Text approach.

Visit our Website to Explore Hevo

Now, to perform Data Analytics on your PosgreSQL data, you first need to export this data to a Data Warehouse. This will require you to custom code complex scripts to develop the ETL processes. Hevo Data can automate your data transfer process, hence allowing you to focus on other aspects of your business like Analytics, Customer Management, etc. This platform allows you to transfer data from 100+ multiple sources like PostgreSQL to Cloud-based Data Warehouses like Amazon Redshift, Snowflake, Google BigQuery, etc. It will provide you with a hassle-free experience and make your work life much easier.

Want to take Hevo for a spin? Sign Up for a 14-day free trial and experience the feature-rich Hevo suite first hand.

Share your understanding of the PostgreSQL Full Text Search in the comments below!

No-code Data Pipeline for your Data Warehouse