Imagine a multi-user environment in PostgreSQL, having a Table X with millions and billions of rows, edited frequently with new updates and deletions. In normal PostgreSQL operations, until and unless the rows are not physically removed, they remain untenanted, thus, blocking up the space inside the table. PostgreSQL characterizes these rows as “Dead Tuples.” 

As a result, Table X’s size will keep on increasing. And dead tuples will keep on occupying the space for centuries to come. Moreover, these particular rows will remain inoperable with no reliances attached whatsoever. 

Now the need arises for a healthy Database Maintainance routine to maintain the usability and efficiency of data stored in the Postgres server. And to do so is the responsibility of a Database Administrator. To facilitate the process of removing dead tuples, we use the PostgreSQL VACUUM Command.

PostgreSQL VACUUM Command — At a Glance

How well does your vacuumer vacuum ? | Kraken Systems Ltd.
A Stormtrooper Vacuuming Dead Tuples

The PostgreSQL VACUUM Command reclaims space occupied by dead tuples from tables and their indexes. The command will not empty up the disk space in the OS but instead makes it usable for new rows.

A simple PostgreSQL VACUUM Command is shown below:

VACUUM [ ( option [, ...] ) ] [ table_and_columns [, ...] ]
VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ table_and_columns [, ...] ]

Few pointers to understand the query better:

  1. If you do not use a table_and_columns list, the command will process every table and materialized view from the current database.
  2. The PostgreSQL Command only processes a request for the database the user has permission to vacuum.
  3. Always ensure you have adequate system storage available before running the PostgreSQL VACUUM command.
  4. The VACUUM ANALYZE function, first, will process a VACUUM request, then an ANALYZE request for each of the selected tables.
  5. A simple VACUUM request will only reclaim storage, make it available for re-use.
  6. You can leverage the parallel vacuum feature to use multiple CPUs to process multiple indexes.
  7. The VACUUM FULL request will rewrite the entire content into a new disk file with no extra space. Post-run, the storage space will return to the operating system.

TIP: The PostgreSQL Vacuum Command requires a lot of CPU & I/O compute memory; hence, it is advised to use the command ONLY when the application activity is low.

Leverage Data Effortlessly at Scale – Hevo’s Reliable Data Pipelines 

Set up pipelines for over 150+ sources including SaaS applications and Databases into 15+ Destinations. Get rid of manual maintenance each time the source data or API changes.

With Hevo, you get a fault-tolerant architecture that scales with zero data loss and low latency. Monitor your pipeline health with an intuitive dashboard that reveals everything from pipeline failures to real-time data flow. 

Sign up here for a 14-day free trial!

Syntax Parameters in PostgreSQL VACUUM Command

FULL: The FULL request selects a “full” vacuum. The request takes a longer processing time and locks the table. One vivacious is that it can free up more space. The request creates new duplicates of the table queried and do not delete the old one until the operation is finished. This approach uses more disc space, limiting its use only when a substantial quantity of table space has to be freed up.

FREEZE: The FREEZE request is for aggressive freezing of tuple. Whenever you specify FREEZE, it is equivalent to performing VACUUM with the parameters set to zero for vacuum_freeze_min_age and vacuum_freeze_table_age. The FREEZE request should only be performed whenever you need to rewrite the table — making it counterintuitive/redundant if the user states a FULL request. 

VERBOSE: The VERBOSE request prints a thorough report on the vacuum action for every table.

ANALYZE: The ANALYZE request updates the facts/statistics that the planner uses to decide the best and the most efficient way to carry out a query.

Why Run PostgreSQL Vacuum Command Periodically?

PostgreSQL recommends Database Administrators to assist with routine database checks to maintain a healthy data repository. The reasons are straightforward, make sense to adhere to religiously.

In short, a clear and cutthroat vacuuming strategy should be in place based on the varying needs of sites.

From Best Practice to Next Practice - SAP Signavio
A clear strategy works wonders

Post the PostgreSQL 7.2 release, the VACUUM run has been made easy, non-intrusive, to be precise. Now, you can execute the command in parallel with normal database operations – such as selects, inserts, updates, deletes, and so forth. Similarly, to further streamline the process, in the PostgreSQL 8.0 version, an automated mechanism has been incorporated. 

Note: We will discuss the automation process in detail later in the article.

In general, there are three primary motivations for running the VACUUM command periodically. Let’s explain and talk about them in brief.

Retrieve Storage Space

As we mentioned earlier, when applied on the Postgres vacuum full example,  the UPDATE or DELETE operations in PostgreSQL  do not immediately remove the old version of the row. Because, for multi-version concurrency control, this approach is essential. In cases where outdated versions of rows are no longer needed, and to retrieve the storage space or “avoid infinite growth of disk space,” the VACUUM Command is employed.

TIP: For frequently updated or deleted tables that demand regular vacuuming than rarely updated tables, setting up periodic/recurring cron tasks that vacuum specific tables and skip tables that are known not to update frequently can be effective.

Update Data Statistics

For good planning, PostgreSQL relies on statistical information. This information is based on the table’s contents and is gathered by the ANALYZE command. That said, accurate statistics data is good for getting the optimum database performance level.

It is the best practice to plan a database-wide ANALYZE once daily during low activity, paired with a nightly VACUUM. However, sites with table data that fluctuate slowly might discover that this routine is excessive.

TIP: Check for the minimum and maximum values of the columns in the table change. For instance, a table containing the timestamp column might require frequent statistics updates. On the other hand, a table containing phone numbers or email addresses might not.

Avoid Transaction ID Wraparound Failures

Running a routine maintenance activity is required because of how PostgreSQL implements transaction IDs. This will be done automatically by Postgres (through AUTOVACUUM), and it won’t be observed in most cases.

TIP: Do NOT turn off the AUTOVACUUM, start it manually during low traffic periods, or change the schedule, so it runs less frequently. The consequences of not running AUTOVACUUM can be ruinous — and as PostgreSQL documentation puts it, “In short, catastrophic data loss.”

PostgreSQL VACUUM Command Example Use Cases

Reclaim Storage Space

The given-below query recovers storage space from dead tuples so that the same table can utilize the empty fields.

VACUUM;

The query frees up the space within each table. Post running the query, storage space in the PostgreSQL server will be available.

Note: The operating system does not reclaim the space; the table from which the space was allocated does. You will not observe any effect on the size of the PostgreSQL database file.

Reclaim Storage Space & Minimize Database File

The given-below query will free up the storage space by vacuuming all the tables and will return the unused space to the OS.

VACUUM FULL;

Note: This query will require an exclusive lock on each table as it would rewrite all tables into a new file. The OS will be reclaiming all the free space, resulting in the reduction of the database file.

Reclaim Space on a Table

The given-below query will free up the storage space on a table. Not of the entire database.

VACUUM products;

Note: This query will VACUUM the emails table only. The fields inside the emails table will be available to be used inside the emails table exclusively. The database’s size will not decrease. Use the FULL option to reallocate the empty space back to the OS.

Multiple Capabilities, Reliable Architecture — Firefighting No More!

Providing a high-quality ETL solution can be a difficult task if you have a large volume of data. Hevo’s automated, No-code platform empowers you with everything you need to have for a smooth data replication experience.

Check out what makes Hevo the best in its class:

  • Monitoring and Observability: Monitor pipeline health with intuitive dashboards that reveal every stat of pipeline and data flow. Bring real-time visibility into your ETL with Alerts and Activity Logs.
  • In-built Transformations: Format your data on the fly with Hevo’s Drag and Drop Pre Load Transformations. Generate analysis-ready data in your warehouse using Hevo’s Post Load Transformation.
  • Auto Schema Management: Correcting improper schema after the data is loaded into your warehouse is devilishly painful. Hevo automatically maps your source schema with your destination so that you don’t face the pain of schema errors.
  • 24*7 Customer Support: Within our product, “Live Chat” Customer support is up round the clock to unblock you with any pipeline failures.
Sign up here for a 14-day free trial!

Let’s Automate: Vacuuming Through AUTOVACUUM Daemon

Now, I do not know if it is just me, but removing dead tuples manually sounds a hell of a lot boring. So, let us talk about a better way.

You can use the optional AUTOVACUUM Daemon to automatically VACUUM the database rather than manually running the vacuum query. In PostgreSQL, the default setup has the Daemon activated.

Autovacuum is enabled by default in PostgreSQL and is a routine vacuum maintenance process. It can be configured using various parameters in postgresql.conf.file. To configure how the vacuum process runs, PostgreSQL has several settings. Some of these are listed below: 

  • autovacuum: It enables or disables the autovacuum background process. By default, autovacuum is enabled.
  • autovacuum_vacuum_threshold: This determines the minimum number of dead rows that must be present in a table before it is vacuumed. The default value is 50.
  • autovacuum_analyze_threshold: This setting determines the minimum number of live rows that must be present in a table before it is analyzed. The default value is 50.
  • autovacuum_vacuum_scale_factor: This is a multiplier that determines how many dead rows are needed to trigger a vacuum based on the table size. The default value is 0.2.
  • autovacuum_analyze_scale_factor: This setting is a multiplier that determines how many live rows are needed to trigger an analysis based on the size of the table. The default value is 0.1.
  • autovacuum_vacuum_cost_delay: It determines the time (in milliseconds) the autovacuum will wait before starting a vacuum operation. The default value is 20.
  • autovacuum_vacuum_cost_limit: This setting determines the maximum number of rows that can be vacuumed in a single vacuum operation. The default value is 200.

You can check whether the AUTOVACUUM is on or off by running the query given below:

SELECT *
FROM pg_settings 
WHERE name LIKE 'autovacuum%'

The default settings can be found in the PostgreSQL.conf file to control when and how the AUTOVACUUM Daemon runs. Below are some of the settings examples.

#------------------------------------------------------------------------------
# AUTOVACUUM PARAMETERS
#------------------------------------------------------------------------------

autovacuum = on                        # Enable autovacuum subprocess?  'on'
                                       # requires track_counts to also be on.

log_autovacuum_min_duration = -1       # -1 disables, 0 logs all actions and
                                       # their durations, > 0 logs only
                                       # actions running at least this number
                                       # of milliseconds.

autovacuum_max_workers = 3             # max number of autovacuum subprocesses
                                       # (change requires restart)

autovacuum_naptime = 1min              # time between autovacuum runs

autovacuum_vacuum_threshold = 50       # min number of row updates before vacuum

autovacuum_analyze_threshold = 50      # min number of row updates before analyze

autovacuum_vacuum_scale_factor = 0.2   # fraction of table size before vacuum

autovacuum_analyze_scale_factor = 0.1  # fraction of table size before analyze

autovacuum_freeze_max_age = 200000000  # maximum XID age before forced vacuum
                                       # (change requires restart)

autovacuum_multixact_freeze_max_age = 400000000
                                       # maximum multixact age
                                       # before forced vacuum
                                       # (change requires restart)

autovacuum_vacuum_cost_delay = 20ms    # default vacuum cost delay for
                                       # autovacuum, in milliseconds;
                                       # -1 means use vacuum_cost_delay

autovacuum_vacuum_cost_limit = -1      # default vacuum cost limit for
                                       # autovacuum, -1 means use
                                       # vacuum_cost_limit

To disable AUTOVACUUM on a table, run the following query.

ALTER TABLE table_name SET (autovacuum_enabled = false);

Conclusion

In this article, we have gone through the importance of PostgreSQL VACUUM Command and PostgreSQL VACUUM database example for periodic database maintenance.

Database administrators can leverage this article to understand the command better and use it to delete dead tuples to free up the space inside the table, the database server, or to shift the space to the OS’s storage itself.

To learn about other PostgreSQL Commands, you can refer to some of the blogs below, handpicked by our editorial team.

  1. PostgreSQL Materialized Views: Syntax & Query Examples| A 101 Guide
  2. PostgreSQL DISTINCT Clause: Syntax & Query Examples | A 101 Guide
  3. Understanding PostgreSQL Triggers: A Comprehensive 101 Guide

That said, today, data folks are in a constant struggle. They want a data infrastructure that is functional, compliant, high performing, and scalable. And, if you are one of those for whom the sanity across the organization-wide data matters a lot — Hevo is just the right fit!

Visit our Website to Explore Hevo

Hevo lets you replicate your data from your PostgreSQL database to any Data Warehouse of your choice like Amazon Redshift, Snowflake, Google BigQuery, or Firebolt within minutes with just a few clicks.

Give Hevo a try. Sign Up here for a 14-day full feature access trial and experience the feature-rich Hevo suite firsthand. You can also check our transparent pricing and make a decision on your best-suited plan. 

Share your thoughts on learning about PostgreSQL VACUUM Command in the comments section below. If you have any questions, do let us know.

Yash Arora
Former Content Manager, Hevo Data

Yash is a Content Marketing professinal with experience in data-driven marketing campaigns. He has expertise in strategic thinking, integrated marketing, and customer acquisition. She has driven growth for startups and established brands through comprehensive marketing communications, and digital strategies.

No-code Data Pipeline for PostgreSQL