Imagine a multi-user environment in PostgreSQL, having a Table X with millions and billions of rows, edited frequently with new updates and deletions. In normal PostgreSQL operations, until and unless the rows are not physically removed, they remain untenanted, thus, blocking up the space inside the table. PostgreSQL characterizes these rows as “Dead Tuples.” 

As a result, Table X’s size will keep on increasing. And dead tuples will keep on occupying the space for centuries to come. Moreover, these particular rows will remain inoperable with no reliances attached whatsoever. 

Now the need arises for a healthy Database Maintainance routine to maintain the usability and efficiency of data stored in the Postgres server. And to do so is the responsibility of a Database Administrator. To facilitate the process of removing dead tuples, we use the PostgreSQL VACUUM Command.

PostgreSQL VACUUM Command — At a Glance

The PostgreSQL VACUUM Command reclaims space occupied by dead tuples from tables and their indexes. The command will not empty up the disk space in the OS but instead makes it usable for new rows.

A simple PostgreSQL VACUUM Command is shown below:

VACUUM [ ( option [, ...] ) ] [ table_and_columns [, ...] ]
VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ table_and_columns [, ...] ]
Simplify PostgreSQL Migration with Hevo!

Ditch the manual process of writing long commands to connect your PostgreSQL and choose Hevo’s no-code platform to streamline your data migration. 

With Hevo:

  1. Easily migrate different data types like CSV, JSON etc. 
  2. 150+ connectors like PostgreSQL and Google Sheets(including 60+ free sources).
  3. Eliminate the need of manual schema mapping with the auto-mapping feature.

Experience Hevo and see why 2000+ data professionals including customers, such as Thoughtspot, Postman, and many more, have rated us 4.3/5 on G2.

Get Started with Hevo for Free

Few pointers to understand the query better:

  1. If you do not use a table_and_columns list, the command will process every table and materialized view from the current database.
  2. The PostgreSQL Command only processes a request for the database the user has permission to vacuum.
  3. Always ensure you have adequate system storage available before running the PostgreSQL VACUUM command.
  4. The VACUUM ANALYZE function, first, will process a VACUUM request, then an ANALYZE request for each of the selected tables.
  5. A simple VACUUM request will only reclaim storage, make it available for re-use.
  6. You can leverage the parallel vacuum feature to use multiple CPUs to process multiple indexes.
  7. The VACUUM FULL request will rewrite the entire content into a new disk file with no extra space. Post-run, the storage space will return to the operating system.

TIP: The PostgreSQL Vacuum Command requires a lot of CPU & I/O compute memory; hence, it is advised to use the command ONLY when the application activity is low.

Syntax Parameters in PostgreSQL VACUUM Command

FULL: The FULL request selects a “full” vacuum. The request takes a longer processing time and locks the table. One vivacious is that it can free up more space. The request creates new duplicates of the table queried and do not delete the old one until the operation is finished. This approach uses more disc space, limiting its use only when a substantial quantity of table space has to be freed up.

FREEZE: The FREEZE request is for aggressive freezing of tuple. Whenever you specify FREEZE, it is equivalent to performing VACUUM with the parameters set to zero for vacuum_freeze_min_age and vacuum_freeze_table_age. The FREEZE request should only be performed whenever you need to rewrite the table — making it counterintuitive/redundant if the user states a FULL request. 

VERBOSE: The VERBOSE request prints a thorough report on the vacuum action for every table.

ANALYZE: The ANALYZE request updates the facts/statistics that the planner uses to decide the best and the most efficient way to carry out a query.

Why Run PostgreSQL Vacuum Command Periodically?

PostgreSQL recommends Database Administrators to assist with routine database checks to maintain a healthy data repository. The reasons are straightforward, make sense to adhere to religiously.

In short, a clear and cutthroat vacuuming strategy should be in place based on the varying needs of sites.

Post the PostgreSQL 7.2 release, the VACUUM run has been made easy, non-intrusive, to be precise. Now, you can execute the command in parallel with normal database operations – such as selects, inserts, updates, deletes, and so forth. Similarly, to further streamline the process, in the PostgreSQL 8.0 version, an automated mechanism has been incorporated. 

Note: We will discuss the automation process in detail later in the article.

In general, there are three primary motivations for running the VACUUM command periodically. Let’s explain and talk about them in brief.

Integrate PostgreSQL to Databricks
Integrate PostgreSQL to BigQuery
Integrate Amazon DocumentDB to PostgreSQL

1. Retrieve Storage Space

As we mentioned earlier, when applied on the Postgres vacuum full example,  the UPDATE or DELETE operations in PostgreSQL  do not immediately remove the old version of the row. Because, for multi-version concurrency control, this approach is essential. In cases where outdated versions of rows are no longer needed, and to retrieve the storage space or “avoid infinite growth of disk space,” the VACUUM Command is employed.

TIP: For frequently updated or deleted tables that demand regular vacuuming than rarely updated tables, setting up periodic/recurring cron tasks that vacuum specific tables and skip tables that are known not to update frequently can be effective.

2. Update Data Statistics

For good planning, PostgreSQL relies on statistical information. This information is based on the table’s contents and is gathered by the ANALYZE command. That said, accurate statistics data is good for getting the optimum database performance level.

It is the best practice to plan a database-wide ANALYZE once daily during low activity, paired with a nightly VACUUM. However, sites with table data that fluctuate slowly might discover that this routine is excessive.

TIP: Check for the minimum and maximum values of the columns in the table change. For instance, a table containing the timestamp column might require frequent statistics updates. On the other hand, a table containing phone numbers or email addresses might not.

3. Avoid Transaction ID Wraparound Failures

Running a routine maintenance activity is required because of how PostgreSQL implements transaction IDs. This will be done automatically by Postgres (through AUTOVACUUM), and it won’t be observed in most cases.

TIP: Do NOT turn off the AUTOVACUUM, start it manually during low traffic periods, or change the schedule, so it runs less frequently. The consequences of not running AUTOVACUUM can be ruinous — and as PostgreSQL documentation puts it, “In short, catastrophic data loss.”

PostgreSQL VACUUM Command Example Use Cases

1. Reclaim Storage Space

The given-below query recovers storage space from dead tuples so that the same table can utilize the empty fields.

VACUUM;

The query frees up the space within each table. Post running the query, storage space in the PostgreSQL server will be available.

Note: The operating system does not reclaim the space; the table from which the space was allocated does. You will not observe any effect on the size of the PostgreSQL database file.

2. Reclaim Storage Space & Minimize Database File

The given-below query will free up the storage space by vacuuming all the tables and will return the unused space to the OS.

VACUUM FULL;

Note: This query will require an exclusive lock on each table as it would rewrite all tables into a new file. The OS will be reclaiming all the free space, resulting in the reduction of the database file.

3. Reclaim Space on a Table

The given-below query will free up the storage space on a table. Not of the entire database.

VACUUM products;

Note: This query will VACUUM the emails table only. The fields inside the emails table will be available to be used inside the emails table exclusively. The database’s size will not decrease. Use the FULL option to reallocate the empty space back to the OS.

Let’s Automate: Vacuuming Through AUTOVACUUM Daemon

Now, I do not know if it is just me, but removing dead tuples manually sounds a hell of a lot boring. So, let us talk about a better way.

You can use the optional AUTOVACUUM Daemon to automatically VACUUM the database rather than manually running the vacuum query. In PostgreSQL, the default setup has the Daemon activated.

Autovacuum is enabled by default in PostgreSQL and is a routine vacuum maintenance process. It can be configured using various parameters in postgresql.conf.file. To configure how the vacuum process runs, PostgreSQL has several settings. Some of these are listed below: 

  • autovacuum: It enables or disables the autovacuum background process. By default, autovacuum is enabled.
  • autovacuum_vacuum_threshold: This determines the minimum number of dead rows that must be present in a table before it is vacuumed. The default value is 50.
  • autovacuum_analyze_threshold: This setting determines the minimum number of live rows that must be present in a table before it is analyzed. The default value is 50.
  • autovacuum_vacuum_scale_factor: This is a multiplier that determines how many dead rows are needed to trigger a vacuum based on the table size. The default value is 0.2.
  • autovacuum_analyze_scale_factor: This setting is a multiplier that determines how many live rows are needed to trigger an analysis based on the size of the table. The default value is 0.1.
  • autovacuum_vacuum_cost_delay: It determines the time (in milliseconds) the autovacuum will wait before starting a vacuum operation. The default value is 20.
  • autovacuum_vacuum_cost_limit: This setting determines the maximum number of rows that can be vacuumed in a single vacuum operation. The default value is 200.

You can check whether the AUTOVACUUM is on or off by running the query given below:

SELECT *
FROM pg_settings 
WHERE name LIKE 'autovacuum%'

The default settings can be found in the PostgreSQL.conf file to control when and how the AUTOVACUUM Daemon runs. Below are some of the settings examples.

#------------------------------------------------------------------------------
# AUTOVACUUM PARAMETERS
#------------------------------------------------------------------------------

autovacuum = on                        # Enable autovacuum subprocess?  'on'
                                       # requires track_counts to also be on.

log_autovacuum_min_duration = -1       # -1 disables, 0 logs all actions and
                                       # their durations, > 0 logs only
                                       # actions running at least this number
                                       # of milliseconds.

autovacuum_max_workers = 3             # max number of autovacuum subprocesses
                                       # (change requires restart)

autovacuum_naptime = 1min              # time between autovacuum runs

autovacuum_vacuum_threshold = 50       # min number of row updates before vacuum

autovacuum_analyze_threshold = 50      # min number of row updates before analyze

autovacuum_vacuum_scale_factor = 0.2   # fraction of table size before vacuum

autovacuum_analyze_scale_factor = 0.1  # fraction of table size before analyze

autovacuum_freeze_max_age = 200000000  # maximum XID age before forced vacuum
                                       # (change requires restart)

autovacuum_multixact_freeze_max_age = 400000000
                                       # maximum multixact age
                                       # before forced vacuum
                                       # (change requires restart)

autovacuum_vacuum_cost_delay = 20ms    # default vacuum cost delay for
                                       # autovacuum, in milliseconds;
                                       # -1 means use vacuum_cost_delay

autovacuum_vacuum_cost_limit = -1      # default vacuum cost limit for
                                       # autovacuum, -1 means use
                                       # vacuum_cost_limit

To disable AUTOVACUUM on a table, run the following query.

ALTER TABLE table_name SET (autovacuum_enabled = false);

Conclusion

In this article, we have gone through the importance of PostgreSQL VACUUM Command and PostgreSQL VACUUM database example for periodic database maintenance.

Database administrators can leverage this article to understand the command better and use it to delete dead tuples to free up the space inside the table, the database server, or to shift the space to the OS’s storage itself.

To learn about other PostgreSQL Commands, you can refer to some of the blogs below, handpicked by our editorial team.

  1. PostgreSQL Materialized Views: Syntax & Query Examples| A 101 Guide
  2. PostgreSQL DISTINCT Clause: Syntax & Query Examples | A 101 Guide
  3. Understanding PostgreSQL Triggers: A Comprehensive 101 Guide

That said, today, data folks are in a constant struggle. They want a data infrastructure that is functional, compliant, high performing, and scalable. And, if you are one of those for whom the sanity across the organization-wide data matters a lot — Hevo is just the right fit!

Try a 14-day free trial and experience the feature-rich Hevo suite firsthand. Also, checkout our unbeatable pricing to choose the best plan for your organization.

Frequently Asked Questions

1. What does Postgres vacuum full do?

VACUUM FULL is a command in PostgreSQL used to reclaim storage space and optimize database performance. It performs a more intensive cleanup than the standard VACUUM command.

2. How to Vacuum in PostgreSQL?

VACUUM FULL my_table;

3. Is PostgreSQL Vacuum Safe?

Yes, VACUUM in PostgreSQL is generally safe to use.

Yash Arora
Content Manager, Hevo Data

Yash is a Content Marketing professional with over three years of experience in data-driven marketing campaigns. He has expertise in strategic thinking, integrated marketing, and customer acquisition. Through comprehensive marketing communications and innovative digital strategies, he has driven growth for startups and established brands.