Imagine a multi-user environment in PostgreSQL, having a Table X with millions and billions of rows, edited frequently with new updates and deletions.
In normal PostgreSQL operations, until and unless the rows are not physically removed, they remain untenanted, thus, blocking up the space inside the table.
PostgreSQL characterizes these rows as “Dead Tuples.”
As a result, Table X’s size will keep on increasing. And dead tuples will keep on occupying the space for centuries to come. Moreover, these particular rows will remain inoperable with no reliances attached whatsoever.
Now the need arises for a healthy Database Maintainance routine to maintain the usability and efficiency of data stored in the Postgres server. And to do so is the responsibility of a Database Administrator.
So, does there exist a way to remove dead tuples altogether?
To facilitate the process of removing dead tuples, we use the PostgreSQL VACUUM Command.
In this tutorial article, we will learn how to implement the PostgreSQL VACUUM Command skillfully. So, keep on reading to get to the know-how.
Table of Contents
PostgreSQL VACUUM Command — At a Glance
The PostgreSQL VACUUM Command reclaims space occupied by dead tuples from tables and their indexes. The command will not empty up the disk space in the OS but instead makes it usable for new rows.
A simple PostgreSQL VACUUM Command is shown below:
VACUUM [ ( option [, ...] ) ] [ table_and_columns [, ...] ]
VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ table_and_columns [, ...] ]
Few pointers to understand the query better:
- If you do not use a table_and_columns list, the command will process every table and materialized view from the current database.
- The PostgreSQL Command only processes a request for the database the user has permission to vacuum.
- Always ensure you have adequate system storage available before running the PostgreSQL VACUUM command.
- The VACUUM ANALYZE function, first, will process a VACUUM request, then an ANALYZE request for each of the selected tables.
- A simple VACUUM request will only reclaim storage, make it available for re-use.
- You can leverage the parallel vacuum feature to use multiple CPUs to process multiple indexes.
- The VACUUM FULL request will rewrite the entire content into a new disk file with no extra space. Post-run, the storage space will return to the operating system.
TIP: The PostgreSQL Vacuum Command requires a lot of CPU & I/O compute memory; hence, it is advised to use the command ONLY when the application activity is low.
Set up pipelines for over 150+ sources including SaaS applications and Databases into 15+ Destinations. Get rid of manual maintenance each time the source data or API changes.
With Hevo you get a fault-tolerant architecture that scales with zero data loss and low latency. Monitor your pipeline health with an intuitive dashboard that reveals everything from pipeline failures to real-time data flow.
No more sleepless nights worrying about broken pipelines. Discover peace with Hevo’s Reliable Data Pipelines.
Sign up here for a 14-day free trial!
Try Hevo’s 14-day Free Trial for scaling your data strategy with Ease.
Syntax Parameters in PostgreSQL VACUUM Command
FULL: The FULL request selects a “full” vacuum. The request takes a longer processing time and locks the table. One vivacious is that it can free up more space. The request creates new duplicates of the table queried and do not delete the old one until the operation is finished. This approach uses more disc space, limiting its use only when a substantial quantity of table space has to be freed up.
FREEZE: The FREEZE request is for aggressive freezing of tuple. Whenever you specify FREEZE, it is equivalent to performing VACUUM with the parameters set to zero for vacuum_freeze_min_age and vacuum_freeze_table_age. The FREEZE request should only be performed whenever you need to rewrite the table — making it counterintuitive/redundant if the user states a FULL request.
VERBOSE: The VERBOSE request prints a thorough report on the vacuum action for every table.
ANALYZE: The ANALYZE request updates the facts/statistics that the planner uses to decide the best and the most efficient way to carry out a query.
Why Run PostgreSQL Vacuum Command Periodically?
PostgreSQL recommends Database Administrators to assist with routine database checks to maintain a healthy data repository. The reasons are straightforward, make sense to adhere to religiously.
In short, a clear and cutthroat vacuuming strategy should be in place based on the varying needs of sites.
Post the PostgreSQL 7.2 release, the VACUUM run has been made easy, non-intrusive, to be precise. Now, you can execute the command in parallel with normal database operations – such as selects, inserts, updates, deletes, and so forth. Similarly, to further streamline the process, in the PostgreSQL 8.0 version, an automated mechanism has been incorporated.
Note: We will discuss the automation process in detail later in the article.
In general, there are three primary motivations for running the VACUUM command periodically. Let’s explain and talk about them in brief.
Retrieve Storage Space
As we mentioned earlier, the UPDATE or DELETE operations in PostgreSQL do not immediately remove the old version of the row. Because, for multi-version concurrency control, this approach is essential. In cases where outdated versions of rows are no longer needed, and to retrieve the storage space or “avoid infinite growth of disk space,” the VACUUM Command is employed.
TIP: For frequently updated or deleted tables that demand regular vacuuming than rarely updated tables, setting up periodic/recurring cron tasks that vacuum specific tables and skip tables that are known not to update frequently can be effective.
Update Data Statistics
For good planning, PostgreSQL relies on statistical information. This information is based on the table’s contents and is gathered by the ANALYZE command. That said, accurate statistics data is good for getting the optimum database performance level.
It is the best practice to plan a database-wide ANALYZE once daily during low activity, paired with a nightly VACUUM. However, sites with table data that fluctuate slowly might discover that this routine is excessive.
TIP: Check for the minimum and maximum values of the columns in the table change. For instance, a table containing the timestamp column might require frequent statistics updates. On the other hand, a table containing phone numbers or email addresses might not.
Avoid Transaction ID Wraparound Failures
Running a routine maintenance activity is required because of how PostgreSQL implements transaction IDs. This will be done automatically by Postgres (through AUTOVACUUM), and it won’t be observed in most cases.
TIP: Do NOT turn off the AUTOVACUUM, start it manually during low traffic periods, or change the schedule, so it runs less frequently. The consequences of not running AUTOVACUUM can be ruinous — and as PostgreSQL documentation puts it, “In short, catastrophic data loss.”
PostgreSQL VACUUM Command Example Use Cases
Reclaim Storage Space
The given-below query recovers storage space from dead tuples so that the same table can utilize the empty fields.
The query frees up the space within each table. Post running the query, storage space in the PostgreSQL server will be available.
Note: The operating system does not reclaim the space; the table from which the space was allocated does. You will not observe any effect on the size of the PostgreSQL database file.
Reclaim Storage Space & Minimize Database File
The given-below query will free up the storage space by vacuuming all the tables and will return the unused space to the OS.
Note: This query will require an exclusive lock on each table as it would rewrite all tables into a new file. The OS will be reclaiming all the free space, resulting in the reduction of the database file.
Reclaim Space on a Table
The given-below query will free up the storage space on a table. Not of the entire database.
Note: This query will VACUUM the emails table only. The fields inside the emails table will be available to be used inside the emails table exclusively. The database’s size will not decrease. Use the FULL option to reallocate the empty space back to the OS.
Providing a high-quality ETL solution can be a difficult task if you have a large volume of data. Hevo’s automated, No-code platform empowers you with everything you need to have for a smooth data replication experience.
Check out what makes Hevo the best in its class:
Sign up here for a 14-day free trial!
- Monitoring and Observability: Monitor pipeline health with intuitive dashboards that reveal every stat of pipeline and data flow. Bring real-time visibility into your ETL with Alerts and Activity Logs.
- In-built Transformations: Format your data on the fly with Hevo’s Drag and Drop Pre Load Transformations. Generate analysis-ready data in your warehouse using Hevo’s Post Load Transformation.
- Auto Schema Management: Correcting improper schema after the data is loaded into your warehouse is devilishly painful. Hevo automatically maps your source schema with your destination so that you don’t face the pain of schema errors.
- 24*7 Customer Support: Within our product, “Live Chat” Customer support is up round the clock to unblock you with any pipeline failures.
Let’s Automate: Vacuuming Through AUTOVACUUM Daemon
Now, I do not know if it is just me, but removing dead tuples manually sounds a hell of a lot boring. So, let us talk about a better way.
You can use the optional AUTOVACUUM Daemon to automatically VACUUM the database rather than manually running the vacuum query. In PostgreSQL, the default setup has the Daemon activated.
You can check whether the AUTOVACUUM is on or off by running the query given below:
WHERE name LIKE 'autovacuum%'
The default settings can be found in the PostgreSQL.conf file to control when and how the AUTOVACUUM Daemon runs. Below are some of the settings examples.
# AUTOVACUUM PARAMETERS
autovacuum = on # Enable autovacuum subprocess? 'on'
# requires track_counts to also be on.
log_autovacuum_min_duration = -1 # -1 disables, 0 logs all actions and
# their durations, > 0 logs only
# actions running at least this number
# of milliseconds.
autovacuum_max_workers = 3 # max number of autovacuum subprocesses
# (change requires restart)
autovacuum_naptime = 1min # time between autovacuum runs
autovacuum_vacuum_threshold = 50 # min number of row updates before vacuum
autovacuum_analyze_threshold = 50 # min number of row updates before analyze
autovacuum_vacuum_scale_factor = 0.2 # fraction of table size before vacuum
autovacuum_analyze_scale_factor = 0.1 # fraction of table size before analyze
autovacuum_freeze_max_age = 200000000 # maximum XID age before forced vacuum
# (change requires restart)
autovacuum_multixact_freeze_max_age = 400000000
# maximum multixact age
# before forced vacuum
# (change requires restart)
autovacuum_vacuum_cost_delay = 20ms # default vacuum cost delay for
# autovacuum, in milliseconds;
# -1 means use vacuum_cost_delay
autovacuum_vacuum_cost_limit = -1 # default vacuum cost limit for
# autovacuum, -1 means use
To disable AUTOVACUUM on a table, run the following query.
ALTER TABLE table_name SET (autovacuum_enabled = false);
In this article, we have gone through the importance of PostgreSQL VACUUM Command and PostgreSQL VACUUM database example for periodic database maintenance.
Database administrators can leverage this article to understand the command better and use it to delete dead tuples to free up the space inside the table, the database server, or to shift the space to the OS’s storage itself.
To learn about other PostgreSQL Commands, you can refer to some of the blogs below, handpicked by our editorial team.
- PostgreSQL Materialized Views: Syntax & Query Examples| A 101 Guide
- PostgreSQL DISTINCT Clause: Syntax & Query Examples | A 101 Guide
- Understanding PostgreSQL Triggers: A Comprehensive 101 Guide
That said, today, data folks are in a constant struggle. They want a data infrastructure that is functional, compliant, high performing, and scalable. And, if you are one of those for whom the sanity across the organization-wide data matters a lot — Hevo is just the right fit!
Hevo Data with its strong integration with 150+ Data Sources such as PostgreSQL, MySQL, and MS SQL Server, allows you to export data from sources & load data to the destinations of your choice. It transforms & enriches your data, making it analysis-ready so that you can focus only on your key business needs and perform insightful analysis using BI tools.
Visit our Website to Explore Hevo
Hevo lets you replicate your data from your PostgreSQL database to any Data Warehouse of your choice like Amazon Redshift, Snowflake, Google BigQuery, or Firebolt within minutes with just a few clicks.
Give Hevo a try. Sign Up here for a 14-day full feature access trial and experience the feature-rich Hevo suite firsthand. You can also check our transparent pricing and make a decision on your best-suited plan.
Share your thoughts on learning about PostgreSQL VACUUM Command in the comments section below. If you have any questions, do let us know.