As huge amounts of data are stored in databases, performance and scaling are affected. Partitioning helps as it can divide large tables into smaller tables. Hence reducing memory swap problems and table scans, ultimately increasing the performance. Huge datasets are divided into smaller partitions that are more accessible and manageable.
PostgreSQL is robust and reliable and is used as the primary data warehouse for many applications. This article will introduce you to PostgreSQL, its key features, and PostgreSQL partitions. It will explain the different types of partitions with syntax and examples. It will also dive into the advantages and limitations of PostgreSQL Partitions.
What is PostgreSQL Partition?
Partitioning refers to splitting one large table into smaller physical pieces that can be stored in different storage media based on its use. Each part has its characteristics and name. Partitioning helps in increasing the database server performance as the number of rows that need to be read, processed, and returned is significantly lesser. You can also use PostgreSQL partitions to divide indexes and indexed tables.
There are mainly two types of PostgreSQL Partitions: Vertical Partitioning and Horizontal Partitioning. In vertical partitioning, we divide column-wise and in horizontal partitioning, we divide row-wise.
Horizontal Partitioning involves putting different rows into different tables. For example, you store details of students who are over the age of 18 in one partition and below 18 in another. A view with the union can be created over the two partitions to display all the students.
Vertical Partitioning involves creating tables with fewer columns and using additional tables to store the remaining columns. Normalization also involves this splitting of columns across tables, but vertical partitioning goes beyond that and partitions columns even when already normalized.
Looking for a solution to replicate data from PostgreSQL? Hevo Data, a No-code Data Pipeline, helps load data from any data source such as Databases, SaaS applications, Cloud Storage, SDKs, and Streaming Services, and simplifies the ETL process. It supports 150+ Data Sources such as PostgreSQL, including 40+ Free Sources.
Simplify your Data Analysis with Hevo today!
SIGN UP HERE FOR A 14-DAY FREE TRIAL!
Understanding PostgreSQL Partitions
How to Create a Partition Table
First, you need to use CREATE TABLE and specify the partition key and partition type.
Then create each partition using CREATE TABLE while specifying each partition method.
You can use the following code snippet to create the main table:
CREATE TABLE main_table_name (
column_1 data type,
column_2 data type,
.
.
. ) PARTITION BY RANGE (column_2);
You can use the following code snippet to create the Partition Table:
CREATE TABLE partition_name
PARTITION OF main_table_name FOR VALUES FROM (start_value) TO (end_value);
Types of PostgreSQL Partitions
PostgreSQL Partition: List Partition
In List partitions, data is partitioned based on discrete values that have been specified. When you need to group discrete data, such as regions and departments, with arbitrary values, this method works well. For example, Job title, Split by region, etc.
Consider a scenario where you are using a table that manages the sales of each branch and creating a list partition that divides the table based on region.
Creating the Partition Table
mydb=# CREATE TABLE sales_region (id int, amount int, branch text, region text)
mydb-# PARTITION BY LIST (region);
Creating the Partitions
mydb=# CREATE TABLE London PARTITION OF sales_region FOR VALUES IN ('London');
mydb=# CREATE TABLE Sydney PARTITION OF sales_region FOR VALUES IN ('Sydney');
mydb=# CREATE TABLE Boston PARTITION OF sales_region FOR VALUES IN ('Boston');
Adding Data to the Partition Table
mydb=# COPY sales_region FROM '/home/tmp/listpart_1.sql';
Retrieving Data From the Partition Table
mydb=# SELECT * FROM sales_region;
id | amount | branch | region
-----+--------+------------+--------
136 | 150 | Kings Rd | London
147 | 10 | Regent St | London
245 | 100 | College St | Sydney
278 | 50 | George St | Sydney
561 | 100 | Charles St | Boston
537 | 5 | Ann St | Boston
510 | 10 | Park Dr | Boston
(7 rows)
Retrieve Data From Partition
mydb=# SELECT * FROM Boston;
id | amount | branch | region
-----+--------+------------+--------
561 | 100 | Charles St | Boston
537 | 5 | Ann St | Boston
510 | 10 | Park Dr | Boston
(3 rows)
PostgreSQL Partition: Range Partition
For Range partitions, data is divided into segments based on the chosen range. When you need to access time-series data, supplying a date, such as the year and month, is useful. For example, Admission date, splitting by sale date, etc.
Creating the Partition Table
mydb=# CREATE TABLE sales (id int, p_name text, amount int, sale_date date)
mydb-# PARTITION BY RANGE (sale_date);
Creating the Partitions
mydb=# CREATE TABLE sales_2019_Q4 PARTITION OF sales FOR VALUES FROM ('2019-10-01') TO ('2020-01-01');
mydb=# CREATE TABLE sales_2020_Q1 PARTITION OF sales FOR VALUES FROM ('2020-01-01') TO ('2020-04-01');
mydb=# CREATE TABLE sales_2020_Q2 PARTITION OF sales FOR VALUES FROM ('2020-04-01') TO ('2020-07-01');
Adding Data to the Partition Table
mydb=# INSERT INTO sales VALUES (1,'prod_A',100,'2020-06-01');
mydb=# INSERT INTO sales VALUES (8,'prod_B', 5,'2020-03-02');
Retrieving Data From the Partition Table
mydb=# SELECT * sales;
id | p_name | amount | sale_date
----+--------+--------+------------
1 | prod_A | 100 | 2020-06-01
8 | prod_B | 5 | 2020-03-02
2 | prod_F | 15 | 2020-03-02
3 | prod_B | 5 | 2020-01-15
4 | prod_C | 10 | 2020-02-11
6 | prod_F | 10 | 2020-01-05
10 | prod_E | 5 | 2020-02-10
5 | prod_E | 10 | 2020-05-15
7 | prod_D | 10 | 2020-04-11
9 | prod_C | 15 | 2020-04-30
(10 rows)
Retrieving Data From a Partition
mydb=# SELECT * FROM sales_2020_Q2;
id | p_name | amount | sale_date
----+---------+--------+------------
5 | prod_E | 10 | 2020-05-15
7 | prod_D | 10 | 2020-04-11
9 | prod_C | 15 | 2020-04-30
(3 rows)
PostgreSQL Partition: Hash Partition
Each partition’s data is partitioned by supplying a modulus and a remainder. Each partition will contain the rows for which the modulus divided by the hash value of the partition key yields the given remainder. When you wish to avoid access concentration to a single table by distributing data almost evenly, this method works well.
For example, you divide it into three sections (n is the hash value created from the value in the partition key)
n % 3 = 0 → Allocates to partition 1
n % 3 = 1 → Allocates to partition 2
n% 3 = 2 → Allocates to partition 3
Creating the Partition Table
mydb=# CREATE TABLE emp (emp_id int, emp_name text, dep_code int)
mydb-# PARTITION BY HASH (emp_id);
Creating the Partitions
mydb=# CREATE TABLE emp_0 PARTITION OF emp FOR VALUES WITH (MODULUS 3,REMAINDER 0);
mydb=# CREATE TABLE emp_1 PARTITION OF emp FOR VALUES WITH (MODULUS 3,REMAINDER 1);
mydb=# CREATE TABLE emp_2 PARTITION OF emp FOR VALUES WITH (MODULUS 3,REMAINDER 2);
Adding Data to the Partition Table
mydb=# INSERT INTO emp SELECT num,
mydb-# 'user_' || num , (RANDOM()*50)::INTEGER FROM generate_series(1,1000) AS num;
Check the Number of Rows in the Partition Table and Each Partition
mydb=# SELECT relname,reltuples as rows FROM pg_class
mydb-# WHERE relname IN ('emp','emp_0','emp_1','emp_2')
mydb-# ORDER BY relname;
relname | rows
---------+--------
emp | 0
emp_0 | 324 A
emp_1 | 333
emp_2 | 343
(4 rows)
PostgreSQL Partition: Composite Partition
To construct a more complex PostgreSQL partition layout, you can establish multiple partitions under a partition, as seen below. A Composite Partition, is sometimes known as a subpartition.
For example, a range partition separated by month and a list partition divided by product category can be created for the product sales database (partition table). Because you may narrow down the partitions to be searched, access performance is increased, such as when you want to get sales data for a specific product category in a specified period of time (for example, sales data for product category ‘ghi’ in July).
You can increase the number of range partitions and list partitions by specifying a new range and value for the partition key. However, Hash Partitions cannot be added in the same way because the number of partitions is determined by the division and the calculation of the specified remainder. Take a look at an example of repartitioning a hash partition by dividing and updating the values that are used too much. Here, the hash partition created above is used as an example.
You have to specify the multiple numbers of the current value set for division calculation. The example of changing the hash partition from 3 partitions to 6 partitions (a multiple of 3) is shown below:
Based on the above example, you can see how to divide into 6 partitions.
You can assume a partition table contains 1 million rows, and they are split into the partitions as follows.
mydb=# SELECT relname, reltuples as rows FROM pg_class
mydb-# WHERE relname IN ('emp','emp_0','emp_1','emp_2')
mydb-# ORDER BY relname;
relname | rows
---------+--------
emp | 0
emp_0 | 333263 A
emp_1 | 333497
emp_2 | 333240
(4 rows)
Detaching the Partitions From the Partition Table
mydb=# ALTER TABLE emp DETACH PARTITION emp_0;
mydb=# ALTER TABLE emp DETACH PARTITION emp_1;
mydb=# ALTER TABLE emp DETACH PARTITION emp_2;
Renaming the Partitions
mydb=# ALTER TABLE emp_0 RENAME TO emp_0_bkp;
mydb=# ALTER TABLE emp_1 RENAME TO emp_1_bkp;
mydb=# ALTER TABLE emp_2 RENAME TO emp_2_bkp;
Creating Six Partitions
mydb=# CREATE TABLE emp_0 PARTITION OF emp FOR VALUES WITH (MODULUS 6, REMAINDER 0);
mydb=# CREATE TABLE emp_1 PARTITION OF emp FOR VALUES WITH (MODULUS 6, REMAINDER 1);
mydb=# CREATE TABLE emp_2 PARTITION OF emp FOR VALUES WITH (MODULUS 6, REMAINDER 2);
mydb=# CREATE TABLE emp_3 PARTITION OF emp FOR VALUES WITH (MODULUS 6, REMAINDER 3);
mydb=# CREATE TABLE emp_4 PARTITION OF emp FOR VALUES WITH (MODULUS 6, REMAINDER 4);
mydb=# CREATE TABLE emp_5 PARTITION OF emp FOR VALUES WITH (MODULUS 6, REMAINDER 5);
Restoring Data From the Detached Partitions
mydb=# INSERT INTO emp SELECT * FROM emp_0_bkp;
mydb=# INSERT INTO emp SELECT * FROM emp_1_bkp;
mydb=# INSERT INTO emp SELECT * FROM emp_2_bkp;
Deleting Detached Partitions
mydb=# DROP TABLE emp_0_bkp;
mydb=# DROP TABLE emp_1_bkp;
mydb=# DROP TABLE emp_2_bkp;
Displaying the Divided Data as Six Partitions
mydb=# SELECT relname,reltuples as rows FROM pg_class
mydb-# WHERE relname IN ('emp','emp_0','emp_1','emp_2','emp_3','emp_4','emp_5')
mydb-# ORDER BY relname;
relname | rows
---------+---------
emp | 0
emp_0 | 166480 A
emp_1 | 166904
emp_2 | 166302
emp_3 | 166783
emp_4 | 166593
emp_5 | 166938
(7 rows)
Performing the above steps on a huge dataset may take time, so you can individually perform these steps for each partition.
When to Use and Not Use PostgreSQL Partitions?
Most of the benefits of partitioning can be achieved if a single table cannot provide them. Therefore, if you want to write a large amount of data to one table at any given time, you can say that you need partitioning. Apart from the data, there may be other factors that users should consider, such as significant performance gains from partitioning and the ability to scale PostgreSQL to larger datasets.
Here are some suggestions for when to partition a table:
- Tables bigger than 2GB should be considered.
- Tables containing historical data, and new data are added only to a new partition. For example, a table in which only the current month’s data has to be updated and the other 11 months are read-only.
When should You not Use Partitioning?
- When you want to ingest data at better rates and the tables are small. Also, if there is no chance that the tables will grow faster, partitioning will only add complexities without any advantage.
- When you don’t have a strong grip on PostgreSQL. Partitioning comes with its own operational and architectural complexity.
- When your queries access data uniformly across the table other than focusing on specific subsets (like ranges of data), then your query performance can get even worse.
- When most of your queries don’t use your partition key in the WHERE clause, scanning every partition every time will get slower as you have more partitions.
Advantages of PostgreSQL Partitions
Partitioning can have several benefits:
- Query performance is significantly higher compared to selecting from a single large table.
- It is very flexible and gives its users good control. Users can create partitions of any level according to their needs and use constraints, triggers, and indexes for each partition individually or all partitions together.
- Rarely-used data can be moved to media storage that is cheaper or slower.
- Bulk loads and data deletion can be done much faster, as these operations can be performed on individual partitions based on user requirements.
Limitations of PostgreSQL Partitions
Here are a few limitations of PostgreSQL Partitions:
- There is no option for automatically creating matching indexes on all partitions. Separate commands must be used to add indexes to each partition. This also means that there is no way to build a primary key, a unique constraint, or an exclusion constraint that spans all partitions; instead, each leaf partition must be constrained separately.
- Foreign keys referencing partitioned tables, as well as foreign key references from a partitioned table to another table, are not supported because primary keys are not supported on partitioned tables.
- It is not possible to mix temporary and permanent relations in the same partition tree. As a result, if the partitioned table is permanent, its partitions must be permanent as well, and vice versa if the partitioned table is temporary. All members of the partition tree must be from the same session when using temporary relations.
- Row triggers must be defined on individual partitions and not in the partitioned table.
- Range partitions do not accept NULL values.
Before we wrap up, take a look at some of the best practices as well.
Best Practices for Data Partitioning
- Select an optimum partition size: Having too many will increase planning time and could negatively affect query time, and you won’t be able to use ranges to exclude data.
- Keep a uniform partition size: This is required to keep the maintenance tasks and query performances consistent across partitions.
- Choose the right key that aligns with your query patterns: Suppose your queries are filtered by date. You need to use a timestamp or date column as the partitioning key.
- Ensure that you create partitions for upcoming periods in advance: This is required to ensure that there is no interruption in data ingestion. The issues arises when you want to include isolated records. Though a default partition can help with this, it’s difficult to maintain and wouldn’t perform well.
- Maintain old partitions through data retention policies: Schedule regular tasks to drop or archive old partitions having data that has only limited use when you partition by time.
- Optimize query performance by analyzing and understanding the query execution plan to confirm that you scan only important partitions.
- Place partitions carefully across all storage mediums.
Bonus section: PostgreSQL pg_partman Extension
- PostgreSQL pg_partman extension helps to automate the creation and maintenance of table partitions. This saves you the effort of manually creating each partition.
- Create a PostgreSQL partitioned table -> Call create_parent function to register it with pg_partman -> generate the required partitions tailored to the parameters you provided to the function.
- To automatically manage partitions, use run_maintenance_proc function on a scheduled basis.
- Schedule this function to run periodically for creating partitions whenever required. It is also possible to automatically drop partitions.
Conclusion
In a nutshell, partitioning is a method used in relational databases to break down large tables into smaller partitions. This helps in executing queries on a large table faster. In this article, you learned the 4 types of PostgreSQL partition and how to use them. The exact point at which a table benefits from partitioning is determined by the application, but a good rule of thumb is that the table’s size should exceed the database server’s physical memory.
However, as a Developer, extracting complex data from a diverse set of data sources like Databases, CRMs, Project management Tools, Streaming Services, Marketing Platforms to your PostgreSQL Database can seem to be quite challenging. If you are from non-technical background or new in the game of data warehouse and analytics, Hevo Data can help!
Hevo is the only real-time ELT No-code Data Pipeline platform that cost-effectively automates data pipelines that are flexible to your needs. With integration with 150+ Data Sources (40+ free sources), we help you not only export data from sources & load data to the destinations but also transform & enrich your data, & make it analysis-ready.
Visit our Website to Explore Hevo
Want to take Hevo for a spin? Sign Up for a 14-day free trial and experience the feature-rich Hevo suite first hand.
You can also have a look at our unbeatable pricing that will help you choose the right plan for your business needs!
Reference: Amazon RDS user guide
Sharon is a data science enthusiast with a hands-on approach to data integration and infrastructure. She leverages her technical background in computer science and her experience as a Marketing Content Analyst at Hevo Data to create informative content that bridges the gap between technical concepts and practical applications. Sharon's passion lies in using data to solve real-world problems and empower others with data literacy.