PostgreSQL is a powerful object-relational database system. Having been in active development, for now, almost 30 years, PostgreSQL is known for its reliability, data integrity, performance, and robustness. PostgreSQL. Part of its popularity is due to the different use cases it can support using its Clauses.
In this article, you will gain information about PostgreSQL DISTINCT Clause. You will also gain a holistic understanding of PostgreSQL, its key features, different syntax, and examples of PostgreSQL Clause. By the end of reading this, you will be fully equipped to use the Clause. Let’s get started!
What is PostgreSQL DISTINCT Clause?
In the SELECT statement, the PostgreSQL DISTINCT clause is used to delete duplicate records from a result set. For each set of duplicates, the DISTINCT clause saves one row. PostgreSQL DISTINCT eliminates all duplicate rows and keeps just one entry for each duplicated row group. The DISTINCT clause can also be applied to multiple columns of a single table, on one or more columns of different tables attached through joins.
According to the PostgreSQL Documentation, “If SELECT DISTINCT
is specified, all duplicate rows are removed from the result set (one row is kept from each group of duplicates).“
1) Syntax
SELECT DISTINCT | DISTINCT ON (distinct_expressions_id)
column
FROM tables
[WHERE conds];
2) Parameters
- distinct_expressions_id: This represents the expressions that are used to delete duplicates.
- column: This represents the name(s) of the column(s) which are to be evaluated for unique values.
- tables: This represents the name(s) of the table(s) from which you want to get the records. The FROM operator must specify at least one table.
- WHERE conds: <Optional> This represents the conditions to be met for the records to be selected.
How to use the PostgreSQL DISTINCT Clause?
The different uses of PostgreSQL DISTINCT Clause are as follows:
Syntax #1
SELECT
DISTINCT column_name1
FROM
table_name;
In this case, the PostgreSQL DISTINCT clause is used for the column_name1 which evaluates the duplicate rows according to the column_name1 column of the table_name. Thus, the output will only show the list of unique column_name1 values.
Syntax #2
SELECT
DISTINCT column_name1, column_name2
FROM
table_name;
In this case, the PostgreSQL DISTINCT clause is applied to a combination of column names of a table. It checks for unique values for the combination (union) of the columns in that particular table. In the above statement, the PostgreSQL DISTINCT Clause evaluates and displays the output for the combination of values of column_name1 and column_name2.
Syntax #3
SELECT
DISTINCT ON (column_name1) column_name_alias,
column_name2
FROM
table_name
ORDER BY
column_name1,
column_name2;
In this case, the DISTINCT ON Postgres expression is similar to PostgreSQL DISTINCT clause. It is used with the SELECT statement to remove duplicates from a query set result. In addition, it maintains only the “first row” of each duplicate row in the query set result.
In this case, the firstmost row of every group is random. ORDER BY clause is used along with the PostgreSQL DISTINCT On clause to make the resulting set look organized.
To have in-depth information about the PostgreSQL DISTINCT On Clause, you can have a look at the example of the PostgreSQL DISTINCT On clause.
Examples of PostgreSQL DISTINCT Clause
For showcasing PostgreSQL DISTINCT example, you can consider the following table.
The different examples are as follows:
1) PostgreSQL DISTINCT Clause: Single Column
SELECT
DISTINCT fore_color
FROM
ColorProperties
ORDER BY
fore_color;
In this case, the PostgreSQL DISTINCT Clause is used to fetch the unique values of the fore_color column from the ColorProperties table. And to sort the output of the query set in ascending order, the Order By clause is used.
Output:
2) PostgreSQL DISTINCT Clause: Multiple Columns
SELECT
DISTINCT fore_color,
back_color
FROM
ColorProperties
ORDER BY
fore_color,
back_color;
In this case, the PostgreSQL SELECT DISTINCT Clause is used along with 2 columns i.e., fore_color and back_color. So, it evaluates the unique values for the combination of the values in the 2 columns.
3) PostgreSQL DISTINCT Clause: With DISTINCT ON Clause
SELECT
DISTINCT ON
(back_color) backgroundcolor,
fore_color
FROM
ColorProperties
ORDER BY
back_color,
fore_color;
In this case, the PostgreSQL DISTINCT On clause evaluates the unique values and removes all duplicate rows for the first column keeping a one-row entry for each value of the first column.
4) PostgreSQL DISTINCT Clause: Multiple Tables
To better understand the application of PostgreSQL DISTINCT Clause in multiple tables, let’s consider two tables i.e, students and departments.
Creating students table & Inserting Values:
CREATE TABLE students (
name text not null,
department_id text not null,
CONSTRAINT students_pk PRIMARY KEY (name)
);
INSERT INTO students (name, department_id) VALUES
('Jacob', '101'),
('David', '102'),
('John', '103');
Creating departments tables & inserting Values:
CREATE TABLE departments (
department_id text not null,
department_name text not null
);
INSERT INTO departments (department_id, department_name) VALUES
('101', 'Computer'),
('102', 'Electrical'),
('103', 'Mechanical');
Query for evaluating unique values:
SELECT DISTINCT ON (s.department_id) s.department_id, s.name, d.department_name
FROM students s
JOIN departments d ON d.department_id = s.department_id
ORDER BY s.department_id DESC
In this case, first, both the tables are joined based on the join condition. Then, the PostgreSQL DISTINCT On clause is applied to remove duplicate entries and keep a one-row entry for each value of the first column.
A Comprehensive Overview of PostgreSQL Features and Benefits
PostgreSQL is a high-performance, open-source relational database. It is famous for its Open-Source platform, which supports all RDBMS functionalities. In addition to RDBMS capabilities, it provides indexes, views, stored procedures, triggers, atomicity features, and so on. SQL and JSON queries are also supported. PostgreSQL is also capable of serving as a data warehouse system.
PostgreSQL is compatible with a wide range of operating systems, including Windows, Linux, macOS, and UNIX. The source code for PostgreSQL is publicly accessible under an open-source license, letting users use, modify, and implement it as needed. Since PostgreSQL offers no license costs, there is no risk of over-deployment, which saves a lot of expenses.
Key Features of PostgreSQL
Some of the key features of PostgreSQL are as follows:
- Customizable: PostgreSQL can be altered and customized by writing plugins to make the DBMS meet your needs. It also lets you integrate custom functions written in other programming languages like Java, C, C++, and others.
- Community Support: A professional community is always available to its users. PostgreSQL also has a large range of private, third-party support services available.
- Open-Source: As a free and open-source solution, PostgreSQL offers Object-Oriented and Relational Database capabilities.
- Users: It is a well-known and commonly used RDBMS. Apple, Etsy, Facebook, Instagram, Red Hat, Skype, Spotify, and many other companies utilize PostgreSQL.
- Code quality: Every line of code in PostgreSQL is evaluated by a team of professionals, and the whole development process is driven by the community, allowing for speedy bug reporting, changes, and verification.
- Data Availability and Resiliency: PostgreSQL versions that are privately supported provide additional high availability, resilience, and security for mission-critical production settings such as government agencies, financial institutions, and healthcare providers.
5) Postgres DISTINCT Clause: With COUNT(DISTINCT)
You can also use the DISTINCT clause along with the COUNT keywords. The following example shows how to use the DISTINCT clause with COUNT. It will return the number of countries in the customers table.
SELECT COUNT(DISTINCT country) FROM customers;
6) Postgres DISTINCT Clause: With GROUP BY and HAVING
You can use the GROUP BY clause along with the SELECT DISTINCT clause to select distinct values with groups. Similarly, the HAVING clause can be used to find duplicate values.
Both DISTINCT and GROUP BY clauses have similar functionality, but GROUP BY also allows you to use aggregate functions while getting unique data.
postgres=# select col1, col2 from test group by col1,col2 order by 1;
col1 | col2
——+——
1 | abc
2 | xyz
3 | tcs
postgres=# select col1, col2,count(*) from test group by col1,col2 order by 1;
col1 | col2 | count
——+——+——-
1 | abc | 2
2 | xyz | 2
3 | tcs | 2
postgres=# select col1 from test group by col1 having count(col1) > 1 order by 1;
col1
——
1
2
3
Conclusion
In this article, you learned about PostgreSQL DISTINCT Clause. This article also focused on PostgreSQL, its key features, different examples, and use cases of the PostgreSQL DISTINCT Clause. While mastering the nuances of PostgreSQL, its equally important to learn about ways to replicate data from PostgreSQL.
Share your experience of understanding the usage of PostgreSQL DISTINCT Clause in the comment section below! We would love to hear your thoughts.
Manisha Jena is a data analyst with over three years of experience in the data industry and is well-versed with advanced data tools such as Snowflake, Looker Studio, and Google BigQuery. She is an alumna of NIT Rourkela and excels in extracting critical insights from complex databases and enhancing data visualization through comprehensive dashboards. Manisha has authored over a hundred articles on diverse topics related to data engineering, and loves breaking down complex topics to help data practitioners solve their doubts related to data engineering.