Transfer Dock - Text - 20240823091413
PostgreSQL GROUP BY
Summary: in this tutorial, you will learn how to use the PostgreSQL GROUP BY clause
to divide rows into groups.
For each group, you can apply an aggregate function such as SUM() to calculate the
sum of items or COUNT() to get the number of items in the group.
The following illustrates the basic syntax of the GROUP BY clause:
SELECT
column_1,
column_2,
...,
aggregate_function(column_3)
FROM
table_name
GROUP BY
column_1,
column_2,
...;
In this syntax:
First, select the columns that you want to group, such as column_1 and column_2, and
the column to which you want to apply an aggregate function (column_3).
Second, list the columns that you want to group in the GROUP BY clause.
The GROUP BY clause divides the rows by the values in the columns specified in the
GROUP BY clause and calculates a value for each group.
It’s possible to use other clauses of the SELECT statement with the GROUP BY
clause.
PostgreSQL evaluates the GROUP BY clause after the FROM and WHERE clauses and
before the HAVING, SELECT, DISTINCT, ORDER BY, and LIMIT clauses.
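The evaluation order above can be tried out with a small sketch. It assumes Python's built-in sqlite3 module as a stand-in for a PostgreSQL server and uses a hypothetical five-row payment table, not the sample database. The clauses involved behave the same way here: WHERE removes rows before grouping, and HAVING removes groups after aggregation.

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payment (customer_id INTEGER, amount REAL)")
# Hypothetical rows, not taken from the sample database.
conn.executemany(
    "INSERT INTO payment (customer_id, amount) VALUES (?, ?)",
    [(1, 5.0), (1, 1.0), (2, 3.0), (2, 4.0), (3, 0.5)],
)

query = """
SELECT customer_id, SUM(amount) AS total
FROM payment
WHERE amount >= 1.0          -- 1. filters rows before grouping (drops customer 3)
GROUP BY customer_id         -- 2. forms one group per remaining customer
HAVING SUM(amount) > 6.5     -- 3. filters groups after aggregation (drops customer 1)
ORDER BY total DESC          -- 4. sorts the groups that are left
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(2, 7.0)]
```

Customer 3 never reaches the grouping step because its only row fails the WHERE condition, while customer 1 forms a group (total 6.0) that the HAVING condition then discards.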
PostgreSQL GROUP BY clause examples
Let’s take a look at the payment table in the sample database.
1) Using PostgreSQL GROUP BY without an aggregate function example
The following example uses the GROUP BY clause to retrieve the customer_id from the
payment table:
SELECT
customer_id
FROM
payment
GROUP BY
customer_id
ORDER BY
customer_id;
Output:
customer_id
-------------
1
2
3
4
5
6
7
8
...
Each customer has one or more payments. The GROUP BY clause removes duplicate
values in the customer_id column and returns distinct customer ids. In this
example, the GROUP BY clause works like the DISTINCT operator.
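The equivalence with DISTINCT can be checked with a minimal sketch. It assumes Python's built-in sqlite3 module in place of PostgreSQL and a hypothetical payment table with made-up rows; both queries are standard SQL and return the same result in either system.

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payment (customer_id INTEGER, amount REAL)")
# Hypothetical rows: customers 1 and 2 each have two payments.
conn.executemany(
    "INSERT INTO payment (customer_id, amount) VALUES (?, ?)",
    [(2, 4.0), (1, 5.0), (2, 3.0), (1, 1.0), (3, 2.0)],
)

# GROUP BY without an aggregate function keeps one row per group.
grouped = conn.execute(
    "SELECT customer_id FROM payment GROUP BY customer_id ORDER BY customer_id"
).fetchall()

# SELECT DISTINCT removes the duplicate customer ids directly.
distinct = conn.execute(
    "SELECT DISTINCT customer_id FROM payment ORDER BY customer_id"
).fetchall()

print(grouped)              # [(1,), (2,), (3,)]
print(grouped == distinct)  # True
```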
2) Using PostgreSQL GROUP BY with SUM() function example
The following query uses the GROUP BY clause to retrieve the total amount paid by
each customer:
SELECT
customer_id,
SUM (amount)
FROM
payment
GROUP BY
customer_id
ORDER BY
customer_id;
Output:
customer_id | sum
-------------+--------
1 | 114.70
2 | 123.74
3 | 130.76
4 | 81.78
5 | 134.65
6 | 84.75
7 | 130.72
...
In this example, the GROUP BY clause groups the payments by the customer id. For
each group, it calculates the total payment.
The following statement uses the ORDER BY clause with GROUP BY clause to sort the
groups by total payments:
SELECT
customer_id,
SUM (amount)
FROM
payment
GROUP BY
customer_id
ORDER BY
SUM (amount) DESC;
Output:
customer_id | sum
-------------+--------
148 | 211.55
526 | 208.58
178 | 194.61
137 | 191.62
144 | 189.60
3) Using PostgreSQL GROUP BY clause with the JOIN clause
The following statement uses the GROUP BY clause to retrieve the total payment for
each customer and display the customer name and amount:
SELECT
first_name || ' ' || last_name full_name,
SUM (amount) amount
FROM
payment
INNER JOIN customer USING (customer_id)
GROUP BY
full_name
ORDER BY
amount DESC;
Output:
full_name | amount
-----------------------+--------
Eleanor Hunt | 211.55
Karl Seal | 208.58
Marion Snyder | 194.61
Rhonda Kennedy | 191.62
Clara Shaw | 189.60
...
In this example, we join the payment table with the customer table using an inner
join to get the customer names and group customers by their names.
4) Using PostgreSQL GROUP BY with COUNT() function example
The following example uses the GROUP BY clause with the COUNT() function to count
the number of payments processed by each staff member:
SELECT
staff_id,
COUNT (payment_id)
FROM
payment
GROUP BY
staff_id;
Output:
staff_id | count
----------+-------
1 | 7292
2 | 7304
(2 rows)
In this example, the GROUP BY clause divides the rows in the payment table into
groups by the values in the staff_id column. For each group, it counts the number
of rows using the COUNT() function.
5) Using PostgreSQL GROUP BY with multiple columns
The following example uses the GROUP BY clause to group rows by the values in two
columns:
SELECT
customer_id,
staff_id,
SUM(amount)
FROM
payment
GROUP BY
staff_id,
customer_id
ORDER BY
customer_id;
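In this example, the GROUP BY clause divides the rows in the payment table by the
values in the staff_id and customer_id columns. For each combination of staff_id
and customer_id, the SUM() function calculates the total amount.
6) Using PostgreSQL GROUP BY clause with a date column
The following example uses the GROUP BY clause to group the payments by payment
date: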
SELECT
payment_date::date payment_date,
SUM(amount) sum
FROM
payment
GROUP BY
payment_date::date
ORDER BY
payment_date DESC;
Output:
payment_date | sum
--------------+---------
2007-05-14 | 514.18
2007-04-30 | 5723.89
2007-04-29 | 2717.60
2007-04-28 | 2622.73
...
Since the values in the payment_date column are timestamps, we cast them to date
values using the cast operator ::.
Summary
Use the PostgreSQL GROUP BY clause to divide rows into groups and apply an
aggregate function to each group.
--------------------------------------------
PostgreSQL COUNT() Function
Summary: in this tutorial, you will learn how to use the PostgreSQL COUNT()
function to count the number of rows in a table.
The following sections illustrate various ways of using the COUNT() function.
COUNT(*)
The COUNT(*) function returns the number of rows returned by a SELECT statement,
including NULL and duplicates.
SELECT
COUNT(*)
FROM
table_name
WHERE
condition;
When you apply the COUNT(*) function to an entire table, PostgreSQL typically has
to scan the whole table sequentially, so the query can be slow on a big table. This
is related to the PostgreSQL MVCC implementation: each row must be checked for
visibility to the current transaction.
COUNT(column)
Similar to the COUNT(*) function, the COUNT(column_name) function returns the
number of rows returned by a SELECT statement. However, it does not count rows
where column_name is NULL.
SELECT
COUNT(column_name)
FROM
table_name
WHERE
condition;
COUNT(DISTINCT column)
In this syntax, the COUNT(DISTINCT column_name) returns the number of unique
non-null values in the column_name.
SELECT
COUNT(DISTINCT column_name)
FROM
table_name
WHERE
condition;
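To compare the three forms side by side, here is a minimal sketch. It assumes Python's built-in sqlite3 module as a stand-in for PostgreSQL, and the items table and its color column are hypothetical names used only for illustration. The three COUNT() forms handle NULLs and duplicates the same way in both systems.

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
# Hypothetical table: five rows, two of them NULL, one value duplicated.
conn.execute("CREATE TABLE items (color TEXT)")
conn.executemany(
    "INSERT INTO items (color) VALUES (?)",
    [("red",), ("red",), ("blue",), (None,), (None,)],
)

count_all, count_col, count_distinct = conn.execute(
    "SELECT COUNT(*), COUNT(color), COUNT(DISTINCT color) FROM items"
).fetchone()

print(count_all)       # 5: every row, including NULLs and duplicates
print(count_col)       # 3: rows where color is not NULL
print(count_distinct)  # 2: unique non-null values (red, blue)
```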
In practice, you often use the COUNT() function with the GROUP BY clause to return
the number of items for each group.
For example, you can use the COUNT() with the GROUP BY clause to return the number
of films in each film category.
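PostgreSQL COUNT() function examples
Let's use the payment table in the sample database for the demonstration.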
1) Basic PostgreSQL COUNT(*) example
The following statement uses the COUNT(*) function to return the number of
transactions in the payment table:
SELECT
COUNT(*)
FROM
payment;
Output:
count
-------
14596
(1 row)
2) PostgreSQL COUNT(DISTINCT column) example
To get the number of distinct amounts that customers paid, you use the
COUNT(DISTINCT amount) function as shown in the following example:
SELECT
COUNT (DISTINCT amount)
FROM
payment;
Output:
count
-------
19
(1 row)
3) Using PostgreSQL COUNT() function with GROUP BY clause example
The following example uses the COUNT() function with the GROUP BY clause to
return the number of payments made by each customer:
SELECT
customer_id,
COUNT (customer_id)
FROM
payment
GROUP BY
customer_id;
Output:
customer_id | count
-------------+-------
184 | 20
87 | 28
477 | 21
273 | 28
...
If you want to display the customer name instead of id, you can join the payment
table with the customer table:
SELECT
first_name || ' ' || last_name full_name,
COUNT (customer_id)
FROM
payment
INNER JOIN customer USING (customer_id)
GROUP BY
customer_id,
first_name,
last_name;
Output:
full_name | count
-----------------------+-------
Vivian Ruiz | 20
Wanda Patterson | 28
Dan Paine | 21
Priscilla Lowe | 28
...
4) Using PostgreSQL COUNT() function with HAVING clause
You can use the COUNT() function in a HAVING clause to apply a condition to
groups. For example, the following statement finds customers who have made more
than 40 payments:
SELECT
first_name || ' ' || last_name full_name,
COUNT (customer_id)
FROM
payment
INNER JOIN customer USING (customer_id)
GROUP BY
customer_id,
first_name,
last_name
HAVING
COUNT (customer_id) > 40;
Output:
full_name | count
--------------+-------
Karl Seal | 42
Eleanor Hunt | 45
(2 rows)
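The same pattern can be reproduced on a small scale. The sketch below assumes Python's built-in sqlite3 module as a stand-in for PostgreSQL, with hypothetical customer and payment tables and made-up names, and a threshold of 2 instead of 40 to suit the tiny data set. It groups by the customer id together with the name columns, so every selected column is covered by the GROUP BY clause.

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT)"
)
conn.execute(
    "CREATE TABLE payment (payment_id INTEGER PRIMARY KEY, customer_id INTEGER)"
)
# Hypothetical customers and payments, not taken from the sample database.
conn.executemany(
    "INSERT INTO customer (customer_id, first_name, last_name) VALUES (?, ?, ?)",
    [(1, "Ann", "Lee"), (2, "Bob", "Ray")],
)
conn.executemany(
    "INSERT INTO payment (customer_id) VALUES (?)",
    [(1,), (1,), (1,), (2,)],
)

query = """
SELECT c.first_name || ' ' || c.last_name AS full_name,
       COUNT(p.payment_id) AS payments
FROM payment AS p
INNER JOIN customer AS c ON p.customer_id = c.customer_id
GROUP BY c.customer_id, c.first_name, c.last_name
HAVING COUNT(p.payment_id) > 2   -- keep only customers with more than 2 payments
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('Ann Lee', 3)]
```

Bob Ray forms a group with a count of 1, which the HAVING condition filters out after the counts are computed.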
Summary
Use the PostgreSQL COUNT() function to return the number of rows in a table.
--------------------------------------------
PostgreSQL SELECT DISTINCT
Summary: in this tutorial, you will learn how to use the PostgreSQL SELECT DISTINCT
clause to remove duplicate rows from a result set returned by a query.
The SELECT DISTINCT clause can be applied to one or more columns in the select list
of the SELECT statement.
SELECT
DISTINCT column1
FROM
table_name;
In this syntax, the SELECT DISTINCT clause uses the values in the column1 column to
identify duplicates.
If you specify multiple columns, the SELECT DISTINCT clause identifies duplicates
based on the combination of values in these columns. For example:
SELECT
DISTINCT column1, column2
FROM
table_name;
In this syntax, the SELECT DISTINCT clause uses the combination of values in both
the column1 and column2 columns to identify duplicates.
Note that PostgreSQL also offers the DISTINCT ON clause, which keeps the first row
of each group of rows that share the same values in the specified column or
combination of columns.
If you want to find distinct values of all columns in a table, you can use SELECT
DISTINCT *:
SELECT DISTINCT *
FROM table_name;
The star or asterisk (*) means all columns of the table_name.
Note that you will learn how to create a table and insert data into it in a
subsequent tutorial. For now, you only need to execute the statements in psql or
pgAdmin.
First, create the colors table that has three columns: id (generated
automatically), bcolor, and fcolor.
Second, insert some rows into the colors table using the following INSERT
statement:
INSERT INTO
colors (bcolor, fcolor)
VALUES
('red', 'red'),
('red', 'red'),
('red', NULL),
(NULL, 'red'),
(NULL, NULL),
('green', 'green'),
('blue', 'blue'),
('blue', 'blue');
Third, retrieve the data from the colors table using the SELECT statement:
SELECT
id,
bcolor,
fcolor
FROM
colors;
Output:
id | bcolor | fcolor
----+--------+--------
1 | red | red
2 | red | red
3 | red | null
4 | null | red
5 | null | null
6 | green | green
7 | blue | blue
8 | blue | blue
(8 rows)
1) PostgreSQL SELECT DISTINCT one column example
The following statement selects unique values from the bcolor column of the colors
table and sorts the result set in alphabetical order by using the ORDER BY clause.
SELECT
DISTINCT bcolor
FROM
colors
ORDER BY
bcolor;
Output:
bcolor
--------
blue
green
red
null
(4 rows)
The bcolor column has three red values, two NULLs, one green value, and two blue
values. The DISTINCT clause removes two red values, one NULL, and one blue value.
Note that PostgreSQL treats NULLs as duplicates so that it keeps one NULL for all
NULLs when you apply the SELECT DISTINCT clause.
2) PostgreSQL SELECT DISTINCT multiple columns example
The following statement applies the SELECT DISTINCT clause to both the bcolor and
fcolor columns:
SELECT
DISTINCT bcolor, fcolor
FROM
colors
ORDER BY
bcolor,
fcolor;
Output:
bcolor | fcolor
--------+--------
blue | blue
green | green
red | red
red | null
null | red
null | null
(6 rows)
In this example, the query uses the values from both bcolor and fcolor columns to
evaluate the uniqueness of rows.
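The row counts from the two examples above can be reproduced with a short sketch. It assumes Python's built-in sqlite3 module as a stand-in for PostgreSQL, and the column types in the CREATE TABLE statement are assumptions, since only the column names are given here. The inserted rows are the same eight rows used in this tutorial. The sketch compares sets rather than sorted output because SQLite places NULLs first in ascending order, while PostgreSQL places them last by default.

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
# Column types are assumed; id is filled in automatically on insert.
conn.execute(
    "CREATE TABLE colors (id INTEGER PRIMARY KEY, bcolor TEXT, fcolor TEXT)"
)
conn.executemany(
    "INSERT INTO colors (bcolor, fcolor) VALUES (?, ?)",
    [
        ("red", "red"),
        ("red", "red"),
        ("red", None),
        (None, "red"),
        (None, None),
        ("green", "green"),
        ("blue", "blue"),
        ("blue", "blue"),
    ],
)

# One column: all NULLs collapse into a single NULL row.
one_col = conn.execute("SELECT DISTINCT bcolor FROM colors").fetchall()

# Two columns: uniqueness is judged on the (bcolor, fcolor) combination.
two_cols = conn.execute("SELECT DISTINCT bcolor, fcolor FROM colors").fetchall()

print(len(one_col))   # 4
print(len(two_cols))  # 6
```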
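3) Using the SELECT DISTINCT clause in practice
In practice, you often use the SELECT DISTINCT clause to examine the unique values
in a column. For example, you may want to find the distinct rental rates of films
in the film table: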
SELECT DISTINCT
rental_rate
FROM
film
ORDER BY
rental_rate;
Output:
rental_rate
-------------
0.99
2.99
4.99
(3 rows)
The output indicates that there are only three distinct rental rates: 0.99, 2.99,
and 4.99.
Summary
Use the SELECT DISTINCT clause to remove duplicate rows from the result set of a
query.