
GROUP BY and ORDER BY

Uploaded by

She Sharma
Copyright
© All Rights Reserved

GROUP BY and ORDER BY
Introduction
Aggregating data (also referred to as rolling
up, summarizing, or grouping data) is creating
some sort of total from a number of records.
Sum, min, max, count, and average are common
aggregate operations. In SQL you can group these
totals on any specified columns, allowing you to
control the scope of these aggregations easily.
Grouping Records
First, perform the simplest aggregation: count the
number of records in a table. Open the SQL editor and
get a count of records for station data:

SELECT COUNT(*) AS record_count FROM station_data;


The COUNT(*) means to count the records. We can also use this
in combination with other SQL operations, like WHERE. To count the
number of records where a tornado was present, input the following:

SELECT COUNT(*) AS record_count FROM station_data
WHERE tornado = 1;
We identified 3,000 records with tornadoes present. But
what if we wanted to separate the count by year (Figure 6-1)? We
can do that too with this query:

SELECT year, COUNT(*) AS record_count FROM station_data
WHERE tornado = 1
GROUP BY year;
This data suddenly becomes more meaningful. We now
see the tornado sighting count by year. Let’s break down this
query to see how this happened.
First, we select the year, then we select the COUNT(*)
from the records, and we filter only for records where tornado
is true:

SELECT year, COUNT(*) AS record_count FROM station_data
WHERE tornado = 1
GROUP BY year;
However, we also specify that we are grouping on
year. This is what effectively allows us to count the
number of records by year. The last line, GROUP BY year,
performs this grouping:

SELECT year, COUNT(*) AS record_count FROM station_data
WHERE tornado = 1
GROUP BY year;
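To make the filter-then-group behavior concrete, here is a minimal sketch that runs the same query against an in-memory SQLite database using Python's built-in sqlite3 module. The five sample rows and the reduced three-column table are assumptions made up for illustration; the real station_data table is much larger.

```python
import sqlite3

# Hypothetical sample rows; the real station_data table has far more data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE station_data (year INTEGER, month INTEGER, tornado INTEGER)")
conn.executemany(
    "INSERT INTO station_data VALUES (?, ?, ?)",
    [(2009, 5, 1), (2009, 6, 1), (2010, 5, 1), (2010, 7, 0), (2011, 4, 0)],
)

# WHERE keeps only tornado rows; GROUP BY then counts the survivors per year.
rows = conn.execute(
    """
    SELECT year, COUNT(*) AS record_count
    FROM station_data
    WHERE tornado = 1
    GROUP BY year
    """
).fetchall()

counts = dict(rows)  # maps each year to its record_count
print(counts)
```

With this sample data, 2009 gets a count of 2 and 2010 a count of 1. The year 2011 does not appear at all, because its only row was removed by the WHERE before grouping took place.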
We can slice this data on more than one field. If we wanted a
count by year and month, we could group on the month field as well
(Figure 6-2):

SELECT year, month, COUNT(*) AS record_count FROM station_data
WHERE tornado = 1
GROUP BY year, month


Alternatively, we can use ordinal positions instead of specifying the
columns in the GROUP BY. The ordinal positions correspond to each item’s
numeric position in the SELECT statement. So, instead of writing GROUP
BY year, month, we could instead make it GROUP BY 1, 2 (which is
especially helpful if our SELECT has long column names or expressions,
and we do not want to rewrite them in the GROUP BY):

SELECT year, month, COUNT(*) AS record_count FROM station_data
WHERE tornado = 1
GROUP BY 1, 2

Note that not all platforms support ordinal positions. With Oracle and SQL
Server, for example, you will have to rewrite the entire column name or
expression in the GROUP BY.
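As a quick check that the two spellings are interchangeable on a platform that supports ordinals, the sketch below runs both forms in SQLite through Python's sqlite3 module and compares the results. The sample rows are hypothetical, and the comparison only demonstrates SQLite's behavior, not that of other databases.

```python
import sqlite3

# Hypothetical sample rows for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE station_data (year INTEGER, month INTEGER, tornado INTEGER)")
conn.executemany(
    "INSERT INTO station_data VALUES (?, ?, ?)",
    [(2009, 5, 1), (2009, 5, 1), (2009, 6, 1), (2010, 5, 1), (2010, 5, 0)],
)

base = "SELECT year, month, COUNT(*) AS record_count FROM station_data WHERE tornado = 1 "

# Grouping by column names.
by_name = conn.execute(base + "GROUP BY year, month").fetchall()

# Grouping by ordinal positions: 1 is year, 2 is month in the SELECT list.
by_position = conn.execute(base + "GROUP BY 1, 2").fetchall()

# Sort both result sets so the comparison does not depend on row order.
same = sorted(by_name) == sorted(by_position)
print(same)  # True
```

One thing to keep in mind with ordinals: if the SELECT list is later reordered, GROUP BY 1, 2 silently starts grouping on different columns, whereas the named form does not.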
Ordering Records
Notice that the month column is not in the
natural sort order we would expect. This is a good time
to bring up the ORDER BY clause, which you can
put at the end of a SQL statement (after any
WHERE and GROUP BY). If you wanted to sort by
year, and then month, you could just add this
clause:
SELECT year, month, COUNT(*) AS record_count FROM station_data
WHERE tornado = 1
GROUP BY year, month
ORDER BY year, month


However, you are probably more interested in recent data
and would like it at the top. By default, sorting uses the
ASC keyword, which orders the data in ascending order. If you
want to sort in descending order instead, apply the DESC
keyword to the ordering of year to make more recent records
appear at the top of the results:

SELECT year, month, COUNT(*) AS record_count FROM station_data
WHERE tornado = 1
GROUP BY year, month
ORDER BY year DESC, month
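The mixed sort direction can be seen in a small runnable sketch. It uses Python's sqlite3 module with an in-memory database, and the sample rows are hypothetical values chosen so that the months arrive out of order.

```python
import sqlite3

# Hypothetical sample rows, deliberately inserted out of order.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE station_data (year INTEGER, month INTEGER, tornado INTEGER)")
conn.executemany(
    "INSERT INTO station_data VALUES (?, ?, ?)",
    [(2009, 11, 1), (2010, 3, 1), (2010, 1, 1), (2009, 2, 1), (2010, 3, 1), (2010, 12, 0)],
)

# DESC applies only to year; month falls back to the default ascending order.
rows = conn.execute(
    """
    SELECT year, month, COUNT(*) AS record_count
    FROM station_data
    WHERE tornado = 1
    GROUP BY year, month
    ORDER BY year DESC, month
    """
).fetchall()
print(rows)  # [(2010, 1, 1), (2010, 3, 2), (2009, 2, 1), (2009, 11, 1)]
```

The most recent year comes first, and within each year the months still run from earliest to latest, because each sort key carries its own direction.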
Aggregate Functions
We already used the COUNT(*) function to count records. But
there are other aggregation functions, including SUM(), MIN(), MAX(),
and AVG(). We can use aggregation functions on a specific column to
perform some sort of calculation on it.
But first let’s look at another way to use COUNT(). The COUNT()
function can be used for a purpose other than simply counting records.
If you specify a column instead of an asterisk, it will count the number
of non-null values in that column. For instance, we can count only the
records that have a snow_depth value recorded
(Figure 6-3):
SELECT COUNT(snow_depth) AS recorded_snow_depth_count FROM
station_data
Note
Aggregate functions such as COUNT(column), SUM(), AVG(),
MIN(), and MAX() never include null values in their
calculations. Only non-null values are considered. The one
exception is COUNT(*), which counts every record regardless of nulls.
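The effect of nulls on aggregates is easy to verify with a tiny example. The sketch below uses Python's sqlite3 module and a hypothetical one-column version of station_data holding four rows, two of which have no snow_depth recorded.

```python
import sqlite3

# Hypothetical data: four records, two of them with a null snow_depth.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE station_data (snow_depth REAL)")
conn.executemany(
    "INSERT INTO station_data VALUES (?)",
    [(None,), (2.0,), (None,), (4.0,)],
)

# COUNT(*) counts rows; the column-based aggregates skip the nulls.
all_rows, recorded, total, average = conn.execute(
    """
    SELECT COUNT(*), COUNT(snow_depth), SUM(snow_depth), AVG(snow_depth)
    FROM station_data
    """
).fetchone()
print(all_rows, recorded, total, average)  # 4 2 6.0 3.0
```

Note that the average is 3.0 rather than 1.5: the two null rows are left out of the divisor as well as the sum.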
Let’s move on to other aggregation tasks. If you wanted
to find the average temperature for each month since
2000, you could filter for years 2000 and later, group by
month, and perform an average on temp:

SELECT month, AVG(temp) as avg_temp FROM station_data
WHERE year >= 2000
GROUP BY month
As always, you can use functions on the aggregated
values and perform tasks such as rounding to make them look
nicer (Figure 6-5):

SELECT month, round(AVG(temp), 2) as avg_temp FROM station_data
WHERE year >= 2000
GROUP BY month
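Here is a minimal runnable sketch of the rounded monthly average, again using Python's sqlite3 module. The temperatures are invented for illustration, and one pre-2000 row is included to show that the WHERE excludes it from the average.

```python
import sqlite3

# Hypothetical temperatures; the 1999 row should be excluded by the WHERE.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE station_data (year INTEGER, month INTEGER, temp REAL)")
conn.executemany(
    "INSERT INTO station_data VALUES (?, ?, ?)",
    [
        (2001, 1, 30.0), (2002, 1, 31.0), (2003, 1, 33.0),
        (2001, 7, 80.5), (2004, 7, 81.0),
        (1999, 1, 10.0),
    ],
)

# AVG is computed per month, then ROUND trims it to two decimal places.
rows = conn.execute(
    """
    SELECT month, ROUND(AVG(temp), 2) AS avg_temp
    FROM station_data
    WHERE year >= 2000
    GROUP BY month
    """
).fetchall()

avg_by_month = dict(rows)  # maps each month to its rounded average
print(avg_by_month)
```

For January the unrounded average is 94 / 3, so the rounding is what turns a long repeating decimal into a readable two-decimal value.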
SUM() is another common aggregate operation. To find the sum
of snow depth by year since 2000, run this query:

SELECT year, SUM(snow_depth) as total_snow FROM station_data
WHERE year >= 2000
GROUP BY year
There is no limitation on how many aggregate operations you can use
in a single query. Here we find the total_snow and total_precipitation
for each year since 2000 in a single query, as well as the
max_precipitation:
SELECT year,
SUM(snow_depth) as total_snow,
SUM(precipitation) as total_precipitation,
MAX(precipitation) as max_precipitation
FROM station_data
WHERE year >= 2000
GROUP BY year
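The multi-aggregate query can be tried out with a small sketch in Python's sqlite3 module. The rows below are hypothetical, and one snow_depth is left null to show that each aggregate handles its own column independently.

```python
import sqlite3

# Hypothetical rows; one snow_depth is null and one row predates 2000.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE station_data (year INTEGER, snow_depth REAL, precipitation REAL)")
conn.executemany(
    "INSERT INTO station_data VALUES (?, ?, ?)",
    [
        (1999, 9.0, 9.0),
        (2000, 1.0, 0.5), (2000, None, 0.25),
        (2001, 2.0, 1.0), (2001, 3.0, 0.0),
    ],
)

# Three aggregates are evaluated over the same per-year groups.
rows = conn.execute(
    """
    SELECT year,
           SUM(snow_depth) AS total_snow,
           SUM(precipitation) AS total_precipitation,
           MAX(precipitation) AS max_precipitation
    FROM station_data
    WHERE year >= 2000
    GROUP BY year
    ORDER BY year
    """
).fetchall()
print(rows)  # [(2000, 1.0, 0.75, 0.5), (2001, 5.0, 1.0, 1.0)]
```

The null snow_depth in 2000 is skipped by SUM(snow_depth), yet the precipitation on that same row still contributes to total_precipitation.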
It may not be apparent yet, but you can
achieve some very specific aggregations by
leveraging the WHERE clause. If you wanted the total
precipitation by year only when a tornado was
present, you would just have to filter on tornado
being true. This will only include tornado-related
precipitation in the totals:
SELECT year, SUM(precipitation) as tornado_precipitation FROM station_data
WHERE tornado = 1
GROUP BY year
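To see how the WHERE narrows what gets totaled, the following sketch runs the query against hypothetical rows with Python's sqlite3 module. The values are made up so that one year has a mix of tornado and non-tornado records and another year has no tornado records at all.

```python
import sqlite3

# Hypothetical rows: 2005 is mixed, 2006 is all tornado, 2007 has none.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE station_data (year INTEGER, precipitation REAL, tornado INTEGER)")
conn.executemany(
    "INSERT INTO station_data VALUES (?, ?, ?)",
    [(2005, 0.5, 1), (2005, 1.0, 0), (2006, 0.25, 1), (2006, 0.25, 1), (2007, 2.0, 0)],
)

# Only rows with tornado = 1 reach the SUM, so other precipitation is ignored.
rows = conn.execute(
    """
    SELECT year, SUM(precipitation) AS tornado_precipitation
    FROM station_data
    WHERE tornado = 1
    GROUP BY year
    """
).fetchall()

tornado_precipitation = dict(rows)  # maps each year to its filtered total
print(tornado_precipitation)
```

In this sample, 2005 totals 0.5 rather than 1.5, and 2007 is missing from the output entirely, since a year with no qualifying rows never forms a group.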
