Database Management Systems-7

This document discusses various SQL queries involving nested queries and aggregate functions. It provides examples of correlated nested queries that depend on columns from the outer query, and how to use operators like EXISTS, IN, ALL, and ANY in nested queries. It also shows how to write queries using aggregate functions like AVG, MAX, MIN, and COUNT to return summary values from a table.

Uploaded by

Arun Sasidharan

The top-level query finds the names of sailors whose sid is in this set of sids. For the
example instances, we get Dustin, Lubber, and Horatio.

(Q21) Find the names of sailors who have not reserved a red boat.

SELECT S.sname
FROM Sailors S
WHERE S.sid NOT IN ( SELECT R.sid
                     FROM Reserves R
                     WHERE R.bid IN ( SELECT B.bid
                                      FROM Boats B
                                      WHERE B.color = 'red' ) )

This query computes the names of sailors whose sid is not in the set 22, 31, and 64.

Correlated Nested Queries

In the nested queries that we have seen thus far, the inner subquery has been completely
independent of the outer query. In general, the inner subquery can depend on the row
that is currently being examined in the outer query (in terms of our conceptual
evaluation strategy). Let us rewrite the following query once more:

(Q1) Find the names of sailors who have reserved boat number 103.

SELECT S.sname
FROM Sailors S
WHERE EXISTS ( SELECT *
               FROM Reserves R
               WHERE R.bid = 103
                     AND R.sid = S.sid )

The EXISTS operator is another set-comparison operator, like IN. It allows us to
test whether a set is nonempty. Thus, for each Sailor row S, we test whether the set of
Reserves rows R such that R.bid = 103 AND S.sid = R.sid is nonempty. If so, sailor S has
reserved boat 103, and we retrieve the name. The subquery clearly depends on the current
row S and must be re-evaluated for each row in Sailors. The occurrence of S in the
subquery (in the form of the literal S.sid) is called a correlation, and such queries are
called correlated queries.
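
To make the idea of correlation concrete, here is a minimal sketch that runs Q1 on a tiny
instance, assuming Python's built-in sqlite3 module as the SQL engine. The rows in Sailors
and Reserves are hypothetical and chosen only for illustration. The IN version is included
as an uncorrelated query that should give the same answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Sailors  (sid INTEGER PRIMARY KEY, sname TEXT);
    CREATE TABLE Reserves (sid INTEGER, bid INTEGER);
    -- Hypothetical rows, chosen only for illustration.
    INSERT INTO Sailors  VALUES (22, 'Dustin'), (31, 'Lubber'), (58, 'Rusty');
    INSERT INTO Reserves VALUES (22, 101), (22, 103), (31, 103), (58, 104);
""")

# Correlated form: the subquery mentions S.sid, a column of the outer query,
# so conceptually it is re-evaluated for every Sailors row.
correlated = conn.execute("""
    SELECT S.sname
    FROM Sailors S
    WHERE EXISTS ( SELECT *
                   FROM Reserves R
                   WHERE R.bid = 103 AND R.sid = S.sid )
    ORDER BY S.sid
""").fetchall()

# Uncorrelated form: the subquery does not refer to S and can be evaluated once.
uncorrelated = conn.execute("""
    SELECT S.sname
    FROM Sailors S
    WHERE S.sid IN ( SELECT R.sid FROM Reserves R WHERE R.bid = 103 )
    ORDER BY S.sid
""").fetchall()

print(correlated)
print(correlated == uncorrelated)
```

On this made-up instance only sailors 22 and 31 have a Reserves row for boat 103, so both
forms should list Dustin and Lubber.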

Set-Comparison Operators

We have already seen the set-comparison operators EXISTS, IN, and UNIQUE, along with
their negated versions. SQL also supports op ANY and op ALL, where op is one of the
arithmetic comparison operators {<, <=, =, <>, >=, >}. (SOME is also available, but it is
just a synonym for ANY.)

Dept of CSE, Unit-2 Page 31

(Q22) Find sailors whose rating is better than some sailor called Horatio.

SELECT S.sid
FROM Sailors S
WHERE S.rating > ANY ( SELECT S2.rating
                       FROM Sailors S2
                       WHERE S2.sname = 'Horatio' )

If there are several sailors called Horatio, this query finds all sailors whose rating is
better than that of some sailor called Horatio. On instance S3, this computes the
sids 31, 32, 58, 71, and 74. What if there were no sailor called Horatio? In this case
the comparison S.rating > ANY . . . is defined to return false, and the above query
returns an empty answer set. To understand comparisons involving ANY, it is useful to
think of the comparison being carried out repeatedly. In the example above, S.rating
is successively compared with each rating value that is an answer to the nested query.
Intuitively, the subquery must return a row that makes the comparison true, in order
for S.rating > ANY . . . to return true.

(Q23) Find sailors whose rating is better than every sailor called Horatio.

We can obtain this query with a simple modification to Query Q22: just replace
ANY with ALL in the WHERE clause of the outer query. On instance S3, we would get
the sids 58 and 71. If there were no sailor called Horatio, the comparison S.rating
> ALL . . . is defined to return true! The query would then return the sids of all
sailors. Again, it is useful to think of the comparison being carried out repeatedly.
Intuitively, the comparison must be true for every returned row in order for S.rating
> ALL . . . to return true.
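
The "carried out repeatedly" reading of ANY and ALL can be sketched directly in Python,
whose built-in any() and all() behave the same way on an empty collection (any() gives
False, all() gives True). This is only an illustration of the semantics, not how an SQL
system evaluates the query. The ratings come from instance S3 in Figure 5.10; the helper
better_than and the name "Nemo" are made up for the example.

```python
# (sid, sname, rating) for the sailors of instance S3 (Figure 5.10).
sailors = [
    (22, "Dustin", 7), (29, "Brutus", 1), (31, "Lubber", 8), (32, "Andy", 8),
    (58, "Rusty", 10), (64, "Horatio", 7), (71, "Zorba", 10), (74, "Horatio", 9),
    (85, "Art", 3), (95, "Bob", 3),
]


def better_than(name, quantifier):
    """Sids whose rating is > ANY (quantifier=any) or > ALL (quantifier=all)
    of the ratings of the sailors called `name`."""
    # The "subquery": ratings of all sailors with the given name.
    inner = [rating for _, sname, rating in sailors if sname == name]
    # The outer query: compare S.rating with each inner value in turn.
    return [sid for sid, _, rating in sailors
            if quantifier(rating > r for r in inner)]


q22 = better_than("Horatio", any)        # Q22: rating > ANY (...)
q23 = better_than("Horatio", all)        # Q23: rating > ALL (...)
no_match_any = better_than("Nemo", any)  # empty subquery: ANY is false for every row
no_match_all = better_than("Nemo", all)  # empty subquery: ALL is true for every row

print(q22, q23, no_match_any, len(no_match_all))
```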

As another illustration of ALL, consider the following query:

(Q24) Find the sailors with the highest rating.

SELECT S.sid
FROM Sailors S
WHERE S.rating >= ALL ( SELECT S2.rating
                        FROM Sailors S2 )

The subquery computes the set of all rating values in Sailors. The outer WHERE
condition is satisfied only when S.rating is greater than or equal to each of these rating
values, i.e., when it is the largest rating value. In the instance S3, the condition is
only satisfied for rating 10, and the answer includes the sids of sailors with this rating,
i.e., 58 and 71.

Note that IN and NOT IN are equivalent to = ANY and <> ALL, respectively.

More Examples of Nested Queries

Let us revisit a query that we considered earlier using the INTERSECT operator.

(Q6) Find the names of sailors who have reserved both a red and a green boat.

SELECT S.sname
FROM Sailors S, Reserves R, Boats B
WHERE S.sid = R.sid AND R.bid = B.bid AND B.color = 'red'
      AND S.sid IN ( SELECT S2.sid
                     FROM Sailors S2, Boats B2, Reserves R2
                     WHERE S2.sid = R2.sid AND R2.bid = B2.bid
                           AND B2.color = 'green' )

As it turns out, writing this query (Q6) using INTERSECT is more complicated because
we have to use sids to identify sailors (while intersecting) and have to return sailor
names:

SELECT S3.sname
FROM Sailors S3
WHERE S3.sid IN ( ( SELECT R.sid
                    FROM Boats B, Reserves R
                    WHERE R.bid = B.bid AND B.color = 'red' )
                  INTERSECT
                  ( SELECT R2.sid
                    FROM Boats B2, Reserves R2
                    WHERE R2.bid = B2.bid AND B2.color = 'green' ) )

Our next example illustrates how the division operation in relational algebra can be
expressed in SQL.
(Q9) Find the names of sailors who have reserved all boats.

SELECT S.sname
FROM Sailors S
WHERE NOT EXISTS ( ( SELECT B.bid
                     FROM Boats B )
                   EXCEPT
                   ( SELECT R.bid
                     FROM Reserves R
                     WHERE R.sid = S.sid ) )

Notice that this query is correlated—for each sailor S, we check to see that the set of
boats reserved by S includes all boats. An alternative way to do this query without
using EXCEPT follows:

SELECT S.sname
FROM Sailors S
WHERE NOT EXISTS ( SELECT B.bid
                   FROM Boats B
                   WHERE NOT EXISTS ( SELECT R.bid
                                      FROM Reserves R
                                      WHERE R.bid = B.bid
                                            AND R.sid = S.sid ) )

Intuitively, for each sailor we check that there is no boat that has not been reserved by
this sailor.
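
As a rough check that the two formulations of Q9 agree, the sketch below runs both on a
small instance using Python's built-in sqlite3 module. All rows are hypothetical. The
EXCEPT version is written without the inner parentheses around each SELECT, on the
assumption that SQLite's grammar does not accept them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Sailors  (sid INTEGER PRIMARY KEY, sname TEXT);
    CREATE TABLE Boats    (bid INTEGER PRIMARY KEY);
    CREATE TABLE Reserves (sid INTEGER, bid INTEGER);
    -- Hypothetical rows: only sailor 22 has reserved every boat.
    INSERT INTO Sailors  VALUES (22, 'Dustin'), (31, 'Lubber'), (58, 'Rusty');
    INSERT INTO Boats    VALUES (101), (102);
    INSERT INTO Reserves VALUES (22, 101), (22, 102), (31, 101);
""")

# Division via EXCEPT: "all boats" minus "boats reserved by S" must be empty.
with_except = conn.execute("""
    SELECT S.sname
    FROM Sailors S
    WHERE NOT EXISTS ( SELECT B.bid FROM Boats B
                       EXCEPT
                       SELECT R.bid FROM Reserves R WHERE R.sid = S.sid )
""").fetchall()

# Division via double negation: there is no boat that S has not reserved.
with_not_exists = conn.execute("""
    SELECT S.sname
    FROM Sailors S
    WHERE NOT EXISTS ( SELECT B.bid
                       FROM Boats B
                       WHERE NOT EXISTS ( SELECT R.bid
                                          FROM Reserves R
                                          WHERE R.bid = B.bid
                                                AND R.sid = S.sid ) )
""").fetchall()

print(with_except, with_not_exists)
```

Note that a sailor with no reservations at all (Rusty here) is correctly left out: the
set of all boats minus an empty set is still nonempty.
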
AGGREGATE OPERATORS

We now consider a powerful class of constructs for computing aggregate values such as
MIN and SUM. These features represent a significant extension of relational algebra.
SQL supports five aggregate operations, which can be applied on any column, say A, of a
relation:
1. COUNT ([DISTINCT] A): The number of (unique) values in the A column.
2. SUM ([DISTINCT] A): The sum of all (unique) values in the A column.
3. AVG ([DISTINCT] A): The average of all (unique) values in the A column.
4. MAX (A): The maximum value in the A column.
5. MIN (A): The minimum value in the A column.

Note that it does not make sense to specify DISTINCT in conjunction with MIN or MAX
(although SQL-92 does not preclude this).
(Q25) Find the average age of all sailors.

SELECT AVG (S.age)
FROM Sailors S

On instance S3, the average age is 37.4. Of course, the WHERE clause can be used to
restrict the sailors who are considered in computing the average age:
(Q26) Find the average age of sailors with a rating of 10.

SELECT AVG (S.age)
FROM Sailors S
WHERE S.rating = 10

There are two such sailors, and their average age is 25.5. MIN (or MAX) can be used
instead of AVG in the above queries to find the age of the youngest (oldest) sailor.
(Q27) Find the name and age of the oldest sailor.

Consider the following attempt to answer this query:

SELECT S.sname, MAX (S.age)
FROM Sailors S

The intent is for this query to return not only the maximum age but also the name of
the sailors having that age. However, this query is illegal in SQL: if the SELECT clause
uses an aggregate operation, then it must use only aggregate operations unless the query
contains a GROUP BY clause! (The intuition behind this restriction should become clear
when we discuss the GROUP BY clause in Section 5.5.1.) Thus, we cannot use MAX (S.age)
as well as S.sname in the SELECT clause. We have to use a nested query to compute the
desired answer to Q27:

SELECT S.sname, S.age
FROM Sailors S
WHERE S.age = ( SELECT MAX (S2.age)
                FROM Sailors S2 )

Observe that we have used the result of an aggregate operation in the subquery as
an argument to a comparison operation. Strictly speaking, we are comparing an age
value with the result of the subquery, which is a relation. However, because of the use
of the aggregate operation, the subquery is guaranteed to return a single tuple with
a single field, and SQL converts such a relation to a field value for the sake of the
comparison. The following equivalent query for Q27 is legal in the SQL-92 standard
but is not supported in many systems:

SELECT S.sname, S.age
FROM Sailors S
WHERE ( SELECT MAX (S2.age)
        FROM Sailors S2 ) = S.age

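
The nested-query answer to Q27 can be tried out with the minimal sketch below, which
assumes Python's built-in sqlite3 module as the SQL engine and loads the Sailors rows as
printed in Figure 5.10. SQLite happens to accept the subquery on either side of the
comparison; other systems may not.

```python
import sqlite3

# Sailors rows as printed in Figure 5.10: (sid, sname, rating, age).
S3 = [
    (22, "Dustin", 7, 45.0), (29, "Brutus", 1, 33.0),
    (31, "Lubber", 8, 55.5), (32, "Andy", 8, 25.5),
    (58, "Rusty", 10, 35.0), (64, "Horatio", 7, 35.0),
    (71, "Zorba", 10, 16.0), (74, "Horatio", 9, 35.0),
    (85, "Art", 3, 25.5), (95, "Bob", 3, 63.5),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Sailors (sid INTEGER PRIMARY KEY, sname TEXT, rating INTEGER, age REAL)"
)
conn.executemany("INSERT INTO Sailors VALUES (?, ?, ?, ?)", S3)

# The aggregate subquery returns one row with one field, which is then
# compared with S.age like an ordinary value.
oldest = conn.execute("""
    SELECT S.sname, S.age
    FROM Sailors S
    WHERE S.age = ( SELECT MAX(S2.age) FROM Sailors S2 )
""").fetchall()

# Same query with the subquery on the left-hand side of the comparison.
oldest_reversed = conn.execute("""
    SELECT S.sname, S.age
    FROM Sailors S
    WHERE ( SELECT MAX(S2.age) FROM Sailors S2 ) = S.age
""").fetchall()

print(oldest)  # the oldest sailor in these rows is Bob, aged 63.5
```
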
We can count the number of sailors using COUNT. This example illustrates the use of * as an
argument to COUNT, which is useful when we want to count all rows.

(Q28) Count the number of sailors.

SELECT COUNT (*)
FROM Sailors S

We can think of * as shorthand for all the columns (in the cross-product of the from-list
in the FROM clause). Contrast this query with the following query, which computes the
number of distinct sailor names. (Remember that sname is not a key!)

(Q29) Count the number of different sailor names.

SELECT COUNT ( DISTINCT S.sname )
FROM Sailors S

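
The difference between counting rows and counting distinct names can be seen on the
sailors of Figure 5.10, where two sailors are called Horatio. The sketch below assumes
Python's built-in sqlite3 module as the SQL engine and loads only the sid and sname
columns.

```python
import sqlite3

# (sid, sname) pairs from Figure 5.10; note the two sailors named Horatio.
sailors = [
    (22, "Dustin"), (29, "Brutus"), (31, "Lubber"), (32, "Andy"), (58, "Rusty"),
    (64, "Horatio"), (71, "Zorba"), (74, "Horatio"), (85, "Art"), (95, "Bob"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sailors (sid INTEGER PRIMARY KEY, sname TEXT)")
conn.executemany("INSERT INTO Sailors VALUES (?, ?)", sailors)

# COUNT(*) counts rows; COUNT(DISTINCT sname) counts different name values.
(total,) = conn.execute("SELECT COUNT(*) FROM Sailors S").fetchone()
(distinct_names,) = conn.execute(
    "SELECT COUNT(DISTINCT S.sname) FROM Sailors S"
).fetchone()

print(total, distinct_names)  # 10 rows, 9 different names
```
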
(Q30) Find the names of sailors who are older than the oldest sailor with a rating of 10.

SELECT S.sname
FROM Sailors S
WHERE S.age > ( SELECT MAX ( S2.age )
                FROM Sailors S2
                WHERE S2.rating = 10 )

On instance S3, the oldest sailor with rating 10 is sailor 58; the names of older sailors
are Bob, Dustin, Horatio, and Lubber. This query could alternatively be written using ALL
as follows:

SELECT S.sname
FROM Sailors S
WHERE S.age > ALL ( SELECT S2.age
                    FROM Sailors S2
                    WHERE S2.rating = 10 )

The GROUP BY and HAVING Clauses

Often, we want to apply aggregate operations to each of a number of groups of rows in a
relation, where the number of groups depends on the relation instance (i.e., is not known
in advance). For example, consider the following query.

(Q31) Find the age of the youngest sailor for each rating level.

If we know that ratings are integers in the range 1 to 10, we could write 10 queries of
the form:

SELECT MIN (S.age)
FROM Sailors S
WHERE S.rating = i

where i = 1, 2, . . . , 10. Writing 10 such queries is tedious. More importantly, we may not
know what rating levels exist in advance.

To write such queries, we need a major extension to the basic SQL query form, namely,
the GROUP BY clause. In fact, the extension also includes an optional HAVING clause
that can be used to specify qualifications over groups (for example, we may only
be interested in rating levels > 6). The general form of an SQL query with these
extensions is:

SELECT [ DISTINCT ] select-list
FROM from-list
WHERE qualification
GROUP BY grouping-list
HAVING group-qualification

Using the GROUP BY clause, we can write Q31 as follows:

SELECT S.rating, MIN (S.age)
FROM Sailors S
GROUP BY S.rating
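
The contrast between one query per rating level and a single GROUP BY query can be
sketched as follows, assuming Python's built-in sqlite3 module as the SQL engine and the
Sailors rows as printed in Figure 5.10. The loop first has to discover which rating
levels exist, which is exactly the information GROUP BY makes unnecessary.

```python
import sqlite3

# Sailors rows as printed in Figure 5.10: (sid, sname, rating, age).
S3 = [
    (22, "Dustin", 7, 45.0), (29, "Brutus", 1, 33.0),
    (31, "Lubber", 8, 55.5), (32, "Andy", 8, 25.5),
    (58, "Rusty", 10, 35.0), (64, "Horatio", 7, 35.0),
    (71, "Zorba", 10, 16.0), (74, "Horatio", 9, 35.0),
    (85, "Art", 3, 25.5), (95, "Bob", 3, 63.5),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Sailors (sid INTEGER PRIMARY KEY, sname TEXT, rating INTEGER, age REAL)"
)
conn.executemany("INSERT INTO Sailors VALUES (?, ?, ?, ?)", S3)

# Tedious way: find the rating levels, then run one MIN query per level.
ratings = [r for (r,) in conn.execute("SELECT DISTINCT rating FROM Sailors")]
one_by_one = {}
for i in ratings:
    (min_age,) = conn.execute(
        "SELECT MIN(S.age) FROM Sailors S WHERE S.rating = ?", (i,)
    ).fetchone()
    one_by_one[i] = min_age

# GROUP BY way: one query, one answer row per rating level that occurs.
grouped = dict(conn.execute(
    "SELECT S.rating, MIN(S.age) FROM Sailors S GROUP BY S.rating"
))

print(grouped == one_by_one)  # both approaches give the same minimum ages
```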

Let us consider some important points concerning the new clauses:

The select-list in the SELECT clause consists of (1) a list of column names and (2) a list
of terms having the form aggop ( column-name ) AS new-name. The optional AS new-name
term gives this column a name in the table that is the result of the query. Any of the
aggregation operators can be used for aggop.

Every column that appears in (1) must also appear in grouping-list. The reason is that
each row in the result of the query corresponds to one group, which is a collection of
rows that agree on the values of columns in grouping-list. If a column appears in list
(1), but not in grouping-list, it is not clear what value should be assigned to it in an
answer row.

The expressions appearing in the group-qualification in the HAVING clause must have a
single value per group. The intuition is that the HAVING clause determines whether an
answer row is to be generated for a given group. Therefore, a column appearing in the
group-qualification must appear as the argument to an aggregation operator, or it must
also appear in grouping-list.

If the GROUP BY clause is omitted, the entire table is regarded as a single group.

We will explain the semantics of such a query through an example. Consider the query:

(Q32) Find the age of the youngest sailor who is eligible to vote (i.e., is at least 18
years old) for each rating level with at least two such sailors.

SELECT S.rating, MIN (S.age) AS minage
FROM Sailors S
WHERE S.age >= 18
GROUP BY S.rating
HAVING COUNT (*) > 1

Extending the conceptual evaluation strategy presented in Section 5.2, we proceed as
follows. The first step is to construct the cross-product of tables in the from-list. Because
the only relation in the from-list in Query Q32 is Sailors, the result is just the instance
shown in Figure 5.10.

sid  sname    rating  age
22   Dustin   7       45.0
29   Brutus   1       33.0
31   Lubber   8       55.5
32   Andy     8       25.5
58   Rusty    10      35.0
64   Horatio  7       35.0
71   Zorba    10      16.0
74   Horatio  9       35.0
85   Art      3       25.5
95   Bob      3       63.5

Figure 5.10 Instance S3 of Sailors

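The second step is to apply the qualification in the WHERE clause, S.age >= 18. This
step eliminates the row 〈71, Zorba, 10, 16.0〉. The third step is to eliminate unwanted
columns. Only columns mentioned in the SELECT clause, the GROUP BY clause, or
the HAVING clause are necessary, which means we can eliminate sid and sname in our
example. The fourth step is to sort the table according to the GROUP BY clause to
identify the groups. The group-qualification in the HAVING clause, COUNT (*) > 1, is then
applied to each group, which eliminates the groups with rating 1, 9, and 10, and one
answer row is generated for each remaining group. The final result is shown in Figure 5.13.

rating  minage
3       25.5
7       35.0
8       25.5

Figure 5.13 Final Result in Sample Evaluation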
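
The conceptual evaluation of Q32 can be mirrored step by step in plain Python and then
compared with what an SQL engine returns. This is only a sketch of the conceptual
strategy, not of how a real system executes the query. It assumes the rows printed in
Figure 5.10 and uses Python's built-in sqlite3 module for the comparison.

```python
import sqlite3
from itertools import groupby

# Sailors rows as printed in Figure 5.10: (sid, sname, rating, age).
S3 = [
    (22, "Dustin", 7, 45.0), (29, "Brutus", 1, 33.0),
    (31, "Lubber", 8, 55.5), (32, "Andy", 8, 25.5),
    (58, "Rusty", 10, 35.0), (64, "Horatio", 7, 35.0),
    (71, "Zorba", 10, 16.0), (74, "Horatio", 9, 35.0),
    (85, "Art", 3, 25.5), (95, "Bob", 3, 63.5),
]

# Step 1: cross-product of the from-list (only Sailors, so just its rows).
rows = list(S3)
# Step 2: apply the WHERE qualification, S.age >= 18.
rows = [r for r in rows if r[3] >= 18]
# Step 3: drop columns that are not needed later (keep rating and age).
rows = [(rating, age) for _, _, rating, age in rows]
# Step 4: sort on the grouping column to identify the groups.
rows.sort(key=lambda r: r[0])
groups = [(rating, [age for _, age in grp])
          for rating, grp in groupby(rows, key=lambda r: r[0])]
# Step 5: apply the HAVING qualification, COUNT(*) > 1, to each group.
groups = [(rating, ages) for rating, ages in groups if len(ages) > 1]
# Step 6: generate one answer row per remaining group.
by_hand = [(rating, min(ages)) for rating, ages in groups]

# The same query evaluated by SQLite, for comparison.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Sailors (sid INTEGER PRIMARY KEY, sname TEXT, rating INTEGER, age REAL)"
)
conn.executemany("INSERT INTO Sailors VALUES (?, ?, ?, ?)", S3)
by_sql = conn.execute("""
    SELECT S.rating, MIN(S.age) AS minage
    FROM Sailors S
    WHERE S.age >= 18
    GROUP BY S.rating
    HAVING COUNT(*) > 1
    ORDER BY S.rating
""").fetchall()

print(by_hand)            # [(3, 25.5), (7, 35.0), (8, 25.5)], as in Figure 5.13
print(by_hand == by_sql)
```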

More Examples of Aggregate Queries

(Q33) For each red boat, find the number of reservations for this boat.

SELECT B.bid, COUNT (*) AS sailorcount
FROM Boats B, Reserves R
WHERE R.bid = B.bid AND B.color = 'red'
GROUP BY B.bid

On instances B1 and R2, the answer to this query contains the two tuples 〈102, 3〉 and
〈104, 2〉. It is interesting to observe that the following version of the above query is
illegal, because B.color appears in the HAVING clause but is neither in the grouping-list
nor an argument to an aggregation operator:

SELECT B.bid, COUNT (*) AS sailorcount
FROM Boats B, Reserves R
WHERE R.bid = B.bid
GROUP BY B.bid
HAVING B.color = 'red'

(Q34) Find the average age of sailors for each rating level that has at least two sailors.

SELECT S.rating, AVG (S.age) AS avgage
FROM Sailors S
GROUP BY S.rating
HAVING COUNT (*) > 1

After identifying groups based on rating, we retain only groups with at least two sailors. The
answer to this query on instance S3 is shown in Figure 5.14.

rating  avgage
3       44.5
7       40.0
8       40.5
10      25.5

Figure 5.14 Q34 Answer

rating  avgage
3       44.5
7       40.0
8       40.5
10      35.0

Figure 5.15 Q35 Answer

rating  avgage
3       44.5
7       40.0
8       40.5

Figure 5.16 Q36 Answer

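The HAVING clause can also contain a nested subquery, so Q34 can alternatively be
written as follows:

SELECT S.rating, AVG ( S.age ) AS avgage
FROM Sailors S
GROUP BY S.rating
HAVING 1 < ( SELECT COUNT (*)
             FROM Sailors S2
             WHERE S.rating = S2.rating )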

(Q35) Find the average age of sailors who are of voting age (i.e., at least 18 years old) for
each rating level that has at least two sailors.

SELECT S.rating, AVG ( S.age ) AS avgage
FROM Sailors S
WHERE S.age >= 18
GROUP BY S.rating
HAVING 1 < ( SELECT COUNT (*)
             FROM Sailors S2
             WHERE S.rating = S2.rating )

(Q36) Find the average age of sailors who are of voting age (i.e., at least 18 years old)
for each rating level that has at least two such sailors.

SELECT S.rating, AVG ( S.age ) AS avgage
FROM Sailors S
WHERE S.age >= 18
GROUP BY S.rating
HAVING 1 < ( SELECT COUNT (*)
             FROM Sailors S2
             WHERE S.rating = S2.rating AND S2.age >= 18 )

The above formulation of the query reflects the fact that it is a variant of Q35. The
answer to Q36 on instance S3 is shown in Figure 5.16. It differs from the answer to
Q35 in that there is no tuple for rating 10, since there is only one tuple with rating 10
and age ≥ 18.
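
The difference between Q35 and Q36 can be checked with the minimal sketch below, which
assumes Python's built-in sqlite3 module as the SQL engine and the Sailors rows as
printed in Figure 5.10. Both queries use age >= 18, matching the "at least 18" wording;
the only difference is the extra condition inside the HAVING subquery.

```python
import sqlite3

# Sailors rows as printed in Figure 5.10: (sid, sname, rating, age).
S3 = [
    (22, "Dustin", 7, 45.0), (29, "Brutus", 1, 33.0),
    (31, "Lubber", 8, 55.5), (32, "Andy", 8, 25.5),
    (58, "Rusty", 10, 35.0), (64, "Horatio", 7, 35.0),
    (71, "Zorba", 10, 16.0), (74, "Horatio", 9, 35.0),
    (85, "Art", 3, 25.5), (95, "Bob", 3, 63.5),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Sailors (sid INTEGER PRIMARY KEY, sname TEXT, rating INTEGER, age REAL)"
)
conn.executemany("INSERT INTO Sailors VALUES (?, ?, ?, ?)", S3)

# Q35: the subquery counts ALL sailors with the group's rating.
q35 = conn.execute("""
    SELECT S.rating, AVG(S.age) AS avgage
    FROM Sailors S
    WHERE S.age >= 18
    GROUP BY S.rating
    HAVING 1 < ( SELECT COUNT(*)
                 FROM Sailors S2
                 WHERE S.rating = S2.rating )
    ORDER BY S.rating
""").fetchall()

# Q36: the subquery counts only sailors of voting age with the group's rating.
q36 = conn.execute("""
    SELECT S.rating, AVG(S.age) AS avgage
    FROM Sailors S
    WHERE S.age >= 18
    GROUP BY S.rating
    HAVING 1 < ( SELECT COUNT(*)
                 FROM Sailors S2
                 WHERE S.rating = S2.rating AND S2.age >= 18 )
    ORDER BY S.rating
""").fetchall()

# Rows that Q35 returns but Q36 does not.
only_in_q35 = sorted(set(q35) - set(q36))
print(only_in_q35)  # [(10, 35.0)]: rating 10 has two sailors, but only one is 18 or older
```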

SELECT S.rating, AVG ( S.age ) AS avgage
FROM Sailors S
WHERE S.age >= 18
GROUP BY S.rating
HAVING COUNT (*) > 1

This formulation of Q36 takes advantage of the fact that the WHERE clause is applied
before grouping is done; thus, only sailors with age >= 18 are left when grouping is
done. It is instructive to consider yet another way of writing this query:

SELECT Temp.rating, Temp.avgage
FROM ( SELECT S.rating, AVG ( S.age ) AS avgage,
              COUNT (*) AS ratingcount
       FROM Sailors S
       WHERE S.age >= 18
       GROUP BY S.rating ) AS Temp
WHERE Temp.ratingcount > 1

This alternative brings out several interesting points. First, the FROM clause can also
contain a nested subquery according to the SQL-92 standard. Second, the HAVING
clause is not needed at all. Any query with a HAVING clause can be rewritten without
one, but many queries are simpler to express with the HAVING clause. Finally, when a
subquery appears in the FROM clause, using the AS keyword to give it a name is necessary
(since otherwise we could not express, for instance, the condition Temp.ratingcount
> 1).
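
That the HAVING form and the FROM-subquery form of Q36 return the same rows can be checked
with the sketch below. It assumes Python's built-in sqlite3 module as the SQL engine, the
Sailors rows as printed in Figure 5.10, and age >= 18 to match the "at least 18" wording
of Q36.

```python
import sqlite3

# Sailors rows as printed in Figure 5.10: (sid, sname, rating, age).
S3 = [
    (22, "Dustin", 7, 45.0), (29, "Brutus", 1, 33.0),
    (31, "Lubber", 8, 55.5), (32, "Andy", 8, 25.5),
    (58, "Rusty", 10, 35.0), (64, "Horatio", 7, 35.0),
    (71, "Zorba", 10, 16.0), (74, "Horatio", 9, 35.0),
    (85, "Art", 3, 25.5), (95, "Bob", 3, 63.5),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Sailors (sid INTEGER PRIMARY KEY, sname TEXT, rating INTEGER, age REAL)"
)
conn.executemany("INSERT INTO Sailors VALUES (?, ?, ?, ?)", S3)

# Group qualification expressed with HAVING.
with_having = conn.execute("""
    SELECT S.rating, AVG(S.age) AS avgage
    FROM Sailors S
    WHERE S.age >= 18
    GROUP BY S.rating
    HAVING COUNT(*) > 1
    ORDER BY S.rating
""").fetchall()

# Same qualification expressed as a WHERE over a named subquery in FROM.
with_from_subquery = conn.execute("""
    SELECT Temp.rating, Temp.avgage
    FROM ( SELECT S.rating, AVG(S.age) AS avgage,
                  COUNT(*) AS ratingcount
           FROM Sailors S
           WHERE S.age >= 18
           GROUP BY S.rating ) AS Temp
    WHERE Temp.ratingcount > 1
    ORDER BY Temp.rating
""").fetchall()

print(with_having == with_from_subquery)
```
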
(Q37) Find those ratings for which the average age of sailors is the minimum over all
ratings.

We use this query to illustrate that aggregate operations cannot be nested. One might
consider writing it as follows:

SELECT S.rating
FROM Sailors S
WHERE AVG (S.age) = ( SELECT MIN (AVG (S2.age))
