Database Management Systems-7
For the example instances, we get Dustin, Lubber, and Horatio.
(Q21) Find the names of sailors who have not reserved a red boat.
SELECT S.sname
FROM Sailors S
WHERE S.sid NOT IN ( SELECT R.sid
                     FROM Reserves R
                     WHERE R.bid IN ( SELECT B.bid
                                      FROM Boats B
                                      WHERE B.color = 'red' ) )
This query computes the names of sailors whose sid is not in the set 22, 31, and 64.
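As a minimal sketch, Q21 can be run against an in-memory SQLite database from Python. The rows below are made up for illustration (they are not the textbook instances), and the variable names Q21 and no_red_boat are our own. Note that a sailor with no reservations at all also qualifies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Sailors  (sid INTEGER PRIMARY KEY, sname TEXT);
    CREATE TABLE Boats    (bid INTEGER PRIMARY KEY, color TEXT);
    CREATE TABLE Reserves (sid INTEGER, bid INTEGER);
    -- Made-up rows, not the textbook instances.
    INSERT INTO Sailors  VALUES (1, 'Ann'), (2, 'Ben'), (3, 'Cy');
    INSERT INTO Boats    VALUES (101, 'red'), (102, 'green');
    INSERT INTO Reserves VALUES (1, 101), (1, 102), (2, 102);
""")

# Innermost query: red boats. Middle query: sailors who reserved one.
# Outer query: everyone whose sid is NOT in that set.
Q21 = """
    SELECT S.sname
    FROM Sailors S
    WHERE S.sid NOT IN ( SELECT R.sid
                         FROM Reserves R
                         WHERE R.bid IN ( SELECT B.bid
                                          FROM Boats B
                                          WHERE B.color = 'red' ) )
    ORDER BY S.sname
"""
no_red_boat = [row[0] for row in conn.execute(Q21)]
print(no_red_boat)  # Ben reserved only a green boat; Cy reserved nothing
```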
Correlated Nested Queries
In the nested queries that we have seen thus far, the inner subquery has been completely
independent of the outer query. In general the inner subquery could depend on the row
that is currently being examined in the outer query (in terms of our conceptual
evaluation strategy). Let us rewrite the following query once more:
(Q1) Find the names of sailors who have reserved boat number 103.
SELECT S.sname
FROM Sailors S
WHERE EXISTS ( SELECT *
               FROM Reserves R
               WHERE R.bid = 103
                 AND R.sid = S.sid )
The EXISTS operator is another set comparison operator, such as IN. It allows us to
test whether a set is nonempty. Thus, for each Sailor row S, we test whether the set of
Reserves rows R such that R.bid = 103 AND S.sid = R.sid is nonempty. If so, sailor S has
reserved boat 103, and we retrieve the name. The subquery clearly depends on the current
row S and must be re-evaluated for each row in Sailors. The occurrence of S in the
subquery (in the form of the literal S.sid) is called a correlation, and such queries are
called correlated queries.
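The re-evaluation of the subquery for each outer row can be imitated by hand. The sketch below, using Python's sqlite3 module and made-up rows, runs the correlated query once and then repeats the same work with an explicit loop that re-runs the inner query for every sailor. The names declarative and manual are our own, and a real system is free to evaluate the query differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Sailors  (sid INTEGER PRIMARY KEY, sname TEXT);
    CREATE TABLE Reserves (sid INTEGER, bid INTEGER);
    -- Made-up rows, not the textbook instances.
    INSERT INTO Sailors  VALUES (22, 'Dustin'), (31, 'Lubber'), (58, 'Rusty');
    INSERT INTO Reserves VALUES (22, 103), (31, 103), (58, 104);
""")

# The correlated query: the subquery mentions S.sid from the outer query.
declarative = [row[0] for row in conn.execute("""
    SELECT S.sname
    FROM Sailors S
    WHERE EXISTS ( SELECT *
                   FROM Reserves R
                   WHERE R.bid = 103
                     AND R.sid = S.sid )
    ORDER BY S.sname
""")]

# Conceptual evaluation: for each Sailors row, re-run the inner query
# with that row's sid and test whether the result is nonempty.
manual = []
sailors = conn.execute("SELECT sid, sname FROM Sailors ORDER BY sname").fetchall()
for sid, sname in sailors:
    inner_rows = conn.execute(
        "SELECT * FROM Reserves R WHERE R.bid = 103 AND R.sid = ?", (sid,)
    ).fetchall()
    if inner_rows:  # EXISTS is true when the set is nonempty
        manual.append(sname)

print(declarative, manual)
```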
Set-Comparison Operators
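We have already seen the set-comparison operators EXISTS, IN, and UNIQUE, along with
their negated versions. SQL also supports op ANY and op ALL, where op is one of the
arithmetic comparison operators <, <=, =, <>, >=, >.
(Q22) Find sailors whose rating is better than some sailor called Horatio.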
SELECT S.sid
FROM Sailors S
WHERE S.rating > ANY ( SELECT S2.rating
                       FROM Sailors S2
                       WHERE S2.sname = 'Horatio' )
If there are several sailors called Horatio, this query finds all sailors whose rating is
better than that of some sailor called Horatio. On instance S3, this computes the
sids 31, 32, 58, 71, and 74. What if there were no sailor called Horatio? In this case
the comparison S.rating > ANY . . . is defined to return false, and the above query
returns an empty answer set. To understand comparisons involving ANY, it is useful to
think of the comparison being carried out repeatedly. In the example above, S.rating
is successively compared with each rating value that is an answer to the nested query.
Intuitively, the subquery must return a row that makes the comparison true, in order
for S.rating > ANY . . . to return true.
(Q23) Find sailors whose rating is better than every sailor called Horatio.
We can answer this query with a simple modification to Query Q22: just replace
ANY with ALL in the WHERE clause of the outer query. On instance S3, we would get
the sids 58 and 71. If there were no sailor called Horatio, the comparison S.rating
> ALL . . . is defined to return true! The query would then return the sids of all
sailors. Again, it is useful to think of the comparison being carried out repeatedly.
Intuitively, the comparison must be true for every returned row in order for S.rating
> ALL . . . to return true.
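The repeated-comparison reading of ANY and ALL, including the empty-subquery cases, can be modelled with Python's built-in any and all. This is only a sketch of the semantics described above: the helper names and the rating values are hypothetical, and NULL values (which need three-valued logic in SQL) are deliberately ignored.

```python
def greater_than_any(value, subquery_rows):
    """Model of 'value > ANY (subquery)': true if some row makes it true."""
    return any(value > row for row in subquery_rows)


def greater_than_all(value, subquery_rows):
    """Model of 'value > ALL (subquery)': true if every row makes it true."""
    return all(value > row for row in subquery_rows)


# Hypothetical ratings returned by the subquery for sailors called Horatio.
horatio_ratings = [7, 9]

print(greater_than_any(8, horatio_ratings))  # 8 > 7, so some row qualifies
print(greater_than_all(8, horatio_ratings))  # 8 > 9 fails, so not every row

# With no sailor called Horatio the subquery is empty:
print(greater_than_any(8, []))  # no row can make the comparison true
print(greater_than_all(8, []))  # no row can make the comparison false
```

The empty-list behaviour of any and all matches the definitions in the text: > ANY over an empty set is false, while > ALL over an empty set is true.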
(Q24) Find the sailors with the highest rating.
SELECT S.sid
FROM Sailors S
WHERE S.rating >= ALL ( SELECT S2.rating
                        FROM Sailors S2 )
Note that IN and NOT IN are equivalent to = ANY and <> ALL, respectively.
Let us revisit a query that we considered earlier using the INTERSECT operator.
(Q6) Find the names of sailors who have reserved both a red and a green boat.
SELECT S.sname
FROM Sailors S, Reserves R, Boats B
WHERE S.sid = R.sid AND R.bid = B.bid AND B.color = 'red'
      AND S.sid IN ( SELECT S2.sid
                     FROM Sailors S2, Boats B2, Reserves R2
                     WHERE S2.sid = R2.sid AND R2.bid = B2.bid
                           AND B2.color = 'green' )
As it turns out, writing this query (Q6) using INTERSECT is more complicated because
we have to use sids to identify sailors (while intersecting) and have to return sailor
names:
SELECT S3.sname
FROM Sailors S3
WHERE S3.sid IN (( SELECT R.sid
                   FROM Boats B, Reserves R
                   WHERE R.bid = B.bid AND B.color = 'red' )
                 INTERSECT
                 (SELECT R2.sid
                  FROM Boats B2, Reserves R2
                  WHERE R2.bid = B2.bid AND B2.color = 'green' ))
Our next example illustrates how the division operation in relational algebra can be
expressed in SQL.
(Q9) Find the names of sailors who have reserved all boats.
SELECT S.sname
FROM Sailors S
WHERE NOT EXISTS (( SELECT B.bid
                    FROM Boats B )
                  EXCEPT
                  (SELECT R.bid
                   FROM Reserves R
                   WHERE R.sid = S.sid ))
Notice that this query is correlated: for each sailor S, we check to see that the set of
boats reserved by S includes all boats. An alternative way to write this query without
using EXCEPT follows:
SELECT S.sname
FROM Sailors S
WHERE NOT EXISTS ( SELECT B.bid
                   FROM Boats B
                   WHERE NOT EXISTS ( SELECT R.bid
                                      FROM Reserves R
                                      WHERE R.bid = B.bid
                                            AND R.sid = S.sid ))
Intuitively, for each sailor we check that there is no boat that has not been reserved by
this sailor.
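The double NOT EXISTS formulation of division can be tried on a small example. The sketch below uses Python's sqlite3 module with made-up rows in which only one sailor has reserved every boat; the names Q9 and reserved_all are our own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Sailors  (sid INTEGER PRIMARY KEY, sname TEXT);
    CREATE TABLE Boats    (bid INTEGER PRIMARY KEY, color TEXT);
    CREATE TABLE Reserves (sid INTEGER, bid INTEGER);
    -- Made-up rows: Ann reserved both boats, Ben only one.
    INSERT INTO Sailors  VALUES (1, 'Ann'), (2, 'Ben');
    INSERT INTO Boats    VALUES (101, 'red'), (102, 'green');
    INSERT INTO Reserves VALUES (1, 101), (1, 102), (2, 101);
""")

# "There is no boat B such that there is no reservation of B by S."
Q9 = """
    SELECT S.sname
    FROM Sailors S
    WHERE NOT EXISTS ( SELECT B.bid
                       FROM Boats B
                       WHERE NOT EXISTS ( SELECT R.bid
                                          FROM Reserves R
                                          WHERE R.bid = B.bid
                                            AND R.sid = S.sid ) )
    ORDER BY S.sname
"""
reserved_all = [row[0] for row in conn.execute(Q9)]
print(reserved_all)  # Ben is excluded because boat 102 is unreserved by him
```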
AGGREGATE OPERATORS
Note that it does not make sense to specify DISTINCT in conjunction with MIN or MAX
(although SQL-92 does not preclude this).
(Q25) Find the average age of all sailors.
On instance S3, the average age is 37.4. Of course, the WHERE clause can be used to
restrict the sailors who are considered in computing the average age:
(Q26) Find the average age of sailors with a rating of 10.
There are two such sailors, and their average age is 25.5. MIN (or MAX) can be used
instead of AVG in the above queries to find the age of the youngest (oldest) sailor.
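A minimal sketch of these aggregate queries, run through Python's sqlite3 module, is shown here. The three rows are made up (so the numbers differ from those quoted for instance S3), and the variable names are our own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Sailors (sid INTEGER PRIMARY KEY, sname TEXT,
                          rating INTEGER, age REAL);
    -- Made-up rows, not instance S3.
    INSERT INTO Sailors VALUES (1, 'Ann', 10, 20.0),
                               (2, 'Ben', 10, 30.0),
                               (3, 'Cy',   7, 40.0);
""")

# Average age of all sailors (the form of Q25).
avg_all = conn.execute("SELECT AVG(S.age) FROM Sailors S").fetchone()[0]

# The WHERE clause restricts which rows are aggregated (the form of Q26).
avg_rating_10 = conn.execute(
    "SELECT AVG(S.age) FROM Sailors S WHERE S.rating = 10"
).fetchone()[0]

# MIN and MAX in place of AVG give the youngest and oldest ages.
youngest, oldest = conn.execute(
    "SELECT MIN(S.age), MAX(S.age) FROM Sailors S"
).fetchone()

print(avg_all, avg_rating_10, youngest, oldest)
```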
(Q27) Find the name and age of the oldest sailor.
Consider the following attempt to answer this query:
SELECT S.sname, MAX (S.age)
FROM Sailors S
The intent is for this query to return not only the maximum age but also the name of
the sailors having that age. However, this query is illegal in SQL: if the SELECT clause
uses an aggregate operation, then it must use only aggregate operations unless the query
contains a GROUP BY clause! (The intuition behind this restriction should become clear
when we discuss the GROUP BY clause in Section 5.5.1.) Thus, we cannot use MAX (S.age)
as well as S.sname in the SELECT clause. We have to use a nested query to compute the
desired answer to Q27: the outer query selects the sailors whose age equals the maximum
age, and that maximum is computed by a subquery.
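One way to write the nested query for Q27 is sketched below with Python's sqlite3 module: the subquery computes the maximum age, and the outer query returns every sailor with that age. The rows are made up and include a tie to show that more than one sailor can be returned; the names Q27 and oldest are our own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Sailors (sid INTEGER PRIMARY KEY, sname TEXT,
                          rating INTEGER, age REAL);
    -- Made-up rows; Ben and Cy tie for the maximum age.
    INSERT INTO Sailors VALUES (1, 'Ann', 10, 20.0),
                               (2, 'Ben',  7, 40.0),
                               (3, 'Cy',   8, 40.0);
""")

# The aggregate lives in the subquery, so the outer SELECT list
# contains only ordinary columns.
Q27 = """
    SELECT S.sname, S.age
    FROM Sailors S
    WHERE S.age = ( SELECT MAX (S2.age)
                    FROM Sailors S2 )
    ORDER BY S.sname
"""
oldest = conn.execute(Q27).fetchall()
print(oldest)
```

SQLite is used here only because it ships with Python. It accepts some mixed SELECT lists as an extension, so it is not a good tool for demonstrating the restriction itself.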
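A related query finds the names of sailors who are older than the oldest sailor with a
rating of 10:
SELECT S.sname
FROM Sailors S
WHERE S.age > ( SELECT MAX ( S2.age )
                FROM Sailors S2
                WHERE S2.rating = 10 )
On instance S3, the oldest sailor with rating 10 is sailor 58; the names of older sailors are
Bob, Dustin, Horatio, and Lubber. This query could alternatively be written with the
comparison S.age > ALL over the ages of sailors with rating 10.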
(Q31) Find the age of the youngest sailor for each rating level.
If we know that ratings are integers in the range 1 to 10, we could write 10 queries of
the form:
SELECT MIN (S.age)
FROM Sailors S
WHERE S.rating = i
where i = 1, 2, . . . , 10. Writing 10 such queries is tedious. More importantly, we may not
know what rating levels exist in advance.
To write such queries, we need a major extension to the basic SQL query form, namely,
the GROUP BY clause. In fact, the extension also includes an optional HAVING clause
that can be used to specify qualifications over groups.
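A single GROUP BY query can replace the family of per-rating queries. The sketch below, using Python's sqlite3 module and made-up rows, computes the youngest age for each rating level that actually occurs; the names Q31 and youngest_per_rating are our own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Sailors (sid INTEGER PRIMARY KEY, sname TEXT,
                          rating INTEGER, age REAL);
    -- Made-up rows, not instance S3.
    INSERT INTO Sailors VALUES (1, 'Ann', 10, 20.0),
                               (2, 'Ben', 10, 30.0),
                               (3, 'Cy',   7, 40.0),
                               (4, 'Dee',  7, 35.0),
                               (5, 'Eve',  3, 50.0);
""")

# One group per distinct rating; MIN is computed within each group.
Q31 = """
    SELECT S.rating, MIN (S.age)
    FROM Sailors S
    GROUP BY S.rating
    ORDER BY S.rating
"""
youngest_per_rating = conn.execute(Q31).fetchall()
print(youngest_per_rating)
```

No rating values are hard-coded, so the query keeps working when new rating levels appear in the table.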
(Q32) Find the age of the youngest sailor who is eligible to vote (i.e., is at least 18
years old) for each rating level with at least two such sailors.
The second step is to apply the qualification in the WHERE clause, S.age >= 18. This
step eliminates the row 〈71, zorba, 10, 16〉. The third step is to eliminate unwanted
columns. Only columns mentioned in the SELECT clause, the GROUP BY clause, or
the HAVING clause are necessary, which means we can eliminate sid and sname in our
example. The fourth step is to sort the table according to the GROUP BY clause to
identify the groups, after which the HAVING qualification is applied to each group and
one answer row is generated for each remaining group. The final result is shown in
Figure 5.13.
rating  minage
3       25.5
7       35.0
8       25.5
Figure 5.13 Final Result in Sample Evaluation
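The evaluation steps for Q32 can be traced by hand and compared with what a database returns. The sketch below uses Python's sqlite3 module with made-up rows (not instance S3, so the output differs from Figure 5.13). It assumes the query has the form WHERE S.age >= 18, GROUP BY S.rating, HAVING COUNT (*) > 1, and the names from_sql and manual are our own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Sailors (sid INTEGER PRIMARY KEY, sname TEXT,
                          rating INTEGER, age REAL);
    -- Made-up rows; Ann is under 18.
    INSERT INTO Sailors VALUES (1, 'Ann', 10, 16.0), (2, 'Ben', 10, 35.0),
                               (3, 'Cy',   7, 45.0), (4, 'Dee',  7, 35.0),
                               (5, 'Eve',  3, 25.5), (6, 'Fay',  3, 63.5),
                               (7, 'Gus',  1, 33.0);
""")

from_sql = conn.execute("""
    SELECT S.rating, MIN (S.age) AS minage
    FROM Sailors S
    WHERE S.age >= 18
    GROUP BY S.rating
    HAVING COUNT (*) > 1
    ORDER BY S.rating
""").fetchall()

# The same computation, step by step.
rows = conn.execute("SELECT sid, sname, rating, age FROM Sailors").fetchall()
kept = [r for r in rows if r[3] >= 18]                        # WHERE
projected = [(rating, age) for (_, _, rating, age) in kept]   # drop sid, sname
groups = {}
for rating, age in sorted(projected):                         # sort into groups
    groups.setdefault(rating, []).append(age)
manual = [(rating, min(ages))                                 # HAVING, then
          for rating, ages in groups.items()                  # one row per group
          if len(ages) > 1]

print(from_sql, manual)
```

Rating 10 drops out because only one of its sailors survives the WHERE clause, and rating 1 drops out because it has a single sailor.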
(Q33) For each red boat, find the number of reservations for this boat.
SELECT B.bid, COUNT (*) AS sailorcount
FROM Boats B, Reserves R
WHERE R.bid = B.bid AND B.color = 'red'
GROUP BY B.bid
On instances B1 and R2, the answer to this query contains the two tuples 〈102, 3〉 and
〈104, 2〉. It is interesting to observe that the following version of the above query is illegal:
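SELECT B.bid, COUNT (*) AS sailorcount
FROM Boats B, Reserves R
WHERE R.bid = B.bid
GROUP BY B.bid
HAVING B.color = 'red'
This version is illegal because only columns that appear in the GROUP BY clause can
appear in the HAVING clause, unless they appear as arguments to an aggregate operator;
B.color does not appear in the GROUP BY clause.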
(Q34) Find the average age of sailors for each rating level that has at least two sailors.
rating  avgage
3       44.5
7       40.0
8       40.5
10      25.5
Figure 5.14 Q34 Answer
rating  avgage
3       45.5
7       40.0
8       40.5
10      35.0
Figure 5.15 Q35 Answer
rating  avgage
3       45.5
7       40.0
8       40.5
Figure 5.16 Q36 Answer
(Q35) Find the average age of sailors who are of voting age (i.e., at least 18 years old) for
each rating level that has at least two sailors.
(Q36) Find the average age of sailors who are of voting age (i.e., at least 18 years old)
for each rating level that has at least two such sailors.
The above formulation of the query reflects the fact that it is a variant of Q35. The
answer to Q36 on instance S3 is shown in Figure 5.16. It differs from the answer to
Q35 in that there is no tuple for rating 10, since there is only one tuple with rating 10
and age ≥ 18.
This formulation of Q36 takes advantage of the fact that the WHERE clause is applied
before grouping is done; thus, only sailors with age >= 18 are left when grouping is
done. It is instructive to consider yet another way of writing this query:
SELECT Temp.rating, Temp.avgage
FROM ( SELECT S.rating, AVG ( S.age ) AS avgage,
              COUNT (*) AS ratingcount
       FROM Sailors S
       WHERE S.age >= 18
       GROUP BY S.rating ) AS Temp
WHERE Temp.ratingcount > 1
This alternative brings out several interesting points. First, the FROM clause can also
contain a nested subquery according to the SQL-92 standard. Second, the HAVING
clause is not needed at all. Any query with a HAVING clause can be rewritten without
one, but many queries are simpler to express with the HAVING clause. Finally, when a
subquery appears in the FROM clause, using the AS keyword to give it a name is necessary
(since otherwise we could not express, for instance, the condition Temp.ratingcount
> 1).
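The claim that HAVING can be replaced by a named subquery in the FROM clause can be checked on a small example. The sketch below runs both formulations of Q36 through Python's sqlite3 module on made-up rows. The HAVING version is our assumed form of the query, the voting-age filter is written as age >= 18 to follow the wording of Q36, and the names with_having and with_subquery are our own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Sailors (sid INTEGER PRIMARY KEY, sname TEXT,
                          rating INTEGER, age REAL);
    -- Made-up rows; Ann is under 18.
    INSERT INTO Sailors VALUES (1, 'Ann', 10, 16.0), (2, 'Ben', 10, 35.0),
                               (3, 'Cy',   7, 45.0), (4, 'Dee',  7, 35.0),
                               (5, 'Eve',  3, 25.5), (6, 'Fay',  3, 63.5);
""")

# Formulation 1: group qualification expressed with HAVING.
with_having = conn.execute("""
    SELECT S.rating, AVG (S.age) AS avgage
    FROM Sailors S
    WHERE S.age >= 18
    GROUP BY S.rating
    HAVING COUNT (*) > 1
    ORDER BY S.rating
""").fetchall()

# Formulation 2: the group count becomes an ordinary column of Temp,
# so the outer WHERE clause can filter on it.
with_subquery = conn.execute("""
    SELECT Temp.rating, Temp.avgage
    FROM ( SELECT S.rating, AVG (S.age) AS avgage,
                  COUNT (*) AS ratingcount
           FROM Sailors S
           WHERE S.age >= 18
           GROUP BY S.rating ) AS Temp
    WHERE Temp.ratingcount > 1
    ORDER BY Temp.rating
""").fetchall()

print(with_having, with_subquery)
```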
(Q37) Find those ratings for which the average age of sailors is the minimum over all
ratings.
We use this query to illustrate that aggregate operations cannot be nested. One might
consider writing it as follows:
SELECT S.rating
FROM Sailors S
WHERE AVG (S.age) = ( SELECT MIN (AVG (S2.age))