
Skyline Daa 1

The document describes algorithms for computing skyline points (maximal points) in 2D and 3D spaces. It presents an O(n log n) algorithm for 2D skyline points that sorts the points by x-coordinate and scans them, keeping track of the maximum y-coordinate seen. For 3D, it uses a sweep plane algorithm that processes points by decreasing z-coordinate, using a binary search tree to check if each new point is dominated in xy-space. The algorithms rely on properties of the skyline visualization as a staircase.
Copyright
© © All Rights Reserved

Chapter 2

Skyline Points (Maximal Points)

2.1 Introduction
Let P be a set of n points in a d-dimensional space. A point p = (p_1, ..., p_d) dominates a point
q = (q_1, ..., q_d) if p_i > q_i for all i ∈ [1, d]. In a skyline query, the goal is to report the points of P
that are not dominated by any other point in P. In the computational geometry literature, skyline points
are also known as maximal or minimal points. See Figure 2.1 for an example in the context of cricket
(a popular sport in India). In cricket, the higher a batsman's average (resp., strike rate), the better
he/she is considered. In Figure 2.1, each point corresponds to a batsman, with the x-coordinate
(resp., y-coordinate) being the average (resp., strike rate). The skyline points represent potentially
good batsmen.
[Figure 2.1 plot: ten points, labeled 1 to 10, with Average on the x-axis and Strike Rate on the y-axis.]

Figure 2.1: Points 1, 2, 5 and 9 are skyline points.

The skyline query is an extremely popular summarization query in databases, since it retains only the
potentially useful points (other applications include finding the “best” products or hotels based on some
d parameters). As a result, there has been an extensive amount of work in the database community on
computing skyline points and their several variants.

2.2 Warm-up: Algorithm in 2D


A naive approach to computing the skyline points would be to compare each pair of points, which would
take O(n^2) time. Instead, we can solve the problem in O(n log n) time.
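
As a point of reference, the naive pairwise comparison can be sketched in a few lines of Python. This is only an illustration: the names `dominates` and `skyline_naive` are hypothetical, and domination is taken to be strict in every coordinate, as in Section 2.1.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Point, q: Point) -> bool:
    """True if p is strictly larger than q in every coordinate."""
    return all(pi > qi for pi, qi in zip(p, q))


def skyline_naive(points: Sequence[Point]) -> List[Point]:
    """Compare every pair of points: O(n^2) time for a constant dimension."""
    return [q for q in points if not any(dominates(p, q) for p in points)]
```

The same routine works in any dimension, which makes it a convenient baseline for checking the faster algorithms on small inputs.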

Theorem 1. Given a set P of n points in 2D, its skyline points can be computed in O(n log n) time.

Let p1 , . . . , pn be the sequence of points of P in decreasing order of their x-coordinate values.
Initialize a variable ymax ← −∞. The algorithm will scan the points in the order p1 , p2 , . . . , pn .
When a point pi is scanned, if the y-coordinate of pi is greater than ymax , then pi is declared as a
skyline point and ymax is updated to the y-coordinate of pi ; otherwise, pi is declared as a non-skyline
point and ymax remains the same. See Figure 2.2 (a) and (b) for an illustration of these two cases.
The running time is dominated by the sorting step which takes O(n log n) time.
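
The scan described above can be sketched as follows. This is a minimal illustration that assumes all x-coordinates and all y-coordinates are distinct (an assumption added here to sidestep ties); the name `skyline_2d` is hypothetical.

```python
from typing import List, Sequence, Tuple

Point2D = Tuple[float, float]


def skyline_2d(points: Sequence[Point2D]) -> List[Point2D]:
    """Sort by decreasing x, then scan while tracking the largest y seen so far.

    Assumes distinct x-coordinates and distinct y-coordinates.
    """
    skyline = []
    y_max = float("-inf")
    for p in sorted(points, key=lambda pt: pt[0], reverse=True):
        if p[1] > y_max:
            # No point with a larger x has a larger y, so p is a skyline point.
            skyline.append(p)
            y_max = p[1]
    return skyline
```

The skyline points come out in decreasing order of x, and therefore in increasing order of y.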

Question. Prove the correctness of this algorithm.

[Figure 2.2: three panels (a), (b) and (c); the labels shown are y_max, p_i, a, b, c and d.]

Figure 2.2: (a) p_i is a non-skyline point, (b) p_i is a skyline point, and (c) a 2D skyline can be
visualized as a staircase.

It will be useful to visualize the 2D skyline as a “staircase” represented as a collection of horizontal
and vertical segments shown in bold in Figure 2.2(c). Two useful properties of the staircase are the
following:

1. If a point p ∉ P lies “below” the staircase, then p is dominated by one of the skyline points.
On the other hand, if a point p ∉ P lies “above” the staircase, then p is not dominated by any
of the skyline points.

2. As we walk from left to right on the staircase, the y-coordinates of the skyline points visited
keep decreasing.

2.3 Sweepline algorithm in 3D


In this section a sweep-plane based algorithm for computing skyline points in 3D will be discussed.
Let p1 , p2 , . . . , pn be the sequence of points in P in decreasing order of their z-coordinate values.
Imagine a plane parallel to the xy-plane starting from z = +∞ and sweeping towards z = −∞. As
the sweep proceeds, the plane will intersect points in P (in decreasing order of their z-coordinate
values). Let S_i be the skyline points found by the algorithm after processing p_1, ..., p_i. Note that p_1
will always be a skyline point of P, and hence, initialize S_1 ← {p_1}. Let S_i^{xy} be the skyline points of
S_i where domination is defined w.r.t. the x and y coordinates only (i.e., the z coordinate is ignored).

Question. Come up with a small example where S_i is not equal to S_i^{xy}.

When the sweep-plane reaches a point p_i, the algorithm performs the following test:

Does any point in S_{i-1}^{xy} dominate p_i?

If the answer is yes (Figure 2.3(b)), then the algorithm declares that p_i is not a skyline point of P.
Update S_i ← S_{i-1} and S_i^{xy} ← S_{i-1}^{xy}. On the other hand, if the answer is no (Figure 2.3(a)), then the
algorithm declares that p_i is a skyline point of P. Update S_i ← S_{i-1} ∪ {p_i}, and obtain S_i^{xy} from
S_{i-1}^{xy} by adding p_i and removing the points of S_{i-1}^{xy} dominated by p_i.

[Figure 2.3: two panels (a) and (b) drawn in the xy-plane, each showing the point p_i.]

Figure 2.3: S_{i-1}^{xy} is the set of black points. (a) No point in S_{i-1}^{xy} dominates p_i, (b) p_i is
dominated by a point in S_{i-1}^{xy}.

The final output of the algorithm will be Sn . Answering the following questions proves the
correctness of the algorithm.

Question. While testing if pi is a skyline point or not, why did the algorithm not consider points
pi+1 , . . . , pn ?

Question. If a point in S_{i-1}^{xy} dominates p_i, then why is p_i not a skyline point of P?

Question. If no point in S_{i-1}^{xy} dominates p_i, then why is p_i a skyline point of P?

Efficient implementation. An efficient data structure is needed to ensure that the above algorithm
runs in O(n log n) time. Even though the question “Does any point in S_{i-1}^{xy} dominate p_i?”
looks like a 2D question, it is actually a 1D problem. We will build a binary search tree based on the
points in S_{i-1}^{xy}. We will store the points at the leaf nodes ordered from left to right in increasing order
of their x-coordinate values. By Property (2) of the 2D skyline, this also implies that the leaf nodes are
ordered in decreasing order of their y-coordinate values. Additionally, the leaf nodes are connected
as a doubly linked list, so for any leaf node its predecessor and successor leaf node can be accessed
in constant time.
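Recall from the previous section that a 2D skyline can be conceptually represented as a staircase.
Answering “Does any point in S_{i-1}^{xy} dominate p_i?” is equivalent to answering whether p_i lies “above” the
staircase or “below” the staircase. We will perform a successor query on the binary search tree with
the x-coordinate of p_i, and let p_j be the point corresponding to it, i.e., the staircase point with the
smallest x-coordinate larger than that of p_i. If no such point exists, then no point on the staircase
has a larger x-coordinate than p_i, and hence p_i lies above the staircase. Otherwise: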

• If pj dominates pi w.r.t. x and y coordinates, then pi will lie below the staircase.

• If pj does not dominate pi w.r.t. x and y coordinates, then pi will lie above the staircase (since
none of the points on the staircase to the right of pj can dominate pi ).

If p_i lies above the staircase, then we also need to update the staircase by inserting p_i and removing
the points on the staircase dominated by p_i. The points on the updated staircase will be the set S_i^{xy}.
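
The sweep and the staircase test can be sketched in Python as below. This is an illustration under stated assumptions, not the data structure described above: Python's standard library has no balanced binary search tree, so the staircase is kept in two parallel sorted lists searched with `bisect`. The successor query still takes O(log n) time, but list insertions and deletions cost linear time, so this sketch does not achieve the O(n log n) bound. All coordinates are assumed distinct, and the names `skyline_3d`, `xs` and `ys` are hypothetical.

```python
import bisect
from typing import List, Sequence, Tuple

Point3D = Tuple[float, float, float]


def skyline_3d(points: Sequence[Point3D]) -> List[Point3D]:
    """Sweep in decreasing z while maintaining the xy-staircase.

    xs and ys hold the staircase points: xs is increasing, ys is decreasing.
    Assumes all coordinates are distinct.
    """
    xs: List[float] = []
    ys: List[float] = []
    result: List[Point3D] = []
    for x, y, z in sorted(points, key=lambda p: p[2], reverse=True):
        idx = bisect.bisect_right(xs, x)  # successor: smallest staircase x above x
        if idx < len(xs) and ys[idx] > y:
            continue  # below the staircase: dominated by an earlier (higher-z) point
        result.append((x, y, z))
        # Staircase points dominated by (x, y) form a contiguous run just left of idx.
        lo = idx
        while lo > 0 and ys[lo - 1] < y:
            lo -= 1
        xs[lo:idx] = [x]  # replace the dominated run by the new point
        ys[lo:idx] = [y]
    return result
```

Replacing the two lists with a balanced search tree whose leaves are linked, as described in the text, would recover the stated running time.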

Question. Assume that ki points on the staircase are dominated by pi . Show that these ki points
can be identified and deleted from the binary search tree in O((ki + 1) log n) amortized time. You
can assume that deletion and insertion of a point in the balanced binary search tree takes O(log n)
amortized time.
Once a point gets deleted from the staircase (and the binary search tree), it can never reappear.
Using this crucial observation, the total time taken to update the staircase can be bounded by

$$\sum_{i=1}^{n} O\big((k_i + 1)\log n\big) = O\Big(\log n \Big(\sum_{i=1}^{n} k_i + \sum_{i=1}^{n} 1\Big)\Big) = O(n \log n), \quad \text{since } \sum_{i=1}^{n} k_i \le n.$$

Before the sweep-plane starts, the algorithm sorts the points based on their z-coordinate values,
which takes O(n log n) time. For each point visited, it takes O(log n) amortized time to answer a
successor query. Finally, we showed above that updating the staircase (and hence, S_i^{xy}) throughout
the algorithm takes O(n log n) time. Therefore, the overall running time of the algorithm is
O(n log n).
Theorem 2. Given a set P of n points in 3D, its skyline points can be computed in O(n log n) time.

2.4 Output-sensitive algorithm in 2D


In the previous sections, we have seen O(n log n) time algorithms to solve the skyline problem in
2D and 3D. In the comparison model of computation, a lower bound of Ω(n log n) can be shown for
the skyline problem. Therefore, the O(n log n) time algorithms are optimal in the comparison model.
The goal of this section is to design algorithms which perform better than O(n log n) when
the size of the skyline is “small”, while the running time remains O(n log n) when the size of the
skyline is “large”. Such algorithms are known as output-sensitive algorithms, since their running
time depends on the size of the output as well as the size of the input. Let k denote the number of
skyline points. If the algorithm designer is told that k = 1 for the given input, then it is trivial to
design an O(n) time algorithm. This hints that it might be possible to beat the n log n barrier when
k is small.

Slow algorithm. Let P be a set of n points in 2D and let k be the number of skyline points in P.
Consider the following simple algorithm. Let p_max be the point in P with the largest x-coordinate;
report p_max as a skyline point, and let D(p_max) ⊂ P be the points dominated by p_max.
Let P ← P \ ({p_max} ∪ D(p_max)). Repeat the above step recursively on P. The algorithm stops when
P becomes empty.
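
A minimal Python sketch of the slow algorithm follows, written as a loop rather than a recursion. It assumes distinct x-coordinates and distinct y-coordinates; the name `skyline_slow` is hypothetical.

```python
from typing import List, Sequence, Tuple

Point2D = Tuple[float, float]


def skyline_slow(points: Sequence[Point2D]) -> List[Point2D]:
    """Repeatedly extract the point with the largest x and discard what it dominates.

    Each round costs O(n) and reports one skyline point, giving O(nk) in total.
    Assumes distinct x-coordinates and distinct y-coordinates.
    """
    remaining = list(points)
    skyline = []
    while remaining:
        p_max = max(remaining, key=lambda p: p[0])
        skyline.append(p_max)
        # Keep only the points that are neither p_max itself nor dominated by it.
        remaining = [
            q for q in remaining
            if q != p_max and not (p_max[0] > q[0] and p_max[1] > q[1])
        ]
    return skyline
```

The number of rounds equals the number of skyline points, which is where the factor k in the running time comes from.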

Exercise. Argue that the above algorithm reports the k skyline points correctly. Also, prove that
the running time of the algorithm is O(nk).

2.4.1 Chan’s algorithm.


Now we will describe an interesting O(n log k) time algorithm designed by Timothy M. Chan (originally
for the convex hull problem). Let us first dive right into the details of the algorithm. Later we
will recap the key properties used in the algorithm.

Sub-problem. Given P and integer parameter k̂, design an O(n log k̂) time algorithm which does
the following:
• If k̂ ≥ k, then report the skyline points of P ,

• else if k̂ < k, then report failure.

The algorithm consists of the following three steps (See Figure 2.4):

1. Construct k̂ + 1 vertical lines ℓ_1, ℓ_2, ..., ℓ_{k̂+1}, such that

• +∞ = x(ℓ_1) > x(ℓ_2) > ... > x(ℓ_{k̂+1}) = −∞, where x(ℓ_i) is the x-coordinate of ℓ_i, and
• for all 1 ≤ i ≤ k̂, the number of points of P whose x-coordinate lies between x(ℓ_i) and
x(ℓ_{i+1}) is n/k̂.

For all 1 ≤ i ≤ k̂, define P_i ⊆ P to be the points lying in [x(ℓ_{i+1}), x(ℓ_i)]. Also, initialize S ← ∅
and y(p*) ← −∞.

2. For i = 1 to k̂,

• if |S| ≥ k̂ + 1, then report failure.

• Otherwise, let P_i′ ⊆ P_i be the points with y-coordinate greater than y(p*). Run the “slow
algorithm” on P_i′. Let S(P_i) be the skyline points reported. Set S ← S ∪ S(P_i). Define
y(p*) to be the y-coordinate of the point in S with the largest y-coordinate.

3. Report S.
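
The three steps above can be sketched in Python as follows. This is an illustration under assumptions that are not part of the notes: the slabs are built with a randomized quickselect (expected linear time per level) instead of the deterministic linear-time median algorithm, all coordinates are distinct, failure is signalled by returning `None`, and every function name is hypothetical. The failure test is placed exactly where Step 2 places it.

```python
import random
from typing import List, Optional, Sequence, Tuple

Point2D = Tuple[float, float]


def top_m_by_x(pts: List[Point2D], m: int) -> Tuple[List[Point2D], List[Point2D]]:
    """Split pts into (the m points with largest x, the rest) by quickselect."""
    if m <= 0:
        return [], list(pts)
    if m >= len(pts):
        return list(pts), []
    pivot = random.choice(pts)
    bigger = [p for p in pts if p[0] > pivot[0]]
    smaller = [p for p in pts if p[0] < pivot[0]]
    if len(bigger) >= m:
        hi, lo = top_m_by_x(bigger, m)
        return hi, lo + [pivot] + smaller
    hi, lo = top_m_by_x(smaller, m - len(bigger) - 1)
    return bigger + [pivot] + hi, lo


def split_slabs(pts: List[Point2D], parts: int) -> List[List[Point2D]]:
    """Step 1: cut pts into `parts` slabs of nearly equal size, largest x first."""
    if parts <= 1 or len(pts) <= 1:
        return [pts]
    half = parts // 2
    hi, lo = top_m_by_x(pts, len(pts) * half // parts)
    return split_slabs(hi, half) + split_slabs(lo, parts - half)


def skyline_slow(pts: Sequence[Point2D]) -> List[Point2D]:
    """The O(nk) slow algorithm; output is in decreasing order of x."""
    remaining, out = list(pts), []
    while remaining:
        p_max = max(remaining, key=lambda p: p[0])
        out.append(p_max)
        remaining = [q for q in remaining
                     if q != p_max and not (p_max[0] > q[0] and p_max[1] > q[1])]
    return out


def chan_subproblem(points: Sequence[Point2D], k_hat: int) -> Optional[List[Point2D]]:
    """Steps 2 and 3: sweep the slabs right to left; None means failure."""
    skyline: List[Point2D] = []
    y_star = float("-inf")
    for slab in split_slabs(list(points), k_hat):
        if len(skyline) >= k_hat + 1:
            return None  # report failure
        candidates = [p for p in slab if p[1] > y_star]
        skyline.extend(skyline_slow(candidates))
        if skyline:
            # Points are appended in decreasing x, hence increasing y.
            y_star = skyline[-1][1]
    return skyline
```

For example, on eight points that all lie on the skyline, `chan_subproblem(points, 2)` reports failure after the first slab, while `chan_subproblem(points, 8)` returns all eight points.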

Exercise. Prove that the above algorithm solves the sub-problem correctly.

[Figure 2.4: a point set cut by the vertical lines ℓ_5, ℓ_4, ℓ_3, ℓ_2, ℓ_1, drawn from left to right.]

Figure 2.4: Execution of the algorithm for k̂ = 4. In this example, it will report a failure.

Exercise. Construct the k̂+1 vertical lines in O(n log k̂) time. (Hint: recursively use the linear-time
median finding algorithm.)
Step (1) of the algorithm can be performed in O(n log k̂) time. Running the slow algorithm on
P_i′ takes O((n/k̂) × |S(P_i)|) time, since |P_i′| ≤ n/k̂. Summing over all the iterations, it takes

$$O\Big(\frac{n}{\hat{k}} \sum_{i=1}^{\hat{k}} |S(P_i)|\Big) = O\Big(\frac{n}{\hat{k}} \cdot \hat{k}\Big) = O(n)$$

time. Therefore, the overall running time of the sub-problem is O(n log k̂).

Guessing k̂. Now comes the mysterious part where we need to guess the value of k. A poor guess
would be to set k̂ to 1, 2, 3, ... until k̂ = k, and run the sub-problem for each guess. This would
lead to an overall running time of O(n Σ_{i=1}^{k} log i) = O(nk log k). Another poor choice would be to
do binary search: here the initial guess of k would be n/2, which implies that the running time of the
sub-problem on k̂ ← n/2 will be Ω(n log n), whereas it could happen that k ≪ n/2.
A decent choice would be the doubling trick, in which the algorithm sets k̂ ← 2^i in the i-th
iteration (i ≥ 0). Then the running time of the algorithm will be:

$$\sum_{i=0}^{\lceil \log k \rceil} O(n \log 2^{i}) = O\Big(n \sum_{i=0}^{\lceil \log k \rceil} i\Big) = O(n \log^2 k).$$

The cleverness of Chan’s algorithm lies in guessing k̂ ← 2^{2^i} in the i-th iteration. Then the running
time of the algorithm will be:

$$\sum_{i=0}^{\lceil \log\log k \rceil} O(n \log 2^{2^{i}}) = O\Big(n \sum_{i=0}^{\lceil \log\log k \rceil} 2^{i}\Big) = O(n \log k).$$
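
The guessing loop itself is short. The sketch below takes the sub-problem solver as a parameter and, purely for illustration, plugs in a stand-in solver based on the sort-and-scan algorithm of Section 2.2, which does not run in O(n log k̂) time. The names `skyline_by_guessing` and `solve_by_scan`, and the convention that a solver returns `None` on failure, are assumptions made here.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Point2D = Tuple[float, float]
Solver = Callable[[Sequence[Point2D], int], Optional[List[Point2D]]]


def skyline_by_guessing(points: Sequence[Point2D], solve: Solver) -> Tuple[List[Point2D], int]:
    """Try k_hat = 2^(2^i) for i = 0, 1, 2, ... until the sub-problem succeeds.

    Returns the skyline together with the guess that succeeded.
    """
    i = 0
    while True:
        k_hat = 2 ** (2 ** i)  # guesses 2, 4, 16, 256, ...
        result = solve(points, k_hat)
        if result is not None:
            return result, k_hat
        i += 1


def solve_by_scan(points: Sequence[Point2D], k_hat: int) -> Optional[List[Point2D]]:
    """Stand-in solver: sort-and-scan, failing when the skyline exceeds k_hat points."""
    skyline, y_max = [], float("-inf")
    for p in sorted(points, key=lambda pt: pt[0], reverse=True):
        if p[1] > y_max:
            skyline.append(p)
            y_max = p[1]
    return skyline if len(skyline) <= k_hat else None
```

With eight skyline points, the guesses 2 and 4 fail and the guess 16 succeeds, so three calls to the solver are made.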

Exercise. Prove that a more aggressive choice of guessing k̂ ← 2^{2^{2^i}} will not help! Specifically, prove
that the running time will be O(n log^2 k).

Comments. We conclude by making a few observations about the algorithm. Notice that unlike
the O(n log n) time algorithms, here the algorithm cannot afford to sort the input points. Step (1) of
the algorithm tackles this issue by doing “partial sorting”. For a given Pi , the points within Pi are
not sorted among themselves. However, given i < j, all the points in Pi have a larger x-coordinate
value than any point in Pj . Secondly, the algorithm nicely combines the sweepline approach of the
O(n log n) time algorithm with the “slow algorithm”. Unlike the traditional sweepline algorithms, in
this algorithm, the sweepline sweeps n/k̂ points at a time and runs the slow algorithm on the points
swept at that time. Thirdly, this algorithm inherently requires knowing the value of k. However, we
do not know it upfront. Chan’s guessing technique hits the “sweet spot”: it leads to a geometric
series, where the largest term dominates the summation of the remaining terms.

2.5 Exercises
1 (Divide and conquer) We will try to come up with a divide and conquer algorithm to compute
the skyline points in 3D. Let P be a set of n points in 3D. The algorithm computes a median plane
(x = x_med) such that there is an equal number of points of P on either side of the median plane; call
these sets P_ℓ and P_r, respectively. Recursively, compute the skyline points of P_ℓ and P_r.

• Let S_ℓ and S_r be the skyline points of P_ℓ and P_r, respectively. Design a merge procedure which
takes as input S_ℓ and S_r, and in O(n log n) time computes the skyline points of S_ℓ ∪ S_r.

• Prove that the set of skyline points of S_ℓ ∪ S_r is equal to the set of skyline points of P.

• What is the overall running time of such an algorithm?

2. Now improve the running time of the above algorithm to O(n log n). Hint 1: Merge operation
should happen in O(n) time. Hint 2: Before the algorithm starts, pre-sort P based on their z-
coordinate values. Hint 3: Maintain the invariant that the skyline points of any subproblem (in the
divide and conquer algorithm) are reported in sorted order based on their z-coordinate values.

3 (Output sensitive algorithm-I) Let P be a set of n points lying in a d-dimensional space,
where d is a constant independent of n such as 5 or 10. Design an output-sensitive algorithm for
computing skyline points of P in O(nh) time, where h is the number of skyline points.

4 (Output sensitive algorithm-II) Consider the following algorithm to compute the skyline
points in 2D.
skyline(P):
1. if |P| = 1, then return P.
2. divide P into the left and right halves P_ℓ and P_r by the median x-coordinate.
3. discover the point p with the maximum y-coordinate in P_r.
4. prune all points from P_ℓ that are dominated by p.
5. return the concatenation of skyline(P_ℓ) and skyline(P_r).

Prove that the above algorithm runs in O(n + n log h) time, where h is the number of skyline
points in P . (Hint: Define T (n, h) to be the running time of the algorithm on an input of size n
and output size h. Write a recurrence in terms of T (n, h) and establish an important base case of
T (n, 1) ≤ cn, where c is a constant independent of n.)
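
To experiment with this procedure before analysing it, here is a small Python sketch. It departs from the exercise in two labeled ways: the median is found with `statistics.median_low`, which sorts, so the sketch does not meet the O(n + n log h) bound (a linear-time selection routine would be needed for that), and an empty input is also treated as a base case because pruning can empty the left half. Distinct coordinates are assumed and the name `skyline_prune` is hypothetical.

```python
import statistics
from typing import List, Sequence, Tuple

Point2D = Tuple[float, float]


def skyline_prune(points: Sequence[Point2D]) -> List[Point2D]:
    """Divide by the median x, prune the left half, and recurse on both halves.

    Assumes distinct x-coordinates and distinct y-coordinates.
    """
    if len(points) <= 1:
        return list(points)
    x_med = statistics.median_low(p[0] for p in points)  # simplification: sorts
    left = [p for p in points if p[0] <= x_med]
    right = [p for p in points if p[0] > x_med]
    p_top = max(right, key=lambda p: p[1])  # highest point of the right half
    # Every left point has a smaller x than p_top, so only y needs checking.
    left = [q for q in left if q[1] > p_top[1]]
    return skyline_prune(left) + skyline_prune(right)
```

The output lists the skyline points in increasing order of x.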

5 (Output sensitive algorithm-III) In this problem, the goal is to design a fast output-sensitive
algorithm in 3D. Specifically, the goal is to design an O(n log k) time algorithm, where k is the size of
the output. To solve this problem, first read this article
(https://www.cse.cuhk.edu.hk/~taoyf/course/5010/notes/maxima.pdf), which presents an alternate method to
design an O(n log k) time algorithm in 2D. The idea of guessing the output size is extremely clever! Next,
adapt this algorithm for 3D.

6 (Skyline in R^d) Let T(n, d) be the time taken to compute skyline points in R^d. Then design
an O(T(n, d) · log n) time algorithm to compute skyline points in R^{d+1}. (Hint: One approach is via
range trees.)

7 (ε-skyline in 2D) In many instances the number of skyline points reported could be very large,
which can make it hard for the user to get an “informative summary” of the underlying data.
Therefore, in some scenarios, it would be ideal to guarantee a small output size to the user. This
motivates the study of ε-skyline queries. Here we will relax the notion of domination and tolerate a
buffer in each dimension while checking for domination.
Formally, let P lie inside the unit cube [0, 1]^d. A point p ε-dominates another point q if
p_i + ε > q_i for all i ∈ [1, d], where ε is a fixed real number in the range (0, 1). An ε-skyline is a set
S ⊆ P such that every point in P is ε-dominated by at least one point of S.

• Construct an example where the size of the ε-skyline is almost 1/ε.

• Construct an example where the size of the skyline is large, but the size of ε-skyline is O(1)
(independent of n and ε).

[Figure 2.5 plot: the four points p1, p2, p3 and p4.]

Figure 2.5: An input consisting of four points p1 = (0.1, 0.9), p2 = (0.15, 0.81), p3 = (0.8, 0.3), and
p4 = (0.85, 0.21). The value of ε = 0.1. The skyline would consist of {p1 , p2 , p3 , p4 }. However, one
possible ε-skyline would be {p1 , p3 }. The other possible ε-skyline would be {p2 , p4 }.
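
The example of Figure 2.5 can be checked mechanically. The sketch below reads an ε-skyline as a subset of P whose points together ε-dominate every point of P, which is an interpretation of the definition above; the names `eps_dominates` and `is_eps_skyline` are hypothetical.

```python
from typing import Sequence, Tuple

Point = Tuple[float, ...]


def eps_dominates(p: Point, q: Point, eps: float) -> bool:
    """True if p_i + eps > q_i in every coordinate i."""
    return all(pi + eps > qi for pi, qi in zip(p, q))


def is_eps_skyline(subset: Sequence[Point], points: Sequence[Point], eps: float) -> bool:
    """True if every point is eps-dominated by at least one point of subset."""
    return all(any(eps_dominates(s, q, eps) for s in subset) for q in points)


# The four points of Figure 2.5 with eps = 0.1.
p1, p2, p3, p4 = (0.1, 0.9), (0.15, 0.81), (0.8, 0.3), (0.85, 0.21)
P = [p1, p2, p3, p4]
```

Calling `is_eps_skyline([p1, p3], P, 0.1)` and `is_eps_skyline([p2, p4], P, 0.1)` confirms the two ε-skylines named in the caption, whereas a single point such as `[p1]` is not enough.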

• Prove that in 2D one can construct an ε-skyline of size O(1/ε).

• Describe an algorithm to compute the ε-skyline in O(n log(1/ε)) time.

2.6 Open problems


(ε-skyline in higher dimensions) The size of the ε-skyline in R^d (d ≥ 3) is not known. Is the
size of the ε-skyline O(1/ε) in any R^d? Assume d is a constant and the dependency on d is hidden
in the big-Oh. Or, is there an input instance on which it can be shown that the size of any ε-skyline
is ω(1/ε)?

TODO: Afshani’s zig-zag sweep?, Instance-optimal algorithms in 2d and 3d, Open problems.
