
Connecting Discrete Mathematics

and Computer Science

Solution Manual

This version: January 21, 2022

David Liben-Nowell
2 Basic Data Types

2.2 Booleans, Numbers, and Arithmetic


2.1 112 and 201

2.2 111 and 201

2.3 18 and 39

2.4 17 and 38

2.5 Yes: there’s no fractional part in either x or y, so their sum has no fractional part either.

2.6 Yes: if we can write x = a/b and y = c/d with b ≠ 0 and d ≠ 0, then we can write x + y as (ad + cb)/(bd) (and bd ≠ 0 because neither b = 0 nor d = 0).

2.7 No, and here’s a counterexample: π and 1 − π are both irrational numbers, but π + (1 − π) = 1, which is rational.

2.8 6, because ⌊2.5⌋ = 2 and ⌈3.75⌉ = 4 (and 2 + 4 = 6).

2.9 3, because ⌊3.14159⌋ = 3 and ⌈0.87853⌉ = 1 (and 3 · 1 = 3).

2.10 3^4 = 3 · 3 · 3 · 3 = 81, because ⌊3.14159⌋ = 3 and ⌈3.14159⌉ = 4.

2.11 The functions floor and truncate differ on negative numbers. For any real number x, we have ⌊x⌋ ≤ x—even if x is negative. That’s not true for truncation. For example, ⌊−3.14159⌋ = −4 but trunc(−3.14159) = −3.
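This difference is easy to observe directly; a quick sketch using Python's standard math module:

```python
import math

# floor always rounds toward negative infinity; trunc drops the
# fractional part, rounding toward zero.
print(math.floor(-3.14159))  # -4
print(math.trunc(-3.14159))  # -3

# For nonnegative x the two agree:
print(math.floor(3.14159), math.trunc(3.14159))  # 3 3
```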

2.12 ⌊x + 0.5⌋

2.13 0.1 · ⌊10x + 0.5⌋


2.14 10^(−k) · ⌊10^k · x + 0.5⌋

2.15 10^(−k) · ⌊10^k · x⌋
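The floor-based rounding formulas of Exercises 2.12–2.15 translate directly into code; a sketch in Python (the function names are ours, not the book's):

```python
import math

def round_to_int(x):
    """Round x to the nearest integer, with halves rounding up (Exercise 2.12)."""
    return math.floor(x + 0.5)

def round_to_k_places(x, k):
    """Round x to k decimal places via the formula 10^-k * floor(10^k * x + 0.5)."""
    return 10**-k * math.floor(10**k * x + 0.5)

print(round_to_int(2.5))              # 3
print(round_to_k_places(3.14159, 2))  # approximately 3.14
```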

2.16 If x is an integer, then ⌊x⌋ + ⌈x⌉ = 2x; if x isn’t an integer, then for any x in [2, 3] we have ⌊x⌋ + ⌈x⌉ = 2 + 3 = 5. Thus the expression x − (⌊x⌋ + ⌈x⌉)/2 is equivalent to x − x = 0 for x = 2 or 3 (yielding values 0 and 0), and to x − 2.5 for noninteger x. So the largest possible value for this expression is 0.4999999 · · · , when x = 2.9999999 · · · .

2.17 By the same logic as in Exercise 2.16, the smallest value of the expression x − (⌊x⌋ + ⌈x⌉)/2 is −0.5 + ϵ, which occurs for x = 2 + ϵ, for arbitrarily small values of ϵ. For example, the value is −0.49999999 when x = 2.00000001.

2.18 ⌊x⌋

2.19 ⌈x⌉


2.20 ⌈x⌉

2.21 ⌊x⌋

2.22 No: for example, |⌊−3.5⌋| = |−4| = 4, but ⌊|−3.5|⌋ = ⌊3.5⌋ = 3.

2.23 Yes: we have that x − ⌊x⌋ = (x + 1) − ⌊x + 1⌋ because both x and x + 1 have the same fractional part (so both x − ⌊x⌋
and (x + 1) − ⌊x + 1⌋ denote the fractional part of x).

2.24 No: if x = y = 0.5, then ⌊x⌋ + ⌊y⌋ = ⌊0.5⌋ + ⌊0.5⌋ = 0 + 0 = 0, but ⌊x + y⌋ = ⌊0.5 + 0.5⌋ = ⌊1⌋ = 1.

2.25 The key observation is that ⌈x⌉ = ⌊x⌋ when x is an integer (and otherwise ⌈x⌉ = ⌊x⌋ + 1). Thus 1 + ⌊x⌋ − ⌈x⌉ is 1
when x is an integer, and it’s 0 when x is not an integer.

2.26 Observe that ⌊(n + 1)/2⌋ = ⌈n/2⌉: it’s true when n is even, and it’s true when n is odd. Thus there are ⌈n/2⌉ − 1 elements less than the specified entry, and n − (⌈n/2⌉ − 1 + 1) = n − ⌈n/2⌉ = ⌊n/2⌋ elements larger.
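The identity ⌊(n + 1)/2⌋ = ⌈n/2⌉ used here is easy to spot-check (a sketch in Python):

```python
import math

# Check floor((n+1)/2) == ceil(n/2) for a range of small n.
for n in range(1, 1000):
    assert math.floor((n + 1) / 2) == math.ceil(n / 2)

# For n = 7 (odd) and n = 8 (even), both sides give 4:
print(math.ceil(7 / 2), math.ceil(8 / 2))  # 4 4
```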

2.27 3^10 is bigger: 3^10 > 3^9 = (3^3)^3 = 27^3, which is definitely (much) larger than 10^3.

2.28 4^8 = (2^2)^8 = 2^16 = 65,536

2.29 1/2^16 = 1/65,536, which is (by calculator) 0.00001525878.

2.30 65,536

2.31 −4 · 65,536 = −262,144

2.32 4, because 4^4 = 256.


2.33 8^(1/4) ≈ 1.6818, because 1.6818^4 ≈ 8.0001. (Alternatively, 8^(1/4) = (2^3)^(1/4) = 2^(3/4).)

2.34 512^(1/4) ≈ 4.7568, because 4.7568^4 ≈ 511.9877. (Alternatively, 512^(1/4) = (8^3)^(1/4) = (2^9)^(1/4) = 2^(9/4).)

2.35 Undefined (because we don’t include imaginary numbers in this book): there’s no real number x for which x^4 = −9.

2.36 log_2 8 = 3, because 2^3 = 8.

2.37 log_2 (1/8) = −3, because 2^(−3) = 1/2^3 = 1/8.

2.38 log_8 2 = 1/3, because 8^(1/3) = 2 (which is true because 2^3 = 8).

2.39 log_{1/8} 2 = −1/3, because (1/8)^(−1/3) = 8^(1/3) = 2.

2.40 log_10 17 is larger than 1 (because 10^1 < 17), but log_17 10 is smaller than 1 (because 17^1 > 10). So log_10 17 is larger.

2.41 By the definition of logarithms, log_b 1 is the real number y such that b^y = 1. By Theorem 2.8.1, for any real number b we have b^0 = 1, so log_b 1 = 0.

2.42 By the definition of logarithms, log_b b is the real number y such that b^y = b. By Theorem 2.8.2, for any real number b we have b^1 = b, so log_b b = 1.

2.43 Let q = log_b x and r = log_b y. By the definition of logarithms, then, b^q = x and b^r = y. Using Theorem 2.8.3, we have x · y = b^q · b^r = b^(q+r). Because xy = b^(q+r), we have by the definition of logs that log_b xy = q + r.

2.44 Let q = log_b x. By the definition of logarithms, we have b^q = x. Thus, raising both sides to the yth power, we have (b^q)^y = x^y. We can rewrite (b^q)^y = b^(qy) using Theorem 2.8.4, so x^y = b^(qy). Again using the definition of logarithms, therefore log_b x^y = qy. By the definition of q, this value is y · log_b x.

2.45 Let q = log_c b and r = log_b x, so that c^q = b and b^r = x. Then x = (c^q)^r = c^(qr) by the definition of logs and Theorem 2.8.4, and therefore log_c x = qr = (log_c b) · (log_b x); rearranging yields the claim.
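The three identities proved in Exercises 2.43–2.45 can also be spot-checked numerically; a sketch using Python's math.log (the sample values below are arbitrary):

```python
import math

b, c, x, y = 2.0, 7.0, 10.0, 3.0  # arbitrary sample bases and arguments

# Exercise 2.43: log_b(x * y) = log_b(x) + log_b(y)
assert math.isclose(math.log(x * y, b), math.log(x, b) + math.log(y, b))

# Exercise 2.44: log_b(x ** y) = y * log_b(x)
assert math.isclose(math.log(x**y, b), y * math.log(x, b))

# Exercise 2.45 (change of base): log_c(x) = log_c(b) * log_b(x)
assert math.isclose(math.log(x, c), math.log(b, c) * math.log(x, b))

print("all three log identities check out numerically")
```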

2.46 We show that b^(log_b x) = x by showing that log_b of the two sides are equal:

    log_b (b^(log_b x)) = (log_b x) · log_b b    by Theorem 2.10.5
                        = (log_b x) · 1          by Theorem 2.10.2
                        = log_b x.

2.47 We show that n^(log_b a) = a^(log_b n) by showing that log_b of the two sides are equal:

    log_b (n^(log_b a)) = (log_b a) · log_b n      by Theorem 2.10.5
                        = (log_b n) · log_b a      x · y = y · x for any x and y
                        = log_b (a^(log_b n)).     by Theorem 2.10.5, applied “backward”

2.48 The property log_b (x/y) = log_b x − log_b y follows directly from the rule for the log of a product:

    log_b (x/y) = log_b (x · y^(−1))            x/y = x · y^(−1) for any x and y
                = log_b x + log_b y^(−1)        by Theorem 2.10.3
                = log_b x + (−1) · log_b y.     by Theorem 2.10.5

2.49 The hyperceiling of n can be defined as ⌈n⌉ = 2^(⌈log_2 n⌉).

2.50 The number of columns needed to write down n in standard decimal notation is

    ⌈log_10 (n + 1)⌉         if n > 0
    1                        if n = 0
    1 + ⌈log_10 (−n + 1)⌉    if n < 0,

where the “1 +” in the last case accounts for the negative sign, which takes up a column.

2.51 202 mod 2 = 0, because 202 = 0 + 101 · 2.

2.52 202 mod 3 = 1, because 202 = 1 + (67 · 3).

2.53 202 mod 10 = 2, because 202 = 2 + (20 · 10).

2.54 −202 mod 10 = 8, because −202 = 8 + (−21 · 10).

2.55 17 mod 42 = 17, because 17 = 17 + (0 · 42).

2.56 42 mod 17 = 8, because 42 = 8 + (2 · 17).

2.57 17 mod 17 = 0, because 17 = 0 + (1 · 17).

2.58 −42 mod 17 = 9, because −42 = 9 + (−3 · 17).

2.59 −42 mod 42 = 0, because −42 = 0 + (−1 · 42).

2.60 Here is a definition of the behavior of % in Python that’s consistent with the reported behavior:

• If k > 0, then n mod k is n − k · ⌊n/k⌋.
• If k < 0, then n mod k is −((−n) mod (−k)) = n − k · ⌊n/k⌋.

(And Python generates a ZeroDivisionError if k = 0.)
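A quick way to sanity-check this definition is to compare it against Python's actual % operator (a sketch):

```python
import math

# Python's n % k equals n - k*floor(n/k) for every nonzero k, which is
# why the result always takes the sign of k.
for n in [-42, -17, 0, 17, 42, 202]:
    for k in [-17, -10, -2, 2, 10, 17]:
        assert n % k == n - k * math.floor(n / k)

print(-202 % 10)  # 8, matching Exercise 2.54
print(-42 % 17)   # 9, matching Exercise 2.58
```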

2.61 30

2.62 1

2.63 10

2.64 68

2.65 53

2.66 A solution in Python is shown in Figure S.2.1 on p. 7.

2.67 A solution in Python is shown in Figure S.2.2 on p. 7.

2.68 A solution in Python is shown in Figure S.2.3 on p. 7. The numbers produced as output are: 4, 9, 25, 49, 121, 169, 289, 361, 529, 841, and 961. Note that these output values are 2^2, 3^2, 5^2, 7^2, 11^2, 13^2, . . . , 31^2—the squares of all sufficiently small prime numbers. For a prime number p, the three factors of p^2 are 1, p, and p^2 (and p^2 has no other factor).

2.69 6 + 6 + 6 + 6 + 6 + 6 = 36

2.70 1 + 4 + 9 + 16 + 25 + 36 = 91

2.71 4 + 16 + 64 + 256 + 1024 + 4096 = 5460

2.72 1(2) + 2(4) + 3(8) + 4(16) + 5(32) + 6(64) = 642

2.73 (1 + 2) + (2 + 4) + (3 + 8) + (4 + 16) + (5 + 32) + (6 + 64) = 147

2.74 6 · 6 · 6 · 6 · 6 · 6 = 46,656

2.75 1 · 4 · 9 · 16 · 25 · 36 = 518,400

2.76 2^2 · 2^4 · 2^6 · 2^8 · 2^10 · 2^12 = 2^42 = 4,398,046,511,104

2.77 1(2) · 2(4) · 3(8) · 4(16) · 5(32) · 6(64) = 1,509,949,440

2.78 (1 + 2) · (2 + 4) · (3 + 8) · (4 + 16) · (5 + 32) · (6 + 64) = 10,256,400

2.79 21 + 42 + 63 + 84 + 105 + 126 = 441

2.80 21 + 40 + 54 + 60 + 55 + 36 = 266

2.81 1 + 6 + 18 + 40 + 75 + 126 = 266

2.82 8 + 14 + 18 + 20 + 20 + 18 + 14 + 8 = 120

2.83 36 + 35 + 33 + 30 + 26 + 21 + 15 + 8 = 204

2.84 44 + 49 + 51 + 50 + 46 + 39 + 29 + 16 = 324
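Sums and products like these are quick to verify with Python comprehensions (a sketch; math.prod is in the standard library from Python 3.8):

```python
import math

# Exercise 2.70: 1 + 4 + 9 + 16 + 25 + 36
assert sum(i**2 for i in range(1, 7)) == 91

# Exercise 2.74: 6 * 6 * 6 * 6 * 6 * 6
assert math.prod(6 for _ in range(6)) == 46656

# Exercise 2.79: 21 + 42 + 63 + 84 + 105 + 126
assert sum(21 * i for i in range(1, 7)) == 441

print("all three check out")
```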

# Determine whether a given positive integer n is prime by testing all possible divisors
# between 2 and n-1. Use your program to find all prime numbers less than 202.

def isPrime(n):
    for d in range(2, n):
        if n % d == 0:
            return False
    return True

for n in range(2, 202):
    if isPrime(n):
        print(n)

# When executed, the outputted numbers are:
# 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 101 103
# 107 109 113 127 131 137 139 149 151 157 163 167 173 179 181 191 193 197 199

Figure S.2.1 A Python program testing whether a given positive integer n is prime.

# A perfect number is a positive integer n that has the following property:
# n is equal to the sum of all positive integers k < n that evenly divide n.
# Write a program that finds the four smallest perfect numbers.

def perfect(n):
    '''Is n a perfect number?'''
    factorSum = 0
    for d in range(1, n):
        if n % d == 0:
            factorSum += d
    return factorSum == n

perfect_numbers = []
n = 1
while len(perfect_numbers) < 4:
    if perfect(n):
        perfect_numbers.append(n)
    n += 1
print(perfect_numbers)

# Output: [6, 28, 496, 8128]

Figure S.2.2 A Python program testing whether a given positive integer n is a perfect number.

# Find all integers between 1 and 1000 that are evenly divisible by *exactly three* integers.

def threeFactors(n):
    factorCount = 0
    for d in range(1, n + 1):
        if n % d == 0:
            factorCount += 1
    return factorCount == 3

for n in range(1001):
    if threeFactors(n):
        print(n)

Figure S.2.3 A Python program to find all integers between 1 and 1000 that are evenly divisible by exactly three
integers.

2.85 (1^1 + 2^1 + 3^1 + 4^1) + (1^2 + 2^2 + 3^2 + 4^2) + (1^3 + 2^3 + 3^3 + 4^3) + (1^4 + 2^4 + 3^4 + 4^4), which is equal to

(1 + 2 + 3 + 4) + (1 + 4 + 9 + 16) + (1 + 8 + 27 + 64) + (1 + 16 + 81 + 256),

or 494.

2.3 Sets: Unordered Collections


2.86 Yes, 6 ∈ {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, a, b, c, d, e, f}.

2.87 No, h ∉ {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, a, b, c, d, e, f}.

2.88 No, the only elements of H are single characters, so a70e ∉ H.

2.89 |H| = 16

2.90 For the set S = {0 + 0, 0 + 1, 1 + 0, 1 + 1, 0 · 0, 0 · 1, 1 · 0, 1 · 1}:


• 0 + 0 and 0 · 0 and 0 · 1 and 1 · 0 are all equal to 0.
• 0 + 1 and 1 + 0 and 1 · 1 are all equal to 1.
• 1 + 1 is equal to 2.
So 0, 1, and 2 are elements of S, but 3 is not.

2.91 |S| = 3: although there are eight different descriptions of elements in Exercise 2.90, only three distinct values result
and so there are exactly three elements in the set.

2.92 4 is in H but not in T, because 4 mod 2 = 0 but 4 mod 3 ̸= 0. Other elements in H but not T are 2, 3, 5, 8, 9, a, b, c,
d, e, and f.

2.93 13 is an element of T, because 13 mod 2 = 1 = 13 mod 3. Other elements in T but not H are 12, 18, and 19.

2.94 The elements of T that are not in S are: 6, 7, 12, 13, 18, and 19.

2.95 The only element of S that is not in T is 2.

2.96 |T| = 8

2.97 {0, 1, 2, 3, 4}

2.98 {}

2.99 {4, 5, 9}

2.100 {0, 1, 3, 4, 5, 7, 8, 9}

2.101 {1, 3, 7, 8}

2.102 {0}

2.103 {1, 2, 3, 6, 7, 8}

2.104 {1, 2, 3, 4, 5, 7, 8, 9}

2.105 {4, 5}

2.106 C − ∼C is just C itself: {0, 3, 6, 9}.

2.107 C − ∼A is {3, 9}, so ∼(C − ∼A) is {0, 1, 2, 4, 5, 6, 7, 8}.

2.108 Yes, it’s possible: if E = Y, then we have E − Y = Y − E = ∅. (But if E ≠ Y, then there’s some element x in one set but not the other—and if x ∈ E but x ∉ Y, then x ∈ E − Y but x ∉ Y − E, so E − Y ≠ Y − E. A similar situation occurs if there’s a y ∈ Y but y ∉ E.)

2.109 D ∪ E may be a subset of D: for D1 = E1 = {1}, we have D1 ∪ E1 = {1} ⊆ D1; for D2 = ∅ and E2 = {1}, we have D2 ∪ E2 = {1} ⊄ D2.

2.110 D ∩ E must be a subset of D: every element of D ∩ E is by definition an element of both D and E, which means that
any x ∈ D ∩ E must by definition satisfy x ∈ D.

2.111 D − E must be a subset of D: every element of D − E is by definition an element of D (but not of E), which means
that any x ∈ D − E must by definition satisfy x ∈ D.

2.112 E − D contains no elements in D, so no element x ∈ E − D satisfies x ∈ D—but if E − D is empty, then E − D is a subset of every set! For D1 = E1 = {1}, we have E1 − D1 = {} ⊆ D1; for D2 = ∅ and E2 = {1}, we have E2 − D2 = {1} ⊄ D2. Thus E − D may be a subset of D.

2.113 ∼D contains no elements in D, so no element x ∈ ∼D satisfies x ∈ D—but if ∼D is empty, then ∼D is a subset of every set, including D! For D1 = U (where U is the universe), we have ∼D1 = {} ⊆ D1; for D2 = ∅, we have ∼D2 = U ⊄ D2. Thus ∼D may be a subset of D.

2.114 Not disjoint: 1 ∈ F and 1 ∈ G.

2.115 Not disjoint: 3 ∈ ∼F and 3 ∈ G.

2.116 Disjoint: F ∩ G = {1} and 1 ∉ H.

2.117 Disjoint: by definition, no element is in both a set and the complement of that set.

2.118 S ∪ T is smallest when one set is a subset of the other. Specifically, if S = {1, 2, . . . , n} and T = {1, 2, . . . , m},
then S ∪ T = {1, 2, . . . , max(n, m)} and has cardinality max(n, m).

2.119 S ∩ T is smallest when the two sets are disjoint. Specifically, if S = {1, 2, . . . , n} and T = {−1, −2, . . . , −m}, then
S ∩ T = ∅ and has cardinality 0.

2.120 S − T is smallest when as many elements of S as possible are also in T. Specifically, if S = {1, 2, . . . , n} and
T = {1, 2, . . . , m}, then S − T = {m + 1, m + 2, . . . , n} if n ≥ m + 1, and otherwise has cardinality 0.

2.121 S∪T is largest when the two sets are disjoint. Specifically, if S = {1, 2, . . . , n} and T = {n + 1, n + 2, . . . , n + m},
then S ∪ T = {1, 2, . . . , n + m} and has cardinality n + m.

2.122 S ∩ T is largest when one set is a subset of the other. Specifically, if S = {1, 2, . . . , n} and T = {1, 2, . . . , m}, then
S ∩ T = {1, 2, . . . , min(n, m)} and has cardinality min(n, m).

2.123 S − T is largest when no element of T is also in S, so that S − T = S. Specifically, if S = {1, 2, . . . , n} and T = {−1, −2, . . . , −m}, then S − T = {1, 2, . . . , n} and has cardinality n.

2.124 |A ∩ B| = 2, |A ∩ C| = 1, and |B ∩ C| = 1.

2.125 We have |A ∩ B| = 2, |A ∩ C| = 1, and |B ∩ C| = 1; and furthermore |A ∪ B| = 5, |A ∪ C| = 3, and |B ∪ C| = 4. Thus the Jaccard similarity of A and B is 2/5 = 0.4; the Jaccard similarity of A and C is 1/3 ≈ 0.33; and the Jaccard similarity of B and C is 1/4 = 0.25.
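All of these Jaccard computations follow one pattern: the size of the intersection divided by the size of the union. Here is a sketch of a reusable helper in Python (the sets below are hypothetical stand-ins chosen to match the cardinalities reported here, not the book's actual sets; the empty-set convention is our choice):

```python
def jaccard(s, t):
    """Jaccard similarity |s & t| / |s | t| of two sets."""
    if not s and not t:
        return 1.0  # convention chosen here: two empty sets count as identical
    return len(s & t) / len(s | t)

A, B = {1, 2, 3}, {2, 3, 4, 5}  # hypothetical sets with |A & B| = 2, |A | B| = 5
print(jaccard(A, B))  # 0.4
```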

2.126 No. If A = {1} and B = {1, 2, 3} and C = {2, 3, 4}, then A’s closest set is B but B’s closest set is C, not A.

2.127 Still no! If A = {1} and B = {1, 2, 3} and C = {2, 3, 4}, then A’s closest set is B (a Jaccard coefficient of 1/3 for B, versus 0/4 for C) but B’s closest set is C (a Jaccard coefficient of 2/4), not A (a Jaccard coefficient of 1/3).

2.128 True. Looking at Venn diagrams for ∼A and ∼B [figure omitted: ∼A unioned with ∼B], we see that their union includes every element not in A ∩ B. Thus ∼A ∪ ∼B contains precisely those elements not in A ∩ B, and therefore A ∩ B = ∼(∼A ∪ ∼B).

2.129 True. Looking at Venn diagrams for ∼A and ∼B [figure omitted: ∼A intersected with ∼B], we see that their intersection includes every element not in A ∪ B. Thus ∼A ∩ ∼B contains precisely those elements not in A ∪ B, and therefore A ∪ B = ∼(∼A ∩ ∼B).

2.130 No, (A − B) ∪ (B − C) and (A ∪ B) − C are not the same. The first set includes elements of A ∩ ∼B ∩ C; the latter doesn’t. [Venn-diagram figures for (A − B) ∪ (B − C) and (A ∪ B) − C omitted.]

For example, suppose A = C = {1} and B = ∅. Then (A − B) ∪ (B − C) = ({1} − ∅) ∪ (∅ − {1}) = {1} − ∅ = {1}, but (A ∪ B) − C = ({1} ∪ ∅) − {1} = {1} − {1} = ∅.

2.131 Yes, (B − A) ∩ (C − A) and (B ∩ C) − A are the same. The first set contains any element that’s in B but not A and also in C but not A—in other words, an element in B and C but not A. That’s just the definition of (B ∩ C) − A. [Venn-diagram figure omitted.]

2.132 There are five ways of partitioning {1, 2, 3}:


• All separate: {{1} , {2} , {3}}.
• 1 alone; 2 and 3 together: {{1} , {2, 3}}.

• 2 alone; 1 and 3 together: {{2} , {1, 3}}.


• 3 alone; 1 and 2 together: {{3} , {1, 2}}.
• All together: {{1, 2, 3}}.

2.133 A partition that does it is


{{Alice, Bob, Desdemona} , {Charlie} , {Eve, Frank}} .
The ABD set has intracluster distances {0.0, 1.7, 0.8, 1.1}; the EF set has intracluster distances {0.0, 1.9}; and the only
intracluster distance in the C set is 0.0. The largest of these distances is less than 2.0.

2.134 Just put each person in their own subset. The only way x ∈ Si and y ∈ Si is when x = y, so the intracluster distance
is 0.0.

2.135 Again, just put all people in their own subset. The intercluster distance is the largest entry in the table, namely 7.8.

2.136 No, it’s not a partition of S, because the sets are not disjoint. For example, 6 ∈ W and 6 ∈ H.

2.137 {{} , {1} , {a} , {1, a}}

2.138 {{} , {1}}

2.139 {{}}

2.140 P(P({1})) = P({{} , {1}}). This set is the power set of a 2-element set, and so it contains four elements: the empty set, the two singleton sets, and the set containing both elements:

    { {}, {{}}, {{1}}, {{}, {1}} }.
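Power sets like those in Exercises 2.137–2.140 can be generated mechanically; a sketch using itertools, with frozensets standing in for the inner sets (Python sets can't contain ordinary mutable sets):

```python
from itertools import chain, combinations

def powerset(s):
    """All subsets of s, returned as a set of frozensets."""
    items = list(s)
    return {frozenset(c)
            for c in chain.from_iterable(combinations(items, r)
                                         for r in range(len(items) + 1))}

print(powerset({1, 'a'}))       # the 4 subsets of {1, a}, as in Exercise 2.137
print(powerset(frozenset()))    # the 1 subset of {}, as in Exercise 2.139
```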

2.4 Sequences: Ordered Collections


2.141 {⟨1, 1⟩, ⟨1, 4⟩, ⟨1, 16⟩, ⟨2, 1⟩, ⟨2, 4⟩, ⟨2, 16⟩, ⟨3, 1⟩, ⟨3, 4⟩, ⟨3, 16⟩}

2.142 {⟨1, 1⟩, ⟨1, 2⟩, ⟨1, 3⟩, ⟨4, 1⟩, ⟨4, 2⟩, ⟨4, 3⟩, ⟨16, 1⟩, ⟨16, 2⟩, ⟨16, 3⟩}

2.143 {⟨1, 1, 1⟩}

2.144 {⟨1, 2, 1⟩, ⟨1, 2, 4⟩, ⟨1, 2, 16⟩, ⟨1, 3, 1⟩, ⟨1, 3, 4⟩, ⟨1, 3, 16⟩, ⟨2, 2, 1⟩, ⟨2, 2, 4⟩, ⟨2, 2, 16⟩, ⟨2, 3, 1⟩, ⟨2, 3, 4⟩,
⟨2, 3, 16⟩}

2.145 A = {1, 2} and B = {1}.

2.146 T = {2, 4}. Because we are told that ⟨?, 2⟩, ⟨?, 4⟩ ∈ S × T (for some values of ?), we know that 2 ∈ T and 4 ∈ T.
And because, for S = {1, 2, 3, 4, 5, 6, 7, 8}, the set S × {2, 4} contains 16 elements already, there can’t be any other
elements in T.

2.147 T = ∅. If there were any element t ∈ T, then ⟨1, t⟩ ∈ S × T—but we’re told that S × T is empty, so there can’t be
any t ∈ T.

2.148 We can conclude that 3 ∈ T and that 1, 2, 4, 5, 6, 7, 8 ∉ T (but we cannot conclude anything about any other possible elements in T). For an element ⟨x, x⟩ to be in S × T, we need x ∈ S ∩ T; for ⟨x, x⟩ not to be in S × T, we need x ∉ S ∩ T. Thus 3 ∈ T, and 1, 2, 4, 5, 6, 7, 8 ∉ T. But T can contain any other element not in S: for example, if T = {3, 9} then

S × T = {⟨1, 3⟩, . . . , ⟨8, 3⟩, ⟨1, 9⟩, . . . , ⟨8, 9⟩} and T × S = {⟨3, 1⟩, . . . , ⟨3, 8⟩, ⟨9, 1⟩, . . . , ⟨9, 8⟩}. The only element
that appears in both of these sets is ⟨3, 3⟩.

2.149 Every pair in S × T must also be in T × S, and vice versa. This situation can happen in two ways. First, we could
have T = S = {1, 2, . . . , 8}, so that S × T = T × S = S × S. Second, we could have T = ∅, because S × ∅ = ∅ × S = ∅.
Those are the only two possibilities, so either T = S or T = ∅.

2.150 {a, h} × {1}

2.151 {c, f} × {1, 8}

2.152 {a, b, c, d, e, f, g, h} × {2, 7}

2.153 {a, b, c, d, e, f, g, h} × {3, 4, 5, 6}

2.154 There are 27 elements in this set: {⟨0, 0, 0⟩, ⟨0, 0, 1⟩, ⟨0, 0, 2⟩, ⟨0, 1, 0⟩, ⟨0, 1, 1⟩, ⟨0, 1, 2⟩, ⟨0, 2, 0⟩, ⟨0, 2, 1⟩,
⟨0, 2, 2⟩, ⟨1, 0, 0⟩, ⟨1, 0, 1⟩, ⟨1, 0, 2⟩, ⟨1, 1, 0⟩, ⟨1, 1, 1⟩, ⟨1, 1, 2⟩, ⟨1, 2, 0⟩, ⟨1, 2, 1⟩, ⟨1, 2, 2⟩, ⟨2, 0, 0⟩, ⟨2, 0, 1⟩, ⟨2, 0, 2⟩,
⟨2, 1, 0⟩, ⟨2, 1, 1⟩, ⟨2, 1, 2⟩, ⟨2, 2, 0⟩, ⟨2, 2, 1⟩, ⟨2, 2, 2⟩}

2.155 Omitting the angle brackets and commas, the set is: {ACCE, ACDE, ADCE, ADDE, BCCE, BCDE, BDCE, BDDE}.

2.156 Omitting the angle brackets and commas, the set is: {0, 1, 00, 01, 10, 11, 000, 001, 010, 011, 100, 101, 110, 111}.

2.157 Σ^8

2.158 (Σ − {A, E, I, O, U})^5

2.159 Here’s one reasonably compact way of representing the answer: in a sequence of 6 symbols, we’ll allow the ith symbol to be any element of Σ, while the others must not be vowels. The desired set is:

    ⋃_{i=0}^{5} [ (Σ − {A, E, I, O, U})^i × Σ × (Σ − {A, E, I, O, U})^(5−i) ].

(There are many other ways to express the answer too.)

2.160 Here’s one of many ways to express this set:

    ⋃_{v ∈ {A, E, I, O, U}} [ (Σ − {A, E, I, O, U}) ∪ {v} ]^6.

2.161 √(1^2 + 3^2) = √(1 + 9) = √10

2.162 √(2^2 + (−2)^2) = √(4 + 4) = √8

2.163 √(4^2 + 0^2) = √(16 + 0) = √16 = 4

2.164 ⟨1 + 2, 3 + (−2)⟩ = ⟨3, 1⟩

2.165 ⟨3 · (−3), 3 · (−1)⟩ = ⟨−9, −3⟩

2.166 ⟨2 · 1 + 4 − 3 · 2, 2 · 3 + 0 − 3 · (−2)⟩ = ⟨0, 12⟩


2.167 ∥a∥ + ∥c∥ = √10 + 4 ≈ 7.1623, while ∥a + c∥ = ∥⟨5, 3⟩∥ = √(5^2 + 3^2) = √34 ≈ 5.8310.

2.168 ∥a∥ + ∥b∥ = √10 + √8 ≈ 5.9907, while ∥a + b∥ = ∥⟨3, 1⟩∥ = √(3^2 + 1^2) = √10 ≈ 3.1623.

2.169 3∥d∥ = 3 · √((−3)^2 + (−1)^2) = 3√10 ≈ 9.4868. We get the same value for ∥3d∥ = ∥⟨−9, −3⟩∥ = √((−9)^2 + (−3)^2) = √90 ≈ 9.4868.

2.170 Consider an arbitrary vector x ∈ R^n and an arbitrary scalar a ∈ R. Here’s a derivation to establish that ∥ax∥ = a∥x∥:

    ∥ax∥ = √( Σ_i ((ax)_i)^2 )           definition of length
         = √( Σ_i a^2 (x_i)^2 )          definition of vector times scalar
         = √( a^2 · Σ_i (x_i)^2 )        a^2 is the same for every i
         = √(a^2) · √( Σ_i (x_i)^2 )     Theorem 2.8.5
         = a · √( Σ_i (x_i)^2 )          definition of square root
         = a∥x∥.                         definition of length

2.171 We have ∥x∥ + ∥y∥ = ∥x + y∥ whenever x and y “point in exactly the same direction”—that is, when we can write y = ax for some scalar a ≥ 0. In this case, we have ∥x∥ + ∥y∥ = ∥x∥ + ∥ax∥ = (1 + a)∥x∥, and we also have ∥x + y∥ = ∥x + ax∥ = ∥(1 + a)x∥ = (1 + a)∥x∥. When y ≠ ax, the two sides aren’t equal.

Visually, the sum of the lengths of the dashed lines in the picture show ∥x∥ + ∥y∥ (and the dotted lines show ∥y∥ + ∥x∥), while the solid line has length ∥x + y∥. Because the latter “cuts the corner” whenever x and y don’t point in exactly the same direction, it’s smaller than the former. [Figure omitted: vectors x, y, and x + y forming a triangle.]

2.172 1 · 2 + 3 · (−2) = 2 − 6 = −4

2.173 1 · (−3) + 3 · (−1) = −3 − 3 = −6

2.174 4 · 4 + 0 · 0 = 16 + 0 = 16

2.175 The Manhattan distance is |1 − 2| + |3 − (−2)| = 1 + 5 = 6; the Euclidean distance is √((1 − 2)^2 + (3 − (−2))^2) = √(1^2 + 5^2) = √26 ≈ 5.099.

2.176 The Manhattan distance is 4 + 4 = 8; the Euclidean distance is √(4^2 + 4^2) = √32 = 4√2 ≈ 5.6568.

2.177 The Manhattan distance is 2 + 2 = 4; the Euclidean distance is √(2^2 + 2^2) = √8 = 2√2 ≈ 2.8284.

2.178 The largest possible Euclidean distance is 1—for example, if x = ⟨0, 0⟩ and y = ⟨1, 0⟩, then the Manhattan distance
is 1, and so is the Euclidean distance.

2.179 The smallest possible Euclidean distance is 1/√2: if x = ⟨0, 0⟩ and y = ⟨0.5, 0.5⟩, then the Manhattan distance is indeed 1 = 0.5 + 0.5, while the Euclidean distance is √((1/2)^2 + (1/2)^2) = √(1/2) = 1/√2.

2.180 The smallest possible Euclidean distance is 1/√n—for example, if x = ⟨0, 0, . . . , 0⟩ and y = ⟨1/n, 1/n, . . . , 1/n⟩, then

    Manhattan distance = 1/n + 1/n + · · · + 1/n = 1
    Euclidean distance = √( n · (1/n)^2 ) = √(1/n) = 1/√n.

2.181 [Figure omitted: a plot drawn on axes running from −3 to 3 in each coordinate.]

2.182 [Figure omitted: a plot drawn on axes running from −3 to 3 in each coordinate.]

2.183 Under Manhattan distance, the point ⟨16, 40⟩ has distance 12 + 2 = 14 from g and distance 8 + 7 = 15 from s, so it’s closer to g. Under Euclidean distance, the point ⟨16, 40⟩ has distance √(12^2 + 2^2) = √148 ≈ 12.1655 from g and distance √(8^2 + 7^2) = √113 ≈ 10.6301 from s, so it’s closer to s.

2.184 The Manhattan distance from the point ⟨8 + δ, y⟩ to s is δ + |33 − y|. The Manhattan distance from this point to g
is 4 + δ + |42 − y|. Because δ appears in both formulas, the point is closer to g exactly when 4 + |42 − y| < |33 − y|.

• If y < 33, then these distances are 4 + 42 − y and 33 − y; the distance to g is always larger.
• If 33 ≤ y < 42, then these distances are 4 + 42 − y and y − 33; the former is smaller when 4 + 42 − y < y − 33,
which, solving for y, occurs when 2y > 79. The distance to g is smaller when y > 39.5.
• If y > 42, then these distances are 4 + y − 42 and y − 33; the distance to g is always smaller.

Thus s is closer if y < 39.5 and g is closer if y > 39.5 (and they’re equidistant if y = 39.5).

2.185 A point ⟨4 − δ, y⟩ is 4 units closer to g than to s in the horizontal direction. Thus, unless the point is 4 or more units
closer to g than to s vertically, g is closer. To be 4 units closer vertically means |42 − y| < 4 + |33 − y|, which happens
exactly when y < 35.5.

2.186 3|x − 8| + 1.5|y − 33|

2.187 The three different colors in the Voronoi diagram represent the three regions; the dashed lines represent the bisectors of each pair of points. [Figure omitted.]

2.188 As before, the three colors represent the three regions; the dashed lines represent the bisectors of each pair of points. [Figure omitted.]

2.189 Here the notation is as in the previous two solutions. [Figure omitted.]

2.190 For a pair of points p = ⟨p1, p2⟩ and q = ⟨q1, q2⟩, we will need to be able to find the line that bisects them:

• The midpoint of p and q is on that line—namely the point ⟨(p1 + q1)/2, (p2 + q2)/2⟩.
• The slope of the bisecting line is the negative reciprocal of the slope of the line joining p and q—that is, −(p1 − q1)/(p2 − q2).

Given these facts, the algorithm is fairly straightforward: given three points p, q, r, we find the bisector for each pair of points. The portion of the plane closest to p is that portion that’s both on p’s side of the p–q bisector and on p’s side of the p–r bisector. The other regions are analogous.
Coding this in a programming language requires some additional detail in representing lines, but the idea is just as
described here. A solution in Python is shown in Figure S.2.4, on p. 16.

2.191 6 by 3

2.192 6

import sys

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def getX(self): return self.x
    def getY(self): return self.y

def midpoint(pointA, pointB):
    '''Finds the point halfway between two Points.'''
    return Point((pointA.getX() + pointB.getX()) / 2, (pointA.getY() + pointB.getY()) / 2)

def bisectorSlope(pointA, pointB):
    '''Finds the slope of the bisector between two Points.'''
    return -(pointA.getX() - pointB.getX()) / (pointA.getY() - pointB.getY())

def bisectorRegion(pointA, pointB):
    '''Returns a representation of the half plane on a's side of bisector between a and b.'''
    halfway = midpoint(pointA, pointB)

    # Case I: the bisector of {a,b} is a vertical line.
    if pointA.getY() == pointB.getY():
        if pointA.getX() < halfway.getX():  # a is left of the bisector
            return "x < %.2f" % halfway.getX()
        else:
            return "x > %.2f" % halfway.getX()

    # Case II: the bisector of {a,b} is a horizontal line.
    slope = bisectorSlope(pointA, pointB)
    if slope == 0:
        if pointA.getY() < halfway.getY():  # a is below the bisector
            return "y < %.2f" % halfway.getY()
        else:
            return "y > %.2f" % halfway.getY()

    # Case III: the bisector of {a,b} has nonzero, noninfinite slope.
    yIntercept = -slope * halfway.getX() + halfway.getY()
    if pointA.getY() < pointB.getY():  # a is below the bisector
        return "y < %.2f*x + %.2f" % (slope, yIntercept)
    else:
        return "y > %.2f*x + %.2f" % (slope, yIntercept)

def voronoiRegion(ego, alterA, alterB):
    '''Returns a representation of the portion of the plane closer to a than to either b or c.'''
    return bisectorRegion(ego, alterA) + " and " + bisectorRegion(ego, alterB)

# Usage: python voronoi_shareable.py 0 0 4 5 3 1   [for points (0,0) and (4,5) and (3,1)]
# [error-checking code omitted for brevity.]
a = Point(float(sys.argv[1]), float(sys.argv[2]))
b = Point(float(sys.argv[3]), float(sys.argv[4]))
c = Point(float(sys.argv[5]), float(sys.argv[6]))

print(voronoiRegion(a, b, c))
print(voronoiRegion(b, a, c))
print(voronoiRegion(c, a, b))

Figure S.2.4 Python code for Voronoi diagrams.



2.193 ⟨4, 1⟩, ⟨5, 1⟩, and ⟨7, 3⟩


 
2.194 3M = [9 27 6; 0 27 24; 18 6 0; 21 15 15; 21 6 12; 3 18 21], a 6-by-3 matrix written row by row, with rows separated by semicolons.

2.195 [0 8 0; 9 6 0; 2 3 3] + [7 2 7; 3 5 6; 1 2 5] = [7 10 7; 12 11 6; 3 5 8]

2.196 Undefined; the dimensions of B and F don’t match.


     
2.197 [3 1; 0 8] + [8 4; 3 2] = [11 5; 3 10]

2.198 [0 8 0; 9 6 0; 2 3 3] + [0 8 0; 9 6 0; 2 3 3] = [0 16 0; 18 12 0; 4 6 6]

2.199 −2 · [3 1; 0 8] = [−6 −2; 0 −16]

2.200 0.5 · [1 2 9; 5 4 0] = [0.5 1 4.5; 2.5 2 0]

2.201 [0 8 0; 9 6 0; 2 3 3] · [5 8; 7 5; 3 2] = [56 40; 87 102; 40 37]

2.202 [0 8 0; 9 6 0; 2 3 3] · [7 2 7; 3 5 6; 1 2 5] = [24 40 48; 81 48 99; 26 25 47]

2.203 Undefined: A has three columns but F has only two rows, so the dimensions don’t match up.

2.204 Undefined: B has two columns but C has three rows, so the dimensions don’t match up.
    
3 1 8 4 27 14
2.205 =
0 8 3 2 24 16

    
8 4 3 1 24 40
2.206 =
3 2 0 8 9 19

     
1 0 0 0 0 0 .25 0 0
2.207 0.25 1 0 0 + 0.75 0 1 0 = .25 .75 0
1 1 0 1 1 1 1 1 .75

     
1 0 0 0 0 0 .5 0 0
2.208 0.5 1 0 0 + 0.5 0 1 0 = .5 .5 0
1 1 0 1 1 1 1 1 .5

2.209 Here’s one example (there are many more):

    G = [0 0 0; 0 0 0; 1 1 0]   and   H = [1 0 0; 1 1 0; 1 1 1]

2.210 Here’s a solution in Python (note that lambda is a reserved word in Python, so the blending weight is named lam):

# This function uses an image-processing library that supports the following operations:
#   Image(w, h)                  --> create a new empty image with width w and height h
#   image.getWidth()             --> the width of the image (in pixels)
#   image.getHeight()            --> the height of the image (in pixels)
#   image.getPixel(x, y)         --> the color value of the pixel at coordinate (x, y)
#   image.setPixel(x, y, color)  --> change the pixel at coordinate (x, y) to have the given color

def average(imageA, imageB, lam):
    width = imageA.getWidth()
    height = imageA.getHeight()
    if (lam < 0 or lam > 1
            or imageB.getWidth() != width
            or imageB.getHeight() != height):
        return "Error!"
    newImage = Image(width, height)
    for x in range(width):
        for y in range(height):
            avg = lam * imageA.getPixel(x, y) + (1 - lam) * imageB.getPixel(x, y)
            newImage.setPixel(x, y, avg)
    return newImage

2.211 Let A be an m-by-n matrix, and I be the n-by-n identity matrix. We must show that AI is identical to A. Here’s a derivation showing that the ⟨i, j⟩th entries of the two matrices are the same:

    (AI)_{i,j} = Σ_{k=1}^{n} A_{i,k} · I_{k,j}                     definition of matrix multiplication
               = Σ_{k=1}^{n} A_{i,k} · (1 if k = j; 0 if k ≠ j)    definition of I
               = A_{i,j} · 1                                       simplifying: there is only one nonzero term in the sum (namely for k = j)
               = A_{i,j}.

Thus for any indices i and j we have that (AI)_{i,j} = A_{i,j}.


        
2.212 [2 3; 1 1] · [2 3; 1 1] · [2 3; 1 1] = [2 3; 1 1] · [7 9; 3 4] = [23 30; 10 13]

2.213 [2 1; 1 1]

2.214 [2 1; 1 1]^2 = [5 3; 3 2]

2.215 [5 3; 3 2]^2 · [1 1; 1 0] = [34 21; 21 13] · [1 1; 1 0] = [55 34; 34 21]

2.216 We’re given that y + w = 0 and 2y + w = 1. From the former, we know y = −w; plugging this value for y into the
latter yields −2w + w = 1, so we have w = −1 and y = 1.

From x + z = 1 and 2x + z = 0, similarly, we know z = 1 − x and thus that 2x + 1 − x = 0. Therefore x = −1 and z = 2. The final inverse is therefore

    [−1 1; 2 −1].

And to verify the solution: indeed, we have [1 1; 2 1] · [−1 1; 2 −1] = [1 0; 0 1], as desired.

 
2.217 [−2 1; 1.5 −0.5]

 
2.218 [0 1; 1 0]

 
2.219 [1 0; 0 1]

         
2.220 Suppose that the matrix [x y; z w] were the inverse of [1 1; 1 1]—that is, suppose [1 1; 1 1] · [x y; z w] = [1 0; 0 1]. Then we'd need x + z = 1 (to make the ⟨1, 1⟩th entry of the product correct) and x + z = 0 (to make the ⟨2, 1⟩th entry of the product correct). But it's not possible for x + z to simultaneously equal 0 and 1!

2.221 [0, 0, 0, 0, 0, 0, 0]

2.222 [0, 1, 1, 0, 0, 1, 1]

2.223 [1, 0, 0, 1, 1, 0, 0]

2.5 Functions
2.224 f(3) = (32 + 3) mod 8 = 12 mod 8 = 4

2.225 f(7) = (72 + 3) mod 8 = 52 mod 8 = 4

2.226 f(x) = 3 when x = 0 and when x = 4.

2.227 x f(x)
0 3
1 4
2 7
3 4
4 3
5 4
6 7
7 4

2.228 quantize(n) = 52 · ⌊n/52⌋ + 26

2.229 The domain is {0, 1, . . . , 255} × {1, . . . , 256}. The range is {0, 1, . . . , 255}: all output colors are possible, because
for any n ∈ {0, 1, . . . , 255} we have that quantize(n, 256) = n.

2.230 The step size is ⌈256/k⌉, so we have

quantize(n, k) = ⌈256/k⌉ · ⌊n / ⌈256/k⌉⌋ + ⌈256/k⌉/2.

Note that this expression will do something a little strange for large k: for example, when k = 200, we have ⌈256/k⌉ = 2—so the first 128 quanta correspond to colors {1, 3, 5, . . . , 255}, and the remaining 72 quanta are never used (because they're bigger than 256).
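A quick check of this formula in Python (using integer division for the half-step offset, matching the code for Exercise 2.232):

```python
import math

def quantize(n, k):
    # step size is ceil(256 / k); map n to the midpoint of its quantum
    step = math.ceil(256 / k)
    return step * (n // step) + step // 2

# With k = 256 the step size is 1, so every color maps to itself.
assert all(quantize(n, 256) == n for n in range(256))

# With k = 200 the step size is 2, so only the 128 odd colors 1, 3, ..., 255
# are ever produced; the other requested quanta go unused.
outputs = {quantize(n, 200) for n in range(256)}
assert outputs == set(range(1, 256, 2))
```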

2.231 A function quantize(n) : {0, 1, . . . , 255} → {a1 , a2 , . . . , ak } can be c-to-1 only if k | 256—that is, if k evenly
divides 256. (For example, there’s no way to quantize 256 colors into three precisely equal pieces, because 256 is not
evenly divisible by 3.) So k must be a power of two; specifically, we have to have k ∈ {1, 2, 4, 8, 16, 32, 64, 128, 256}.

2.232 Here’s a solution in Python:


import math

# This function uses an image-processing library that supports the following operations:
#   image.copyGray()              --> create a [grayscale] copy of the given image
#   image.getWidth()              --> the width of the image (in pixels)
#   image.getHeight()             --> the height of the image (in pixels)
#   image.getPixel(x, y)          --> the color value of the pixel at coordinate (x, y)
#   image.setPixel(x, y, color)   --> change the pixel at coordinate (x, y) to have the given color

def quantize(image, k):
    quantizedImage = image.copyGray()
    stepSize = int(math.ceil(256 / k))
    for x in range(image.getWidth()):
        for y in range(image.getHeight()):
            color = (stepSize // 2) + stepSize * (image.getPixel(x, y) // stepSize)
            quantizedImage.setPixel(x, y, color)
    return quantizedImage

2.233 The domain is R; the range is R≥0 .

2.234 The domain is R; the range is Z.

2.235 The domain is R; the range is R>0 .

2.236 The domain is R>0 ; the range is R.

2.237 The domain is Z; the range is {0, 1}.

2.238 The domain is Z≥2 ; the range is {0, 1, 2}.

2.239 The domain is Z × Z≥2 ; the range is Z≥0 .

2.240 The domain is Z; the range is {True, False}.


2.241 The domain is ∪_{n∈Z≥1} Rⁿ; the range is R≥0.

2.242 The domain is R; the range is the set of all unit vectors in R²—that is, the set {x ∈ R² : ∥x∥ = 1}.
2.243 The function add can be written as add(⟨h, m⟩, x) = ⟨[(h + ⌊(m + x)/60⌋ − 1) mod 12] + 1, (m + x) mod 60⟩. (The hours component is a little more complicated to write because of 0-based indexing vs. 1-based indexing: the hour is represented as [(x − 1) mod 12] + 1 because anything mod 12 is a value between 0 and 11, and we need the hours value to be between 1 and 12.)
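A direct transcription of this formula into Python, assuming clock times are represented as pairs (h, m) with h in 1..12 and m in 0..59:

```python
def add(time, x):
    # add x minutes to the clock time (h, m)
    h, m = time
    total = m + x
    new_m = total % 60
    new_h = (h + total // 60 - 1) % 12 + 1   # shift to 0-based, reduce mod 12, shift back
    return (new_h, new_m)

assert add((11, 30), 45) == (12, 15)
assert add((12, 59), 1) == (1, 0)
```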

2.244 (f ◦ f)(x) = f(f(x)) = x mod 10



2.245 (h ◦ h)(x) = h(h(x)) = 4x

2.246 (f ◦ g)(x) = f(g(x)) = x + 3 mod 10

2.247 (g ◦ h)(x) = g(h(x)) = 2x + 3

2.248 (h ◦ g)(x) = h(g(x)) = 2(x + 3) = 2x + 6

2.249 (f ◦ h)(x) = f(h(x)) = 2x mod 10

2.250 (f ◦ g ◦ h)(x) = f(g(h(x))) = 2x + 3 mod 10

2.251 (g ◦ h)(x) = g(h(x)) = 2 · h(x), so we need 2 · h(x) = 3x + 1; that's true when h(x) = (3/2)x + 1/2.

2.252 (h ◦ g)(x) = h(g(x)) = h(2x), so we need h(2x) = 3x + 1 for every x. That's true when h(x) = (3/2)x + 1.
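The two compositions can be sanity-checked numerically; a small sketch, reading off g(x) = 2x and the target 3x + 1 from the derivations above:

```python
# Exercise 2.251: with g(x) = 2x, we need (g ∘ h)(x) = 3x + 1,
# which holds for h(x) = (3/2)x + 1/2.
def g(x): return 2 * x
def h1(x): return 1.5 * x + 0.5
assert all(g(h1(x)) == 3 * x + 1 for x in range(-50, 50))

# Exercise 2.252: we need (h ∘ g)(x) = h(2x) = 3x + 1,
# which holds for h(x) = (3/2)x + 1.
def h2(x): return 1.5 * x + 1
assert all(h2(g(x)) == 3 * x + 1 for x in range(-50, 50))
```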

2.253 Yes, f(x) = x is onto; every output value is hit.

2.254 The function f(x) = x² mod 4 has the values

f(0) = 0² mod 4 = 0 mod 4 = 0
f(1) = 1² mod 4 = 1 mod 4 = 1
f(2) = 2² mod 4 = 4 mod 4 = 0
f(3) = 3² mod 4 = 9 mod 4 = 1

Thus there is no x such that f(x) = 2 or f(x) = 3, so the function is not onto.

2.255 For the function f : {0, 1, 2, 3} → {0, 1, 2, 3} defined as f(x) = (x² − x) mod 4, the values are

f(0) = (0² − 0) mod 4 = 0 − 0 mod 4 = 0
f(1) = (1² − 1) mod 4 = 1 − 1 mod 4 = 0
f(2) = (2² − 2) mod 4 = 4 − 2 mod 4 = 2
f(3) = (3² − 3) mod 4 = 9 − 3 mod 4 = 2

There's no x such that f(x) = 1 or f(x) = 3, so the function is not onto.

2.256 Yes, this function f : {0, 1, 2, 3} → {0, 1, 2, 3}, which we could also have written as f(x) = 3 − x, is onto; every
output value is hit.

2.257 There’s no x such that f(x) = 0 or f(x) = 3, so the function is not onto.

2.258 For the function f : {0, 1, 2, 3} → {0, 1, . . . , 7}, we have

f(0) = 0² mod 8 = 0 mod 8 = 0
f(1) = 1² mod 8 = 1 mod 8 = 1
f(2) = 2² mod 8 = 4 mod 8 = 4
f(3) = 3² mod 8 = 9 mod 8 = 1

This function is not one-to-one, because f(1) = f(3).

2.259 For f(x) = x³ mod 8 as a function f : {0, 1, 2, 3} → {0, 1, . . . , 7}, we have

f(0) = 0³ mod 8 = 0 mod 8 = 0
f(1) = 1³ mod 8 = 1 mod 8 = 1
f(2) = 2³ mod 8 = 8 mod 8 = 0
f(3) = 3³ mod 8 = 27 mod 8 = 3

This function is not one-to-one, because f(0) = f(2).

2.260 For f(x) = (x³ − x) mod 8 as a function f : {0, 1, 2, 3} → {0, 1, . . . , 7}, we have

f(0) = (0³ − 0) mod 8 = 0 mod 8 = 0
f(1) = (1³ − 1) mod 8 = 0 mod 8 = 0
f(2) = (2³ − 2) mod 8 = 6 mod 8 = 6
f(3) = (3³ − 3) mod 8 = 24 mod 8 = 0

This function is not one-to-one, because f(0) = f(1) = f(3).

2.261 For f(x) = (x³ + 2x) mod 8 as a function f : {0, 1, 2, 3} → {0, 1, . . . , 7}, we have

f(0) = (0³ + 0) mod 8 = 0 mod 8 = 0
f(1) = (1³ + 2) mod 8 = 3 mod 8 = 3
f(2) = (2³ + 4) mod 8 = 12 mod 8 = 4
f(3) = (3³ + 6) mod 8 = 33 mod 8 = 1

This function is one-to-one, as no output is hit more than once.

2.262 This function is not one-to-one, because f(1) = f(3).

2.263 Every element except A[1] (which is the root) has a parent. An element i is a parent only if it has a left child, which occurs when 2i ≤ n. Thus parent : {2, 3, . . . , n} → {1, 2, . . . , ⌊n/2⌋}, and:
• The domain is {2, 3, . . . , n}.
• The range is {1, 2, . . . , ⌊n/2⌋}.
• The function parent is not one-to-one: for example, parent(2) = parent(3) = 1.

2.264 An element i has a left child when 2i ≤ n and a right child when 2i + 1 ≤ n. Every element except A[1] is the child of some other element; even-numbered elements are left children and odd-numbered elements are right children. Thus:
• left : {1, 2, . . . , ⌊n/2⌋} → {k ≥ 2 : 2 | k and k ≤ n}.
  The domain is {1, 2, . . . , ⌊n/2⌋} and the range is the set of even integers {2, 4, . . . , n} [or n − 1 if n is odd].
• right : {1, 2, . . . , ⌊(n − 1)/2⌋} → {k ≥ 2 : 2 ∤ k and k ≤ n}.
  The domain is {1, 2, . . . , ⌊(n − 1)/2⌋} and the range is the set of odd integers {3, 5, . . . , n} [or n − 1 if n is even].
Both left and right are one-to-one: if i ≠ i′, then 2i ≠ 2i′ and 2i + 1 ≠ 2i′ + 1. Thus there are no two distinct elements i and i′ that have left(i) = left(i′) or right(i) = right(i′).

2.265 (parent ◦ left)(i) = parent(left(i)) = ⌊2i/2⌋ = ⌊i⌋ = i.

In other words, parent ◦ left is the identity function: for any i, we have (parent ◦ left)(i) = i. In English, this composition asks for the parent of the left child of a given index—but the parent of i's left child simply is i!

2.266 (parent ◦ right)(i) = parent(right(i)) = ⌊(2i + 1)/2⌋ = ⌊i + 1/2⌋ = i. In other words, parent ◦ right is the identity function: for any i, we have (parent ◦ right)(i) = i. In English, this composition asks for the parent of the right child of a given index—but, just as for parent ◦ left, the parent of i's right child simply is i.

2.267 (left ◦ parent)(i) = left(parent(i)) = 2⌊i/2⌋ for any i ≥ 2. In other words, this function is

(left ◦ parent)(i) = { i           if 2 | i
                     { i − 1       if 2 ∤ i
                     { undefined   if i = 1.

In English, this composition asks for the left child of a given index's parent. An index that's a left child is the left child of its parent; for an index that's a right child, the left child of its parent is its left-hand sibling.

2.268 (right ◦ parent)(i) = right(parent(i)) = 2⌊i/2⌋ + 1 for any i ≥ 2. In other words, this function is

(right ◦ parent)(i) = { i + 1       if 2 | i
                      { i           if 2 ∤ i
                      { undefined   if i = 1.

In English, this composition asks for the right child of a given index's parent. An index that's a right child is the right child of its parent; for an index that's a left child, the right child of its parent is its right-hand sibling.
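The four compositions in Exercises 2.265–2.268 can be checked mechanically; a quick sketch with 1-indexed heap arithmetic:

```python
# 1-indexed heap navigation, as in the text
def parent(i): return i // 2
def left(i):   return 2 * i
def right(i):  return 2 * i + 1

for i in range(1, 100):
    # parent ∘ left and parent ∘ right are both the identity
    assert parent(left(i)) == i
    assert parent(right(i)) == i

for i in range(2, 100):
    # left ∘ parent: i itself for a left child, its left-hand sibling otherwise
    assert left(parent(i)) == (i if i % 2 == 0 else i - 1)
    # right ∘ parent: i itself for a right child, its right-hand sibling otherwise
    assert right(parent(i)) == (i if i % 2 == 1 else i + 1)
```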

2.269 f⁻¹(y) = (y − 1)/3

2.270 g⁻¹(y) = ∛y

2.271 h⁻¹(y) = log₃ y

2.272 This function isn’t one-to-one (it can’t be: there are 24 different inputs and only 12 different outputs—for example,
f(7) = f(19)), so it doesn’t have an inverse.

2.273 The degree is 3.

2.274 The degree is 3.

2.275 The degree is 3; the polynomial is p(x) = 4x⁴ − (2x² − x)² = 4x⁴ − 4x⁴ + 4x³ − x² = 4x³ − x².

2.276 The largest possible degree is still 7; the smallest possible degree is 0 (if p and q are identical—for example, for
p(x) = x7 and q(x) = x7 , we have p(x) − q(x) = 0).

2.277 The largest possible degree is 14, and that's the smallest, too! If p(x) = Σ_{i=0}^{7} a_i x^i and q(x) = Σ_{i=0}^{7} b_i x^i, where a₇ ≠ 0 and b₇ ≠ 0, then there's a term a₇b₇x¹⁴ in the product. And neither a₇ nor b₇ can equal zero, because p and q are both polynomials with degree 7.

2.278 The largest possible degree is 49, and that's the smallest, too! If p(x) = Σ_{i=0}^{7} a_i x^i and q(x) = Σ_{i=0}^{7} b_i x^i, where a₇ ≠ 0 and b₇ ≠ 0, then we have

p(q(x)) = p(Σ_{i=0}^{7} b_i x^i)
        = Σ_{j=0}^{7} a_j (Σ_{i=0}^{7} b_i x^i)^j
        = a₇ (Σ_{i=0}^{7} b_i x^i)^7 + [terms of degree ≤ 42]          all the terms when j ≤ 6 have degree ≤ 7 · 6 = 42
        = a₇ (b₇x^7 + [terms of degree ≤ 6])^7 + [terms of degree ≤ 42]
        = a₇ b₇^7 (x^7)^7 + [terms of degree ≤ 48] + [terms of degree ≤ 42]
        = a₇ b₇^7 x^49 + [many terms of degree ≤ 48].

Because a₇ ≠ 0 and b₇ ≠ 0, then, this polynomial has degree 49.
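The degree claims in 2.277 and 2.278 can be spot-checked with a small coefficient-list polynomial sketch (the helper names poly_mul, poly_compose, and degree are ours):

```python
def poly_mul(p, q):
    # coefficient lists, p[i] = coefficient of x^i
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def poly_compose(p, q):
    # Horner's rule: p(q(x)) = a_0 + q(x) * (a_1 + q(x) * (a_2 + ...))
    out = [0]
    for a in reversed(p):
        out = poly_mul(out, q)
        out[0] += a
    return out

def degree(p):
    # index of the highest nonzero coefficient
    d = len(p) - 1
    while d > 0 and p[d] == 0:
        d -= 1
    return d

p = [1, 0, 0, 0, 0, 0, 0, 2]   # 1 + 2x^7 (degree 7)
q = [3, 0, 0, 0, 0, 0, 0, 1]   # 3 + x^7  (degree 7)
assert degree(poly_mul(p, q)) == 14       # the product has degree 14
assert degree(poly_compose(p, q)) == 49   # the composition has degree 49
```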

2.279 p(x) = x² + 1 (there's no value of x ∈ R for which 1 + x² = 0, because we'd need x² = −1).

2.280 p(x) = x² (which has a single root at x = 0).

2.281 p(x) = x² − 1 (which has roots at x = −1 and x = 1).

2.282 Note that if L has an even number of elements, then there are two values “in the middle” of the sorted order. The
lower median is the smaller of the two; the upper median is the larger of the two. We’ll find the lower median in the case
of an even-length array.
Here’s a simple algorithm (though there are much faster solutions!), making use of findMaxIndex from Figure 2.57
(and the analogous findMinIndex, which is a small modification of findMaxIndex [changing the > to < in Line 3]):

findMedian(L):
Input: A list L with n ≥ 1 elements L[1], . . . , L[n].
Output: the (lower) median value in L

1 while |L| > 2:
2   maxIndex := findMaxIndex(L)
3   minIndex := findMinIndex(L)
4   delete L[minIndex] and L[maxIndex]
5 return the smaller of the remaining elements

(We could have found the upper median, instead of the lower median, by returning the larger of the two remaining elements in the case that |L| = 2.)
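The same repeated-deletion idea, transcribed into Python; this sketch returns the (lower) median value of a nonempty list, using nothing beyond the built-in min and max:

```python
def find_median(L):
    # repeatedly discard the current maximum and minimum until at most
    # two elements remain; the smaller survivor is the lower median
    L = list(L)                  # work on a copy
    while len(L) > 2:
        L.remove(max(L))
        L.remove(min(L))
    return min(L)

assert find_median([5, 1, 4, 2, 3]) == 3
assert find_median([4, 1, 3, 2]) == 2    # lower median of an even-length list
```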
3 Logic

3.2 An Introduction to Propositional Logic


3.1 False: 2² + 3² = 4 + 9 = 13 ≠ 16 = 4².

3.2 False: the binary number 11010010 has the value 2 + 16 + 64 + 128 = 210, not 202.

3.3 True: in repeated iterations, the value of x is 202, then 101, then 50, then 25, then 12, then 6, and then 3. When x is 3,
then we do one last iteration and set x to 1, and the loop terminates.

3.4 x ** y is valid Python if and only if x and y are both numeric values: r ⇔ u ∧ v.

3.5 x + y is valid Python if and only if x and y are both numeric values, or they’re both lists: p ⇔ (u ∧ v) ∨ (w ∧ z).

3.6 x * y is valid Python if and only if x and y are both numeric values, or if one of x and y is a list and the other is
numeric: q ⇔ (u ∧ v) ∨ (u ∧ z) ∨ (v ∧ w).

3.7 x * y is a list if x * y is valid Python and x and y are not both numeric values: s ⇐ q ∧ ¬(u ∧ v).

3.8 If x + y is a list, then x * y is not a list: s ⇒ ¬t.

3.9 x + y and x ** y are both valid Python only if x is not a list: p ∧ r ⇒ ¬w.

3.10 She should answer “yes”: p ⇒ q is true whenever p is false, and “you’re over 55 years old” is false for her. (That is,
False ⇒ True is true.)
3.11 To write the solution more compactly, we'll use notation for ands and ors that's similar to Σ and Π notation: we will write ⋀_{i=1}^{n} p_i to mean p₁ ∧ p₂ ∧ · · · ∧ p_n, and ⋁_{i=1}^{n} p_i to mean p₁ ∨ p₂ ∨ · · · ∨ p_n.
To express "at least 3 of {p₁, . . . , p_n} are true," we'll require that (for some values of i and j and k) we have that p_i, p_j, and p_k are all true. We'll then take the "or" over all the values of i, j, and k, where i < j < k. Formally, the proposition is

⋁_{i=1}^{n} ⋁_{j=i+1}^{n} ⋁_{k=j+1}^{n} (p_i ∧ p_j ∧ p_k).

3.12 The easiest way to write "at least n − 1 of {p₁, . . . , p_n} are true" is to say that, for every two variables from {p₁, . . . , p_n}, at least one of the two is true. Using the same notation as in Exercise 3.11, we can write this proposition as

⋀_{i=1}^{n} ⋀_{j=i+1}^{n} (p_i ∨ p_j).
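Both constructions (Exercises 3.11 and 3.12) can be verified by brute force over all 2ⁿ truth assignments; a sketch for n = 5 (the function names are ours):

```python
from itertools import combinations, product

def at_least_3(bools):
    # the big-or over triples i < j < k of (p_i and p_j and p_k)
    return any(a and b and c for a, b, c in combinations(bools, 3))

def at_least_all_but_1(bools):
    # the big-and over pairs i < j of (p_i or p_j)
    return all(a or b for a, b in combinations(bools, 2))

n = 5
for bools in product([False, True], repeat=n):
    assert at_least_3(bools) == (sum(bools) >= 3)
    assert at_least_all_but_1(bools) == (sum(bools) >= n - 1)
```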

3.13 The identity of ∨ is False: x ∨ False ≡ False ∨ x ≡ x.

3.14 The identity of ∧ is True: x ∧ True ≡ True ∧ x ≡ x.

3.15 The identity of ⇔ is True: x ⇔ True ≡ True ⇔ x ≡ x.


3.16 The identity of ⊕ is False: x ⊕ False ≡ False ⊕ x ≡ x.

3.17 The zero of ∨ is True: x ∨ True ≡ True ∨ x ≡ True.

3.18 The zero of ∧ is False: x ∧ False ≡ False ∧ x ≡ False.

3.19 The operator ⇔ doesn't have a zero. Because True ⇔ x ≡ x, the proposition True ⇔ x is true when x = True and it's false when x = False. Similarly, because False ⇔ x ≡ ¬x, the proposition False ⇔ x is false when x = True and it's true when x = False. So neither True nor False is a zero for ⇔.

3.20 The operator ⊕ doesn’t have a zero: there’s no value z such that z ⊕ True ≡ z ⊕ False, so no z is a zero for ⊕.

3.21 The left identity of ⇒ is True: True ⇒ p ≡ p. But ⇒ has no right identity: p ⇒ True ≡ True and p ⇒ False ≡ ¬p, so neither candidate leaves p unchanged.

3.22 The right zero of ⇒ is True: p ⇒ True ≡ True. But there is no left zero for implies: True ⇒ p is equivalent to p,
not to True; and False ⇒ p is equivalent to True, not to False.

3.23 x * y is equivalent to x ∧ y.

3.24 x + y is equivalent to x ∨ y.

3.25 1 - x is equivalent to ¬x and 1 - y is equivalent to ¬y. Addition is the equivalent of a disjunction, so 2 - x - y


is equivalent to ¬x ∨ ¬y.

3.26 (x * (1 - y)) + ((1 - x) * y) is equivalent to (x ∧ ¬y) ∨ (¬x ∧ y), which is equivalent to x ⊕ y.

3.27 (p1 + p2 + p3 + · · · + pn ) ≥ 3

3.28 (p1 + p2 + p3 + · · · + pn ) ≥ n − 1

3.29 x3 : x is greater than or equal to 8 if and only if x’s binary representation is 1???.

3.30 ¬x0 ∧ ¬x1 : x is evenly divisible by 4 if and only if x’s binary representation is ??00.

3.31 The value of x is a + b, where a = 4x2 + 1x0 and b = 8x3 + 2x1. It takes a bit of work to persuade yourself of this fact, but it turns out that x is divisible by 5 if and only if both a and b are divisible by 5. Thus the proposition is

(x0 ⇔ x2) ∧ (x1 ⇔ x3),

where the first conjunct expresses that a is divisible by 5 and the second that b is divisible by 5. You can verify that this proposition is correct with a truth table.
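Instead of a truth table, the proposition can also be checked exhaustively over all sixteen 4-bit values:

```python
def divisible_by_5_prop(x):
    # x3 x2 x1 x0 are the bits of the 4-bit number x
    x0, x1, x2, x3 = [(x >> i) & 1 for i in range(4)]
    return (x0 == x2) and (x1 == x3)

for x in range(16):
    assert divisible_by_5_prop(x) == (x % 5 == 0)
```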

3.32 There are two values of x that are evenly divisible by 9: x = 0 and x = 9. Written in binary, those values are 0000
and 1001. Thus x is evenly divisible by 9 if the middle two bits are both zero, and the outer two bits match:
(x0 ⇔ x3 ) ∧ ¬x1 ∧ ¬x2 .

3.33 There are several ways to express that x is an exact power of two. Here’s one, which simply lists the four different
values that are powers of two (1000, 0100, 0010, and 0001, respectively):
[¬x0 ∧ ¬x1 ∧ ¬x2 ∧ x3 ] ∨ [¬x0 ∧ ¬x1 ∧ x2 ∧ ¬x3 ] ∨ [¬x0 ∧ x1 ∧ ¬x2 ∧ ¬x3 ] ∨ [x0 ∧ ¬x1 ∧ ¬x2 ∧ ¬x3 ]
Another version is to say that at least one bit is 1 but that no two bits are 1:
(x0 ∨ x1 ∨ x2 ∨ x3) ∧ (¬x0 ∨ ¬x1) ∧ (¬x0 ∨ ¬x2) ∧ (¬x0 ∨ ¬x3) ∧ (¬x1 ∨ ¬x2) ∧ (¬x1 ∨ ¬x3) ∧ (¬x2 ∨ ¬x3).

3.34 (x0 ⇔ y0 ) ∧ (x1 ⇔ y1 ) ∧ (x2 ⇔ y2 ) ∧ (x3 ⇔ y3 )



3.35 Expressing x ≤ y is a bit more complicated than expressing equality. The following proposition does so by writing
out “x and y are identical before the ith position, and furthermore xi < yi ” for each possible index i:

(y3 ∧ ¬x3 )
∨ [(y3 ⇔ x3 ) ∧ (y2 ∧ ¬x2 )]
∨ [(y3 ⇔ x3 ) ∧ (y2 ⇔ x2 ) ∧ (y1 ∧ ¬x1 )]
∨ [(y3 ⇔ x3 ) ∧ (y2 ⇔ x2 ) ∧ (y1 ⇔ x1 ) ∧ (y0 ∨ ¬x0 )].

(The last disjunct also permits y0 = x0 when x1,2,3 = y1,2,3 .)

3.36 The key point is that, when written in binary, doubling a number simply involves shifting it one position to the left. Thus:

¬y3                                        we need y < 8 (otherwise 2y ≥ 16), and also:
∧ ¬x0                                      (1) we need x to be even, and
∧ (y2 ⇔ x3) ∧ (y1 ⇔ x2) ∧ (y0 ⇔ x1)        (2) the other bits to match.

3.37 The only values of x and y that satisfy x^y = y^x are either those for which x = y, or {x, y} = {2, 4}, as 2⁴ = 16 = 4². Thus:

(x0 ⇔ y0) ∧ (x1 ⇔ y1) ∧ (x2 ⇔ y2) ∧ (x3 ⇔ y3)            either x = y,
∨ ([¬x3 ∧ ¬y3] ∧ [x1 ⊕ x2] ∧ [y1 ⊕ y2] ∧ [¬x0 ∧ ¬y0])     or {x, y} = {2, 4}.

(Because 2 and 4 are represented as 0010 or 0100 in binary, we need the 0th and 3rd bits to be zero, and exactly one of the middle bits to be one.)

3.38 Given a 4-bit number x written in binary, let y be (x + 1) mod 16. Then:

y0 = ¬x0
y1 = (¬x0 ∧ x1 ) ∨ (x0 ∧ ¬x1 )
y2 = ((¬x0 ∨ ¬x1 ) ∧ x2 ) ∨ (x0 ∧ x1 ∧ ¬x2 )
y3 = ((¬x0 ∨ ¬x1 ∨ ¬x2 ) ∧ x3 ) ∨ (x0 ∧ x1 ∧ x2 ∧ ¬x3 )

3.39 A solution in Python is shown in Figure S.3.1 on p. 28, in the first block of code (which defines a Proposition class
with a getVariables method).

3.40 An added method for Propositions (the evaluate method), and a corresponding TruthAssignment class, are shown in the second block of code in Figure S.3.1 on p. 28.

3.41 A function to produce all TruthAssignments for a given set of variables is shown in Figure S.3.2; sample output on
a small proposition is shown in the last block in the same figure.

3.3 Propositional Logic: Some Extensions


3.42 p ⇒ p ≡ True

3.43 p ⊕ p ≡ False

3.44 p ⇔ p ≡ True

3.45 Either p ⇒ (¬p ⇒ (p ⇒ q)) or (p ⇒ ¬p) ⇒ (p ⇒ q) is a tautology.

3.46 (p ⇒ (¬p ⇒ p)) ⇒ q



1 class Proposition:
2 def __init__(self, parsed_expression):
3 '''A class for Boolean formulas. Representations are lists in prefix notation, as in:
4 "p" ["not", "p"] ["and", ["not", "p"], "q"]
5 (Legal connectives: not, or, and, implies, iff, xor.)'''
6 self.formula = parsed_expression
7
8 def getFormula(self):
9 return self.formula
10
11 def getVariables(self):
12 '''Extract all propositional variables from this proposition.'''
13 if type(self.formula) == str:
14 return set([self.formula])
15 elif self.formula[0] == "not":
16 Phi = Proposition(self.formula[1])
17 return Phi.getVariables()
18 elif self.formula[0] in ["implies", "and", "or", "xor", "iff"]:
19 Phi, Psi = Proposition(self.formula[1]), Proposition(self.formula[2])
20 return Phi.getVariables().union(Psi.getVariables())

21 def evaluate(self, truth_assignment): # Add this method to the Proposition class.


22 '''What is the truth value of this proposition under the given
23 truth_assignment (a dictionary mapping variables to True or False).'''
24 def helper(expr):
25 if type(expr) == str: return truth_assignment.lookup(expr)
26 elif expr[0] == "not": return not helper(expr[1])
27 elif expr[0] == "and": return helper(expr[1]) and helper(expr[2])
28 elif expr[0] == "or": return helper(expr[1]) or helper(expr[2])
29 elif expr[0] == "implies": return not helper(expr[1]) or helper(expr[2])
30 elif expr[0] == "xor": return helper(expr[1]) != helper(expr[2])
31 elif expr[0] == "iff": return helper(expr[1]) == helper(expr[2])
32 return helper(self.formula)
33
34 class TruthAssignment:
35 '''A truth assignment, built from a dictionary mapping variable names to True or False.'''
36 def __init__(self, variable_mapping):
37 self.mapping = variable_mapping
38
39 def lookup(self, variable):
40 return self.mapping[variable]
41
42 def extend(self, variable, truth_value):
43 self.mapping[variable] = truth_value

Figure S.3.1 Evaluating a proposition (and finding the set of truth assignments that make it true), in Python.

3.47 The implicit parentheses make the given proposition into ((p ⇒ ¬p) ⇒ p) ⇒ q, which is logically equivalent to
p ⇒ q.

3.48 Set all three of p, q, and r to be False. Then p ⇒ (q ⇒ r) is true but (p ⇒ q) ⇒ r is false:

False ⇒ (False ⇒ False)          (False ⇒ False) ⇒ False
False ⇒ True                     True ⇒ False
True                             False

3.49 Here’s the truth table for both p ⇒ (q ⇒ q) and (p ⇒ q) ⇒ q:



44 def all_truth_assignments(variables):
45 '''Compute a list of all TruthAssignments for the given set of variables.'''
46 if len(variables) == 0:
47 return [TruthAssignment({})]
48 else:
49 assignmentsT = all_truth_assignments(list(variables)[1:])
50 assignmentsF = copy.deepcopy(assignmentsT)
51 [rho.extend(list(variables)[0], True) for rho in assignmentsT]
52 [rho.extend(list(variables)[0], False) for rho in assignmentsF]
53 return assignmentsT + assignmentsF
54
55 def satisfying_assignments(proposition):
56 variable_set = proposition.getVariables()
57 return [rho for rho in all_truth_assignments(variable_set) if proposition.evaluate(rho)]

58 # For example, the following code outputs three truth assignments:


59 # {'q': True, 'p': True} {'q': True, 'p': False} {'q': False, 'p': False}
60 for rho in satisfying_assignments(Proposition(["or", ["not", "p"], "q"])):
61 print(rho.mapping)

Figure S.3.2 Evaluating a proposition (and finding the set of truth assignments that make it true), in Python.

p q p⇒q (p ⇒ q) ⇒ q q⇒q p ⇒ (q ⇒ q)
T T T T T T
T F F T T T
F T T T T T
F F T F T T

So p ⇒ (q ⇒ q) is a tautology (because its conclusion q ⇒ q is a tautology), but (p ⇒ q) ⇒ q is false when both p and
q are false.

3.50 The propositions p ⇒ (p ⇒ q) and (p ⇒ p) ⇒ q are logically equivalent; they’re both true except when p is true
and q is false. Thus they are both logically equivalent to p ⇒ q. Neither proposition is a tautology; both are satisfiable.

3.51 This answer is wrong: False ⊕ False is false, but when p = q = False, then ¬(p ∧ q) ⇒ (¬p ∧ ¬q) has the value
¬(False ∧ False) ⇒ (¬False ∧ ¬False)
≡ ¬False ⇒ True
≡ True ⇒ True
≡ True.

3.52 This answer is wrong: False ⊕ False is false, but when p = q = False, then (p ⇒ ¬q) ∧ (q ⇒ ¬p) has the value
(False ⇒ ¬False) ∧ (False ⇒ ¬False)
≡ (False ⇒ True) ∧ (False ⇒ True)
≡ True ∧ True
≡ True.

3.53 Let’s build a truth table:

p q ¬p ¬p ⇒ q ¬(p ∧ q) (¬p ⇒ q) ∧ ¬(p ∧ q) [the given proposition]


T T F T F F
T F F T T T
F T T T T T
F F T F T F

The last column of this truth table matches the truth table for p ⊕ q, so the solution is correct.

3.54 The given proposition is

¬ [(p ∧ ¬q ⇒ ¬p ∧ q) ∧ (¬p ∧ q ⇒ p ∧ ¬q)] ,

which has two repeated subexpressions, namely p ∧ ¬q and ¬p ∧ q, which we might call φ and ψ, respectively. Then the given proposition can be written as ¬[(φ ⇒ ψ) ∧ (ψ ⇒ φ)]. Here's a truth table:

p q φ = p ∧ ¬q ψ = ¬p ∧ q φ⇒ψ ψ⇒φ (φ ⇒ ψ ) ∧ (ψ ⇒ φ ) ¬[(φ ⇒ ψ) ∧ (ψ ⇒ φ)]


T T F F T T T F
T F T F F T F T
F T F T T F F T
F F F F T T T F

The truth table for p ⊕ q is exactly the last column of this truth table—the given proposition—so the solution is correct.

3.55 One proposition that works: (p ∨ q) ⇒ (¬p ∨ ¬q). (There are many other choices too.)

3.56 Note that p ∨ (¬p ∧ q) is equivalent to p ∨ q, so

1 if x > 20 or (x ≤ 20 and y < 0) then
2   foo(x, y)
3 else
4   bar(x, y)

can be rewritten as

1 if x > 20 or y < 0 then
2   foo(x, y)
3 else
4   bar(x, y)

3.57 Note that (x − y) · y ≥ 0 if and only if x − y and y are either both positive, or they are both negative. In other words,
the condition (x − y) · y ≥ 0 is equivalent to (x − y ≥ 0) ⇔ (y ≥ 0). Thus, writing y ≥ 0 as p, and x ≥ y as q, the given
expression is p ∨ q ∨ (p ⇔ q). This proposition is a tautology! Thus
1 if y ≥ 0 or y ≤ x or (x − y) · y ≥ 0 then
2   foo(x, y)
3 else
4   bar(x, y)

can be rewritten simply as

1 foo(x, y)

3.58 Write p to denote 12 | x and q to denote 4 | x. Note that if p is true, then q must be true. By unnesting the conditions,
we see that foo is called if p ∧ ¬q (which is impossible). Thus
1 if x mod 12 = 0 then
2   if x mod 4 ̸= 0 then
3     foo(x, y)
4   else
5     bar(x, y)
6 else
7   if x = 17 then
8     baz(x, y)
9   else
10    quz(x, y)

can be simplified into

1 if x mod 12 = 0 then
2   bar(x, y)
3 else
4   if x = 17 then
5     baz(x, y)
6   else
7     quz(x, y)

3.59 Let’s build a truth table for (¬p ⇒ q) ∧ (q ∧ p ⇒ ¬p):

p q ¬p ⇒ q (q ∧ p ⇒ ¬p) (¬p ⇒ q) ∧ (q ∧ p ⇒ ¬p)


T T T F F
T F T T T
F T T T T
F F F T F

Thus the given proposition is logically equivalent to p ⊕ q.



3.60 The expression p ⇒ p is always true, and so too is q ⇒ True. Thus we can immediately rewrite the statement as

(p ⇒ ¬p) ⇒ ((q ⇒ (p ⇒ p)) ⇒ p) ≡ (p ⇒ ¬p) ⇒ ((q ⇒ True) ⇒ p) ≡ (p ⇒ ¬p) ⇒ (True ⇒ p).

If p is true, then (p ⇒ ¬p) ⇒ (True ⇒ p) is (True ⇒ False) ⇒ (True ⇒ True), which is just False ⇒ True, or True.
If p is false, then (p ⇒ ¬p) ⇒ (True ⇒ p) is (False ⇒ True) ⇒ (True ⇒ False), which is just True ⇒ False, or False.
Thus the whole expression is logically equivalent to p.

3.61 Both p ⇒ p and ¬p ⇒ ¬p are tautologies, so the given proposition (p ⇒ p) ⇒ (¬p ⇒ ¬p) ∧ q is equivalent to
True ⇒ True ∧ q. That proposition is true when q is true, and false when q is false—so the whole expression is logically
equivalent to q.

3.62 The claim states that every proposition over the single variable p is either logically equivalent to p or it’s logically
equivalent to ¬p. The claim is false; a proposition over a single variable p might be logically equivalent to True or False
instead of being logically equivalent to p or ¬p. (For example, p ∧ ¬p.) However, every proposition φ over p is logically
equivalent to one of {p, ¬p, True, False}. To prove this, observe that there are only two truth assignments for φ, one
where p is true and one where p is false. Thus there are only four possible truth tables for φ:

p (1) (2) (3) (4)


T T T F F
F T F T F

These columns are, respectively: (1) True, (2) p, (3) ¬p, and (4) False.

3.63 p q p⇒q ¬q (p ⇒ q) ∧ ¬q ¬p (p ⇒ q) ∧ ¬q ⇒ ¬p
T T T F F F T
T F F T F F T
F T T F F T T
F F T T T T T

3.64 p q p∨q p ⇒ (p ∨ q)
T T T T
T F T T
F T T T
F F F T

3.65 p q p∧q (p ∧ q) ⇒ p
T T T T
T F F T
F T F T
F F F T

3.66 p q ¬p p∨q (p ∨ q) ∧ ¬p (p ∨ q) ∧ ¬p ⇒ q
T T F T F T
T F F T F T
F T T T T T
F F T F F T

3.67 p q ¬p p⇒q ¬p ⇒ q (p ⇒ q) ∧ (¬p ⇒ q) [(p ⇒ q) ∧ (¬p ⇒ q)] ⇒ q


T T F T T T T
T F F F T F T
F T T T T T T
F F T T F F T

3.68 p q r p⇒q q⇒r (p ⇒ q) ∧ (q ⇒ r) p⇒r [(p ⇒ q) ∧ (q ⇒ r)] ⇒ (p ⇒ r)


T T T T T T T T
T T F T F F F T
T F T F T F T T
T F F F T F F T
F T T T T T T T
F T F T F F T T
F F T T T T T T
F F F T T T T T

3.69 p q r q∧r p⇒q p⇒r p⇒q∧r (p ⇒ q) ∧ (p ⇒ r) ⇔ p ⇒ q ∧ r


T T T T T T T T
T T F F T F F T
T F T F F T F T
T F F F F F F T
F T T T T T T T
F T F F T T T T
F F T F T T T T
F F F F T T T T

3.70 p q r q∨r p⇒q p⇒r p⇒q∨r (p ⇒ q) ∨ (p ⇒ r) ⇔ p ⇒ q ∨ r


T T T T T T T T
T T F T T F T T
T F T T F T T T
T F F F F F F T
F T T T T T T T
F T F T T T T T
F F T T T T T T
F F F F T T T T

3.71 p q r q∨r p ∧ (q ∨ r) p∧q p∧r (p ∧ q) ∨ (p ∧ r) p ∧ (q ∨ r) ⇔ (p ∧ q) ∨ (p ∧ r)


T T T T T T T T T
T T F T T T F T T
T F T T T F T T T
T F F F F F F F T
F T T T F F F F T
F T F T F F F F T
F F T T F F F F T
F F F F F F F F T

3.72 p q r q⇒r p ⇒ (q ⇒ r) p∧q p∧q⇒r p ⇒ (q ⇒ r) ⇔ p ∧ q ⇒ r


T T T T T T T T
T T F F F T F T
T F T T T F T T
T F F T T F T T
F T T T T F T T
F T F F T F T T
F F T T T F T T
F F F T T F T T

3.73 p q p∧q p ∨ (p ∧ q)
T T T T
T F F T
F T F F
F F F F
Because the p column and p ∨ (p ∧ q) column match, the proposition p ∨ (p ∧ q) ⇔ p is a tautology.

3.74 p q p∨q p ∧ (p ∨ q)
T T T T
T F T T
F T T F
F F F F
Because the p column and p ∧ (p ∨ q) column match, the proposition p ∧ (p ∨ q) ⇔ p is a tautology.

3.75 p q p⊕q p∨q p⊕q⇒p∨q


T T F T T
T F T T T
F T T T T
F F F F T

3.76 p q p∧q ¬(p ∧ q) ¬p ¬q ¬p ∨ ¬ q


T T T F F F F
T F F T F T T
F T F T T F T
F F F T T T T

3.77 p q p∨q ¬(p ∨ q) ¬p ¬q ¬p ∧ ¬ q


T T T F F F F
T F T F F T F
F T T F T F F
F F F T T T T

3.78 p q r q∨r p ∨ (q ∨ r) p∨q (p ∨ q) ∨ r


T T T T T T T
T T F T T T T
T F T T T T T
T F F F T T T
F T T T T T T
F T F T T T T
F F T T T F T
F F F F F F F

3.79 p q r q∧r p ∧ (q ∧ r) p∧q (p ∧ q) ∧ r


T T T T T T T
T T F F F T F
T F T F F F F
T F F F F F F
F T T T F F F
F T F F F F F
F F T F F F F
F F F F F F F

3.80 p q r q⊕r p ⊕ (q ⊕ r) p⊕q (p ⊕ q) ⊕ r


T T T F T F T
T T F T F F F
T F T T F T F
T F F F T T T
F T T F F T F
F T F T T T T
F F T T T F T
F F F F F F F

3.81 p q r q⇔r p ⇔ (q ⇔ r) p⇔q (p ⇔ q) ⇔ r


T T T T T T T
T T F F F T F
T F T F F F F
T F F T T F T
F T T T F F F
F T F F T F T
F F T F T T T
F F F T F T F

3.82 p q p⇒q ¬p ¬p ∨ q
T T T F T
T F F F F
F T T T T
F F T T T

3.83 p q r q⇒r p ⇒ (q ⇒ r) p∧q p∧q⇒r


T T T T T T T
T T F F F T F
T F T T T F T
T F F T T F T
F T T T T F T
F T F F T F T
F F T T T F T
F F F T T F T

3.84 p q p⇔q ¬p ¬q ¬p ⇔ ¬ q
T T T F F T
T F F F T F
F T F T F F
F F T T T T

3.85 p q p⇒q ¬(p ⇒ q) ¬q p ∧ ¬q


T T T F F F
T F F T T T
F T T F F F
F F T F T F

3.86 If p stands for an expression that causes an error when it’s evaluated, then the two blocks of code aren’t equivalent.
The first block causes a crash; the second doesn’t. For example, if p stands for the expression 3 / 0 > 1, the left block
crashes and the second merrily sets x to 51.

3.87 The tautology is p ⇔ (p ⇔ True). Observe that p ⇔ (p ⇔ True) is logically equivalent to (p ⇔ p) ⇔ True by the
associativity of ⇔. And p ⇔ p is a tautology, by the idempotence of ⇔.

3.88 We must find a circuit that’s true when the true inputs are {q} or {r}, and false when the true inputs are {p} or {p, q}
or {p, q, r}. The only one-gate circuit of this form is ¬p—though there are many less efficient ways to express a logically
equivalent solution, or an inequivalent one, with additional gates. One equivalent example, among many, is ¬(p ∧ (r ∨ p));
one inequivalent answer is ¬p ∧ (q ∨ r).

3.89 We must find a circuit that’s true when the true inputs are {p, q} or {p, r}, and false when the true inputs are {p} or
{q} or {r}. No 0- or 1-gate circuits achieve this specification, but the 2-gate circuit p ∧ (q ∨ r) does. (Again, there are
many less efficient ways of expressing an equivalent formula.)

3.90 There are two possible settings that can make the light go on (the only types of inputs that aren’t fully specified in
the question have zero or two true inputs):

• If the light is on when the true inputs are {}, then the circuit is equivalent to ¬(p ∨ q ∨ r).
• If the light is on when the true inputs are some pair of inputs (say, {p, q}, without loss of generality), then the circuit
is equivalent to p ∧ q ∧ ¬r. (If it were a different pair of input variables that caused the light to go on, then those two
input variables would appear unnegated in the circuit’s formula, and the third variable would appear negated.)

There is no circuit with two or fewer gates that achieves this specification.

3.91 There are two possible pairs of settings that can make the light go on (the only types of inputs that aren’t fully specified
in the question have zero or one true inputs):

• If the light is on when the true inputs are {} or {p}, then the simplest circuit is equivalent to ¬(q ∨ r). (It doesn’t matter
which singleton input causes the bulb to go on, so without loss of generality let’s call it p.)
• If the light is on when the true inputs are {p} or {q} (without loss of generality, similarly), then the circuit ¬(r∨(p∧q))
matches the specification. (In this case, the {} input also causes the bulb to go on.) There is no circuit with two or fewer
gates that matches this case.

The former circuit, of the form ¬(q ∨ r), uses only two gates; the latter requires three.

3.92 The circuit is equivalent to False, which can be written as p∧¬p (and cannot be expressed with fewer than two gates).

3.93 The sixteen propositions over p and q are listed in Figure 4.34, and reproduced here, one row per proposition, with
its truth value under each of the four truth assignments to ⟨p, q⟩:

 #   proposition   p=T,q=T   p=T,q=F   p=F,q=T   p=F,q=F
 1   True             T         T         T         T
 2   p ∨ q            T         T         T         F
 3   p ⇐ q            T         T         F         T
 4   p                T         T         F         F
 5   p ⇒ q            T         F         T         T
 6   q                T         F         T         F
 7   p ⇔ q            T         F         F         T
 8   p ∧ q            T         F         F         F
 9   p nand q         F         T         T         T
10   p ⊕ q            F         T         T         F
11   ¬q               F         T         F         T
12   (unnamed)        F         T         F         F
13   ¬p               F         F         T         T
14   (unnamed)        F         F         T         F
15   p nor q          F         F         F         T
16   False            F         F         F         F

Of these, here are the ones that can be expressed with zero, one, or two ∧, ∨, and ¬ gates:

• Two can be expressed with 0 gates: p and q.


• Four can be expressed with 1 gate: ¬p, ¬q, p ∧ q, and p ∨ q.
• Eight can be expressed with 2 gates: true (p ∨ ¬p); false (p ∧ ¬p); nand (¬(p ∧ q)), nor (¬(p ∨ q)); p ⇒ q (as ¬p ∨ q);
q ⇒ p (as ¬q ∨ p); column #12 (p ∧ ¬q); and column #14 (¬p ∧ q).

The only two propositions that can’t be expressed with two or fewer gates are p ⊕ q and p ⇔ q.
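As a supplementary check (not part of the printed solution), the claim that exactly 14 of the 16 functions are expressible with at most two gates, with p ⊕ q and p ⇔ q as the two exceptions, can be verified by brute force. The sketch below represents each circuit by its truth table over the four assignments to (p, q):

```python
from itertools import product

ASSIGNMENTS = list(product([False, True], repeat=2))  # all (p, q) pairs
P = tuple(p for p, q in ASSIGNMENTS)  # truth table of the bare input p
Q = tuple(q for p, q in ASSIGNMENTS)  # truth table of the bare input q

def reachable(max_gates):
    """All truth tables computable from p and q using at most max_gates
    gates drawn from {not, and, or}."""
    by_cost = {0: {P, Q}}  # by_cost[k] = tables computable with exactly k gates
    for k in range(1, max_gates + 1):
        new = {tuple(not v for v in t) for t in by_cost[k - 1]}  # top gate: not
        for i in range(k):  # top gate: and/or; children use i and k-1-i gates
            for s in by_cost[i]:
                for t in by_cost[k - 1 - i]:
                    new.add(tuple(a and b for a, b in zip(s, t)))
                    new.add(tuple(a or b for a, b in zip(s, t)))
        by_cost[k] = new
    return set().union(*by_cost.values())

two_gate = reachable(2)
XOR = tuple(p != q for p, q in ASSIGNMENTS)
IFF = tuple(p == q for p, q in ASSIGNMENTS)
```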

3.94 There are 84 different propositions, which is verified by the output of the Python program in Figure S.3.3.

3.95 The proposition p ⊕ q ⊕ r ⊕ s ⊕ t is true when an odd number of the Boolean variables {p, q, r, s, t} are true—that
is, when (p + q + r + s + t) mod 2 = 1.
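The parity characterization can be confirmed exhaustively over all 32 truth assignments. A small Python sketch (supplementary, not from the printed solution):

```python
from itertools import product

def xor_chain(values):
    """Evaluate v1 XOR v2 XOR ... XOR vk, left to right."""
    result = False
    for v in values:
        result = (result != v)
    return result

# The chained XOR is true exactly when an odd number of inputs are true.
parity_matches = all(
    xor_chain(bits) == (sum(bits) % 2 == 1)
    for bits in product([False, True], repeat=5)
)
```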

3.96 Here is the pseudocode (shown below, after Figure S.3.3):


36 Logic

def equivalent_propositions(propA, propB, truth_assignments):
    '''Is propA's truth value equal to propB's in each rho in truth_assignments?'''
    return all([propA.evaluate(rho) == propB.evaluate(rho) for rho in truth_assignments])

def all_formulae(variables, numGates):
    '''Compute a list of all Propositions over the given variables with the given number of gates.'''
    if numGates == 0:
        return [Proposition(var) for var in variables]
    else:
        subformulae = {}
        for i in range(numGates):
            subformulae[i] = all_formulae(variables, i)
        result = [Proposition(["not", phi.getFormula()]) for phi in subformulae[numGates-1]]
        for i in range(numGates):
            result += [Proposition([operator, phi.getFormula(), psi.getFormula()])
                       for operator in ["and", "or"]
                       for phi in subformulae[i]
                       for psi in subformulae[numGates-i-1]]
        return result

def cull_equivalent_formulae(formulae, truth_assignments):
    '''Filter out any logically equivalent formulae in the given list of formulae.'''
    uniqueList = []
    for phi in formulae:
        if not any(equivalent_propositions(phi, psi, truth_assignments) for psi in uniqueList):
            uniqueList.append(phi)
    return uniqueList

variables = ["p", "q", "r"]
gate_count = 3
allAssignments = all_truth_assignments(variables)
allFormulas = [phi for num_gates in range(gate_count+1)
               for phi in all_formulae(variables, num_gates)]
allUniqueFormulas = cull_equivalent_formulae(allFormulas, allAssignments)
print(len(allUniqueFormulas), "distinct formulas are representable with", gate_count, "gates.")
# Output: 84 distinct formulas are representable with 3 gates.

Figure S.3.3 Python code finding circuits with inputs {p, q, r}, and at most three gates chosen from {∧, ∨, ¬}. The
code here relies on the class and function definitions found in Figure S.3.1.

for y := 1, 2, . . . , height:
    for x := 1, 2, . . . , width:
        if P[x, y] is more white than black then
            error := “white” − P[x, y]
            P[x, y] := “white”
        else
            error := “black” − P[x, y]
            P[x, y] := “black”
        if x < width then distribute error to the E.
        if x > 1 and y < height then distribute error to the SW.
        if y < height then distribute error to the S.
        if x < width and y < height then distribute error to the SE.

3.97 r ∨ (p ∧ q)

3.98 (q ∧ r) ∨ (¬q ∧ ¬r) ∨ (¬p ∧ ¬q ∧ r) ∨ (¬p ∧ q ∧ ¬r)

3.99 p ∨ q
3.3 Propositional Logic: Some Extensions 37

3.100 ¬p ∧ ¬q ∧ ¬r

3.101 (p ∨ r) ∧ (q ∨ r)

3.102 p ∧ (¬q ∨ ¬r)

3.103 (p ∨ ¬q) ∧ (¬q ∨ r)

3.104 p ∧ (q ∨ r)

3.105 We can produce a one-clause 3CNF tautology: for example, p ∨ ¬p ∨ q.

3.106 Using De Morgan’s Laws, we know that a clause (a ∨ b ∨ c) is logically equivalent to ¬(¬a ∧ ¬b ∧ ¬c). That is, the
truth assignment a = b = c = False does not satisfy any proposition · · · ∧ (a ∨ b ∨ c) ∧ · · ·. However, this particular
clause is satisfied by any other truth assignment. Thus, we’ll need to use a different clause to rule out each of the eight
possible truth assignments. So the smallest 3CNF formula that’s not satisfiable has three variables and eight clauses, each
of which rules out one possible satisfying truth assignment:
(p ∨ q ∨ r) ∧ (p ∨ q ∨ ¬r) ∧ (p ∨ ¬q ∨ r) ∧ (p ∨ ¬q ∨ ¬r)
∧ (¬p ∨ q ∨ r) ∧ (¬p ∨ q ∨ ¬r) ∧ (¬p ∨ ¬q ∨ r) ∧ (¬p ∨ ¬q ∨ ¬r).
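A quick brute-force check (supplementary to the solution) confirms that this eight-clause formula is unsatisfiable, and that every clause is needed: deleting any single clause makes the rest satisfiable.

```python
from itertools import product

# One clause per sign pattern over (p, q, r); the pair (var, negated)
# encodes a literal, so signs (0, 0, 0) encodes the clause p ∨ q ∨ r.
clauses = [[(var, bool(sign)) for var, sign in enumerate(signs)]
           for signs in product([0, 1], repeat=3)]

def satisfies(assignment, clauses):
    # a clause is satisfied when some literal in it is true under the assignment
    return all(any(assignment[var] != negated for var, negated in clause)
               for clause in clauses)

satisfiable = any(satisfies(rho, clauses)
                  for rho in product([False, True], repeat=3))
```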

3.107 Tautological DNF formulae correspond to satisfiable CNF formulae. A Boolean formula φ is a tautology if and
only if ¬φ is not satisfiable, and for an m-clause φ in 3DNF, by an application of De Morgan’s Laws we see that ¬φ
is equivalent to an m-clause 3CNF formula. Thus the smallest 3DNF formula that’s a tautology has three variables and
eight clauses, each of which “rules in” one satisfying truth assignment.
(p ∧ q ∧ r) ∨ (p ∧ q ∧ ¬r) ∨ (p ∧ ¬q ∧ r) ∨ (p ∧ ¬q ∧ ¬r)
∨ (¬p ∧ q ∧ r) ∨ (¬p ∧ q ∧ ¬r) ∨ (¬p ∧ ¬q ∧ r) ∨ (¬p ∧ ¬q ∧ ¬r).

3.108 We can give a one-clause non-satisfiable 3DNF formula: p ∧ ¬p ∧ q.

3.109 One way to work out this quantity is to split up the types of legal clauses based on the number of unnegated variables
in them. We can have clauses with 0, 1, 2, or 3 unnegated variables:
• 3 unnegated variables: the only such clause is p ∨ q ∨ r.
• 2 unnegated variables: there are three different choices of which variable fails to appear in unnegated form (p, q, r), and
there are three different choices of which variable does appear in negated form (again, p, q, r). Thus there are 3 · 3 = 9
total such clauses.
• 1 unnegated variable: again, there are three choices for the unnegated variable, and three choices for the omitted
negated variable. Again there are 3 · 3 = 9 total such clauses.
• 0 unnegated variables: the only such clause is ¬p ∨ ¬q ∨ ¬r.
So the total is 1 + 9 + 9 + 1 = 20 clauses.
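The count 1 + 9 + 9 + 1 = 20 can be double-checked mechanically. Assuming, as the counting above does, that a clause is determined by its set of three distinct literals (a repeated literal adds nothing, since x ∨ x ≡ x), the clauses are exactly the 3-element subsets of the six literals, and C(6, 3) = 20. A supplementary Python sketch:

```python
from itertools import combinations

literals = [(v, negated) for v in ["p", "q", "r"] for negated in (False, True)]

# A clause is determined by its set of three distinct literals.
clauses = list(combinations(literals, 3))

def unnegated_count(clause):
    return sum(1 for _, negated in clause if not negated)

# Tally the clauses by their number of unnegated literals.
by_count = {k: sum(1 for c in clauses if unnegated_count(c) == k)
            for k in range(4)}
```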

3.110 Because a formula in 3CNF is a conjunction of clauses, the entire proposition is a tautology if and only if every
clause is a tautology. The only tautological clauses are of the form a ∨ ¬a ∨ b, so the largest tautological 3CNF formula
has six clauses:
(p ∨ ¬p ∨ q) ∧ (p ∨ ¬p ∨ r) ∧ (q ∨ ¬q ∨ p) ∧ (q ∨ ¬q ∨ r) ∧ (r ∨ ¬r ∨ p) ∧ (r ∨ ¬r ∨ q).

3.111 Suppose that φ is satisfied by the all-true truth assignment—that is, when p = q = r = True then φ itself evaluates
to true. It’s fairly easy to see that if the all-true truth assignment is the only truth assignment that satisfies φ then there
can be more clauses in φ than if there are many satisfying truth assignments. We cannot have the clause ¬p ∨ ¬q ∨ ¬r
in φ; otherwise, the all-true assignment doesn’t satisfy φ. But any clause that contains at least one unnegated literal can
be in φ. Those clauses are:
• 1 clause with 3 unnegated variables: the only such clause is p ∨ q ∨ r.
• 9 clauses with 2 unnegated variables: there are three choices of which variable fails to appear in unnegated form, and
there are three different choices of which variable does appear in negated form.

• 9 clauses with 1 unnegated variable: again, there are three choices for the unnegated variable, and three choices for
the omitted negated variable.
So the total is 1 + 9 + 9 = 19 clauses:
(p ∨ q ∨ r)
∧ (¬p ∨ q ∨ r) ∧ (p ∨ ¬q ∨ r) ∧ (p ∨ q ∨ ¬r) ∧ (¬p ∨ p ∨ q) ∧ (¬p ∨ p ∨ r)
∧ (¬q ∨ q ∨ p) ∧ (¬q ∨ q ∨ r) ∧ (¬r ∨ r ∨ p) ∧ (¬r ∨ r ∨ q)
∧ (¬p ∨ ¬q ∨ r) ∧ (¬p ∨ q ∨ ¬r) ∧ (p ∨ ¬q ∨ ¬r) ∧ (p ∨ ¬p ∨ ¬q) ∧ (p ∨ ¬p ∨ ¬r)
∧ (q ∨ ¬q ∨ ¬p) ∧ (q ∨ ¬q ∨ ¬r) ∧ (r ∨ ¬r ∨ ¬p) ∧ (r ∨ ¬r ∨ ¬q).
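As a supplementary check (not part of the printed solution), the 19 clauses are exactly the 3-element subsets of the six literals that contain at least one unnegated literal, and a brute-force evaluation confirms that their conjunction is satisfied only by the all-true assignment:

```python
from itertools import combinations, product

literals = [(var, negated) for var in range(3) for negated in (False, True)]

# the 3-literal clauses that contain at least one unnegated literal
clauses = [c for c in combinations(literals, 3)
           if any(not negated for _, negated in c)]

def satisfies(rho, clauses):
    # a clause is satisfied when some literal in it is true under rho
    return all(any(rho[var] != negated for var, negated in clause)
               for clause in clauses)

satisfying = [rho for rho in product([False, True], repeat=3)
              if satisfies(rho, clauses)]
```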

3.4 An Introduction to Predicate Logic


3.112 P(x) = x has strong typing and x is object-oriented.

3.113 P(x) = x is imperative.

3.114 P(x) = x has strong typing and x is not object-oriented.

3.115 P(x) = x has either scope, or x is both imperative and has strong typing.

3.116 P(x) = x is either object-oriented or scripting.

3.117 P(x) = x does not have strong typing or x is not object-oriented.

3.118 n ≥ 8

3.119 ∃i ∈ {1, 2, . . . , n} : isLower(xi )

3.120 ∃i ∈ {1, 2, . . . , n} : ¬isLower(xi ) ∧ ¬isUpper(xi ) ∧ ¬isDigit(xi )

3.121 If there are no chairs at all in Bananaland, then both “all chairs in Bananaland are green” and “no chairs in Bananaland
are green” are true.

3.122 The given predicate is logically equivalent to [age(x) < 18]∨[gpa(x) ≥ 3.0]. Here’s the relevant truth table, writing
p for [age(x) < 18] and q for [gpa(x) ≥ 3.0]:

p q ¬p ∧ q p ∨ (¬p ∧ q) p∨q
T T F T T
F T T T T
T F F T T
F F F F F

3.123 The given predicate is logically equivalent to ¬[takenCS(x)]. Here’s the relevant truth table, writing p for takenCS(x)
and q for [home(x) = Hawaii]:

p q q∧p q⇒q∧p ¬(q ⇒ q ∧ p) p ⇒ ¬(q ⇒ (q ∧ p)) ¬p


T T T T F F F
T F F T F F F
F T F F T T T
F F F T F T T

3.124 The given predicate is logically equivalent to hasMajor(x) ∧ ([schoolYear(x) ̸= 3] ∨ ¬onCampus(x)). Here’s the
relevant truth table, writing p for hasMajor(x) and q for [schoolYear(x) = 3] and r for onCampus(x):

p q r p ∧ ¬q ∧ r p ∧ ¬q ∧ ¬r p ∧ q ∧ ¬r (p ∧ ¬q ∧ r) ∨ (p ∧ ¬q ∧ ¬r) ∨ (p ∧ q ∧ ¬r) p ∧ (¬q ∨ ¬r)


T T T F F F F F
T T F F F T T T
T F T T F F T T
T F F F T F T T
F T T F F F F F
F T F F F F F F
F F T F F F F F
F F F F F F F F

3.125 The claim is false. Let’s think of this problem as a question about a proposition involving {True, False, ∧, ∨, ¬, ⇒}
and—once each—p and q. We’ll prove that the claim is false by proving that it’s impossible to express p ⇔ q and p ⊕ q
with this kind of proposition.
First, observe that any proposition involving p and q once each must involve a subproposition of the form φ ⋄ ψ, where
φ uses p and ψ uses q. (In the tree-based representation, φ ⋄ ψ is the least common ancestor of p and q.)
But φ, then, is a proposition involving p as the only variable, and ψ is a proposition involving q as the only variable.
That means that φ can only be logically equivalent to one of four functions: True, False, p, ¬p. And ψ can only be
logically equivalent to one of four functions: True, False, q, ¬q. (See Exercise 3.62.) Then what can φ ⋄ ψ represent, for
a connective ∧, ∨, or ⇒ (or ⇐)?
If φ doesn’t express either p or ¬p, or if ψ doesn’t express either q or ¬q, the entire proposition can only be equivalent
to one of the six logical functions that do not involve both p and q: True, False, p, ¬p, q, ¬q. Otherwise, we are left with
only these possible expressions:

                  φ ≡ p                        φ ≡ ¬p
ψ ≡ q:      p ∧ q                        ¬p ∧ q
            p ∨ q                        ¬p ∨ q
            p ⇒ q ≡ ¬p ∨ q               ¬p ⇒ q ≡ p ∨ q
            p ⇐ q ≡ ¬q ∨ p               ¬p ⇐ q ≡ ¬q ∨ ¬p
ψ ≡ ¬q:     p ∧ ¬q                       ¬p ∧ ¬q
            p ∨ ¬q                       ¬p ∨ ¬q
            p ⇒ ¬q ≡ ¬p ∨ ¬q             ¬p ⇒ ¬q ≡ p ∨ ¬q
            p ⇐ ¬q ≡ p ∨ q               ¬p ⇐ ¬q ≡ ¬p ∨ q

But φ ⋄ ψ therefore can only compute the following functions:

• the six functions not involving both p and q: True, False, p, ¬p, q, ¬q.
• the four conjunctions: p ∧ q, ¬p ∧ q, p ∧ ¬q, ¬p ∧ ¬q.
• the four disjunctions: p ∨ q, ¬p ∨ q, p ∨ ¬q, ¬p ∨ ¬q.

(The implications are all logically equivalent to one of these disjunctions, as the table shows.) There are only 14 entries
here, which leaves two unimplementable logical functions. (See Exercise 3.93, or Figure 4.34.) They are, perhaps
intuitively, p ⇔ q and p ⊕ q.
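The case analysis can also be checked mechanically (a supplementary sketch, not part of the printed solution): take φ ranging over the four functions {True, False, p, ¬p}, take ψ over {True, False, q, ¬q}, combine them with ∧, ∨, ⇒, ⇐, and collect the resulting truth tables. Exactly 14 functions appear, the set is closed under negation (so ¬ gates above the connective add nothing new), and p ⊕ q and p ⇔ q are missing:

```python
from itertools import product

ASSIGN = list(product([False, True], repeat=2))  # all (p, q) pairs

def table(f):
    return tuple(f(p, q) for p, q in ASSIGN)

# φ can only be equivalent to True, False, p, or ¬p (and ψ likewise for q).
phis = [table(lambda p, q: True), table(lambda p, q: False),
        table(lambda p, q: p), table(lambda p, q: not p)]
psis = [table(lambda p, q: True), table(lambda p, q: False),
        table(lambda p, q: q), table(lambda p, q: not q)]

connectives = [lambda a, b: a and b,         # and
               lambda a, b: a or b,          # or
               lambda a, b: (not a) or b,    # implies
               lambda a, b: a or (not b)]    # implied-by

achievable = {tuple(c(a, b) for a, b in zip(s, t))
              for s in phis for t in psis for c in connectives}

xor = table(lambda p, q: p != q)
iff = table(lambda p, q: p == q)
```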

3.126 “I am ambivalent about writing my program in Java.”

3.127 “I was excited to learn about a study abroad computer science program on the island of Java.”

3.128 “I love to start my day with a cup o’ Java.”

3.129 “Implementing this algorithm in Java really helped me get with the program.”

3.130 We can prove that this statement is false by counterexample: the number n = 2 is prime, but n/2 = 1 is in Z.

3.131 ∃x : ¬[Q(x) ⇒ P(x)], which is logically equivalent to ∃x : Q(x) ∧ ¬P(x).

3.132 “There exists a negative entry in the array A.”



3.133 The sentence is ambiguous, because “or” of “parentheses or braces” might be inclusive or exclusive, though the
much more natural reading is the former. (It’s a rare programming language that has two fundamentally different ways
of expressing block structure.) Negating that version yields: “There exists a decent programming language that denotes
block structure with neither parentheses nor braces.”

3.134 “No odd number is evenly divisible by any other odd number.”

3.135 This sentence is ambiguous: the type of quantification expressed by “a lake” could be universal or existential. (In
other words: are we talking about some particular lake, and claiming that there’s some place in Minnesota that’s far from
that one lake? Or are we saying that there’s a place in Minnesota that is far away from all lakes?) Thus there are two
reasonable readings of the original sentence:
∃x ∈ Minnesota : ∃ℓ ∈ Lakes : d(x, ℓ) ≥ 10
∃x ∈ Minnesota : ∀ℓ ∈ Lakes : d(x, ℓ) ≥ 10.
The second reading seems more natural, and we can negate that reading as follows: “For every point in Minnesota, there
is some lake within 10 miles of that point.”

3.136 This sentence is ambiguous; the order of quantification is unclear. Let’s write A to denote the set of sorting algorithms
and In to denote the set of n-element arrays. Then there are two natural readings:
∀x ∈ A : ∃y ∈ In : x(y) takes ≥ n log n steps
and ∃y ∈ In : ∀x ∈ A : x(y) takes ≥ n log n steps
(There’s also some ambiguity in the meaning of n, which may be implicitly quantified or may have a particular numerical
value from context. We’ll leave this ambiguity alone.)
The first reading, informally: after I see your code for sorting, it’s possible for me to construct an array that makes your
particular sorting algorithm slow. Negated: “There’s a sorting algorithm that, for every n-element array, takes fewer than
n log n steps on that array.” (Some sorting algorithm is always fast.)
The second reading, informally: there’s a single devilish array that makes every sorting algorithm slow. Negated: “For
every n-element array, there’s some sorting algorithm that takes fewer than n log n steps on that array.” (No array is always
sorted slowly.)

3.137 Assume the antecedent—that is, assume that there exists a particular value x∗ ∈ S such that P(x∗ ) ∧ Q(x∗ ). Then
certainly (1) there exists an x ∈ S such that P(x) (namely, x∗ ) and (2) there exists an x ∈ S such that Q(x) (again, namely,
x∗ ). But (1) just expresses ∃x ∈ S : P(x), and (2) just expresses ∃x ∈ S : Q(x).

3.138 We’re asked to show that ∀x ∈ S : [P(x) ∧ Q(x)] is true if and only if [∀x ∈ S : P(x)] ∧ [∀x ∈ S : Q(x)] is. We’ll be
lazy and use Example 3.44 to solve the problem, by showing that the desired claim is an instance of Example 3.44:

   [∀x ∈ S : P(x) ∧ Q(x)] ⇔ [∀x ∈ S : P(x)] ∧ [∀x ∈ S : Q(x)]
≡ ¬[∀x ∈ S : P(x) ∧ Q(x)] ⇔ ¬[[∀x ∈ S : P(x)] ∧ [∀x ∈ S : Q(x)]]           p ⇔ q ≡ ¬p ⇔ ¬q
≡ [∃x ∈ S : ¬[P(x) ∧ Q(x)]] ⇔ [¬[∀x ∈ S : P(x)] ∨ ¬[∀x ∈ S : Q(x)]]        De Morgan’s Laws and their predicate-logic analogues
≡ [∃x ∈ S : ¬P(x) ∨ ¬Q(x)] ⇔ [[∃x ∈ S : ¬P(x)] ∨ [∃x ∈ S : ¬Q(x)]].       De Morgan’s Laws and their predicate-logic analogues
But both ¬P and ¬Q are predicates too! So the given proposition is logically equivalent to the theorem from Example 3.44,
and thus is a theorem as well.
 
3.139 Suppose that ∀x ∈ S : P(x) ⇒ Q(x) and ∀x ∈ S : P(x). We must show that ∀x ∈ S : Q(x) is true. But let x∗ ∈ S
be arbitrary. By the two assumptions, we have that P(x∗ ) and P(x∗ ) ⇒ Q(x∗ ). Therefore, by Modus Ponens, we have
Q(x∗ ). Because x∗ was arbitrary, we’ve argued that ∀x ∈ S : Q(x).

3.140 The given statements, respectively, express “every condition-A-satisfying element also satisfies condition B” and
“every element satisfies if-A-then-B.” These two conditions are true precisely when {x ∈ S : P(x)} ⊆ {x ∈ S : Q(x)}.

3.141 The given statements, respectively, express “there exists a condition-A-satisfying element that also satisfies con-
dition B” and “there exists an element that satisfies both condition A and condition B.” These two conditions are true
precisely when {x ∈ S : P(x)} ∩ {x ∈ S : Q(x)} is nonempty.

3.142 For intuition, here are two completely logically equivalent (English-language) statements:

(1) Ronald Reagan was the tallest U. S. president.   [this entire sentence is φ]
(2) For every positive integer n, Ronald Reagan was the tallest U. S. president.   [everything after the quantifier is φ]

Because n doesn’t appear in φ, either φ is true for every integer n (in which case φ and ∀n : φ are both true), or φ is
true for no integer n (in which case φ and ∀n : φ are both false).
To prove the equivalence, first suppose φ is true. Then, for an arbitrary x ∈ S, we have that φ is still true (because φ
doesn’t depend on x). That is, ∀x ∈ S : φ.
For the other direction, suppose that ∀x ∈ S : φ. Let a ∈ S be arbitrary. Then, plugging in a for every free occurrence
of x in φ must yield a true statement—but there are no such occurrences of x, so φ itself must be true.

3.143 Let’s consider two cases: either φ is true, or φ is false.


If φ is true, then the left-hand side is obviously true (it’s True ∨ something). And the right-hand side is also true: φ is
true (and doesn’t depend on x), so for any particular value of x, we still have that φ is true—and therefore so is φ ∨ P(x).
Because that’s true for any x ∈ S, we have ∀x ∈ S : φ ∨ P(x) too.
If φ is false, then the left-hand side is true if and only if ∀x ∈ S : P(x) (because False ∨ something ≡ something).
And, on the right-hand side, for any particular value of x, we have that φ ∨ P(x) ≡ P(x). Thus the right-hand side is also
true if and only if ∀x ∈ S : P(x).

3.144 We’ll use Exercise 3.143 and De Morgan’s Laws to show this equivalence.
   [φ ∧ [∃x ∈ S : P(x)]] ⇔ [∃x ∈ S : φ ∧ P(x)]
≡ ¬[φ ∧ [∃x ∈ S : P(x)]] ⇔ ¬[∃x ∈ S : φ ∧ P(x)]           Exercise 3.84: p ⇔ q ≡ ¬p ⇔ ¬q
≡ [¬φ ∨ ¬[∃x ∈ S : P(x)]] ⇔ ¬[∃x ∈ S : φ ∧ P(x)]          De Morgan’s Laws
≡ [¬φ ∨ [∀x ∈ S : ¬P(x)]] ⇔ [∀x ∈ S : ¬φ ∨ ¬P(x)].        De Morgan’s Laws

But this last statement is exactly Exercise 3.143 (where the proposition is ¬φ and the predicate is ¬P(x)).

3.145 Let’s consider two cases: either φ is true, or φ is false.


If φ is false, then the left-hand side is vacuously true, and the right-hand side is also true: φ is false (and doesn’t
depend on x), so for any particular value of x, we still have that φ is false—and therefore so is φ ⇒ P(x). Because that’s
true for any x ∈ S, we have ∃x ∈ S : φ ⇒ P(x) too.
If φ is true, then the left-hand side is true if and only if [∃x ∈ S : P(x)]. And, on the right-hand side, for any particular
value of x, we have that True ⇒ P(x) if and only if P(x), so the right-hand side is also true if and only if ∃x ∈ S : P(x).

3.146 Again, we’ll be lazy: p ⇒ q ≡ ¬p ∨ q, so

   [∃x ∈ S : P(x)] ⇒ φ ≡ ¬[∃x ∈ S : P(x)] ∨ φ
                       ≡ [∀x ∈ S : ¬P(x)] ∨ φ          De Morgan’s Laws
                       ≡ ∀x ∈ S : [¬P(x) ∨ φ]          Exercise 3.143
                       ≡ ∀x ∈ S : [P(x) ⇒ φ].

3.147 Let φ be “x = 0” and let P(x) be “x ̸= 0” and let S be R.

Then the left-hand side is the predicate [x = 0] ∨ [∀x ∈ R : x ̸= 0], which is true when x = 0 but false when x = 42.
But the right-hand side is ∀x ∈ R : [x = 0] ∨ [x ̸= 0], which is a theorem: it is true for every real number x.

3.5 Nested Quantifiers


3.148 The proposition is true. For any c ∈ R, we choose the constant function fc that always outputs c—that is, the function
defined as fc (x) = c for all x ∈ R. (For example, f3 (x) is the function that takes an input x and, no matter what x is, returns
the value 3.) Then fc (x) = c for any x—and, in particular, fc (0) = c.

3.149 The proposition is false. It would require a single function f such that we have both that f(0) = 0 and that f(0) = 1,
for example—but that’s not possible for a function, because f(0) can have only one value.

3.150 The proposition is true. For any c ∈ R, we choose the function fc that subtracts c from its input—that is, the function
defined as fc (x) = x − c for all x ∈ R. And fc (c) = 0.

3.151 The proposition is true. The always-zero function zero defined as zero(x) = 0 for any x ∈ R satisfies the condition
that ∀x ∈ R : zero(x) = 0.

3.152 ∀t ∈ T : ∀j ∈ J : ∀j′ ∈ J : scheduledAt(j, t) ∧ scheduledAt(j′ , t) ⇒ j = j′

3.153 ∀j ∈ J : ∃t ∈ T : scheduledAt(j, t)

3.154 We can require that every time at which A is run is followed by two minutes of A not being run:
∀t ∈ T : scheduledAt(A, t) ⇒ ¬scheduledAt(A, t + 1) ∧ ¬scheduledAt(A, t + 2).
Alternatively, we look at pairs of times when A is scheduled and require that the times be more than two minutes apart:
∀t ∈ T : ∀t′ ∈ T : scheduledAt(A, t) ∧ scheduledAt(A, t′ ) ⇒ (t = t′ ∨ |t − t′ | > 2).
(In the latter case, we have to be careful not to forbid A being scheduled at time t and time t′ where t = t′ !)
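The two formalizations are equivalent, and for a finite time domain this can be checked exhaustively. A supplementary sketch (not part of the printed solution) over T = {1, . . . , 6}, representing a schedule for A as the set of minutes at which A runs:

```python
from itertools import product

T = range(1, 7)  # a small time domain, minutes 1..6

def gap_version(scheduled):
    # every run of A is followed by two minutes with no run of A
    return all(t + 1 not in scheduled and t + 2 not in scheduled
               for t in T if t in scheduled)

def pairwise_version(scheduled):
    # any two distinct runs of A are more than two minutes apart
    return all(t == u or abs(t - u) > 2
               for t in scheduled for u in scheduled)

subsets = [{t for t, bit in zip(T, bits) if bit}
           for bits in product([0, 1], repeat=6)]
agree = all(gap_version(s) == pairwise_version(s) for s in subsets)
```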

3.155 ∃t, t′ , t′′ ∈ T : scheduledAt(B, t) ∧ scheduledAt(B, t′ ) ∧ scheduledAt(B, t′′ ) ∧ t ̸= t′ ∧ t ̸= t′′ ∧ t′ ̸= t′′

3.156 One way of expressing this condition: if C is scheduled at times t and t′ > t, then there’s no time later than t′ at
which C is also scheduled. Formally:
 
∀t, t′ ∈ T : scheduledAt(C, t) ∧ scheduledAt(C, t′ ) ∧ t < t′ ⇒ ∀t′′ ∈ T : t′′ > t′ ⇒ ¬scheduledAt(C, t′′ ) .

Alternatively, if C is scheduled at times t, t′ , and t′′ , then at least two of {t, t′ , t′′ } are actually equal:
∀t, t′ , t′′ ∈ T : scheduledAt(C, t) ∧ scheduledAt(C, t′ ) ∧ scheduledAt(C, t′′ ) ⇒ t = t′ ∨ t = t′′ ∨ t′ = t′′ .

3.157 The given statement is equivalent to the following: every time t at which E is run is followed by a time at which D
is run. Formalizing that:
 
∀t ∈ T : scheduledAt(E, t) ⇒ ∃t′ ∈ T : t′ > t ∧ scheduledAt(D, t′ ) .

3.158 ∀t, t′ ∈ T : scheduledAt(G, t) ∧ scheduledAt(G, t′ ) ∧ t < t′ ⇒ [∃t′′ ∈ T : t < t′′ < t′ ∧ scheduledAt(F, t′′ )]

3.159 ∀t, t′ ∈ T : [scheduledAt(I, t) ∧ scheduledAt(I, t′ ) ∧ t < t′ ] ⇒
[∀t′′ , t′′′ ∈ T : (t < t′′ < t′ ∧ t < t′′′ < t′ ∧ t′′ ̸= t′′′ ∧ scheduledAt(H, t′′ )) ⇒ ¬scheduledAt(H, t′′′ )]

3.160 ∀x ∈ {1, . . . , n} : ∀y ∈ {1, . . . , m} : P[x, y] = 0

3.161 ∃x ∈ {1, . . . , n} : ∃y ∈ {1, . . . , m} : P[x, y] = 1

3.162 ∀y ∈ {1, . . . , m} : ∃x ∈ {1, . . . , n} : P[x, y] = 1

3.163 ∀x ∈ {1, . . . , n} : ∀y ∈ {1, . . . , m − 1} : P[x, y] = 1 ⇒ P[x, y + 1] = 0

3.164 ∀i, j ∈ {1, . . . , 15} : G[i, j] ⇒ (G[i − 1, j] ∨ G[i + 1, j]) ∧ (G[i, j − 1] ∨ G[i, j + 1])

3.165 We can write this by saying that the first open square in any horizontal word (G[i, j] and ¬G[i − 1, j]) is followed by
two open squares horizontally (G[i + 1, j] and G[i + 2, j]), and a similar condition for vertical words:

∀i, j ∈ {1, . . . , 15} : [G[i, j] ∧ ¬G[i − 1, j] ⇒ G[i + 1, j] ∧ G[i + 2, j]]
                        ∧ [G[i, j] ∧ ¬G[i, j − 1] ⇒ G[i, j + 1] ∧ G[i, j + 2]].

3.166 This condition is the simplest one to state: ∀i, j ∈ {1, . . . , 15} : G[i, j] = G[16 − i, 16 − j].
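This condition translates directly into code. A supplementary sketch (not part of the printed solution), using 0-based Python indices, so the book's G[16 − i, 16 − j] becomes G[14 − i][14 − j]:

```python
N = 15

def rotationally_symmetric(G):
    """The crossword condition G[i, j] = G[16 - i, 16 - j], rewritten for a
    0-indexed N x N grid of booleans (True = open square)."""
    return all(G[i][j] == G[N - 1 - i][N - 1 - j]
               for i in range(N) for j in range(N))

open_grid = [[True] * N for _ in range(N)]   # trivially symmetric
bad_grid = [row[:] for row in open_grid]
bad_grid[0][0] = False                       # block one corner but not the opposite one
```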

3.167 A path from ⟨x1 , y1 ⟩ to ⟨xk , yk ⟩ is a sequence ⟨x1 , y1 ⟩, ⟨x2 , y2 ⟩, . . . , ⟨xk , yk ⟩ of squares such that

[∀i ∈ {1, . . . , k} : G[xi , yi ]]                                  all the squares in the path are open
∧ [∀i ∈ {1, . . . , k − 1} : |xi − xi+1 | + |yi − yi+1 | = 1].       we only go one square down or across in each step of the path

Let P(i, j, x, y) be a predicate that is true exactly when there exists a path from ⟨i, j⟩ to ⟨x, y⟩. Overall interlock means that

∀i, j, x, y ∈ {1, . . . , 15} : G[i, j] ∧ G[x, y] ⇒ P(i, j, x, y).

3.168 Write K to denote {1, . . . , k}. Then the set {A1 , A2 , . . . , Ak } is a partition of a set S if:

(i) ∀i ∈ K : Ai ̸= ∅                                     all Ai s are nonempty
(ii) A1 ∪ A2 ∪ · · · ∪ Ak = S
(iii) ∀i ∈ K : ∀j ∈ K : [i ̸= j ⇒ Ai ∩ Aj = ∅]           Ai and Aj are disjoint for i ̸= j

Alternatively, we can state (ii) directly with quantifiers: first, ∀i ∈ K : Ai ⊆ S; and, second, ∀x ∈ S : [∃i ∈ K : x ∈ Ai ].

3.169 The maximum of the array A[1 . . . n] is an integer x ∈ Z such that ∃i ∈ {1, 2, . . . , n} : A[i] = x and such that
∀i ∈ {1, 2, . . . , n} : A[i] ≤ x. (The first condition is the one more frequently omitted: it says that we aren’t allowed to
make up a maximum like 9 for the array ⟨1, 2, 3⟩, even though 9 is greater than or equal to every entry in A.)
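The two-part specification translates directly into a checker. A supplementary Python sketch (the function name is ours, for illustration, not the book's):

```python
def is_maximum(x, A):
    """x is the maximum of A iff x appears in A and no entry exceeds x."""
    appears = any(a == x for a in A)   # the first condition: some A[i] equals x
    bounds = all(a <= x for a in A)    # the second condition: every A[i] is at most x
    return appears and bounds
```

For the array ⟨1, 2, 3⟩ from the solution, 9 satisfies the bound condition but not the appearance condition, so it is correctly rejected.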

3.170 right(c) means: ∀t ∈ T : c(t) = t.

3.171 keepsTime(c) means: ∃x ∈ Z : ∀t ∈ T : c(t) = add(t, x).

3.172 closeEnough(c) means: ∀t ∈ T : ∃x ∈ {−2, −1, 0, 1, 2} : c(t) = add(t, x).

3.173 broken(c) means: ∃t′ ∈ T : ∀t ∈ T : c(t) = t′ .

3.174 The adage says broken(c) ⇒ ∃t ∈ T : c(t) = t.


Suppose that broken(c). That is, by definition, there exists a time t′ ∈ T such that, for all t ∈ T, we have c(t) = t′ . In
particular, then, when t = t′ , we have c(t′ ) = t′ . Thus, if broken(c), then there exists a time t′ ∈ T such that c(t′ ) = t′ .

3.175 ∀i ∈ {1, 2, . . . , n} : ri = i

3.176 Suppose that we start with the sorted pile ⟨1, 2, . . . , k − 1, k, k + 1, k + 2, . . . , n⟩, and perform one flip of k ≥ 2
pancakes. The resulting pile is ⟨k, k − 1, . . . , 1, k + 1, k + 2, . . . , n⟩. That’s precisely the structure of a pile of pancakes
that can be sorted with one flip (undoing the unsorting flip), so:

∃k ∈ {2, 3, . . . , n} : [∀i ∈ {1, 2, . . . , k} : ri = k + 1 − i] ∧ [∀i ∈ {k + 1, k + 2, . . . , n} : ri = i].
(The first conjunct says the pile starts with k, k − 1, . . . , 1; the second says it ends with k + 1, k + 2, . . . , n.)
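For small n, this characterization can be tested against a direct simulation of prefix flips. A supplementary sketch (not part of the printed solution) with n = 5:

```python
from itertools import permutations

def flip(pile, k):
    """Reverse the top k pancakes (position 0 is the top of the pile)."""
    return pile[:k][::-1] + pile[k:]

n = 5
sorted_pile = tuple(range(1, n + 1))

# piles (other than the sorted one) fixable by a single flip of k >= 2 pancakes
one_flip_sortable = {pile for pile in permutations(sorted_pile)
                     if pile != sorted_pile
                     and any(flip(pile, k) == sorted_pile
                             for k in range(2, n + 1))}

def characterization(r):
    """The condition from the solution: for some k >= 2, the pile starts
    with k, k-1, ..., 1 and ends with k+1, ..., n."""
    return any(all(r[i - 1] == k + 1 - i for i in range(1, k + 1))
               and all(r[i - 1] == i for i in range(k + 1, n + 1))
               for k in range(2, n + 1))

matches = all((pile in one_flip_sortable) == characterization(pile)
              for pile in permutations(sorted_pile))
```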

3.177 As in Exercise 3.176, after a single flip of k ≥ 2 pancakes, the resulting pile is ⟨k, k − 1, . . . , 1, k + 1, k + 2, . . . , n⟩.
Suppose we then perform an additional flip of j ≥ 2 pancakes:

• If j = k, the result is automatically sorted. (We just undid the first flip.)
• If j < k, the result is ⟨k − j + 1, k − j + 2, . . . , k, k − j, k − j − 1, . . . , 1, k + 1, k + 2, . . . , n⟩.
• If j > k, the result is ⟨j, j − 1, . . . , k + 1, 1, 2, . . . , k, j + 1, j + 2, . . . , n⟩.

Thus the pancake piles that can be sorted with two flips satisfy:

   [∀i ∈ {1, 2, . . . , n} : ri = i]                                     do anything, and then undo it
∨ [∃j, k ∈ {2, . . . , n} : j < k ∧ [∀i ∈ {1, . . . , j} : ri = k − j + i]
                                  ∧ [∀i ∈ {j + 1, . . . , k} : ri = k + 1 − i]
                                  ∧ [∀i ∈ {k + 1, . . . , n} : ri = i]]      first undo the j-flip, then the k-flip [for j < k]
∨ [∃j, k ∈ {2, . . . , n} : j > k ∧ [∀i ∈ {1, . . . , j − k} : ri = j + 1 − i]
                                  ∧ [∀i ∈ {j − k + 1, . . . , j} : ri = i + k − j]
                                  ∧ [∀i ∈ {j + 1, . . . , n} : ri = i]]      first undo the j-flip, then the k-flip [for j > k]
3.178 ∀x ∈ P : ∃t ∈ T : bought(x, t) ⇒ ∃y ∈ P : ∃t′ ∈ T : t′ < t ∧ bought(y, t′ ) ∧ friends(x, y)

3.179 The claim is probably false. There exists some person who was the very first person to buy an iPad. (Call her Alice.)
Then Alice is in P, and there exists a unique t0 ∈ T such that bought(Alice, t0 ). But Alice certainly has no friends who bought
an iPad before time t0 —in fact, nobody bought an iPad before t0 .
The reason that we can’t rule out the possibility that the claim is true is that a person in P could buy more than one
iPad. Concretely, suppose that P = {p1 , p2 , . . . , pn }, and that T = {1, 2, . . . , 2n}. Suppose that, for each person pi , we
have that bought(pi , i) and bought(pi , n + i). And suppose that everybody in P has at least one friend. Then every p ∈ P
who bought an iPad has a friend who bought an iPad before (the second time that) p did.

3.180 This assertion fails on an input in which every element of A is identical—for example, ⟨2, 2, 2, 2⟩.

3.181 This assertion fails on an array A that has two initial negative entries, where the second is larger than the first—for
example, ⟨−2, −1, 0, 1⟩.

3.182 This assertion fails on an array A that has nonunique entries—for example, ⟨2, 2, 4, 4⟩.

3.183 ∃! x ∈ Z : P(x) is equivalent to ∃x ∈ Z : [P(x) ∧ (∀y ∈ Z : P(y) ⇒ x = y)] .

3.184 ∃∞ x ∈ Z : P(x) is equivalent to ∀y ∈ Z : [∃x ∈ Z : x > y ∧ P(x)] .

3.185 Write P(n) to denote n > 2; Q(n) to denote 2 | n; and R(p, q, n) to denote isPrime(p) ∧ isPrime(q) ∧ n = p + q.
Then we have:
∀n ∈ Z : P(n) ∧ Q(n) ⇒ [∃p ∈ Z : ∃q ∈ Z : R(p, q, n)]
≡ ∀n ∈ Z : ∃p ∈ Z : ∃q ∈ Z : [P(n) ∧ Q(n) ⇒ R(p, q, n)] Exercise 3.145
≡ ∀n ∈ Z : ∃p ∈ Z : ∃q ∈ Z : [¬(P(n) ∧ Q(n)) ∨ R(p, q, n)] p ⇒ q ≡ ¬p ∨ q
≡ ∀n ∈ Z : ∃p ∈ Z : ∃q ∈ Z : [¬P(n) ∨ ¬Q(n) ∨ R(p, q, n)]. De Morgan’s Laws

3.186 The predicate isPrime(x) can be written as ∀d ∈ Z≥2 : d < x ⇒ d ̸ | x (a number x is prime if all potential divisors
between 2 and x − 1 do not evenly divide x). Thus we can rewrite Goldbach’s conjecture as:
∀n ∈ Z : ∃p ∈ Z : ∃q ∈ Z :
n ≤ 2 ∨ 2 ̸ | n ∨ [[∀d ∈ Z≥2 : (d < p ⇒ d ̸ | p) ∧ (d < q ⇒ d ̸ | q)] ∧ n = p + q].

3.187 The predicate d | m can be written as ∃k ∈ Z : dk = m. Thus we can further rewrite Goldbach’s conjecture as:
∀n ∈ Z : ∃p ∈ Z : ∃q ∈ Z :
n ≤ 2 ∨ [∀k ∈ Z : 2k ̸= n]
∨ [[∀d ∈ Z≥2 : (d < p ⇒ [∀k ∈ Z : kd ̸= p]) ∧ (d < q ⇒ [∀k ∈ Z : kd ̸= q])] ∧ n = p + q].

3.188 A solution in Python—a slow solution; we’ll see faster algorithms for finding primes in Chapter 7—is shown in
Figure S.3.4.

3.189 This proposition is sometimes false. Let S be the set of students at a certain college in the midwest, and let
P(x, y) denote “x and y are taking at least one class together this term.” Then ∀x ∈ S : ∃y ∈ S : P(x, y) is true: every

def findPrimes(n):
    primes = [2]
    candidate = 3
    while candidate < n:
        if 0 not in [candidate % d for d in primes]:
            primes.append(candidate)
        candidate += 2
    return primes

def testGoldbach(n):
    primes = findPrimes(n)
    for k in range(4, n, 2):  # 4, 6, 8, ..., n - 2: the even numbers greater than 2
        happy = False
        for prime in primes:
            if k - prime in primes:
                print("%d = %d + %d" % (k, prime, k - prime))
                happy = True
                break
        if not happy:
            print("VIOLATION FOUND!", k)

testGoldbach(10000)

Figure S.3.4 Some Python code to test for violations of Goldbach’s conjecture. (It doesn’t find any.)

class has at least two students, so every student must have at least one classmate. On the other hand, the statement
∃y ∈ S : ∀x ∈ S : P(x, y) is false: there is no single person taking classes with everyone (every CS major, every music
major, every sociology major, etc.) on campus.

3.190 This proposition is always true. Let’s prove it. Suppose that ∃y ∈ S : ∀x ∈ S : P(x, y). That means that there is a
special y ∈ S—call it y∗ —so that ∀x ∈ S : P(x, y∗ ). To show that ∀x ∈ S : ∃y ∈ S : P(x, y), we need to show that, for an
arbitrary x, there is some yx so that P(x, yx ). But we can just choose yx = y∗ for every x.
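For a finite domain, the implication can be confirmed by checking every binary relation, and the same search shows that the converse fails (as in Exercise 3.189). A supplementary sketch (not part of the printed solution) with S = {0, 1, 2}:

```python
from itertools import product

S = range(3)
pairs = [(x, y) for x in S for y in S]

def exists_forall(R):  # ∃y ∈ S : ∀x ∈ S : P(x, y)
    return any(all((x, y) in R for x in S) for y in S)

def forall_exists(R):  # ∀x ∈ S : ∃y ∈ S : P(x, y)
    return all(any((x, y) in R for y in S) for x in S)

# every binary relation on S, encoded as a set of pairs (2^9 of them)
relations = [frozenset(p for p, bit in zip(pairs, bits) if bit)
             for bits in product([0, 1], repeat=len(pairs))]

implication_holds = all(forall_exists(R)
                        for R in relations if exists_forall(R))
converse_fails = any(forall_exists(R) and not exists_forall(R)
                     for R in relations)
```

The identity relation {(0, 0), (1, 1), (2, 2)} is one witness for the failed converse: every x has some partner y, but no single y works for every x.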

3.191 We want to show that ∀x ∈ S : [P(x) ⇒ (∃y ∈ S : P(y))]. To do so, let’s consider a generic x ∈ S. We want to
show that P(x) ⇒ ∃y ∈ S : P(y). But this is easy! Assume the antecedent—that is, assume that P(x). But then showing
that ∃y ∈ S : P(y) is trivial: we can just choose y to be x itself. (To take a step back and state this less formally: for any
x ∈ S that satisfies P, of course there exists some element of S that satisfies P!)

3.192 We’ll prove ∃x ∈ S : [P(x) ⇒ (∀y ∈ S : P(y))] by case analysis: either ∀y ∈ S : P(y), or not.
• Suppose ∀y ∈ S : P(y). Then the given statement is equivalent to ∃x ∈ S : [P(x) ⇒ True]. But P(x) ⇒ True is true
for any x—anything implies true!—and so the given statement is equivalent to ∃x ∈ S : True, which is obviously true.
• On the other hand, suppose that ¬∀y ∈ S : P(y). Distributing the negation using De Morgan’s Laws, we know that
∃y ∈ S : ¬P(y). (†)
The statement we’re trying to prove is equivalent to ∃x ∈ S : [P(x) ⇒ False] (because ∀y ∈ S : P(y) is False under
our assumption), which is equivalent to ∃x ∈ S : [¬P(x)], because q ⇒ False is equivalent to ¬q. But ∃x ∈ S : ¬P(x)
is just (†), which we’ve already proven.
(Again, to take a step back and state this less formally: if P holds for every element of S, then the body of the existential
quantifier is satisfied for every x ∈ S, but if not then it’s satisfied for every x for which ¬P(x).)

3.193 There are two natural readings:


• ∃c : ∀d : c crashes on day d
There’s a single computer that crashes every day.
• ∀d : ∃c : c crashes on day d
Every day, some computer or another crashes.

3.194 There are two natural readings:



• ∀ prime p ̸= 2 : ∃ odd d > 1 : d divides p


For every prime number p except 2, there’s some odd integer d > 1 that evenly divides p.
• ∃ odd d > 1 : ∀ prime p ̸= 2 : d divides p
There’s a single odd integer d > 1 such that d evenly divides every prime number except 2.

3.195 There are four (!) natural readings:


• ∀ student s : ∃ class c : ∀ term t : s takes c during t
For every student, there’s a class c such that the student takes c every term.
• ∀ term t : ∃ class c : ∀ student s : s takes c during t
For every term, there’s a single class in that term that every student takes.
• ∀ student s : ∀ term t : ∃ class c : s takes c during t
For every student, in every term, that student takes one class or another.
• ∃ class c : ∀ student s : ∀ term t : s takes c during t
There’s a particular class c such that every student takes c in every term.

3.196 There are at least three natural readings:


• For every submitted program p, there’s a student-submitted test case (possibly different for different programs p, each
possibly submitted by a different student) that p fails on:
∀ program p : ∃ case c : ∃ student s : s submitted c, and p fails on c
• There’s a single student s such that, for every submitted program p, there’s a test case (possibly different for different
programs p) submitted by s that p fails on:
∃ student s : ∀ program p : ∃ case c : s submitted c, and p fails on c
• There’s a single student who submitted a single test case c such that every submitted program fails on c:
∃ student s : ∃ case c : ∀ program p : s submitted c, and p fails on c.
(A fourth interpretation, ∃ case : ∀ programs : ∃ student s : · · · , doesn’t seem natural.)

3.197 The theorem is the interpretation “For every prime number p except 2, there’s some odd integer d > 1 that evenly
divides p.” Here’s the proof: any prime number p other than 2 is necessarily odd. Thus, given any prime p ̸= 2, we can
simply choose d to be equal to p itself. After all, p is odd, and p | p.
The non-theorem is the interpretation “There’s a single odd integer d > 1 such that d evenly divides every prime number
except 2.” For d > 1 to evenly divide the prime number 3, we’d need d = 3—but 3 ∤ 5.

3.198 ∃x ∈ S : P(x)

3.199 ∃x ∈ S : ∃y ∈ S : ¬P(x, y)

3.200 ∃x ∈ S : ∀y ∈ S : P(x, y)

3.201 ∀x ∈ S : ∃y ∈ S : ¬P(x, y)

3.202 ∀x ∈ S : ∀y ∈ S : ¬P(x, y)

3.203 ∃x ∈ S : ∃y ∈ S : P(x, y)

3.204 We’ll give a recursive algorithm to turn this type of quantified expression into a quantifier-free proposition.
Specifically:
• translate every universal quantifier ∀x ∈ {0, 1} : P(x) into the conjunction P(0) ∧ P(1).
• translate every existential quantifier ∃x ∈ {0, 1} : P(x) into the disjunction P(0) ∨ P(1).
The resulting expression is a statement of propositional logic, whose atomic propositions are Pi (0) and Pi (1) for each
of the predicates P1 , P2 , . . . , Pk used in φ.
We can now figure out whether φ is a theorem by applying the algorithm for propositional logic: build a truth table
(there are 2k atomic propositions, so there are 2^(2k) rows in the truth table), and test whether the resulting expression is
true in every row. If so, φ is a theorem; if not, φ is not a theorem.
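This translate-then-check procedure can be sketched in Python for one concrete example formula (the formula and all names below are illustrative assumptions, not from the exercise):

```python
from itertools import product

# Example formula: forall x in {0,1} : exists y in {0,1} : P(x,y) or P(y,x).
# Expanding the quantifiers over the two-element domain turns it into a
# proposition over the four atoms P(0,0), P(0,1), P(1,0), P(1,1):
# a conjunction (for the forall) of disjunctions (for the exists).
def phi_expanded(P):  # P: dict mapping (x, y) to a truth value.
    return all(any(P[(x, y)] or P[(y, x)] for y in (0, 1)) for x in (0, 1))

# Truth-table check: the formula is a theorem iff the expansion is true under
# every assignment of truth values to the four atoms.
atoms = list(product((0, 1), repeat=2))
is_theorem = all(phi_expanded(dict(zip(atoms, vals)))
                 for vals in product((False, True), repeat=len(atoms)))
print(is_theorem)  # False: the all-False assignment falsifies the formula.
```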
4 Proofs

4.2 Error-Correcting Codes


4.1 A solution in Python is shown in Figure S.4.1.

4.2 It does not allow us to detect every single substitution error. For example, a 7 in an odd-indexed digit would now
increase sum by 4—just as a 2 would. Thus an odd-indexed substitution of 2 ↔ 7 wouldn’t be detected. For example,
2345 6789 0123 4567 and 7345 6789 0123 4567 would produce the same value of sum.

4.3 It does not. For example, a 4 and a 1 in an odd-indexed digit have the same effect on sum: increasing sum by 3 when
the digit is 1 (because 1 · 3 = 3), and also increasing sum by 3 when the digit is 4 (because 4 · 3 = 12 → 1 + 2 = 3). So
we couldn’t detect an odd-indexed substitution of 1 ↔ 4, among other examples.

4.4 The change to quintuple the odd-indexed digits (instead of doubling them) does work! The crucial observation, as
Exercise 4.1 describes, is that the sum of the ones’ and tens’ digits of 5n is distinct for every digit n:

original digit                          n = 0   1   2   3   4   5   6   7   8   9
quintupled tens’ digit          ⌊5n/10⌋ = 0   0   1   1   2   2   3   3   4   4
quintupled ones’ digit        5n mod 10 = 0   5   0   5   0   5   0   5   0   5
change to sum       5n mod 10 + ⌊5n/10⌋ = 0   5   1   6   2   7   3   8   4   9
Because the 10 values in the last row of this table are all different, the quintupling version of the code successfully detects
any single substitution error.
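A quick computational check of the table above (a sanity check, not part of the proof):

```python
# For each digit n, quintuple it and add the tens' and ones' digits of 5n;
# the ten resulting "changes to sum" should all be distinct.
changes = [(5 * n) // 10 + (5 * n) % 10 for n in range(10)]
print(changes)                  # [0, 5, 1, 6, 2, 7, 3, 8, 4, 9]
print(len(set(changes)) == 10)  # True: any single substitution is detected.
```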

4.5 While cc-check can detect many transposition errors, it cannot detect them all. For example, swapping an adjacent 0
and 9 has no effect on the overall sum. If the first two digits of the number are 09 [doubling the first], then the contribution
to sum from these digits is (0 + 0) + 9 = 9. But if we transpose those first two digits to 90 [doubling the first], then the
contribution to sum is (1 + 8) + 0 = 9. Both sequences contribute 9 to sum, so they cannot be distinguished.

4.6 To argue that the Hamming distance ∆ : {0, 1}^n × {0, 1}^n → R≥0 is a metric, we will argue that the three desired
properties (reflexivity, symmetry, and the triangle inequality) hold for arbitrary inputs. Let x, y, z ∈ {0, 1}^n be arbitrary
n-bit strings.
Reflexivity: ∆(x, x) = 0 because x and x differ in zero positions. For x ̸= y, there must be an index i such that xi ̸= yi
(that’s what it means for x ̸= y). Thus ∆(x, y) ≥ 1.
Symmetry: we need to argue that ∆(x, y) = ∆(y, x) for all x, y. The function ∆(x, y) denotes the number of bit positions
in which x and y differ—that is, ∆(x, y) = |{i : xi ̸= yi }|. Similarly, ∆(y, x) = |{i : yi ̸= xi }|. But {i : xi ̸= yi } =
{i : yi ̸= xi }, because ̸= is a symmetric relation: a ̸= b means precisely the same thing as b ̸= a.
Triangle inequality: we need to argue that ∆(x, y) ≤ ∆(x, z) + ∆(z, y) for all x, y, z. Let’s think about a single bit at
a time. We claim that, for any xi , yi , zi ∈ {0, 1}, the following holds: if xi ̸= yi , then either xi ̸= zi or yi ̸= zi . (This
fact ought to be fairly obvious; the easiest way to see it may be by drawing out all eight cases—for each xi ∈ {0, 1},
yi ∈ {0, 1}, and zi ∈ {0, 1}—and checking them all.) Thus
∆(x, y) = |{i : xi ̸= yi }|
≤ |{i : xi ̸= zi or yi ̸= zi }| the above argument
= |{i : xi ̸= zi } ∪ {i : yi ̸= zi }|
≤ |{i : xi ̸= zi }| + |{i : yi ̸= zi }| because |S ∪ T| ≤ |S| + |T|
= ∆(x, z) + ∆(z, y).
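The triangle inequality can also be spot-checked by brute force over all triples of short bitstrings (a check, not a substitute for the proof; the hamming helper below is an assumption):

```python
from itertools import product

def hamming(x, y):  # number of positions where x and y differ.
    return sum(1 for a, b in zip(x, y) if a != b)

n = 4
strings = list(product('01', repeat=n))
ok = all(hamming(x, y) <= hamming(x, z) + hamming(z, y)
         for x in strings for y in strings for z in strings)
print(ok)  # True: no triple of 4-bit strings violates the triangle inequality.
```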

4.7 This function is a metric.


def is_valid_credit_card_number(numstring):
    '''
    Is the given number (represented as a string) a legitimate credit-card number?
    '''
    total = 0
    for i in range(len(numstring)):
        digit = int(numstring[i])
        if i % 2 == 0: # Python is 0-indexed, so even i --> double the digit.
            digit = 2 * digit
        total += (digit % 10) + (digit // 10) # ones' digit + tens' digit.
    return total % 10 == 0

def complete_credit_card_number(numstring):
    '''
    Given a numerical string with zero or one "?"s, fill in the ? [if any] to make
    it a legitimate credit-card number. (Returns "None" in error conditions.)
    '''
    if numstring.count("?") == 0 and is_valid_credit_card_number(numstring):
        return numstring
    elif numstring.count("?") != 1:
        return None # we can't handle multiple ?s, or invalid ?-less numbers.

    missing_index = numstring.index("?")
    total = 0
    for i in range(len(numstring)):
        if numstring[i] != "?":
            digit = int(numstring[i])
            if i % 2 == 0:
                digit = 2 * digit
            total += (digit % 10) + (digit // 10)
    total = total % 10

    if missing_index % 2 == 1: # we're missing an undoubled digit.
        missing_digit = -total % 10
    elif total % 2 == 0: # we're missing a doubled digit, and the total without it is even.
        missing_digit = (-total % 10) // 2
    else: # we're missing a doubled digit, and the total without it is odd.
        missing_digit = 9 - (total // 2)

    return numstring.replace("?", str(missing_digit))

Alice = "4389 5867 4645 4389".replace(" ","") # https://fossbytes.com/tools/credit-card-generator
Bob = "4664 6925 9756 4261".replace(" ","") # invalid!
for card in [Bob, Alice] + [Alice[:i] + "?" + Alice[i+1:] for i in range(16)]:
    print(card, complete_credit_card_number(card))

Figure S.4.1 Testing the validity of a credit-card number, or filling in a missing digit, in Python.

Reflexivity is easy: certainly d(x, y) = 0 if and only if x = y, because x1,...,n = y1,...,n means x = y.
Symmetry is also easy, because the definition of d was symmetric with respect to its two arguments; xi+1,...,n = yi+1,...,n
if and only if yi+1,...,n = xi+1,...,n .
For the triangle inequality, we need to argue that d(x, y) ≤ d(x, z) + d(y, z). If x = y, we’re done immediately.
Otherwise, suppose that xi ̸= yi , but xi+1,...,n = yi+1,...,n . Because zi therefore cannot equal both xi and yi , we have that
d(x, y) ≤ d(x, z) or d(x, y) ≤ d(y, z). In either case, certainly d(x, y) ≤ d(x, z) + d(y, z).

4.8 This function meets reflexivity and symmetry, but it fails the triangle inequality. For example, d(0000, 1111) = 4,
but d(0000, 0101) = 1 and d(0101, 1111) = 1 (because there’s never more than one consecutive disagreement between
0101 and either 0000 or 1111). But then

d(0000, 1111) = 4 ̸≤ 1 + 1 = d(0000, 0101) + d(0101, 1111).

4.9 This function fails reflexivity. The problem statement itself gives an example: d(11010, 01010) = |2 − 2| = 0, but
11010 ̸= 01010.

4.10 This function violates the triangle inequality. For example, let x = 01, y = 10, and z = 11. Then:

d(01, 10) = 1 − (2 · 0)/(1 + 1) = 1 − 0/2 = 1
d(01, 11) = 1 − (2 · 1)/(1 + 2) = 1 − 2/3 = 1/3
d(10, 11) = 1 − (2 · 1)/(1 + 2) = 1 − 2/3 = 1/3.

But then d(01, 10) = 1 ̸≤ 1/3 + 1/3 = d(01, 11) + d(11, 10).

4.11 This function is a metric—in fact, it’s exactly equivalent to regular-old geometric distance on the line.
Reflexivity is straightforward: |a − b| = 0 if and only if a = b, and x and y represent the same number in binary if and
only if x = y (as strings).
Symmetry is immediate from the fact that |a − b| = |b − a|.
The triangle inequality follows because

|a − c| + |c − b| =
    |a − b|               if b ≤ c ≤ a, or if a ≤ c ≤ b
    |a − b| + 2|b − c|    if a ≤ b ≤ c, or if c ≤ b ≤ a
    |a − b| + 2|a − c|    if b ≤ a ≤ c, or if c ≤ a ≤ b.

In all three cases, this value is at least |a − b|.

4.12 By definition of the minimum distance of a code C being 2t + 1, there exist two codewords x∗ , y∗ ∈ C such that
∆(x∗ , y∗ ) = 2t + 1.
Suppose we receive y∗ as our received (possibly corrupted) codeword. We cannot distinguish between y∗ as the trans-
mitted codeword (and there was no error), or x∗ as the transmitted codeword (and it was corrupted by precisely 2t + 1
errors into y∗ ). So we cannot detect 2t + 1 errors.
Let i be the smallest index such that ∆(x∗1...i , y∗1...i ) = t + 1. Thus ∆(x∗i+1...n , y∗i+1...n ) = t. Suppose we receive the
bitstring z, where zj = x∗j for j ≤ i and zj = y∗j for j > i. Then ∆(y∗ , z) = t + 1 and ∆(x∗ , z) = t. If up to t + 1 errors
have occurred, we can’t distinguish these two cases. So we cannot correct t + 1 errors.

4.13 The argument is very similar to Theorem 4.6.


For error detection, we can detect 2t − 1 errors. It’s still the case that the only way we’d fail to detect an error is if the
transmitted codeword c were corrupted into a bitstring c′ that was in fact another codeword. Because ∆(c, c′ ) ≥ 2t, this
mistake can only occur if there were 2t or more errors.
For error correction, we can correct t−1 errors. (When the minimum distance is even, after t errors we might be exactly
halfway between two codewords, in which case we can’t correct the error.) Let x ∈ C, let c′ ∈ {0, 1}n be a corrupted
bitstring with ∆(x, c′ ) ≤ t − 1, and let y ∈ C − {x} be any other codeword. Then:
∆(c′ , y) ≥ ∆(x, y) − ∆(x, c′ )        triangle inequality
          ≥ 2t − ∆(x, c′ )             ∆(x, y) ≥ 2t by definition of minimum distance
          ≥ 2t − (t − 1)               ∆(x, c′ ) ≤ t − 1 by assumption
          = t + 1
          > ∆(x, c′ ).                 ∆(x, c′ ) ≤ t − 1 < t + 1

Just as in the previous exercise, we cannot detect 2t errors (a received bitstring in C could have experienced 2t errors)
or correct t errors (the bitstring z precisely halfway between x, y ∈ C with ∆(x, y) = 2t has ∆(x, z) = ∆(y, z) = t).

4.14 Let C ⊆ {0, 1}^n be a code. Call a codeword x ∈ C consistent with a string x′ ∈ {0, 1, ?}^n if, for all i ∈ {1, 2, . . . , n},
we have that x′i = xi or x′i = ?.
If two bitstrings x, y ∈ {0, 1}^n are both consistent with x′ , then x and y can differ only in those positions where x′i = ?.
Thus, if the number of ?s in x′ is ≤ t, then ∆(x, y) ≤ t.
Now suppose that C can detect t substitution errors. Thus the minimum distance of C must be at least t + 1. (See the
previous two exercises.) But then, if we erase t entries in any codeword x ∈ C to produce x′ , there is no other codeword
consistent with x′ . So we can correct t erasure errors by identifying the unique codeword consistent with the received
string.

4.15 Because a deletion is just a silent erasure, we can transform any erasure into a deletion by deleting each ‘?’. So, if
we receive a codeword with at most t erasures, we turn it into a codeword with at most t deletions; because the code can
correct these deletion errors, we can use the deletion-error-correcting algorithm, whatever it is, to correct the resulting
string.

4.16 The code C = {0001, 0010, 0100, 1000} can correct any single erasure error—if there’s a 1 in the unerased part of
the codeword, the erased bit was a 1; otherwise the erased bit was a 0. But C cannot handle a single deletion error: the
corrupted codeword 000 can result from a single deletion in any of the four codewords in C.

4.17 We have n-bit codewords, and n-bit messages (because |C| = 2^n ). Any distinct codewords differ in at least one
position (otherwise they’d be the same!), and 000 · · · 0 and 100 · · · 0 differ in only one position. Thus the rate of this code
is n/n = 1, and the minimum distance is 1 as well.
Because the minimum distance is 2 · t + 1 with t = 0, this code can correct up to t = 0 errors, and detect up to 2t = 0
errors. (That is, it’s useless.)

4.18 There are 2 = 2^1 codewords with n bits each, so the rate is 1/n. The minimum distance is just the Hamming distance
between the two codewords, which is n. We can detect up to n − 1 errors, and we can correct up to ⌊(n − 1)/2⌋ errors.

4.19 We have n-bit codewords, where the nth bit is fixed given the first n − 1 bit values. So we have (n − 1)-bit messages
and n-bit codewords. Thus the rate is (n − 1)/n = 1 − 1/n. Because any two bitstrings that differ in exactly one position
also have different parity, the minimum distance of this code is 2.
Because the minimum distance is 2 · t with t = 1, this code can detect up to 2t − 1 = 1 errors, and correct up to
t − 1 = 0 errors. (That is, it can detect a single error, but can’t correct anything—it’s got the same basic properties as the
credit-card number scheme.)

4.20 The key fact that we’ll use in the proof is that for two bitstrings x, y ∈ {0, 1}^n , we have parity(x) = parity(y) if
and only if ∆(x, y) is even. (To see this fact, observe that the bits where xi = yi contribute equally to both sides of the
equation, and the bits where xi ̸= yi contribute a 1 to exactly one side. The two sides end up having equal parity if and
only if we’ve added 1 to one side or the other an even number of times.)
Consider two distinct codewords x, y ∈ C. By definition of minimum distance, ∆(x, y) ≥ 2t + 1. We’ll give an
argument by cases that the corresponding codewords x′ , y′ ∈ C ′ have Hamming distance that’s at least 2t + 2.
If ∆(x, y) ≥ 2t + 2, then we’re immediately done: we’ve added an extra bit to the end of x and y, but that can’t possibly
decrease the Hamming distance.
If ∆(x, y) = 2t + 1, then we argued above that parity(x) ̸= parity(y)—but then ∆(x′ , y′ ) = 1 + ∆(x, y) =
1 + (2t + 1) = 2t + 2.
(Note that ∆(x, y) cannot be less than 2t + 1, by definition.)
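The conclusion can be checked concretely on the Hamming code (mirroring the encoder of Figure S.4.2; this check illustrates Exercise 4.20, it is not a proof):

```python
from itertools import product

# Encode all 16 messages under the Hamming code, then append an overall
# parity bit to each codeword; the minimum distance should rise from 3 to 4.
rows = [(1,0,0,0), (0,1,0,0), (0,0,1,0), (0,0,0,1),
        (0,1,1,1), (1,0,1,1), (1,1,0,1)]

def encode(m):  # 4-bit message -> 7-bit Hamming codeword.
    return [sum(m[j] * r[j] for j in range(4)) % 2 for r in rows]

def dist(x, y):
    return sum(a != b for a, b in zip(x, y))

C = [encode(m) for m in product((0, 1), repeat=4)]
Cp = [c + [sum(c) % 2] for c in C]  # append each codeword's parity.

min_d = min(dist(x, y) for x in C for y in C if x != y)
min_dp = min(dist(x, y) for x in Cp for y in Cp if x != y)
print(min_d, min_dp)  # 3 4
```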

4.21 Given a bitstring c′ , let c∗ be the codeword in the Repetitionℓ code closest to c′ in Hamming distance. By definition,
the codeword c∗ consists of ℓ identical repetitions of some block m. Suppose for a contradiction that there exists an index i
such that there are k < ℓ/2 blocks of c′ in which the ith entry of the block agrees with c∗i . Define codeword ĉ to match c∗ in
all bits except the ith of each block, which is flipped. It’s easy to see that ∆(c∗ , c′ ) − ∆(ĉ, c′ ) = (ℓ − k) − k = ℓ − 2k > 0,
which means that ∆(c∗ , c′ ) > ∆(ĉ, c′ ). This contradicts the assumption that c∗ was the closest codeword to c′ .

4.22 We fail to correct an error in a Repetition3 codeword only if the same error is repeated in two of the three codeword
blocks. Thus we could successfully correct as many as 4 errors in the 12-bit codewords: for example, if the codeword
were 0000 0000 0000 and it were corrupted to 1111 0000 0000.
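A small sketch of majority decoding for Repetition3 with 4-bit blocks illustrates the point (the function name is an assumption):

```python
def decode_repetition3(word):  # word: 12 bits, i.e., three 4-bit blocks.
    blocks = [word[0:4], word[4:8], word[8:12]]
    # Each position is decoded to the majority bit across the three blocks.
    return ''.join(max('01', key=[b[i] for b in blocks].count) for i in range(4))

# Four errors, all in the first block: every position still has a 2-of-3
# majority of correct bits, so decoding recovers the transmitted block.
print(decode_repetition3('1111' + '0000' + '0000'))  # 0000
```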

4.23 Rephrased, the upper-bound argument of Section 4.2.5 implies that every 7-bit string is within Hamming distance 1
of a codeword. (There are 16 codewords, and 8 unique bitstrings within Hamming distance 1 of each, and so there are
16 · 8 = 128 different bitstrings within Hamming distance 1 of their nearest codeword. But there are only 128 different
7-bit strings!) Therefore, if a codeword c is corrupted by two different errors, then necessarily the resulting string c′ is
closer to a different codeword than c, because c′ is within Hamming distance 1 of some codeword (and it’s not c).

4.24 A solution in Python appears in Figure S.4.2.

4.25 The proof of Lemma 4.14 carries over directly. Consider any two distinct messages m, m′ ∈ {0, 1}n . We must show
that the codewords c and c′ associated with m and m′ satisfy ∆(c, c′ ) ≥ 3. We’ll consider three cases based on ∆(m, m′ ):
Case I: ∆(m, m′ ) ≥ 3. We’re done immediately; the message bits of c and c′ differ in at least three positions.

def dot_product(x, y): # error-checking if len(x) != len(y) is omitted.
    total = 0
    for i in range(len(x)):
        total += x[i] * y[i]
    return total

def hamming_distance(x, y): # error-checking if len(x) != len(y) is omitted.
    dist = 0
    for i in range(len(x)):
        if x[i] != y[i]:
            dist += 1
    return dist

def encode_hamming(m): # error-checking if len(m) != 4 is omitted.
    return [dot_product(m, x) % 2 for x in [[1, 0, 0, 0],
                                            [0, 1, 0, 0],
                                            [0, 0, 1, 0],
                                            [0, 0, 0, 1],
                                            [0, 1, 1, 1],
                                            [1, 0, 1, 1],
                                            [1, 1, 0, 1]]]

# All 16 4-bit messages:
messages = [[(x // 8) % 2, (x // 4) % 2, (x // 2) % 2, x % 2] for x in range(16)]

# Print distance between encodings (under Hamming code) of all message pairs.
# Output: 16 pairs of distance 0, 112 of distance 3, 112 of distance 4, and 16 of distance 7.
# (The pairs of distance 0 are when m1 == m2.)
for m1 in messages:
    for m2 in messages:
        c1, c2 = encode_hamming(m1), encode_hamming(m2)
        print(c1, c2, hamming_distance(c1, c2))

Figure S.4.2 Python code to verify that the minimum distance of the Hamming code is 3.

Case II: ∆(m, m′ ) = 2. Then at least one of the three parity bits contains one of the bit positions where mi ̸= m′i but
not the other. (This fact follows from (b).) Therefore this parity bit differs in c and c′ , giving two differing message bits
and at least one differing parity bit.
Case III: ∆(m, m′ ) = 1. Then at least two of the three parity bits contain the bit position where mi ̸= m′i . (This fact
follows from (a).) This fact implies that there is one differing message bit and at least two differing parity bits.

4.26 We’re looking for four parity bits for 11-bit messages. There are a lot of options; here’s one of them. Writing the
message bits as abcdefghijk, define the following parity bits:

parity bit #1: the parity of bcd hij k


parity bit #2: the parity of a cd fg j k
parity bit #3: the parity of ab d e g i k
parity bit #4: the parity of abc ef h k

There are four message bits in three different parity bits (abcd); six message bits in two different parity bits (efghij);
and one message bit in all four parity bits (k). Thus every message bit appears in at least two parity bits, and it’s immediate
from the table above that no two message bits appear in exactly the same set of parity bits.
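The two conditions can also be verified mechanically (a quick check of the scheme above):

```python
# Each string lists the message bits covered by one parity bit, per the table.
parity_bits = ['bcdhijk', 'acdfgjk', 'abdegik', 'abcefhk']
membership = {m: frozenset(i for i, p in enumerate(parity_bits) if m in p)
              for m in 'abcdefghijk'}
print(all(len(s) >= 2 for s in membership.values()))     # True: condition (a).
print(len(set(membership.values())) == len(membership))  # True: condition (b).
```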

4.27 There are a lot of options, but here’s one of them. Writing the message bits as abcdefghijklmnopqrstuvwxyz,
define the following parity bits:

parity bit #1: bcde jklmno pqrs z


parity bit #2: a cde ghi mno p tuv z
parity bit #3: ab de f hi kl o q t wx z
parity bit #4: abc e fg ij l n r u w y z
parity bit #5: abcd fgh jk m s v xy z

There are 5 message bits in four different parity bits (abcde); 10 message bits in three different parity bits (fghijklmno);
10 message bits in two different parity bits (pqrstuvwxy); and one message bit in all five parity bits (z). Thus every message
bit appears in at least two parity bits, and it’s again immediate from the list of parity bits that no two message bits appear
in exactly the same set of parity bits.

4.28 We have (n − ℓ)-bit messages and n-bit codewords, so we need to define ℓ parity bits satisfying conditions (a) and
(b) from Exercise 4.25. That is, the message bits are numbered 1, 2, . . . , 2^ℓ − ℓ − 1. Define the set of all ℓ-bit bitstrings
that contain two or more ones:

B = {x ∈ {0, 1}^ℓ : x contains at least 2 ones} .

Note that there are 2^ℓ bitstrings in {0, 1}^ℓ , of which ℓ contain exactly one 1 and one contains no 1 at all. (This argument
is exactly the one in Lemma 4.15.) Thus |B| = 2^ℓ − ℓ − 1.
Label each message bit with a unique bitstring from B. Then, for i ∈ {1, 2, . . . , ℓ}, define parity bit #i to be the parity of
the message bits whose labels lie in {x ∈ B : xi = 1}. Because each x ∈ B has at least two ones, by definition, we satisfy
condition (a). Because each bitstring in B is distinct, we satisfy condition (b).
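The construction is easy to instantiate in code; the sketch below builds B for ℓ = 4 (the names are illustrative):

```python
from itertools import product

def build_labels(ell):  # all ell-bit strings with at least two ones.
    return [x for x in product((0, 1), repeat=ell) if sum(x) >= 2]

ell = 4
B = build_labels(ell)
print(len(B) == 2**ell - ell - 1)  # True: 11 labels for ell = 4.
# Condition (a): every label has >= 2 ones; condition (b): labels are distinct.
print(all(sum(x) >= 2 for x in B) and len(set(B)) == len(B))  # True
```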

4.29 We can solve the rat-poisoning problem using the parity bits of the Hamming code, with k = 3 rats. Call the purported
poison bottles {a, b, c, d, e, f, g, h}, and feed the rats as follows:
• Rat #1 gets {a, b, c, e}.
• Rat #2 gets {a, b, d, f}.
• Rat #3 gets {a, c, d, g}.
Because each purported poison has been given to a distinct subset of the rats, we can figure out which bottle is real poison
by seeing which rats died. (Observe that the key fact is that we have 2^3 = 8 subsets of rats, and ≤ 8 bottles to test.) Here
is the decoding of the results of the evil experiment:

nobody dies =⇒ h is the real poison.


rat #1 dies =⇒ e is the real poison.
rat #2 dies =⇒ f is the real poison.
rat #3 dies =⇒ g is the real poison.
rats #1 and #2 die =⇒ b is the real poison.
rats #1 and #3 die =⇒ c is the real poison.
rats #2 and #3 die =⇒ d is the real poison.
rats #1, #2, and #3 die =⇒ a is the real poison.
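The feeding scheme and decoding table translate directly into a small lookup (a sketch; the names are assumptions):

```python
feeding = {1: set('abce'), 2: set('abdf'), 3: set('acdg')}  # rat -> bottles fed.

def poisoned_bottle(dead_rats):  # dead_rats: e.g., {1, 2}.
    # The poisoned bottle is the one fed to exactly the rats that died.
    for bottle in 'abcdefgh':
        if {rat for rat in feeding if bottle in feeding[rat]} == set(dead_rats):
            return bottle

print(poisoned_bottle(set()))      # h
print(poisoned_bottle({2, 3}))     # d
print(poisoned_bottle({1, 2, 3}))  # a
```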

4.30 One solution in Python is shown in Figure S.4.3.

4.31 We’re going to use the fact that every 7-bit string is either (a) a Hamming code codeword, or (b) within Hamming
distance 1 of a Hamming code codeword. We’ll also rely on the fact that the number of codewords themselves is a small
fraction of {0, 1}^7 —many more bitstrings fall into category (b).
Number the people 1, 2, . . . , 7. If you are the ith person, look at the 7-bit string corresponding to the hats that you see
on all other people (call blue 0 and red 1), placing a ? in your own position i. Observe that there is necessarily a way to fill
in the ? so that the resulting bitstring x ∈ {0, 1}7 is not a Hamming codeword: if you see 6 bits that are consistent with a
codeword c, then ? would be ¬ci ; if you see 6 bits that differ from a codeword c in exactly one spot, then ? would be ci .
Declare that your hat color is the bit that you just filled in for ?.
• If your hats are assigned in such a way that the corresponding bitstring is a Hamming code codeword, then you will
all incorrectly state the color of your hat. You lose.
• If your hats are assigned in such a way that the corresponding bitstring is not a Hamming code codeword, then you
will all correctly state the color of your hat. (If your hats are not a codeword, there’s one person i whose hat is wrong
relative to the closest codeword c; everyone except i answers consistently with c, and i answers inconsistently with
c—which means you’re all correct!) You win.

def int_to_binary(n): # 23-bit binary-string representation of n (helper assumed by the figure).
    return format(n, "023b")

def golay_code(): # uses hamming_distance from Figure S.4.2 (it works on strings too).
    output = open("golay_code.txt", "w")
    codewords = []
    candidate = 0
    while candidate < 2**23:
        okay = True
        for i in range(len(codewords), 0, -1): # counting down!
            if hamming_distance(int_to_binary(candidate), codewords[i-1]) < 7:
                okay = False
                break
        if okay:
            codewords.append(int_to_binary(candidate))
            output.write(int_to_binary(candidate) + "\n")
        candidate += 1
    output.close()

Figure S.4.3 A Python implementation of the “greedy algorithm” for the Golay code.

There are 2^4 = 16 codewords in the Hamming code, and 2^7 = 128 hat configurations. You only lose on codewords, and
there are 16 codewords. Thus you lose a 16/128 = 1/8 fraction of the time. You win with probability 7/8 = 0.875.

4.32 Consider the set of codewords with their first d − 1 bits deleted:

C′ = {xd,d+1,...,n : x ∈ C} .

Let x and y be two distinct codewords in C. Observe that their corresponding bitstrings in C′ must still be distinct:
∆(xd,d+1,...,n , yd,d+1,...,n ) ≥ 1, because the first d − 1 bits of x and y can account for at most d − 1 of their ≥ d differing
positions!
Therefore |C| = |C′ |. But C′ ⊆ {0, 1}^(n−d+1) , and so |C′ | ≤ 2^(n−d+1) . Because |C| = 2^k by definition, we have
2^k = |C| = |C′ | ≤ 2^(n−d+1) , and thus k ≤ n − d + 1.

4.3 Proof Techniques


4.33 Assume that x and y are rational. By definition, then, there exist integers a ̸= 0, b ̸= 0, c, and d, such that x = c/a
and y = d/b. Then

x − y = c/a − d/b = (cb − da)/(ab).

Because a, b, c, d ∈ Z, we have that cb − da ∈ Z and ab ∈ Z too; because a ̸= 0 and b ̸= 0 we have that ab ̸= 0.
Therefore x − y is rational, by definition, because we can write it as n/m for integers n and m ̸= 0—namely, n = cb − da
and m = ab.

4.34 Assume that x and y are rational, and y ̸= 0. By definition, then, there exist integers a ̸= 0, b ̸= 0, c, and d, such that
x = c/a and y = d/b; because y ̸= 0, we also have that d ̸= 0. Then

x/y = (c/a)/(d/b) = (c/a) · (b/d) = bc/(ad).

Because a, b, c, d ∈ Z, we have that bc, ad ∈ Z; because a ̸= 0 and d ̸= 0 we have that ad ̸= 0. Therefore x/y is rational
by definition.

4.35 The first statement is false, and the second is true.

• If xy and x are both rational, then y is too. This statement is false for x = 0 and y = √2: in this case x = 0 and xy = 0
are both rational, but y is not rational.
• If x − y and x are both rational, then y is too. This statement is true. To prove it, we’ll simply invoke Exercise 4.33,
which states that if a and b are rational, then a − b is rational too. Let a = x and b = x − y. Then Exercise 4.33 says
that a − b = x − (x − y) = y is rational too.

4.36 Let n be any odd integer. We can express n in binary as ⟨bk , bk−1 , . . . , b1 , b0 ⟩ ∈ {0, 1}^(k+1) for some k ≥ 1, so that

n = 2^k bk + 2^(k−1) bk−1 + · · · + 2b1 + b0 .

Then, taking both sides modulo 2 and observing that 2^i bi mod 2 = 0 for any i ≥ 1, we have

n mod 2 = b0 mod 2.

Because n is odd, we know that n mod 2 = 1. Thus b0 = b0 mod 2 = 1 as well.

4.37 Let n be a positive integer. We can express the digits of n as ⟨ak , ak−1 , . . . , a1 , a0 ⟩ ∈ {0, 1, . . . , 9}^(k+1) for some k ≥ 1,
so that

n = 10^k ak + 10^(k−1) ak−1 + · · · + 10a1 + a0 .

Taking both sides modulo 5 and observing that 10^i ai mod 5 = 0 for any i ≥ 1, we have

n mod 5 = a0 mod 5.

The integer n is divisible by 5 if and only if n mod 5 = 0, which therefore occurs if and only if a0 mod 5 = 0. The only
two digits that are evenly divisible by 5 are {0, 5}, so n is divisible by 5 if and only if a0 ∈ {0, 5}.
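A quick numerical spot check of the conclusion (an illustration, not a proof):

```python
# n is divisible by 5 exactly when its last digit is 0 or 5.
print(all((n % 5 == 0) == (str(n)[-1] in '05') for n in range(1, 10000)))  # True
```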

4.38 Let dℓ , dℓ−1 , . . . , d1 , d0 denote the digits of n, reading from left to right, so that

n = d0 + 10d1 + 100d2 + 1000d3 + · · · + 10^ℓ dℓ ,

or, factoring 10^i into 2^i · 5^i and then dividing both sides by 2^k :

n = Σ_{i=0}^{ℓ} 2^i 5^i di
n/2^k = Σ_{i=0}^{ℓ} 2^(i−k) 5^i di .     (∗)

The integer n is divisible by 2^k if and only if n/2^k is an integer, which because of (∗) occurs if and only if the right-hand
side of (∗) is an integer. And that’s true if and only if Σ_{i=0}^{k−1} 2^(i−k) 5^i di is an integer, because all other terms in
the right-hand side of (∗) are integers. Therefore 2^k | n if and only if 2^k | Σ_{i=0}^{k−1} 2^i 5^i di —but Σ_{i=0}^{k−1} 2^i 5^i di =
Σ_{i=0}^{k−1} 10^i di simply is the last k digits of n.
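The conclusion can be spot-checked numerically (an illustration, not a proof; the helper name is an assumption):

```python
def claim_holds(n, k):
    last_k = n % 10**k  # the number formed by the last k digits of n.
    return (n % 2**k == 0) == (last_k % 2**k == 0)

print(all(claim_holds(n, k) for n in range(1, 5000) for k in (1, 2, 3)))  # True
```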

4.39 We can factor n^3 − n = (n^2 − 1)n = (n + 1)(n − 1)n. We can then give a proof by cases, based on the value of
n mod 3:
• If n mod 3 = 0, then (n + 1)(n − 1)n is divisible by three because n is.
• If n mod 3 = 1, then (n + 1)(n − 1)n is divisible by three because n − 1 is.
• If n mod 3 = 2, then (n + 1)(n − 1)n is divisible by three because n + 1 is.
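A quick numerical spot check of the claim (an illustration, not a proof):

```python
# n^3 - n is divisible by three for every integer n in a sample range.
print(all((n**3 - n) % 3 == 0 for n in range(-100, 101)))  # True
```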

4.40 We’ll give a proof by cases based on n mod 3:

• If n mod 3 = 0, then we can write n = 3k for an integer k. Then n^2 + 1 = (3k)^2 + 1 = 9k^2 + 1. Because 3k^2 is an
integer—that is, because 3 | 9k^2 —we have that n^2 + 1 mod 3 = 1.
• If n mod 3 = 1, then we can write n = 3k + 1 for an integer k. Then n^2 + 1 = (3k + 1)^2 + 1 = (9k^2 + 6k + 1) + 1 =
(9k^2 + 6k) + 2 = 3(3k^2 + 2k) + 2. Thus we have that n^2 + 1 mod 3 = 2.
• If n mod 3 = 2, then we can write n = 3k + 2 for an integer k. Then n^2 + 1 = (3k + 2)^2 + 1 = (9k^2 + 12k + 4) + 1 =
(9k^2 + 12k) + 5 = 3(3k^2 + 4k) + 5. Thus we have that n^2 + 1 mod 3 = 5 mod 3 = 2.

In all of these cases, n^2 + 1 is not evenly divisible by three.

4.41 We can prove this claim from scratch, or use Example 4.14, which states that |a − b| ≤ |a − c| + |b − c|. Let a = x,
b = −y, and c = 0. Then we have |x − (−y)| ≤ |x − 0| + |(−y) − 0| which, rewriting, states that |x + y| ≤ |x| + | − y|.
The claim follows because |y| = | − y|.

4.42 Rewriting, we have to show that |x| ≤ |y| + |x − y|. We can prove this claim from scratch, or use Example 4.14,
which states that |a − b| ≤ |a − c| + |b − c|. Let a = x + y, b = y, and c = 2y.
Then we have |(x + y) − y| ≤ |(x + y) − 2y| + |y − 2y|, which, rewriting, states that |x| ≤ |x − y| + | − y|. Because
| − y| = |y|, we thus have |x| ≤ |x − y| + |y|. Subtracting |y| from both sides yields |x| − |y| ≤ |x − y|, as desired.

4.43 We’ll prove this result by cases, depending whether both, neither, or one of {x, y} is negative:

• Suppose both are negative: that is, x < 0 and y < 0. Then |x| = −x and |y| = −y, and x · y > 0 so |x · y| = xy. Thus
|x||y| = (−x)(−y) = xy = |x · y|.
• Suppose neither is negative: that is, x ≥ 0 and y ≥ 0. Then |x| = x and |y| = y, and x · y ≥ 0 so |x · y| = xy. Thus
|x||y| = xy = |x · y|.
• Suppose that one of x and y is negative. Without loss of generality, suppose x < 0 and y ≥ 0. Then |x| = −x and
|y| = y, and x · y ≤ 0 so |x · y| = −xy. Thus |x||y| = (−x)y = −xy = |x · y|.

4.44 Assume that |x| ≤ |y|. This claim is a straightforward consequence of Exercise 4.41:

|x + y| ≤ |x| + |y| Exercise 4.41


≤ |y| + |y| |x| ≤ |y| by assumption
= 2|y|.

Thus |x + y|/2 ≤ |y|.

4.45 We’ll prove the result by mutual implication. First, we’ll prove the right-to-left direction: if A = ∅ or B = ∅ or
A = B, then A × B = B × A. We’ll consider all three cases:

• If A = ∅, then A × B = ∅ and B × A = ∅. Thus A × B = B × A.


• If B = ∅, similarly, we have A × B = B × A.
• If A = B, then A × B and B × A are equal because they’re actually expressing the same set (which is the same as A × A
and B × B).

For the converse, we’ll prove the contrapositive: if A ̸= ∅ and B ̸= ∅ and A ̸= B, then A × B ̸= B × A. Because A ̸= B,
there exists an element in one set but not the other—that is, an element a ∈ A but a ∉ B, or an element b ∈ B but b ∉ A.
Without loss of generality, we’ll consider the former case. Suppose a ∈ A but a ∉ B. Because B ̸= ∅, there is an element
b ∈ B. Then ⟨a, b⟩ ∈ A × B, but a ∉ B so ⟨a, b⟩ ∉ B × A. Thus A × B ̸= B × A.

4.46 This claim is straightforward to see with two cases:

• if x ≥ 0 then x · x = nonnegative · nonnegative ≥ 0.
• if x ≤ 0 then x · x = nonpositive · nonpositive ≥ 0.

4.47 By Exercise 4.46, we have (x − y)^2 ≥ 0. Adding 4xy to both sides, we have (x − y)^2 + 4xy ≥ 4xy. Observe that
(x − y)^2 + 4xy = x^2 − 2xy + y^2 + 4xy = x^2 + 2xy + y^2 = (x + y)^2 . Thus we have (x + y)^2 ≥ 4xy. So x + y ≥ √(4xy),
and (x + y)/2 ≥ √(xy).

4.48 The “if” direction is simple: if x = y, then the arithmetic mean of x and y is (x + y)/2 = 2x/2 = x, and the geometric
mean is √(xy) = √(x^2) = x. The “only if” direction requires us to show that if (x + y)/2 = √(xy), then x = y. We show this
fact by algebra:

(x + y)/2 = √(xy) ⇔ x + y = 2√(xy)
                  ⇒ (x + y)^2 = (2√(xy))^2
                  ⇒ x^2 + 2xy + y^2 = 4xy
                  ⇔ x^2 − 2xy + y^2 = 0
                  ⇔ (x − y)^2 = 0
                  ⇔ x = y.
56 Proofs


4.49 This claim follows immediately from the Arithmetic Mean–Geometric Mean inequality: we're told that x ≥ 0, and
because y ≥ x we know that y ≥ 0, so x/y ≥ 0 too. Thus we can invoke Exercise 4.47 on y and x/y:

(y + x/y)/2 ≥ √(y · (x/y))    by Exercise 4.47
= √x.

4.50 Observe that

|y − √x| − |√x − x/y| = (y − √x) − (√x − x/y)    y > √x by assumption, so x/y < √x
= y − 2√x + x/y
= (y² − 2y√x + x)/y
= (y − √x)²/y.

By assumption, y > √x ≥ 0. Thus (y − √x)² > 0, and (y − √x)²/y > 0 too. Thus

|y − √x| − |√x − x/y| > 0.

Adding |√x − x/y| to both sides and observing that |√x − x/y| = |x/y − √x| yields the desired claim:

|y − √x| > |x/y − √x|.    (∗)

For the second part of the statement, (∗) and Exercise 4.44 tell us that

|y − √x| > |(x/y − √x) + (y − √x)|/2
= |(x/y + y − 2√x)/2|
= |(x/y + y)/2 − √x|.

4.51 We’ll prove the contrapositive: if n is a perfect square, then n mod 4 ∈/ {2, 3} (that is, n mod 4 ∈ {0, 1}). Assume
that n is a perfect square—say n = k2 for an integer k. Let’s look at two cases: k is even, or k is odd.
Case I: k is even. Then k = 2d for an integer d. Therefore n = k2 = (2d)2 = 4d2 . Because d is an integer, 4d2 mod 4 =
0, and thus n mod 4 = 0.
Case II: k is odd. Then k = 2d + 1 for an integer d. Therefore n = k² = (2d + 1)² = 4d² + 2 · (2d · 1) + 1² =
4d² + 4d + 1. Because d is an integer, we have 4d² mod 4 = 0 and 4d mod 4 = 0, and thus n mod 4 = 1.

4.52 We’ll prove the contrapositive: if n or m is evenly divisible by 3, then nm is. Without loss of generality, suppose that n
is divisible by three—say, n = 3k for an integer k. Then nm = 3km for integers k and m—and 3 · (any integer) is divisible
by three by definition.

4.53 We'll prove the contrapositive: if n is odd, then 2n⁴ + n + 5 is even. But 2n⁴ is even, n is odd by assumption,
and 5 is odd—and even + odd + odd is even, so the claim follows.

4.54 We proceed by mutual implication.


Suppose that n is even. Any integer times an even number is even, and thus n · n² = n³ is even (because n is even).
Conversely, we need to show that if n3 is even, then n is even. We’ll prove the contrapositive: if n is odd, then n3 is
odd. If n is odd, we can write n = 2k + 1 for an integer k, and thus n3 = (2k + 1)3 = 8k3 + 12k2 + 6k + 1—which is
odd, because the first three terms are all even.

4.55 We proceed by mutual implication.


First, assume 3 | n. By definition, there exists an integer k such that n = 3k. Therefore n2 = (3k)2 = 9k2 = 3 · (3k2 ).
Thus 3 | n2 too.

Second, for the converse, we prove the contrapositive: if 3 ̸ | n, then 3 ̸ | n². Assume that 3 ̸ | n. Then there exist
integers k and r ∈ {1, 2} such that n = 3k + r. Therefore n² = (3k + r)² = 9k² + 6kr + r² = 3(3k² + 2kr) + r².
Thus 3 | n² only if 3 | r². But r ∈ {1, 2}, and 1² = 1 and 2² = 4, neither of which is divisible by 3. Therefore
n² = (3k + r)² = 3(3k² + 2kr) + r² is not divisible by 3.

4.56 Assume for a contradiction that x and y are both integers. By factoring the left-hand side, we see that (x−y)(x+y) = 1.
The only product of integers equal to 1 is 1 · 1, so we have x − y = 1 and x + y = 1. Adding these two equations together
yields x − y + (x + y) = 1 + 1—that is, 2x = 2. Thus x = 1. But then y must equal 0, which isn’t a positive integer.

4.57 Suppose for a contradiction that 12x + 3y = 254 for integers x and y. But 12x + 3y is therefore divisible by three,
because 3 | 12x and 3 | 3y. But 254 = 3 · (84) + 2, so 3 ̸ | 254. But then 3 | 12x + 3y and 3 ̸ | 254, which contradicts the
assumption that 12x + 3y = 254.
4.58 Assume that ∛2 is rational. Thus we can write ∛2 = n/d for integers n and d ≠ 0 where n and d are in lowest terms.
Cubing both sides yields that n3 /d3 = 2, and therefore that n3 = 2d3 . Because 2d3 is even, we know that n3 is even.
Therefore, by Exercise 4.54, n is even too. Because n is even, we can write n = 2k for an integer k, so that n3 = 8k3 .
Thus n3 = 8k3 and n3 = 2d3 , so 2d3 = 8k3 and d3 = 4k3 . Hence d3 is even, and—again using Exercise 4.54—we have
that d is even. But that is a contradiction: we assumed that n/d was in lowest terms, but now we have shown that n and d
are both even!
4.59 Assume that √3 is rational—that is, assume that we can write √3 in lowest terms as √3 = n/d, for integers n and
d ≠ 0 where n and d have no common divisors.
Squaring both sides yields that n2 /d2 = 3, and therefore that n2 = 3 · d2 . Because 3 | 3 · d2 , we know that 3 | n2 .
Therefore, by Exercise 4.55, we have that n is itself divisible by 3.
Because 3 | n, we can write n = 3k for an integer k, so n2 = (3k)2 = 9k2 . Thus n2 = 9k2 and n2 = 3d2 , so 3d2 = 9k2
and d2 = 3k2 . Hence d2 is divisible by 3, and—again using Exercise 4.55—we have that 3 | d.
That’s a contradiction: we assumed that n/d was in lowest terms, but we have now shown that n and d are both divisible
by three!

4.60 Suppose for a contradiction that x and y ̸= x are both strict majority elements. Then, by definition, the sets

X = {i ∈ {1, 2, . . . , n} : A[i] = x} and Y = {i ∈ {1, 2, . . . , n} : A[i] = y}

satisfy |X| > n/2 and |Y| > n/2. But because X, Y ⊆ {1, 2, . . . , n}, which has only n elements, there must be an index i in
X ∩ Y. But by definition of X and Y we must have A[i] = x and A[i] = y—but x ≠ y, so these conditions are contradictory.

√ √
4.61 Consider x = 2 and y = 1/ 2. Then xy = 1, which is rational, but neither x nor y is rational.

√ √
4.62 Consider x = 2 and y = 2. Then x − y = 0, which is rational, but neither x nor y is rational.

4.63 Consider x = π and y = π. Then x/y = 1, which is rational, but neither x nor y is rational.

4.64 I found a counterexample to Claim 2 by writing a program (by exhaustively computing the value of x² + y² for all
1 ≤ x < y < 100, and then looking for values that were triplicated). We can express 325 as 1² + 18² = 10² + 15² =
6² + 17².
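The exhaustive search described above can be sketched as follows (a minimal reconstruction in Python; the variable names are mine, and I allow x = 1 since the representation 1² + 18² uses it):

```python
# Tabulate every way to write n = x^2 + y^2 with 1 <= x < y < 100,
# then keep the values that are "triplicated" (three or more representations).
from collections import defaultdict

ways = defaultdict(list)
for x in range(1, 100):
    for y in range(x + 1, 100):
        ways[x * x + y * y].append((x, y))

triplicated = {n: reps for n, reps in ways.items() if len(reps) >= 3}
print(min(triplicated), sorted(triplicated[min(triplicated)]))
# → 325 [(1, 18), (6, 17), (10, 15)]
```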

4.65 The function f(n) = n² + n + 41 does not yield a prime for every nonnegative integer n. For example, f(41) =
41² + 41 + 41 = 41(41 + 1 + 1) = 41 · 43, which is not prime.

4.4 Some Examples of Proofs


4.66 First, here are propositions using only {∨, ∧, ⇒, ¬} that are logically equivalent to operators 1 – 8 from Figure 4.34:

column | truth table [in standard order] | original proposition | equivalent proposition
1      | TTTT                            | True                 | p ⇒ p
2      | TTTF                            | p ∨ q                | p ∨ q
3      | TTFT                            | p ⇐ q                | q ⇒ p
4      | TTFF                            | p                    | p
5      | TFTT                            | p ⇒ q                | p ⇒ q
6      | TFTF                            | q                    | q
7      | TFFT                            | p ⇔ q                | (p ⇒ q) ∧ (q ⇒ p)
8      | TFFF                            | p ∧ q                | p ∧ q

For the remaining columns, observe that 9 – 16 are just the negations of 1 – 8 , in reversed order: 9 is ¬ 8 ; 10 is ¬ 7 ;
etc. Because we’re allowed to use ¬ in our equivalent expressions, we can express 9 – 16 by negating the corresponding
translation from the above table.

4.67 Given the solution to Exercise 4.66, it suffices to show that we can express ⇒ using {∧, ∨, ¬}. And, indeed, p ⇒ q
is equivalent to ¬p ∨ q. (Now, to translate i to use only {∧, ∨, ¬}, we translate i as in Exercise 4.66 to use only
{∧, ∨, ¬, ⇒}, and then further translate any use of ⇒ to use ∨ and ¬.)

4.68 Given the solution to Exercise 4.67, it suffices to show that we can express ∧ using {∨, ¬}, and ∨ using {∧, ¬}. But
these are just De Morgan’s Laws: p ∧ q is equivalent to ¬(¬p ∨ ¬q), and p ∨ q is equivalent to ¬(¬p ∧ ¬q).

4.69 Given the solution to Exercise 4.68, it suffices to show that we can express ∧ and ¬ using the Sheffer stroke. Indeed,
p | p is logically equivalent to ¬p, and ¬(p | q) ≡ (p | q) | (p | q) is logically equivalent to p ∧ q.

4.70 Given the solution to Exercise 4.68, it suffices to show that we can express ∨ and ¬ using the Peirce arrow ↓. Indeed,
p ↓ p is logically equivalent to ¬p, and ¬(p ↓ q) ≡ (p ↓ q) ↓ (p ↓ q) is logically equivalent to p ∨ q.
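Both universality claims are easy to machine-check over all truth assignments; here's a quick sketch in Python (the helper names nand and nor are mine):

```python
# Verify that NAND (the Sheffer stroke) expresses NOT and AND,
# and that NOR (the Peirce arrow) expresses NOT and OR.
def nand(p, q):
    return not (p and q)

def nor(p, q):
    return not (p or q)

for p in (False, True):
    assert nand(p, p) == (not p)    # p | p  ≡  ¬p
    assert nor(p, p) == (not p)     # p ↓ p  ≡  ¬p
    for q in (False, True):
        assert nand(nand(p, q), nand(p, q)) == (p and q)   # (p|q)|(p|q) ≡ p ∧ q
        assert nor(nor(p, q), nor(p, q)) == (p or q)       # (p↓q)↓(p↓q) ≡ p ∨ q
print("all NAND/NOR equivalences verified")
```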

4.71 Call a Boolean formula φ over variables p and q truth-preserving if φ is true whenever p and q are both true. (So φ
is truth-preserving when the first row of its truth table is True.)
We claim that any logical expression φ that involves only {p, q, ∨, ∧}, in any combination, is truth-preserving. A fully
rigorous proof needs induction (see Chapter 5), but the basic idea is fairly simple. The trivial logical expressions p or q
certainly are truth-preserving. And if both φ and ψ are truth-preserving then both φ ∧ ψ and φ ∨ ψ are truth-preserving
too. In sum, using only ∧s and ∨s, we can only build truth-preserving expressions.
But there are non-truth-preserving formulas—for example, p ⊕ q is false when p = q = T—which therefore cannot
be expressed solely using the operators {∧, ∨}. Therefore this set of operators is not universal.

4.72 Let’s say that a proposition in the form ∀/∃ x1 : ∀/∃ x2 : · · · ∀/∃ xk : P(x1 , x2 , . . . , xk ) is outer-quantifier-only form
(OQOF). (P may use any of the logical connectives from propositional logic.) We must show that any fully quantified
proposition φ of predicate logic is logically equivalent to one in OQOF.
First, rename all the variables in φ to ensure that all bound variables in φ are unique. Then apply the transformations
of Exercises 4.66–4.71 to ensure that the only operators in φ are ¬, ∧, ∀, and ∃. We’ll give a recursive transformation to
translate φ into OQOF, where we only make a recursive call on a proposition with strictly fewer of those four operators.
Case I: φ is of the form ¬ψ. First, recursively convert ψ to OQOF. If the resulting expression ψ′ has no quantifiers, then
we’re done: just return ¬ψ′ . Otherwise, ψ′ is of the form ∀/∃ x : τ . Then we can return ∃/∀ x : ρ, where ρ is the OQOF
of ¬τ , found recursively.
Case II: φ is of the form ψ ∧ τ . Recursively convert ψ and τ to OQOF, and call the resulting expressions ψ′ and τ ′ :
•If ψ′ = ∀x : ρ, then φ is [∀x : ρ] ∧ τ , which is logically equivalent to [∀x : ρ ∧ τ ]:
[∀x : ρ] ∧ τ ≡ [∀x : ρ] ∧ [∀x : τ ] ≡ [∀x : ρ ∧ τ ]. Exercise 3.142 and Exercise 3.138

Recursively convert ρ ∧ τ to OQOF.


•If ψ′ = ∃x : ρ, then φ ≡ [∃x : ρ] ∧ τ ≡ [∃x : ρ ∧ τ ] by Exercise 3.144. Recursively convert ρ ∧ τ to OQOF.
•If ψ′ contains no quantifiers but τ ′ does, then follow the previous two bullet points (but with the roles of ψ′ and τ ′ swapped).
•If neither ψ′ nor τ ′ contain quantifiers, then we’re done: just return ψ′ ∧ τ ′ .
Case III: φ is of the form ∀/∃ x : ψ. Recursively convert ψ to OQOF.

4.73 Let φ = x₁ ∨ x₂ ∨ · · · ∨ xₙ. This proposition has only one clause, but it is true under every truth assignment except
the all-false one. Our translation thus builds 2ⁿ − 1 clauses in the equivalent DNF formula.

4.74 Draw an identical right triangle, rotated 180° from the original. The two triangles fit together to create an x-by-y
rectangle, which has area xy. Thus the area of the triangle is xy/2.

[figure: the right triangle and its rotated copy fitting together into an x-by-y rectangle of area xy]

4.75 Draw four identical right triangles, with legs of length a and b and hypotenuse of length c, arranged as in the figure.
The entire enclosed space is a c-by-c square, which has area c². The unfilled inner square has side length b − a, so its area
is (b − a)² = b² − 2ab + a². Thus, writing ∆ for the area of the triangle, we have

4∆ + b² − 2ab + a² = c².

By the previous exercise, we have ∆ = ab/2; thus b² + a² = c².

4.76 The right triangle in question has hypotenuse x + y and its vertical leg has length x − y. Writing d for the length of
the other (horizontal) leg, the Pythagorean Theorem tells us that

(x − y)² + d² = (x + y)²
and thus x² − 2xy + y² + d² = x² + 2xy + y²    multiplying out
d² = 4xy    subtracting x² − 2xy + y² from both sides
and so d = 2√(xy).    taking square roots, since d and x and y are positive

The hypotenuse is the longest side of a right triangle, so x + y ≥ 2√(xy)—or, dividing by two, (x + y)/2 ≥ √(xy).


4.77 Writing a = |x₁ − y₁| and b = |x₂ − y₂|, the Manhattan distance is a + b and the Euclidean distance is √(a² + b²).
Because a and b are both nonnegative, we have

a + b ≥ √(a² + b²)
⇔ (a + b)² ≥ (√(a² + b²))²
⇔ a² + 2ab + b² ≥ a² + b²
⇔ 2ab ≥ 0.

And certainly 2ab ≥ 0, because a ≥ 0 and b ≥ 0.

4.78 As in the previous exercise, write a = |x₁ − y₁|. Let d be the Manhattan distance, so that d = a + |x₂ − y₂|. Thus
the Euclidean distance is √(a² + (d − a)²). It's fairly easy to see from calculus (or from plotting the function) that the
Euclidean distance is minimized when a² = (d − a)²—that is, when a = d/2. In this case, the Euclidean distance is
√(2(d/2)²) = √(d²/2) = d/√2. Thus we have

d_manhattan(x, y) ≤ √2 · d_euclidean(x, y).

The points x = ⟨0, 0⟩ and y = ⟨d/2, d/2⟩ satisfy d_manhattan(x, y) = d and d_euclidean(x, y) = d/√2, as desired.
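Both bounds relating the two distances can be spot-checked numerically; here's a quick sketch in Python (the function names are mine):

```python
# Check d_euclidean <= d_manhattan <= sqrt(2) * d_euclidean on random points.
import math
import random

def manhattan(x, y):
    return abs(x[0] - y[0]) + abs(x[1] - y[1])

def euclidean(x, y):
    return math.hypot(x[0] - y[0], x[1] - y[1])

random.seed(0)
for _ in range(1000):
    x = (random.uniform(-10, 10), random.uniform(-10, 10))
    y = (random.uniform(-10, 10), random.uniform(-10, 10))
    e, m = euclidean(x, y), manhattan(x, y)
    assert e <= m + 1e-9 and m <= math.sqrt(2) * e + 1e-9
print("both bounds hold on all sampled pairs")
```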

4.79 We'll give a direct proof. The given point ⟨x, y⟩ is distance d = √(x² + y²) away from the origin, at some angle α
above the x-axis where cos α = x/d and sin α = y/d. We now seek a point at distance d away from the origin, but now
at angle α + θ above the x-axis:

[figure: the point ⟨x, y⟩ at angle α above the x-axis, and the rotated point at angle α + θ]

Thus the new position is

⟨d cos(α + θ), d sin(α + θ)⟩ = ⟨d[cos α cos θ − sin α sin θ], d[sin α cos θ + cos α sin θ]⟩
= ⟨cos θ[d cos α] − sin θ[d sin α], cos θ[d sin α] + sin θ[d cos α]⟩
= ⟨x cos θ − y sin θ, y cos θ + x sin θ⟩,

as desired.
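The rotation identity can also be sanity-checked numerically (a sketch; the function name rotate is mine): rotating a point by θ should preserve its distance from the origin and advance its angle by exactly θ.

```python
# Check the rotation formula ⟨x cos θ − y sin θ, y cos θ + x sin θ⟩.
import math

def rotate(x, y, theta):
    return (x * math.cos(theta) - y * math.sin(theta),
            y * math.cos(theta) + x * math.sin(theta))

x, y, theta = 3.0, 4.0, math.pi / 6
rx, ry = rotate(x, y, theta)
assert math.isclose(math.hypot(rx, ry), math.hypot(x, y))           # same distance d
assert math.isclose(math.atan2(ry, rx), math.atan2(y, x) + theta)   # angle α + θ
print("rotation preserves distance and shifts the angle by θ")
```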

4.80 The number 6 is a perfect number: its factors are 1, 2, and 3, and 1 + 2 + 3 = 6. (One way to find this fact is
by writing a program to systematically test small integers; 6 is the smallest integer that passes the test. The next one is
28 = 1 + 2 + 4 + 7 + 14.)
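The systematic test mentioned above can be sketched in a few lines of Python (the function name is mine):

```python
# A number n is perfect if the sum of its proper divisors equals n.
def is_perfect(n):
    return n > 1 and sum(d for d in range(1, n) if n % d == 0) == n

perfect = [n for n in range(1, 100) if is_perfect(n)]
print(perfect)  # → [6, 28]
```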

4.81 If p is prime, then the only positive integer factors of p² are 1, p, and p². But 1 + p = p² only for p = (1 ± √5)/2,
which is not a prime integer. (See Example 6.27.)

4.82 Taking the numbers {n, n + 1, . . . , n + 5} all modulo 6, we must have the values {0, 1, 2, 3, 4, 5} (in some order).
The three numbers k such that k mod 6 ∈ {0, 2, 4} are all even, and the number k such that k mod 6 = 3 is divisible by
three. Because n ≥ 10, none of these numbers is 2 or 3 itself, so none of them is prime; thus the only two candidate primes
are the two numbers k such that k mod 6 ∈ {1, 5}.

4.83 The claim is false. I found a counterexample by writing a small Python program:

def isPrime(n):
    for d in range(2, int(n**0.5) + 1):   # test every candidate divisor up to ⌊√n⌋
        if n % d == 0:
            return False
    return True

n = 2
while any(map(isPrime, range(n, n+10))):
    n = n + 1
print("n =", n, "through n+9 =", n+9)

My program terminated with n = 114. Indeed, none of {114, 115, . . . , 123} is prime:

114 = 2 · 3 · 19  115 = 5 · 23  116 = 2² · 29  117 = 3² · 13  118 = 2 · 59
119 = 7 · 17  120 = 2³ · 3 · 5  121 = 11²  122 = 2 · 61  123 = 3 · 41.

4.84 Let k be a locker number. Note that k's state is changed once per factor of k. For example, if k is prime, then k is
opened when i = 1 and closed when i = k, and left untouched at every other stage. The only time k has an odd number
of factors is when √k is a factor of k—every other factor i is "balanced" by the factor k/i. Thus every locker is left closed
except the perfect squares: {1, 4, 9, 16, 25, 36, 49, 64, 81, 100}.
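The toggling process is easy to simulate directly (a sketch in Python; the variable names are mine):

```python
# Simulate the locker puzzle: on pass i, toggle every ith locker.
# The lockers left open are exactly the perfect squares.
n = 100
is_open = [False] * (n + 1)          # index 0 unused; all lockers start closed
for i in range(1, n + 1):
    for k in range(i, n + 1, i):
        is_open[k] = not is_open[k]  # pass i toggles locker k once per factor i of k

result = [k for k in range(1, n + 1) if is_open[k]]
print(result)  # → [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
```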

4.85 It's a small bug, but it's still a bug: consider n = 2. The modified claim would be: 2 is composite if and only if 2 is
evenly divisible by some k ∈ {2, . . . , ⌈√2⌉}. But ⌈√2⌉ = 2, so 2 is divisible by 2 and 2 ∈ {2, . . . , ⌈√2⌉}—and yet 2 is
not composite.
4.86 It's immediate that if n is divisible by such a k ∈ {⌈√n⌉, . . . , n − 1}, then n is composite. For the other direction,
assume that n is composite. By Theorem 4.32, there exists an integer k ∈ {2, 3, . . . , ⌊√n⌋} that divides n evenly. That
is, there exists an integer d such that kd = n. But k ≥ 2 implies that d < n, and k ≤ ⌊√n⌋ implies that d ≥ √n—and thus
d ≥ ⌈√n⌉, because d is an integer. So this d is a divisor of n in {⌈√n⌉, . . . , n − 1}.

4.87 Take the number n = 202. It's composite, but it is not evenly divisible by any number in {⌊√n/2⌋, . . . , ⌊3√n/2⌋} =
{7, 8, . . . , 21}: its only nontrivial factors are 2 and 101.

4.88 Let n be p² for any prime p. Then the only divisors of n are 1, p, and p². The smallest prime factor that evenly
divides n is therefore p = √n. And there are infinitely many such integers n, because there are infinitely many primes.

4.5 Proof Techniques


4.89 Invalid. The premises are of the form p ⇒ q and ¬p; the conclusion is q. But with p = False and q = False, the
premises are both true and the conclusion is false.

4.90 Invalid. The premises are of the form p ⇒ q and q; the conclusion is p. But with p = False and q = True, the
premises are both true and the conclusion is false.

4.91 Valid. The premises are of the form p ⇒ q and ¬q; the conclusion is ¬p. This inference is Modus Tollens, which is
a valid step.

4.92 Invalid. The premises are of the form p ∨ q and p; the conclusion is q. But with p = True and q = False, the premises
are both true and the conclusion is false.

4.93 Valid. The premises are of the form ∀x ∈ S : P(x) and a ∈ S; the conclusion is P(a). This is a valid inference (it’s
precisely what “for all” means).

4.94 Valid. The premises are of the form p ⇒ q ∨ r and p and ¬r; the conclusion is q. This is a valid inference, because
in the only truth assignment in which all of the premises are true (when p = q = True and r = False), the conclusion is
true:

p q r | p ⇒ (q ∨ r) | all premises (p and ¬r and p ⇒ q ∨ r) | conclusion (q)
T T T |      T      |                  F                    |
T T F |      T      |                  T                    |       T
T F T |      T      |                  F                    |
T F F |      F      |                  F                    |
F T T |      T      |                  F                    |
F T F |      T      |                  F                    |
F F T |      T      |                  F                    |
F F F |      T      |                  F                    |
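Validity checks like these can be brute-forced over all truth assignments; here's a sketch in Python (the encoding via lambdas is mine):

```python
from itertools import product

def valid(premises, conclusion, nvars=3):
    # An argument is valid iff every assignment satisfying all premises
    # also satisfies the conclusion.
    return all(conclusion(*vs)
               for vs in product([False, True], repeat=nvars)
               if all(prem(*vs) for prem in premises))

# Exercise 4.94: premises p, ¬r, and p ⇒ (q ∨ r); conclusion q.  (Valid.)
premises = [lambda p, q, r: p,
            lambda p, q, r: not r,
            lambda p, q, r: (not p) or (q or r)]
assert valid(premises, lambda p, q, r: q)

# Exercise 4.92: premises p ∨ q and p; conclusion q.  (Invalid.)
assert not valid([lambda p, q, r: p or q, lambda p, q, r: p],
                 lambda p, q, r: q)
print("validity checks agree with the solutions")
```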

4.95 The problem is in the statement “Because p and q are both prime, we know that p does not evenly divide q.” The
stated assumptions were that p and q both evenly divide n and are prime. If p and q are different, then p ̸ | q—but there’s
nothing in the theorem or proof that requires p ̸= q.

4.96 Let n = 15, p = 5, and q = 5. Indeed n is a positive integer, p, q ∈ Z≥2 , p and q are prime, and n is evenly divisible
by both p and q. But n is not divisible by pq = 25.

4.97 The problem is a fundamental confusion of the variables k and n. Example 4.7 establishes that 6! + 1 is not evenly
divisible by any k ∈ {2, . . . , 6}, not any k ∈ {2, . . . , 6!}.

4.98 721 is divisible by 7, as 7 · 103 = 721. Thus 721 is not prime.

4.99 Here are two small examples: 2! + 1 = 2 + 1 = 3 is prime, and 3! + 1 = 6 + 1 = 7 is prime.



4.100 This argument affirms the consequent: Example 4.11 says that if √2 and √2 are rational, then their product
√2 · √2 is rational—and not the converse.

4.101 Suppose for a contradiction that 8√2 were rational—that is, 8√2 = n/d for integers n and d ≠ 0. But then
n/(8d) = (8√2)/8 = √2 would be rational too—but √2 is not rational, which contradicts our assumption.

4.102 The problem is in this sentence: "Because r < 12, adding r² to a multiple of 12 does not result in another multiple of
12." There may be numbers not divisible by 12 whose squares are divisible by 12 (unlike for divisibility by 2: the square
of an odd number is never even): for example, 6² = 36, which is divisible by 12.

4.103 One counterexample: n = 18 is not divisible by 12, but 18² = 324 = 27 · 12 is divisible by 12. (In fact, it's true
that 6 | n if and only if 12 | n².)

4.104 The problem is in the sentence “Therefore, by the same logic as in Example 4.18, we have that n is itself divisible
by 4.” We can have 4 | n2 without 4 | n—for example, for n = 6, we have 4 | 36 but 4 ̸ | 6.

4.105 The problem is in the sentence "Observe that whenever a ≤ b and c ≤ d, we know that ac ≤ bd." If we multiply
by a negative number, then the inequality swaps: n ≤ m if and only if −2n ≥ −2m, for example. In fact, we're plugging
in x = y = 1/√2—so both y − 4x and y − 2x are negative. Thus the multiplication based on (2) in (3) should have swapped
the inequality.

4.106 The issue is that the average of r/s and t/u might be very far from (r + t)/(s + u). Thus we can't validly draw the
conclusion that the last sentence infers.

To be precise, suppose algorithm F is correct on a out of b members of W and c out of d members of B. Suppose
algorithm G is correct on r out of s members of W and t out of u members of B. By assumption,

a/b > r/s and c/d > t/u.

Implicitly the argument concludes that, therefore, we have

(a + c)/(b + d) > (r + t)/(s + u).

But that's not necessarily true. For example, consider a = 34, b = 100, r = 1, s = 3, c = 1, d = 2, t = 49, u = 100.
Then we have

a/b = 34/100 > r/s = 1/3 and c/d = 1/2 > t/u = 49/100.

But a + c = 34 + 1 = 35 and b + d = 100 + 2 = 102, so (a + c)/(b + d) = 35/102 ≈ 0.3431. Meanwhile r + t = 1 + 49 = 50
and s + u = 3 + 100 = 103, so (r + t)/(s + u) = 50/103 ≈ 0.4854.

4.107 The success rate is the number of correct answers over the number of people encountered. The success rate for F is

(.75 · 100 + .60 · 100)/200 = (75 + 60)/200 = 135/200 = 0.675.

The success rate for G is

(.70 · 1000 + .50 · 100)/1100 = (700 + 50)/1100 = 750/1100 = 0.6818 · · · .
1100 1100 1100

4.108 The error is that the two shapes that have been produced are not right triangles—in fact, they are not triangles at all!
The slope of the first constituent triangle is 3/8 = 0.375. The slope of the second constituent triangle is 2/5 = 0.4. Thus when
these two pieces are lined up, the apparent hypotenuse of the composed "triangle" is not a straight line! In the left-hand
configuration, it's "bowed in"; in the right-hand configuration it's "bowed out." Here are the drawings from the question,
augmented with the actual diagonal (a dotted line) from the lower-left to upper-right corners of the rectangles.

The difference between these “hypotenuses” is one unit of area—enough to account for the apparent difference. To
see this more explicitly, here is the same rectangle, with the two “hypotenuses” drawn and the area between them shaded.

4.109 This proof uses the fallacy of proving true. It assumes that the Pythagorean Theorem is true, and derives True (the
similarity of the triangles).
5 Mathematical Induction

5.2 Proofs by Mathematical Induction


5.1 Let P(n) denote the claim that

Σ_{i=0}^{n} i² = n(n+1)(2n+1)/6.

We will prove that P(n) holds for all integers n ≥ 0 by induction on n.

For the base case (n = 0), we need to show P(0)—that is, we need to prove that

Σ_{i=0}^{0} i² = 0(0+1)(2·0+1)/6.

But the left-hand side is 0² = 0, and the right-hand side is 0/6 = 0, so P(0) holds.

For the inductive case (n ≥ 1), we assume the inductive hypothesis P(n − 1), and we must prove P(n):

Σ_{i=0}^{n} i² = n² + Σ_{i=0}^{n−1} i²    definition of summation
= n² + (n−1)(n)(2n−1)/6    the inductive hypothesis P(n − 1)
= (2n³ + 3n² + n)/6    multiplying out and collecting like terms
= n(2n+1)(n+1)/6.    factoring

Thus we have proven Σ_{i=0}^{n} i² = n(n+1)(2n+1)/6, which is precisely P(n).
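The closed form is easy to spot-check by brute force (a Python sketch):

```python
# Compare the sum of squares against the closed form n(n+1)(2n+1)/6 for many n.
for n in range(200):
    assert sum(i * i for i in range(n + 1)) == n * (n + 1) * (2 * n + 1) // 6
print("sum-of-squares formula verified for n = 0..199")
```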

5.2 Let P(n) denote the claim that

Σ_{i=0}^{n} i³ = (n⁴ + 2n³ + n²)/4.

We prove that P(n) holds for all integers n ≥ 0 by induction on n.

For the base case (n = 0), we need to show that P(0) holds, which is straightforward because

Σ_{i=0}^{0} i³ = 0³ = 0 = (0⁴ + 2·0³ + 0²)/4.

For the inductive case (n ≥ 1), we assume the inductive hypothesis P(n − 1), and we must prove P(n):

Σ_{i=0}^{n} i³ = n³ + Σ_{i=0}^{n−1} i³    definition of summation
= n³ + [(n−1)⁴ + 2(n−1)³ + (n−1)²]/4    inductive hypothesis P(n − 1)
= n³ + [[n⁴ − 4n³ + 6n² − 4n + 1] + 2[n³ − 3n² + 3n − 1] + [n² − 2n + 1]]/4    multiplying out
= n³ + (n⁴ − 2n³ + n²)/4    collecting like terms
= (n⁴ + 2n³ + n²)/4,    algebra

precisely as required by P(n).

5.3 Let P(n) denote the claim that (−1)ⁿ = 1 if n is even, and (−1)ⁿ = −1 if n is odd. We'll prove that P(n) holds for all
nonnegative integers n by induction on n.

Base case (n = 0): We must prove P(0)—that is, we must prove that (−1)⁰ = 1 if 0 is even, and (−1)⁰ = −1 if 0 is odd.
But 0 is even and (−1)⁰ = 1 by the definition of exponentiation, so P(0) holds.

Inductive case (n ≥ 1): We assume the inductive hypothesis P(n − 1), and we must prove P(n).

(−1)ⁿ = −1 · (−1)ⁿ⁻¹    definition of exponentiation
= −1 · (1 if n − 1 is even; −1 if n − 1 is odd)    inductive hypothesis
= −1 · (1 if n is odd; −1 if n is even)    n is odd ⇔ n − 1 is even
= (−1 if n is odd; 1 if n is even).    multiplying through

Thus we have shown that (−1)ⁿ is −1 if n is odd, and it's 1 if n is even—in other words, we've proven P(n).

5.4 Let P(n) denote the claim that

Σ_{i=1}^{n} 1/(i(i+1)) = n/(n+1).

We prove that P(n) holds for any n ≥ 0, by induction on n.

Base case (n = 0): Observe that Σ_{i=1}^{0} 1/(i(i+1)) = 0 by definition (the summation is over an empty range), and
0/(0+1) = 0 too. Thus P(0) holds.

Inductive case (n ≥ 1): We assume the inductive hypothesis P(n − 1), and we must prove that P(n) holds too.

Σ_{i=1}^{n} 1/(i(i+1)) = 1/(n(n+1)) + Σ_{i=1}^{n−1} 1/(i(i+1))    definition of summation
= 1/(n(n+1)) + (n−1)/n    inductive hypothesis
= [1 + (n−1)(n+1)]/(n(n+1))    putting terms over a common denominator
= [1 + n² − 1]/(n(n+1))    multiplying out and combining like terms
= n/(n+1).    factoring

Thus P(n) holds.

5.5 Let P(n) denote the claim that

Σ_{i=1}^{n} 2/(i(i+2)) = 3/2 − 1/(n+1) − 1/(n+2).

We prove that P(n) holds for any n ≥ 0, by induction on n.

Base case (n = 0): Observe that the left-hand side is 0 (the summation is over an empty range), and 3/2 − 1/1 − 1/2 = 0 too.
Thus P(0) holds.

Inductive case (n ≥ 1): We assume the inductive hypothesis P(n − 1), and we must prove that P(n) holds too.

Σ_{i=1}^{n} 2/(i(i+2)) = 2/(n(n+2)) + Σ_{i=1}^{n−1} 2/(i(i+2))    definition of summation
= 2/(n(n+2)) + [3/2 − 1/n − 1/(n+1)]    inductive hypothesis
= 3/2 + [2(n+1) − (n+2)(n+1) − n(n+2)]/(n(n+2)(n+1))    putting terms over a common denominator
= 3/2 − (2n² + 3n)/(n(n+2)(n+1))    multiplying out and collecting like terms
= 3/2 − (2n+3)/((n+2)(n+1))    cancelling the n
= 3/2 − ([n+1] + [n+2])/((n+2)(n+1))    rearranging via algebra—a choice made only by knowing the expression we're aiming for!
= 3/2 − 1/(n+2) − 1/(n+1).    algebra

Thus P(n) holds.

5.6 Let P(n) denote the claim

Σ_{i=1}^{n} i · (i!) = (n + 1)! − 1.

We'll prove that P(n) holds for any n ≥ 0 by induction.

For the base case (n = 0), we must prove P(0). The left-hand side of P(0) is an empty summation, so it's equal to 0.
The right-hand side of P(0) is (0 + 1)! − 1 = 1! − 1 = 1 − 1 = 0. Thus P(0) holds because both sides of the equation
are equal to 0.

For the inductive case (n ≥ 1), we assume the inductive hypothesis P(n − 1) and prove P(n):

Σ_{i=1}^{n} i · (i!) = n · (n!) + Σ_{i=1}^{n−1} i · (i!)    definition of summation
= n · (n!) + n! − 1    inductive hypothesis
= (n + 1) · (n!) − 1    factoring
= (n + 1)! − 1.    definition of factorial

5.7 We'll show that dₙ = 50/(√2)ⁿ mm. Let P(n) denote this claim. Here's a proof by induction that P(n) holds for all n ≥ 0:

base case (n = 0): We were told by definition that d₀ = 50 mm. And 50/(√2)⁰ = 50/1 = 50. Thus P(0) holds.

inductive case (n ≥ 1): We assume the inductive hypothesis P(n − 1)—namely, we assume that dₙ₋₁ = 50/(√2)ⁿ⁻¹ mm.
We must prove P(n):

dₙ = dₙ₋₁/√2    definition of dₙ
= [50/(√2)ⁿ⁻¹ mm]/√2    inductive hypothesis P(n − 1)
= 50/((√2)ⁿ⁻¹ · √2) mm    algebra
= 50/(√2)ⁿ mm.    algebra

Thus we have proven P(n), as desired.

5.8 For a circle of diameter d, the area is π · (d/2)². Thus the light-gathering area when the lens is set to dₙ is

π · (dₙ/2)² = π · ((1/2) · 50/(√2)ⁿ)² = (625π/2ⁿ) mm².

That is, the light-gathering area decreases by a factor of 2 with every step.

The f-stop when the aperture diameter is dₙ is defined as 50 mm/dₙ, so the f-stop is

50 mm / [50 mm/(√2)ⁿ] = (√2)ⁿ.

Thus the f-stops are 1, √2, (√2)², (√2)³, etc. Or, rounding to two significant digits: 1, 1.4, 2, 2.8, 4, 5.6, 8, 11, etc.
(These values are the names of the f-stop settings.)
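The sequence can be reproduced numerically (a Python sketch; note that round-to-nearest gives 5.7 and 11.3 where the conventional f-stop names truncate to 5.6 and 11):

```python
import math

# Aperture diameters d_n = 50/(√2)^n mm and the corresponding f-stops 50/d_n.
diameters = [50 / math.sqrt(2) ** n for n in range(8)]
fstops = [50 / d for d in diameters]
print([round(f, 1) for f in fstops])  # → [1.0, 1.4, 2.0, 2.8, 4.0, 5.7, 8.0, 11.3]
```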
5.9 The ith odd positive number is equal to 2i − 1. Thus the quantity that we're trying to understand is Σ_{i=1}^{n} (2i − 1). For
n = 1, this quantity is 1; for n = 2, it's 1 + 3 = 4; for n = 3, it's 1 + 3 + 5 = 9. From the pattern 1, 4, 9, . . ., we might
conjecture that the sum Σ_{i=1}^{n} (2i − 1) is equal to n². Let's prove it.

Let P(n) denote the claim Σ_{i=1}^{n} (2i − 1) = n². We'll prove that P(n) holds for all n ≥ 1 by induction on n.

base case (n = 1): By definition, Σ_{i=1}^{1} (2i − 1) = 2 − 1 = 1, and 1² = 1.

inductive case (n ≥ 2): We assume the inductive hypothesis P(n − 1), and prove that P(n) holds too:

Σ_{i=1}^{n} (2i − 1) = 2n − 1 + Σ_{i=1}^{n−1} (2i − 1)    definition of summation
= 2n − 1 + (n − 1)²    inductive hypothesis P(n − 1)
= 2n − 1 + n² − 2n + 1    algebra
= n².    algebra

Thus P(n) holds, and the result follows.



5.10 The ith positive even integer is 2i. One way to analyze the sum of the first n even positive integers is to observe that
Σ_{i=1}^{n} 2i is precisely equal to 2 · Σ_{i=1}^{n} i; we showed in Theorem 5.3 that Σ_{i=1}^{n} i = n(n+1)/2. But we can also directly prove
that Σ_{i=1}^{n} 2i is n(n + 1). Let P(n) denote this claim. We proceed by induction on n:

base case (n = 0): Indeed, we have Σ_{i=1}^{0} 2i = 0 = 0 · (0 + 1).

inductive case (n ≥ 1): We assume the inductive hypothesis P(n − 1), and prove P(n):

Σ_{i=1}^{n} 2i = 2n + Σ_{i=1}^{n−1} 2i    definition of summation
= 2n + (n − 1)(n − 1 + 1)    inductive hypothesis P(n − 1)
= 2n + (n − 1)n    algebra
= n(2 + n − 1)    factoring
= n(n + 1).    algebra

Thus P(n) is true too.


5.11 A summation like 1/2 + 1/4 + 1/8 + 1/16 + 1/32 is exactly 1/32 less than 1. Let P(n) denote the property that
Σ_{i=1}^{n} 1/2ⁱ = 1 − 1/2ⁿ. We'll prove that P(n) holds for all integers n ≥ 0 by induction on n.

For the base case (n = 0), the property P(0) requires that Σ_{i=1}^{0} 1/2ⁱ = 1 − 1/2⁰. But the left-hand side is an empty
summation, so its value is 0. The right-hand side is 1 − 1 = 0 too. Thus P(0) holds.

For the inductive case (n ≥ 1), we assume the inductive hypothesis P(n − 1) and we must prove P(n)—that is, we
must prove that Σ_{i=1}^{n} 1/2ⁱ = 1 − 1/2ⁿ. We'll start from the left-hand side and prove it equal to the right-hand side:

Σ_{i=1}^{n} 1/2ⁱ = [Σ_{i=1}^{n−1} 1/2ⁱ] + 1/2ⁿ    breaking apart the summation
= [1 − 1/2ⁿ⁻¹] + 1/2ⁿ    inductive hypothesis
= 1 − 2/2ⁿ + 1/2ⁿ    1/2ⁿ⁻¹ = 2 · 1/2ⁿ
= 1 − 1/2ⁿ.    algebra

Thus P(n) holds, and the entire claim holds.
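The identity can be spot-checked exactly using rational arithmetic (a Python sketch):

```python
from fractions import Fraction

# Check Σ_{i=1}^{n} 1/2^i = 1 − 1/2^n exactly, for many values of n.
for n in range(30):
    total = sum(Fraction(1, 2 ** i) for i in range(1, n + 1))
    assert total == 1 - Fraction(1, 2 ** n)
print("geometric-sum identity verified for n = 0..29")
```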

5.12 Let P(n) denote the claim that

Σ_{i=0}^{n} (x₀ + ir) = (n + 1)x₀ + rn(n+1)/2.

Here's a proof that P(n) holds for all n ≥ 0, by induction on n.

base case (n = 0): We have Σ_{i=0}^{0} (x₀ + ir) = x₀ + 0 · r = (0 + 1)x₀ + r(0 · 1)/2, as desired.

inductive case (n ≥ 1): We assume the inductive hypothesis P(n − 1), and we must prove P(n):

Σ_{i=0}^{n} (x₀ + ir) = [x₀ + nr] + Σ_{i=0}^{n−1} (x₀ + ir)    definition of summation
= [x₀ + nr] + [n · x₀ + r(n−1)(n)/2]    inductive hypothesis P(n − 1)
= (n + 1)x₀ + rn · ((n−1)/2 + 1)    combining like terms
= (n + 1)x₀ + rn(n+1)/2.    algebra

Thus P(n) is true too.

5.13 Let P(n) denote the claim that a knight’s walk exists for an n-by-n chessboard. We’ll prove that P(n) is true for any
n ≥ 4 by induction on n. Actually, we will prove something slightly stronger: let K(n) denote the claim that, for any two
squares i and j in an n-by-n board, there’s a knight’s walk that starts at square i and goes to square j. We’ll prove that
K(n) holds for every n ≥ 4 by induction. (Our walks will be very inefficient—repeatedly revisiting many squares—but
the repetition will make the proof somewhat simpler.)

base case (n = 4): We must show that there exists a knight’s walk for a 4-by-4 board, starting at any position and ending
at any position. Consider the following walk:
[figure: a closed knight's walk on the 4-by-4 board, shown as a sequence of 18 numbered moves starting and ending at the bottom-left cell]

This walk w starts at the bottom-left cell and returns to that same cell, visiting every cell of the board along the way.
But that means that there's a walk starting from cell i and going to cell j, for arbitrary i and j: consider doing the
given walk three times in a row. Delete the portion of the triplicated walk before the first visit to i (which is in Part I),
and delete the portion after the last visit to j (which is in Part III). Because Part II of the triplicated walk is intact, the
unexcised portion of the triplicated walk is an i-to-j walk that visits all of the cells one or more times each.
inductive case (n ≥ 5): We assume the inductive hypothesis K(n − 1): for any cells i and j in an (n − 1)-by-(n − 1) board,
there’s a knight’s walk from i to j. We must prove K(n).
To prove K(n), we will use consecutive walks on four (n − 1)-by-(n − 1) sub-boards, and then paste them together.
Consider the n-by-n board, and divide it into four overlapping (n − 1)-by-(n − 1) boards:

(Figure: the original n-by-n board alongside the four overlapping (n − 1)-by-(n − 1) sub-boards A, B, C, and D.)

(The same square is highlighted in each of the four subgrids.) Now, to generate a knight’s walk on the entire board
from an arbitrary cell i to an arbitrary cell j, complete the following six consecutive walks:
• in an arbitrary one of the sub-boards (A, B, C, or D) in which i appears, a walk from cell i to the highlighted cell.
• a walk from the highlighted cell back to itself in A.
• a walk from the highlighted cell back to itself in B.
• a walk from the highlighted cell back to itself in C.
• a walk from the highlighted cell back to itself in D.
• in an arbitrary one of the sub-boards (A, B, C, or D) in which j appears, a walk from the highlighted cell to cell j.
In each of the six cases, the described walk exists by the inductive hypothesis. These six walks consecutively form a
knight’s walk from cell i to j in the n-by-n board. Because i and j were arbitrary, we have proven K(n).

5.14 A solution in Python is shown in Figure S.5.1.

5.15 Let R(n) denote the claim that there’s a rook’s tour for any n-by-n chessboard, starting from any corner position of
the board. We’ll prove that R(n) holds for all n ≥ 1 by induction on n.
base case (n = 1): There’s nothing to do! We start in the only cell, and we’re done after zero moves.
inductive case (n ≥ 2): We assume the inductive hypothesis R(n − 1)—that there is a rook’s tour starting in any corner
of an (n − 1)-by-(n − 1) board. We must prove R(n). Here is a construction of a rook’s tour of the n-by-n board:
1. Start in any corner.
2. Go straight vertically to the adjacent corner.
3. Go straight across horizontally to the adjacent corner.
4. Move vertically by one cell away from the corner.
5. Complete a rook’s tour of the (n − 1)-by-(n − 1) uncovered sub-board.
After step 4, we have visited each cell in one outside column and one outside row, and we're in the corner of the
(n − 1)-by-(n − 1) remainder of the board. By the inductive hypothesis R(n − 1), a rook's tour of the (n − 1)-by-(n − 1)
uncovered sub-board must exist.
70 Mathematical Induction

def knight_walk(start, end, shift, n):
    '''
    Consider the n-by-n square in the chessboard whose upper-left coordinate is
    (1 + shift[0], 1 + shift[1]). For example, a standard chessboard is an 8-by-8
    square with shift (0,0); the bottom right quadrant has n = 4 and shift = (4,4).
    This function constructs a walk from the start square to the end square, covering
    every square in this n-by-n subboard.
    '''
    if n == 4:
        # A sequence of 18 moves that covers all 16 squares of a 4-by-4 board ...
        w = [(1,1), (2,3), (4,4), (3,2), (1,3), (3,4), (4,2), (2,1), (3,3),
             (4,1), (2,2), (1,4), (2,2), (4,3), (3,1), (1,2), (2,4), (3,2), (1,1)]
        # ... adjusted by the given (x,y) shift.
        w = [(a + shift[0], b + shift[1]) for (a,b) in w]

        # Triplicate the sequence, and trim the portion before the first visit
        # to start and the last visit to end.
        return w[w.index(start):-1] + w + w[1:w.index(end)+1]

    else:
        # The square we keep returning to.
        touchstone = (2 + shift[0], 2 + shift[1])

        # What (n-1)-by-(n-1) subgrid is start in? If it's not in the nth column or nth
        # row of the grid, then it's in the subgrid starting in the same position as this
        # whole n-by-n grid. Else, we shift right by one column or down by one row (or both).
        start_shift = (shift[0] + (start[0] == shift[0] + n), shift[1] + (start[1] == shift[1] + n))
        # ... and similarly for the end square.
        end_shift = (shift[0] + (end[0] == shift[0] + n), shift[1] + (end[1] == shift[1] + n))

        w1 = knight_walk(start, touchstone, start_shift, n-1)
        w2 = knight_walk(touchstone, touchstone, (shift[0], shift[1]), n-1)     # cover subgrid A
        w3 = knight_walk(touchstone, touchstone, (shift[0]+1, shift[1]), n-1)   # cover subgrid B
        w4 = knight_walk(touchstone, touchstone, (shift[0], shift[1]+1), n-1)   # cover subgrid C
        w5 = knight_walk(touchstone, touchstone, (shift[0]+1, shift[1]+1), n-1) # cover subgrid D
        w6 = knight_walk(touchstone, end, end_shift, n-1)

        return w1 + w2[1:] + w3[1:] + w4[1:] + w5[1:] + w6[1:]

Figure S.5.1 Finding a knight’s walk in an n-by-n chessboard.

Here is an example for an 8-by-8 board:

(Figure: four chessboard diagrams showing the rook's position after each of steps 1 through 4: at a1, then a8, then h8, then h7, leaving a 7-by-7 sub-board to tour.)

5.16 Observe that a Von Koch line of size s consists of four Von Koch lines of size s/3 of one lower level, so the total length
of the segments increases by a factor of 4/3 with each level. That is, write the length of a Von Koch line with level ℓ and
size s as length(ℓ, s). Then

    length(0, s) = s    and    length(ℓ, s) = 4 · length(ℓ − 1, s/3).

We claim, then, that length(ℓ, s) = (4/3)^ℓ · s. We'll prove this fact by induction. Let F(ℓ) denote the claim that length(ℓ, s) =
(4/3)^ℓ · s for any s ≥ 0. We'll prove that F(ℓ) holds for all ℓ ≥ 0 by induction on ℓ.

base case (ℓ = 0): By definition, we have length(0, s) = s, and indeed (4/3)^0 · s = 1 · s = s.



inductive case (ℓ ≥ 1): We assume the inductive hypothesis F(ℓ − 1) and we must prove F(ℓ). Indeed

    length(ℓ, s) = 4 · length(ℓ − 1, s/3)        definition of the fractal
                 = 4 · (4/3)^(ℓ−1) · (s/3)       inductive hypothesis
                 = (4/3) · (4/3)^(ℓ−1) · s       algebra
                 = (4/3)^ℓ · s.                  algebra

The total perimeter of the Von Koch snowflake is therefore 3 · length(ℓ, s) = 3 · s · (4/3)^ℓ.
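The recurrence and the closed form are easy to compare numerically; here is a small Python sketch (ours, with an invented name):

```python
def koch_line_length(level, s):
    # length(0, s) = s; length(l, s) = 4 * length(l-1, s/3).
    if level == 0:
        return s
    return 4 * koch_line_length(level - 1, s / 3)
```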

5.17 Write perimeter(ℓ, s) to denote the perimeter of a Sierpiński triangle with side length s and level ℓ. Observe that

    perimeter(0, s) = 3s    and    perimeter(ℓ + 1, s) = 3 · perimeter(ℓ, s/2).

We claim by induction that perimeter(ℓ, s) = 3 · (3/2)^ℓ · s.
The base case (ℓ = 0) is immediate, because perimeter(0, s) = 3s, and 3 · (3/2)^0 · s = 3s too.
For the inductive case (ℓ ≥ 1), we assume the inductive hypothesis—namely perimeter(ℓ − 1, s) = 3 · (3/2)^(ℓ−1) · s. Now

    perimeter(ℓ, s) = 3 · perimeter(ℓ − 1, s/2)        definition of the fractal
                    = 3 · 3 · (3/2)^(ℓ−1) · (s/2)      inductive hypothesis
                    = 3 · (3/2)^ℓ · s,                 algebra

as desired.
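A direct evaluation of the recurrence can serve as a numerical check of the closed form (a Python sketch of ours, names invented):

```python
def sierpinski_perimeter(level, s):
    # perimeter(0, s) = 3s; perimeter(l, s) = 3 * perimeter(l-1, s/2).
    if level == 0:
        return 3 * s
    return 3 * sierpinski_perimeter(level - 1, s / 2)
```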

5.18 Observe that the outer perimeter of a Sierpiński carpet with side length s is precisely 4s. What about the total interior
perimeter? It has an interior perimeter of 4s/3—the central (s/3)-by-(s/3) square—plus 8 times the interior perimeter of a
Sierpiński carpet of level ℓ − 1 with side length s/3. Thus the interior perimeter p(ℓ, s) satisfies

    p(0, s) = 0    and    p(ℓ + 1, s) = 4s/3 + 8 · p(ℓ, s/3).

We claim that p(ℓ, s) = (4/5) · s · [(8/3)^ℓ − 1], which we'll prove by induction on ℓ. (You might be able to generate this
conjecture by writing out the summation for a few levels, and then using Theorem 5.5.)

• For the base case, indeed p(0, s) = 0 by definition and (4/5) · s · [(8/3)^0 − 1] = 0, because (8/3)^0 − 1 = 1 − 1 = 0.
• For the inductive case, we assume the inductive hypothesis—namely that p(ℓ − 1, s) = (4/5) · s · [(8/3)^(ℓ−1) − 1] for any
  s—and we must analyze p(ℓ, s).

    p(ℓ, s) = 4s/3 + 8 · p(ℓ − 1, s/3)                           definition of the fractal
            = 4s/3 + 8 · (4/5) · (s/3) · [(8/3)^(ℓ−1) − 1]       inductive hypothesis
            = (4/5) · s · [5/3 + (8/3) · [(8/3)^(ℓ−1) − 1]]      algebra—making the expression look as much like the goal as possible
            = (4/5) · s · [5/3 + (8/3)^ℓ − 8/3]                  algebra
            = (4/5) · s · [(8/3)^ℓ − 1].                         algebra

Thus the inductive claim is proven.

Thus the total internal perimeter of the Sierpiński carpet of level ℓ and size s is (4s/5) · [(8/3)^ℓ − 1], and the outer perimeter
is precisely 4s. Thus the total perimeter is 4s + (4s/5) · [(8/3)^ℓ − 1].
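The interior-perimeter recurrence can be evaluated directly in Python and compared against the closed form (our sketch, invented names):

```python
def carpet_interior_perimeter(level, s):
    # p(0, s) = 0; p(l, s) = 4s/3 + 8 * p(l-1, s/3).
    if level == 0:
        return 0.0
    return 4 * s / 3 + 8 * carpet_interior_perimeter(level - 1, s / 3)
```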

5.19 In the Von Koch snowflake of level ℓ, we start with a single s-sided equilateral triangle, and, for each level i = 1, . . . , ℓ,
on each of 3 sides we "bump out" an additional 4^(i−1) triangles—each with side length s/3^i.
By trigonometry, an equilateral triangle with side length x has area √3 · x²/4. (Dropping a perpendicular line from a
vertex to the opposite side shows that the height h of the triangle satisfies (x/2)² + h² = x², so h = √(3x²/4).)
For s = 1, then, the total area of the figure is

    (area of an equilateral triangle with side length 1) + ∑_{i=1}^{ℓ} 3 · 4^(i−1) · (area of a (1/3^i)-sided equilateral triangle)
      = √3/4 + ∑_{i=1}^{ℓ} 3 · 4^(i−1) · √3 · (1/3^i)²/4        an equilateral triangle with side length x has area √3 · x²/4
      = (√3/4) · [1 + ∑_{i=1}^{ℓ} 3 · 4^(i−1) · (1/3^i)²]
      = (√3/4) · [1 + (1/3) · ∑_{i=1}^{ℓ} (4/9)^(i−1)]          3 · (1/3^i)² = 3 · (1/9^i) = (1/3) · (1/9^(i−1))
      = (√3/4) · [1 + (1/3) · ∑_{i=0}^{ℓ−1} (4/9)^i]            reindexing the summation
      = (√3/4) · [1 + (1/3) · (1 − (4/9)^ℓ)/(1 − 4/9)]          Theorem 5.5
      = (√3/4) · [1 + (3/5) · (1 − (4/9)^ℓ)]
      = (√3/4) · [8/5 − (3/5) · (4/9)^ℓ]
      = 2√3/5 − (3√3/20) · (4/9)^ℓ.
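The summation above is easy to check numerically; this Python sketch (ours, with invented names) accumulates the bumped-out triangle areas directly and compares against the closed form:

```python
def koch_area(level):
    # Direct evaluation for s = 1: start with one unit equilateral triangle,
    # then add 3 * 4^(i-1) triangles of side length 1/3^i at each level i.
    area = 3 ** 0.5 / 4
    for i in range(1, level + 1):
        area += 3 * 4 ** (i - 1) * (3 ** 0.5 / 4) * (1 / 3 ** i) ** 2
    return area

def koch_area_closed_form(level):
    # The closed form derived above: 2*sqrt(3)/5 - (3*sqrt(3)/20) * (4/9)^level.
    r3 = 3 ** 0.5
    return 2 * r3 / 5 - (3 * r3 / 20) * (4 / 9) ** level
```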

5.20 Write area(ℓ, s) to denote the area of a Sierpiński triangle with side length s and level ℓ. We can give a short inductive
proof to show that area(ℓ, s) = (3/4)^ℓ · area(0, s).
The base case (ℓ = 0) is immediate, because (3/4)^0 · area(0, s) = 1 · area(0, s) = area(0, s).
For the inductive case (ℓ ≥ 1), we assume the inductive hypothesis—namely area(ℓ − 1, s) = (3/4)^(ℓ−1) · area(0, s) for
any s. Therefore:

    area(ℓ, s) = 3 · area(ℓ − 1, s/2)                  definition of the fractal
               = 3 · (3/4)^(ℓ−1) · area(0, s/2)        inductive hypothesis
               = 3 · (3/4)^(ℓ−1) · area(0, s)/4        halving the side length of an equilateral triangle quarters the area
               = (3/4) · (3/4)^(ℓ−1) · area(0, s)      algebra
               = (3/4)^ℓ · area(0, s),                 algebra

completing the inductive proof.
As in the previous exercise, an equilateral triangle with sides of length x has area √3 · x²/4, so area(0, s) = √3 · s²/4.
Thus we have that the area of the Sierpiński triangle at level ℓ with size 1 is

    area(ℓ, 1) = (3/4)^ℓ · √3/4.
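A quick numerical check of this closed form via the recurrence (a Python sketch of ours, names invented):

```python
def sierpinski_triangle_area(level, s):
    # area(0, s) = sqrt(3) * s^2 / 4; area(l, s) = 3 * area(l-1, s/2).
    if level == 0:
        return (3 ** 0.5) * s * s / 4
    return 3 * sierpinski_triangle_area(level - 1, s / 2)
```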

5.21 Write area(ℓ, s) to denote the area of a Sierpiński carpet with side length s and level ℓ. Here's a short inductive proof
that area(ℓ, s) = (8/9)^ℓ · s²:
The base case (ℓ = 0) is immediate, because (8/9)^0 = 1, and the area of the s-by-s square (the Sierpiński carpet at level
0) is s². For the inductive case (ℓ ≥ 1), we assume the inductive hypothesis—namely area(ℓ − 1, s) = (8/9)^(ℓ−1) · s² for any
s. Therefore:

    area(ℓ, s) = 8 · area(ℓ − 1, s/3)         definition of the fractal
               = 8 · (8/9)^(ℓ−1) · (s/3)²     inductive hypothesis
               = (8/9)^ℓ · s²,                algebra

as desired.
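The same pattern of numerical check works here too (our sketch, invented name):

```python
def carpet_area(level, s):
    # area(0, s) = s^2; area(l, s) = 8 * area(l-1, s/3).
    if level == 0:
        return s * s
    return 8 * carpet_area(level - 1, s / 3)
```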

5.22 The perimeter is infinite, because (4/3)^ℓ (see Exercise 5.16) grows without bound as ℓ increases.

5.23 The perimeter of the Sierpiński triangle at level ∞ is infinite, because 3 · (3/2)^ℓ (from Exercise 5.17) goes to infinity
as ℓ grows.

5.24 The perimeter of the Sierpiński carpet at level ∞ is infinite, because 4s + (4s/5) · [(8/3)^ℓ − 1] (from Exercise 5.18)
goes to infinity as ℓ grows.

5.25 The area of the infinite Von Koch snowflake converges:

    area of the Von Koch snowflake = (√3/4) · [1 + (1/3) · ∑_{i=0}^{ℓ−1} (4/9)^i]      Exercise 5.19
                                   = (√3/4) · [1 + (1/3) · 1/(1 − 4/9)]                Corollary 5.6, when ℓ is infinite
                                   = (√3/4) · [1 + (1/3) · (9/5)]
                                   = 2√3/5.


5.26 The area of the Sierpiński triangle at level ∞ is zero, because (3/4)^ℓ · √3/4 (from Exercise 5.20) tends to zero as ℓ grows.

5.27 Like the Sierpiński triangle, the area of the Sierpiński carpet at level ∞ is zero, because (8/9)^ℓ (from Exercise 5.21)
tends to zero as ℓ grows.

5.28 A solution in Python is shown in the second block of code in Figure S.5.2.

5.29 A solution in Python is shown in the last block of code in Figure S.5.2.

5.30 The total of the entries in an n-by-n magic square is

    ∑_{i=1}^{n²} i = n²(n² + 1)/2,

by Theorem 5.3. There are n rows, so if their sums are equal then the sum of the entries in each must be (1/n)th of the total.
Thus the row, column, and diagonal sums of an n-by-n magic square must all be n(n² + 1)/2. (To check this formula: for
n = 3, that value is 3(3² + 1)/2 = 3 · 10/2 = 15. Indeed, each row, column, and diagonal of the example magic square given in
the exercise sums to 15.)
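The magic constant is easy to verify in code. Here is a small Python sketch (the function names and the sample square are ours; the square is a standard 3-by-3 magic square, not necessarily the one printed in the exercise):

```python
def magic_constant(n):
    # Each row, column, and diagonal of an n-by-n magic square sums to n*(n^2+1)/2.
    return n * (n * n + 1) // 2

def is_magic(square):
    # Check that every row, every column, and both diagonals hit the magic constant.
    n = len(square)
    target = magic_constant(n)
    rows_ok = all(sum(row) == target for row in square)
    cols_ok = all(sum(square[i][j] for i in range(n)) == target for j in range(n))
    diag_ok = (sum(square[i][i] for i in range(n)) == target
               and sum(square[i][n - 1 - i] for i in range(n)) == target)
    return rows_ok and cols_ok and diag_ok
```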

5.31 Let P(k) denote the claim H_{2^k} ≥ k/2 + 1. We'll prove P(k) for all integers k ≥ 0 by induction on k:

base case (k = 0): We need to show that H_{2^0} = H_1 = 1 is at least 0/2 + 1—and, indeed, 1 ≥ 1.
inductive case (k ≥ 1): We assume the inductive hypothesis P(k − 1)—namely,

    H_{2^(k−1)} ≥ (k − 1)/2 + 1,

# Python solution using a graphics package in which:
# -- a Point is defined using an (x,y) coordinate pair.
# -- a Polygon is defined using the list of points that form its vertices.
# -- to draw a Polygon object p in a given window, we must call the p.draw(window) method.

def sierpinski_triangle(level, length, x, y):
    '''Draws a Sierpinski triangle with side length length, level level, and
    with bottom-left coordinate (x,y) in the graphics window called window.'''
    if level == 0:
        shape = Polygon(Point(x, y),
                        Point(x + length, y),
                        Point(x + length/2, y - int(length*math.sin(math.pi/3.0))))
        shape.draw(window)
        return 3 * length, 3**0.5 * length**2 / 4.0
    else:
        perimeter, area = 0, 0
        for (newX, newY) in [(x, y),
                             (x + length / 2, y),
                             (x + length / 4, y - length * 0.5 * math.sin(math.pi/3.0))]:
            p, a = sierpinski_triangle(level - 1, length / 2, newX, newY)
            perimeter += p
            area += a
        return perimeter, area

def sierpinski_carpet(level, length, x, y):
    '''Draws a Sierpinski carpet with side length length, level level, and
    with bottom-left coordinate (x,y) in the graphics window called window.'''
    if level == 0:
        square = Rectangle(Point(x, y), Point(x + length, y + length))
        square.setFill("blue")
        square.draw(window)
        return length * length
    else:
        area = 0
        for dx in range(3):
            for dy in range(3):
                if dx != 1 or dy != 1:
                    area += sierpinski_carpet(level - 1, length / 3,
                                              x + dx * (length / 3), y - dy * (length / 3))
        return area

Figure S.5.2 Drawing a Sierpiński triangle and Sierpiński carpet in Python.

and we must prove P(k):

    H_{2^k} = ∑_{i=1}^{2^k} 1/i                                            definition of harmonic numbers
            = [∑_{i=1}^{2^(k−1)} 1/i] + [∑_{i=2^(k−1)+1}^{2^k} 1/i]        definition of summation
            = H_{2^(k−1)} + ∑_{i=2^(k−1)+1}^{2^k} 1/i                      definition of harmonic numbers
            ≥ (k − 1)/2 + 1 + ∑_{i=2^(k−1)+1}^{2^k} 1/i                    inductive hypothesis
            ≥ (k − 1)/2 + 1 + ∑_{i=2^(k−1)+1}^{2^k} 1/2^k                  every term in the summation is ≥ 1/2^k
            = (k − 1)/2 + 1 + 2^(k−1) · (1/2^k)                            there are 2^(k−1) terms in the summation
            = (k − 1)/2 + 1 + 1/2                                          algebra
            = k/2 + 1.                                                     algebra

Thus P(k) is true too.
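The bound H_{2^k} ≥ k/2 + 1 can be spot-checked numerically with a direct evaluation of the harmonic numbers (a Python sketch of ours):

```python
def harmonic(n):
    # H_n = 1/1 + 1/2 + ... + 1/n.
    return sum(1 / i for i in range(1, n + 1))
```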

5.32 Assuming Theorem 5.9, we can finish the proof by observing the following. Let k = ⌊log n⌋, so that 2^k ≤ n < 2^(k+1)—
that is, k ≤ log n < k + 1. Therefore, because H_a > H_b if and only if a > b, we know that

    H_{2^k} ≤ H_n < H_{2^(k+1)}.    (∗)

By Theorem 5.9 and (∗), then, we know that

    k/2 + 1 ≤ H_{2^k} ≤ H_n < H_{2^(k+1)} ≤ k + 2.

Thus k/2 + 1 ≤ H_n < k + 2. By definition of k, we have that k ≤ log n < k + 1, which means that log n − 1 < k (and
thus (log n − 1)/2 + 1 < k/2 + 1), and that log n ≥ k (and thus log n + 2 ≥ k + 2). Putting these facts together yields

    (log n − 1)/2 + 1 < k/2 + 1 ≤ H_n < k + 2 ≤ log n + 2.

5.33 Let x ≥ −1 be arbitrary. Let P(n) denote the claim that (1 + x)^n ≥ 1 + nx. We'll prove that P(n) holds for every
integer n ≥ 1 by induction.
base case (n = 1): P(1) follows because (1 + x)^n and 1 + nx are both equal to 1 + x when n = 1.
inductive case (n ≥ 2): We assume the inductive hypothesis P(n − 1)—namely that (1 + x)^(n−1) ≥ 1 + (n − 1)x—and we
must prove P(n).

    (1 + x)^n = (1 + x) · (1 + x)^(n−1)
              ≥ (1 + x) · (1 + (n − 1)x)        inductive hypothesis

(Note that a ≤ b ⇒ ay ≤ by only when y ≥ 0, but because x ≥ −1 we know that 1 + x ≥ 0.)

              = 1 + (n − 1)x + x + (n − 1)x²    algebra (multiplying out)
              = 1 + nx + (n − 1)x²              algebra (combining like terms)
              ≥ 1 + nx

because both n − 1 ≥ 0 (because n ≥ 2) and x² ≥ 0 (because all squares are nonnegative; see Exercise 4.46), so
(n − 1)x² ≥ 0.
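Bernoulli's inequality can be spot-checked over a grid of values, including the boundary x = −1 (a minimal Python sketch of ours):

```python
def bernoulli_holds(x, n):
    # Check (1 + x)^n >= 1 + n*x, valid for x >= -1 and integer n >= 1.
    return (1 + x) ** n >= 1 + n * x
```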

5.34 For n = 3, we have 2^n = 8 and n! = 3 · 2 · 1 = 6. For n = 4, we have 2^n = 16 and n! = 4 · 3 · 2 · 1 = 24. So we'll
prove that 2^n ≤ n! for all integers n ≥ 4.
For the base case (n = 4), we argued above that 2^n = 16 and n! = 24. And, indeed, 24 ≥ 16.
For the inductive case (n ≥ 5), we assume the inductive hypothesis 2^(n−1) ≤ (n − 1)! and we must prove 2^n ≤ n!. But

    2^n = 2 · 2^(n−1)
        ≤ 2 · (n − 1)!        inductive hypothesis
        ≤ n · (n − 1)!        n ≥ 5, so definitely n ≥ 2
        = n!

5.35 Fix a particular integer b ≥ 1. Unlike most of these exercises, probably the hardest part here is identifying the integer
k after which the inequality b^n ≤ n! begins to hold. (Another way of stating it: unusually, the base case is probably harder
to prove than the inductive case.) It turns out that k = 2b² will work. Let P(n) be the claim that b^n ≤ n!. Here's a proof
that P(n) holds for all n ≥ 2b².

base case (n = 2b²): We must prove P(2b²)—that is, we must show that b^(2b²) ≤ (2b²)!.

    (2b²)! = ∏_{i=1}^{2b²} i          definition of factorial
           ≥ ∏_{i=b²+1}^{2b²} i       dropping multiplicands i = 1, 2, . . . , b²
           ≥ ∏_{i=b²+1}^{2b²} b²      for i = b² + 1, b² + 2, . . . , 2b², we have i ≥ b²
           = (b²)^(b²)                there are b² multiplicands, each equal to b²
           = b^(2b²).                 Theorem 2.8.4: (b^x)^y = b^(xy)

Thus we have shown that (2b²)! ≥ b^(2b²), which is just P(2b²).
inductive case (n ≥ 2b² + 1): We assume the inductive hypothesis P(n − 1)—that is, b^(n−1) ≤ (n − 1)!—and we must
prove b^n ≤ n!. But this follows precisely as in Exercise 5.34:

    b^n = b · b^(n−1)
        ≤ b · (n − 1)!        inductive hypothesis
        ≤ n · (n − 1)!        n ≥ 2b² + 1, so definitely n ≥ b
        = n!.

5.36 Note that n² ≥ 3n if and only if n² − 3n = n · (n − 3) ≥ 0. Thus n² = 3n for n ∈ {0, 3}. Let's choose k = 3. We'll
prove that n² ≥ 3n for all n ≥ 3, by induction on n.
For the base case (n = 3), we have n² = 9 and 3n = 9. And, indeed, 9 ≥ 9.
For the inductive case (n ≥ 4), we assume the inductive hypothesis (n − 1)² ≥ 3(n − 1) and we must prove n² ≥ 3n:

    n² = (n − 1)² + 2n − 1
       ≥ 3(n − 1) + 2n − 1        inductive hypothesis
       = 3n + 2n − 4.

Because n ≥ 4, we know 2n − 4 ≥ 0, so we have now shown that n² ≥ 3n.

5.37 Observe that n³ ≰ 2^n for n = 9, as 9³ = 729 and 2⁹ = 512. But 10³ = 1000 and 2¹⁰ = 1024. And once these
functions cross, they do not cross back. Let's prove it: let's prove that n³ ≤ 2^n for all n ≥ 10 by induction on n.
For the base case (n = 10), we have n³ = 1000 and 2^n = 1024. And, indeed, 1000 ≤ 1024.
For the inductive case (n ≥ 11), we assume the inductive hypothesis (n − 1)³ ≤ 2^(n−1) and we must prove n³ ≤ 2^n. But

    2^n = 2 · 2^(n−1)
        ≥ 2 · (n − 1)³                        inductive hypothesis
        = 2 · (n³ − 3n² + 3n − 1)
        = n³ + n³ − 6n² + 6n − 2
        ≥ n³ + (11n² − 6n²) + (66 − 2)        n ≥ 11 implies both that n³ ≥ 11n² and 6n ≥ 66
        = n³ + 5n² + 64
        ≥ n³.                                 5n² ≥ 0 and 64 ≥ 0

5.38 Let P(n) be the stated property, that odd?(n) returns True if and only if n is odd. We’ll prove that P(n) holds for all
integers n ≥ 0 by induction on n.
For the base case (n = 0), observe that 0 is even and that by inspection of the algorithm, indeed odd?(0) returns False
as desired.

For the inductive case (n ≥ 1), we assume the inductive hypothesis P(n − 1)—that is, odd?(n − 1) returns True if
and only if n − 1 is odd. We must prove P(n).
odd?(n) returns True ⇔ odd?(n − 1) returns False inspection of the algorithm
⇔ ¬ (n − 1 is odd) inductive hypothesis
⇔ n − 1 is even
⇔ n is odd.

5.39 Consider the following property:

    P(k) = for any integer n, sum(n, n + k) returns ∑_{i=n}^{n+k} i.

We'll prove that P(k) holds for all integers k ≥ 0, by induction on k. (An equivalent way of stating this proof is that we
are proving sum(n, m) = ∑_{i=n}^{m} i for any n and m ≥ n by induction on the quantity m − n.)
For the base case, we must prove P(0). Let n be an arbitrary integer. Then

    sum(n, n + 0) = n + 0        inspection of the algorithm
                  = n
                  = ∑_{i=n}^{n+0} i.

For the inductive case (k ≥ 1), we assume the inductive hypothesis P(k − 1), and we must prove P(k).

    sum(n, n + k) = n + sum(n + 1, n + k)        inspection of the algorithm
                  = n + ∑_{i=n+1}^{n+k} i        inductive hypothesis (applied on n′ = n + 1)
                  = ∑_{i=n}^{n+k} i.

5.40 The proof of Exercise 5.39 would have to change almost not at all. The inductive hypothesis still applies (because we
have (m − 1) − n = (m − n) − 1, just like m − (n + 1) = (m − n) − 1). The key fact about summations that we used
in the previous proof was

    ∑_{i=n}^{n+k} i = n + ∑_{i=n+1}^{n+k} i;

in the modified proof, we'd use the key fact that

    ∑_{i=n}^{n+k} i = (n + k) + ∑_{i=n}^{n+k−1} i.

But no other changes would have to be made.

5.41 Let P(n) denote the proposition that 8^n − 3^n is divisible by 5. We will prove P(n) for all n ≥ 0 by induction on n.
For the base case (n = 0), we must prove that 8⁰ − 3⁰ is divisible by 5. Because 8⁰ − 3⁰ = 1 − 1 = 0, the property
holds immediately because 0 is divisible by 5.
For the inductive case (n ≥ 1), we assume the inductive hypothesis P(n − 1), that 8^(n−1) − 3^(n−1) is evenly divisible by
5—say, 8^(n−1) − 3^(n−1) = 5k for some integer k. We must prove P(n), that 8^n − 3^n is evenly divisible by 5 too.

    8^n − 3^n = 8 · (8^(n−1)) − 3 · (3^(n−1))
              = 8 · (8^(n−1)) − 3 · (3^(n−1)) + 5 · (3^(n−1)) − 5 · (3^(n−1))
              = 8 · (8^(n−1) − 3^(n−1)) + 5 · (3^(n−1))
              = 8 · (5k) + 5 · (3^(n−1))        inductive hypothesis
              = 5 · (8k + 3^(n−1)).

Because (8k + 3^(n−1)) is an integer, then, 8^n − 3^n is evenly divisible by 5, and the theorem follows.

5.42 By computing 9^n mod 10 for a few small values of n, a pattern begins to emerge:

    9⁰ mod 10 = 1 mod 10    = 1
    9¹ mod 10 = 9 mod 10    = 9
    9² mod 10 = 81 mod 10   = 1
    9³ mod 10 = 729 mod 10  = 9
    9⁴ mod 10 = 6561 mod 10 = 1

Thus we conjecture that the following property P(n) holds for any integer n ≥ 0:

    9^n mod 10 = 1 if n is even, and 9^n mod 10 = 9 if n is odd.

Let's prove it by induction on n.

base case (n = 0): The number 0 is even, and indeed 9⁰ mod 10 = 1. Thus P(0) holds.
inductive case (n ≥ 1): We assume the inductive hypothesis P(n − 1), and we must prove P(n). Note that, by definition
of mod, the fact that x mod 10 = y means that there exists an integer k such that x = 10k + y. We'll use that fact in
our proof:

    9^n mod 10 = 9 · 9^(n−1) mod 10                                          definition of exponentiation
               = 9 · (10k + 1) mod 10, for some integer k, if n − 1 is even,
                 or 9 · (10k + 9) mod 10, for some integer k, if n − 1 is odd      inductive hypothesis and the above discussion
               = (90k + 9) mod 10 if n is odd, or (90k + 81) mod 10 if n is even   n − 1 is even ⇔ n is odd; multiplying out the product
               = 9 mod 10 if n is odd, or 81 mod 10 if n is even                   (10k + b) mod 10 = b mod 10 for any integers k and b
               = 9 if n is odd, or 1 if n is even.

Thus we have shown P(n), and the inductive case follows.
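The proven pattern can be checked directly; Python's built-in three-argument pow computes modular powers without ever forming the huge intermediate value (the wrapper name is ours):

```python
def nine_power_mod_10(n):
    # pow(9, n, 10) computes 9**n mod 10 by modular exponentiation.
    return pow(9, n, 10)
```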

5.43 By computing 2^n mod 7 for a few small values of n, we see that the value cycles among three values: 1 (when n is a
multiple of three), 2 (when n is one more than a multiple of three), and 4 (when n is two more than a multiple of three).
Here's a slightly glib way of phrasing this property:

    2^n mod 7 = 2^(n mod 3).

We'll prove that this property holds for all n ≥ 0 by induction on n. For the base case (n = 0), we have

    2⁰ mod 7 = 1 mod 7 = 1 = 2⁰ = 2^(0 mod 3),

as required.
For the inductive case (n ≥ 1), we assume the inductive hypothesis 2^(n−1) mod 7 = 2^((n−1) mod 3), and we must prove
2^n mod 7 = 2^(n mod 3). Here is the proof:

    2^n mod 7 = (2 · 2^(n−1)) mod 7                                            definition of exponentiation
              = 2 · (7k + 1) mod 7, for some integer k, if (n − 1) mod 3 = 0,
                or 2 · (7k + 2) mod 7, for some integer k, if (n − 1) mod 3 = 1,
                or 2 · (7k + 4) mod 7, for some integer k, if (n − 1) mod 3 = 2    inductive hypothesis and the discussion in Exercise 5.42
              = 2 mod 7 if (n − 1) mod 3 = 0, or 4 mod 7 if (n − 1) mod 3 = 1,
                or 8 mod 7 if (n − 1) mod 3 = 2                                    (7k + b) mod 7 = b mod 7 for any integers k and b
              = 2 if n mod 3 = 1, or 4 if n mod 3 = 2, or 1 if n mod 3 = 0         properties of mod
              = 2^(n mod 3).

5.44 When we’re counting from 00 · · · 0 to 11 · · · 1, how many times does the ith bit have to change? The least-significant
bit changes every step. The second-least-significant bit changes every other step. The third-least-significant bit changes
every fourth step. And so forth. In total, then, the number of bit flips is

X
n−1 n
2 X
n−1 Xn X
n−1 X
n−1

i
= 2n−i = 2j = 2k+1 = 2 · 2k .
i=0
2 i=0 j=1 k=0 k=0

Thus, by Example 5.2, the total number of bit flips is 2 · (2n − 1) = 2n+1 − 2. (For n = 3, as in the example in the question,
this quantity is 24 − 2 = 16 − 2 = 14, so the formula checks out.) Thus the average number of bit flips per step is

2n+1 − 2 1
= 2 − n−1 ,
2n 2
just shy of two bit flips per step on average.
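The count 2^(n+1) − 2 tallies one full cycle of the n-bit counter, including the wraparound step from 11···1 back to 00···0 (which is how bit i changes exactly 2^(n−i) times). A Python sketch of ours that counts flips by XORing consecutive counter values:

```python
def total_bit_flips(n):
    # Count the bit changes over one full cycle of an n-bit counter,
    # including the wraparound step from 11...1 back to 00...0.
    flips = 0
    for v in range(2 ** n):
        succ = (v + 1) % (2 ** n)
        flips += bin(v ^ succ).count("1")  # XOR marks exactly the changed bits
    return flips
```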

5.45 Let P(n) be the proposition that the regions defined by n fences can be labeled 0 (no dog) or 1 (dog) so that no pair
of adjacent regions has the same label. We’ll prove that P(n) holds for all n by induction on n.
For the base case (n = 0), there are no fences and the whole yard is a single region. Label it 1. Every pair of adjacent
regions has different labels vacuously, because there are no pairs of adjacent regions. Thus P(0) is true.
For the inductive case (n ≥ 1), we assume the inductive hypothesis P(n − 1): it’s possible to label the regions defined
by any n − 1 fences so that no adjacent pair of regions has the same label. We must prove P(n). Consider an arbitrary
configuration of n fences, numbered #1, #2, . . . #(n − 1), #n. Ignoring fence #n, by the inductive hypothesis we can label
the polygonal regions defined by only the first n − 1 fences—fences #1, #2, . . . #(n − 1)—so that no adjacent pair of these
regions have the same label. Now place the nth fence down, and invert the label for every region on one side of fence #n
(0 becomes 1; 1 becomes 0). We claim that adjacent regions in this configuration must have different labels:
• If two regions are adjacent because they share a fence other than fence #n, then they have different labels because, by
the inductive hypothesis, our labels for the first n − 1 fences had no adjacent regions with the same label.
• If two regions are adjacent because of fence #n, then they have different labels because they had the same label before
the inversion (they were actually the same region before fence #n was added), so after the inversion they have different
labels.
Therefore, P(n − 1) implies P(n), and the entire claim follows.

5.3 Proofs by Mathematical Induction


5.46 We proceed by strong induction on n.

Base cases (n = 0 and n = 1): By inspection, on inputs n = 0 and n = 1 the algorithm returns in Line 2 with only
the one call to parity (that is, without making a recursive call). Thus the depth of the recursion is 1, and indeed
1 + ⌊0/2⌋ = 1 + ⌊1/2⌋ = 1 + 0 = 1.
Inductive case (n ≥ 2): We assume the inductive hypothesis—for any integer 0 ≤ k < n, the recursion depth of parity(k)
is 1 + ⌊k/2⌋—and we must prove that the recursion depth of parity(n) is 1 + ⌊n/2⌋.

[recursion depth of parity(n)] = 1 + [recursion depth of parity(n − 2)] by inspection (because n ≥ 2 and by Line 4)
= 1 + 1 + ⌊(n − 2)/2⌋ by the inductive hypothesis
= 1 + ⌊1 + (n − 2)/2⌋ adding an integer inside or outside the floor is equivalent
= 1 + ⌊n/2⌋ ,

as desired.

5.47 We proceed by strong induction on n.

Base cases (n = 0 and n = 1): By inspection, on inputs n = 0 and n = 1 the algorithm returns ⟨b₀⟩ = ⟨n⟩ in Line 2.
Indeed, exactly as desired, we have

    ∑_{i=0}^{0} b_i 2^i = b₀ · 2⁰ = b₀ = n.

Inductive case (n ≥ 2): We assume the inductive hypothesis—namely that, for any integer 0 ≤ m < n, the value returned
by toBinary(m) is a tuple ⟨a_ℓ, . . . , a₀⟩ satisfying ∑_{i=0}^{ℓ} a_i 2^i = m. We must show that toBinary(n) returns ⟨b_k, . . . , b₀⟩
such that ∑_{i=0}^{k} b_i 2^i = n. Let ⟨b_k, b_{k−1}, . . . , b₁, b₀⟩ be the value returned by toBinary(n) in Line 6. By inspection
of the algorithm, we see that ⟨b_k, b_{k−1}, . . . , b₁⟩ = ⟨a_{k−1}, . . . , a₀⟩, where ⟨a_{k−1}, . . . , a₀⟩ is the value returned by
toBinary(⌊n/2⌋), and b₀ = parity(n). Thus:

    ∑_{i=0}^{k} b_i 2^i = [∑_{i=1}^{k} a_{i−1} 2^i] + (parity(n)) · 2⁰        inspection of the algorithm/definition of b_i and a_i values
                        = 2 · [∑_{i=0}^{k−1} a_i 2^i] + (parity(n)) · 2⁰      reindexing and pulling out a factor of 2
                        = 2 · ⌊n/2⌋ + (parity(n)) · 2⁰                        inductive hypothesis (which applies because 0 ≤ ⌊n/2⌋ < n)
                        = 2 · (n/2) + 0 if n is even,
                          or 2 · ((n − 1)/2) + 1 if n is odd                  definition of parity and floor
                        = n.

Thus we have shown that ∑_{i=0}^{k} b_i 2^i = n, where toBinary(n) returns ⟨b_k, . . . , b₀⟩, precisely as required.
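The invariant proven here, that the returned bits evaluate back to n, can be exercised with a small Python transcription of the algorithm (our own rendering of toBinary and an inverse, with invented names):

```python
def to_binary(n):
    # Low-order bit is n's parity; the remaining bits represent n // 2.
    if n <= 1:
        return [n]
    return to_binary(n // 2) + [n % 2]

def from_bits(bits):
    # Evaluate sum_i bits[i] * 2^i, where bits is given most-significant first.
    return sum(b * 2 ** i for i, b in enumerate(reversed(bits)))
```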

5.48 Let P(k) denote the following claim:

    If ∑_{i=0}^{k−1} a_i 2^i = ∑_{i=0}^{k−1} b_i 2^i with a, b ∈ {0, 1}^k, then a = b.

We'll prove P(k) for all k ≥ 1, by induction on k.

base case (k = 1): There are only two 1-bit bitstrings, ⟨0⟩ and ⟨1⟩. If a₀ · 2⁰ = b₀ · 2⁰, then necessarily a₀ = b₀.
inductive case (k ≥ 2): We assume the inductive hypothesis P(k − 1), and we must prove P(k). Suppose that we have
bitstrings a ∈ {0, 1}^k and b ∈ {0, 1}^k such that

    ∑_{i=0}^{k−1} a_i 2^i = ∑_{i=0}^{k−1} b_i 2^i.

Write n to denote the common value of these summations.
First, we argue that a_{k−1} = b_{k−1} (that is, the most-significant bits match). Suppose not—namely suppose (without
loss of generality) that a_{k−1} = 1 but b_{k−1} = 0. But then

    ∑_{i=0}^{k−1} a_i 2^i = 1 · 2^(k−1) + ∑_{i=0}^{k−2} a_i 2^i ≥ 1 · 2^(k−1) = 2^(k−1)

    and ∑_{i=0}^{k−1} b_i 2^i = 0 · 2^(k−1) + ∑_{i=0}^{k−2} b_i 2^i ≤ ∑_{i=0}^{k−2} 2^i = 2^(k−1) − 1

by Example 5.2. But then ∑_{i=0}^{k−1} a_i 2^i ≥ 2^(k−1) and ∑_{i=0}^{k−1} b_i 2^i ≤ 2^(k−1) − 1, so these two summations cannot be equal.
Now observe that

    ∑_{i=0}^{k−1} a_i 2^i = ∑_{i=0}^{k−1} b_i 2^i        by assumption
    −a_{k−1} 2^(k−1) = −b_{k−1} 2^(k−1)                  by the above argument that a_{k−1} = b_{k−1}

and therefore

    ∑_{i=0}^{k−2} a_i 2^i = ∑_{i=0}^{k−2} b_i 2^i.       by subtraction

By the inductive hypothesis P(k − 1), then, we know that ⟨a_{k−2}, a_{k−3}, . . . , a₀⟩ = ⟨b_{k−2}, b_{k−3}, . . . , b₀⟩. Because we've
already shown that a_{k−1} = b_{k−1}, we therefore know that a = b, and we've established P(k).

5.49 Here’s the algorithm:

remainder(n, k): // assume n ≥ 0 and k ≥ 1 are integers.


1 if n ≤ k − 1 then
2 return n
3 else
4 return remainder(n − k, k)

Let k ≥ 1 be an arbitrary integer. For an integer n, let Pk (n) denote the property that remainder(n, k) = n mod k. We
claim that Pk (n) holds for any integer n ≥ 0. We proceed by strong induction on n:
Base cases (n ∈ {0, 1, . . . , k − 1}): By inspection, on input n ≤ k − 1, the algorithm returns n in Line 2. Indeed, by
definition of mod we have that n mod k = n for any such n. Thus Pk (0), Pk (1), . . . , Pk (k − 1) all hold.
Inductive case (n ≥ k): We assume the inductive hypothesis Pk (0) ∧ Pk (1) ∧ · · · ∧ Pk (n − 1), and we must prove Pk (n).
remainder(n, k) = remainder(n − k, k) by inspection (specifically because n ≥ k and by Line 4)
= (n − k) mod k by the inductive hypothesis Pk (n − k), specifically because n ≥ k ⇒ n > n − k ≥ 0
= n mod k. (n − k) mod k = n mod k by Definition 2.11

5.50 Here is the algorithm:

baseConvert(n, k): // assume n ≥ 0 and k ≥ 2 are integers.


1 if n ≤ k − 1 then
2 return ⟨n⟩
3 else
4 ⟨bk , . . . , b0 ⟩ := baseConvert(⌊n/k⌋ , k)
5 x := remainder(n, k)
6 return ⟨bk , . . . , b0 , x⟩

We must prove that, if baseConvert(n, k) returns ⟨b_ℓ, b_{ℓ−1}, . . . , b₀⟩, then n = ∑_{i=0}^{ℓ} b_i k^i. We'll use strong induction on n:

Base cases (n ∈ {0, 1, . . . , k − 1}): By inspection, on input n ∈ {0, 1, . . . , k − 1} the algorithm returns ⟨b₀⟩ = ⟨n⟩ in
Line 2. Indeed, exactly as desired, we have that

    ∑_{i=0}^{0} b_i k^i = b₀ · k⁰ = b₀ = n.

Inductive case (n ≥ k): We assume the inductive hypothesis—for any integer 0 ≤ m < n, the value ⟨a_ℓ, . . . , a₀⟩ returned
by baseConvert(m, k) satisfies ∑_{i=0}^{ℓ} a_i k^i = m. We must show that baseConvert(n, k) returns ⟨b_t, . . . , b₀⟩ such
that ∑_{i=0}^{t} b_i k^i = n. Let ⟨b_t, b_{t−1}, . . . , b₁, b₀⟩ be the value returned by baseConvert(n, k) in Line 6. By inspec-
tion, ⟨b_t, b_{t−1}, . . . , b₁⟩ = ⟨a_{ℓ−1}, . . . , a₀⟩, where ⟨a_{ℓ−1}, . . . , a₀⟩ is the value returned by baseConvert(⌊n/k⌋, k), and
b₀ = remainder(n, k). Thus:

    ∑_{i=0}^{ℓ} b_i k^i = [∑_{i=1}^{ℓ} a_{i−1} k^i] + (remainder(n, k)) · k⁰        inspection of the algorithm/definition of b_i and a_i values
                        = k · [∑_{i=0}^{ℓ−1} a_i k^i] + (remainder(n, k)) · k⁰      reindexing and pulling out a factor of k
                        = k · ⌊n/k⌋ + (remainder(n, k)) · k⁰                        inductive hypothesis (which applies because 0 ≤ ⌊n/k⌋ < n)
                        = k · ⌊n/k⌋ + n mod k                                       Exercise 5.49
                        = n.                                                        by the definition of mod, Definition 2.11

Thus we have shown that ∑_{i=0}^{ℓ} b_i k^i = n, where baseConvert(n, k) returns ⟨b_ℓ, . . . , b₀⟩, precisely as required.
i

5.51 A solution in Python is shown in Figure S.5.3.
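As a quick sanity check of the code in Figure S.5.3 (this check is mine, not part of the printed solution), the digit list produced by base_convert can be folded back into an integer with Horner's rule; the helper name reassemble is my own.

```python
def remainder(n, k):
    # As in Figure S.5.3: compute n mod k, for n >= 0 and k >= 1.
    if n <= k - 1:
        return n
    else:
        return remainder(n - k, k)

def base_convert(n, k):
    # As in Figure S.5.3: the base-k digits of n, most significant first.
    if n <= k - 1:
        return [n]
    else:
        return base_convert(n // k, k) + [remainder(n, k)]

def reassemble(digits, k):
    # Horner's rule: fold a digit list back into the integer it represents.
    value = 0
    for d in digits:
        value = value * k + d
    return value

# Every small n round-trips in every small base.
round_trips = all(reassemble(base_convert(n, k), k) == n
                  for n in range(200) for k in range(2, 8))
```

The round-trip test is exactly the statement proved in Exercise 5.50: reassembling the digits recovers n.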

5.52 Let P(n) denote the claim “the first player can force a win if and only if n mod 3 ̸= 0.” We’ll prove that P(n) holds
for any integer n ≥ 1, by strong induction on n.
There are two base cases: if n = 1 or n = 2, then the first player immediately wins by taking n cards (and indeed
1 mod 3 ̸= 0 and 2 mod 3 ̸= 0).
For the inductive case, let n ≥ 3 be arbitrary. We’ll show that, assuming the inductive hypotheses P(n−1) and P(n−2),
that P(n) holds as well. The key observation is the following:
• if n is divisible by three, then neither n − 1 nor n − 2 is.
• if n is not divisible by three, then either n − 1 or n − 2 is.

def remainder(n, k):
    '''
    Compute n mod k, given n>=0 and k>=1.
    '''
    if n <= k - 1:
        return n
    else:
        return remainder(n - k, k)

def base_convert(n, k):
    '''
    Convert n>=0 into base k>=2. The output is of the form [xM-1, xM-2, ..., x0]
    where each xI is an integer between 0 and k-1.
    '''
    if n <= k - 1:
        return [n]
    else:
        L = base_convert(n // k, k)
        x = remainder(n, k)
        return L + [x]

Figure S.5.3 Implementations of remainder and baseConvert in Python.


5.3 Proofs by Mathematical Induction 83

In the former case, regardless of whether the first player takes one or two cards, the remaining number of cards is not
divisible by three, so the player who goes next can force a win, by the inductive hypothesis. Thus if n is divisible by three,
then the second player can force a win.
In the latter case, the first player can choose to take 1 card if n − 1 is divisible by three, or 2 cards if n − 2 is divisible by
three. By the inductive hypothesis, in the resulting configuration the next player is forced to lose. Thus if n is not divisible
by three, then the first player can force a win.

5.53 Define Pk (n) to denote the claim that "the first player can guarantee a win if and only if k + 1 ∤ n." We'll prove by
strong induction that Pk (n) holds for any integer n ≥ 1.
base cases (n ≤ k): Then there are at most k cards on the table, so the first player can take all n cards and win. Indeed
k + 1 ∤ n for n ∈ {1, 2, . . . , k}, and thus Pk (n) holds.
inductive case (n ≥ k + 1): We assume the inductive hypotheses Pk (1), Pk (2), . . . , Pk (n − 1). We must prove Pk (n). The
key number-theoretic observation is the following:

k + 1 | n if and only if there is no a ∈ {1, 2, . . . , k} such that k + 1 | n − a.

Thus we have

k + 1 ∤ n ⇔ ∃a ∈ {1, 2, . . . , k} : k + 1 | n − a    the above properties of divisibility


⇔ ∃a ∈ {1, 2, . . . , k} : if player #1 takes a cards, then player #2 has no winning move
inductive hypotheses Pk (n − 1), Pk (n − 2), . . . , Pk (n − k)
⇔ player #1 has a winning move.

In other words, player #1 has a winning move when there are n cards on the table if and only if she can leave a multiple
of k + 1 cards on the table after her move, which she can do if and only if n is not itself a multiple of k + 1. Thus Pk (n)
follows.
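The claims of Exercises 5.52 and 5.53 can also be confirmed by a brute-force game search. The sketch below uses my own function name first_wins; the rules are as in the exercises (a move takes between 1 and k cards, and taking the last card wins), and the win/loss value of each position is memoized.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def first_wins(n, k):
    # True iff the player to move can force a win with n cards on the table.
    # With n = 0, the previous player took the last card, so the mover loses.
    return any(not first_wins(n - a, k) for a in range(1, min(k, n) + 1))
```

For k = 2 this reproduces Exercise 5.52's "n mod 3 ≠ 0" criterion; in general the first player wins exactly when k + 1 ∤ n.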

5.54 Let P(n) denote the claim “the first player can force a win if and only if n mod 3 ̸= 1.” We’ll prove that P(n) holds
for any integer n ≥ 1, by strong induction on n.
For the base cases (n ∈ {1, 2}), observe that the first player immediately loses with n = 1, and indeed 1 mod 3 = 1;
and the first player wins with n = 2 by taking one card (leaving one for the second player to take), and indeed 2 mod 3 ̸= 1.
For the inductive case (n ≥ 3), we assume the inductive hypotheses P(n − 1) and P(n − 2), and we must prove P(n).
Player #1 has two choices: taking one card and leaving n − 1 behind, or taking two cards and leaving n − 2 behind.
Player #1 can force a win if either of those two choices leaves a game in which Player #2 cannot force a win. Note that
n mod 3 = 1 if and only if 1 ∉ {(n − 1) mod 3, (n − 2) mod 3}. By the inductive hypothesis, Player #2 wins the game with
n − a cards if and only if (n − a) mod 3 = 1, and thus

Player #1 wins the game with n cards


⇔ Player #2 does not win the game with n − 1 cards or Player #2 does not win the game with n − 2 cards
⇔ [n − 1 mod 3 = 1] or [n − 2 mod 3 = 1] by the inductive hypothesis
⇔ n mod 3 ̸= 1. by properties of mod

Thus P(n) follows.

5.55 Let P(n) be the property that n can be written as n = 2a + 5b for nonnegative integers a, b. We’ll prove that P(n)
holds for all integers n ≥ 4 by strong induction on n.
There are two base cases, for n = 4 and n = 5: P(4) follows for a = 2 and b = 0 (because 2 · 2 + 0 · 5 = 4), and
P(5) follows for a = 0 and b = 1 (because 0 · 2 + 1 · 5 = 5).
For the inductive case (n ≥ 6), we assume the inductive hypotheses, namely P(0), P(1), . . . , P(n − 1). We must prove
P(n). Because n ≥ 6, we know that n − 2 ≥ 4; thus by the inductive hypothesis P(n − 2) we know that 2c + 5d = n − 2
for nonnegative integers c and d. Then selecting a = c + 1 and b = d, we know that 2a + 5b = n, which establishes P(n).

5.56 Let P(n) denote the claim "a team can score exactly n points". It's fairly clear that P(11) is false: neither 11 nor
4 = 11 − 7 is divisible by 3, and two or more 7-point scores aren't possible for a total of 11. But we'll prove that
∀n ≥ 12 : P(n) by strong induction on n.
There are three base cases, for n ∈ {12, 13, 14}:

12 = 3 + 3 + 3 + 3 13 = 3 + 3 + 7 14 = 7 + 7.

For the inductive case (n ≥ 15), we assume the inductive hypothesis, namely ∀n′ ∈ {12, . . . , n − 1} : P(n′ ). We must
prove P(n). Let m = n − 3. Because n ≥ 15, we have that 12 ≤ m < n. By the inductive hypothesis, then, we know that
P(m) holds. Therefore P(n) must hold as well: to accumulate n points, first accumulate n − 3 points according to P(m),
and then accumulate three more via a 3-point score.
For small values of n, one can verify that {0, 3, 6, 7, 9, 10} are achievable and {1, 2, 4, 5, 8, 11} are not. Thus the set
of achievable scores is Z≥0 − {1, 2, 4, 5, 8, 11}.
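The achievability claim is easy to spot-check mechanically; the sketch below (the function name achievable is mine) tries all ways of peeling off a 3- or 7-point score.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def achievable(n):
    # Can a team reach exactly n points using only 3- and 7-point scores?
    if n == 0:
        return True
    return any(achievable(n - s) for s in (3, 7) if n - s >= 0)

unreachable = {n for n in range(40) if not achievable(n)}
```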

5.57 We claim that fn mod 2 = 0 if and only if n mod 3 = 0. Here’s a proof by strong induction on n:
base case I (n = 1): Note that 1 mod 3 ̸= 0 and f1 mod 2 = 1 mod 2 = 1. So for n = 1 indeed both fn mod 2 ≠ 0 and
n mod 3 ̸= 0.
base case II (n = 2): Note that 2 mod 3 ̸= 0 and f2 mod 2 = 1 mod 2 = 1. So for n = 2 indeed both fn mod 2 ̸= 0 and
n mod 3 ̸= 0.
inductive case (n ≥ 3): We assume the inductive hypothesis—namely, that fk mod 2 = 0 if and only if k mod 3 = 0 for
any k ≤ n − 1—and we need to prove the desired property for fn . Note that (n − 1) mod 3 = (n mod 3) − 1 unless
n mod 3 = 0, in which case (n − 1) mod 3 = 2; similarly, (n − 2) mod 3 = (n mod 3) + 1 unless n mod 3 = 2, in
which case (n − 2) mod 3 = 0. Thus:

fn = fn−1 + fn−2    definition of the Fibonaccis
   = odd + odd   if n mod 3 = 0
     even + odd  if n mod 3 = 1
     odd + even  if n mod 3 = 2    inductive hypothesis and above discussion
   = even  if n mod 3 = 0
     odd   if n mod 3 = 1
     odd   if n mod 3 = 2.    odd plus odd is even; even plus odd is odd

Thus fn mod 2 = 0 if and only if n mod 3 = 0.
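A quick numerical check of this parity claim (mine, not part of the printed solution), with f1 = f2 = 1:

```python
def fib(n):
    # f1 = f2 = 1 (and f0 = 0), computed iteratively.
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

# fn is even exactly when n is a multiple of 3.
parity_ok = all((fib(n) % 2 == 0) == (n % 3 == 0) for n in range(1, 200))
```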

5.58 The proof is similar to Exercise 5.57, though we’re going to have to build a much bigger inductive claim. (There are
lots more cases!) Let P(n) denote the following property:


fn mod 3 = 0 if n mod 8 = 0
           1 if n mod 8 = 1
           1 if n mod 8 = 2
           2 if n mod 8 = 3
           0 if n mod 8 = 4
           2 if n mod 8 = 5
           2 if n mod 8 = 6
           1 if n mod 8 = 7

We’ll prove P(n) for all n ≥ 1 by strong induction on n.


base cases (n ∈ {1, 2, . . . , 8}): Examine the following table:

fn mod 3 = 1 mod 3 = 1   if n = 1
           1 mod 3 = 1   if n = 2
           2 mod 3 = 2   if n = 3
           3 mod 3 = 0   if n = 4
           5 mod 3 = 2   if n = 5
           8 mod 3 = 2   if n = 6
           13 mod 3 = 1  if n = 7
           21 mod 3 = 0  if n = 8 (when n mod 8 = 0)

These cases all match the definition of the property P, and thus P(1), P(2), . . . , P(8) all follow.
inductive case (n ≥ 9): We assume the inductive hypothesis—namely, P(k) for any k ≤ n − 1—and we must prove P(n).
First, observe that
5.3 Proofs by Mathematical Induction 85

if n mod 8 =            0  1  2  3  4  5  6  7
then (n − 1) mod 8 =    7  0  1  2  3  4  5  6
and (n − 2) mod 8 =     6  7  0  1  2  3  4  5.

Thus, by the induction hypothesis, we have that

if n mod 8 =                              0  1  2  3  4  5  6  7
then fn−1 mod 3 =                         1  0  1  1  2  0  2  2
and fn−2 mod 3 =                          2  1  0  1  1  2  0  2
and thus (fn−1 mod 3) + (fn−2 mod 3) =    3  1  1  2  3  2  2  4.

Because fn = fn−1 + fn−2 by definition, and (a + b) mod 3 = [(a mod 3) + (b mod 3)] mod 3 for any a and b, thus

if n mod 8 =       0  1  2  3  4  5  6  7
then fn mod 3 =    0  1  1  2  0  2  2  1,
exactly as required by the definition of the property P.
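The period-8 pattern of fn mod 3 can likewise be spot-checked numerically; PATTERN below transcribes the case table from the property P (this check is mine, not part of the printed solution).

```python
def fib(n):
    # f1 = f2 = 1 (and f0 = 0), computed iteratively.
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

# The residue of fn mod 3 claimed for each value of n mod 8.
PATTERN = {0: 0, 1: 1, 2: 1, 3: 2, 4: 0, 5: 2, 6: 2, 7: 1}
pattern_ok = all(fib(n) % 3 == PATTERN[n % 8] for n in range(1, 400))
```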

5.59 Let P(n) denote the property that

∑_{i=1}^{n} fi = fn+2 − 1.

We'll prove that P(n) holds for all n ≥ 1 by (weak) induction on n. (Despite the appearance of the Fibonaccis, we'll
actually be able to prove this result by weak induction.)
For the base cases (n = 1 and n = 2), we have that

∑_{i=1}^{1} fi = f1 = 1 = 2 − 1 = f3 − 1    and    ∑_{i=1}^{2} fi = f1 + f2 = 1 + 1 = 3 − 1 = f4 − 1

because f1 = f2 = 1 and f3 = 2 and f4 = 3. Thus P(1) and P(2) follow.
For the inductive case (n ≥ 3), we assume the inductive hypothesis P(n − 1), and we must prove P(n):

∑_{i=1}^{n} fi = [∑_{i=1}^{n−1} fi] + fn    definition of summations
= [fn+1 − 1] + fn    inductive hypothesis P(n − 1)
= [fn+1 + fn] − 1    rearranging
= fn+2 − 1.    definition of the Fibonaccis
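A quick numeric spot-check of this summation identity (the helper name fib is mine, with f1 = f2 = 1):

```python
def fib(n):
    # f1 = f2 = 1 (and f0 = 0), computed iteratively.
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

sum_ok = all(sum(fib(i) for i in range(1, n + 1)) == fib(n + 2) - 1
             for n in range(1, 60))
```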

5.60 Let P(n) denote the property that

∑_{i=1}^{n} (fi)² = fn · fn+1.

We'll prove that P(n) holds for all n ≥ 1 by (weak) induction on n.
For the base cases (n = 1 and n = 2), we have that

∑_{i=1}^{1} (fi)² = 1² = 1 = 1 · 1 = f1 · f2    and    ∑_{i=1}^{2} (fi)² = 1² + 1² = 1 + 1 = 1 · 2 = f2 · f3

because f1 = f2 = 1 and f3 = 2. Thus P(1) and P(2) follow.
For the inductive case (n ≥ 3), we assume the inductive hypothesis P(n − 1), and we must prove P(n):

∑_{i=1}^{n} (fi)² = [∑_{i=1}^{n−1} (fi)²] + (fn)²    definition of summations
= fn−1 · fn + (fn)²    inductive hypothesis P(n − 1)
= fn · (fn−1 + fn)    factoring
= fn · fn+1.    definition of the Fibonaccis
86 Mathematical Induction

5.61 Let P(n) denote the property that

fn−1 · fn+1 − (fn)² = (−1)^n.

We'll prove that P(n) holds for all n ≥ 2 by (weak) induction on n.
For the base case (n = 2), we have that

f1 · f3 − (f2)² = 1 · 2 − 1² = 2 − 1 = 1 = (−1)²

because f1 = f2 = 1 and f3 = 2. Thus P(2) follows.
For the inductive case (n ≥ 3), we assume the inductive hypothesis P(n − 1), and we must prove P(n):

fn−1 · fn+1 − (fn)² = fn−1 · (fn−1 + fn) − (fn)²    definition of the Fibonacci numbers
= (fn−1)² − fn · (fn − fn−1)    rearranging terms
= (fn−1)² − fn · fn−2    definition of the Fibonacci numbers
= −1 · [fn · fn−2 − (fn−1)²]    rearranging terms
= −1 · (−1)^{n−1}    inductive hypothesis
= (−1)^n.
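This is Cassini's identity; a small numeric check (mine, not part of the printed solution), with the same indexing f1 = f2 = 1:

```python
def fib(n):
    # f1 = f2 = 1 (and f0 = 0), computed iteratively.
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

cassini_ok = all(fib(n - 1) * fib(n + 1) - fib(n) ** 2 == (-1) ** n
                 for n in range(2, 80))
```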

5.62 Write M for the 2-by-2 matrix with rows (1, 1) and (1, 0). Let P(n) denote the property that

M^{n−1} · (1, 0)^T = (fn , fn−1)^T.

We'll prove P(n) for all n ≥ 2 by (weak) induction on n.
For the base case (n = 2), observe that

M^1 · (1, 0)^T = M · (1, 0)^T = (1, 1)^T = (f2 , f1)^T

by definition of M^1 = M · M^0 = M · I = M, and by definition of the identity matrix I and of matrix multiplication.
For the inductive case (n ≥ 3), we assume the inductive hypothesis P(n − 1), and we must prove P(n).

M^{n−1} · (1, 0)^T = (M · M^{n−2}) · (1, 0)^T    definition of matrix powers
= M · (M^{n−2} · (1, 0)^T)    associativity of matrix multiplication
= M · (fn−1 , fn−2)^T    inductive hypothesis P(n − 1)
= (fn−1 + fn−2 , fn−1)^T    definition of matrix multiplication
= (fn , fn−1)^T.    definition of the Fibonacci numbers
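The matrix identity can be checked directly with a small 2-by-2 matrix helper (all names below are mine): the vector M^{n−1} · (1, 0)^T is just the first column of M^{n−1}.

```python
def mat_mult(A, B):
    # Product of two 2-by-2 matrices represented as ((a, b), (c, d)).
    return ((A[0][0] * B[0][0] + A[0][1] * B[1][0],
             A[0][0] * B[0][1] + A[0][1] * B[1][1]),
            (A[1][0] * B[0][0] + A[1][1] * B[1][0],
             A[1][0] * B[0][1] + A[1][1] * B[1][1]))

def mat_pow(A, n):
    # Repeated multiplication, starting from the identity (handles n = 0).
    result = ((1, 0), (0, 1))
    for _ in range(n):
        result = mat_mult(result, A)
    return result

def fib_pair(n):
    # M^(n-1) applied to (1, 0)^T is M^(n-1)'s first column: (fn, f(n-1)).
    M = ((1, 1), (1, 0))
    P = mat_pow(M, n - 1)
    return (P[0][0], P[1][0])
```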

5.63 Let P(n) denote the property that Ln = fn + 2fn−1 . We'll prove by strong induction on n that P(n) holds for any n ≥ 2.
We’ll prove two base cases, n = 2 and n = 3. For the first base case (n = 2), we have that L2 = 3 by definition, and
indeed f2 + 2f1 = 1 + 2 · 1 = 3 as well. For the second base case (n = 3), we have that L3 = L2 + L1 = 1 + 3 = 4 by
definition, and indeed f3 + 2f2 = 2 + 2 · 1 = 4 as well.
For the inductive case (n ≥ 4), we assume the inductive hypotheses—namely P(n − 1) and P(n − 2)—and we must
prove P(n):
Ln = Ln−1 + Ln−2 definition of the Lucas numbers
= fn−1 + 2fn−2 + fn−2 + 2fn−3 inductive hypothesis
= fn−1 + fn−2 + 2(fn−2 + fn−3 ) rearranging
= fn + 2(fn−1 ). definition of the Fibonacci numbers

5.64 We'll prove

fn = (Ln−1 + Ln+1) / 5

by strong induction on n.
We'll prove two base cases, n = 2 and n = 3. For n = 2, we have (L1 + L3)/5 = (1 + 4)/5 = 1 = f2 . For n = 3, we
have (L2 + L4)/5 = (3 + 7)/5 = 2 = f3 .
For the inductive case (n ≥ 4), we assume the inductive hypotheses—specifically, fk = (Lk−1 + Lk+1)/5 for any positive
integer k with 2 ≤ k < n—and we must prove fn = (Ln−1 + Ln+1)/5:

fn = fn−1 + fn−2    definition of the Fibonacci numbers
= (Ln−2 + Ln)/5 + (Ln−3 + Ln−1)/5    inductive hypothesis
= ([Ln−2 + Ln−3] + [Ln + Ln−1])/5    rearranging
= (Ln−1 + Ln+1)/5.    definition of the Lucas numbers
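Both Lucas-number identities (Exercises 5.63 and 5.64) admit a quick numeric spot-check, using L1 = 1 and L2 = 3 (so L0 = 2); the helper names are mine.

```python
def fib(n):
    # f1 = f2 = 1 (and f0 = 0), computed iteratively.
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def lucas(n):
    # Same recurrence as the Fibonaccis, but L0 = 2 and L1 = 1.
    a, b = 2, 1
    for _ in range(n):
        a, b = b, a + b
    return a

lucas_ok = all(lucas(n) == fib(n) + 2 * fib(n - 1) for n in range(2, 60))
fib_from_lucas_ok = all(5 * fib(n) == lucas(n - 1) + lucas(n + 1)
                        for n in range(2, 60))
```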

5.65 Let P(n) denote the property that (Ln)² = 5(fn)² + 4(−1)^n. In the midst of the inductive case of a proof that P(n)
holds for all n, I found myself with a sequence of calculations that showed that

(Ln)² = 5[(fn)² − 2fn−1 fn−2] + 2Ln−1 Ln−2.

But the property P(n) that we need to prove is

(Ln)² = 5(fn)² + 4(−1)^n.

So, apparently, we need the following property to be true:

−10fn−1 fn−2 + 2Ln−1 Ln−2 = 4(−1)^n    or, that is,    Ln−1 Ln−2 − 5fn−1 fn−2 = 2(−1)^n.

Call Q(n) the property Ln Ln−1 = 5fn fn−1 + 2(−1)^{n−1}. So I tried to prove Q(n) by induction—but in the midst of the
inductive case of a proof that Q(n) holds for all n, I found myself with a sequence of calculations that relied on P(n − 1).
So we're going to prove P(n) ∧ Q(n) for all n ≥ 2 by strong induction on n. (In other words, the property that we seek
to prove inductively is that both P(n) and Q(n) are true.)
First note that P(1) holds, because

(L1)² = 1² = 1    and    5(f1)² + 4(−1)¹ = 5 · 1² − 4 = 1.


Now for the inductive proof of P(n) ∧ Q(n) for all n ≥ 2:
base case #1 (n = 2): To prove P(2), note that

(L2 )2 = 32 = 9 and 5(f2 )2 + 4(−1)2 = 5 · 12 + 4 = 9


as well. To prove Q(2), observe that

L1 L2 = 1 · 3 = 3 and 5f2 f1 + 2(−1)1 = 5 · 1 · 1 − 2 = 5 − 2 = 3,


as desired.
base case #2 (n = 3): To prove P(3), note that

(L3 )2 = 42 = 16 and 5(f3 )2 + 4(−1)3 = 5 · 22 − 4 = 20 − 4 = 16


as well. To prove Q(3), observe that

L3 L2 = 4 · 3 = 12 and 5f3 f2 + 2(−1)2 = 5 · 2 · 1 + 2 = 10 + 2 = 12


as desired.

inductive case (n ≥ 4): We assume the inductive hypotheses Q(2) ∧ P(2), . . ., Q(n − 1) ∧ P(n − 1), and we must prove
Q(n) and P(n). Let's start with P(n):

(Ln)² = (Ln−1 + Ln−2)²    definition of the Lucas numbers
= (Ln−1)² + 2Ln−1 Ln−2 + (Ln−2)²    multiplying out
= 5(fn−1)² + 4(−1)^{n−1} + 2Ln−1 Ln−2 + 5(fn−2)² + 4(−1)^{n−2}    inductive hypotheses P(n − 1) and P(n − 2)
= 5(fn−1)² + 2Ln−1 Ln−2 + 5(fn−2)²    n − 1 is even if and only if n − 2 is odd, so {4(−1)^{n−1}, 4(−1)^{n−2}} = {4, −4}
= 5(fn−1)² + 2[5fn−1 fn−2 + 2(−1)^{n−2}] + 5(fn−2)²    inductive hypothesis Q(n − 1)
= 5[(fn−1)² + 2fn−1 fn−2 + (fn−2)²] + 4(−1)^{n−2}    rearranging
= 5[(fn−1 + fn−2)²] + 4(−1)^{n−2}    rearranging
= 5(fn)² + 4(−1)^n,    definition of the Fibonacci numbers; (−1)^{n−2} = (−1)^n

which is precisely P(n). To prove Q(n):

Ln Ln−1 = (Ln−1 + Ln−2) Ln−1    definition of the Lucas numbers
= (Ln−1)² + Ln−2 Ln−1    multiplying out
= 5(fn−1)² + 4(−1)^{n−1} + 5fn−1 fn−2 + 2(−1)^{n−2}    inductive hypotheses P(n − 1) and Q(n − 1)
= 5fn−1 (fn−1 + fn−2) + 4(−1)^{n−1} + 2(−1)^{n−2}    factoring
= 5fn−1 fn + 4(−1)^{n−1} + 2(−1)^{n−2}    definition of the Fibonacci numbers
= 5fn fn−1 + 2(−1)^{n−1},    (−1)^{n−2} = −1 · (−1)^{n−1}

which is precisely Q(n). The theorem follows!

5.66 Let P(n) denote the property

Jn = 2Jn−1 + (−1)^{n−1}.

We'll prove that P(n) holds for all n ≥ 2 by strong induction on n.
There are two base cases, n = 2 and n = 3. For n = 2, we have 2J1 + (−1)¹ = 2 − 1 = 1 = J2 . For n = 3, we have
2J2 + (−1)² = 2 + 1 = 3 = J3 .
For the inductive case (n ≥ 4), we assume the inductive hypotheses P(n − 1) and P(n − 2). We must prove P(n):

Jn = Jn−1 + 2Jn−2    definition of the Jacobsthal numbers
= [2Jn−2 + (−1)^{n−2}] + 2[2Jn−3 + (−1)^{n−3}]    inductive hypothesis
= 2[Jn−2 + 2Jn−3] + (−1)^{n−2} + 2 · (−1)^{n−3}    rearranging
= 2Jn−1 + (−1)^{n−3}[−1 + 2]    definition of the Jacobsthal numbers; (−1)^{n−2} = −1 · (−1)^{n−3}
= 2Jn−1 + (−1)^{n−1}.    (−1)^{n−3} = (−1)^{n−1}

5.67 We'll prove Jn = (2^n − (−1)^n)/3 by strong induction on n.
We'll prove two base cases, n = 1 and n = 2. For n = 1, we have (2¹ − (−1)¹)/3 = (2 − (−1))/3 = 3/3 = 1 = J1 .
For n = 2, we have (2² − (−1)²)/3 = (4 − 1)/3 = 3/3 = 1 = J2 .
For the inductive case (n ≥ 3), we assume the inductive hypotheses—specifically, Jk = (2^k − (−1)^k)/3 for any positive
integer k with k < n—and we must prove Jn = (2^n − (−1)^n)/3:

Jn = Jn−1 + 2Jn−2    definition of the Jacobsthal numbers
= (2^{n−1} − (−1)^{n−1})/3 + 2 · (2^{n−2} − (−1)^{n−2})/3    inductive hypothesis
= (2^{n−1} − (−1)^{n−1} + 2 · 2^{n−2} − 2 · (−1)^{n−2})/3    rearranging
= (2^n − (−1)^{n−1} − 2 · (−1)^{n−2})/3    2 · 2^{n−2} = 2^{n−1}, and 2^{n−1} + 2^{n−1} = 2^n
= (2^n + (−1)^n − 2 · (−1)^n)/3    (−1)^{n−1} = −1 · (−1)^n, and (−1)^{n−2} = (−1)^n
= (2^n − (−1)^n)/3.    for any x, we have x − 2x = −x; here x = (−1)^n
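The Jacobsthal identities of Exercises 5.66–5.68 and the closed form above can all be spot-checked at once (J0 = 0, J1 = 1, and Jn = Jn−1 + 2Jn−2; the function name is mine).

```python
def jacobsthal(n):
    # J0 = 0, J1 = 1, and Jn = J(n-1) + 2 * J(n-2), computed iteratively.
    a, b = 0, 1
    for _ in range(n):
        a, b = b, b + 2 * a
    return a

closed_form_ok = all(jacobsthal(n) == (2 ** n - (-1) ** n) // 3
                     for n in range(60))
recurrence_ok = all(jacobsthal(n) == 2 * jacobsthal(n - 1) + (-1) ** (n - 1)
                    for n in range(2, 60))
power_ok = all(jacobsthal(n) == 2 ** (n - 1) - jacobsthal(n - 1)
               for n in range(2, 60))
```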

5.68 Let P(n) denote the property Jn = 2^{n−1} − Jn−1 . We'll prove P(n) for all n ≥ 2 by (weak) induction on n.
For the base case n = 2, we have 2¹ − J1 = 2 − 1 = 1, and indeed J2 = J1 + 2J0 = 1 + 2 · 0 = 1.
For the inductive case (n ≥ 3), we assume the inductive hypothesis P(n − 1). We must prove P(n). Note that, by
the inductive hypothesis P(n − 1), we have Jn−1 = 2^{n−2} − Jn−2 . Therefore the inductive hypothesis also states that
Jn−2 = 2^{n−2} − Jn−1 .

Jn = Jn−1 + 2Jn−2    definition of the Jacobsthal numbers
= Jn−1 + 2[2^{n−2} − Jn−1]    inductive hypothesis/above discussion
= Jn−1 + 2^{n−1} − 2Jn−1    multiplying through
= 2^{n−1} − Jn−1 .

5.69 Let T(n) denote the claim that the number of different ways of tiling the 2-by-n grid is precisely Jn+1 . We’ll prove
that T(n) holds for all n ≥ 1 by strong induction on n. There are two base cases: n = 1 and n = 2.

• For n = 1, it's clear that there's only one way to tile a 2-by-1 grid: namely, using a single domino. And, indeed, J2 = 1.
• For n = 2, we have three different tilings—one square, two horizontal dominoes, or two vertical dominoes (see
Figure S.5.4a). And, indeed, J3 = J2 + 2J1 = 1 + 2 = 3.

For the inductive case (n ≥ 3), we assume the inductive hypotheses, specifically T(n − 1) and T(n − 2): there are Jn and
Jn−1 ways of tiling a 2-by-(n − 1) and a 2-by-(n − 2) grid, respectively. How many ways, then, are there to tile the 2-by-n
grid? The easiest way to distinguish these ways is by first asking "what piece fills the top-most row?" and then asking
"how many ways are there to fill the remaining cells?". The ways are illustrated in Figure S.5.4b; using these cases, we

Figure S.5.4 Tiling a grid with squares and dominoes. (a) Three ways to fill a 2-by-2 grid. (b) Three ways to fill the top
of a 2-by-n grid (leaving, respectively, n − 1, n − 2, and n − 2 rows left to fill).



can now do the calculation:

total number of ways of filling the 2-by-n grid


= ways to fill the 2-by-n grid starting with a horizontal domino
+ ways to fill the 2-by-n grid starting with a vertical domino
+ ways to fill the 2-by-n grid starting with a square
= ways to fill the 2-by-(n − 1) grid + ways to fill the 2-by-(n − 2) grid + ways to fill the 2-by-(n − 2) grid
= Jn + Jn−1 + Jn−1 inductive hypothesis
= Jn + 2Jn−1
= Jn+1 . definition of the Jacobsthal numbers

5.70 The n = 1 base case does not change (there’s still only one way of tiling a 1-by-2 grid), but the n = 2 base case
does: there are now only two ways of tiling a 2-by-2 grid. Similarly, for the inductive case, we can tile a 2-by-n grid by
placing a horizontal tile (and tiling the resulting 2-by-(n − 1) grid), or by placing two vertical tiles (and tiling the resulting
2-by-(n − 2) grid). With only the above changes to the proof, we can prove by strong induction on n that the number of
different ways of tiling the 2-by-n grid using only dominoes is precisely fn+1 . The only change is in the inductive case:

total number of ways of filling the 2-by-n grid = ways to fill the 2-by-(n − 1) grid + ways to fill the 2-by-(n − 2) grid
previously, “ways to fill the 2-by-(n − 2) grid” appeared twice
= fn + fn−1 inductive hypothesis
= fn+1 . definition of the Fibonacci numbers

(Everything else remains unchanged.)
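Both tiling counts (with and without 2-by-2 squares, Exercises 5.69 and 5.70) follow the same recurrence shape, so a single sketch covers both; count_tilings is my own name, not the book's.

```python
def count_tilings(n, allow_squares=True):
    # Tilings of a 2-by-n grid by dominoes (plus 2-by-2 squares if allowed),
    # via the case analysis above: the top row is covered by one horizontal
    # domino, by two vertical dominoes, or (if allowed) by one square.
    a, b = 1, 1  # tilings of the 2-by-0 and 2-by-1 grids
    for _ in range(n - 1):
        a, b = b, b + (2 * a if allow_squares else a)
    return b
```

With squares the values are the Jacobsthal numbers Jn+1 (1, 3, 5, 11, 21, . . .); without them, the Fibonacci numbers fn+1.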

5.71 We’ll prove that |sn | = fn for all n ≥ 1 by strong induction on n.


For the base cases (n = 1 and n = 2), by definition |s1 | = |s2 | = 1. And, indeed, f1 = f2 = 1.
For the inductive case (n ≥ 3), we assume the inductive hypotheses—namely, |sn−1 | = fn−1 and |sn−2 | = fn−2 . We
must prove that |sn | = fn :

|sn | = |sn−1 ◦ sn−2 | definition of sn


= |sn−1 | + |sn−2 |    the length of a concatenation equals the sum of the lengths
= fn−1 + fn−2 inductive hypotheses
= fn . definition of the Fibonacci numbers

5.72 First, we’ll strengthen the stated property and prove this stronger result by strong induction. Let P(n) denote the
following three properties of sn :
(i) sn starts with 01; and
(ii) sn ends with 10 if n is even, and ends with 01 if n is odd; and
(iii) sn does not contain 11 or 000 consecutively.
We’ll prove P(n) for all n ≥ 3 by strong induction on n. (We can check immediately that s1 = 1 and s2 = 0 do not
contain consecutive 11 or 000 because they only have length one, but P(1) and P(2) don’t make sense—conditions (i)
and (ii) apply only for strings with more than one character.)
Base cases (n = 3 and n = 4): By definition, s3 = 01 and s4 = 010. Neither 01 nor 010 contains 11 or 000, and both
start with 01. The odd-indexed s3 ends with 01 and the even-indexed s4 ends with 10. Thus P(3) and P(4) hold.
Inductive case (n ≥ 5): We assume the inductive hypotheses—specifically, we’ll assume P(n − 1) and P(n − 2). We must
prove P(n). Recall that, by definition, sn = sn−1 ◦ sn−2 . Thus:
(i) Because sn−1 starts with 01 by the inductive hypothesis P(n − 1).(i), sn does too. Thus (i) follows.
(ii) Because sn−2 ends with 01 if n − 2 is odd, and ends with 10 if n − 2 is even—by the inductive hypothesis
P(n − 2).(ii)—and because n − 2 and n have the same parity, sn ends with 01 if n is odd, and ends with 10 if n is
even. Thus (ii) follows.
(iii) Neither sn−1 nor sn−2 contains 11 or 000, by the inductive hypotheses P(n − 1).(iii) and P(n − 2).(iii). So the
only concern is that there might be a 11 or a 000 that “straddles” the boundary between sn−1 and sn−2 . But, by the
inductive hypothesis P(n − 1).(ii), sn−1 ends with 10 or 01, and by the inductive hypothesis P(n − 2).(i), sn−2 starts
with 01. Thus the “boundary” is either 10|01 or 01|01, neither of which contains 11 or 000. Thus (iii) follows.

5.73 We'll first prove that #0(sn) = fn−1 and #1(sn) = fn−2 for all n ≥ 2 by strong induction on n. For the base cases
(n = 2 and n = 3), by definition s2 = 0 and s3 = 01, so indeed

#0(s2) = 1 = f1    #1(s2) = 0 = f0
#0(s3) = 1 = f2    #1(s3) = 1 = f1 .

For the inductive case (n ≥ 4), we assume the inductive hypotheses—namely, #0(sn−1) = fn−2 and #1(sn−1) = fn−3,
and also #0(sn−2) = fn−3 and #1(sn−2) = fn−4. We must prove the desired properties for sn:

#0(sn) = #0(sn−1 ◦ sn−2)    definition of sn
= #0(sn−1) + #0(sn−2)    definition of concatenation
= fn−2 + fn−3    inductive hypothesis
= fn−1 ,    definition of the Fibonaccis

and, identically,

#1(sn) = #1(sn−1 ◦ sn−2) = #1(sn−1) + #1(sn−2) = fn−3 + fn−4 = fn−2 .

The exercise asks for a proof that #0(sn) − #1(sn) is a Fibonacci number, which we can now show:

#0(sn) − #1(sn) = fn−1 − fn−2    the above claim
= (fn−2 + fn−3) − fn−2    definition of the Fibonaccis
= fn−3 .
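The structural claims about the strings sn from Exercises 5.71–5.73 can be verified directly from the definition s1 = 1, s2 = 0, sn = sn−1 ◦ sn−2 (this check is mine, not part of the printed solution).

```python
def fib(n):
    # f1 = f2 = 1 (and f0 = 0), computed iteratively.
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def fib_word(n):
    # s1 = "1", s2 = "0", and sn = s(n-1) concatenated with s(n-2).
    words = ["1", "0"]
    while len(words) < n:
        words.append(words[-1] + words[-2])
    return words[n - 1]

length_ok = all(len(fib_word(n)) == fib(n) for n in range(1, 18))
forbidden_ok = all("11" not in fib_word(n) and "000" not in fib_word(n)
                   for n in range(1, 18))
counts_ok = all(fib_word(n).count("0") == fib(n - 1)
                and fib_word(n).count("1") == fib(n - 2)
                for n in range(2, 18))
```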

5.74 A solution in Python is shown in Figure S.5.5.

5.75 Define P(k) to be the property that any polygon with k vertices has interior angles that sum to 180k − 360 degrees.
We claim that P(k) holds for any k ≥ 3, by strong induction on k.
For the base case (k = 3), we are given the fact that sum of the interior angles of any triangle is 180◦ —which is exactly
180 · 3 − 360—so the given fact is exactly P(3).
For the inductive case (k ≥ 4), we assume the inductive hypotheses P(3), P(4), . . . , P(k − 1). We must prove P(k).
Consider any polygon P with k vertices. Let u and v be any two nonadjacent vertices of P. (Because k ≥ 4, such a pair

import math

# This solution uses a graphics package in which:
# -- a Point is defined using an (x,y) coordinate pair
# -- a Line is defined by two Points.
# -- an object x (like a Line) is drawn using x.draw(window) for a given window.

def fibonacci_word_fractal(n):
    L = ["1", "0"]  # L = [s1, s2].
    i = 3
    while i <= n:
        L.append(L[-1] + L[-2])  # sI = s(I-1) + s(I-2), which, in other words, is
                                 # the concatenation of the last two strings in L.
        i += 1
    return L[-1]

def draw_fibonacci_word_fractal(word, window, length):
    # Draw the given Fibonacci word fractal word in the given window.
    heading = 0
    x = window.getWidth() / 2
    y = window.getHeight() / 2
    for i in range(1, len(word) + 1):  # Python's 0-indexing means we have to adjust i by one.
        new_x = x + length * math.cos(heading)
        new_y = y + length * math.sin(heading)
        Line(Point(int(x), int(y)), Point(int(new_x), int(new_y))).draw(window)
        if word[i - 1] == "0" and i % 2 == 1:  # 0 + odd --> turn left (90 degrees, in radians).
            heading -= math.pi / 2
        elif word[i - 1] == "0" and i % 2 == 0:  # 0 + even --> turn right.
            heading += math.pi / 2
        x, y = new_x, new_y

Figure S.5.5 Some Python code to draw the Fibonacci word fractal.

exists.) Define A as the “above the ⟨u, v⟩ line” piece of P and B as the “below the ⟨u, v⟩ line” piece of P. Notice that the
interior angles of P are precisely the sum of the interior angles of A and B. Let ℓ be the number of vertices in A. Observe
that ℓ ≥ 3 and ℓ < k because u and v are nonadjacent. Also observe that B contains precisely k − ℓ + 2 vertices. Therefore

sum of the interior angles of P = [sum of the interior angles of A] + [sum of the interior angles of B]
= [180 · ℓ − 360] + [180 · (k − ℓ + 2) − 360] inductive hypothesis
= 180 · (k − ℓ + 2 + ℓ) − 360 − 360
= 180 · k − 360,

precisely as required.

5.76 A triangle has 0 diagonals, a quadrilateral has 2, a pentagon has 5, and a hexagon 9:

Observe that "lopping off" a triangle from a polygon with k vertices removes k − 2 diagonals: we lose the k − 3 diagonals
incident to the lopped-off vertex, plus one more from the new edge that was previously a diagonal:

original polygon with k vertices    "lopped off" vertex (leaving k − 1)    the lost k − 2 diagonals

Intuitively, we lop off the kth vertex, then the (k − 1)st, etc., down to the 4th vertex (at which point a triangle, with no
diagonals, remains). The ith step acts on a polygon with k − i + 1 vertices, and so removes k − i − 1 diagonals. Thus we
will end up with

∑_{i=1}^{k−3} (k − i − 1) = ∑_{j=2}^{k−2} j = k(k − 3)/2

diagonals.
We can turn this observation into a proof. Let P(k) denote the claim that a polygon with k vertices has precisely
k(k − 3)/2 diagonals. We'll prove P(k) for all k ≥ 3 by (weak) induction on k.
For the base case (k = 3), observe that a triangle has 0 diagonals, and indeed 3 · (3 − 3)/2 = 0. Thus P(3) holds.
For the inductive case (k ≥ 4), we assume the inductive hypothesis P(k − 1) and we must prove P(k). Consider an
arbitrary polygon with k vertices. Choose any vertex u. As discussed above, if we delete u then the resulting polygon has
k − 2 fewer diagonals than the original polygon. Thus

number of diagonals in the polygon with u = (k − 2) + number of diagonals in the polygon without u
= (k − 2) + number of diagonals in a polygon with k − 1 vertices
= (k − 2) + (k − 1)(k − 4)/2    inductive hypothesis
= (2k − 4 + k² − 5k + 4)/2
= (k² − 3k)/2
= k(k − 3)/2.

5.77 Let P(n) denote the stated claim: for any sorted array A[1 . . . n], binarySearch(A, x) returns true if and only if x ∈ A.
We’ll prove that P(n) holds for all n ≥ 0 by strong induction on n.
For the base case (n = 0), we must prove P(0). This claim follows immediately: by inspection of the algorithm,
binarySearch(A, x) returns False, and indeed A[1 . . . 0] is empty and thus x ∉ A.
For the inductive case (n ≥ 1), we assume the inductive hypotheses P(0), P(1), . . . , P(n − 1). We must prove
P(n). First, observe that the inductive hypothesis does apply to both binarySearch(A[1 . . . middle − 1], x) and
binarySearch(A[middle + 1 . . . n], x), because:
• any subsequence of a sorted array (like A) is also sorted; and
• the number of elements in A[1 . . . middle − 1] is middle − 1 = ⌊(1 + n)/2⌋ − 1 = ⌊(n − 1)/2⌋ ≤ n − 1; and
• similarly, the number of elements in A[middle + 1 . . . n] is n − middle = n − ⌊(1 + n)/2⌋ ≤ n − 1.
By inspection of the algorithm, we see that binarySearch(A, x) returns true if and only if one of three conditions holds:
(i) A[middle] = x, or (ii) A[middle] > x and binarySearch(A[1 . . . middle − 1], x) returns True, or (iii) A[middle] < x
and binarySearch(A[middle + 1 . . . n], x) returns True. By the inductive hypothesis, option (ii) occurs if and only if
A[middle] > x and x ∈ A[1 . . . middle−1], and option (iii) occurs if and only if A[middle] < x and x ∈ A[middle+1 . . . n].
Thus binarySearch(A, x) returns true if and only if
A[middle] = x, or [A[middle] > x and x ∈ A[1 . . . middle − 1]], or [A[middle] < x and x ∈ A[middle + 1 . . . n]].
Because A is sorted, this condition is true precisely when x ∈ A.
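A 0-indexed Python rendering of the algorithm being analyzed (a sketch in the spirit of the text's pseudocode, not the book's reference code):

```python
def binary_search(A, x):
    # Return True iff x appears in the sorted list A. (The text's arrays are
    # 1-indexed; this 0-indexed middle corresponds to middle = floor((1+n)/2).)
    if not A:
        return False
    middle = (len(A) - 1) // 2
    if A[middle] == x:
        return True
    elif A[middle] > x:
        return binary_search(A[:middle], x)
    else:
        return binary_search(A[middle + 1:], x)
```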

5.78 We need to show the following property P(t): for any sorted arrays X and Y with a total of t elements, the output of
merge(X, Y) is a sorted array containing precisely the elements of X and Y. We’ll prove P(t) by weak induction on t.
Base case (t = 0): By inspection of the algorithm, merge(X, Y) = Y, which is the empty array—and indeed the empty
array is sorted (vacuously), and furthermore contains precisely the zero elements of the empty X and Y.
Inductive case (t ≥ 1): We assume the inductive hypothesis P(t − 1), and we must prove P(t). We’ll consider two cases:
(i) |X| = 0 or |Y| = 0; and (ii) neither X nor Y is empty.
Case I: X is empty or Y is empty. In the former case, the algorithm returns Y; in the latter, the algorithm returns X.
The returned array is sorted by assumption and obviously contains precisely those elements of X and Y (since the
other is empty).
Case II: neither X nor Y is empty. By assumption, X and Y are both sorted, which means that X[1] ≤ X[i] for all i
and Y[1] ≤ Y[j] for all j. Thus min(X[1], Y[1]) is the smallest element in either X or Y. Without loss of generality,
suppose that X[1] ≤ Y[1]. (The other case is strictly analogous.) By the inductive hypothesis, merge(X[2 . . . |X|], Y)
returns a sorted array which contains precisely the elements of X and Y aside from X[1]. Thus X[1] followed by
merge(X[2 . . . |X|], Y) is sorted, and contains precisely the elements of X and Y.

5.79 The proof that mergeSort(A[1 . . . n]) sorts its input is by strong induction on n.
For the base case (n = 1), the algorithm returns A, and, indeed, any 1-element array is sorted.
For the inductive case (n ≥ 2), we assume the inductive hypothesis—that mergeSort(B) sorts B for any array B of
n − 1 or fewer elements—and we must prove   is true for any n-element array. Because n ≥ 2, we know
 that the same
that 1 ≤ ⌊n/2⌋ ≤ n − 1. Thus both A[1 . . . 2n ] and A[ 2n + 1 . . . n] contain between 1 and n − 1 elements. Note too
that these subarrays contain all of A but are also disjoint. By the inductive hypothesis, then, L and R are sorted, and they
contain precisely those elements of A. And thus, by Exercise 5.78, mergeSort(A[1 . . . n]) returns an array that’s sorted
and contains precisely the elements of A.
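As a sanity check on these two proofs, the merge and mergeSort pseudocode can be transcribed into Python (an illustrative sketch, not part of the original solutions; the book's arrays are 1-indexed, while Python lists are 0-indexed):

```python
def merge(X, Y):
    # Base cases: if either list is empty, the other is already the answer.
    if not X:
        return Y
    if not Y:
        return X
    # Inductive case: the smaller of the two front elements is the global
    # minimum of the combined input, so it can safely go first (the
    # "without loss of generality" step in the proof of Exercise 5.78).
    if X[0] <= Y[0]:
        return [X[0]] + merge(X[1:], Y)
    else:
        return [Y[0]] + merge(X, Y[1:])

def merge_sort(A):
    # Base case (n = 1): a one-element list is sorted. (Also handle n = 0.)
    if len(A) <= 1:
        return A
    # Inductive case: both halves have between 1 and n - 1 elements, so the
    # strong inductive hypothesis of Exercise 5.79 applies to each of them.
    mid = len(A) // 2
    L = merge_sort(A[:mid])
    R = merge_sort(A[mid:])
    return merge(L, R)
```

Running merge_sort on a few sample arrays confirms the behavior that the induction establishes in general.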

5.80 Here is one such algorithm:

permutations(S):
1 if |S| = 1 then
2 return {⟨x⟩} (where S = {x})
3 else
4 result := ∅
5 for x ∈ S:
6 P := permutations(S − {x})
7 for p ∈ P:
8 add ⟨x, p1 , p2 , . . . , p|p| ⟩ to result
9 return result
94 Mathematical Induction

We claim two facts, both by induction on |S|:


Claim #1: the only elements returned by permutations(S) are permutations of S. (The base case, for |S| = 1,
is immediate; for the inductive case, by the inductive hypothesis only permutations of S − {x} are returned by
permutations(S − {x}), and therefore we know that ⟨x, p1 , . . . , p|p| ⟩ is a permutation of S for every choice of x ∈ S
and every choice of p ∈ permutations(S − {x}).)
Claim #2: every permutation p of S is an element of permutations(S). (The base case, for |S| = 1, is immediate;
for the inductive case, note that ⟨p2 , . . . , p|p| ⟩ is a permutation of S − {p1 } and thus, by the inductive hypothesis, we
know that ⟨p2 , . . . , p|p| ⟩ ∈ permutations(S − {p1 }). Thus when x = p1 , the permutation ⟨x, p2 , . . . , p|p| ⟩ = p is added
to result, and thus returned by permutations(S).)
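The algorithm in this solution can also be rendered as a short Python sketch (an illustration, assuming, as in the two claims, that the elements of S are distinct):

```python
def permutations(S):
    # S is a list of distinct elements standing in for the set S.
    if len(S) == 1:
        return [[S[0]]]                    # the only permutation of {x} is <x>
    result = []
    for x in S:
        rest = [y for y in S if y != x]    # S - {x}
        for p in permutations(rest):       # every permutation of S - {x} ...
            result.append([x] + p)         # ... prefixed by x
    return result
```

For example, permutations([1, 2, 3]) yields all 3! = 6 orderings, matching Claims #1 and #2.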

5.81 This direction is the easier of the two. We have a proof of ∀n ∈ Z≥0 : P(n) by weak induction. To prove the same
fact by strong induction, we can just duplicate the weak-induction proof. The base case is unchanged. The inductive case
was a proof that P(n − 1) ⇒ P(n). To prove that P(n − 1) ∧ φ ⇒ P(n), the same proof suffices—you’ve just added an
extra assumption that you’ll never use. (Here the extra assumption is φ = P(0) ∧ · · · ∧ P(n − 2).)

5.82 This direction is the harder of the two. We have a proof of ∀n ∈ Z≥0 : P(n) by strong induction, which means that
we have proofs of the following two facts:
P(0) (1)
for any n ≥ 1, [P(0) ∧ P(1) ∧ · · · ∧ P(n − 1)] ⇒ P(n) (2)

To prove ∀n ∈ Z≥0 : P(n) using only weak induction, define a new predicate
Q(n) = P(0) ∧ P(1) ∧ · · · ∧ P(n).
≥0
We will prove that ∀n ∈ Z : Q(n) by weak induction on n:
base case (n = 0): For n = 0, in fact Q(0) and P(0) are identical. Therefore the old base case (1) proves Q(0).
inductive case (n ≥ 1): We assume the inductive hypothesis Q(n − 1), and we must prove Q(n).
Note that Q(n) = Q(n − 1) ∧ P(n). By the inductive hypothesis, we already know Q(n − 1), so the only work left to
do is to prove P(n). But we can do so using the old inductive case (2)—in fact, restated using the definition of Q, (2)
simply is a proof of Q(n − 1) ⇒ P(n). Thus we’ve shown that Q(n − 1) ∧ P(n) holds too.
Thus we have proven that ∀n ∈ Z≥0 : Q(n) by weak induction on n. Because P(n) ∧ (anything) ⇒ P(n), we know that
Q(n) ⇒ P(n), for any n. So we can conclude ∀n ∈ Z≥0 : P(n) from ∀n ∈ Z≥0 : Q(n).

5.4 Structural Induction


5.83 Let L be a linked list. Let P(L) be the claim that length(L) returns the number of elements in L. We prove that P(L)
holds for all lists L by induction on the form of L.
L is the empty list. Then there are no elements in L, and length(⟨⟩) indeed returns 0. Thus P(⟨⟩) holds.
L is the linked list ⟨x, L′ ⟩. We assume the inductive hypothesis P(L′ ), and we must prove P(L). The number of elements
in L is precisely one more than the number of elements in L′ . By the inductive hypothesis, the number of elements in
L′ is length(L′ ). By inspection of the algorithm (Line 4), length(L) returns 1 + length(L′ ), precisely as desired. Thus
P(L) holds.

5.84 Let L be a linked list, and let x be arbitrary. Let Q(L) be the claim that contains(L, x) returns True if and only if x is
one of the elements contained in L. We prove that Q(L) holds for all lists L by induction on the form of L.
L is the empty list. Then there are no elements in L, and thus x is not in L. Indeed, contains(⟨⟩, x) returns False.
Thus Q(⟨⟩) holds.
L is the linked list ⟨y, L′ ⟩. We assume the inductive hypothesis Q(L′ ), and we must prove Q(L). The list L contains x
if and only if either the first element y of the list is x, or the remainder L′ of the list contains x. By the inductive
hypothesis, L′ contains x if and only if contains(L′ , x). By inspection of the algorithm (Line 4), contains(L, x) returns
(x = y) ∨ contains(L′ , x), precisely as desired. Thus Q(L) holds.

5.85 Let L be a linked list. Let P(L) be the claim that sum(L) returns the sum of the elements in L. We prove that P(L)
holds for all lists L by induction on the form of L.

L is the empty list. Then there are no elements in L, and their sum is therefore 0—which is precisely what sum(⟨⟩) returns.
Thus P(⟨⟩) holds.
L is the linked list ⟨x, L′ ⟩. We assume the inductive hypothesis P(L′ ), and we must prove P(L). The sum of the elements
in L is precisely x more than the sum of elements in L′ . By the inductive hypothesis, the sum of the elements in L′ is
sum(L′ ). By inspection of the algorithm (Line 4), sum(L) returns x + sum(L′ ), as desired. Thus P(L) holds.
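The three algorithms verified in Exercises 5.83–5.85 can be sketched in Python, using None for the empty list and a pair (x, rest) for ⟨x, L′⟩ (a hypothetical representation; the book states the algorithms over abstract linked lists):

```python
def length(L):
    # Number of elements in the linked list L (Exercise 5.83).
    if L is None:
        return 0
    x, rest = L
    return 1 + length(rest)              # 1 + length(L')

def contains(L, x):
    # True iff x is an element of L (Exercise 5.84).
    if L is None:
        return False
    y, rest = L
    return (x == y) or contains(rest, x)  # (x = y) or contains(L', x)

def sum_list(L):
    # Sum of the elements of L (Exercise 5.85).
    if L is None:
        return 0
    x, rest = L
    return x + sum_list(rest)             # x + sum(L')
```

Each function mirrors the two cases of the structural induction: the empty list, and a list ⟨x, L′⟩.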

5.86 Let ⟨x, L⟩ be a nonempty sorted list. Let P(⟨x, L⟩) be the claim that every element z in L satisfies z ≥ x. We prove
that P(⟨x, L⟩) holds for all elements x and all lists L by induction on the form of L.

L is the empty list. Then there are no elements z of L, and so the claim is vacuously true. Thus P(⟨x, ⟨⟩⟩) holds for any x.
L is the nonempty sorted list ⟨y, L′ ⟩ where x ≤ y. We assume the inductive hypothesis P(⟨y, L′ ⟩), and we must prove
P(⟨x, ⟨y, L′ ⟩⟩). By the inductive hypothesis, every element z′ in L′ satisfies y ≤ z′ . Furthermore x ≤ y itself. Thus, for
any z′ in L′ , we have x ≤ y ≤ z′ . Thus x ≤ z for any z in ⟨y, L′ ⟩. Thus P(⟨x, ⟨y, L′ ⟩⟩) holds.

5.87 By definition, a string of balanced parentheses is (i) the empty string; or (ii) [S] where S is a string of balanced
parentheses; or (iii) a string S1 S2 where S1 and S2 are both strings of balanced parentheses. Let S be a string of balanced
parentheses. We must show that S contains the same number of [s and ]s. Call this property P(S).

S is the empty string. Then S contains 0 [s and 0 ]s, and 0 = 0.


S is [S′ ] for a string S′ of balanced parentheses. We assume the inductive hypothesis P(S′ ) and must prove P(S). Then

number of [s in S = 1 + (number of [s in S′ ) [S′ ] contains one more [ than S′


= 1 + (number of ]s in S′ ) inductive hypothesis
= number of ]s in S. [S′ ] contains one more ] than S′

S is S1 S2 for two strings S1 , S2 of balanced parentheses. We assume the inductive hypotheses P(S1 ) and P(S2 ) and must
prove P(S). Then

number of [s in S = (number of [s in S1 ) + (number of [s in S2 )


= (number of ]s in S1 ) + (number of [s in S2 ) inductive hypothesis
= (number of ]s in S1 ) + (number of ]s in S2 ) inductive hypothesis
= number of ]s in S.

5.88 Let S be a string of balanced parentheses. We must show that any prefix of S contains at least as many [s as ]s.

S is the empty string. The only prefix of S contains zero characters, and thus contains 0 [s and 0 ]s. Indeed 0 ≥ 0.
S is [S′ ] for a string S′ of balanced parentheses. We assume the inductive hypothesis, that any prefix of S′ contains
at least as many [s as ]s. We must prove the same for S. Observe that the empty prefix of S (containing zero characters)
necessarily contains 0 [s and 0 ]s, and 0 ≥ 0. Otherwise, for any i ≥ 1:

number of [s in the first i characters of S
= 1 + (number of [s in the first i − 1 characters of S′ ) the first character of S is [
≥ 1 + (number of ]s in the first i − 1 characters of S′ ). inductive hypothesis
If i < |S|, then (the first i characters of S) = [ followed by (the first i − 1 characters of S′ ), so this last quantity
equals 1 + (number of ]s in the first i characters of S) ≥ number of ]s in the first i characters of S.
If i = |S|, then (the first i characters of S) = all of S = [S′ ], which contains one more ] than S′ , so this last quantity
equals the number of ]s in the first i characters of S.
Either way, the number of [s in the first i characters of S is at least the number of ]s in the first i characters of S.

S is S1 S2 for two strings S1 , S2 of balanced parentheses. We assume the inductive hypotheses, that any prefix of S1 or
S2 contains at least as many [s as ]s. We must prove the same for S. For i ≤ |S1 |, the first i characters of S are precisely
the first i characters of S1 , and the claim follows immediately from the inductive hypothesis. For i ≥ |S1 |:
number of [s in the first i characters of S
= (number of [s in S1 ) + (number of [s in the first i − |S1 | characters of S2 )
≥ (number of ]s in S1 ) + (number of ]s in the first i − |S1 | characters of S2 ) inductive hypothesis (twice)
= number of ]s in the first i characters of S.
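The two invariants proved in Exercises 5.87 and 5.88 (equal counts of [ and ], and the prefix property) can be checked mechanically; here is a small Python sketch:

```python
def equal_counts(s):
    # Exercise 5.87: a balanced string has as many [s as ]s.
    return s.count('[') == s.count(']')

def prefixes_ok(s):
    # Exercise 5.88: every prefix has at least as many [s as ]s.
    depth = 0
    for ch in s:
        depth += 1 if ch == '[' else -1
        if depth < 0:                 # a prefix with more ]s than [s
            return False
    return True

# Some balanced strings built by the inductive definition:
# the empty string, [S], and S1 S2.
balanced = ["", "[]", "[[]]", "[][]", "[[][]][]"]
```

The string "][" fails the prefix property, which is consistent with it not being balanced.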

5.89 Here’s a proof by structural induction on T:


Case I: T = null. Then T contains 0 nodes, and therefore 0 leaves. And indeed countLeaves(null) = 0.
Case II: T has root x, left subtree Tℓ , and right subtree Tr . We assume the inductive hypotheses, namely
number of leaves in Tℓ = countLeaves(Tℓ ) and number of leaves in Tr = countLeaves(Tr ).
Note that x is itself a leaf if and only if Tℓ = Tr = null, by definition; in this case, T contains exactly 1 leaf and
indeed countLeaves(T) = 1. If x is not a leaf, then, the leaves of T are precisely the leaves of Tℓ and Tr , and so
number of leaves in T = (number of leaves in Tℓ ) + (number of leaves in Tr )
= countLeaves(Tℓ ) + countLeaves(Tr ) inductive hypotheses
= countLeaves(T).

5.90 We’ll show the desired claim by structural induction on the form of the nonempty BST T. Because T ̸= null, we
know that T consists of a root x, a left subtree Tℓ , and a right subtree Tr .
Case I: Tℓ = null. Because T is a BST, by definition all nodes in Tr are greater than x. (That’s the BST property.) Because
Tℓ = null, there are no other nodes in T. Thus x is the smallest value in T.
Case II: Tℓ = ⟨x′ , T′ℓ , T′r ⟩. We assume the inductive hypothesis, namely that the bottommost leftmost node u∗ of Tℓ is the
minimum element of Tℓ . That is, every other node of Tℓ is greater than u∗ . Because T is a BST, by definition all nodes
in Tℓ are less than x, and all nodes in Tr are greater than x. Thus every node in Tℓ is less than x, and in particular u∗ < x.
Thus u∗ is less than every other node in Tℓ , less than x, and less than every node in Tr . Thus u∗ is the smallest value in T.
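A Python sketch of countLeaves (Exercise 5.89) and of the bottommost-leftmost-node claim (Exercise 5.90), with trees encoded as nested (value, left, right) tuples and None for the empty tree (a hypothetical encoding):

```python
def count_leaves(T):
    # Case I: the empty tree has no leaves.
    if T is None:
        return 0
    x, left, right = T
    # x is itself a leaf iff both subtrees are empty.
    if left is None and right is None:
        return 1
    return count_leaves(left) + count_leaves(right)

def bottommost_leftmost(T):
    # In a nonempty BST, walking left as far as possible reaches the
    # minimum element (the claim of Exercise 5.90).
    x, left, right = T
    return x if left is None else bottommost_leftmost(left)

# A small BST:        4
#                   2   6
#                  1 3 5 7
bst = (4, (2, (1, None, None), (3, None, None)),
          (6, (5, None, None), (7, None, None)))
```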

5.91 A heap is either:


• an empty tree, denoted by null; or
• a root node x, a left subtree Tℓ and a right subtree Tr , where Tℓ and Tr are both heaps, and where the root node of Tℓ
(if it exists) and the root node of Tr (if it exists) are both no more than x.

5.92 We’ll show the desired claim by structural induction on the form of the heap T.
Case I: T = null. We’re done, vacuously: T is null, so it’s empty.
Case II: T = ⟨x, Tℓ , Tr ⟩. We assume the inductive hypotheses, namely that Tℓ is empty or its largest element is its root
(call that root xℓ ) and that Tr is empty or its largest element is its root (call that root xr ). By the heap property, we can
conclude that x is larger than every node in either Tℓ or Tr :
First, for the left subtree: either Tℓ is empty (in which case x is larger than every node in Tℓ vacuously), or x ≥ xℓ and
xℓ ≥ y for every node y in Tℓ by the inductive hypothesis—in which case x ≥ xℓ ≥ y, so x ≥ y for any node y in Tℓ .
Second, for the right subtree: either Tr is empty, or x ≥ xr by the heap property and xr ≥ y for every node y in Tr by
the inductive hypothesis. In either case, x ≥ y for any node y in Tr .

5.93 We’ll show the desired claim by structural induction on the form of the heap T.
Case I: T = null. We’re done, vacuously: T is null, so it’s empty.
Case II: T = ⟨x, Tℓ , Tr ⟩. We assume the inductive hypotheses, namely that Tℓ is empty or it contains an element uℓ such
that uℓ ≤ y for any y ∈ Tℓ ; and that Tr is empty or it contains an element ur such that ur ≤ y for any y ∈ Tr .
Case IIA: both subtrees are empty. If Tℓ = Tr = null, then in fact x is the only element of T—so it’s both a leaf, and
as small as every element of T (vacuously).

Case IIB: one subtree is empty. If precisely one of Tℓ and Tr is not null, then, without loss of generality, let Tℓ be the
nonnull subtree. Let xℓ be the root of Tℓ . By the inductive hypothesis, Tℓ contains an element uℓ such that uℓ ≤ y
for any y ∈ Tℓ . In particular, uℓ ≤ xℓ . Furthermore xℓ ≤ x by the heap property. Thus uℓ is no larger than any
element of Tℓ ∪ {x}—and, because Tr = null, thus uℓ is no larger than any element of T. Because uℓ is a leaf of
Tℓ , we have that uℓ is a leaf of T too.
Case IIC: neither subtree is empty. Otherwise, both Tℓ and Tr are not null. Let xℓ and xr be the roots of Tℓ and Tr .
By the inductive hypothesis, Tℓ and Tr contain element uℓ and ur such that uℓ ≤ y for any y ∈ Tℓ and ur ≤ y
for any y ∈ Tr . In particular, uℓ ≤ xℓ and ur ≤ xr . Furthermore xℓ ≤ x and xr ≤ x by the heap property. Let
u = min(uℓ , ur ). Without loss of generality, let u = uℓ . Then u is no larger than any element of T: we have u ≤ y
for y ∈ Tℓ by the inductive hypothesis; u ≤ xℓ ≤ x by the heap property; and u ≤ ur ≤ y for y ∈ Tr by our
definition of u. Because uℓ is a leaf of Tℓ , we have that uℓ is a leaf of T too.

5.94 We’ll show the desired claim by structural induction on the form of the 2–3 tree T.
Case I: T = null. Then, by definition, T contains 1 leaf, and the height of T is 0. The property holds because 2^0 ≤ 1.
Case II: T has two subtrees, L and R. Let h be the height of T. By definition, L and R are both 2–3 trees of height h − 1.
The inductive hypotheses state that
2h−1 ≤ number of leaves in L and 2h−1 ≤ number of leaves in R.
The number of leaves in T is precisely the sum of the number of leaves in L and in R, so we have
number of leaves in T = number of leaves in L + number of leaves in R
≥ 2h−1 + 2h−1 inductive hypotheses (twice)
= 2h .
Case III: T has three subtrees, L and C and R. Let h be the height of T. By definition, L and C and R are all 2–3 trees of
height h − 1. The inductive hypotheses state that
2h−1 ≤ number of leaves in L and 2h−1 ≤ number of leaves in C and 2h−1 ≤ number of leaves in R.
The number of leaves in T is precisely the sum of the number of leaves in L and in C and in R, so we have
number of leaves in T = number of leaves in L + number of leaves in C + number of leaves in R
≥ 2h−1 + 2h−1 + 2h−1 inductive hypotheses (three times)
= 3 · 2h−1
> 2 · 2h−1
= 2h .

5.95 We’ll show the desired claim by structural induction on the form of the 2–3 tree T.
Case I: T = null. Then, by definition, T contains 1 leaf, and the height of T is 0. The property holds because 1 ≤ 3^0 .
Case II: T has two subtrees, L and R. Let h be the height of T. By definition, L and R are both 2–3 trees of height h − 1.
The inductive hypotheses state that
number of leaves in L ≤ 3h−1 and number of leaves in R ≤ 3h−1 .
The number of leaves in T is precisely the sum of the number of leaves in L and in R, so we have
number of leaves in T = number of leaves in L + number of leaves in R
≤ 3h−1 + 3h−1 inductive hypotheses (twice)
= 2 · 3h−1
< 3 · 3h−1
= 3h .
Case III: T has three subtrees, L and C and R. Let h be the height of T. By definition, L and C and R are all 2–3 trees of
height h − 1. The inductive hypotheses state that
number of leaves in L ≤ 3h−1 and number of leaves in C ≤ 3h−1 and number of leaves in R ≤ 3h−1 .

The number of leaves in T is precisely the sum of the number of leaves in L and in C and in R, so we have
number of leaves in T = number of leaves in L + number of leaves in C + number of leaves in R
≤ 3h−1 + 3h−1 + 3h−1 inductive hypotheses (three times)
= 3h .

5.96 Formally, a 2–3–4 tree of height h is one of the following:


1. a single node (in which case h = 0, and the node is called a leaf ); or
2. a node with k subtrees, all k which are 2–3–4 trees of height h − 1, for some k ∈ {2, 3, 4}.
We will prove by structural induction that a 2–3–4 tree T of height h has at least 2h leaves and at most 4h leaves.
Case I: T = null. Then, by definition, T contains 1 leaf, and the height of T is 0. The property holds because 2^0 ≤ 1 ≤ 4^0 .
Case II: T has k subtrees, T1 , . . . , Tk for k ∈ {2, 3, 4}. Let h be the height of T. By definition, each Ti is a 2–3–4 tree of
height h − 1. The inductive hypotheses state that

2h−1 ≤ number of leaves in Ti ≤ 4h−1 .


The number of leaves in T is precisely the sum of the number of leaves in T1 , . . . , Tk , so we have
number of leaves in T = (number of leaves in T1 ) + · · · + (number of leaves in Tk )
≥ k · 2h−1 ≥ 2 · 2h−1 = 2h inductive hypotheses, and k ≥ 2
and
number of leaves in T = (number of leaves in T1 ) + · · · + (number of leaves in Tk )
≤ k · 4h−1 ≤ 4 · 4h−1 = 4h , inductive hypotheses, and k ≤ 4
as desired.
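The leaf-count bounds of Exercises 5.94–5.96 can be checked on concrete trees; in this sketch, an internal node is a tuple of 2, 3, or 4 subtrees and a leaf is the string "leaf" (a hypothetical encoding, relying on the fact that all subtrees of a node share the same height):

```python
def height(T):
    # All subtrees of a 2-3 (or 2-3-4) tree node have equal height,
    # so it suffices to follow the first child.
    return 0 if T == "leaf" else 1 + height(T[0])

def leaves(T):
    # Total number of leaves in the tree.
    return 1 if T == "leaf" else sum(leaves(child) for child in T)

# A height-2 2-3 tree: a 3-node root over two 2-nodes and one 3-node.
t23 = (("leaf", "leaf"), ("leaf", "leaf", "leaf"), ("leaf", "leaf"))
```

With 7 leaves at height 2, this tree sits between the bounds 2^2 = 4 and 3^2 = 9 proved above.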

5.97 Let P(b) denote the following property: for all even numbers a, the quantity a + b is an even number. We proceed by
(structural) induction on b.
For the base case (b = 0), we have a + b = a + 0 = a by Additive Identity. Because a is even, we immediately have
that a + b is even.
For the inductive case (b = 2 + k), we assume the inductive hypothesis P(k): for all even numbers c, the quantity
c + k is an even number. We must prove P(b): for all even numbers a, the quantity a + b is an even number. Let a be an
arbitrary even number. Then:
a + b = a + (2 + k) definition of b
= (a + 2) + k associativity
= (2 + a) + k. commutativity

Note that 2 + a is an even number, by clause (ii) of the definition of even numbers. By the inductive hypothesis, the sum
of any even number and k is even—so, in particular, (2 + a) + k is an even number. The claim follows.

5.98 Let P(b) denote the following property: for all powers of two a, the quantity a · b is a power of two. We proceed by
(structural) induction on b.
For the base case (b = 1), we have a · b = a · 1 = a by Multiplicative Identity. Because a is a power of two, we
immediately have that a · b is a power of two. (After all, a is a power of two, and a · b = a.)
For the inductive case (b = 2 · k), we assume the inductive hypothesis P(k): for all powers of two c, the quantity c · k
is a power of two. We must prove P(b): for all powers of two a, the quantity a · b is a power of two. Let a be an arbitrary
power of two. Then:
a · b = a · (2 · k) definition of b
= (a · 2) · k associativity
= (2 · a) · k. commutativity

Note that 2 · a is a power of two, by clause (ii) of the definition of powers of two. By the inductive hypothesis, the product
of any power of two and k is a power of two—so, in particular, (2 · a) · k is a power of two. The claim follows.

5.99 Let P(k) denote the claim that the sum of any k even numbers is even. We’ll prove that P(k) holds for all integers
k ≥ 0 by weak induction on k.
Base case (k = 0): By definition of an empty sum, the sum of zero even numbers is 0, which is an even number by
clause (i) of the definition of an even number (given in Exercise 5.97).
Inductive case (k ≥ 1): We assume the inductive hypothesis P(k − 1), and we must prove P(k). Let a1 , . . . , ak be even. Then
a1 + a2 + · · · + ak = [a1 + · · · + ak−1 ] + ak definition of summation
= [an even number] + ak inductive hypothesis
= an even number. Exercise 5.97

5.100 Let P(n) be the property that, for all m ≥ 0, we have b^m · b^n = b^(m+n). We’ll prove that P(n) holds for all integers
n ≥ 0 by induction on n.
base case (n = 0): We must prove P(0). Let m be arbitrary. Then
b^m · b^0 = b^m · 1 definition of exponentiation
= b^m multiplicative identity
= b^(m+0). additive identity
inductive case (n = k + 1): We assume the inductive hypothesis P(k); we must prove P(n). Let m be arbitrary. Then
b^m · b^(k+1) = b^m · (b · b^k) definition of exponentiation
= b^m · (b^k · b) commutativity of multiplication
= (b^m · b^k) · b associativity of multiplication
= b^(m+k) · b inductive hypothesis
= b · b^(m+k) commutativity of multiplication
= b^((m+k)+1) definition of exponentiation
= b^(m+(k+1)). associativity of addition

5.101 Let P(n) be the property that, for all m ≥ 0, we have (b^m)^n = b^(mn). We’ll prove that P(n) holds for all integers n ≥ 0
by induction on n.
base case (n = 0): We must prove P(0). Let m be arbitrary. Then
(b^m)^0 = 1 definition of exponentiation
= b^0 definition of exponentiation
= b^(m·0). multiplicative zero
inductive case (n = k + 1): We assume the inductive hypothesis P(k); we must prove P(n). Let m be arbitrary. Then
(b^m)^(k+1) = (b^m) · (b^m)^k definition of exponentiation
= b^m · b^(mk) inductive hypothesis
= b^(m+mk) Exercise 5.100
= b^(m(1+k)) factoring
= b^(m(k+1)). commutativity of addition

5.102 Suppose that we are given a wff φ. First, we apply the transformation of Example 5.21 to yield a wff φ that contains
only the connectives ¬ and ∧. Now, we need to show that we can translate any such wff φ into a logically equivalent
proposition ψ that uses only |. Here’s a proof by structural induction:
φ is a variable: Then φ uses no connectives, and we just set ψ = φ.

φ is a negation, say φ = ¬τ : By the inductive hypothesis, there is a proposition χ containing only the logical connective
| such that τ and χ are logically equivalent. Observe from the truth table of | that p | p and ¬p are logically equivalent.
Therefore φ = ¬τ is logically equivalent to χ | χ, and we can take ψ = χ | χ.
φ is a conjunction, say φ = τ ∧ τ ′ : By the inductive hypothesis, there are propositions χ and χ′ containing only | such
that τ and χ, and τ ′ and χ′ , are logically equivalent. Observe from the truth table of | that p∧q ≡ ¬(p|q) ≡ (p|q)|(p|q).
Therefore ψ = (χ | χ′ ) | (χ | χ′ ) is logically equivalent to τ ∧ τ ′ .
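The translation described in this proof can be checked by brute force over truth assignments. Here is a Python sketch (separate from the ML solutions of the neighboring exercises, and using a hypothetical tuple encoding of wffs with only ¬, ∧, and |, written "not", "and", "nand"):

```python
def to_nand(phi):
    # Translate a {not, and}-only wff into a nand-only wff,
    # following the two equivalences used in the proof.
    if isinstance(phi, str):                       # a variable
        return phi
    if phi[0] == "not":                            # not t  ==  chi nand chi
        chi = to_nand(phi[1])
        return ("nand", chi, chi)
    # t1 and t2  ==  (chi1 nand chi2) nand (chi1 nand chi2)
    c1, c2 = to_nand(phi[1]), to_nand(phi[2])
    inner = ("nand", c1, c2)
    return ("nand", inner, inner)

def evaluate(phi, env):
    # Evaluate a wff under a truth assignment env.
    if isinstance(phi, str):
        return env[phi]
    if phi[0] == "not":
        return not evaluate(phi[1], env)
    if phi[0] == "and":
        return evaluate(phi[1], env) and evaluate(phi[2], env)
    return not (evaluate(phi[1], env) and evaluate(phi[2], env))  # nand
```

Comparing truth tables of a formula and its translation confirms logical equivalence on sample inputs.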

5.103 Here is the ML code for |:


1 datatype wff = Variable of string
2 | Not of wff
3 | And of (wff * wff)
4 | Or of (wff * wff)
5 | Implies of (wff * wff)
6 | Stroke of (wff * wff)
7 | Arrow of (wff * wff);
8
9 fun stroke (Variable var) = Variable var
10 | stroke (Not tau) = Stroke(stroke tau, stroke tau)
11 | stroke (And (tau1,tau2)) = Stroke(Stroke(stroke tau1, stroke tau2),
12 Stroke(stroke tau1, stroke tau2))
13 | stroke (Or (tau1,tau2)) = stroke (Not(And(Not(tau1), Not(tau2))))
14 | stroke (Implies (tau1,tau2)) = stroke (Or(Not(tau1), tau2))
15 | stroke (Stroke (tau1, tau2)) = Stroke(stroke tau1, stroke tau2)
16 | stroke (Arrow (tau1, tau2)) = stroke (Not(Or(tau1,tau2)));

5.104 Suppose that we are given a wff φ. First, we apply the transformation of Example 5.21 to yield a wff φ that contains
only the connectives ¬ and ∧. Now we must prove that we can translate any such wff φ into a logically equivalent
proposition ψ that uses only ↓. Here’s a proof by structural induction:

φ is a variable: Then φ uses no connectives, and we just set ψ = φ.


φ is a negation, say φ = ¬τ : By the inductive hypothesis, there is a proposition χ containing only the logical connective
↓ such that τ and χ are logically equivalent. Observe from the truth table of ↓ that p ↓ p and ¬p are logically equivalent.
Therefore φ = ¬τ is logically equivalent to χ ↓ χ, and we can take ψ = χ ↓ χ.
φ is a conjunction, say φ = τ ∧ τ ′ : By the inductive hypothesis, there are propositions χ and χ′ containing only ↓ such
that τ and χ, and τ ′ and χ′ , are logically equivalent. Observe from the truth table of ↓ that ¬p∧¬q ≡ ¬(p∨q) ≡ p ↓ q.
Therefore φ = τ ∧ τ ′ is logically equivalent to (¬τ ) ↓ (¬τ ′ ), and thus we can take ψ = (χ ↓ χ) ↓ (χ′ ↓ χ′ ).

5.105 Here is the ML code for ↓:


1 datatype wff = Variable of string
2 | Not of wff
3 | And of (wff * wff)
4 | Or of (wff * wff)
5 | Implies of (wff * wff)
6 | Stroke of (wff * wff)
7 | Arrow of (wff * wff);
8
9 fun arrow (Variable var) = Variable var
10 | arrow (Not tau) = Arrow(arrow tau, arrow tau)
11 | arrow (And (tau1,tau2)) = Arrow(Arrow(arrow tau1, arrow tau1),
12 Arrow(arrow tau2, arrow tau2))
13 | arrow (Or (tau1,tau2)) = arrow (Not(And(Not(tau1), Not(tau2))))
14 | arrow (Implies (tau1,tau2)) = arrow (Or(Not(tau1), tau2))
15 | arrow (Stroke (tau1, tau2)) = arrow (Not(And(tau1,tau2)))
16 | arrow (Arrow (tau1, tau2)) = Arrow(arrow tau1, arrow tau2);

5.106 We are given a wff φ that uses only ∧ and ∨, and we must show that φ is truth-preserving. We proceed by structural
induction on the form of φ.

φ = x is a variable: Then, under the all-true truth assignment, the variable x is set to true, and thus φ = x is true.

φ = X ∨ Y is a disjunction: By the inductive hypothesis, both X and Y are truth-preserving; that is, both are true under
the all-true truth assignment. Because True ∨ True is true, φ is also true under that assignment.
φ = X ∧ Y is a conjunction: By the inductive hypothesis, both X and Y are truth-preserving; that is, both are true under
the all-true truth assignment. Because True ∧ True is true, φ is also true under that assignment.
To give a fully rigorous solution to Exercise 4.71 from this proof that a {∧, ∨}-only proposition is truth-preserving, we
just need to observe that there exist non–truth-preserving propositions, such as p ∧ ¬p.
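This induction can be spot-checked with a tiny evaluator that fixes every variable to True (a sketch using a hypothetical tuple encoding of {∧, ∨}-formulas):

```python
def eval_all_true(phi):
    # Evaluate a formula of variables, "and", and "or" under the
    # all-true truth assignment.
    if isinstance(phi, str):          # a variable is set to True
        return True
    op, X, Y = phi
    if op == "or":
        return eval_all_true(X) or eval_all_true(Y)
    return eval_all_true(X) and eval_all_true(Y)   # op == "and"
```

By the structural induction above, this function returns True on every such formula.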

5.107 A palindromic bitstring is


• the empty string (containing no bits at all);
• the string 0;
• the string 1;
• the string 0x0 where x is a palindromic bitstring; or
• the string 1x1 where x is a palindromic bitstring.

5.108 For a bitstring s, let P(s) denote the claim that [#0(s)] · [#1(s)] is even. We’ll prove that P(s) holds for all bitstrings
s by structural induction on the form of s.
s = the empty string: Then [#0(s)] · [#1(s)] = 0 · 0 = 0, which is even.
s = 0 or s = 1: Then [#0(s)] · [#1(s)] is 1 · 0 = 0 or 0 · 1 = 0, which is even.
s = 0x0 for a palindrome x: We assume the inductive hypothesis P(x). To show P(0x0), observe that
[#0(s)] · [#1(s)] = [2 + #0(x)] · [#1(x)]
= 2 · #1(x) + [#0(x)] · [#1(x)].
Because 2 · #1(x) is even (it’s a multiple of two) and [#0(x)] · [#1(x)] is even by the inductive hypothesis, and the
sum of two even numbers is even, P(0x0) follows.
s = 1x1 for a palindrome x: We assume the inductive hypothesis P(x). To show P(1x1), as in the previous case, observe
that [#0(s)] · [#1(s)] = [#0(x)] · [2 + #1(x)] = 2 · #0(x) + [#0(x)] · [#1(x)]. As in the previous case, both terms
are even, and P(1x1) follows.
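A brute-force check of this claim over all short bitstrings (an illustrative sketch):

```python
def bitstrings(n):
    # All bitstrings of length n, built recursively.
    if n == 0:
        return ['']
    return [b + s for b in '01' for s in bitstrings(n - 1)]

def is_palindrome(s):
    return s == s[::-1]

def claim_holds(s):
    # Exercise 5.108: #0(s) * #1(s) is even.
    return (s.count('0') * s.count('1')) % 2 == 0
```

Checking every palindrome of length at most 10 agrees with the structural induction.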
6 Analysis of Algorithms

6.2 Asymptotics
6.1 The question asks: for what values of n is n ≤ 100 log n? There are more systematic ways of solving the equation, but
running a version of binary search by hand works just fine. For n = 996, we have 100 log n = 996.0001 · · · ; for n = 997,
we have 100 log n = 996.1449 · · · . It shouldn’t be too hard to convince yourself that once the functions cross, Binary
Search stays faster for larger n. So Linear Search is faster for n ∈ {2, 3, . . . , 996}.
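The hand-run binary search in this solution can be replaced by a few lines of Python (a sketch; log is base 2 throughout, as in the text):

```python
import math

# Largest n for which linear search (n steps) is no slower than
# binary search (100 log n steps): the largest n with n <= 100 log2(n).
crossover = max(n for n in range(2, 2000) if n <= 100 * math.log2(n))
```

After the crossover, n grows faster than 100 log n, so binary search stays ahead for all larger n.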

6.2 Because Alice can sort 1000 elements in a minute, the machine can perform ⌈8 · 1000 log 1000⌉ = 79,727 steps in a
minute, and cannot perform ⌈8 · 1001 log 1001⌉ = 79,818 steps in a minute.
The largest n such that ⌈5n log n⌉ ≤ 79,727 is 1509; the smallest n such that ⌈5n log n⌉ ≥ 79,818 is 1512. So Bob can
sort somewhere between 1509 and 1511 elements in a minute.
The largest integer n such that 2n2 ≤ 79,727 is 199; the smallest integer n such that 2n2 ≥ 79,818 is 200. So Charlie
can sort 199 elements in a minute.
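The arithmetic in this solution can be reproduced with a short Python sketch (assuming, as in the exercise, that Alice, Bob, and Charlie run in 8n log n, 5n log n, and 2n² steps respectively):

```python
import math

def steps(f, n):
    # Whole-step count, rounding up as in the solution.
    return math.ceil(f(n))

def alice(n):
    return 8 * n * math.log2(n)

def bob(n):
    return 5 * n * math.log2(n)

def charlie(n):
    return 2 * n * n

within_budget = steps(alice, 1000)   # 79,727: doable in a minute
over_budget = steps(alice, 1001)     # 79,818: not doable in a minute
```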

6.3 The question asks: for what values of n is 2n2 ≤ ⌈8n log n⌉? It turns out that the values are equal when n = 16: for
n = 16, we have 2n2 = 512 and 8n log n = 8 · 16 · log 16 = 8 · 16 · 4 = 512. Therefore Charlie can sort faster for
n ≤ 15, Alice is faster for n ≥ 17, and they’re tied for n = 16.

6.4 Charlie’s twice-as-fast computer means that his new machine can do two steps for every single step on the old machine;
equivalently, we can think of his new machine as allowing him to run his algorithm in n2 steps on the old machine instead
of 2n2 steps.
Thus the question asks: for what n is n2 ≤ ⌈8n log n⌉? The break-even point is between 43 and 44: Charlie wins on
n = 43 (432 = 1849 steps vs. ⌈8 · 43 log 43⌉ = 1867) but loses on n = 44 (442 = 1936 steps vs. ⌈8 · 44 log 44⌉ = 1922).
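The break-even points in Exercises 6.3 and 6.4 can be confirmed numerically (an illustrative sketch):

```python
import math

def merge_sort_steps(n):
    # Alice's merge sort: ceil(8 n log2 n) steps.
    return math.ceil(8 * n * math.log2(n))
```

At n = 16 the two sides tie at 512 steps; with the doubled machine, Charlie's n² beats ⌈8n log n⌉ up through n = 43 and loses from n = 44 on.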

6.5 Choose c = 12 and n0 = 1. Then, for any n ≥ n0 :


f(n) = 9n + 3
≤ 9n + 3n n ≥ 1 implies that 3n ≥ 3
= 12n.
By definition of O(·), the claim follows.

6.6 As in the previous exercise, choose c = 12 and n0 = 1. Then, for any n ≥ n0 :


f(n) ≤ 12n by the previous exercise

≤ 12n2 . n ≥ 1 implies that 12n ≤ 12n2

6.7 For any n ≥ 1:


f(n) = 9n + 3
≤ 12n3 n ≥ 1 implies that 9n ≤ 9n3 and 3 ≤ 3n3
= 6 · (3n3 − n3 )
≤ 6 · (3n3 − n2 ) n ≥ 1 implies that −n3 ≤ −n2
= 6 · g(n).
Thus choosing c = 6 and n0 = 1 suffices.


6.8 For any n ≥ 1, observe that n2 ≥ 0, and thus g(n) = 3n3 − n2 ≤ 3n3 . Thus choosing c = 3 and n0 = 1 suffices.

6.9 For any n ≥ 1, observe that n3 ≤ n4 , and thus g(n) ≤ 3n4 − n2 . As in the previous exercise, because n2 ≥ 0 we have
3n4 − n2 ≤ 3n4 . Thus g(n) ≤ 3n4 . Thus choosing c = 3 and n0 = 1 suffices.
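The witnesses chosen in Exercises 6.5–6.9 can be sanity-checked numerically; such a check does not prove the asymptotic claims, but it does confirm the inequalities over a range of inputs (sketch):

```python
def f(n):
    return 9 * n + 3

def g(n):
    return 3 * n**3 - n**2

# Check each chosen witness c (with n0 = 1) on n = 1, ..., 999.
ok_65 = all(f(n) <= 12 * n    for n in range(1, 1000))   # 6.5: f = O(n), c = 12
ok_66 = all(f(n) <= 12 * n**2 for n in range(1, 1000))   # 6.6: f = O(n^2)
ok_67 = all(f(n) <= 6 * g(n)  for n in range(1, 1000))   # 6.7: f = O(g), c = 6
ok_68 = all(g(n) <= 3 * n**3  for n in range(1, 1000))   # 6.8: g = O(n^3), c = 3
ok_69 = all(g(n) <= 3 * n**4  for n in range(1, 1000))   # 6.9: g = O(n^4), c = 3
```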

6.10 We’ll give a proof by contradiction. Let n0 and c be arbitrary. Let n = max(n0 , c + 1). Then n > c by definition, so,
multiplying n > c by n2 yields n3 > cn2 . Furthermore, we have that n3 > n2 because n > 1. Thus we have

g(n) = 3n3 − n2 > 2n3 > 2cn2 .

Thus we have identified a value n ≥ n0 such that g(n) > cn2 .


Because n0 and c were generic, we’ve shown that there are no constants meeting the definition of O(·). Thus g(n) is
not O(n2 ).

6.11 We’ll give a proof by contradiction. Let n0 and c be arbitrary. The key is to determine an n > n0 such that n3 > cn3−ϵ .
To do so, let’s solve for n by taking logs of both sides:

n3 = cn3−ϵ
log n3 = log cn3−ϵ
3 log n = log c + (3 − ϵ) log n
ϵ log n = log c
log n = (log c)/ϵ
n = 2(log c)/ϵ .

Let n = max(n0 , 2(log c)/ϵ + 1). As before, we have that g(n) ≥ 2n3 because n2 < n3 . Thus we have

g(n) = 3n3 − n2 > 2n3 > 2cn3−ϵ .

Thus we have identified a value n ≥ n0 such that g(n) > cn3−ϵ .


Because n0 and c were generic, we’ve shown that there are no constants meeting the definition of O(·). Thus g(n) is
not O(n3−ϵ ).

6.12 Note that n ≥ 7 means that n2 ≥ 7n = a(n). Thus we can choose n0 = 7 and c = 1.

6.13 Because sin n ≤ 1 for any n, we have that b(n) ≤ 3n2 + 1. For n ≥ 1, obviously 1 ≤ n2 , so b(n) ≤ 4n2 . Thus we
can choose n0 = 1 and c = 4.

6.14 Because c(n) = 128 ≤ 128n2 for any n ≥ 1, we can choose n0 = 1 and c = 128.

6.15 Suppose that f(n) ≤ c∗ · g(n) for all n ≥ n∗0 . Choosing any larger value of c or any larger value of n0 still satisfies
the definition! So any c ≥ c∗ and any n0 ≥ n∗0 are fine, and there are infinitely many such choices.

6.16 Let f(n) = 3n and let g(n) = 2n. It’s clear that f = O(g): choose the constants n0 = 1 and c = 1.5. But f ̸= P(g):
for every n ≥ 1, we have f(n) > g(n).

6.17 Let f(n) = n and let g(n) = n − 1. It’s clear that f = O(g): choose the constants n0 = 2 and c = 2: for n ≥ 2 we
have n − 2 ≥ 0, and thus f(n) = n ≤ 2n − 2 = 2g(n). But f ̸= Q(g): we have f(1) = 1 and g(1) = 0,
and there is no constant c > 0 such that 1 ≤ c · 0.

6.18 By definition, there exist constants n1 , c1 , n2 , and c2 such that

∀n ≥ n1 : f(n) ≤ c1 · g(n) (1)


∀n ≥ n2 : g(n) ≤ c2 · h(n). (2)

Let n0 = max(n1 , n2 ) and let c = c1 · c2 . Then, from (1) and (2), we have that, for any n ≥ n0 ,
f(n) ≤ c1 · g(n)
≤ c1 · (c2 · h(n))
= c · h(n).
Thus by definition f(n) = O(h(n)).

6.19 By definition, there exist constants n1 , c1 , n2 , and c2 such that


∀n ≥ n1 : f(n) ≤ c1 · h1 (n) (1)
∀n ≥ n2 : g(n) ≤ c2 · h2 (n). (2)
Let n0 = max(n1 , n2 ) and let c = max(c1 , c2 ). Then, from (1) and (2), we have that, for any n ≥ n0 ,
f(n) + g(n) ≤ c1 · h1 (n) + c2 · h2 (n)
≤ c · h1 (n) + c · h2 (n)
= c · [h1 (n) + h2 (n)].

6.20 By definition, there exist constants n1 , c1 , n2 , and c2 such that


∀n ≥ n1 : f(n) ≤ c1 · h1 (n) (1)
∀n ≥ n2 : g(n) ≤ c2 · h2 (n). (2)
Let n0 = max(n1 , n2 ) and let c = c1 · c2 . Then, from (1) and (2), we have that, for any n ≥ n0 ,
f(n) · g(n) ≤ [c1 · h1 (n)] · [c2 · h2 (n)]
= c1 c2 · [h1 (n) · h2 (n)]
= c · [h1 (n) · h2 (n)].

6.21 Observe that n ≥ 1 implies that, for any i ≤ k, we have ni ≤ nk . Thus


" k #
Xk Xk X
p(n) = ai n ≤
i
ai n = n ·
k k
ai .
i=0 i=0 i=0
Pk
Therefore choosing n0 = 1 and c = i=0 ai meets the definition of p(n) = O(nk ).

6.22 We’ll give a proof by contradiction. Let n0 and c be arbitrary. The key is to determine a value of n > n0 such that
p(n) > cn^{k−ϵ}. Observe that

a_k n^k > cn^{k−ϵ}  ⇔  n^ϵ > c/a_k                      dividing by n^{k−ϵ} and rearranging
                    ⇔  ϵ log n > log c − log a_k         taking log of both sides
                    ⇔  n > 2^{(log c − log a_k)/ϵ}.      rearranging and raising 2 to each side

Let n = max(n0, 2^{(log c − log a_k)/ϵ} + 1). Thus for this n we have a_k n^k > cn^{k−ϵ}, and so

p(n) = ∑_{i=0}^{k} a_i nⁱ ≥ a_k n^k > cn^{k−ϵ}.

Thus we have identified a value n ≥ n0 such that p(n) > cn^{k−ϵ}. Because n0 and c were generic, we’ve shown that there
are no constants meeting the definition of O(·). Thus p(n) is not O(n^{k−ϵ}).

6.23 Let P(k) denote the property that f(n) = log^k(n) satisfies f(n) = O(n^ϵ) for any ϵ > 0. We’ll prove that P(k) holds
for all k ≥ 0 and all ϵ > 0 by induction on k.
For the base case k = 0, choosing n0 = 1 and c = 1 suffices: for any n ≥ 1, we have log^0 n = 1, and n^ϵ ≥ 1 = c for
any ϵ > 0.
For the inductive case k ≥ 1, we assume the inductive hypothesis P(k − 1). Let ϵ > 0 be arbitrary. We’ll apply the
inductive hypothesis for ϵ/2: specifically, there exist n′0 and c′ such that log^{k−1} n ≤ c′ · n^{ϵ/2} for any n ≥ n′0. By Lemma 6.6,
there exist n″0 and c″ such that log n ≤ c″ · n^{ϵ/2} for any n ≥ n″0. We choose n0 = max(n′0, n″0) and c = c′ · c″. Let n ≥ n0
be arbitrary. Then

log^k n = (log n) · (log^{k−1} n)          definition of exponentiation
        ≤ (c″ · n^{ϵ/2}) · (c′ · n^{ϵ/2})  definition of c′, c″ above
        = c′ · c″ · n^{ϵ/2} · n^{ϵ/2}
        = cn^ϵ.

6.24 Let P(n) denote the property that log n ≤ n. We’ll prove that P(n) holds for all n ≥ 1 by strong induction on n.
For the base case (n = 1), observe that log 1 = 0, and 0 ≤ 1. Thus P(1) holds.
For the inductive case (n ≥ 2), we assume the inductive hypothesis P(1), P(2), . . . , P(n − 1). We must prove P(n).
Observe that n ≥ 2 implies that ⌈n/2⌉ satisfies 1 ≤ ⌈n/2⌉ ≤ n − 1. Thus we have

log n = log(n/2 · 2) = log(n/2) + log 2 = log(n/2) + 1    properties of logs
      ≤ log⌈n/2⌉ + 1                                       logs are monotonic
      ≤ ⌈n/2⌉ + 1                                          inductive hypothesis
      ≤ [n − 1] + 1                                        above argument
      = n.

Thus P(n) follows.

6.25 By assumption, there exist n0 and c such that f(n) ≤ cg(n) for every n ≥ n0. Let n0* = max(n0, c). Observe that, for
any n ≥ n0*, we have

ℓ(n) = log[f(n)]
     ≤ log[cg(n)]
     = log c + log(g(n))
     ≤ log n + log(g(n))         n ≥ n0* and n0* ≥ c
     ≤ log(g(n)) + log(g(n))     g(n) ≥ n by assumption
     = 2 log(g(n)).

By definition of O(·), then, with constant values 2 and n0*, we have that ℓ(n) = O(log(g(n))).

6.26 Let f(n) = n and let g(n) = n². We’ve previously established that n² ̸= O(n). But note that log(f(n)) = log n
and log(g(n)) = log n² = 2 log n = 2 log(f(n)). Because these functions differ by only a constant factor of 2, we have that
log n² = O(log n) using n0 = 1 and c = 2.

6.27 Let f(n) = 2 and let g(n) = 1. It’s easy to see that f(n) = O(g(n)), choosing the constants n0 = 1 and c = 2. But
log(f(n)) = 1 and log(g(n)) = 0, and 1 ̸= O(0).

6.28 Suppose first that b ≤ c. Then, for any n ≥ 1, we have

b^n ≤ c^n  ⇔  (c/b)^n ≥ 1,

which is true for any n ≥ 0. Thus choosing both constants equal to 1 suffices.
For the converse, suppose that b > c. Fix any constant a > 0. Then

b^n ≤ a · c^n  ⇔  (b/c)^n ≤ a.

But for any n > log_{b/c}(a), we have (b/c)^n > a. Thus there is no constant a such that b^n ≤ a · c^n for all sufficiently large
values of n.

6.29 Yes—unless r0 = 1. If r0 = 1, then f(n) = n and 1^n = 1; we’ve previously argued that f(n) = n is not O(1). But if
r0 ≥ 2, then

f(n) = ∑_{i=1}^{n} (r0)ⁱ = r0 · ((r0)^n − 1)/(r0 − 1) ≤ (r0/(r0 − 1)) · (r0)^n,

by Theorem 5.5. Thus f(n) = O((r0)^n), with n0 = 1 and c = r0/(r0 − 1).
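As a quick numeric sanity check (a throwaway script, not part of the original solution), we can compare the geometric sum directly against the claimed bound (r0/(r0 − 1)) · (r0)^n for a few small values of r0 and n:

```python
def geometric_sum(r, n):
    # sum of r^i for i = 1, ..., n, computed term by term
    return sum(r ** i for i in range(1, n + 1))

def claimed_bound(r, n):
    # the bound from the solution: (r / (r - 1)) * r^n, for r >= 2
    return r / (r - 1) * r ** n

# the bound should hold for every r0 >= 2 and every n >= 1
assert all(geometric_sum(r, n) <= claimed_bound(r, n)
           for r in range(2, 6) for n in range(1, 12))
```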

6.30 Suppose that f(n) = O(g(n)). Then there exist constants n0 ≥ 0 and c > 0 such that ∀n ≥ n0 : f(n) ≤ c · g(n).
Dividing both sides by c, we have ∀n ≥ n0 : g(n) ≥ (1/c) · f(n). Thus g(n) = Ω(f(n)), using the constants n0 and 1/c.
Similarly, suppose g(n) = Ω(f(n)). Thus there exist constants n0 ≥ 0 and d > 0 such that ∀n ≥ n0 : g(n) ≥ d · f(n).
Dividing both sides by d, we have ∀n ≥ n0 : f(n) ≤ (1/d) · g(n). Thus f(n) = O(g(n)), using the constants n0 and 1/d.

6.31 First, for any n ≥ 1, we have that 1/n ≥ 0, so

f(n) = n + 1/n ≥ n ≥ 1,

and thus f(n) = Ω(1), using the constants n0 = 1 and d = 1. However, f(n) ̸= O(1). To see this, consider any constants
c and n0. Choose any n > max(n0, c, 1): then

f(n) = n + 1/n ≥ n > c.

Thus, for any constants c and n0, there exists n ≥ n0 such that f(n) > c, and thus f(n) ̸= O(1).
By definition, then, we have that f(n) ̸= O(1), f(n) = Ω(1), f(n) ̸= Θ(1), f(n) ̸= o(1), and f(n) = ω(1).

6.32 First, for any n ≥ 1, we have that 1/n ≤ 1 ≤ n, so

f(n) = n + 1/n ≤ n + n = 2n.

Thus f(n) = O(n), using the constants n0 = 1 and c = 2. Second, for any n ≥ 1, we have that 1/n ≥ 0, so

f(n) = n + 1/n ≥ n,

and thus f(n) = Ω(n), using the constants n0 = 1 and d = 1.
By definition, then, we have that f(n) = O(n), f(n) = Ω(n), f(n) = Θ(n), f(n) ̸= o(n), and f(n) ̸= ω(n).

6.33 First, for any n ≥ 1, we have that 1/n ≤ 1 ≤ n and n ≤ n², so

f(n) = n + 1/n ≤ n + n = 2n ≤ 2n².

Thus f(n) = O(n²), using the constants n0 = 1 and c = 2. However, f(n) ̸= Ω(n²). To see this, consider any constants d
and n0. Choose any n > max(n0, 2/d, 1): then

f(n) = n + 1/n ≤ 2n < dn²,     because n > 2/d implies dn² > 2n.

Thus, for any constants d and n0, there exists n ≥ n0 such that f(n) < dn², and thus f(n) ̸= Ω(n²).
By definition, then, we have that f(n) = O(n²), f(n) ̸= Ω(n²), f(n) ̸= Θ(n²), f(n) = o(n²), and f(n) ̸= ω(n²).

6.34 Choose the constants n0 = 1, c = 1, and d = 1/2. Because n < 2^{k(n)+1} implies n/2 < 2^{k(n)}, we have

n/2 < 2^{k(n)} ≤ n.

Multiplying through by two yields

n < 2^{k(n)+1} ≤ 2n,

so for 2^{k(n)+1} we choose n0 = 1, c = 2, and d = 1.

6.35 By definition, we have that

2^{k(n)} ≤ n < 2^{k(n)+1}.

Taking logs (which preserves inequalities), we have

k(n) ≤ log n < k(n) + 1.

For n ≥ 2, we have k(n) ≥ 1, so k(n) + 1 ≤ 2k(n). Thus choosing the constants n0 = 2, c = 2, and d = 1 suffices.
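Here k(n) is the largest integer k with 2^k ≤ n, i.e., ⌊log₂ n⌋. A short check of the sandwich n/2 < 2^{k(n)} ≤ n from Exercises 6.34 and 6.35, under that reading (a sanity script, not part of the original solution):

```python
def k(n):
    # k(n) = the largest integer k with 2**k <= n, i.e., floor(log2(n))
    i = 0
    while 2 ** (i + 1) <= n:
        i += 1
    return i

# the sandwich used in Exercises 6.34-6.35: n/2 < 2**k(n) <= n
assert all(n / 2 < 2 ** k(n) <= n for n in range(1, 2000))
```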

6.36 By definition, we have that

b^{k_b(n)} ≤ n < b^{k_b(n)+1}.

Taking logs (which preserves inequalities), and using the log property log_b x = (log x)/(log b), we have

k_b(n) · log b ≤ log n < [k_b(n) + 1] · log b.

For n ≥ b, we have k_b(n) ≥ 1, so k_b(n) + 1 ≤ 2k_b(n), and thus log n < 2 log b · k_b(n). Rearranging,

(1/(2 log b)) · log n < k_b(n) ≤ (1/log b) · log n,

so choosing the constants n0 = b, c = 1/(log b), and d = 1/(2 log b) suffices.

6.37 Define the function f(n) = n^{3/2}. (Or, more precisely: f(n) = ⌈n^{3/2}⌉.)
Then f(n) ̸= O(n) because, for any constant c, if n > c² then f(n) = (√n)³ = n · √n > cn.
Similarly, f(n) ̸= Ω(n²) because, for any constant d, if n > (1/d)² then f(n) = (√n)³ = n²/√n < dn².

6.38 False: the function

f(n) = 202 if n ≤ 10, and f(n) = 0 otherwise,

is also Θ(0) (to name one function among an infinite set of counterexamples).

6.39 The function f(n) = 1 meets this definition: indeed, f(n) = [f(n)]2 for any n, so for this function the claim follows
with n0 = 0 and c = 1.

6.40 (This argument has a lot in common with the proof of lower bounds for comparison-based sorting; see p. 457 of the
text.) To show that n^k = O(n!), choose the constants n0 = 4k and c = 1. For any n ≥ 4k, we have

n! = ∏_{i=1}^{n} i
   = [∏_{i=1}^{n−2k} i] · [∏_{i=n−2k+1}^{n} i]     definition of !, and splitting up the product
   ≥ 1 · [∏_{i=n−2k+1}^{n} i]                      removing multiplicands decreases the product
   ≥ 1 · [∏_{i=n−2k+1}^{n} (n/2)]                  for n ≥ 4k, all remaining multiplicands are ≥ n/2
   = (n/2)^{2k}                                    there are 2k remaining multiplicands
   = n^k · n^k/2^{2k} = n^k · (n/4)^k ≥ n^k.

Thus n^k = O(n!).
To show that n^k ̸= Ω(n!), consider any d > 0. For any n > max(4k + 4, 1/d), by the above argument (applied with
k + 1 in place of k) we have

n! ≥ n^{k+1} = n · n^k > (1/d) · n^k,

and thus n^k < d · n!. Thus n^k ̸= Ω(n!).

6.41 Suppose there exist constants c > 0 and n0 ≥ 0 such that

∀n ≥ n0 : f(n) + 1 ≤ cf(n).

If f(n) = 0 for any n ≥ n0, then this inequality fails; thus f(n) ≥ 1 for all n ≥ n0—that is, f(n) = Ω(1).
Conversely, suppose that f(n) = Ω(1). Then there exist constants d > 0 and n0 ≥ 0 such that

∀n ≥ n0 : f(n) ≥ d.

Because d > 0 and f is integer-valued, in fact

∀n ≥ n0 : f(n) ≥ 1.

Thus choosing c = 2 suffices:

∀n ≥ n0 : f(n) + 1 ≤ 2f(n),

so g(n) = O(f(n)).

6.42 Here are examples for the four parts:


(a) impossible: having f(n) = o(n2 ) requires that f(n) ̸= Ω(n2 ) and having f(n) = ω(n2 ) requires that f(n) = Ω(n2 );
these two things cannot simultaneously be true.
(b) f(n) = n3
(c) f(n) = n
(d) f(n) = n2

6.43 We simply note that if f(n) ̸= O(g(n)) then f(n) ̸= o(g(n)) and f(n) ̸= Θ(g(n)). Similarly, if f(n) ̸= Ω(g(n)) then
f(n) ̸= ω(g(n)) and f(n) ̸= Θ(g(n)).
Thus the only way for more than one of these properties to hold is for f(n) = O(g(n)) and f(n) = Ω(g(n)), but in this
case f(n) ̸= o(g(n)) and f(n) ̸= ω(g(n)).

6.44 Let’s argue formally that f(n) ̸= Ω(n2 ). Let d > 0, n0 ≥ 0 be arbitrary. Let n be the smallest odd number strictly
greater than max(1/d, n0 ). Then f(n) = n and n < d · n2 because we chose n > 1/d. But we just argued that, for arbitrary
d, n0 ≥ 0, it is not the case that ∀n ≥ n0 : f(n) ≥ dn2 . Thus f(n) ̸= Ω(n2 ).

6.45 We’ll make heavy use of Exercise 6.30, along with Lemma 6.3 (proven in Exercise 6.18):
f(n) = Ω(g(n)) and g(n) = Ω(h(n)) by assumption
⇒ g(n) = O(f(n)) and h(n) = O(g(n)) by Exercise 6.30
⇒ h(n) = O(f(n)) by Lemma 6.3/Exercise 6.18
⇒ f(n) = Ω(h(n)). by Exercise 6.30

6.46 We’ll use the previous exercise and Lemma 6.3 (proven in Exercise 6.18):
f(n) = Θ(g(n)) and g(n) = Θ(h(n)) by assumption
⇒ f(n) = O(g(n)) and g(n) = O(h(n)) and f(n) = Ω(g(n)) and g(n) = Ω(h(n)) definition of Θ(·)
⇒ f(n) = O(h(n)) and f(n) = Ω(h(n)) by Lemma 6.3/Exercise 6.18 and Exercise 6.45
⇒ f(n) = Θ(h(n)). definition of Θ(·)

6.47 Lemma 6.3 (see Exercise 6.18) immediately tells us that if f(n) = o(g(n)) and g(n) = o(h(n)), then f(n) = O(h(n)).
But we must also prove that f(n) ̸= Ω(h(n)). Assume for the purposes of a contradiction that f(n) = Ω(h(n)): that is,
assume that there exist constants m1 ≥ 0 and c1 > 0 such that
∀n ≥ m1 : f(n) ≥ c1 · h(n). (1)
By the assumption that g(n) = o(h(n)) (and thus that g(n) = O(h(n))), there also exist constants m2 ≥ 0 and c2 > 0
such that
∀n ≥ m2 : g(n) ≤ c2 · h(n). (2)
Combining (1) and (2), we have that, for n ≥ max(m1, m2),

f(n) ≥ c1 · h(n)           by (1)
     ≥ (c1/c2) · g(n).     by (2)

But we’ve just argued that f(n) = Ω(g(n)), with the constants n0 = max(m1, m2) and d = c1/c2. But f(n) = o(g(n)) implies
that f(n) ̸= Ω(g(n)). Thus we’ve reached a contradiction, and thus the assumption is false. Therefore f(n) ̸= Ω(h(n)),
and indeed f(n) = o(h(n)).

6.48 The claim is false in general (though it can be true for particular functions f and g: for example, if f(n) and g(n) are
identical, then the statement is vacuously true). For example, let f(n) = n2 and g(n) = n. Then f(n) = Ω(g(n)) but
g(n) ̸= Ω(f(n)).

6.49 True, Θ is symmetric: if f(n) = Θ(g(n)), then f(n) = O(g(n)) and f(n) = Ω(g(n)). By Exercise 6.30, the two
properties f(n) = O(g(n)) and f(n) = Ω(g(n)) respectively imply that g(n) = Ω(f(n)) and g(n) = O(f(n)); by
definition, then, we have g(n) = Θ(f(n)).

6.50 The claim is always false (it cannot be true for any particular functions f and g). Suppose f(n) = ω(g(n)). Then
f(n) ̸= O(g(n)), which means—by Exercise 6.30—that g(n) ̸= Ω(f(n)). Thus by definition g(n) ̸= ω(f(n)).

6.51 True: choose the constants n0 = 0 and c = 1. For all n ≥ n0 , we have f(n) ≤ c · f(n).

6.52 True: again, choose the constants n0 = 0 and d = 1. For all n ≥ n0 , we have f(n) ≥ d · f(n).

6.53 False: if f(n) = ω(g(n)), then f(n) ̸= O(g(n)), and Exercise 6.51 showed that f(n) = O(f(n)).

6.54 The problem with this proof is that it shifts the goal posts in the middle of the proof, by changing the multiplicative
constant it’s trying to establish. (This fallacy is called equivocation; see Section 4.5.) Specifically, ignoring n0 (that part
of the proof is okay), the property that the proof claims to be establishing is
∃c > 0 : ∀n ≥ 0 : n2 ≤ cn.
But the claim that it actually establishes is
∀n ≥ 0 : ∃c > 0 : n2 ≤ cn.
Order of quantification matters!

6.3 Asymptotic Analysis of Algorithms


6.55 In the ith iteration of the outer loop, we do precisely n − i comparisons—exactly as argued in Example 6.7. Thus the
number of comparisons is

∑_{i=1}^{n} (n − i) = n² − ∑_{i=1}^{n} i = n² − n(n + 1)/2 = (n² − n)/2.
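The count can be confirmed empirically; here is a minimal instrumented Selection Sort sketch (a hypothetical helper, written to match the comparison pattern described above, not the book's own code) that tallies its comparisons:

```python
import random

def selection_sort_comparisons(A):
    # hypothetical instrumented selection sort: returns the sorted list
    # and the number of comparisons performed
    A = list(A)
    comparisons = 0
    n = len(A)
    for i in range(n):
        smallest = i
        for j in range(i + 1, n):
            comparisons += 1          # one comparison per inner-loop iteration
            if A[j] < A[smallest]:
                smallest = j
        A[i], A[smallest] = A[smallest], A[i]
    return A, comparisons

# the count matches (n^2 - n)/2 for every input of length n
for n in [1, 2, 5, 17]:
    A = list(range(n))
    random.shuffle(A)
    result, count = selection_sort_comparisons(A)
    assert result == sorted(A) and count == (n * n - n) // 2
```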

6.56 In the ith iteration of the outer loop if the input is reverse sorted, we do i − 1 comparisons (and swaps)—exactly as
argued in Example 6.8. Thus the number of comparisons is

∑_{i=2}^{n} (i − 1) = ∑_{i=1}^{n−1} i = n(n − 1)/2.

6.57 The ith iteration of the outer loop executes n − i iterations of the inner loop, each of which does a comparison. Thus
the total number of comparisons is

∑_{i=1}^{n} (n − i) = ∑_{j=0}^{n−1} j = n(n − 1)/2.

6.58 In the ith iteration of the outer loop, we do precisely 1 swap, in Line 6. Thus the number of swaps is ∑_{i=1}^{n} 1 = n.

6.59 In the worst case, every comparison causes a swap because the pair is out of order; thus for the reverse-sorted input
we do i − 1 swaps (and comparisons) in the ith iteration, for a total of

∑_{i=2}^{n} (i − 1) = ∑_{i=1}^{n−1} i = n(n − 1)/2.

6.60 The ith iteration of the outer loop executes n − i iterations of the inner loop, each of which does a comparison—and,
in a reverse-sorted array, a swap. Thus the total number of swaps for the reverse-sorted input is

∑_{i=1}^{n} (n − i) = ∑_{j=0}^{n−1} j = n(n − 1)/2.

6.61 Regardless of its input’s ordering, Selection Sort still looks for the smallest value in the unexamined portion of the
array; thus its best-case number of comparisons and swaps is identical to the worst case: n(n − 1)/2 comparisons and n swaps
(all n of which are “self-swaps” if the input array is sorted).

6.62 If the input array is sorted, then each iteration of the outer loop does a single comparison and zero swaps, which is
∑_{i=2}^{n} 1 = n − 1 comparisons, and ∑_{i=2}^{n} 0 = 0 swaps.

6.63 The exact same sequence of comparisons is executed regardless of the input array’s order, but no swaps occur if the
input is already sorted. Thus in the best case there are n(n − 1)/2 comparisons and 0 swaps.

6.64 Unfortunately, for a reverse-sorted input array A, the execution of early-stopping-bubbleSort(A) is identical to
bubbleSort(A) (except with the slight additional overhead of the swapped variable): a swap occurs in every iteration.
Thus the worst-case running time is, as before,

∑_{i=1}^{n} ∑_{j=1}^{n−i} Θ(1) = Θ(n²).

6.65 For a sorted input array A, the first iteration of the outer loop of early-stopping-bubbleSort(A) causes no swaps,
and thus the algorithm terminates after a single iteration. Thus the running time is Θ(n), whereas the best-case running
time for bubbleSort was Θ(n2 ).

6.66 If A is a reverse-sorted array, then R (the array formed by reversing A) is a (forward-)sorted array. Line 1 of
forward-backward-bubbleSort constructs R in Θ(n) time. After a single iteration of the while loop, bubbleSort(R)
has performed no swaps, and thus R is sorted. So we’ve executed only one iteration of the while loop, which requires
Θ(n) steps in Line 3, Θ(n) steps in Line 4, and Θ(n) steps in Line 5 to check whether A or R is sorted. But that total is just
Θ(n) + Θ(n) + Θ(n) + Θ(n) = Θ(n).

6.67 There can be at most n iterations of the outer for loop; after n iterations of Line 3, we’ve actually fully executed
bubbleSort(A) (albeit with alternating steps of bubbleSort(R)). But that still gives us at most n iterations of O(n) steps
inside the while loop—including O(n) steps to check if A or R is sorted—and thus O(n2 ) steps overall.

6.68 Consider the array

A[1, . . . , 2k] = [1, 2, . . . , k, 2k, 2k − 1, . . . , k + 1]

(the first k elements sorted, followed by k reverse-sorted elements). This array requires at least k − 1 iterations from
early-stopping-bubbleSort: note that the element k + 1 “moves backward” by only one position in each iteration, and it
has to swap places with the k − 1 elements immediately before it. Similarly, the reverse of this array is

R[1, . . . , 2k] = [k + 1, k + 2, . . . , 2k, k, k − 1, . . . , 1]

(again k sorted elements followed by k reverse-sorted elements). For exactly the same reasons (except applying the
argument to the element 1 instead of the element k + 1), the array R requires at least k − 1 iterations from
early-stopping-bubbleSort.

6.69 A solution in Python is shown in Figure S.6.1. (The swap-counting and comparison-counting code required for the
next exercise is included.)

6.70 The testing code is shown in the fourth block in Figure S.6.1. In my implementation, I found the following statistics,
averaged over the 8! = 40,320 orderings of {1, 2, . . . , 8}:
6.3 Asymptotic Analysis of Algorithms 111

def bubble_sort(A):
    comparisons, swaps = 0, 0
    for i in range(len(A)):
        for j in range(len(A) - i - 1):
            comparisons += 1
            if A[j] > A[j+1]:
                A[j], A[j+1] = A[j+1], A[j]
                swaps += 1
    return A, comparisons, swaps

def early_stopping_bubble_sort(A):
    comparisons, swaps = 0, 0
    for i in range(len(A)):
        swapped = False
        for j in range(len(A) - i - 1):
            comparisons += 1
            if A[j] > A[j+1]:
                A[j], A[j+1] = A[j+1], A[j]
                swaps += 1
                swapped = True
        if not swapped:
            return A, comparisons, swaps

def forward_backward_bubble_sort(A):
    comparisons, swaps = 0, 0
    R = A[::-1]  # Python shorthand for reversing a list
    for i in range(len(A)):
        swappedA, swappedR = False, False
        for j in range(len(A) - i - 1):
            comparisons += 2
            if A[j] > A[j+1]:
                A[j], A[j+1] = A[j+1], A[j]
                swaps += 1
                swappedA = True
            if R[j] > R[j+1]:
                R[j], R[j+1] = R[j+1], R[j]
                swaps += 1
                swappedR = True
        if not swappedA:
            return A, comparisons, swaps
        elif not swappedR:
            return R, comparisons, swaps

import itertools

def test_sort(sort_function, n):
    '''
    Input: sort_function is a function taking a list of elements as
    input and returning (L, c, s), where L is the sorted list and c and s
    are the number of comparisons and swaps completed, respectively.

    Prints the average number of swaps and comparisons performed by this
    sorting function on all n-element permutations.
    '''
    swaps, comparisons, count = 0, 0, 0
    for A in itertools.permutations(range(n)):
        result, c, s = sort_function(list(A))
        swaps += s
        comparisons += c
        count += 1
    print("Average swaps: %0.3f; comparisons %0.3f" % (swaps / count, comparisons / count))

Figure S.6.1 Three versions of Bubble Sort, in Python, and some code to test them.

average number of swaps average number of comparisons


bubbleSort 14.0 28.0
early-stopping-bubbleSort 14.0 25.995
forward-backward-bubbleSort 26.636 48.981

Thus the original bubbleSort algorithm and early-stopping-bubbleSort do better with respect to swaps, and the early-
stopping version does best on comparisons.

6.71 The loop in Lines 1–2 takes Θ(k) time; the loop in Lines 3–4 takes Θ(n) time; and Line 5 is Θ(1) time. The trickiest
question is how many times we execute Lines 8 and 9, which is ∑_{i=1}^{k} count[i] = n (because each array element is one of
{1, 2, . . . , k}). Thus the total running time is Θ(k + n). For k = 26, then, the running time is Θ(n).
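The Θ(k + n) structure can be mirrored in code. This is a sketch of a Counting Sort in the style described above (the pseudocode's count array and line numbers are from the exercise; this Python version is an approximation of that structure, not the book's listing):

```python
def counting_sort(A, k):
    # a sketch of countingSort, assuming every A[i] is in {1, 2, ..., k}
    count = [0] * (k + 1)              # Theta(k) initialization
    for x in A:                        # Theta(n): tally each value
        count[x] += 1
    result = []
    for value in range(1, k + 1):      # the appends total n across all values
        result.extend([value] * count[value])
    return result

assert counting_sort([3, 1, 2, 3, 1], 3) == [1, 1, 2, 3, 3]
```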

6.72 On a moderate-quality laptop, the elapsed time for Bubble Sort (as measured by total running time over 10 randomly
chosen arrays with integers chosen randomly from the specified range) exceeded the elapsed time for Counting Sort until
k ≈ 10,000,000 (at which point Bubble Sort started to win very handily):

k average elapsed time for bubbleSort average elapsed time for countingSort
1 0.33s 0.00032s
10 0.55s 0.00034s
100 0.58s 0.00031s
1000 0.59s 0.00048s
10,000 0.62s 0.00153s
100,000 0.64s 0.01174s
1,000,000 0.78s 0.12724s
10,000,000 0.98s 1.22973s
100,000,000 1.28s 13.22551s

6.73 Pseudocode and an implementation in Python are shown in Figure S.6.2. The running time of Radix Sort is O(n log k):
there are log2 k iterations of the outer loop, and each element is moved into one of two lists in each iteration, taking Θ(n)
time. The running time for Counting Sort was O(n + k). If k is a constant, these algorithms are both linear time in terms
of n—that is, O(n + constant) = O(n) and O(n · constant) = O(n). But a significant advantage of Radix Sort, though,
is that its space requirements are much lighter: it uses only Θ(n) space, whereas Counting Sort needed Θ(k + n) space.

6.74 Suppose that A[1 . . . n] is in sorted order—say A[i] = i. (Reverse-sorted order is just as bad.) Then the pivot is the
minimum element of A, and so less is empty.

6.75 If less is empty, then we can think of quickSort as spending Θ(n) time to loop through all elements of A and eliminate
the pivot value. In this case, actually, this algorithm becomes a recursive version of selection sort. The running time is the
same: we spend n − i time finding the ith largest element, so the total running time is ∑_{i=1}^{n} (n − i) = Θ(n²).

radixSort(A[1 . . . n]):
1 // assume that each A[i] ∈ {1, 2, . . . , k}
2 for i := 0 to log2 k:
3   Zeros := []
4   Ones := []
5   for j := 1 to n:
6     if the ith bit of A[j] is 0 then
7       add A[j] to the end of Zeros
8     else
9       add A[j] to the end of Ones
10  A := Zeros + Ones (in order)
11 return A

def radix_sort(A):
    k = max(A)
    i = 0
    while 2**i <= k:
        zeros = [x for x in A if (x // 2**i) % 2 == 0]
        ones = [x for x in A if (x // 2**i) % 2 == 1]
        A = zeros + ones
        i += 1
    return A

Figure S.6.2 Radix sort, in pseudocode and in Python.



6.76 An array that induces worst-case performance is one in which the middle element is consistently the smallest or
largest element of the array. The form of one array that causes this problematic behavior is
[· · · , 5, 4, 3, 2, 1, n, n − 1, n − 2, n − 3, · · · ]

6.77 An array that induces worst-case performance is one in which the three examined elements are always the three
smallest (or three largest) elements in the array. The result of this configuration is to eliminate the pivot and one other
element, so there are n/2 iterations of Θ(n) work. The form of one array that causes this problematic behavior is
[1, 3, 5, . . . , n − 2, n, n − 1, n − 3, . . . , 6, 4, 2]

6.78 The worst-case running time is still Ω(n): in the case that the sought element is larger than every element of A (except
possibly the last), there’s nothing gained by the early stopping. Thus the worst-case performance is still Ω(n).

6.79 Consider the string s = ZZ · · · Z, with n characters. Then this algorithm requires n iterations of the outer loop, and at
the start of the ith iteration, the string in question has n − i + 1 characters. Thus the total running time will be

∑_{i=1}^{n} c · (n − i + 1) = c · ∑_{j=1}^{n} j = Θ(n²).

6.4 Recurrence Relations


6.80 R(1) = 1 and R(h) = 1 + 4R(h − 1): there’s the “parent” region, and each of the four subregions has as many regions
as any region of height h − 1 can.

6.81 S(1) = 1 and S(h) = 1 + 3 + S(h − 1): there’s the “parent” region, three subregions that aren’t further subdivided,
and one subregion that has as few regions as a region of height h − 1 can.

6.82 T(1) = 1 and T(n) = 1 + 4T(n/4). (A fully general solution for the case when n is not divisible by 4 is a little
trickier. To be fully precise, the recursive case would be T(n) = 1 + (n mod 4) · T(⌈n/4⌉) + (4 − n mod 4) · T(⌊n/4⌋).)

6.83 T(n) = 1 + T(n − 1) and T(0) = 1.

6.84 T(n) = 1 + T(⌊n/2⌋) + T(⌈n/2⌉) and T(0) = T(1) = 1.

6.85 Getting the rounding right here is trickier; if we let ourselves be sloppy about rounding, then the recurrence is simply
T(n) = 1 + 2T(n/4) + T(n/2). With the correct rounding, the recurrence is

T(n) = 1 + T(⌊n/4⌋) + T(⌊3n/4⌋ − ⌊n/4⌋) + T(⌈n/4⌉)

and T(0) = T(1) = 1.

6.86 Recall the recurrence T(n) = 1 + T(n − 1) and T(0) = 1. We claim by induction that T(n) = n + 1:
• for n = 0, by definition T(0) = 1 = 0 + 1.
• for n ≥ 1, we have T(n) = 1 + T(n − 1) = 1 + (n − 1) + 1 by the inductive hypothesis, and thus T(n) = 1 + n.
Thus T(n) = 1 + n, and therefore T(n) = Θ(n).

6.87 Recall the recurrence T(n) = 1 + T(⌊n/2⌋) + T(⌈n/2⌉) and T(1) = 1. When n = 2^k is a power of 2, the recurrence
is simply T(2^k) = 1 + 2T(2^{k−1}). We claim by induction on k that T(2^k) = 2^{k+1} − 1 for any k ≥ 0:

• for k = 0, by definition T(2^0) = T(1) = 1 = 2 · 1 − 1.
• for k ≥ 1, we have that T(2^k) = 1 + 2T(2^{k−1}) = 1 + 2[2^k − 1] by the inductive hypothesis, and therefore we know
that T(2^k) = 1 + 2^{k+1} − 2 = 2^{k+1} − 1.

Because we’ve shown that T(2^k) = 2^{k+1} − 1, when n is a power of two we have T(n) = 2n − 1—and thus T(n) = Θ(n).
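A quick check of this closed form against the recurrence itself (a throwaway script, not part of the original solution):

```python
def T(n):
    # the recurrence of Exercise 6.84: T(0) = T(1) = 1 and
    # T(n) = 1 + T(floor(n/2)) + T(ceil(n/2))
    if n <= 1:
        return 1
    return 1 + T(n // 2) + T((n + 1) // 2)

# for powers of two, the solution gives T(n) = 2n - 1
assert all(T(2 ** k) == 2 ** (k + 1) - 1 for k in range(15))
```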

6.88 Recall the recurrence

T(n) = 1 + T(⌊n/4⌋) + T(⌊3n/4⌋ − ⌊n/4⌋) + T(⌈n/4⌉)

and T(1) = 1. We claim by induction that T(n) = (3/2)n − 1/2 for any n ≥ 1:

• for n = 1, by definition T(1) = 1 = 3/2 − 1/2.
• for n ≥ 2, we have

T(n) = 1 + T(⌊n/4⌋) + T(⌊3n/4⌋ − ⌊n/4⌋) + T(⌈n/4⌉)                                   definition of the recurrence
     = 1 + [(3/2)⌊n/4⌋ − 1/2] + [(3/2)(⌊3n/4⌋ − ⌊n/4⌋) − 1/2] + [(3/2)⌈n/4⌉ − 1/2]   inductive hypothesis, applied three times
     = (3/2)(⌊3n/4⌋ + ⌈n/4⌉) − 1/2                                                   collecting like terms; ⌊n/4⌋ − ⌊n/4⌋ = 0
     = (3/2)n − 1/2.                  by cases: for each n mod 4 ∈ {0, 1, 2, 3}, we have ⌊3n/4⌋ + ⌈n/4⌉ = n

Thus T(n) = (3/2)n − 1/2, and therefore T(n) = Θ(n).

6.89 Given an array A[1 . . . n], how many negative entries does A contain? That is, what is | {i : A[i] < 0} |?

6.90 Let T(n) denote the running time of ternary search on an input of length n. Then T(1) = 1 and T(n) = T(n/3) + 2
for n ≥ 2. (We’re assuming that n = 3^k, so we can ignore floors and ceilings in the n/3.) We claim that T(3^k) = 2k + 1
by induction on k. For the base case, T(3^0) = 1 = 0 + 1. For the inductive case:

T(3^k) = T(3^k/3) + 2
       = T(3^{k−1}) + 2
       = [2(k − 1) + 1] + 2     inductive hypothesis
       = 2k + 1.
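The recurrence and its solved form T(3^k) = 2k + 1 can be checked directly (a sanity script, not from the original solution):

```python
def ternary_comparisons(n):
    # models T(1) = 1 and T(n) = T(n/3) + 2, for n a power of three
    if n == 1:
        return 1
    return ternary_comparisons(n // 3) + 2

# the solved form: T(3**k) = 2k + 1
assert all(ternary_comparisons(3 ** k) == 2 * k + 1 for k in range(12))
```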

6.91 Binary search performs better if we count exactly, though they both require Θ(log n) comparisons asymptotically.
Binary search does ⌈log2 n⌉ + 1 comparisons; ternary search does 2⌈log3 n⌉ + 1 comparisons. Let b(n) and t(n) denote
the number of comparisons done by binary and ternary search, respectively. Then we have

t(n) = 2⌈log3 n⌉ + 1
     ≥ 2 log3 n + 1
     = 2 · (log2 n)/(log2 3) + 1     conversion of log bases (Theorem 2.10.6)
     ≈ (2/1.585) log2 n + 1
     = 1.262 log2 n + 1.

But b(n) = ⌈log2 n⌉ + 1 ≤ log2 n + 2, so binary search is certainly better as long as

1.262 log2 n + 1 ≥ log2 n + 2  ⇔  0.262 log2 n ≥ 1
                               ⇔  log2 n ≥ 1/0.262 ≈ 3.8168
                               ⇔  n ≥ 2^{1/0.262}
                               ⇔  n ≥ 14.0919.

That is, you should pick binary search if n ≥ 15 (and even for some smaller values).
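Using the exact counts b(n) = ⌈log₂ n⌉ + 1 and t(n) = 2⌈log₃ n⌉ + 1 from the solution, a short script confirms the crossover (integer logs avoid floating-point rounding errors at exact powers; this check is not part of the original solution):

```python
def ilog(b, n):
    # smallest k with b**k >= n -- i.e., ceil(log_b n) for n >= 1,
    # computed with integers to avoid floating-point rounding
    k = 0
    while b ** k < n:
        k += 1
    return k

def binary_comparisons(n):
    return ilog(2, n) + 1        # ceil(log2 n) + 1

def ternary_comparisons(n):
    return 2 * ilog(3, n) + 1    # 2 * ceil(log3 n) + 1

# binary search never does more comparisons, and wins strictly for n >= 15
assert all(binary_comparisons(n) <= ternary_comparisons(n) for n in range(2, 3000))
assert all(binary_comparisons(n) < ternary_comparisons(n) for n in range(15, 3000))
```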

6.92 If we hypothesize that the binary search running time is T(2^k) = c(1 + k)—that is, T(n) = c(1 + log n) for powers
of two—then we can prove this straightforwardly by induction on k:
base case (k = 0): Then we have T(1) = c by definition of the recurrence, and indeed c(1 + log 1) = c(1 + 0) = c, too.
inductive case (k ≥ 1): Assume the inductive hypothesis, namely that T(2^{k−1}) = ck. Then

T(2^k) = T(2^{k−1}) + c     definition of the recurrence
       = ck + c             inductive hypothesis
       = c(k + 1).

6.93 Consider the recurrence T(1) = T(2) = 1 and T(n) = T(n − 2) + n. We could give a purely asymptotic answer, but
here’s an exact proof. We claim that T(2k) = k² + k − 1 and T(2k − 1) = k². Here’s a proof by induction on k:
base case (n ∈ {1, 2}—that is, k = 1): Plugging k = 1 into the claim, we need to show that T(2) = 1² + 1 − 1 and
T(2 − 1) = 1². Indeed, 1² + 1 − 1 and 1² are both equal to 1, just as T(1) = T(2) = 1.
inductive case (n ≥ 3—that is, k ≥ 2): We assume the inductive hypothesis—namely that

T(2k − 2) = (k − 1)² + (k − 1) − 1   and   T(2k − 3) = (k − 1)².

We must prove that T(2k) = k² + k − 1 and T(2k − 1) = k²:

T(2k) = T(2k − 2) + 2k                       definition of the recurrence
      = (k − 1)² + (k − 1) − 1 + 2k          inductive hypothesis
      = (k² − 2k + 1) + (k − 1) − 1 + 2k     multiplying out
      = k² + k − 1

and

T(2k − 1) = T(2k − 3) + 2k − 1               definition of the recurrence
          = (k − 1)² + 2k − 1                inductive hypothesis [note that 2k − 3 = 2(k − 1) − 1]
          = k² − 2k + 1 + 2k − 1             multiplying out
          = k².

Together, these results for even and odd n show that T(n) = Θ(n²).
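A brute-force check of the two closed forms (again just a sanity script, not part of the original solution):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def T(n):
    # the recurrence T(1) = T(2) = 1 and T(n) = T(n - 2) + n
    if n <= 2:
        return 1
    return T(n - 2) + n

# closed forms from the solution: T(2k) = k^2 + k - 1 and T(2k - 1) = k^2
assert all(T(2 * k) == k * k + k - 1 and T(2 * k - 1) == k * k
           for k in range(1, 300))
```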

6.94 We are considering the recurrence T(n) = T(n − k) + n for n > k (and T(n) = 1 otherwise).
If k = 1, we’ll give a separate proof that T(n) = Θ(n²): we claim T(n) = n(n + 1)/2 by induction on n. For the
base case (n = 1), the claim is immediate by definition of the recurrence. For the inductive case (n ≥ 2), we assume the
inductive hypothesis, namely that T(n − 1) = (n − 1)n/2. Then

T(n) = T(n − 1) + n                          definition of the recurrence
     = n(n − 1)/2 + n                        inductive hypothesis
     = [2n + n(n − 1)]/2 = n(n + 1)/2.

Now we’ll consider the case k ≥ 2. We claim that T(n) ≥ ⌊n/k⌋² and T(n) ≤ n², by strong induction on n.

• For the base cases n ∈ {1, 2, . . . , k}, we have T(n) = 1; indeed 1 ≥ ⌊n/k⌋² (which is either 0 or 1) and 1 ≤ n².
• For the inductive case n > k, we assume the inductive hypotheses, namely that

T(n − k) ≥ ⌊(n − k)/k⌋² = (⌊n/k⌋ − 1)²   and   T(n − k) ≤ (n − k)².

For the lower bound:

T(n) = T(n − k) + n                          definition of the recurrence
     ≥ (⌊n/k⌋ − 1)² + n                      inductive hypothesis
     = ⌊n/k⌋² − 2⌊n/k⌋ + 1 + n               multiplying out
     ≥ ⌊n/k⌋² − n + 1 + n                    k ≥ 2, so 2⌊n/k⌋ ≤ 2⌊n/2⌋ ≤ n, and so −2⌊n/k⌋ ≥ −n
     = ⌊n/k⌋² + 1 ≥ ⌊n/k⌋².

And for the upper bound:

T(n) = T(n − k) + n                          definition of the recurrence
     ≤ (n − k)² + n                          inductive hypothesis
     = n² − 2kn + k² + n                     multiplying out
     = n² + (k² − kn) + (n − kn)             because n > k and k > 1, both parenthesized terms are < 0
     ≤ n².

6.95 We’ll prove that f_n ≤ 2^n by strong induction on n.

• Base case #1 (n = 1): f_1 = 1 ≤ 2 = 2^1.
• Base case #2 (n = 2): f_2 = 1 ≤ 4 = 2^2.
• Inductive case (n ≥ 3): we assume the inductive hypotheses, namely that f_{n−1} ≤ 2^{n−1} and f_{n−2} ≤ 2^{n−2}. Then:

f_n = f_{n−1} + f_{n−2}        definition of the Fibonaccis
    ≤ 2^{n−1} + 2^{n−2}        inductive hypotheses
    ≤ 2^{n−1} + 2^{n−1}
    = 2^n.
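The bound is easy to spot-check numerically (a throwaway script, not part of the original solution):

```python
def fib_list(n):
    # the first n Fibonacci numbers, with f_1 = f_2 = 1
    fs = [1, 1]
    while len(fs) < n:
        fs.append(fs[-1] + fs[-2])
    return fs[:n]

# the bound from the solution: f_n <= 2**n for all n >= 1
assert all(f <= 2 ** (i + 1) for i, f in enumerate(fib_list(60)))
```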

6.96 We prove that fibNaive(n − k) appears a total of f_{k+1} times in the call tree for fibNaive(n) by strong induction on k.

• Base case #1 (k = 0): it’s clear that fibNaive(n − 0) appears once, for the call fibNaive(n) itself. Indeed f_1 = 1.
• Base case #2 (k = 1): it’s clear that fibNaive(n − 1) appears once, for the first recursive call fibNaive(n − 1) [which
is made when fibNaive(n) calls fibNaive(n − 1)]. Indeed f_2 = 1.
• Inductive case (k > 1): by the inductive hypothesis, fibNaive(n − (k − 1)) appears f_k times and fibNaive(n − (k − 2))
appears f_{k−1} times. Each of these calls in turn invokes fibNaive(n − k) (and no other call does), so the total number of
calls to fibNaive(n − k) is f_k + f_{k−1} = f_{k+1}.
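The counting claim can be verified by instrumenting a naive Fibonacci recursion. This sketch assumes the base cases fibNaive(1) = fibNaive(2) = 1, and checks arguments down to 2 (the base cases end the recursion there, so the deepest argument is reached less often than the pattern predicts):

```python
def fib_naive_calls(n):
    # tally how many times a naive recursive Fibonacci is invoked on each
    # argument, starting from fibNaive(n); assumes fibNaive(1) = fibNaive(2) = 1
    calls = {}
    def go(m):
        calls[m] = calls.get(m, 0) + 1
        if m > 2:
            go(m - 1)
            go(m - 2)
    go(n)
    return calls

def fib(k):
    # iterative Fibonacci with f_1 = f_2 = 1
    a, b = 1, 1
    for _ in range(k - 1):
        a, b = b, a + b
    return a

# the claim: fibNaive(n - k) appears f_{k+1} times, for arguments down to 2
n = 12
calls = fib_naive_calls(n)
assert all(calls[n - k] == fib(k + 1) for k in range(n - 1))
```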

6.97 The recurrence is T(0) = T(1) = 1 and T(n) = T(n − 1) + 1. Immediately, we have T(n) = n for any n ≥ 1, by
induction on n: it’s true for T(1) = 1, and for larger n we have T(n) = T(n − 1) + 1 = n − 1 + 1 = n.

6.98 The recurrence is T(1) = 1 and T(n) = T(n/2) + 1. Immediately, we have T(n) = 1 + log n, by induction on n: it’s
true for T(1) = 1 = 1 + 0 = 1 + log 1, and for larger n we have

T(n) = T(n/2) + 1 = log(n/2) + 1 + 1 = (log n) − 1 + 1 + 1 = 1 + log n.

6.99 A Python implementation of 2-by-2 matrix powers using repeated squaring is shown in Figure S.6.3.

6.100 The recurrence is, in the worst case:

T(n) = T(n − 1) + 1
T(1) = 1

By induction, we see that T(n) = n: for n = 1, it’s true by definition; for n ≥ 2, we have T(n) = T(n − 1) + 1 =
n − 1 + 1 = n by the inductive hypothesis.

def matrix_multiply_2x2(M, N):
    '''
    Multiply two 2-by-2 matrices.
    '''
    return [[M[0][0]*N[0][0] + M[0][1]*N[1][0], M[0][0]*N[0][1] + M[0][1]*N[1][1]],
            [M[1][0]*N[0][0] + M[1][1]*N[1][0], M[1][0]*N[0][1] + M[1][1]*N[1][1]]]

def matrix_exponentiate_2x2(M, n):
    '''
    Raise the 2-by-2 matrix M to the nth power.
    '''
    if n == 0:
        # M^0 == the identity matrix
        return [[1, 0],
                [0, 1]]
    else:
        # M^n == [M^(n//2)]^2 * M (if n is odd)
        # M^n == [M^(n//2)]^2     (if n is even)
        N = matrix_exponentiate_2x2(M, n // 2)
        if n % 2 == 0:
            return matrix_multiply_2x2(N, N)
        else:
            return matrix_multiply_2x2(M, matrix_multiply_2x2(N, N))

def fibonacci(n):
    fibM = [[1, 1],
            [1, 0]]
    M = matrix_exponentiate_2x2(fibM, n)
    return M[0][0] + M[0][1]
Figure S.6.3 A Python implementation of fibMatrix.



6.101 Let P(n) denote the property that, for all n′ ≥ n, we have T(n) ≤ T(n′) for the recurrence

T(1) = c   and   T(n) = T(⌈n/2⌉) + T(⌊n/2⌋) + cn.

We’ll prove P(n) holds for all n ≥ 1 by strong induction on n.
base case n = 1: Because all terms on the right-hand side of the recurrence relation are positive, we have that T(n′) ≥
cn′ ≥ c = T(1) for every n′ ≥ 1. Thus P(1) holds.
inductive case n ≥ 2: Assume that P(1), P(2), . . . , P(n − 1). Let n′ ≥ n be arbitrary. We must prove T(n) ≤ T(n′). But

T(n) = T(⌈n/2⌉) + T(⌊n/2⌋) + cn
     ≤ T(⌈n′/2⌉) + T(⌊n′/2⌋) + cn′    inductive hypothesis, plus n ≤ n′ implies ⌈n/2⌉ ≤ ⌈n′/2⌉ and ⌊n/2⌋ ≤ ⌊n′/2⌋
     = T(n′).

6.102 The recurrence relation C(1) = 0 and C(n) = 2C(n/2) + n − 1 denotes the number of comparisons performed by mergeSort on an input array of size n because all of the comparisons happen in merge. In the worst case, the two largest elements of the to-be-merged arrays are on opposite sides of the split, which means that we have only one element remaining when we reach the base case of merge's recursive calls (returning X or Y)—which therefore requires n − 1 comparisons to occur (one for every merged element except the last).
We prove that C(1) = 0 and C(n) = 2C(n/2) + n − 1 has the solution C(n) = n log n − n + 1 by (strong) induction on n. (For ease, we'll assume that n is a power of two.)
base case n = 1: C(1) = 0 by definition and 1 log 1 − 1 + 1 = 0 − 1 + 1 = 0.
inductive case n ≥ 2: We assume the inductive hypothesis, specifically that C(n/2) = (n/2) log(n/2) − n/2 + 1. Then:

C(n) = 2C(n/2) + n − 1
     = 2 · [(n/2) log(n/2) − n/2 + 1] + n − 1    inductive hypothesis
     = n log(n/2) − n + 2 + n − 1
     = n · [(log n) − (log 2)] − n + 2 + n − 1    Theorem 2.10.4
     = n · [(log n) − 1] − n + 2 + n − 1    Theorem 2.10.2
     = n log n − n − n + 2 + n − 1
     = n log n − n + 1.
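As a numeric sanity check on this closed form (my own sketch, not part of the printed solution), the recurrence can be evaluated directly in Python and compared with n log n − n + 1 for powers of two:

```python
def C(n):
    # C(1) = 0 and C(n) = 2 C(n/2) + n - 1, for n a power of two
    if n == 1:
        return 0
    return 2 * C(n // 2) + n - 1

# check C(n) == n log n - n + 1 for n = 2^0, ..., 2^10
for j in range(11):
    n = 2 ** j
    assert C(n) == n * j - n + 1   # log2(n) = j here
```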

6.103 The recurrence is T(0) = T(1) = 1 and T(n) = T(n − 2) + 1. We claim that the solution is T(n) = ⌊n/2⌋ + 1. Here's the proof, by induction. For the base cases (n = 0 and n = 1), we have T(0) = 1 = ⌊0/2⌋ + 1 and T(1) = 1 = ⌊1/2⌋ + 1. For the inductive case (n ≥ 2), we have T(n) = T(n − 2) + 1 = ⌊(n − 2)/2⌋ + 1 + 1 by the inductive hypothesis, and ⌊(n − 2)/2⌋ + 2 = ⌊n/2⌋ + 1.

6.104 There are two sticky parts in developing the recurrence: first, how much work is done aside from the recursive call? And, second, how big is n − x? To answer the first question, we can observe that the number of iterations of the while loop is the number of times 1 can be doubled until it exceeds n—which is log n times. To answer the second question, we can observe that certainly n − x < n/2 (because n < 2x, so n/2 < x). In particular, x will be 2^⌊log n⌋. Putting these together, we can write

T(n) = 1                            if n ≤ 1
T(n) = T(n − 2^⌊log n⌋) + log n     otherwise.

Now, we claim that T(n) ≤ 1 + log² n for all n ≥ 1 by strong induction on n. For the base case (n = 1), we have T(1) = 1 ≤ 1 + log² 1 = 1, as required. For the inductive case n ≥ 2:

T(n) = T(n − 2^⌊log n⌋) + log n                           definition of the recurrence
     ≤ max_{k=1,2,...,⌊n/2⌋} [T(k)] + log n               above discussion, because n − x < n/2
     ≤ max_{k=1,2,...,⌊n/2⌋} [1 + log² k] + log n         inductive hypothesis
     ≤ 1 + log²(n/2) + log n                              every k ∈ {1, 2, . . . , ⌊n/2⌋} satisfies k ≤ n/2, so log² k ≤ log²(n/2)
     = 1 + (log n − 1)² + log n                           Theorem 2.10.4 and Theorem 2.10.2
     = log² n − log n + 2                                 multiplying out and combining like terms
     ≤ log² n − 1 + 2                                     for n ≥ 2, we have log n ≥ 1
     = log² n + 1.

6.105 The worst-case behavior is triggered by a number that's one less than a power of two: for example, g(255) → g(255 − 128 = 127), and g(127) → g(127 − 64 = 63), etc. Thus the worst cases are n = 2^i − 1 for any i ≥ 1.
In this case, the recurrence can be more simply expressed as T(n) = log n + T((n − 1)/2). The solution is, asymptotically, Σ_{i=1}^{log n} log(n/2^i), which turns out to be

Σ_{i=1}^{log n} log(n/2^i) = Σ_{i=1}^{log n} log n − Σ_{i=1}^{log n} log(2^i)
                           = log² n − Σ_{i=1}^{log n} i
                           = log² n − (log n)(1 + log n)/2
                           = [2 log² n − log n − log² n]/2
                           = (log² n − log n)/2
                           = Θ(log² n).

6.106 Both f(n) and g(n) compute n mod 2. The key observation that underlies both algorithms is that, for any integer n, we have that n mod 2 = (n − 2k) mod 2 for any integer k. Here f uses k = 1, and g uses the k for which 2k is the largest power of two that's less than n.

6.107 We can write the values of a_k and b_k as

a_k = α^k β^{k−1}    and    b_k = α^k β^k

(where a_k is only defined for k ≥ 1). Here's a proof, by induction on k. For the base case (k = 0), we have b_0 = α^0 β^0 = 1 by definition. For larger k, we have
• a_k = α·b_{k−1} = α · α^{k−1} β^{k−1} by the inductive hypothesis, which means a_k = α^k β^{k−1}, as desired.
• b_k = β·a_k = β · α^k β^{k−1} by the above argument, which means b_k = α^k β^k, again as desired.

6.108 For b_n = (αβ)^n to be Θ(1), we need that αβ = 1—otherwise b_n either grows without bound or gets arbitrarily close to zero. Having α and β satisfy αβ = 1 is necessary and sufficient for the stated condition.

6.5 An Extension: Recurrence Relations of the Form T(n) = aT(n/b) + cn^k
6.109 a = 4, b = 3, k = 2, and so b^k/a = 3^2/4 = 9/4 > 1. Thus we're in Case (iii) and T(n) = Θ(n^2).

6.110 a = 3, b = 4, k = 2, and so b^k/a = 4^2/3 = 16/3 > 1. Thus we're in Case (iii) and T(n) = Θ(n^2).

6.111 a = 2, b = 3, k = 4, and so b^k/a = 3^4/2 = 81/2 > 1. Thus we're in Case (iii) and T(n) = Θ(n^4).

6.112 a = 3, b = 3, k = 1, and so b^k/a = 3^1/3 = 1. Thus we're in Case (ii) and T(n) = Θ(n log n).

6.113 a = 16, b = 4, k = 2, and so b^k/a = 4^2/16 = 1. Thus we're in Case (ii) and T(n) = Θ(n^2 log n).

6.114 a = 2, b = 4, k = 0, and so b^k/a = 4^0/2 = 1/2 < 1. Thus we're in Case (i) and T(n) = Θ(n^{log_4 2}) = Θ(n^{0.5}).

6.115 a = 4, b = 2, k = 0, and so b^k/a = 2^0/4 = 1/4 < 1. Thus we're in Case (i) and T(n) = Θ(n^{log_2 4}) = Θ(n^2).

6.116 a = 3, b = 3, k = 0, and so b^k/a = 3^0/3 = 1/3 < 1. Thus we're in Case (i) and T(n) = Θ(n^{log_3 3}) = Θ(n).

6.117 b^k = 2^2 = 4 > 2 = a, so we're in Case (iii); thus T(n) = Θ(n^2).

6.118 b^k = 2^1 = 2 = a, so we're in Case (ii); thus T(n) = Θ(n log n).

6.119 b^k = 4^2 = 16 > 2 = a, so we're in Case (iii); thus T(n) = Θ(n^2).

6.120 b^k = 4^1 = 4 > 2 = a, so we're in Case (iii); thus T(n) = Θ(n).

6.121 b^k = 2^2 = 4 = a, so we're in Case (ii); thus T(n) = Θ(n^2 log n).

6.122 b^k = 2^1 = 2 < 4 = a, so we're in Case (i); thus T(n) = Θ(n^{log_2 4}) = Θ(n^2).

6.123 b^k = 4^2 = 16 > 4 = a, so we're in Case (iii); thus T(n) = Θ(n^2).

6.124 b^k = 4^1 = 4 = a, so we're in Case (ii); thus T(n) = Θ(n log n).

6.125 a = 4, b = 4, k = 0, and so b^k/a = 4^0/4 = 1/4 < 1. Thus we're in Case (i) and T(n) = Θ(n^{log_4 4}) = Θ(n).
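The case analysis in Exercises 6.109–6.125 is mechanical enough to automate. Here is a small Python helper of my own (the function name and the Θ-strings it returns are not from the text) that applies the three cases of Theorem 6.21:

```python
import math

def master_case(a, b, k):
    """Classify T(n) = a T(n/b) + c n^k by the three cases of Theorem 6.21."""
    if b ** k < a:          # case (i): the leaves dominate
        return f"Theta(n^{math.log(a, b):g})"
    elif b ** k == a:       # case (ii): every level contributes equally
        return f"Theta(n^{k} log n)"
    else:                   # case (iii): the root dominates
        return f"Theta(n^{k})"

# Exercise 6.112 (a = b = 3, k = 1) lands in case (ii):
# master_case(3, 3, 1) == "Theta(n^1 log n)"
```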

6.126 We are considering the recurrences T(n) = aT(n/b) + c · n^k with T(1) = d, and S(n) = aS(n/b) + n^k with S(1) = 1. We claim that

min(c, d, 1) · S(n) ≤ T(n) ≤ max(c, d, 1) · S(n)

for any n ≥ 1. We proceed by strong induction on n. For n = 1, observe that

min(c, d, 1) ≤ d = T(1) ≤ max(c, d, 1).

Because S(1) = 1, the desired inequality holds.

For n ≥ 2, we assume the inductive hypothesis min(c, d, 1) · S(n′) ≤ T(n′) ≤ max(c, d, 1) · S(n′) for any n′ < n. Then, for the upper bound:

T(n) = aT(n/b) + c · n^k                                 definition of T
     ≤ a · max(c, d, 1) · S(n/b) + c · n^k               inductive hypothesis
     ≤ a · max(c, d, 1) · S(n/b) + max(c, d, 1) · n^k    c ≤ max(c, d, 1)
     = max(c, d, 1) · [aS(n/b) + n^k]
     = max(c, d, 1) · S(n).                              definition of S

Symmetrically, for the lower bound:

T(n) = aT(n/b) + c · n^k
     ≥ a · min(c, d, 1) · S(n/b) + min(c, d, 1) · n^k    inductive hypothesis; min(c, d, 1) ≤ c
     = min(c, d, 1) · S(n).

Because min(c, d, 1) and max(c, d, 1) are both positive constants, these inequalities show that T(n) = Θ(S(n)).

6.127 We are considering the recurrence T(n) = aT(n/b) + n^k with the base case T(1) = 1. Let P(j) be the property that T(n) = n^k · Σ_{i=0}^{log_b n} (a/b^k)^i when n = b^j—that is, that T(b^j) = (b^j)^k · Σ_{i=0}^{j} (a/b^k)^i. We'll prove that P(j) holds for all j ≥ 0 by induction on j.

base case (j = 0): Observe that T(b^0) = T(1) = 1 by definition. Indeed, for n = 1, we have log_b n = 0, and so

n^k · Σ_{i=0}^{log_b n} (a/b^k)^i = 1 · (a/b^k)^0 = 1.

inductive case (j ≥ 1): We assume the inductive hypothesis P(j − 1). Then:

T(b^j) = aT(b^j/b) + (b^j)^k                                      definition of T
       = aT(b^{j−1}) + (b^j)^k                                    properties of exponentiation
       = a · [(b^{j−1})^k · Σ_{i=0}^{j−1} (a/b^k)^i] + (b^j)^k    inductive hypothesis
       = (a/b^k) · (b^j)^k · Σ_{i=0}^{j−1} (a/b^k)^i + (b^j)^k    properties of exponentiation
       = (b^j)^k · [1 + Σ_{i=0}^{j−1} (a/b^k)^{i+1}]              factoring
       = (b^j)^k · [(a/b^k)^0 + Σ_{i=1}^{j} (a/b^k)^i]            properties of exponentiation; reindexing
       = (b^j)^k · Σ_{i=0}^{j} (a/b^k)^i,                         folding the extra term into the summation

exactly as desired.

6.128 The recurrence T(n) = aT(n/b) with T(1) = 1 is effectively equivalent to a Theorem 6.21–style recurrence with k = −∞, which means that b^k < a (by a lot!). The solution to the recurrence matches that of case (i) of Theorem 6.21.
Consider the recursion tree. The tree has precisely the same structure as in Figure 6.36; the only difference is that the only work that we have to count is what appears at the leaves. The leaves are at depth log_b n, which means that there are a^{log_b n} = n^{log_b a} leaves in the tree. Each of these contributes a single unit of work, which means that the overall work is Θ(n^{log_b a}), just as in case (i) of Theorem 6.21.

6.129 The total work in the recursion tree is

Σ_{i=0}^{log n} 2^i · (n/2^i) log(n/2^i) = n · Σ_{i=0}^{log n} log(n/2^i)
                                         = n · Σ_{i=0}^{log n} [(log n) − i]    log properties
                                         = n · Σ_{j=0}^{log n} j                reindexing: j := (log n) − i
                                         = n · (log n)(1 + log n)/2             Theorem 5.3
                                         = Θ(n log² n).

6.130 The proof is by induction on n.
For the base case (n = 1), by definition T(1) = 1 and indeed 2^1 − 1 = 2 − 1 = 1.
For the inductive case (n ≥ 2), we assume the inductive hypothesis, namely that T(n − 1) = 2^{n−1} − 1. Then we have

T(n) = 2T(n − 1) + 1           definition of T
     = 2 · (2^{n−1} − 1) + 1   inductive hypothesis
     = 2^n − 2 + 1
     = 2^n − 1.
√ √
6.131 The recurrence is T(n) = 2T(√n) + √n and T(1) = T(2) = · · · = T(100) = 1. The intuition is extremely similar to Theorem 6.21 (the "additional √n work" is way larger than the recursive calls; the root dominates), but the recurrence isn't of the form in Theorem 6.21. So we'll give the proof "by hand."
Let's prove that T(n) ≤ 3√n by induction on n: for n ≤ 100, it's immediate. For n > 100, we have

T(n) = 2T(√n) + √n
     ≤ 2 · 3√(√n) + √n    inductive hypothesis
     ≤ 2√n + √n           for n > 81, we have 6n^{1/4} < 2n^{1/2}
     = 3√n.

It's immediate that T(n) ≥ √n for n ≥ 100 by inspection. Thus T(n) = Θ(√n).

6.132 Following the hint, we see that R(k) = R(k/2) + 1 and R(1) = Θ(1). By Theorem 6.21 (a = 1, b = 2, k = 0;
case (ii)), R(k) = Θ(log k). Thus T(2k ) = Θ(log k) and T(u) = Θ(log log u).
7 Number Theory

7.2 Modular Arithmetic


7.1 d = 11 and r = 15

7.2 d = 20 and r = 37

7.3 d = −20 − 1 = −21 and r = 99 − 37 = 62

7.4 Let k ≥ 1 and n be integers, and suppose that there exist integers r, r′ , d, and d′ such that 0 ≤ r < k and 0 ≤ r′ < k
and n = dk + r = d′ k + r′ . Because dk + r = d′ k + r′ , we know that

(d − d′ )k = r′ − r. (∗ )

Because 0 ≤ r < k and 0 ≤ r′ < k, we know that |r′ − r| < k. Thus by (∗) we know that |r′ − r| = |d − d′ |k, and thus
|d − d′ | < 1. Because d and d′ are integers, in fact d = d′ . Therefore, by (∗), we have that r′ − r = 0 · k = 0; that is,
r′ = r.

7.5 Here is a version of baseConvert that uses only addition, subtraction, mod, multiplication, and comparison:

baseConvert(n, b):
Input: integers n and b ≥ 2
Output: the representation of n in base b
1 i := 0
2 b′ := 1
3 while n > 0:
4    d_i := 0
5    while n mod (b · b′) ≠ 0:
6       d_i := d_i + 1
7       n := n − b′
8    b′ := b′ · b
9    i := i + 1
10 return [d_{i−1} d_{i−2} · · · d_1 d_0]_b
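A direct Python transcription of this pseudocode (my own sketch; a list of digits stands in for the bracketed base-b notation) makes the routine easy to test:

```python
def base_convert(n, b):
    """Return the base-b digits of n (most significant first), using
    only addition, subtraction, multiplication, mod, and comparison."""
    if n == 0:
        return [0]
    digits = []      # digits[i] holds d_i
    bprime = 1       # b' = b^i, the place value of the current digit
    while n > 0:
        d = 0
        # peel off d_i copies of b^i until the low-order digit is zero
        while n % (b * bprime) != 0:
            d = d + 1
            n = n - bprime
        digits.append(d)
        bprime = bprime * b
    return digits[::-1]
```

For example, base_convert(5, 2) returns [1, 0, 1], the binary representation of 5.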

7.6 Set b = n − 1. Then n = [11]_{n−1}, because [11]_{n−1} = 1 · (n − 1)^0 + 1 · (n − 1)^1 = 1 + n − 1 = n. We needed n ≥ 3 so that n − 1 was a valid base (that is, so that n − 1 ≥ 2).

7.7 Set b = n/2 − 1. Then 2 + 2 · [n/2 − 1] = 2 + n − 2 = n, so n = [22]_{n/2−1}. If n > 6, then the base n/2 − 1 > 3 − 1 = 2, which means [22]_{n/2−1} is a legitimate representation. (An alternative phrasing: use Exercise 7.6 on the integer n/2, and then double each entry in the representation. Doing so is legitimate as long as the base allows doubling of representations, which is true as long as n/2 − 1 > 2.)
7.8 Consider any integer n such that n = [22 · · · 2]_b. Then n = Σ_{i=0}^{k} 2 · b^i = 2 · Σ_{i=0}^{k} b^i, which is an even number.

7.9 Every integer n can be written n = [11]_{n−1} (see Exercise 7.6), so every n is a repdigit_{n−1}. Thus the question is: for what n is n also a repdigit_b for b ∈ {2, . . . , n − 2}, say n = [p · · · p]_b?


If n = [p · · · p]_b, then p | n (see Exercise 7.8). Furthermore, if pq = n for integers p, q ≠ 1 with p ≤ q, then n = p(q − 1) + p(1)—that is, n = [pp]_{q−1} (if [pp] is a legitimate number in base q − 1, which happens if p < q − 1). Therefore the numbers for which R(n) = 1 are:

• primes (there is no integer p > 1 with n = pq and q ≥ p);
• squares of primes (there is no integer p > 1 with n = pq and q > p, so [pp]_{q−1} isn't legitimate because p ≮ q); and
• the number 6 (no p > 1 with n = pq and q > p + 1 exists, because 6 is the only integer that's the product of two consecutive integers that are both prime, so [pp]_{q−1} isn't legitimate because p ≮ q − 1).

These values of n have R(n) = 1—otherwise we’d have found a factor of prime n, which is impossible!—and no other
number fails, because the only way to fail to have two distinct proper factors that differ by more than 1 is to have no
proper factors (primes), no distinct proper factors (squares of primes), or to have as the only distinct proper factors two
consecutive primes (2 and 3).

7.10 Here is the pseudocode:

mod-and-div(n, k):
Input: integers n and k ≥ 1
Output: n mod k and ⌊ nk ⌋
1 r := |n|; d := 0
2 while r ≥ k:
3 r := r − k; d := d + 1
4 if n ≥ 0 then
5 return r, d
6 else if r ̸= 0 then
7 return k − r, −d − 1
8 else
9 return 0, −d

7.11 Use the following algorithm for averaging, instead of simply computing ⌊(a + b)/2⌋:

average(a, b):
Input: integers a and b ≥ a
Output: ⌊(a + b)/2⌋
1 while a < b:
2 d := 1
3 while b − a > 4d:
4 d := d · 2
5 a := a + d
6 b := b − d
7 return b
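Translated into Python (my own sketch), the routine can be checked exhaustively against ⌊(a + b)/2⌋ on a small range:

```python
def average(a, b):
    """Compute floor((a + b) / 2) without ever forming a + b,
    following the pseudocode above."""
    while a < b:
        d = 1
        while b - a > 4 * d:
            d = d * 2
        a = a + d    # move the endpoints toward each other;
        b = b - d    # their midpoint is unchanged
    return b

# exhaustive check on a small range
for a in range(50):
    for b in range(a, 50):
        assert average(a, b) == (a + b) // 2
```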

7.12 We insist that hi satisfy hi · k > n. Because k ≥ 1 and r ≥ 0, we know that the correct answer d satisfies d ≤ n. Thus
the binary search starts with a range that includes the correct answer d. (We had to choose n + 1 instead of n to handle
the case of k = 1.)

7.13 Here is the pseudocode:



mod-and-div-faster(n, k):
Input: integers n ≥ 0 and k ≥ 1
Output: n mod k and ⌊ nk ⌋
1 d := 0; lo := 0; hi := 1.
2 while hi · k < n:
3 hi := hi · 2
4 while lo < hi − 1:
5 mid := ⌊(lo + hi)/2⌋
6 if mid · k ≤ n then
7 lo := mid
8 else
9 hi := mid
10 return (n − k · lo), lo

7.14 For n = k − 1, the original version will require log n iterations—but in fact setting hi = 1 suffices. Thus the difference
is between Θ(1) iterations and Θ(log n) = Θ(log k) iterations.

7.15 All three implementations in Python are shown in Figure S.7.1, along with Python’s own implementation.

7.16 On my laptop, the tweaked doubling-mod-and-div-faster and mod-and-div-faster are roughly comparable on the first two inputs, 2^32 mod 202 and 2^32 mod 2020, but the tweaked version is substantially faster on 2^32 mod 3^15, by a factor of a bit more than two. Both are thousands of times faster than mod-and-div on 2^32 mod 202 and 2^32 mod 2020. (There's not as much difference on 2^32 mod 3^15.) (Incidentally, the internal Python algorithms are perhaps 100 times faster than doubling-mod-and-div-faster.)
There's a clear ordering of these algorithms' speed on these three inputs, with the two versions of mod-and-div-faster varying as to which is better, and the size of the gaps depending on the input. From slowest to fastest, the algorithms are mod-and-div, then mod-and-div-faster or doubling-mod-and-div-faster, and then the internal Python algorithms.

7.17 We can write a = ck + r and b = dk + t for r, t ∈ {0, . . . , k − 1} (as guaranteed by Theorem 7.1). Observe that
r = a mod k and t = b mod k. Now, observe that
mod-and-div((c + d)k + r + t, k) = mod-and-div(r + t, k)
by inspection of the algorithm: after c + d iterations of the while loop in the first call, we have reduced the input by
(c + d)k. Thus
mod-and-div(a + b, k) = mod-and-div([a mod k] + [b mod k], k).

1 def mod_and_div(n, k):
2     r = n
3     d = 0
4     while r >= k:
5         r = r - k
6         d = d + 1
7     return r, d
8
9 def mod_and_div_faster(n, k):
10     lo = 0
11     hi = n
12     while lo + 1 < hi:
13         mid = (lo + hi) // 2
14         if mid * k <= n:
15             lo = mid
16         else:
17             hi = mid
18     return n - lo * k, lo
19 def mod_and_div_double(n, k):
20     lo = 0
21     hi = 1
22     while hi * k <= n:
23         hi = hi * 2
24     while lo + 1 < hi:
25         mid = (lo + hi) // 2
26         if mid * k <= n:
27             lo = mid
28         else:
29             hi = mid
30     return n - lo * k, lo
31
32 def mod_and_div_internal(n, k):
33     # Python's own implementations.
34     return n % k, n // k

Figure S.7.1 Computing n mod k and ⌊n/k⌋.



By Lemma 7.3, then, we have


(a + b) mod k = ([a mod k] + [b mod k]) mod k.

7.18 Write a = d(bc) + r for r ∈ {0, 1, . . . , bc − 1}, so that a mod bc = r. (Such a d and r exist by Theorem 7.1.) Thus

a mod b = [d(bc) + r] mod b                  a = d(bc) + r
        = ([dbc mod b] + [r mod b]) mod b    Exercise 7.17
        = (0 + [r mod b]) mod b              (dc)b is a multiple of b
        = r mod b                            0 + x = x, and (x mod b) mod b = x mod b
        = (a mod bc) mod b.                  a mod bc = r by previous discussion

7.19 We must show that 0 mod a = 0 for any a, or, equivalently, that there exists an integer d such that da + 0 = 0.
Simply choose d = 0.

7.20 We must show that a mod 1 = 0 for any a, or, equivalently, that there exists an integer d such that d · 1 + 0 = a.
Simply choose d = a.

7.21 Suppose a | b and a | c. That is, b mod a = 0 and c mod a = 0. But then
(b + c) mod a = ([b mod a] + [c mod a]) mod a Theorem 7.4.2
= (0 + 0) mod a by assumption
= 0.
Thus (b + c) mod a = 0 and, equivalently, a | (b + c).

7.22 Suppose a | c—that is, c mod a = 0. But then


bc mod a = ([b mod a] · [c mod a]) mod a Theorem 7.4.3
= ([b mod a] · 0) mod a by assumption
= 0.
Thus bc mod a = 0 and, equivalently, a | bc.

7.23 Here’s a trace of mod-exp(3, 80, 5):


2
mod-exp(3, 80, 5) = [mod-exp(3, 40, 5)] mod 5
2
mod-exp(3, 40, 5) = [mod-exp(3, 20, 5)] mod 5
2
mod-exp(3, 20, 5) = [mod-exp(3, 10, 5)] mod 5
2
mod-exp(3, 10, 5) = [mod-exp(3, 5, 5)] mod 5
mod-exp(3, 5, 5) = [3 · mod-exp(3, 4, 5)] mod 5
2
mod-exp(3, 4, 5) = [mod-exp(3, 2, 5)] mod 5
2
mod-exp(3, 2, 5) = [mod-exp(3, 1, 5)] mod 5
mod-exp(3, 1, 5) = [3 · mod-exp(3, 0, 5)] mod 5
mod-exp(3, 0, 5) = 1
mod-exp(3, 1, 5) = [3 · 1] mod 5 = 3
2
mod-exp(3, 2, 5) = 3 mod 5 = 4
2
mod-exp(3, 4, 5) = 4 mod 5 = 1
mod-exp(3, 5, 5) = [3 · 1] mod 5 = 3
2
mod-exp(3, 10, 5) = 3 mod 5 = 4
2
mod-exp(3, 20, 5) = 4 mod 5 = 1
126 Number Theory

2
mod-exp(3, 40, 5) = 1 mod 5 = 1
2
mod-exp(3, 80, 5) = 1 mod 5 = 1.

Thus 380 mod 5 = 1.

7.24 The recurrence is T(0) = 0 and

T(e) = T(e − 1) + 1    if e is odd
T(e) = T(e/2) + 1      if e is even.

Or, noting that e − 1 is even whenever e is odd:

T(e) = T((e − 1)/2) + 2    if e is odd
T(e) = T(e/2) + 1          if e is even.

Define the recurrences L(e) and U(e)—the names stand for a lower and an upper bound on T(e)—as L(e) = L(e/2) + 1 and U(e) = U(e/2) + 2, where L(1) = U(1) = 1.
Thus L(e) ≤ T(e) ≤ U(e). (We could prove this fact rigorously by induction; we'll omit the proof here, but it's a good exercise to go through this inductive argument.) The solutions to these lower- and upper-bound recurrences are L(e) = 1 + log e and U(e) = 1 + 2 log e, and thus T(e) is between roughly log e and 2 log e—that is, T(e) = Θ(log e).

7.25 Here’s the code in Python:


1 def modexp(base, exponent, mod):
2 '''Computes b^e mod n via repeated squaring.'''
3 if exponent == 0:
4 return 1
5 half = modexp(base, exponent // 2, mod) % mod
6 if exponent % 2 == 1:
7 return ((base % mod) * half * half) % mod
8 else:
9 return ((half % mod) * half) % mod
10
11 def modexp_slow(base, exponent, mod):
12 '''Computes b^e mod n via built-in mods.'''
13 result = base ** exponent
14 return result % mod

The running time starts to become distinguishable as k gets bigger; on my laptop, it’s always less than 0.00002 seconds
for the faster algorithm, but the slower algorithm takes about 0.3 seconds for k = 800,000 and about 1.25 seconds for
k = 8,000,000.

7.26 First, as suggested, observe that

10^i mod 3 = (10 mod 3)^i mod 3 = 1^i mod 3 = 1 mod 3 = 1.

Now note that

[Σ_{i=0}^{n−1} 10^i x_i] mod 3 = [Σ_{i=0}^{n−1} (10^i x_i mod 3)] mod 3                     Theorem 7.4.2
                               = [Σ_{i=0}^{n−1} ((10^i mod 3)(x_i mod 3) mod 3)] mod 3      Theorem 7.4.3
                               = [Σ_{i=0}^{n−1} (1 · (x_i mod 3) mod 3)] mod 3              previous claim
                               = [Σ_{i=0}^{n−1} (x_i mod 3)] mod 3                          a mod k mod k = a mod k
                               = [Σ_{i=0}^{n−1} x_i] mod 3.                                 Theorem 7.4.2
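The congruence is easy to spot-check in Python (a quick illustration of mine, not part of the printed solution):

```python
def digit_sum(n):
    # the sum of the decimal digits x_i of n
    return sum(int(c) for c in str(n))

# a number and its digit sum leave the same remainder mod 3
for n in range(10000):
    assert n % 3 == digit_sum(n) % 3
```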

7.27 We must show that

[Σ_{i=0}^{n−1} 10^i x_i] mod 9 = [Σ_{i=0}^{n−1} x_i] mod 9.

As in the previous exercise,

10^i mod 9 = (10 mod 9)^i mod 9 = 1^i mod 9 = 1 mod 9 = 1.

Precisely the same argument as in the previous exercise now goes through:

[Σ_{i=0}^{n−1} 10^i x_i] mod 9 = [Σ_{i=0}^{n−1} (10^i x_i mod 9)] mod 9                     Theorem 7.4.2
                               = [Σ_{i=0}^{n−1} ((10^i mod 9)(x_i mod 9) mod 9)] mod 9      Theorem 7.4.3
                               = [Σ_{i=0}^{n−1} (1 · (x_i mod 9) mod 9)] mod 9              previous claim
                               = [Σ_{i=0}^{n−1} (x_i mod 9)] mod 9                          a mod k mod k = a mod k
                               = [Σ_{i=0}^{n−1} x_i] mod 9.                                 Theorem 7.4.2

7.28 Here’s the execution of the Euclidean algorithm on n = 111, m = 202:

Euclid(111, 202) = Euclid( 202 mod 111 , 111) = Euclid( 111 mod 91 , 91)
= 91 = 20
= Euclid( 91 mod 20 , 20) = Euclid( 20 mod 11 , 11)
= 11 =9
= Euclid( 11 mod 9 , 9) = Euclid( 9 mod 2 , 2) = 1.
=2 =1

7.29 Here’s the execution of the Euclidean algorithm on n = 333, m = 2017:

Euclid(333, 2017) = Euclid( 2017 mod 333 , 333) = Euclid( 333 mod 19 , 19)
= 19 = 10
= Euclid( 19 mod 10 , 10) = Euclid( 10 mod 9 , 9) = 1.
=9 =1

7.30 Here’s the execution of the Euclidean algorithm on n = 156, m = 360:

Euclid(156, 360) = Euclid( 360 mod 156 , 156) = Euclid( 156 mod 48 , 48) = 12.
= 48 = 12

7.31 Here’s an implementation in Python:


1 def euclid(n, m):
2 '''Computes gcd(n,m) via Euclid's algorithm.'''
3 if m % n == 0:
4 return n
5 else:
6 return euclid(m % n, n)

7.32 Here’s an implementation in Python:


7 def brute_force_GCD(n, m):
8 '''Computes gcd(n,m) by brute force.'''
9 best = 1
10 for d in range(1, min(n, m) + 1):
11 if n % d == 0 and m % d == 0:
12 best = d
13 return best

On an aging laptop of mine, it's pretty consistently around n = 2^23 that GCD(n, n − 1) begins to take more than a second using the brute-force algorithm. The Euclidean algorithm starts to take more than a second on an input of size roughly 2^30,000,000 (!).

7.33 Let n and m ≥ n be arbitrary positive integers. We'll show that m mod n ≤ m/2 in the following two (exhaustive) cases:
Case 1: n ≤ m/2. It is always the case that x mod n ≤ n for any x, so therefore m mod n ≤ n ≤ m/2.
Case 2: m/2 < n ≤ m. Observe that m/n ≥ 1 (because n ≤ m) and m/n < 2 (because m/2 < n). Therefore m mod n = m − n, and we have m mod n < m − m/2 = m/2.

7.34 Let T(k) denote the depth of recursion that the Euclidean algorithm requires when given two inputs whose product is k. Observe that

T(1) = 1
T(nm) = 1 + T(n · (m mod n)).

What we established in the previous exercise implies that n · (m mod n) ≤ nm/2. Thus we can bound this recurrence as T(1) = 1 and T(nm) ≤ 1 + T(nm/2). Thus T(nm) = O(log(nm)) = O(log n + log m) using Theorem 6.21.

7.35 We must show that f_n mod f_{n−1} = f_{n−2} for n ≥ 3:

f_n mod f_{n−1} = [f_{n−1} + f_{n−2}] mod f_{n−1}                            definition of the Fibonaccis
                = [f_{n−1} mod f_{n−1} + f_{n−2} mod f_{n−1}] mod f_{n−1}    Theorem 7.4.2
                = [0 + f_{n−2} mod f_{n−1}] mod f_{n−1}                      Theorem 7.4.1
                = f_{n−2}.                                                   f_{n−2} < f_{n−1}, and a mod b = a for a < b

7.36 Observe that, for n ≥ 3, we have that Euclid(f_{n−1}, f_n) results in a recursive call Euclid(f_n mod f_{n−1}, f_{n−1})—which, by the previous exercise, is Euclid(f_{n−2}, f_{n−1}). We claim by induction on n that Euclid(f_{n−1}, f_n) results in a recursion tree of depth n − 2.

• for n = 3, indeed Euclid(f_2, f_3) results in a call of depth 1 (because f_2 = 1, no recursive call is made).
• for n ≥ 4, Euclid(f_{n−1}, f_n) results in a call of depth 1 more than the depth of the recursion tree for Euclid(f_{n−2}, f_{n−1}), which (by the inductive hypothesis) is (n − 1) − 2. Thus Euclid(f_{n−1}, f_n) has a depth of 1 + (n − 1) − 2 = n − 2.

7.37 For any integer k, there exist n, m ≤ 2^k for which Euclid(n, m) takes k − 2 steps: specifically, n = f_{k−1} and m = f_k. Thus the running time on input ⟨n, m⟩ can be as bad as k − 2 ≥ (log n + log m)/2 − 2 = Ω(log n + log m). Because k was generic, the claim follows.
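This worst case is easy to observe empirically. Here is a sketch of mine (the call counter is not part of the text's Euclid) confirming that consecutive Fibonacci inputs force a recursion depth of exactly n − 2, as in Exercise 7.36:

```python
def euclid_calls(n, m):
    """Run Euclid's algorithm on (n, m), counting the calls made."""
    calls = 1
    while m % n != 0:
        n, m = m % n, n
        calls += 1
    return calls

# Fibonacci numbers f_1, f_2, ... = 1, 1, 2, 3, 5, ...
fibs = [1, 1]
while len(fibs) < 25:
    fibs.append(fibs[-1] + fibs[-2])

# Euclid(f_{n-1}, f_n) makes exactly n - 2 calls
for n in range(3, 25):
    assert euclid_calls(fibs[n - 2], fibs[n - 1]) == n - 2
```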

7.3 Primality and Relative Primality


7.38 Here is pseudocode for the Sieve of Eratosthenes:

sieve(n):
1 P := ⟨⟩
2 N := ⟨2, 3, . . . , n⟩
3 while N is not empty:
4 p := first number in N
5 remove p from N; add p to the end of P.
6 for i in N:
7 if i mod p = 0 then
8 remove i from N
9 return P

7.39 Here is a trace of sieve(100) (with the number of completed iterations of the while loop indicated):
iterations P N
0 ⟨⟩ ⟨2, 3, 4, 5, 6, 7, 8, 9, . . . , 99, 100⟩
1 ⟨2⟩ ⟨3, 5, 7, 9, 11, 13, 15, 17, . . . , 97, 99⟩
2 ⟨2, 3⟩ ⟨5, 7, 11, 13, 17, 19, 23, 25, 29, 31, 35, 37, 41, 43, 47, 49, 53, 55, 59, 61, 65, 67,
71, 73, 77, 79, 83, 85, 89, 91, 95, 97⟩
3 ⟨2, 3, 5⟩ ⟨7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 49, 53, 59, 61, 67, 71, 73, 77, 79, 83,
89, 91, 97⟩
4 ⟨2, 3, 5, 7⟩ ⟨11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97⟩
5 ⟨2, 3, 5, 7, 11⟩ ⟨13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97⟩
6 ⟨2, 3, 5, 7, 11, 13⟩ ⟨17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97⟩
..
.
25 ⟨2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97⟩

7.40 See Figure S.7.2 (particularly the sieve_of_eratosthenes function).

7.41 See Figure S.7.2 (and particularly the brute_force_prime_finder function). On my laptop for n in the range
100,000 ≤ n ≤ 500,000, as I implemented it in the previous exercise the Sieve is slower than brute force by a factor
of about 20 to about 50, generally increasing with n. (The “list comprehension” operation in Python, which I used in my
Sieve implementation, is pretty slow.)

1 def sieve_of_eratosthenes(n):
2 candidates = list(range(2, n+1))
3 primes = []
4 while len(candidates) >= 1:
5 p = candidates.pop(0)
6 primes.append(p)
7 candidates = [n for n in candidates if n % p != 0]
8 return primes
9
10 def brute_force_prime_finder(n):
11 candidates = range(2, n+1)
12 primes = []
13 for i in candidates:
14 prime = True
15 for d in range(2, int(i**0.5) + 1):
16 if i % d == 0:
17 prime = False
18 break
19 if prime:
20 primes.append(i)
21 return primes

Figure S.7.2 Two ways of computing the primes up to n.



7.42 Imagine that we do an iteration of the Sieve for each integer p = 2, 3, . . . , n, including when p is a composite. (When p is not prime, the iteration vacuously does nothing.) The number of crossings-off for the pth iteration is precisely a p-fraction of all candidates larger than p, because a p-fraction of integers are divisible by p. (That is, one out of every p numbers is crossed off: p, 2p, 3p, . . . .) There are at most n − p candidates larger than p. Thus

the total number of crossings-off = Σ_{p=2}^{n} (the number of crossings-off in iteration p)
                                  ≤ Σ_{p=2}^{n} (n − p)/p    a p-fraction of n − p candidates is precisely (n − p)/p
                                  ≤ Σ_{p=1}^{n} n/p
                                  = n · H_n.
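The bound can be checked numerically with a sketch of mine that over-counts the way the argument does, charging every p from 2 to n for all of its multiples:

```python
def crossings_upper(n):
    # iteration p crosses off the multiples 2p, 3p, ..., i.e. n//p - 1 numbers
    return sum(n // p - 1 for p in range(2, n + 1))

n = 1000
H_n = sum(1 / i for i in range(1, n + 1))   # the harmonic number H_n
assert crossings_upper(n) <= n * H_n
```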

7.43 We’ll estimate as follows:

the number of primes between 2


127
+ 1 and 2128 = primes(2128 ) − primes(2127 ) definition of primes
128 127
2 2
≈ − prime number theorem
ln(2128 ) ln(2127 )
2128 2127
≈ −
88.7228 88.0296
= 1.9025 × 1036 .

7.44 According to the estimate from the Prime Number Theorem, the number of primes less than or equal to 3.1734 × 10^40 is approximately

3.1734 × 10^40 / ln(3.1734 × 10^40) ≈ 3.1734 × 10^40 / 93.2582 ≈ 2^127.9999989.

(I found this number with binary search, using the Prime Number Theorem's estimate to determine whether my current guess was too high or too low.)

7.45 Note that primes(n) − primes(n − 1) represents the number of primes ≤ n less the number of primes ≤ n − 1—which is 1 if n is prime, and 0 if not. Thus

Pr[a randomly chosen number close to n is prime] ≈ primes(n) − primes(n − 1)
                                                 ≈ n/ln n − (n − 1)/ln(n − 1)
                                                 ≈ n/ln n − (n − 1)/ln n        (for large n)
                                                 = 1/ln n.
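Here is a quick empirical check of this estimate (my own sketch, using trial-division primality testing):

```python
import math

def is_prime(n):
    if n < 2:
        return False
    return all(n % d != 0 for d in range(2, int(n ** 0.5) + 1))

# fraction of primes in a window around n, versus 1 / ln n
n = 100000
window = range(n - 5000, n + 5000)
density = sum(1 for m in window if is_prime(m)) / len(window)
assert abs(density - 1 / math.log(n)) < 0.01
```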

7.46 The number of 6-digit primes is

primes(999,999) − primes(99,999) ≈ 999,999/ln 999,999 − 99,999/ln 99,999
                                 ≈ 72,382 − 8,686
                                 = 63,696.

By running the sieve to find all primes ≤ 1000 = √1,000,000 and then testing each 6-digit integer for divisibility by each of these primes, I get that there are 68,906 6-digit primes—off by about 8%. (Rerunning the same calculation for 8-digit primes yields an estimate of 4,808,260 8-digit primes from the Prime Number Theorem, and an actual number of 5,096,876—off by about 6%.)

7.47 Because p is prime, by definition the only positive integers that divide p are 1 and p itself. By assumption, p ∤ a. Thus the only candidate common divisor of p and a is 1, so GCD(p, a) = 1.

7.48 Let Q(k) denote the property that p | a^k if and only if p | a. We show that Q(k) holds for all k ≥ 1 by induction on k. For the base case k = 1, vacuously p | a ⇔ p | a^1 because a = a^1. For the inductive case k ≥ 2, we assume the inductive hypothesis Q(k − 1) and we must prove Q(k). We have

p | a^k ⇔ p | a^{k−1} or p | a    Lemma 7.19
        ⇔ p | a or p | a          inductive hypothesis
        ⇔ p | a.                  x ∨ x ≡ x ("the idempotence of or")

7.49 Because n ≤ p − 1 and m ≤ p − 1, we can write n = q_1 q_2 · · · q_k and m = q_{k+1} q_{k+2} · · · q_ℓ where each q_i is a prime number satisfying q_i < p. Because the prime factorization of nm is unique and p ∉ {q_1, q_2, . . . , q_ℓ}, nm is not divisible by p.

7.50 First, suppose m ≡_{pq} a. That is, m = kpq + a for some integer k. But then m mod p = (kpq + a) mod p = 0 + a mod p, and similarly m mod q = (kpq + a) mod q = 0 + a mod q.
For the converse, suppose m ≡_p a and m ≡_q a. That is, m − a ≡_p 0 and m − a ≡_q 0. Consider the prime factorization of m − a: p must appear in the prime factorization (any number divisible by a prime p is a multiple of p), and similarly q must appear. Thus we can write m − a = pqℓ for an integer ℓ. But then (m − a) mod pq = pqℓ mod pq = 0. And because (m − a) mod pq = 0, we have m ≡_{pq} a.

7.51 Note that a^2 − 1 is a polynomial of degree 2. By the fact described on p. 357 (the fundamental theorem of algebra for prime numbers: two distinct polynomials of degree k agree mod p on at most k inputs), either a^2 − 1 mod p = 0 for all a ∈ {0, 1, . . . , p − 1} or there are at most two solutions to a^2 − 1 mod p = 0. Because 0^2 − 1 mod p = −1 mod p = p − 1 ≠ 0, the former does not hold, so there are at most two solutions to a^2 − 1 mod p = 0. Indeed 1^2 − 1 ≡_p 1 − 1 ≡_p 0 and (p − 1)^2 − 1 ≡_p p^2 − 2p ≡_p 0, so a ∈ {1, p − 1} are the only solutions.

7.52 No, they’re both divisible by 3.

7.53 209 mod 2 = 1, 209 mod 3 = 2, 209 mod 5 = 4, 209 mod 7 = 6, 209 mod 11 = 0—in fact, 209 = 11 · 19. So the only candidate common divisors are 11, 19, and 209. 11 ∤ 323, but 19 | 323 (in fact 323 = 17 · 19). So 209 and 323 are not relatively prime.

7.54 101 is prime, so unless 101 | 1100 the numbers are relatively prime. And 1100 mod 101 = 90 ̸= 0.

7.55 Here is the trace of eeuclid(60, 93):

eeuclid(60, 93)
eeuclid(33, 60)
eeuclid(27, 33)
eeuclid(6, 27)
eeuclid(3, 6) = 1, 0, 3
eeuclid(6, 27) = −4, 1, 3
eeuclid(27, 33) = 5, −4, 3
eeuclid(33, 60) = −9, 5, 3
eeuclid(60, 93) = 14, −9, 3

Indeed 14 · 60 − 9 · 93 = 3 = GCD(60, 93).



7.56 Here is the trace of eeuclid(24, 28):


eeuclid(24, 28)
eeuclid(4, 24) = 1, 0, 4
eeuclid(24, 28) = −1, 1, 4

Indeed −1 · 24 + 1 · 28 = 4 = GCD(24, 28).

7.57 Here is the trace of eeuclid(13, 74):


eeuclid(13, 74)
eeuclid(9, 13)
eeuclid(4, 9)
eeuclid(1, 4) = 1, 0, 1
eeuclid(4, 9) = −2, 1, 1
eeuclid(9, 13) = 3, −2, 1
eeuclid(13, 74) = −17, 3, 1

Indeed −17 · 13 + 3 · 74 = 1 = GCD(13, 74).

7.58 Suppose that xn + ym = GCD(n, m). Let x′ = x + km and y′ = y − kn, for any nonnegative integer k. Then

x′n + y′m = (x + km)n + (y − kn)m
          = xn + kmn + ym − knm
          = xn + ym
          = GCD(n, m).

From a single pair ⟨x, y⟩, then, we can construct infinitely many pairs ⟨x′, y′⟩ with the desired property (one for each k).

7.59 We’ll prove the result by induction on k. If k = 2, then we’re done immediately, by Lemma 7.17.
P
For k ≥ 3, we must show that if GCD(a1 , . . . , ak ) = d, then there exist integers x1 , . . . , xk such that ki=1 ai xi = d.
Let d = GCD(a2 , . . . , ak ). Let d = GCD(a1 , . . . , ak ) = GCD(a1 , GCD(a2 , . . . , ak )) = GCD(a1 , d′ ) by definition.

By Lemma 7.17, there exist integers n, m such that na1 + md′ = GCD(a1 , d′ ) = d. By the inductive hypothesis, there
P
exist integers y2 , . . . , yk such that ki=2 ai yi = d′ . So define
x1 = n
x2 = my2
x3 = my3
..
.
xk = myk .
Then we have
X
k X k
ai xi = na1 + m i = 2 ai yi
i=1

= na1 + md′
= d.

7.60 We claim that, for arbitrary positive integers n and m with n ≤ m, extended-Euclid(n, m) = ⟨x, y, r⟩ such that
r = GCD(n, m) = xn + ym. Here's a proof by strong induction on n.
For the base case (n = 1), indeed GCD(1, m) = 1, and 1 · 1 + 0 · m = 1.
For the inductive case (n ≥ 2), just as in the base case we're done if m mod n = 0: indeed GCD(n, kn) = n, and
1 · n + 0 · kn = n. Otherwise, extended-Euclid(n, m) = ⟨y − ⌊m/n⌋ x, x, r⟩ where extended-Euclid(m mod n, n) =
⟨x, y, r⟩. By the inductive hypothesis, r = GCD(m mod n, n), and GCD(m mod n, n) = GCD(n, m) by Corollary 7.11.
Again by the inductive hypothesis, r = x(m mod n) + yn. Note that

(y − ⌊m/n⌋ x) · n + x · m = yn − x [⌊m/n⌋ n − m]
= yn − x [(m − (m mod n)) − m]     ⌊b/a⌋ is [b − (b mod a)]/a, so a ⌊b/a⌋ is b − (b mod a)
= yn + x(m mod n)
= r.     inductive hypothesis

7.61 Here is a solution in Python:


def extended_euclidean_algorithm(n, m):
    '''Computes x, y, r such that gcd(n, m) = r = xn + ym, via the
    extended Euclidean algorithm.'''
    if m % n == 0:
        return 1, 0, n
    else:
        x, y, r = extended_euclidean_algorithm(m % n, n)
        return y - (m // n) * x, x, r

7.62 Nikki gives me eight 5NZD bills; I give her four 5USD bills. She has thus given me the equivalent of 24USD (= 3 · 8)
and I have given her 20USD (= 5 · 4). Thus she’s paid me 4USD.

7.63 Nikki can pay me any integral value x. She gives me 2x 5NZD bills; I give her x 5USD bills. She has thus given me
the equivalent of (6x)USD and I have given her (5x)USD. Thus she’s paid me (6x − 5x)USD = xUSD.

7.64 Nikki can pay me any integral value x. Run the extended Euclidean algorithm extended-Euclid(b, 5) to find n and
m such that bn + 5m = GCD(b, 5) = 1. (Because 5 is prime and 5 ̸ | b by assumption, we have that GCD(b, 5) = 1.)
Using Exercise 7.58, ensure that n ≥ 0 and m ≤ 0 by repeatedly adding 5 to n and subtracting b from m.
Now, Nikki gives me nx 5NZD bills; I give her −mx 5USD bills. She has thus given me the equivalent of (nbx)USD
and I have given her (−5mx)USD. Thus she’s paid me (nbx + 5mx)USD = x(nb + 5m)USD = xUSD.

7.65 Nikki can pay me any x that is a multiple of GCD(k, ℓ). To pay this amount, observe that extended-Euclid(k, ℓ)
returns integers n and m such that nk + mℓ = GCD(k, ℓ). To pay me a · GCD(k, ℓ) USD, she gives me a · n Thai Baht notes,
and I give her −a · m Israeli Shekel notes. (As in Exercise 7.64, we can use Exercise 7.58 to arrange that n ≥ 0 and m ≤ 0,
so that both counts of notes are nonnegative.)
To see that Nikki cannot pay me a value x where GCD(k, ℓ) ̸ | x, observe by induction that the amount of money that
we have exchanged after trading i bills, for any i, has a value that's divisible by GCD(k, ℓ).

7.66 If d | n, then n = kd for some integer k. If n + 1 = ℓd for an integer ℓ, then ℓ ≥ k + 1, and thus

n + 1 ≥ (k + 1)d = kd + d = n + d.

Therefore d ≤ 1.

7.67 By Exercise 7.35, we have fn mod fn−1 = fn−2 for all n ≥ 3. Therefore euclid(fn−1 , fn ) = euclid(fn−2 , fn−1 ). By
induction on n, we have that euclid(fn−1 , fn ) = euclid(f1 , f2 ) = euclid(1, 1) = 1. By the correctness of the Euclidean
algorithm, then, GCD(fn , fn−1 ) = 1.

7.68 If a and b are relatively prime, then there is no d > 1 that divides both—so there is certainly no prime d > 1 that
divides both.
Conversely, suppose that a and b are not relatively prime, so that d ≥ 2 divides a and b. Consider the prime factorization
of d = p1 p2 · · · pk for some k ≥ 1. Then p1 ≥ 2 is prime, and p1 | d and d | a and d | b. By the transitivity of divides,
p1 | a and p1 | b.

7.69 Suppose not—suppose d | ab and d | c for some d ≥ 2. Because a and c are relatively prime, there exist integers x, y
such that xa + yc = 1. (The extended Euclidean algorithm finds x and y.) Then, multiplying both sides by b, we have
xab + ycb = b. But d | xab (because d | ab) and d | ycb (because d | c), so d | (xab + ycb)—that is, d | b. But then d | c
and d | b, contradicting the assumption that b and c were relatively prime.

7.70 If ab | n, then a | n and b | n by (7.7.8). (See the proof of Theorem 7.7.)


Conversely, suppose that a | n and b | n. Then there exist integers k and ℓ such that ak = n and bℓ = n. Because a
and b are relatively prime, eeuclid(a, b) gives us integers x and y such that 1 = xa + yb. Thus n = xan + ybn =
xabℓ + ybak = ab(xℓ + yk). Therefore ab | n.

7.71 Because a and b are relatively prime, eeuclid(a, b) = ⟨c, d, 1⟩, where ca + bd = 1 = GCD(a, b). Simply select
x = cm and y = dm. Then xa + yb = mca + mbd = m(ca + bd) = m.

7.72 {97 + 247k : k ∈ Z≥0}

7.73 {24 + 231k : k ∈ Z≥0}

7.74 {3 + 42k : k ∈ Z≥0}

7.75 {149 + 210k : k ∈ Z≥0}

7.76 {59 + 210k : k ∈ Z≥0}

7.77 First, suppose that x mod n = a and x mod m = b. We need to show that x mod nm = y∗. Observe that

[x mod nm] mod n = x mod n     by Exercise 7.18
= a.     by assumption

And, strictly analogously,

[x mod nm] mod m = b.

By definition y∗ is the unique value in Znm such that y∗ mod n = a and y∗ mod m = b, so from the facts that [x mod
nm] mod n = a and [x mod nm] mod m = b we can conclude x mod nm = y∗.
Second, suppose that x mod nm = y∗. We need to show that x mod n = a and x mod m = b. Taking both sides of the
assumption modulo n, we have

y∗ mod n = [x mod nm] mod n     by assumption
= x mod n.     by Exercise 7.18

By definition of y∗, we have y∗ mod n = a, and so these facts imply that a = x mod n and, strictly analogously, that
b = x mod m.
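The existence-and-uniqueness fact in Exercise 7.77 can also be checked by brute force. Here is a sketch (my own, not the book's construction): search all of Z_nm for the value with the desired residues.

```python
# Brute-force Chinese remaindering: for relatively prime n and m, find
# the unique y in Z_{n*m} with y mod n == a and y mod m == b.
def crt(a, n, b, m):
    '''Return the unique y in {0, 1, ..., n*m - 1} with y % n == a and
    y % m == b (assumes n and m are relatively prime).'''
    candidates = [y for y in range(n * m) if y % n == a and y % m == b]
    assert len(candidates) == 1   # uniqueness, when gcd(n, m) = 1
    return candidates[0]
```

For instance, crt(6, 13, 2, 19) returns 97, the unique element of Z_247 with those residues, consistent with the answer to Exercise 7.72 above.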

7.78 Let n = 2, m = 4, a = 1, and b = 2. There is no x ∈ Z8 such that x mod 2 = 1 and x mod 4 = 2.

7.79 Let n = 2, m = 4, a = 1, and b = 1. There are two values of x ∈ Z8 such that x mod 2 = 1 and x mod 4 = 1,
namely x = 1 and x = 5.

7.4 Multiplicative Inverses


7.80 Note that, working in Z9:

2·0 = 0   2·1 = 2   2·2 = 4   2·3 = 6   2·4 = 8   2·5 = 1   2·6 = 3   2·7 = 5   2·8 = 7.

And so, reordering the columns, we have that "half" of each number is

half of 0 = 0   half of 1 = 5   half of 2 = 1   half of 3 = 6   half of 4 = 2
half of 5 = 7   half of 6 = 3   half of 7 = 8   half of 8 = 4.

7.81 The additive inverse of a in Zn is 0 if a = 0, and n − a if a ̸= 0.

7.82 Writing addinv(·) to denote additive inverse: if a = 0, then addinv(addinv(a)) = addinv(0) = 0 = a; and if
a ̸= 0, then addinv(addinv(a)) = addinv(n − a) = n − (n − a) = a. (When a ̸= 0 we also have n − a ̸= 0, so the
second case of the definition applies to both applications of addinv.) In either case, addinv(addinv(a)) = a.

7.83 Note that −x ≡n (n − x). Thus

a · (−b) ≡n a · (n − b)
≡n an − ab
≡n bn − ab an ≡n bn ≡n 0
≡n b(n − a)
≡n b(−a).

7.84 Note that −x ≡n (n − x). Thus

(−a) · (−b) ≡n (n − a) · (n − b)
≡n n^2 − (a + b)n + ab
≡n ab.     n^2 ≡n xn ≡n 0 for any integer x

7.85 The claim is false. If n is divisible by 4 then 0^2 ≡n (n/2)^2: for example, 8^2 mod 16 = 64 mod 16 = 0 and
0^2 mod 16 = 0 mod 16 = 0.

7.86 The claim is false. If n is even but not divisible by 4 then there is exactly one b such that b^2 = (n/2)^2: for example,
in Z6, there is only one b such that b^2 = 3 (because 0^2 ≡6 0, 1^2 ≡6 1, 2^2 ≡6 4, 3^2 ≡6 3, 4^2 ≡6 4, and 5^2 ≡6 1).

7.87 4 · 3 = 12 ≡11 1, so 4^−1 = 3.

7.88 7 · 8 = 56 ≡11 1, so 7^−1 = 8.

7.89 None exists: 0 · a = 0 ≡11 0 for any a.

7.90 None exists: 5 · a mod 15 ∈ {0, 5, 10} for any a, so 5a ̸≡15 1.

7.91 7 · 13 = 91 ≡15 1, so 7^−1 = 13.

7.92 None exists: 9 · a mod 15 ∈ {0, 3, 6, 9, 12} for any a (because GCD(9, 15) = 3), so 9a ̸≡15 1.
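Answers of this kind can be confirmed by exhaustive search. Here is a small sketch (my own, not the book's method) that searches all of Z_n for multiplicative inverses:

```python
# Brute-force search for multiplicative inverses in Z_n,
# matching Exercises 7.87-7.92.
def inverses_of(a, n):
    '''Return every b in Z_n with a * b congruent to 1 mod n
    (the empty list if a has no inverse in Z_n).'''
    return [b for b in range(n) if a * b % n == 1]
```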

7.93 Suppose that ax mod n = 1 and ay mod n = 1. We’ll argue that x ≡n y. Then:

ax − ay ≡n 0 by assumption
⇔ (x − y)a ≡n 0
⇒ (x − y)ax ≡n 0·x
⇒ (x − y) ≡n 0 ax ≡n 1 by definition; 0x = 0
⇒ x ≡n y.

7.94 The multiplication table for Z5 (row a, column b holds ab mod 5):

·  0 1 2 3 4
0  0 0 0 0 0
1  0 1 2 3 4
2  0 2 4 1 3
3  0 3 1 4 2
4  0 4 3 2 1

7.95 The multiplication table for Z6:

·  0 1 2 3 4 5
0  0 0 0 0 0 0
1  0 1 2 3 4 5
2  0 2 4 0 2 4
3  0 3 0 3 0 3
4  0 4 2 0 4 2
5  0 5 4 3 2 1

7.96 The multiplication table for Z8:

·  0 1 2 3 4 5 6 7
0  0 0 0 0 0 0 0 0
1  0 1 2 3 4 5 6 7
2  0 2 4 6 0 2 4 6
3  0 3 6 1 4 7 2 5
4  0 4 0 4 0 4 0 4
5  0 5 2 7 4 1 6 3
6  0 6 4 2 0 6 4 2
7  0 7 6 5 4 3 2 1
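Tables like these are easy to generate programmatically; here is a one-liner sketch (mine, not from the manual):

```python
# Generate the Z_n multiplication table of Exercises 7.94-7.96:
# row a, column b holds a * b mod n.
def mult_table(n):
    return [[a * b % n for b in range(n)] for a in range(n)]
```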

7.97 The claim is true. Observe that (n − 1)(n − 1) = n^2 − 2n + 1, and [n^2 − 2n + 1] mod n = 0 − 0 + 1 = 1. Thus
(n − 1)^−1 = n − 1 in Zn by definition.

7.98 Let b = a^−1. Then b^−1 is the number c such that bc ≡n 1. But by definition a · a^−1 = a^−1 · a = b · a, so b · a ≡n 1.
Thus b^−1 = a, by definition, as desired.

7.99 We’ll proceed by mutual implication.


First, suppose that y ∈ Zn satisfies ay mod n = 1. Then x = y suffices: ax mod n = 1 and x ∈ Z ⊇ Zn .
Conversely, suppose that x ∈ Z satisfies ax mod n = 1. Then choose y = x mod n. By definition we have y ∈ Zn ,
and

ay mod n = [a(x mod n)] mod n


 
= [a(x − n · nx )] mod n
 
= [ax − a nx n] mod n
= ax mod n
= 1.

7.100 Let x = a^−1. Then, because ax ≡n 1, we have

(ax)^k ≡n 1^k ≡n 1.

Because (ax)^k = (a^k) · (x^k), the multiplicative inverse of a^k is therefore x^k.

(To see that this value is unique, we can cite Exercise 7.93, or consider the following argument: suppose that y is the
multiplicative inverse of a^k: that is, assume a^k y ≡n 1. Then

a^k y ≡n 1
⇒ a^k y · (a^−1)^k ≡n 1 · (a^−1)^k
⇒ y ≡n (a^−1)^k.)

7.101 17^−1 in Z23 is 19; indeed, 17 · 19 = 323, and 323 = 14 · 23 + 1.

7.102 7^−1 in Z25 is 18; indeed, 7 · 18 = 126, and 126 = 5 · 25 + 1.

7.103 9^−1 in Z33 does not exist; 9 and 33 are not relatively prime (they're both divisible by 3).

7.104 Here’s an implementation in Python:


def inverse(a, n):
    '''Finds the inverse of a in Z_n if it exists. Returns False if a
    has no inverse in Z_n (if a and n aren't relatively prime).'''
    x, y, gcd = extended_euclidean_algorithm(a, n)
    if gcd != 1:
        return False
    else:
        return x % n

7.105 The converse is true. Suppose that n is composite, and suppose that d | n for an integer d satisfying
2 ≤ d < n—that is, d ∈ Zn and d ̸= 0. Then GCD(d, n) ≥ 2, so d and n are not relatively prime. By Theorem 7.24,
d^−1 does not exist in Zn.

7.106 For p = 2, we have 2^(p+1) mod p = 2^3 mod 2 = 8 mod 2 = 0. Otherwise, by Fermat's Little Theorem, when p ̸= 2
(and thus 2 ̸≡p 0), we have

2^(p+1) mod p = [4 · 2^(p−1)] mod p = 4 mod p.

For any p ≥ 5, we have 4 mod p = 4. For p = 3, we have 4 mod 3 = 1. Thus 2^(p+1) mod p is 0 if p = 2; 1 if p = 3;
and 4 otherwise.
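This case analysis can be spot-checked empirically; here is a quick sketch (mine), using Python's built-in three-argument pow for modular exponentiation:

```python
# Compute 2^(p+1) mod p for a few small primes p (Exercise 7.106).
values = {p: pow(2, p + 1, p) for p in [2, 3, 5, 7, 11, 13, 17]}
```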

7.107 249 is not prime. If n is prime, then Fermat's Little Theorem establishes that a^(n−1) mod n = 1 for any a ̸= 0 in Zn.
For a = 247, then, if n is prime, we'd have 247^(n−1) mod n = 1. But 247^(n−1) mod n ̸= 1 for n = 249, so 249 is not prime.
(Though you can't conclude anything about the primality of 247 from the fact stated in the exercise, it turns out that
247 is not prime: 247 = 13 · 19.)
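The Fermat-test computation behind this argument is a one-liner in Python (a sketch of my own; pow is the built-in modular exponentiation):

```python
# If 249 were prime, Fermat's Little Theorem would force this to be 1.
fermat_value = pow(247, 249 - 1, 249)
```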
7.108 Let x = a1 N1 d1 + a2 N2 d2 + · · · + ak Nk dk, where N = n1 n2 · · · nk and Ni = N/ni and di is the multiplicative
inverse of Ni in Zni. We must show that x mod nj = aj for all 1 ≤ j ≤ k:

x mod nj = [a1 N1 d1 + · · · + ak Nk dk] mod nj     definition of x
= aj Nj dj mod nj     Ni is the product of all nℓ with ℓ ̸= i, so every term with i ̸= j is a multiple of nj
= aj · (Nj dj mod nj) mod nj     (7.4.3)
= aj · 1 mod nj     dj was chosen as the multiplicative inverse of Nj mod nj
= aj.

7.109 Observe that for any prime number n, we must have that φ(n) = n − 1 (there is no positive integer < n that shares
a divisor other than 1 with a prime number). Thus the Fermat–Euler theorem says that 1 = aφ(n) mod n = an−1 mod n.
That’s just Fermat’s Little Theorem.

7.110 The argument is direct: observe that a · a^(φ(n)−1) = a^(φ(n)), and a^(φ(n)) ≡n 1 by the Fermat–Euler Theorem.
The integers less than 60 that are relatively prime to it are {1, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 49, 53, 59},
so φ(60) = 16. Note (either by calculator or using mod-exp from Figure 7.6) that 7^15 mod 60 = 43 and
17^15 mod 60 = 53, and 31^15 mod 60 = 31. So the claim is that 7^−1 = 43 and 17^−1 = 53 and 31^−1 = 31. Indeed
7 · 43 = 301 ≡60 1 and 17 · 53 = 901 ≡60 1 and 31 · 31 = 961 ≡60 1.
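These computations can be reproduced directly (a sketch of my own, using the standard library's gcd and built-in pow rather than the book's euclid and mod-exp):

```python
# phi(60) by counting, then inverses via a^(phi(60) - 1) mod 60.
from math import gcd

phi60 = sum(1 for k in range(1, 60) if gcd(k, 60) == 1)
inv = {a: pow(a, phi60 - 1, 60) for a in (7, 17, 31)}
```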

7.111 Here’s an implementation in Python. (See Exercise 7.25 for modexp and Exercise 7.31 for euclid.)
def totient(n):
    count = 0
    for k in range(1, n):
        count += (euclid(n, k) == 1)
    return count

def inverse(a, n):
    return modexp(a, totient(n) - 1, n)

7.112 Again in Python:


def verify_carmichael(n):
    divisors = [d for d in range(2, n) if n % d == 0]
    success = True
    for a in range(1, n):
        if euclid(a, n) == 1 and modexp(a, n-1, n) != 1:
            success = False
    if len(divisors) > 0 and success:
        print("Carmichael!", n, "is not prime -- divisors are", divisors,
              "-- but it passes the test.")

# verify_carmichael(561)
# --> Carmichael! 561 is not prime -- divisors are [3, 11, 17, 33, 51, 187] -- but it passes the test.

7.113 If n is composite, then for some p ∈ {2, . . . , n − 1} we have p | n. Observe that GCD(p^(n−1), n) ̸= 1 (p is a common
divisor), and thus p^(n−1) mod n is also divisible by p—and thus is not equal to 1.

7.114 A Python solution (which depends on the Sieve of Eratosthenes; see Figure S.7.2):
def korselt(n):
    carmichaels = []
    primes = sieve_of_eratosthenes(n)
    for candidate in range(1, n):
        divisors = [d for d in range(2, candidate) if candidate % d == 0]
        squarefree = not any([candidate % (d * d) == 0 for d in divisors])
        korselt_test = all([(candidate - 1) % (p - 1) == 0 for p in divisors if p in primes])
        if len(divisors) > 1 and squarefree and korselt_test:
            carmichaels.append(candidate)
    return carmichaels

# korselt(10000) --> [561, 1105, 1729, 2465, 2821, 6601, 8911]

7.115 Suppose that n is an even Carmichael number. Note that n > 3 by Korselt's theorem (so that n is composite). Then,
by definition 2 | n—and 4 ̸ | n because, by Korselt's theorem, n must be squarefree. Let p ≥ 3 be the next-smallest prime
factor of n. (There must be one, as n > 3 and 4 ̸ | n.) By Korselt's theorem, p − 1 | n − 1—but p − 1 is even, and n − 1
is odd! That's a contradiction.

7.116 Here is a solution in Python:


import random

def miller_rabin(n, k=16):
    '''Test whether n is prime, using k trials of the Miller-Rabin test.
    The probability of a false positive is at most 4**-k.'''
    r = 0
    while (n - 1) % (2 * 2**r) == 0:
        r += 1
    d = (n - 1) // 2**r   # integer division, so that n - 1 = d * 2**r with d odd
    for trial in range(k):
        a = random.randint(1, n - 1)
        sigma = [pow(a, d, n)]
        while len(sigma) <= r:
            sigma.append(sigma[-1]**2 % n)
        if sigma[-1] != 1:
            return False
        for i in range(len(sigma) - 1):
            if sigma[i] not in [1, n - 1] and sigma[i+1] == 1:
                return False
    return True

7.5 Cryptography
7.117 The four connectives with the property that (m ◦ k) ◦ k ≡ m are:

p q | p   q   p ⊕ q   p ⇔ q
T T | T   T     F       T
T F | T   F     T       F
F T | F   T     T       F
F F | F   F     F       T

Choosing p as the "encryption" for m ◦ k means that the ciphertext is in fact the plaintext: no security at all. Choosing q as
the encryption for m ◦ k means that the ciphertext is in fact just the key: no message at all. Choosing ⇔ as the encryption
for m ◦ k means that the ciphertext is just the bitwise negation of the exclusive or: not interestingly different from ⊕.

7.118 Observe that each bit is handled independently, so we only need to consider a bit at a time. By definition, the
probability that the chosen key ki ∈ {0, 1} is 1 is precisely 1/2. Let p be the probability that the plaintext message is 1.
Then

Pr[ciphertext = 1]
  = Pr[ciphertext = 1 | plaintext = 1] · Pr[plaintext = 1]
      + Pr[ciphertext = 1 | plaintext = 0] · Pr[plaintext = 0]     law of total probability
  = (1/2) · p + (1/2) · (1 − p)     definition of p / one-time pad
  = 1/2.

7.119 The key is 1010011010010010101010011110001011110010; the document is the Declaration of Independence.

7.120 An implementation in Python is shown in Figure S.7.3.

7.121 ⟨e = 5, n = 437⟩ and ⟨d = 317, n = 437⟩

7.122 ⟨e = 7, n = 1147⟩ and ⟨d = 463, n = 1147⟩



import random

def keygen(n):
    '''Generate a random n-bit string.'''
    return "".join([str(random.randint(0, 1)) for i in range(n)])

def text2bits(message):
    '''Convert a text string into a bitstring (8 bits per character).'''
    output = []
    for ch in message:
        bits = bin(ord(ch))[2:].zfill(8)
        output.append(bits)
    return "".join(output)

def bits2text(message):
    '''Convert a bitstring (8 bits per character) into a text string.'''
    output = []
    for i in range(0, len(message), 8):
        char = chr(int(message[i:i+8], base=2))
        output.append(char)
    return "".join(output)

def encode(key, message):
    '''Encode a text message using the one-time pad key.'''
    output = []
    for i in range(0, len(message), len(key)):
        chunk = message[i:i+len(key)]

        # Note that (A != B) denotes (A XOR B)
        # (and we may have to pad the last chunk of the message with 0s).
        coded = "".join([str(int(chunk[j] != key[j])) for j in range(len(chunk))])
        coded = coded + '0' * (len(key) - len(coded))

        output.append(coded)
    return "".join(output)

Figure S.7.3 An implementation of one-time pads in Python.

7.123 ⟨e = 11, n = 1763⟩ and ⟨d = 611, n = 1763⟩
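As a check (my own sketch, not part of the solutions), each of the three key pairs above should satisfy e · d ≡ 1 (mod (p − 1)(q − 1)), using the factorizations 437 = 19 · 23, 1147 = 31 · 37, and 1763 = 41 · 43:

```python
# Verify e*d mod (p-1)(q-1) == 1 for the key pairs of Exercises 7.121-7.123.
keypairs = [(5, 317, 19, 23), (7, 463, 31, 37), (11, 611, 41, 43)]
all_valid = all(e * d % ((p - 1) * (q - 1)) == 1 for (e, d, p, q) in keypairs)
```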

7.124 m^e mod n = 42^5 mod 221 = 9

7.125 m^e mod n = 99^5 mod 221 = 216

7.126 c^d mod n = 99^77 mod 221 = 73

7.127 c^d mod n = 17^77 mod 221 = 153
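These four computations can be reproduced with Python's built-in three-argument pow (modular exponentiation), with n = 221 throughout:

```python
# Exercises 7.124-7.127: RSA encryption and decryption mod 221.
encryptions = [pow(42, 5, 221), pow(99, 5, 221)]
decryptions = [pow(99, 77, 221), pow(17, 77, 221)]
```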

7.128 I used a program to do the factoring and arithmetic in what follows.
First, we can factor n = 1,331,191 as n = 881 · 1511. Writing p = 881 and q = 1511, we have

(p − 1)(q − 1) = 880 · 1510 = 1,328,800,

and we know that e · 885,867 mod (p − 1)(q − 1) = 1. Therefore d = 885,867.
And c^d mod n = 441626^885,867 mod 1,331,191 = 424242. The message that was sent to Carol is 424242.

7.129 The primes were 3413 and 3571; the message is 7041776.

7.130 The primes were 20477 and 32377; the message is 11223344.
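The factoring step in Exercises 7.128–7.130 can be done by simple trial division, since these moduli are tiny by cryptographic standards (a sketch of my own; hopeless for real RSA key sizes):

```python
# Trial-division factoring of a small RSA modulus n = p * q.
def factor(n):
    '''Return (p, q) with p the smallest prime factor of n and p * q == n.'''
    p = 2
    while p * p <= n:
        if n % p == 0:
            return (p, n // p)
        p += 1
    return (n, 1)   # n itself is prime
```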

7.131 Repeatedly test, in increasing order, each integer greater than x using the Miller–Rabin primality test. (We might
as well check only the odd integers, but it doesn't make a difference asymptotically.) Each Miller–Rabin test is efficient,
and, by the Prime Number Theorem, approximately a 1/log n fraction of these integers are prime—so after about log n
iterations we'll have encountered a prime number.

7.132 Repeatedly test, in increasing order, each integer e greater than max(p, q) for relative primality with (p − 1)(q − 1),
using the Euclidean algorithm. (Again, we might as well check only the odd integers, but it doesn't make a difference
asymptotically.) Any prime integer e will be relatively prime to (p − 1)(q − 1), but composite numbers may be relatively
prime as well. By the Prime Number Theorem, approximately a 1/log n fraction of these integers are prime—so after about
log n iterations we'll have encountered a prime number, if we haven't terminated already.

7.133 Use the inverse algorithm based on the extended Euclidean algorithm. (See Figure 7.19a.)

7.134 Use the repeated squaring algorithm for modular exponentiation. (See Figure 7.6.)

7.135 Aside from 2, all primes are odd. Therefore p and q (as “large” primes) must be odd. Thus p − 1 and q − 1 are
even—so any number e relatively prime to (p − 1)(q − 1) cannot be even; otherwise 2 | e and 2 | (p − 1)(q − 1).

7.136 Suppose that d were even. Then 2 | de. But de mod (p − 1)(q − 1) = 1 by definition, and in the previous exercise
we argued that (p − 1)(q − 1) is even. If 2 | de and 2 | (p − 1)(q − 1), then de ̸≡(p−1)(q−1) 1. (See Theorem 7.24.)


7.137 Factoring n is now easy: Eve can compute p by performing binary search to identify the p such that p^2 = n. (Eve can
tell whether a candidate p is too big or too small efficiently, by multiplying p by p and comparing the result to n.) Thus the
protocol is no longer secure.
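Eve's attack can be sketched directly (my own code, under the exercise's assumption that n = p^2):

```python
# Binary search for the integer p with p * p == n.
def recover_p(n):
    '''Return the integer square root of n, or None if n is not a square.'''
    lo, hi = 1, n
    while lo <= hi:
        mid = (lo + hi) // 2
        if mid * mid == n:
            return mid
        elif mid * mid < n:
            lo = mid + 1
        else:
            hi = mid - 1
    return None
```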

7.138 Bob can't necessarily decrypt the ciphertext: we relied on the fact that p was prime to conclude via Fermat's Little
Theorem that the unencrypted message and the decrypted ciphertext were identical modulo p. If p is not prime, we cannot
guarantee this fact, and thus Bob may not be able to decrypt.
(For example, suppose that p = 9 and q = 15. Then (p − 1)(q − 1) = 8 · 14 = 112, to which e = 3 is relatively prime.
Then d = 75. If we encrypt the message m = 2, we get 2^3 mod 135 = 8. But 8^d mod 135 = 8^75 mod 135 = 107 ̸= 2.
It turns out that many different messages are decrypted as 107: {2, 17, 32, 47, 62, 77, 92, 107, 122}.)

7.139 There’s no longer any encryption at all! The message m is encrypted as c := me mod n. But if e = 1, then
c = m mod n = m. Thus Eve can simply read the message straight out of the so-called ciphertext.

7.140 There is a step of the protocol that can no longer be executed: we have to compute d := e^−1 modulo (p − 1)(q − 1).
If e is not relatively prime to (p − 1)(q − 1), then there is no such d. (See Theorem 7.24.)

7.141 We are given a number c and a number e, and we must find the value b such that b^e = c. The key is that there are
certainly at most c + 1 possible values for b (the set {0, 1, 2, . . . , c}), and, given a hypothesis x, we can tell whether x < b,
x = b, or x > b by simply performing exponentiation and comparing x^e to c. Thus, in each step, we identify the middle
of the range of candidate values of x, and compare x and b. In about log c steps, we've found b.
In RSA, on the other hand, we are given a number c and a number e and a number n, and we must find the value b such
that b^e mod n = c. The issue is that, given a candidate x, even knowing that x^e mod n < b^e mod n, we don't know that
x < b. We don't know how many times x^e and b^e "wrap around" in the modular computation. (For example, 2^5 mod 11
= 10 is larger than 3^5 mod 11 = 1—but 2 < 3, so we're in trouble!)

7.142 See keygen in Figure S.7.4.

7.143 See encrypt and decrypt in Figure S.7.4.

7.144 See str2list and list2str (which rely on str2int, int2str, and baseConvert) in Figure S.7.4.

7.145 Here’s Python code testing these implementations (and also testing the implementation in Exercise 7.147):

def keygen(p, q):
    '''Generate an RSA keypair given two primes p, q.'''
    n = p * q
    e = 3
    while euclid(e, (p-1) * (q-1)) != 1:
        e += 2
    d = inverse(e, (p-1) * (q-1))
    return ((e, n), (d, n))

def encrypt(plaintext, pubkey):
    '''Encrypt an integer using the RSA public key pubkey.'''
    e, n = pubkey[0], pubkey[1]
    return modexp(plaintext, e, n)

def decrypt(ciphertext, privkey):
    '''Decrypt an encoded message using the RSA private key privkey.'''
    d, n = privkey[0], privkey[1]
    return modexp(ciphertext, d, n)

def baseConvert(n, b):
    '''Convert an integer n into a base-b integer.'''
    d = []
    c = 1
    while n > 0:
        dI = (n // c) % b
        n = n - dI * c
        c = c * b
        d.append(dI)
    d.reverse()
    return d

def str2int(s):
    '''Convert an ASCII string, viewed as a base-256 number, to an integer.'''
    n = 0
    for i in range(len(s)):
        n = ord(s[i]) + 256*n
    return n

def int2str(n):
    '''Convert an integer to an ASCII string, viewed as a base-256 number.'''
    s = []
    while n > 0:
        s.append(chr(n % 256))
        n = n // 256
    s.reverse()
    return "".join(s)

def str2list(s, blockSize):
    '''Convert an ASCII string, viewed as a base-256 number, to a
    base-blockSize list of integers.'''
    return baseConvert(str2int(s), blockSize)

def list2str(L, blockSize):
    '''Convert a base-blockSize list of integers to an ASCII string,
    viewed as a base-256 number.'''
    x = 0
    for n in L:
        x = n + blockSize*x
    return int2str(x)

Figure S.7.4 An implementation of RSA in Python.



p = 5277019477592911
q = 7502904222052693
pub, priv = keygen(p, q)
d, n = priv[0], priv[1]

plaintext = "THE SECRET OF BEING BORING IS TO SAY EVERYTHING."
ciphertext = [encrypt(m, pub) for m in str2list(plaintext, n)]
decoded = [decrypt(m, priv) for m in ciphertext]
print(decoded)
print(list2str(decoded, n))

decoded = [decryptCRT(m, d, p, q) for m in ciphertext]
print(decoded)
print(list2str(decoded, n))

7.146 Here’s Python code, using the Miller–Rabin test (see Exercise 7.116):
def find_prime(lo, hi):
    '''Repeatedly select a random odd number not divisible by 5 in [lo, hi]
    and return it if the Miller-Rabin test claims it's prime.'''
    while True:
        n = random.randint(lo, hi)
        if n % 10 in [0, 2, 4, 5, 6, 8]:  # n is definitely not prime
            continue
        if miller_rabin(n):
            return n

7.147 Here’s the alternative implementation of encrypt:


def decryptCRT(ciphertext, d, p, q):
    '''Decrypt an encoded message using the RSA private key (d, p*q).'''
    cP = modexp(ciphertext, d, p)
    cQ = modexp(ciphertext, d, q)
    x, y, r = extended_euclidean_algorithm(p, q)
    c = (cP*y*q + cQ*x*p) % (p*q)
    return c

7.148 Write d = (d mod p−1) + k(p − 1) for an integer k. Note that

c^d mod p = [c^(d mod p−1) · c^(k(p−1))] mod p
= [[c^(d mod p−1) mod p] · [(c^k)^(p−1) mod p]] mod p
= [[c^(d mod p−1) mod p] · 1] mod p

by Fermat's Little Theorem, unless c^k ≡p 0. But if c^k ≡p 0 then c ≡p 0 by Exercise 7.48, in which case c^d mod p =
0 = c^(d mod p−1) mod p.
8 Relations

8.2 Formal Introduction


8.1 The relation divides on {1, 2, . . . , 8} is
 
⟨1, 1⟩, ⟨1, 2⟩, ⟨1, 3⟩, ⟨1, 4⟩, ⟨1, 5⟩, ⟨1, 6⟩, ⟨1, 7⟩, ⟨1, 8⟩, ⟨2, 2⟩, ⟨2, 4⟩, ⟨2, 6⟩, ⟨2, 8⟩,
⟨3, 3⟩, ⟨3, 6⟩, ⟨4, 4⟩, ⟨4, 8⟩, ⟨5, 5⟩, ⟨6, 6⟩, ⟨7, 7⟩, ⟨8, 8⟩ .
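The relation above can be generated mechanically; here is a one-liner sketch (mine, writing the book's pairs ⟨a, b⟩ as Python tuples):

```python
# Exercise 8.1's divides relation on {1, ..., 8}, as a set of pairs (a, b)
# with a dividing b.
divides = {(a, b) for a in range(1, 9) for b in range(1, 9) if b % a == 0}
```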

8.2 The subset relation on P({1, 2, 3}) is


 


⟨{} , {1}⟩, ⟨{} , {2}⟩, ⟨{} , {3}⟩⟨{} , {1, 2}⟩, ⟨{} , {1, 3}⟩, ⟨{} , {2, 3}⟩, ⟨{} , {1, 2, 3}⟩, 


 ⟨{1} , {1, 2}⟩, ⟨{1} , {1, 3}⟩, ⟨{1} , {1, 2, 3}⟩, 

⟨{2} , {1, 2}⟩, ⟨{2} , {2, 3}⟩, ⟨{2} , {1, 2, 3}⟩, .

 

 ⟨{3} , {1, 3}⟩, ⟨{3} , {2, 3}⟩, ⟨{3} , {1, 2, 3}⟩,
 

⟨{1, 2} , {1, 2, 3}⟩, ⟨{1, 3} , {1, 2, 3}⟩, ⟨{2, 3} , {1, 2, 3}⟩

8.3 The isProperPrefix relation on bitstrings of length ≤ 3 is


 


⟨ϵ, 0⟩, ⟨ϵ, 1⟩, ⟨ϵ, 00⟩, ⟨ϵ, 01⟩, ⟨ϵ, 10⟩, ⟨ϵ, 11⟩, ⟨ϵ, 000⟩, ⟨ϵ, 001⟩, 


 ⟨ϵ, 010⟩, ⟨ϵ, 011⟩, ⟨ϵ, 100⟩, ⟨ϵ, 101⟩, ⟨ϵ, 110⟩, ⟨ϵ, 111⟩, 

⟨0, 00⟩, ⟨0, 01⟩, ⟨0, 000⟩, ⟨0, 001⟩, ⟨0, 010⟩, ⟨0, 011⟩, .

 

⟨1, 10⟩, ⟨1, 11⟩, ⟨1, 100⟩, ⟨1, 101⟩, ⟨1, 110⟩, ⟨1, 111⟩,
 

⟨00, 000⟩, ⟨00, 001⟩, ⟨01, 010⟩, ⟨01, 011⟩, ⟨10, 100⟩, ⟨10, 101⟩, ⟨11, 110⟩, ⟨11, 111⟩

8.4 On bitstrings of length ≤ 3, isProperSubstring is


 
 ⟨ϵ, 0⟩, ⟨ϵ, 1⟩, ⟨ϵ, 00⟩, ⟨ϵ, 01⟩, ⟨ϵ, 10⟩, ⟨ϵ, 11⟩,






 ⟨ϵ, 000⟩, ⟨ϵ, 001⟩, ⟨ϵ, 010⟩, ⟨ϵ, 011⟩, ⟨ϵ, 100⟩, ⟨ϵ, 101⟩, ⟨ϵ, 110⟩, ⟨ϵ, 111⟩, 


 


 ⟨ 0, 00⟩, ⟨0, 01⟩, ⟨0, 10⟩, ⟨0, 000⟩, ⟨0, 001⟩, ⟨0, 010⟩, ⟨0, 011⟩, ⟨0, 100⟩, ⟨0, 101⟩, ⟨0, 110⟩, 

 
⟨1, 01⟩, ⟨1, 10⟩, ⟨1, 11⟩, ⟨1, 001⟩, ⟨1, 010⟩, ⟨1, 011⟩, ⟨1, 100⟩, ⟨1, 101⟩, ⟨1, 110⟩, ⟨1, 111⟩,
⟨00, 000⟩, ⟨00, 001⟩, ⟨00, 100⟩, .

 


 ⟨ 01, 001⟩, ⟨01, 010⟩, ⟨01, 011⟩, ⟨01, 101⟩, 


 


 ⟨ 10, 010⟩, ⟨10, 100⟩, ⟨10, 101⟩, ⟨10, 110⟩, 


 

⟨11, 011⟩, ⟨11, 110⟩, ⟨11, 111⟩

8.5 isProperSubsequence consists of all pairs ⟨x, y⟩ where x is a proper substring of y, plus any pairs for which the elements
of x appear nonconsecutively in y. For any y ̸= ϵ, we have that ⟨ϵ, y⟩ ∈ isProperSubstring. For x ∈ {0, 1}, we have that
⟨x, y⟩ ∈ isProperSubstring ⇔ ⟨x, y⟩ ∈ isProperSubsequence (there's only one symbol in x, so it can't be "split up"). No
string of length 3 is a proper substring or subsequence of a string of length ≤ 3. So the only interesting strings are 00, 01, 10,
and 11. If 01 appears at all in y, it appears consecutively. (Think about why!) Similarly for 10. Thus
isProperSubsequence = isProperSubstring ∪ {⟨00, 010⟩, ⟨11, 101⟩} .

8.6 isAnagram includes all pairs ⟨x, x⟩ for all bitstrings x ∈ {ϵ, 0, 1, 00, 01, 10, 11, 000, 001, 010, 011, 100, 101, 110, 111}
plus all of the following:
 
 ⟨01, 10⟩, ⟨10, 01⟩, 
⟨001, 010⟩, ⟨001, 100⟩, ⟨010, 001⟩, ⟨010, 100⟩, ⟨100, 001⟩, ⟨100, 010⟩, .
 ⟨110, 101⟩, ⟨110, 011⟩, ⟨101, 110⟩, ⟨101, 011⟩, ⟨011, 110⟩, ⟨011, 101⟩ 


8.7 ⊆

8.8 =

8.9 ∅

8.10 ⊂

8.11 The pair ⟨A, B⟩ ∈ ∼⊂ if ⟨A, B⟩ ∉ ⊂—that is, if

¬(A ⊂ B) ⇔ ¬(∀x : x ∈ A ⇒ x ∈ B)
⇔ ∃x : ¬(x ∈ A ⇒ x ∈ B)
⇔ ∃x : (x ∈ A ∧ x ∉ B)
⇔ A − B ̸= ∅.

8.12 R−1 = {⟨2, 2⟩, ⟨1, 5⟩, ⟨3, 2⟩, ⟨2, 5⟩, ⟨1, 2⟩}

8.13 S−1 = {⟨4, 3⟩, ⟨3, 5⟩, ⟨6, 6⟩, ⟨4, 1⟩, ⟨3, 4⟩}

8.14 R ◦ R = {⟨2, 1⟩, ⟨2, 2⟩, ⟨2, 3⟩, ⟨5, 1⟩, ⟨5, 2⟩, ⟨5, 3⟩}

8.15 R ◦ S = ∅

8.16 S ◦ R = {⟨2, 4⟩, ⟨5, 4⟩}

8.17 R ◦ S−1 = {⟨3, 1⟩, ⟨3, 2⟩}

8.18 S ◦ R−1 = {⟨1, 3⟩, ⟨2, 3⟩}

8.19 S−1 ◦ S = {⟨4, 4⟩, ⟨4, 5⟩, ⟨5, 4⟩, ⟨5, 5⟩, ⟨1, 1⟩, ⟨1, 3⟩, ⟨3, 1⟩, ⟨3, 3⟩, ⟨6, 6⟩}
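The answers to Exercises 8.12–8.19 can be checked mechanically. Here is a sketch (my own code, with R and S reconstructed from the answers to 8.12 and 8.13), using the text's convention that ⟨a, c⟩ ∈ R ◦ S iff there is a b with ⟨a, b⟩ ∈ S and ⟨b, c⟩ ∈ R:

```python
# Relations as sets of pairs, with inverse and composition.
R = {(2, 2), (5, 1), (2, 3), (5, 2), (2, 1)}
S = {(3, 4), (5, 3), (6, 6), (1, 4), (4, 3)}

def inv(rel):
    '''Return the inverse relation.'''
    return {(b, a) for (a, b) in rel}

def compose(rel1, rel2):
    '''Return rel1 composed with rel2: pairs (a, c) such that, for some b,
    (a, b) is in rel2 and (b, c) is in rel1.'''
    return {(a, c) for (a, b) in rel2 for (b2, c) in rel1 if b == b2}
```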

8.20 The relation is shown in Figure S.8.1a.

8.21 R ◦ R−1 denotes the set of pairs of ingredients that are used together in at least one sauce. The relation is shown in
Figure S.8.1b.

8.22 R−1 ◦ R denotes the set of pairs of sauces that share at least one ingredient. Because all five sauces include butter, all
pairs are in the relation, so R−1 ◦ R equals
{Béchamel, Espagnole, Hollandaise, Velouté, Tomate} × {Béchamel, Espagnole, Hollandaise, Velouté, Tomate} .

8.23 at ◦ taking

8.24 at ◦ taughtIn−1

8.25 taking−1 ◦ taking

8.26 taking−1 ◦ at−1 ◦ at ◦ taking

8.27 parent ◦ parent = grandparent

8.28 (parent−1 ) ◦ (parent−1 ) = grandchild



(a) (b)
sauce ingredient ingredient ingredient
Béchamel milk milk milk
Béchamel butter milk butter
Béchamel flour milk flour
Espagnole stock butter milk
Espagnole butter butter butter
Espagnole flour butter flour
Hollandaise egg butter stock
Hollandaise butter butter egg
Hollandaise lemon juice butter lemon juice
Velouté stock butter tomatoes
Velouté butter tomatoes tomatoes
Velouté flour tomatoes butter
Tomate tomatoes tomatoes flour
Tomate butter lemon juice egg
Tomate flour lemon juice butter
lemon juice lemon juice
egg egg
egg butter
egg lemon juice
flour milk
flour butter
flour flour
flour stock
flour tomatoes
stock butter
stock flour
stock stock
stock egg
stock lemon juice
stock tomatoes

Figure S.8.1 Some relations based on sauces in French cooking.

8.29 Writing parent−1 as child, we have parent ◦ child = sibling (or yourself): ⟨x, y⟩ is in the given relation if there exists
a person z such that ⟨x, z⟩ ∈ child and ⟨z, y⟩ ∈ parent. That is, x is a child of z, who is a parent of y—in other words, x
and y share a parent. (You share a parent with yourself, so x and y can be siblings or they can be the same person.)

8.30 Writing parent−1 as child, we have child ◦ parent = procreated-with (or yourself, if you have a child): ⟨x, y⟩ is in
the given relation if there exists a person z such that ⟨x, z⟩ ∈ parent and ⟨z, y⟩ ∈ child. That is, x is a parent of z, who is
a child of y—that is, x and y share a child. (You share a child with yourself if you’ve had a child, so x and y can be the
joint parents of a child, or they can be the same parent of a child.)

8.31 Writing parent ◦ parent as grandparent and parent−1 ◦ parent−1 as grandchild = grandparent−1 , the given relation
is grandparent ◦ grandchild. That is, ⟨x, y⟩ ∈ grandparent ◦ grandchild if there’s a z such that x is the grandchild of z
and z is the grandparent of y—that is, x and y share a grandparent. That means that x and y are either the same person,
siblings, or first cousins.

8.32 Writing parent ◦ (parent−1 ) as sibling-or-self, we’re looking for sibling-or-self ◦ sibling-or-self. Because the sibling
of my sibling is my sibling, in fact this relation also simply represents sibling-or-self.

8.33 The largest possible size is nm: for example, if R = {⟨1, x⟩ : 1 ≤ x ≤ n} and S = {⟨y, 1⟩ : 1 ≤ y ≤ m}, then
R ◦ S = {⟨y, x⟩ : 1 ≤ x ≤ n and 1 ≤ y ≤ m}.

The smallest possible size is zero, if, for example, R ⊆ Z>0 × Z>0 and S ⊆ Z≤0 × Z≤0 .

8.34 We claim that both sets denote the set of pairs ⟨a, d⟩ for which there exist b and c such that ⟨a, b⟩ ∈ T, ⟨b, c⟩ ∈ S,
and ⟨c, d⟩ ∈ R:
⟨a, d⟩ ∈ R ◦ (S ◦ T) ⇔ ∃c : ⟨a, c⟩ ∈ S ◦ T and ⟨c, d⟩ ∈ R definition of composition
⇔ ∃c : [∃b : ⟨a, b⟩ ∈ T and ⟨b, c⟩ ∈ S] and ⟨c, d⟩ ∈ R definition of composition
⇔ ∃b, c : [⟨a, b⟩ ∈ T and ⟨b, c⟩ ∈ S and ⟨c, d⟩ ∈ R] properties of ∃
⇔ ∃b : ⟨a, b⟩ ∈ T and [∃c : ⟨b, c⟩ ∈ S and ⟨c, d⟩ ∈ R] properties of ∃
⇔ ∃b : ⟨a, b⟩ ∈ T and ⟨b, d⟩ ∈ R ◦ S definition of composition
⇔ ⟨a, d⟩ ∈ (R ◦ S) ◦ T. definition of composition

8.35 Let a and c be generic elements. Then:

⟨a, c⟩ ∈ (R ◦ S)−1 ⇔ ⟨c, a⟩ ∈ R ◦ S definition of inverse
⇔ ∃b : ⟨c, b⟩ ∈ S and ⟨b, a⟩ ∈ R definition of composition
⇔ ∃b : ⟨b, c⟩ ∈ S−1 and ⟨a, b⟩ ∈ R−1 definition of inverse
⇔ ∃b : ⟨a, b⟩ ∈ R−1 and ⟨b, c⟩ ∈ S−1 commutativity of ∧
⇔ ⟨a, c⟩ ∈ S−1 ◦ R−1 . definition of composition

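The identity proved in 8.35 can also be spot-checked computationally. Here is a small Python sketch; the helpers compose and invert are ad hoc stand-ins (not from the text), with compose following the convention of 8.29, where the right-hand relation is applied first:

```python
def compose(R, S):
    # <x, y> is in R∘S iff there is a z with <x, z> in S and <z, y> in R
    return {(x, y) for (x, z) in S for (w, y) in R if z == w}

def invert(R):
    return {(y, x) for (x, y) in R}

R = {(1, 2), (2, 3)}
S = {(0, 1), (1, 1), (2, 2)}
# (R∘S)^-1 equals S^-1 ∘ R^-1, as derived above.
assert invert(compose(R, S)) == compose(invert(S), invert(R))
```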
8.36 False. The pair ⟨x, x⟩ ∈ R ◦ R−1 if and only if there exists a y ∈ B such that ⟨x, y⟩ ∈ R. For example, if R = ∅, then
R ◦ R−1 = ∅ too.

8.37 R × R

8.38 identity = {⟨x, x⟩ : x ∈ Z}

8.39 Consider any x ∈ A. Because R is a function, there's one and only one y ∈ B such that ⟨x, y⟩ ∈ R. Because T is
a function, there's one and only one z ∈ C such that ⟨y, z⟩ ∈ T. A pair ⟨x, z⟩ is in T ◦ R if there exists a y such that
⟨x, y⟩ ∈ R and ⟨y, z⟩ ∈ T. Thus, for any x, there's one and only one such y, and one and only one such z. Therefore T ◦ R is
a function.

8.40 Fix z ∈ C. Because T is one-to-one, there is at most one y ∈ B such that ⟨y, z⟩ ∈ T. Because R is one-to-one, for
any y (including the lone y from the previous sentence, if there is one) there is at most one x ∈ A such that ⟨x, y⟩ ∈ R.
Thus there’s at most one x ∈ A such that ∃b ∈ B : ⟨x, b⟩ ∈ R and ⟨b, z⟩ ∈ T—that is, there’s at most one x ∈ A such
that ⟨x, z⟩ ∈ T ◦ R. Thus T ◦ R is one-to-one.

8.41 Fix z ∈ C. Because T is onto, there is at least one y ∈ B such that ⟨y, z⟩ ∈ T. Because R is onto, for any y (including
the values of y from the previous sentence) there is at least one x ∈ A such that ⟨x, y⟩ ∈ R. Thus there’s at least one x ∈ A
such that ∃b ∈ B : ⟨x, b⟩ ∈ R and ⟨b, z⟩ ∈ T—that is, there’s at least one x ∈ A such that ⟨x, z⟩ ∈ T ◦ R. Thus T ◦ R is
onto.

8.42 Neither must be a function. For example, consider R ⊆ {a, b} × {0, 1} and T ⊆ {0, 1} × {A, B}.
R = {⟨a, 0⟩, ⟨a, 1⟩, ⟨b, 0⟩, ⟨b, 1⟩} (not a function)
T = {⟨0, A⟩} (not a function—T(1) is undefined)

But, composing these relations, we get

T ◦ R = {⟨a, A⟩, ⟨b, A⟩} . (a function!)

8.43 T does not need to be one-to-one. For example, consider R ⊆ {a, b} × {0, 1} and T ⊆ {0, 1} × {A, B}.
R = {⟨a, 0⟩, ⟨b, 1⟩} (one-to-one)
T = {⟨0, A⟩, ⟨1, B⟩, ⟨2, B⟩} (not one-to-one—T(1) = T(2))

But, composing these relations, we get

T ◦ R = {⟨a, A⟩, ⟨b, B⟩} . (one-to-one!)

On the other hand, R does need to be one-to-one. Suppose not. Then R(x) = R(x′ ) = y for some x and x′ ̸= x. Because
T is a function, T(y) = z for some particular z. But then ⟨x, z⟩, ⟨x′ , z⟩ ∈ T ◦ R—making T ◦ R not one-to-one.

8.44 R does not need to be onto. For example, consider R ⊆ {a, b} × {0, 1, 2} and T ⊆ {0, 1, 2} × {A, B}.

R = {⟨a, 0⟩, ⟨b, 1⟩} (not onto—no x with R(x) = 2)


T = {⟨0, A⟩, ⟨1, B⟩, ⟨2, B⟩} (onto)

But, composing these relations, we get

T ◦ R = {⟨a, A⟩, ⟨b, B⟩} . (onto!)

On the other hand, T does need to be onto. Suppose not. Then there's a z such that no y satisfies T(y) = z. But then
⟨x, z⟩ ∉ T ◦ R for any x—making T ◦ R not onto.
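The counterexamples in 8.42 and 8.43 are small enough to verify mechanically. A Python sketch (compose_TR is an ad hoc helper that computes T ◦ R by applying R first, per the definition used in 8.39):

```python
def compose_TR(R, T):
    # <x, z> is in T∘R iff there is a y with <x, y> in R and <y, z> in T
    return {(x, z) for (x, y) in R for (w, z) in T if y == w}

# 8.42: neither R nor T is a function, but T∘R is.
R = {('a', 0), ('a', 1), ('b', 0), ('b', 1)}
T = {(0, 'A')}
assert compose_TR(R, T) == {('a', 'A'), ('b', 'A')}

# 8.43: T is not one-to-one, but T∘R is.
R = {('a', 0), ('b', 1)}
T = {(0, 'A'), (1, 'B'), (2, 'B')}
assert compose_TR(R, T) == {('a', 'A'), ('b', 'B')}
```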

8.45 join(≤, ≤) ∪ join(≤−1 , ≤−1 )

8.46 project(select(C, f), {1}), where f(c, r, g, b) = True if r = 0 and False otherwise.

 
8.47 project(join(project(C, {1, 4}), [project(C, {1, 4})]−1), {1, 3})

8.48 project(select(C, f), {1}), where f(c, r, g, b) = True if b ≥ r and False otherwise.

8.49 Blue ◦ Blue−1

8.50 [Blue ◦ ≥ ◦ Red−1 ] ∩ =

8.3 Properties of Relations


8.51 [Figure: the relation drawn as a directed graph on the elements 0 through 12, arranged in a circle; the edges are not recoverable from this text rendering.]

8.52 [Figure: the relation drawn as a directed graph on the elements 0 through 14, arranged in a circle; the edges are not recoverable from this text rendering.]

8.53 [Figure: the relation drawn as a directed graph on the elements 0 through 14, arranged in a circle; the edges are not recoverable from this text rendering.]


8.54 The relation {⟨x, x⟩ : x^5 ≡5 x} is reflexive: 0^5 = 0 ≡5 0 and 1^5 = 1 ≡5 1 and 2^5 = 32 ≡5 2 and 3^5 = 243 ≡5 3
and 4^5 = 1024 ≡5 4.

8.55 The relation R = {⟨x, y⟩ : x + y ≡5 0} is neither reflexive nor irreflexive: ⟨0, 0⟩ ∈ R because 0 + 0 = 0 ≡5 0, but
⟨1, 1⟩ ∉ R because 1 + 1 = 2 ̸≡5 0.

8.56 R = {⟨x, y⟩ : there exists z such that x · z ≡5 y} is reflexive: ⟨x, x⟩ ∈ R because we can always choose z = 1.


8.57 The relation R = {⟨x, y⟩ : there exists z such that x^2 · z^2 ≡5 y} is neither reflexive nor irreflexive: ⟨0, 0⟩ ∈ R
because 0^2 · z^2 = 0 ≡5 0 for any z. But ⟨2, 2⟩ ∉ R because 2^2 · z^2 = 4z^2 ̸≡5 2 for any z: 4 · 0^2 ≡5 0 and 4 · 1^2 ≡5 4 and
4 · 2^2 ≡5 1 and 4 · 3^2 ≡5 1 and 4 · 4^2 ≡5 4.
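The mod-5 case checks in 8.54 through 8.57 can be confirmed by brute force over the elements {0, 1, 2, 3, 4}. A Python sketch:

```python
elts = range(5)

# 8.54: x^5 is congruent to x mod 5 for every x, so the relation is reflexive.
assert all(pow(x, 5, 5) == x for x in elts)

# 8.55: <0, 0> is in R but <1, 1> is not.
assert (0 + 0) % 5 == 0 and (1 + 1) % 5 != 0

# 8.56: z = 1 always works, so <x, x> is in R for every x.
assert all((x * 1) % 5 == x for x in elts)

# 8.57: <0, 0> is in R, but no z puts <2, 2> in R.
assert any((0 * z * z) % 5 == 0 for z in elts)
assert not any((2 * 2 * z * z) % 5 == 2 for z in elts)
```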

8.58 The claim is true: ⟨x, y⟩ ∈ R if and only if ⟨y, x⟩ ∈ R−1 , so ⟨x, x⟩ ∈ R if and only if ⟨x, x⟩ ∈ R−1 . Thus R is reflexive
(every ⟨x, x⟩ ∈ R) if and only if R−1 is reflexive (every ⟨x, x⟩ ∈ R−1 ).

8.59 The claim is true: consider an arbitrary x. By definition, ⟨x, x⟩ ∈ R ◦ T if and only if there exists a y such that
⟨x, y⟩ ∈ T and ⟨y, x⟩ ∈ R. But we can always just choose y = x. By the reflexivity of R and T, we have that ⟨x, x⟩ ∈ T
and ⟨x, x⟩ ∈ R, and thus there exists a y such that ⟨x, y⟩ ∈ T and ⟨y, x⟩ ∈ R. Therefore ⟨x, x⟩ ∈ R ◦ T for every x, and
R ◦ T is reflexive.

8.60 The claim is false: consider R = {⟨0, 1⟩, ⟨1, 0⟩}. Then R is not reflexive. But ⟨0, 0⟩ ∈ R ◦ R (because ⟨0, 1⟩ ∈ R
and ⟨1, 0⟩ ∈ R) and ⟨1, 1⟩ ∈ R ◦ R (because ⟨1, 0⟩ ∈ R and ⟨0, 1⟩ ∈ R).

8.61 The claim is true: ⟨x, y⟩ ∈ R if and only if ⟨y, x⟩ ∈ R−1 , so ⟨x, x⟩ ∈ R if and only if ⟨x, x⟩ ∈ R−1 . Thus every
⟨x, x⟩ ∉ R if and only if every ⟨x, x⟩ ∉ R−1 .

8.62 The claim is false: consider R = {⟨0, 1⟩, ⟨1, 0⟩}. Then R is irreflexive. But ⟨0, 0⟩ ∈ R ◦ R (because ⟨0, 1⟩ ∈ R and
⟨1, 0⟩ ∈ R).

8.63 Symmetric and antisymmetric. This relation is in fact {⟨0, 0⟩, ⟨1, 1⟩, ⟨2, 2⟩, ⟨3, 3⟩, ⟨4, 4⟩} and thus satisfies both
requirements. But the relation is not asymmetric, because ⟨0, 0⟩ ∈ R.

8.64 Symmetric. x + y ≡5 y + x, so ⟨x, y⟩ ∈ R ⇔ ⟨y, x⟩ ∈ R. It is not asymmetric because ⟨0, 0⟩ ∈ R, and not
antisymmetric because ⟨2, 3⟩, ⟨3, 2⟩ ∈ R.

8.65 None of these. The relation is not antisymmetric (and not asymmetric) because ⟨1, 2⟩, ⟨2, 1⟩ ∈ R (for ⟨1, 2⟩ choose
z = 2 so that 1 · z ≡5 2; for ⟨2, 1⟩ choose z = 3 so that 2 · z ≡5 1). It is not symmetric because ⟨1, 0⟩ ∈ R (choose
z = 0) but ⟨0, 1⟩ ∉ R (there's no z such that 0z ≡5 1).

8.66 None of these. The relation is not symmetric because ⟨1, 0⟩ ∈ R (choose z = 0) but ⟨0, 1⟩ ∉ R (there's no z such
that 0z ≡5 1). It is not antisymmetric (and not asymmetric) because ⟨1, 4⟩, ⟨4, 1⟩ ∈ R (for ⟨1, 4⟩ choose z = 2 and
1^2 · z^2 ≡5 4; for ⟨4, 1⟩ choose z = 1 and 4^2 · z^2 ≡5 1).

8.67 The key step is the fact (“the axiom of extensionality”) that two sets A and B are equal if and only if x ∈ A ⇔ x ∈ B
for all values of x. Here’s the proof of the claim:
R is symmetric ⇔ ∀x, y : [⟨x, y⟩ ∈ R ⇔ ⟨y, x⟩ ∈ R] definition of symmetry
⇔ ∀x, y : [⟨x, y⟩ ∈ R ⇔ ⟨x, y⟩ ∈ R−1 ] definition of inverse
⇔ R = R−1 . “axiom of extensionality”

8.68 We’ll prove that R is not antisymmetric if and only if R ∩ R−1 ̸⊆ {⟨a, a⟩ : a ∈ A}; the desired fact follows
immediately (because p ⇔ q and ¬p ⇔ ¬q are logically equivalent).
R ∩ R−1 ̸⊆ {⟨a, a⟩ : a ∈ A} ⇔ there exists ⟨a, b⟩ ∈ R ∩ R−1 with a ̸= b definition of ⊆
⇔ there exists ⟨a, b⟩ ∈ R and ⟨a, b⟩ ∈ R−1 with a ̸= b definition of ∩
⇔ there exists ⟨a, b⟩ ∈ R and ⟨b, a⟩ ∈ R with a ̸= b definition of inverse
⇔ R is not antisymmetric. definition of antisymmetry

8.69 Analogously to the previous exercise, we’ll prove that R is asymmetric if and only if R ∩ R−1 = ∅ by proving that
R is not asymmetric if and only if R ∩ R−1 ̸= ∅:
R ∩ R−1 ̸= ∅ ⇔ there exists ⟨a, b⟩ ∈ R ∩ R−1
⇔ there exists ⟨a, b⟩ ∈ R and ⟨a, b⟩ ∈ R−1 definition of ∩
⇔ there exists ⟨a, b⟩ ∈ R and ⟨b, a⟩ ∈ R definition of inverse
⇔ R is not asymmetric. definition of asymmetry
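The three characterizations in 8.67 through 8.69 can be spot-checked on small examples. A Python sketch (the invert helper is an ad hoc stand-in for R−1):

```python
def invert(R):
    return {(y, x) for (x, y) in R}

sym = {(1, 2), (2, 1), (3, 3)}
assert sym == invert(sym)                                  # 8.67: symmetric iff R = R^-1

anti = {(1, 2), (2, 3), (3, 3)}
assert anti & invert(anti) <= {(a, a) for a in (1, 2, 3)}  # 8.68: antisymmetric
assert anti & invert(anti) != set()                        # ...but not asymmetric

asym = {(1, 2), (2, 3)}
assert asym & invert(asym) == set()                        # 8.69: asymmetric iff the intersection is empty
```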

8.70 Suppose ⟨x, y⟩ ∈ R for x ̸= y. If ⟨y, x⟩ ∉ R too, then R is not symmetric, but if ⟨y, x⟩ ∈ R then R is not antisymmetric.
Thus any R containing ⟨x, y⟩ ∈ R for x ̸= y cannot be both symmetric and antisymmetric. But any relation R satisfying
R ⊆ {⟨x, x⟩ : x ∈ A} is both symmetric and antisymmetric. It's symmetric because ⟨x, x⟩ ∈ R implies ⟨x, x⟩ ∈ R, and it's antisymmetric
by Theorem 8.8.

8.71 Using Theorem 8.8: if R is asymmetric, then R ∩ R−1 = ∅, and ∅ is certainly a subset of {⟨a, a⟩ : a ∈ A}. Thus R
is antisymmetric.

8.72 {⟨0, 0⟩, ⟨0, 1⟩, ⟨1, 0⟩, ⟨1, 1⟩} or {⟨0, 0⟩, ⟨1, 1⟩}.

8.73 {⟨0, 0⟩, ⟨0, 1⟩, ⟨1, 1⟩} or {⟨0, 0⟩, ⟨1, 0⟩, ⟨1, 1⟩} or {⟨0, 0⟩, ⟨1, 1⟩}.

8.74 Impossible! To be reflexive, we must include ⟨0, 0⟩; to be asymmetric we cannot have ⟨0, 0⟩.

8.75 {⟨0, 1⟩, ⟨1, 0⟩} or ∅

8.76 {⟨0, 1⟩} or {⟨1, 1⟩} or ∅



8.77 {⟨0, 1⟩} or {⟨1, 1⟩} or ∅

8.78 {⟨0, 0⟩, ⟨0, 1⟩, ⟨1, 0⟩} or {⟨0, 0⟩} or {⟨0, 1⟩, ⟨1, 0⟩, ⟨1, 1⟩} or {⟨1, 1⟩}

8.79 {⟨0, 0⟩, ⟨1, 0⟩} or {⟨0, 0⟩, ⟨0, 1⟩} or {⟨0, 0⟩} or {⟨1, 0⟩, ⟨1, 1⟩} or {⟨0, 1⟩, ⟨1, 1⟩} or {⟨1, 1⟩}

8.80 Impossible! Asymmetry requires that we have neither ⟨0, 0⟩ nor ⟨1, 1⟩, but then the relation is irreflexive.

8.81 R = {⟨x, x⟩ : x^5 ≡5 x} is transitive. There's no trio of elements a, b, c such that ⟨a, b⟩ ∈ R and ⟨b, c⟩ ∈ R unless
a = b = c, so it's immediate that the relation is transitive.

8.82 R = {⟨x, y⟩ : x + y ≡5 0} is not transitive. We have ⟨2, 3⟩ ∈ R and ⟨3, 2⟩ ∈ R, but ⟨2, 2⟩ ∉ R.

8.83 R = {⟨x, y⟩ : there exists z such that x · z ≡5 y} is transitive. Suppose that ⟨a, b⟩ ∈ R (because az1 ≡5 b) and
⟨b, c⟩ ∈ R (because bz2 ≡5 c). Then ⟨a, c⟩ ∈ R: choose z = z1 z2 , and thus az = (az1 )z2 ≡5 bz2 ≡5 c.

8.84 R = {⟨x, y⟩ : there exists z such that x^2 · z^2 ≡5 y} is transitive. The relation consists of ⟨0, 0⟩ plus all pairs ⟨a, r⟩
for r ∈ {0, 1, 4} and a ∈ {1, 2, 3, 4}—that is,
R = {⟨0, 0⟩} ∪ ({1, 2, 3, 4} × {0, 1, 4}) .
We can check exhaustively that this relation is in fact transitive.

8.85 Assume that R is irreflexive and transitive. We must show that R is asymmetric. Suppose for a contradiction that
⟨a, b⟩ ∈ R and ⟨b, a⟩ ∈ R. But then ⟨a, a⟩ ∈ R by transitivity—but ⟨a, a⟩ ∈ R violates irreflexivity! Thus the assumption
was false, and ⟨a, b⟩ ∈ R ⇒ ⟨b, a⟩ ∉ R, precisely as asymmetry demands.

8.86 We’ll show that R is transitive if and only if R ◦ R ⊆ R by mutual implication. First suppose R is transitive. Then
⟨a, c⟩ ∈ R ◦ R ⇔ ∃b : ⟨a, b⟩ ∈ R and ⟨b, c⟩ ∈ R definition of ◦
⇒ ⟨a, c⟩ ∈ R. by transitivity

Thus R ◦ R ⊆ R.
Conversely, suppose that R ◦ R ⊆ R. Then
⟨a, b⟩ ∈ R and ⟨b, c⟩ ∈ R ⇒ ⟨a, c⟩ ∈ R ◦ R definition of ◦
⇒ ⟨a, c⟩ ∈ R. definition of ⊆

Thus R is transitive.
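The characterization in 8.86 is easy to spot-check. A Python sketch (compose is an ad hoc helper following the composition convention of 8.29, with the right-hand relation applied first):

```python
def compose(R, S):
    # <a, c> is in R∘S iff there is a b with <a, b> in S and <b, c> in R
    return {(a, c) for (a, b) in S for (b2, c) in R if b == b2}

T = {(1, 2), (2, 3), (1, 3)}       # transitive
assert compose(T, T) <= T          # T∘T is a subset of T

N = {(1, 2), (2, 3)}               # not transitive: <1, 3> is missing
assert not compose(N, N) <= N
```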

8.87 Let R = {⟨0, 1⟩}. Then R ◦ R = ∅, which is not equal to R. But R is vacuously transitive, because there is no trio
of elements a, b, c such that ⟨a, b⟩ ∈ R and ⟨b, c⟩ ∈ R.

8.88 Yes, it is possible for R to be simultaneously symmetric, transitive, and irreflexive—but only when R = ∅. Suppose
that R is nonempty; in other words, suppose that ⟨a, b⟩ ∈ R for some particular a and b. Then, by the symmetry of R, we
know that ⟨b, a⟩ ∈ R too. Then by transitivity (and the facts that ⟨a, b⟩ ∈ R and ⟨b, a⟩ ∈ R) we have that ⟨a, a⟩ ∈ R,
too, violating irreflexivity.

8.89 Suppose ⟨a, f(a)⟩ ∈ R for f(a) ̸= a. Because R is a function, there must be a pair ⟨f(a), f(f(a))⟩ ∈ R. By transitivity,
then, ⟨a, f(f(a))⟩ ∈ R too. Thus, for R to be a function, we must have f(a) = f(f(a)).
Therefore it is possible for R to be simultaneously transitive and a function, so long as R has the following property:
for any b in the image of the function R, it must be the case that ⟨b, b⟩ ∈ R. To state it another way: for every b ∈ A, the
set Sb = {a : f(a) = b} is either empty, or b ∈ Sb .

8.90 The only problem with transitivity that arises is if 0 → 1 and 1 → 0, but either 0 ̸→ 0 or 1 ̸→ 1. Thus the only
non-transitive relations on {0, 1} are {⟨0, 1⟩, ⟨1, 0⟩, ⟨1, 1⟩} and {⟨0, 1⟩, ⟨1, 0⟩, ⟨0, 0⟩} and {⟨0, 1⟩, ⟨1, 0⟩}. So the transitive
ones are the other 13:
• {⟨0, 0⟩, ⟨0, 1⟩, ⟨1, 0⟩, ⟨1, 1⟩}
• {⟨0, 0⟩, ⟨1, 0⟩, ⟨1, 1⟩}
• {⟨0, 0⟩, ⟨0, 1⟩, ⟨1, 1⟩}
• {⟨0, 0⟩, ⟨1, 1⟩}
• {⟨0, 0⟩, ⟨1, 0⟩}
• {⟨0, 0⟩, ⟨0, 1⟩}
• {⟨0, 0⟩}
• {⟨1, 0⟩, ⟨1, 1⟩}
• {⟨0, 1⟩, ⟨1, 1⟩}
• {⟨1, 1⟩}
• {⟨1, 0⟩}
• {⟨0, 1⟩}
• {}

8.91 Reflexivity requires both ⟨0, 0⟩ and ⟨1, 1⟩, so only the following are both transitive and reflexive:
• {⟨0, 0⟩, ⟨0, 1⟩, ⟨1, 0⟩, ⟨1, 1⟩}
• {⟨0, 0⟩, ⟨1, 0⟩, ⟨1, 1⟩}
• {⟨0, 0⟩, ⟨0, 1⟩, ⟨1, 1⟩}
• {⟨0, 0⟩, ⟨1, 1⟩}
For symmetry, we must include both ⟨0, 1⟩ and ⟨1, 0⟩ or neither of them, not just one of the two, which eliminates the middle two
relations from this list. So only the following relations on {0, 1} are transitive, reflexive, and symmetric:
• {⟨0, 0⟩, ⟨0, 1⟩, ⟨1, 0⟩, ⟨1, 1⟩}
• {⟨0, 0⟩, ⟨1, 1⟩}
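Both of these lists can be double-checked by brute force over all 16 relations on {0, 1}, in the spirit of Figure S.8.3. A self-contained Python sketch:

```python
from itertools import combinations

pairs = [(x, y) for x in (0, 1) for y in (0, 1)]
# All 16 subsets of the 4 possible pairs.
relations = [set(c) for r in range(5) for c in combinations(pairs, r)]
assert len(relations) == 16

def transitive(R):
    return all((a, d) in R for (a, b) in R for (c, d) in R if b == c)

def reflexive(R):
    return (0, 0) in R and (1, 1) in R

def symmetric(R):
    return all((y, x) in R for (x, y) in R)

trans = [R for R in relations if transitive(R)]
assert len(trans) == 13                                              # 8.90
assert len([R for R in trans if reflexive(R) and symmetric(R)]) == 2 # 8.91
```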

8.92 {⟨2, 4⟩, ⟨4, 3⟩, ⟨4, 4⟩, ⟨1, 1⟩, ⟨2, 2⟩, ⟨3, 3⟩}

8.93 {⟨2, 4⟩, ⟨4, 3⟩, ⟨4, 4⟩, ⟨4, 2⟩, ⟨3, 4⟩}

8.94 {⟨2, 4⟩, ⟨4, 3⟩, ⟨4, 4⟩, ⟨2, 3⟩}

8.95 {⟨2, 4⟩, ⟨4, 3⟩, ⟨4, 4⟩, ⟨1, 1⟩, ⟨2, 2⟩, ⟨3, 3⟩, ⟨2, 3⟩}

8.96 {⟨2, 4⟩, ⟨4, 3⟩, ⟨4, 4⟩, ⟨1, 1⟩, ⟨2, 2⟩, ⟨3, 3⟩, ⟨2, 3⟩, ⟨4, 2⟩, ⟨3, 4⟩, ⟨3, 2⟩}

8.97 T ∪ {⟨1, 1⟩, ⟨2, 2⟩, ⟨3, 3⟩, ⟨4, 4⟩, ⟨5, 5⟩}

8.98 T ∪ {⟨4, 3⟩, ⟨5, 4⟩}

8.99 T ∪ {⟨1, 1⟩, ⟨1, 4⟩, ⟨1, 5⟩, ⟨2, 2⟩, ⟨2, 4⟩, ⟨2, 5⟩, ⟨3, 3⟩, ⟨3, 5⟩}

8.100 The symmetric closure of ≥ is R × R: every pair of real numbers ⟨x, y⟩ either satisfies x ≥ y or y ≥ x. (The
symmetric closure of > is {⟨x, y⟩ ∈ R × R : x ̸= y}.)

8.101 There’s an almost word-for-word translation from the mathematical notation if we represent a relation as a Python set
of ordered pairs (for example, representing the relation {⟨3, 1⟩, ⟨5, 2⟩} as set((3, 1), (5, 2)). So, instead, we’ll give
an alternative (slower) solution based on a simpler data structure: a relation simply consists of an arbitrarily ordered list of
ordered pairs of elements. For example, the above relation is represented as [(3, 1), (5, 2)] or [(5, 2), (3, 1)].
See invert and compose in Figure S.8.2.

8.102 See reflexive, irreflexive, symmetric, antisymmetric, asymmetric, and transitive in Figure S.8.2.

8.103 See reflexive_closure, symmetric_closure, and transitive_closure in Figure S.8.2.

8.104 See Figure S.8.3. Running this code lists the exact same relations as in Exercises 8.72–8.80 (though, of course, in
different orders).

1 def invert(R):
2 return [(y, x) for (x, y) in R]
3
4 def compose(R, S):
5 R_compose_S = []
6 for (a, b) in R:
7 for (c, d) in S:
8 if b == c:
9 R_compose_S += [(a, d)]
10 return R_compose_S
11
12 def reflexive(R):
13 for x in universe: # a global variable! [a list].
14 if (x, x) not in R:
15 return False
16 return True
17
18 def irreflexive(R):
19 for (x,y) in R:
20 if x == y:
21 return False
22 return True
23
24 def symmetric(R):
25 return sorted(R) == sorted(invert(R))
26
27 def antisymmetric(R):
28 for (x, y) in R:
29 if (y, x) in R and x != y:
30 return False
31 return True
32
33 def asymmetric(R):
34 for (x, y) in R:
35 if (y, x) in R:
36 return False
37 return True
38
39 def transitive(R):
40 for (x, y) in compose(R, R):
41 if (x, y) not in R:
42 return False
43 return True
44
45 def reflexive_closure(R):
46 return R + [(x, x) for x in universe if (x, x) not in R]
47
48 def symmetric_closure(R):
49 return R + [(y, x) for (x, y) in R if (y, x) not in R]
50
51 def transitive_closure(R):
52 result = R
53 changed = True
54 while changed:
55 changed = False
56 for (a, b) in result:
57 for (c, d) in result:
58 if b == c and (a, d) not in result:
59 result += [(a, d)]
60 changed = True
61 return result

Figure S.8.2 An implementation of relations (including closures) in Python.



62 # Compute a list of all relations on the universe [0, 1].


63 universe = [0, 1]
64 all_binary_relations = [[]]
65 for x in universe:
66 for y in universe:
67 all_binary_relations = all_binary_relations + [L + [(x, y)] for L in all_binary_relations]
68
69 reflexives = [R for R in all_binary_relations if reflexive(R)]
70 irreflexives = [R for R in all_binary_relations if irreflexive(R)]
71 neithers = [R for R in all_binary_relations if not reflexive(R) and not irreflexive(R)]
72
73 for (number, description, relations) in [ \
74 ("8.72", "reflexive, symmetric", [R for R in reflexives if symmetric(R)]),
75 ("8.73", "reflexive, antisymmetric", [R for R in reflexives if antisymmetric(R)]),
76 ("8.74", "reflexive, asymmetric", [R for R in reflexives if asymmetric(R)]),
77 ("8.75", "irreflexive, symmetric", [R for R in irreflexives if symmetric(R)]),
78 ("8.76", "irreflexive, antisymmetric", [R for R in irreflexives if antisymmetric(R)]),
79 ("8.77 ", "irreflexive, asymmetric", [R for R in irreflexives if asymmetric(R)]),
80 ("8.78", "not refl/irrefl., symmetric", [R for R in neithers if symmetric(R)]),
81 ("8.79", "not refl/irrefl., antisymmetric", [R for R in neithers if antisymmetric(R)]),
82 ("8.80", "not refl/irrefl., asymmetric", [R for R in neithers if asymmetric(R)])
83 ]:
84 print("Exercise %s (%s):" % (number, description))
85 if len(relations) == 0:
86 relations = ["impossible!"]
87 for R in relations:
88 print(" ", R)

Figure S.8.3 Verifying Exercises 8.72–8.80 using Figure S.8.2.

8.105 Let R be a relation, and let S ⊇ R be a transitive relation. We'll argue that R^k ⊆ S for any k ≥ 1 by induction on k.
For the base case (k = 1), the statement is immediate: S ⊇ R and R^1 = R, so R^1 ⊆ S.
For the inductive case (k ≥ 2), we assume the inductive hypothesis R^{k−1} ⊆ S, and we must prove that if ⟨a, c⟩ ∈ R^k
then ⟨a, c⟩ ∈ S. By definition, R^k = R ◦ R^{k−1}. Thus:

⟨a, c⟩ ∈ R^k ⇔ ⟨a, c⟩ ∈ R ◦ R^{k−1} definition of R^k
⇔ ∃b : ⟨a, b⟩ ∈ R and ⟨b, c⟩ ∈ R^{k−1} definition of ◦
⇒ ∃b : ⟨a, b⟩ ∈ R and ⟨b, c⟩ ∈ S inductive hypothesis
⇒ ∃b : ⟨a, b⟩ ∈ S and ⟨b, c⟩ ∈ S S ⊇ R by assumption
⇒ ⟨a, c⟩ ∈ S. S is transitive by assumption

8.106 Let A = {1, 2, . . . , n}, and let R = successor ∩ (A × A). Then the transitive closure of R is < (restricted to A × A), which contains
n(n − 1)/2 pairs. Thus our c gets arbitrarily close to 0.5 as n gets bigger and bigger.
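The closure itself is small enough to compute directly for specific n. A Python sketch (transitive_closure is an ad hoc helper in the spirit of Figure S.8.2); it confirms that the closure of successor on {1, . . . , n} is the strict less-than relation on that set, with n(n − 1)/2 pairs:

```python
def transitive_closure(R):
    # Repeatedly add the pairs forced by transitivity until nothing changes.
    result = set(R)
    while True:
        new = {(a, d) for (a, b) in result for (c, d) in result if b == c} - result
        if not new:
            return result
        result |= new

n = 8
successor = {(i, i + 1) for i in range(1, n)}   # successor restricted to {1, ..., n}
closure = transitive_closure(successor)
assert closure == {(i, j) for i in range(1, n + 1) for j in range(1, n + 1) if i < j}
assert len(closure) == n * (n - 1) // 2
```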

8.107 We claim that ⟨x, x + k⟩ is in the transitive closure T of successor, by induction on k.


For the base case (k = 1), indeed ⟨x, x + 1⟩ ∈ successor itself. (And by definition T ⊇ successor.)
For the inductive case (k ≥ 2), we assume the inductive hypothesis ⟨x, x + k − 1⟩ ∈ T. We must prove ⟨x, x + k⟩ ∈ T.
But ⟨x + k − 1, x + k⟩ ∈ successor by the definition of successor, and successor ⊆ T by the definition of transitive
closure. Thus we have both ⟨x, x + k − 1⟩ ∈ T (inductive hypothesis) and ⟨x + k − 1, x + k⟩ ∈ T (successor ⊆ T). By
the transitivity of T, then, ⟨x, x + k⟩ ∈ T too.

8.108 The X closure of R is a superset of R (specifically, the smallest superset of R) that furthermore has the property of
being X. But if R is not antisymmetric (that is, if ⟨a, b⟩, ⟨b, a⟩ ∈ R for some a ̸= b), then every superset of R also fails
to be antisymmetric. Thus the “antisymmetric closure” would be undefined!

8.4 Special Relations


8.109 {⟨0, 0⟩, ⟨1, 1⟩} and {⟨0, 0⟩, ⟨0, 1⟩, ⟨1, 0⟩, ⟨1, 1⟩}

8.110 Because they must be reflexive, {⟨0, 0⟩, ⟨1, 1⟩, ⟨2, 2⟩, ⟨3, 3⟩} are in all equivalence relations, but what other pairs
we add can vary. There are 15 possibilities:
{} all separate
{⟨0, 1⟩, ⟨1, 0⟩} 0, 1 together
{⟨0, 2⟩, ⟨2, 0⟩} 0, 2 together
{⟨0, 3⟩, ⟨3, 0⟩} 0, 3 together
{⟨1, 2⟩, ⟨2, 1⟩} 1, 2 together
{⟨1, 3⟩, ⟨3, 1⟩} 1, 3 together
{⟨2, 3⟩, ⟨3, 2⟩} 2, 3 together
{⟨0, 1⟩, ⟨1, 0⟩, ⟨2, 3⟩, ⟨3, 2⟩} 0, 1 and 2, 3 together
{⟨0, 2⟩, ⟨2, 0⟩, ⟨1, 3⟩, ⟨3, 1⟩} 0, 2 and 1, 3 together
{⟨0, 3⟩, ⟨3, 0⟩, ⟨1, 2⟩, ⟨2, 1⟩} 0, 3 and 1, 2 together
{⟨0, 1⟩, ⟨1, 0⟩, ⟨0, 2⟩, ⟨2, 0⟩, ⟨1, 2⟩, ⟨2, 1⟩} 0, 1, 2 together
{⟨0, 1⟩, ⟨1, 0⟩, ⟨0, 3⟩, ⟨3, 0⟩, ⟨1, 3⟩, ⟨3, 1⟩} 0, 1, 3 together
{⟨0, 3⟩, ⟨3, 0⟩, ⟨0, 2⟩, ⟨2, 0⟩, ⟨3, 2⟩, ⟨2, 3⟩} 0, 2, 3 together
{⟨3, 1⟩, ⟨1, 3⟩, ⟨3, 2⟩, ⟨2, 3⟩, ⟨1, 2⟩, ⟨2, 1⟩} 1, 2, 3 together
{⟨0, 1⟩, ⟨1, 0⟩, ⟨0, 2⟩, ⟨2, 0⟩, ⟨0, 3⟩, ⟨3, 0⟩, ⟨1, 2⟩, ⟨2, 1⟩, ⟨1, 3⟩, ⟨3, 1⟩, ⟨2, 3⟩, ⟨3, 2⟩} . all elements together
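The count of 15 can be verified by brute force: every equivalence relation contains the diagonal, so it suffices to try each subset of the off-diagonal pairs. A Python sketch (helper names are ad hoc):

```python
from itertools import combinations

elems = range(4)
diag = {(x, x) for x in elems}
off_diag = [(x, y) for x in elems for y in elems if x != y]

def is_equivalence(R):
    sym = all((y, x) in R for (x, y) in R)
    trans = all((a, d) in R for (a, b) in R for (c, d) in R if b == c)
    return sym and trans          # reflexivity holds by construction

count = sum(1 for r in range(len(off_diag) + 1)
              for extra in combinations(off_diag, r)
              if is_equivalence(diag | set(extra)))
assert count == 15
```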

8.111 Yes, R1 is an equivalence relation. The equivalence classes are:


• ∅
• {0}
• {1} , {0, 1}
• {2} , {0, 2} , {1, 2} , {0, 1, 2}
• {3} , {0, 3} , {1, 3} , {2, 3} , {0, 1, 3} , {0, 2, 3} , {1, 2, 3} , {0, 1, 2, 3}.

8.112 Yes, R2 is an equivalence relation. The equivalence classes are:


• {} , {0}
• {1} , {0, 1}
• {2} , {0, 2}
• {3} , {0, 3} , {1, 2} , {0, 1, 2}
• {1, 3} , {0, 1, 3}
• {2, 3} , {0, 2, 3}
• {1, 2, 3} , {0, 1, 2, 3}.

8.113 Yes, R3 is an equivalence relation. In fact, the intersection of any two equivalence relations must be an equivalence
relation. Its equivalence classes are the intersections of the equivalence classes of the two:
• {}
• {0}
• {1} , {0, 1}
• {2} , {0, 2}
• {1, 2} , {0, 1, 2}
• {3} , {0, 3}
• {1, 3} , {0, 1, 3}
• {2, 3} , {0, 2, 3}
• {1, 2, 3} , {0, 1, 2, 3}.

8.114 No, R4 is not an equivalence relation: {0} ∩ {0, 1} ̸= ∅ and {1} ∩ {0, 1} ̸= ∅, but {0} ∩ {1} = ∅; thus the
relation isn’t transitive.

8.115 Yes, R5 is an equivalence relation, with the following equivalence classes:


• {}
• {0} , {1} , {2} , {3}
• {0, 1} , {0, 2} , {0, 3} , {1, 2} , {1, 3} , {2, 3}
• {0, 1, 2} , {0, 1, 3} , {0, 2, 3} , {1, 2, 3}
• {0, 1, 2, 3}.

8.116 No, R−1 ◦ R need not be reflexive. If there's no pair ⟨a, •⟩ ∈ R, then ⟨a, a⟩ ∉ R−1 ◦ R. For example, let R = {⟨0, 1⟩}
be a relation on {0, 1} × {0, 1}. Then R−1 ◦ R = {⟨0, 0⟩}, which is not reflexive because it's missing ⟨1, 1⟩.

8.117 Yes, R−1 ◦ R must be symmetric. Let ⟨a, c⟩ be a generic pair of elements. Then

⟨a, c⟩ ∈ R−1 ◦ R ⇔ ∃b : ⟨a, b⟩ ∈ R and ⟨b, c⟩ ∈ R−1 definition of ◦
⇔ ∃b : ⟨a, b⟩ ∈ R and ⟨c, b⟩ ∈ R definition of −1
⇔ ∃b : ⟨b, a⟩ ∈ R−1 and ⟨c, b⟩ ∈ R definition of −1
⇔ ⟨c, a⟩ ∈ R−1 ◦ R. definition of ◦

8.118 False, R−1 ◦ R need not be transitive. For example, let R = {⟨2, 0⟩, ⟨3, 0⟩, ⟨3, 1⟩, ⟨4, 1⟩}. Then
R−1 ◦ R = {⟨0, 2⟩, ⟨0, 3⟩, ⟨1, 3⟩, ⟨1, 4⟩} ◦ {⟨2, 0⟩, ⟨3, 0⟩, ⟨3, 1⟩, ⟨4, 1⟩}
= {⟨2, 2⟩, ⟨2, 3⟩, ⟨3, 2⟩, ⟨3, 3⟩, ⟨3, 4⟩, ⟨4, 3⟩, ⟨4, 4⟩},
which is not transitive because it contains ⟨2, 3⟩ and ⟨3, 4⟩ but is missing ⟨2, 4⟩.

8.119 The equivalence relation ≡coarsest has only one equivalence class, and ≡coarsest = A × A. Every pair of elements is in
A × A, so it's immediate that the relation is reflexive, symmetric, and transitive. Every equivalence relation ≡ on A refines
≡coarsest , vacuously, because (a ≡ a′ ) ⇒ (a ≡coarsest a′ ) because anything implies true, and a ≡coarsest a′ is true for any
a and a′ .

8.120 The equivalence relation ≡finest has a separate equivalence class for each element of A, and ≡finest =
{⟨a, a⟩ : a ∈ A}. The relation is reflexive by definition; it’s symmetric and transitive because there’s never a non-⟨a, a⟩
pair in the relation. Every equivalence relation ≡ on A is refined by ≡finest , vacuously, because (a ≡finest a′ ) ⇒ (a ≡ a′ )
because the only pairs related by ≡finest are ⟨a, a⟩ and ≡ must be reflexive.

8.121 ≡n is a refinement of any equivalence relation that combines one or more of ≡n ’s n equivalence classes together:
that is, ≡n refines ≡k if a ≡n b ⇒ a ≡k b for any integers a and b. This implication holds precisely when k | n.
Thus the coarsenings of ≡60 are ≡1 , ≡2 , ≡3 , ≡4 , ≡5 , ≡6 , ≡10 , ≡12 , ≡15 , ≡20 , ≡30 , ≡60 . And there are infinitely many
refinements of ≡60 , namely ≡60 , ≡120 , ≡180 , . . . .
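The divisibility condition k | 60 can be enumerated directly to confirm the list of coarsenings above. A quick Python check:

```python
# The coarsenings of ≡60 among the relations ≡k are exactly those with k dividing 60.
divisors_of_60 = [k for k in range(1, 61) if 60 % k == 0]
assert divisors_of_60 == [1, 2, 3, 4, 5, 6, 10, 12, 15, 20, 30, 60]
```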

8.122 The claim is very false. We can coarsen ≡60 by combining any equivalence classes of the relation, not just the ones
that are defined by divisibility. For example, the relation
R = {⟨a, b⟩ : (a mod 60 is prime) ⇔ (b mod 60 is prime)}
is a two-class equivalence relation, where the two equivalence classes are
[0]≡60 ∪ [1]≡60 ∪ [4]≡60 ∪ [6]≡60 . . . ∪ [58]≡60 and [2]≡60 ∪ [3]≡60 ∪ [5]≡60 ∪ [7]≡60 . . . ∪ [59]≡60 .
The only two-class equivalence relation of the form ≡k is ≡2 , whose equivalence classes are very different from these.

8.123 Yes, “is the same object as” is a refinement of “has the same value as.” Any object x has the same value as the object x
itself; thus, if x and y are the same object, then x and y have the same value. That is: x ≡identical object y implies x ≡same value y.
But there may be two distinct objects storing the same value, so the converse is not true. Thus, by definition, ≡identical object
refines ≡same value .

8.124 The partial orders on {0, 1} are:


• ⟨0, 0⟩, ⟨1, 1⟩

• ⟨0, 0⟩, ⟨1, 1⟩, ⟨0, 1⟩


• ⟨0, 0⟩, ⟨1, 1⟩, ⟨1, 0⟩.

8.125 The partial orders on {0, 1, 2} are:


• ⟨0, 0⟩, ⟨1, 1⟩, ⟨2, 2⟩
• ⟨0, 0⟩, ⟨1, 1⟩, ⟨2, 2⟩, ⟨0, 1⟩
• ⟨0, 0⟩, ⟨1, 1⟩, ⟨2, 2⟩, ⟨1, 2⟩
• ⟨0, 0⟩, ⟨1, 1⟩, ⟨2, 2⟩, ⟨0, 2⟩
• ⟨0, 0⟩, ⟨1, 1⟩, ⟨2, 2⟩, ⟨1, 0⟩
• ⟨0, 0⟩, ⟨1, 1⟩, ⟨2, 2⟩, ⟨2, 1⟩
• ⟨0, 0⟩, ⟨1, 1⟩, ⟨2, 2⟩, ⟨2, 0⟩
• ⟨0, 0⟩, ⟨1, 1⟩, ⟨2, 2⟩, ⟨0, 1⟩, ⟨0, 2⟩
• ⟨0, 0⟩, ⟨1, 1⟩, ⟨2, 2⟩, ⟨1, 0⟩, ⟨1, 2⟩
• ⟨0, 0⟩, ⟨1, 1⟩, ⟨2, 2⟩, ⟨2, 0⟩, ⟨2, 1⟩
• ⟨0, 0⟩, ⟨1, 1⟩, ⟨2, 2⟩, ⟨1, 0⟩, ⟨2, 0⟩
• ⟨0, 0⟩, ⟨1, 1⟩, ⟨2, 2⟩, ⟨0, 1⟩, ⟨2, 1⟩
• ⟨0, 0⟩, ⟨1, 1⟩, ⟨2, 2⟩, ⟨0, 2⟩, ⟨1, 2⟩
• ⟨0, 0⟩, ⟨1, 1⟩, ⟨2, 2⟩, ⟨0, 1⟩, ⟨1, 2⟩, ⟨0, 2⟩
• ⟨0, 0⟩, ⟨1, 1⟩, ⟨2, 2⟩, ⟨0, 2⟩, ⟨2, 1⟩, ⟨0, 1⟩
• ⟨0, 0⟩, ⟨1, 1⟩, ⟨2, 2⟩, ⟨1, 0⟩, ⟨0, 2⟩, ⟨1, 2⟩
• ⟨0, 0⟩, ⟨1, 1⟩, ⟨2, 2⟩, ⟨1, 2⟩, ⟨2, 0⟩, ⟨1, 0⟩
• ⟨0, 0⟩, ⟨1, 1⟩, ⟨2, 2⟩, ⟨2, 0⟩, ⟨0, 1⟩, ⟨2, 1⟩
• ⟨0, 0⟩, ⟨1, 1⟩, ⟨2, 2⟩, ⟨2, 1⟩, ⟨1, 0⟩, ⟨2, 0⟩.
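The 19 relations above can be counted mechanically: every partial order on {0, 1, 2} contains the diagonal, so it suffices to try each subset of the off-diagonal pairs. A Python sketch (helper names are ad hoc):

```python
from itertools import combinations

elems = (0, 1, 2)
diag = {(x, x) for x in elems}
off_diag = [(x, y) for x in elems for y in elems if x != y]

def is_partial_order(R):
    antisym = all((y, x) not in R for (x, y) in R if x != y)
    trans = all((a, d) in R for (a, b) in R for (c, d) in R if b == c)
    return antisym and trans       # reflexive by construction (diag is included)

count = sum(1 for r in range(len(off_diag) + 1)
              for extra in combinations(off_diag, r)
              if is_partial_order(diag | set(extra)))
assert count == 19
```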

8.126 Neither, because it’s not antisymmetric: ⟨{2, 5} , {3, 4}⟩ ∈ R1 and ⟨{3, 4} , {2, 5}⟩ ∈ R1 because 3 + 4 ≤ 2 + 5
and 2 + 5 ≤ 3 + 4.

8.127 This relation is a partial order:

• It's antisymmetric (there are no two distinct subsets of {2, 3, 4, 5} that have the same product—though that's only
because of the particular numbers involved).
• It's reflexive (because ∏a∈A a = ∏a∈A a for any set A).
• It's transitive (because ∏a∈A a ≤ ∏b∈B b ≤ ∏c∈C c implies ∏a∈A a ≤ ∏c∈C c for any sets A, B, and C).

8.128 Partial order. It’s antisymmetric because A ⊆ B and B ⊆ A only if A = B; it’s reflexive because A ⊆ A; and it’s
transitive because A ⊆ B ⊆ C implies A ⊆ C.

8.129 Partial order. Actually this is the same question as the previous exercise, just reversed—antisymmetry, reflexivity,
and transitivity follow in precisely the same way.

8.130 Total order. It’s irreflexive (|A| ̸< |A|) but it is transitive (|A| < |B| < |C| implies |A| < |C|).

8.131 Let ⪯ ⊆ A × A. Then:

⪯ is reflexive ⇔ ∀a ∈ A : ⟨a, a⟩ ∈ ⪯ definition of reflexivity
⇔ ∀a ∈ A : ⟨a, a⟩ ∈ ⪯−1 definition of −1
⇔ ⪯−1 is reflexive definition of reflexivity

⪯ is antisymmetric ⇔ ∀a, b ∈ A : ⟨a, b⟩ ∈ ⪯ and ⟨b, a⟩ ∈ ⪯ ⇒ a = b definition of antisymmetry
⇔ ∀a, b ∈ A : ⟨b, a⟩ ∈ ⪯−1 and ⟨a, b⟩ ∈ ⪯−1 ⇒ a = b definition of −1
⇔ ⪯−1 is antisymmetric definition of antisymmetry

⪯ is transitive ⇔ ∀a, b, c ∈ A : ⟨a, b⟩ ∈ ⪯ and ⟨b, c⟩ ∈ ⪯ ⇒ ⟨a, c⟩ ∈ ⪯ definition of transitivity
⇔ ∀a, b, c ∈ A : ⟨b, a⟩ ∈ ⪯−1 and ⟨c, b⟩ ∈ ⪯−1 ⇒ ⟨c, a⟩ ∈ ⪯−1 definition of −1
⇔ ⪯−1 is transitive definition of transitivity

Thus ⪯ is a partial order if and only if ⪯−1 is a partial order.

8.132 Suppose that the relation ⪯ is a partial order—in other words, suppose that ⪯ is transitive, reflexive, and
antisymmetric. Let R = {⟨a, b⟩ : a ⪯ b and a ̸= b}.
Irreflexivity of R is immediate: by definition, ⟨a, b⟩ ∈ R requires a ̸= b, so ⟨a, a⟩ ∉ R for every a.
For transitivity, let a, b, and c be arbitrary and suppose ⟨a, b⟩ ∈ R and ⟨b, c⟩ ∈ R. Then a ̸= b and a ⪯ b and b ⪯ c
by definition of R, and by the transitivity of ⪯ we have a ⪯ c. If a = c, then a ⪯ b and b ⪯ a and a ̸= b, violating the
antisymmetry of ⪯; thus a ⪯ c and a ̸= c. Thus R is transitive.

8.133 Suppose that R is a transitive, antisymmetric relation. We’ll show that there is no cycle of length k ≥ 2 in R by
induction on k.
For the base case (k = 2), a cycle would be a sequence ⟨a0 , a1 ⟩ of distinct elements where ⟨a0 , a1 ⟩ ∈ R and ⟨a1 , a0 ⟩ ∈
R. But that’s a direct violation of antisymmetry.
For the inductive case (k ≥ 3), we assume the inductive hypothesis, that there is no cycle of length k−1. Suppose for the
purposes of a contradiction that a0 , a1 , . . . , ak−1 ∈ A forms a cycle of length k: that is, suppose that ⟨ai , ai+1 mod k ⟩ ∈ R for
each i ∈ {0, 1, . . . , k − 1}, with each ai distinct. But because ⟨ak−2 , ak−1 ⟩, ⟨ak−1 , a0 ⟩ ∈ R, by transitivity, ⟨ak−2 , a0 ⟩ ∈
R too. But then a0 , a1 , . . . , ak−2 ∈ A is a cycle of length k − 1—and, by the inductive hypothesis, there are no cycles of
length k − 1.

8.134 The points x = ⟨2, 4⟩ and y = ⟨3, 3⟩ are incomparable: ⟨x, y⟩ ∉ R because 4 ̸≤ 3 and ⟨y, x⟩ ∉ R because 3 ̸≤ 2.

8.135 For reflexivity, we have ⟨⟨a, b⟩, ⟨a, b⟩⟩ ∈ R because a ≤ a and b ≤ b.
For transitivity, suppose ⟨⟨a, b⟩, ⟨c, d⟩⟩ ∈ R and ⟨⟨c, d⟩, ⟨e, f⟩⟩ ∈ R. Then a ≤ c ≤ e and b ≤ d ≤ f, and thus a ≤ e
and b ≤ f—so ⟨⟨a, b⟩, ⟨e, f⟩⟩ ∈ R too.
Finally, for antisymmetry, suppose that ⟨⟨a, b⟩, ⟨c, d⟩⟩ ∈ R and ⟨⟨c, d⟩, ⟨a, b⟩⟩ ∈ R. But then a ≤ c and c ≤ a (so
a = c), and b ≤ d and d ≤ b (so b = d). Thus ⟨a, b⟩ = ⟨c, d⟩.

8.136 ⟨1, 1⟩, ⟨1, 3⟩, ⟨1, 5⟩, ⟨2, 2⟩, ⟨2, 4⟩, ⟨2, 5⟩, ⟨3, 3⟩, ⟨3, 5⟩, ⟨4, 4⟩, ⟨4, 5⟩, ⟨5, 5⟩

8.137 ⟨1, 1⟩, ⟨1, 2⟩, ⟨1, 3⟩, ⟨1, 4⟩, ⟨1, 5⟩, ⟨2, 2⟩, ⟨2, 3⟩, ⟨2, 4⟩, ⟨2, 5⟩, ⟨3, 3⟩, ⟨3, 5⟩, ⟨4, 4⟩, ⟨4, 5⟩, ⟨5, 5⟩

8.138 ⟨1, 1⟩, ⟨1, 2⟩, ⟨1, 3⟩, ⟨1, 4⟩, ⟨1, 5⟩, ⟨2, 2⟩, ⟨2, 5⟩, ⟨3, 3⟩, ⟨3, 5⟩, ⟨4, 4⟩, ⟨5, 5⟩

8.139 ⟨1, 1⟩, ⟨1, 4⟩, ⟨1, 5⟩, ⟨2, 2⟩, ⟨2, 4⟩, ⟨2, 5⟩, ⟨3, 3⟩, ⟨3, 4⟩, ⟨3, 5⟩, ⟨4, 4⟩, ⟨4, 5⟩, ⟨5, 5⟩

8.140 [Hasse diagram of ⊆ on the subsets of {1, 2, 3}, with levels from top to bottom: {1, 2, 3}; then {1, 2}, {1, 3}, {2, 3}; then {1}, {2}, {3}; then {} at the bottom. Each set is connected to the sets directly below it; the edges are not recoverable from this text rendering.]

8.141 [Hasse diagram on the binary strings of length 1 to 3, with levels from top to bottom: 000, 001, 010, 011, 100, 101, 110, 111; then 00, 01, 10, 11; then 0 and 1 at the bottom. The edges are not recoverable from this text rendering.]

8.142 201 is the only immediate predecessor; 203 is the only immediate successor.

8.143 The immediate predecessors are {101, 2}; there are infinitely many immediate successors, namely all of the integers
in the set {202p : p is prime}.
 
8.144 One such strict partial order is the transitive closure of R = ⟨n, 2n⟩ : n ∈ Z≥1 ∪ ⟨n, 2n + 1⟩ : n ∈ Z≥1 .
[Note that the relation R is the “child of” relation for a heap stored in an array (see Figure 2.60). Each index has precisely
two children.] This relation is irreflexive (no n ≥ 1 satisfies n ∈ {2n, 2n + 1}) and transitive by the definition of the
transitive closure. Each n has precisely two immediate successors (2n and 2n + 1): there’s no m < 2n such that n ⪯ m
so the only concern is that 2n ⪯ 2n + 1, but n ≥ 1 implies that 2n ≥ 2 and the successors of 2n are 4n > 2n + 1 and
4n + 1 > 2n + 1. Thus 2n and 2n + 1 are immediate successors of n.

8.145 Let ⪯ be a partial order on a finite set A. We’re asked to prove that there must be an a ∈ A that has fewer than two
immediate successors, but we’ll prove something stronger: there must be an a ∈ A that has no immediate successors.
Namely, let a be a maximal element, which is guaranteed to exist by Theorem 8.21. By definition, that maximal element
has no immediate successors (or even non-immediate successors).

8.146 No integer n is a maximal element, because ⟨n + 1, n⟩ ∈ ≥.

8.147 Consider the infinite set R>0 and the partial order ≥. (Remember that 0 ∉ R>0 .) There is no maximal element
under ≥ in R>0 for the same reason as with the integers: for any x, ⟨x + 1, x⟩ ∈ ≥, so x is not maximal. There is no
minimal element because, for any x, ⟨x, x/2⟩ ∈ ≥, so x is not minimal.

8.148 By definition, a minimum element x has the property that every y satisfies y ⪯ x. Thus, if a and b are both minimum
elements, we have a ⪯ b and b ⪯ a. But ⪯ is antisymmetric, so from a ⪯ b and b ⪯ a we know that a = b.

8.149 By definition of a being a minimum element, for any x we have a ⪯ x. We need to show that a is a minimal
element—that is, there is no b ̸= a with b ⪯ a. Consider any b such that b ⪯ a. Then both a ⪯ b and b ⪯ a—but, by
the antisymmetry of ⪯, we therefore have a = b.

8.150 A minimal element under ⪯ is a word w such that no other word is “contained in” the letters of w—that is, no subset
of the letters of w can be rearranged to make a word.
A maximal element under ⪯ is a word w such that no other word can be formed from the letters of w—that is, no
letters can be added to w so that the resulting sequence can be rearranged to make a word.

8.151 The immediate successors of GRAMPS that I found in /usr/share/dict/words (filtered to remove any words that
contained characters other than the 26 English letters) were the following. (I found them using the Python implementation
in Figure S.8.4.)

CAMPGROUNDS EPIGRAMS GAMEKEEPERS HOMOGRAPHS MAGNETOSPHERE MAINSPRING MONOGRAPHS PARADIGMS


PLAGIARISM POSTMARKING PRAGMATICS PRAGMATISM PRAGMATIST PROGRAMS PROMULGATES PTARMIGANS
RAMPAGES

8.152 The longest minimal elements that I found in /usr/share/dict/words (filtered to remove any words that
contained characters other than the 26 English letters) were the following 6-letter words:

ACACIA COCCUS JEJUNE JUJUBE LUXURY MUKLUK NUZZLE PUZZLE SCUZZY UNIQUE VOODOO
I again found them using the Python implementation in Figure S.8.4; see minimal_elements. (In a Scrabble dictionary,
the longest minimal element that I found was the word VIVIFIC [adj., “imparting spirit or vivacity”].)

8.153 A minimal element is a cell that does not depend on any other cell. A maximal element is a cell whose value is not
used by the formula for any other cell in the spreadsheet.

8.154 ⟨1, 2, 3, 4, 5⟩, ⟨1, 2, 4, 3, 5⟩, ⟨1, 3, 2, 4, 5⟩, ⟨2, 1, 3, 4, 5⟩, ⟨2, 1, 4, 3, 5⟩, ⟨2, 4, 1, 3, 5⟩

8.155 ⟨1, 2, 3, 4, 5⟩, ⟨1, 2, 4, 3, 5⟩


160 Relations

def goes_to(wordA, wordB):
    '''Can wordB be formed from wordA (plus at least one additional letter)?'''
    if len(wordA) >= len(wordB):
        return False
    for letter in wordA:
        if wordA.count(letter) > wordB.count(letter):
            return False
    return True

def immediate_successors(word, words):
    candidates = [w for w in words if goes_to(word, w)]
    result = []
    for w in candidates:
        immediate = True
        for found in candidates:
            if goes_to(found, w):
                immediate = False
                break
        if immediate:
            result.append(w)
    return sorted(result)

def minimal_elements(words):
    result = []

    # We consider the words in increasing order of length. If you don't sort, then
    # finding the minimal elements gets WAY slower!
    for w in sorted(words, key=lambda x: len(x)):
        minimal = True
        for pred in result:
            if goes_to(pred, w):
                minimal = False
                break
        if minimal:
            result.append(w)
    return result

Figure S.8.4 An implementation of the anagram game in Python.

8.156 ⟨1, 2, 3, 4, 5⟩, ⟨1, 2, 3, 5, 4⟩, ⟨1, 2, 4, 3, 5⟩, ⟨1, 3, 2, 4, 5⟩, ⟨1, 3, 2, 5, 4⟩, ⟨1, 3, 4, 2, 5⟩, ⟨1, 4, 2, 3, 5⟩, ⟨1, 4, 3, 2, 5⟩

8.157 ⟨1, 2, 3, 4, 5⟩, ⟨1, 3, 2, 4, 5⟩, ⟨2, 1, 3, 4, 5⟩, ⟨2, 3, 1, 4, 5⟩, ⟨3, 1, 2, 4, 5⟩, ⟨3, 2, 1, 4, 5⟩

8.158 {1, 3}, {1, 5}, {3, 5}, {1, 3, 5}, {2, 4}, {2, 5}, {4, 5}, {2, 4, 5}

8.159 All subsets with two or more elements that do not contain both 3 and 4. There are 32 total subsets, of which 6 are
too small (empty or singletons); there are 8 that contain both 3 and 4. The remaining 18 subsets are chains.

8.160 {1, 2}, {1, 4}, {3, 2}, {3, 4}

8.161 {3, 4} only

8.162 Yes: define ⪯ to be {⟨a, a⟩ : a ∈ A}. This relation is reflexive, symmetric, antisymmetric, and transitive—which is
all that we need to have the relation be both a partial order and an equivalence relation.
9 Counting

9.2 Counting Unions and Sequences


9.1 The length of a tweet can range from i = 0 (an empty tweet) up to i = 140. The total number of possible tweets is

∑_{i=0}^{140} 256^i = (256^141 − 1)/(256 − 1) = ((2^8)^141 − 1)/255 = (2^1128 − 1)/255,

which comes out to approximately 1.43 × 10^337 tweets.
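As elsewhere in these solutions, a little snippet of Python (using exact big-integer arithmetic) can double-check the closed form against the raw sum:

```python
# The direct sum of 256^i for i = 0..140 should match the closed form
# (2^1128 - 1)/255 exactly, and the division by 255 should be exact.
total = sum(256**i for i in range(141))
assert total == (2**1128 - 1) // 255
assert (2**1128 - 1) % 255 == 0
print(f"about 10^{len(str(total)) - 1} possible tweets")
```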

9.2 The number of digit-digit-digit-letter-letter-letter license plates is 10 · 10 · 10 · 26 · 26 · 26 = 17,576,000.

9.3 The number of letter-letter-letter-digit-digit-digit-digit license plates is 26 · 26 · 26 · 10 · 10 · 10 · 10 = 175,760,000.

9.4 The number of digit-letter-letter-letter-letter-digit license plates is 10 · 26 · 26 · 26 · 26 · 10 = 45,697,600.

9.5 There were 26^4 · 10^4 = (260)^4 old license plates. Your new format has 36^k possibilities if your plates contain k
symbols. So you need to ensure that 36^k ≥ (260)^4. In other words, you need

k ≥ log_36((260)^4) = 4 · (log 260)/(log 36) ≈ 6.207.

You therefore need just over 6 symbols—which means you need to pick k = 7 (or larger).
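A quick numerical check of the threshold:

```python
import math

# 36^6 falls short of the (260)^4 old plates, but 36^7 suffices.
old_plates = 260**4
assert 36**6 < old_plates <= 36**7

# The exact threshold is log base 36 of 260^4, approximately 6.207.
k_min = math.log(old_plates, 36)
assert 6 < k_min < 7
```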

9.6 Counting each type of license plate separately:


• digit-digit-digit-letter-digit-digit. There are 26 · 10^5 possibilities.
• digit-digit-digit-letter-letter-digit-digit (with the first letter ≤ P). Because |{A, . . . , P}| = 16, there are 16 · 26 · 10^5
  possibilities.
• digit-digit-digit-digit-letter-letter-digit-digit (with the first letter ≥ Q). Similarly, |{Q, . . . , Z}| = 10, so there are
  10 · 26 · 10^6 possibilities.
• digit-digit-digit-letter-letter-letter-digit-digit. There are 26^3 · 10^5 possibilities.
Thus the total number of plates is 26 · 10^5 · (1 + 16 + 100 + 26^2) = 2,061,800,000.

9.7 There are 10^3 = 1000 numbers of length 3, and 10^4 = 10,000 of length 4, and 10^5 = 100,000 of length 5, leading
to a total of 1000 + 10,000 + 100,000 = 111,000. (If you interpreted the system as forbidding leading zeroes in a
password, then the set of legal passwords is precisely the set of numbers between 100 and 99,999, in which case there are
99999 − 99 = 99,900 valid choices.)

9.8 The number of legal passwords is 10^4 + 10^5 + 10^6 = 10,000 + 100,000 + 1,000,000 = 1,110,000. (Again, if
you forbade leading zeroes, then any number between 1000 and 999,999 will do; there are 999,999 − 999 = 999,000
passwords valid under this reading.)

9.9 There are 60 powers. (To avoid any off-by-one errors, I checked this count with a little snippet of Python:

powers = [x / 4 for x in range(-10 * 4, 8 * 4 + 1)  # x / 4 is the power (allowing x to be an int)
          if x != 0 and (abs(x / 4) <= 6 or x % 2 == 0)]

Indeed, len(powers) now reports 60.) Thus there are 60 different contact lenses that do not correct astigmatism. For
a lens that does correct astigmatism, there are 60 powers, 4 cylindrical powers, and 18 axes. In total, then, there are
60 + 60 · 4 · 18 = 4380 different lenses.

9.10 There are 4380 choices for each contact, and thus 4380^2 = 19,184,400 total prescriptions.

9.11 A tag is a length-8 sequence with 4 choices per position, so there are 4^8 = 65,536 tags.

9.12 There are between 14 and 26 students who have written a program in at least one of the two languages. Write P for
the set of Python programmers and J for the set of Java programmers. If P ⊆ J (the most extreme version of overlap)
then there are |P ∪ J| = |J| = 14 students; if P ∩ J = ∅ (the most extreme version of non-overlap) then there are
|P ∪ J| = |P| + |J| = 12 + 14 = 26 students.

9.13 There are 3^x · 2^y passwords, where x = 4 is the number of ones/ells/eyes and y = 3 is the number of zeroes/ohs. So,
for the given password, there are 3^4 · 2^3 = 81 · 8 = 648 possibilities.

9.14 There are 6 · 2 = 12 moves: choose one of six cube faces, and choose one of two directions (clockwise or
counterclockwise).

9.15 We’re looking for a sequence of 26 choices, each from a set of 12 options, so there are 12^26 total move sequences.

9.16 There are 12 first moves. All subsequent moves have only 11 options—one being forbidden because of the previous
move. Thus 12 · 11^25 sequences are possible.

9.18 There are six faces, each of which can be rotated by any of three angles. Thus there are 6 · 3 = 18 distinct half-turn
moves. The number of sequences of 20 choices, where each choice is from a set of 18 options, is 18^20 ≈ 1.27 · 10^25.
There were 12^26 ≈ 1.14 · 10^28 quarter-turn move sequences in Exercise 9.15, which is roughly 10^3 = 1000 times more
than the number of half-turn move sequences.

9.19 There are still 18 first moves. Each subsequent move has only 15 options—each of the other 5 faces has 3 choices of
angle. Thus the total number of non-wasteful move sequences is 18 · 15^19 ≈ 3.99 · 10^23. There were 12 · 11^25 ≈ 1.30 · 10^27
quarter-turn move sequences in Exercise 9.16, which is roughly 3000 times more than the number of half-turn move
sequences.

9.20 There are 3 command options (Control, Meta, Control and Meta) and 26 letters. So there are 26 · 3 = 78 command
characters.

9.21 For a two-part command (using either Control+X or Meta+X as a command prefix), we have 4 · 26 command “suffixes”
(a letter with any of four possible options for the modifiers: Control, Meta, both, or neither). Thus we have 2·4·26 = 208
two-part commands (two choices of the prefix and 4 · 26 choices of the suffix). Using the same logic as in Exercise 9.20,
we also have 3 · 25 = 75 one-part commands that do not involve X. Finally, there is the lone one-part command that uses
X, namely Control+Meta+X. The total number of command characters is therefore 208 + 75 + 1 = 284.

9.22 The key observation is that A − B and B − A and A ∩ B are all disjoint, and furthermore we know that A ∪ B =
(A − B) ∪ (B − A) ∪ (A ∩ B). Because the three sets on the right-hand side of this equation are disjoint, we can apply
the Sum Rule:

|A ∪ B| = |(A − B) ∪ (B − A) ∪ (A ∩ B)|
(the fact that A ∪ B = (A − B) ∪ (B − A) ∪ (A ∩ B) is probably easiest to see by Venn diagram)
= |A − B| + |B − A| + |A ∩ B| . Sum Rule (because A − B and B − A and A ∩ B are all disjoint)

9.23 By Example 9.4, there are 100·99/2 = 4950 bitstrings of length 100 that contain exactly two ones. There are 100
such strings with exactly one 1: for each of the 100 indices, there’s one bitstring with its solitary 1 in that position. And
there’s 1 string with zero ones: namely, the string 000 · · · 0. So the number of 100-bit strings that have at most 2 ones is
4950 + 100 + 1 = 5051.

9.24 Let’s call valid any k-bit string with exactly three ones. What’s the structure of a valid bitstring x? For some length i,
the string x must consist precisely of:
• an i-bit string x′ with exactly two ones; followed by
• 100 · · · 0, where there are exactly k − i − 1 zeros.
(That is, x = x′1000 · · · 0, where x′ is a bitstring of length i containing exactly two ones, and there are k − i − 1 zeros at
the end of x.) Note that i can be as small as 2 and as big as k − 1. Every valid string x is uniquely describable in this way,
and so

the number of valid strings
  = ∑_{i=0}^{k−1} [the number of i-bit strings with exactly two ones]      Sum Rule (and the above discussion)
  = ∑_{i=0}^{k−1} i(i−1)/2                                                 Example 9.4
  = (1/2) · [ ∑_{i=0}^{k−1} i^2 − ∑_{i=0}^{k−1} i ]                        separating the summation into two
  = (1/2) · [ k(k−1)(2k−1)/6 − k(k−1)/2 ]                                  Theorem 5.3 and Exercise 5.1 (sum of first n integers/squares)
  = k(k−1)/12 · (2k − 4) = k(k−1)(k−2)/6.                                  factoring

9.25 Here’s a Python solution that makes use of the built-in capacity to convert an integer into a binary string, and to count
the number of occurrences of a character in a given string:

bitstring_length = 16

# Initialize a list of counts (from counts[0] up to counts[bitstring_length]),
# where counts[i] will record the number of bitstrings with exactly i ones.
counts = [0 for i in range(bitstring_length + 1)]

for k in range(2 ** bitstring_length):  # k = 0, 1, ... 2**bitstring_length - 1
    bitstring = bin(k)  # convert k into a binary string
    num_ones = bitstring.count("1")
    counts[num_ones] += 1

print(counts)
# Output:
# [1, 16, 120, 560, 1820, 4368, 8008, 11440, 12870, 11440, 8008, 4368, 1820, 560, 120, 16, 1]

Double checking for small k: our formula in Exercise 9.23 works out to be k(k+1)/2 + 1 for general k; from Exercise 9.24, it
was k(k−1)(k−2)/6. For k = 16, these values are 137 and 560. Indeed, from our computed counts, we see 1 + 16 + 120 = 137
bitstrings with at most 2 ones, and 560 with 3 ones.

9.26 This argument is bogus because it uses the Sum Rule, but it does so on sets that aren’t disjoint! For example, consider
the string x = 1100 · · · 0 ∈ {0, 1}^k, with ones in positions 1 and 2. Then x ∈ S1 and also x ∈ S2. Thus S1 ∩ S2 ̸= ∅ and
the Sum Rule does not apply. (In fact, each string with exactly 2 ones—say in positions i and j—is in precisely the two sets
Si and Sj; thus we counted every string exactly twice, and we overcounted |S| by a factor of two. In other words, we have
that |S| = (1/2) · ∑_{i=1}^{k} |Si|, rather than |S| = ∑_{i=1}^{k} |Si|.)

9.27 The number of UTF-8 encodings is 2^7 + 2^11 + 2^16 + 2^20 + 2^16 = 1,181,824. The first three terms correspond to the
1-, 2-, and 3-byte encodings. The fourth term (2^20) comes from a 4-byte encoding in which yyyyy is of the form 0xxxx;
the last term (the second 2^16) comes from a 4-byte encoding in which yyyyy is of the form 10000.

9.28 The most obvious way to count is to figure out the number of valid characters encoded in 1 byte, then the number
encoded in 2 bytes (which would thus exclude any characters that had a valid 1-byte encoding), etc. But there’s an easier
way: we will identify each character by its longest encoding, not its shortest (though of course the shortest one is actually
the valid encoding). Every 1-byte, 2-byte, and 3-byte encoding can be represented as a 4-byte encoding, which means that
the number of validly encoded characters is simply the number of 4-byte encodings. There are thus 2^20 + 2^16 = 1,114,112
valid characters.

9.29 There are 64 choices for where the black rook goes (8 rows, 8 columns). The white rook cannot be placed into
either the same row or the same column, so there are 7 remaining choices for each, or 49 total. There are thus a total of
64 · 49 = 3136 legal positions.

9.30 The hard part of this question is to identify a helpful way to break down the cases. Suppose we place the black queen
first. The crucial variable is this number: how many empty rows/columns are between this queen and the nearest edge of
the board? Call that number k. (For example, the queen is k spaces above the bottom row, and k or more squares away
from the other three edges.)

[Five chessboard diagrams (files a–h, ranks 1–8) illustrating queen placements with k = 0, k = 1, k = 2, and k = 3 squares between the queen and the nearest edge of the board.]

First, we claim that there are 7 + 2k spaces that this queen can capture diagonally: 2k diagonal captures below the queen—
one in each of the first k columns to the left and right—and 7 above the queen, one in each column. (“Below” is shorthand
for “closer to the closest edge of the board,” and “above” for “the opposite direction.” For a position in the middle of the
next-to-rightmost column, “below” would refer to the rightmost column.) Each queen position also captures 7 horizontal
and 7 vertical squares.
Second, we claim that there are exactly (8 − 2k − 1) · 4 cells that are k steps away from the nearest edge of the board.
We can verify this quantity in the boards shown above, or note that the set of such cells forms a square of side length
8 − 2k. (There might seem to be 4 · (8 − 2k) cells in the perimeter of such a square, but we have to subtract 4 to avoid
double counting the four corners of the square.) That is, there are 4 cells with k = 3; and 12 cells with k = 2; and 20
cells with k = 1; and the remaining 28 cells have k = 0.
Thus the total number of mutually capturing positions is

∑_{k=0}^{3} (8 − 2k − 1) · 4 · (21 + 2k) = 28 · 21 + 20 · 23 + 12 · 25 + 4 · 27,

where each term multiplies the number of queen cells k away from the edge (on the edge, 1 away, 2 away, and 3 away,
respectively) by the number of captured cells, for a total of 1456 capturing positions. There are 64 · 63 = 4032 total
position pairs, so 2576 of these positions are not mutually capturing.

9.31 A solution in Python is shown in Figure S.9.1. (After running that code, rook_count and queen_count match the
results from the previous two exercises.)

9.32 Your laptop has 8 · 8 choices of frequencies. It eliminates one choice of send frequency and one choice of receive
frequency for the phone, which now has 7 · 7 choices. Together, they eliminate two choices of each for the tablet, which
now has 6 · 6 choices. The total number of choices is (8 · 7 · 6)^2 = 112,896. (Incidentally, this problem is precisely the
rook problem on a chessboard, of placing three rooks so that none can capture another.)
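A brute-force check of this count, trying every assignment of (send, receive) frequency pairs to the three devices:

```python
from itertools import product

# Each device picks a (send, receive) pair from 8 x 8 frequencies; all three
# send frequencies must differ, and likewise all three receive frequencies.
pairs = list(product(range(8), repeat=2))
count = 0
for (s1, r1), (s2, r2), (s3, r3) in product(pairs, repeat=3):
    if len({s1, s2, s3}) == 3 and len({r1, r2, r3}) == 3:
        count += 1
assert count == (8 * 7 * 6) ** 2 == 112896
```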

9.33 We can write |A ∪ B ∪ C ∪ D| as

|A ∪ B ∪ C ∪ D| = |A| + |B| + |C| + |D| add the single set cardinalities


− |A ∩ B| − |A ∩ C| − |A ∩ D| − |B ∩ C| − |B ∩ D| − |C ∩ D| subtract the pairs
+ |A ∩ B ∩ C| + |A ∩ B ∩ D| + |A ∩ C ∩ D| + |B ∩ C ∩ D| add the triples
− |A ∩ B ∩ C ∩ D|. subtract the quadruple

# We'll represent a position on the board by an ordered tuple, so
# that (x, y) represents a piece in column x and row y.
all_positions = [(i, j) for i in range(8) for j in range(8)]

def rook_capture(posA, posB):
    return posA[0] == posB[0] or posA[1] == posB[1]

def queen_capture(posA, posB):
    # A and B are on the same diagonal if and only if
    # the difference in their rows == the difference in their columns
    return rook_capture(posA, posB) or abs(posA[0] - posB[0]) == abs(posA[1] - posB[1])

rook_count, queen_count = 0, 0
for white_pos in all_positions:
    for black_pos in all_positions:
        rook_count += not rook_capture(white_pos, black_pos)
        queen_count += not queen_capture(white_pos, black_pos)

Figure S.9.1 Non-mutually capturing positions for rooks and queens, in Python.

9.34 Let A, B, and C, respectively, denote those integers between 1 and 1000, inclusive, that are divisible by 3, 5, and 7.
Then

|A| = | {3k : 1 ≤ k ≤ 333} | = 333


|B| = | {5k : 1 ≤ k ≤ 200} | = 200
|C| = | {7k : 1 ≤ k ≤ 142} | = 142
|A ∩ B| = | {15k : 1 ≤ k ≤ 66} | = 66
|A ∩ C| = | {21k : 1 ≤ k ≤ 47} | = 47
|B ∩ C| = | {35k : 1 ≤ k ≤ 28} | = 28
|A ∩ B ∩ C| = | {105k : 1 ≤ k ≤ 9} | = 9.

(I find that writing a set like A as {i : 1 ≤ i ≤ 1000 and i mod 3 = 0} makes the cardinality of A hard to see. Instead,
I’ve chosen to write the same set as {3k : 1 ≤ k ≤ 333}, which seems to make it more apparent.) Thus, by Inclusion–
Exclusion, we have

|A ∪ B ∪ C| = 333 + 200 + 142 − 66 − 47 − 28 + 9 = 543.

9.35 Let A, B, and C, respectively, denote those integers between 1 and 1000, inclusive, that are divisible by 6, 7, and 8.
Then

|A| = | {6k : 1 ≤ k ≤ 166} | = 166


|B| = | {7k : 1 ≤ k ≤ 142} | = 142
|C| = | {8k : 1 ≤ k ≤ 125} | = 125
|A ∩ B| = | {42k : 1 ≤ k ≤ 23} | = 23
|A ∩ C| = | {24k : 1 ≤ k ≤ 41} | = 41
|B ∩ C| = | {56k : 1 ≤ k ≤ 17} | = 17
|A ∩ B ∩ C| = | {168k : 1 ≤ k ≤ 5} | = 5.

(Note that the least common multiple of 6 and 8 is 24, not 48. That’s why A ∩ C is expressed as a set of integer multiples
of 24 rather than of 48. Similarly, the least common multiple of {6, 7, 8} is 168.) Thus, by Inclusion–Exclusion, we
have

|A ∪ B ∪ C| = 166 + 142 + 125 − 23 − 41 − 17 + 5 = 357.



9.36 Let A, B, C, and D, respectively, denote those integers between 1 and 10,000, inclusive, that are divisible by 2, 3, 5,
and 7. Then

|A| = | {2k : 1 ≤ k ≤ 5000} | = 5000


|B| = | {3k : 1 ≤ k ≤ 3333} | = 3333
|C| = | {5k : 1 ≤ k ≤ 2000} | = 2000
|D| = | {7k : 1 ≤ k ≤ 1428} | = 1428
|A ∩ B| = | {6k : 1 ≤ k ≤ 1666} | = 1666
|A ∩ C| = | {10k : 1 ≤ k ≤ 1000} | = 1000
|A ∩ D| = | {14k : 1 ≤ k ≤ 714} | = 714
|B ∩ C| = | {15k : 1 ≤ k ≤ 666} | = 666
|B ∩ D| = | {21k : 1 ≤ k ≤ 476} | = 476
|C ∩ D| = | {35k : 1 ≤ k ≤ 285} | = 285
|A ∩ B ∩ C| = | {30k : 1 ≤ k ≤ 333} | = 333
|A ∩ B ∩ D| = | {42k : 1 ≤ k ≤ 238} | = 238
|A ∩ C ∩ D| = | {70k : 1 ≤ k ≤ 142} | = 142
|B ∩ C ∩ D| = | {105k : 1 ≤ k ≤ 95} | = 95
|A ∩ B ∩ C ∩ D| = | {210k : 1 ≤ k ≤ 47} | = 47.

Thus, by Inclusion–Exclusion, we have

| A ∪ B ∪ C ∪ D|
= [5000 + 3333 + 2000 + 1428] − [1666 + 1000 + 714 + 666 + 476 + 285] + [333 + 238 + 142 + 95] − 47
individual sets pairs triples quadruple
= 7715.
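The three Inclusion–Exclusion counts above (Exercises 9.34–9.36) are small enough to confirm by brute force with a little snippet of Python:

```python
# Count the integers in 1..limit divisible by at least one of the divisors.
def divisible_by_any(divisors, limit):
    return sum(1 for n in range(1, limit + 1)
               if any(n % d == 0 for d in divisors))

assert divisible_by_any([3, 5, 7], 1000) == 543       # Exercise 9.34
assert divisible_by_any([6, 7, 8], 1000) == 357       # Exercise 9.35
assert divisible_by_any([2, 3, 5, 7], 10000) == 7715  # Exercise 9.36
```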

9.37 Consider two integers n and m, and define M = {k ∈ {1, . . . , n} : m | k}. Suppose that m | n. Then we can rewrite
M as

M = {k ∈ {1, . . . , n} : m | k} = {mj : j ∈ {1, . . . , n/m}}.

Therefore |M| = n/m (which is an integer because m | n).

9.38 We are given that the prime factorization of n is p1^{e1} · p2^{e2} · · · pℓ^{eℓ}, for integers e1, . . . , eℓ ≥ 1 and distinct primes
{p1, . . . , pℓ}. For an arbitrary integer k ≤ n, we wish to show that k and n have a common divisor greater than 1 if and
only if there is an i for which pi | k.
If pi | k, then certainly k and n have a common divisor—namely pi.
Conversely, suppose pi ̸ | k for every i. We claim that therefore k and n do not have a common divisor. Suppose they
did—say, suppose that d > 1 satisfies d | k and d | n. But consider any prime factor q of d. A prime factor of d must divide
d, and thus q | d and d | n. So q | n. But q ̸= pi for all i, which means that we have identified a prime factor of n that isn’t
part of the given prime factorization of n. This is a contradiction of the uniqueness of prime factorization.

9.39 We have an integer n whose prime factorization is n = p^i q^j, and we are considering the sets
P = {k ∈ {1, . . . , n} : p | k} and Q = {k ∈ {1, . . . , n} : q | k}. Therefore:

φ(n) = |{k ∈ {1, . . . , n} : n and k have no common divisors}|      definition of φ
     = |{k ∈ {1, . . . , n} : p ̸ | k and q ̸ | k}|                  Exercise 9.38
     = |{1, . . . , n} − (P ∪ Q)|                                    definition of P and Q
     = n − |P ∪ Q|                                                   P ∪ Q ⊆ {1, . . . , n}
     = n − |P| − |Q| + |P ∩ Q|.                                      Inclusion–Exclusion

Exercise 7.50 says that p | k and q | k if and only if pq | k; thus |P ∩ Q| = |{k ∈ {1, . . . , n} : p | k and q | k}| =
|{k ∈ {1, . . . , n} : pq | k}|. Furthermore, observe that pq | n, and therefore that p | n and q | n. Thus, by Exercise 9.37, we
have |P| = n/p and |Q| = n/q and |P ∩ Q| = n/(pq). Continuing:

φ(n) = n − n/p − n/q + n/(pq)        the above argument (and Exercise 9.37)
     = n(1 − 1/p)(1 − 1/q).          factoring
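A little snippet of Python can spot-check the closed form against the gcd-based definition of φ for a few values of n = p^i · q^j:

```python
from math import gcd

def phi(n):
    # the definition directly: count the k in 1..n sharing no divisor > 1 with n
    return sum(1 for k in range(1, n + 1) if gcd(n, k) == 1)

# A few (p, q, i, j) choices with distinct primes p and q.
for p, q, i, j in [(2, 3, 3, 2), (3, 5, 1, 2), (2, 7, 4, 1)]:
    n = p**i * q**j
    assert phi(n) == n * (p - 1) * (q - 1) // (p * q)  # n(1 - 1/p)(1 - 1/q)
```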

9.40 Each pair of players forms a possible partnership: player #1 has 10 possible higher-numbered partners, player #2
has 9, and so forth. Thus the total number of partnerships is ∑_{i=1}^{11} (11 − i) = ∑_{i=0}^{10} i = 55.
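(This count, too, previews combinations: Python’s math.comb confirms that “11 choose 2” gives the same 55.)

```python
import math

# 10 + 9 + ... + 0 pairs of players, which is "11 choose 2".
assert sum(11 - i for i in range(1, 12)) == 55
assert math.comb(11, 2) == 55
```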

9.41 We can view a sequence of 10 partnerships as a sequence of 9 choices: for each partnership, we must choose which
of the two players currently batting is the one who gets out. Thus the total number of choices is 2^9, by the Generalized
Product Rule.

9.42 The number i of players who get out can range from 0 to 10, inclusive. Then, by the same logic as in the previous
exercise, the sequence of partnerships can be described as a sequence of i choices: for each partnership that ends with an
out, which of the two players is the one who gets out? Thus there are 2^{i−1} possibilities for a sequence of partnerships with
i players who get out. Thus the total number of possible sequences of partnerships is

∑_{i=1}^{10} 2^{i−1} = ∑_{i=0}^{9} 2^i = 2^10 − 1 = 1023,

where the summation is simplified by Example 5.2.

9.43 For 4-digit PINS: there are 1000 numbers that contain the same first two digits; there are 1000 numbers that contain
the same last two digits; and there are 100 numbers with both the first two and the last two digits repeated. Thus there are
|A| + |B| − |A ∩ B| = 1000 + 1000 − 100 = 1900 invalid PINs in this scenario.

9.44 We are considering k-digit PINs that neither start with three repeated digits nor end with three repeated digits. Denote
by A those PINs that start with the same three digits and by B those that end with the same three digits.
A typical element of A can be constructed by choosing the repeated three-digit prefix (10 choices) and the k − 3
remaining digits (10^{k−3} choices), so |A| = 10 · 10^{k−3} = 10^{k−2}. The set B is analogous. But this formula doesn’t hold
when k is too small: when k ≤ 2, PINs don’t even have three digits, so they can’t start with the same three digits. Thus

|A| = |B| = 10^{k−2} if k ≥ 3, and |A| = |B| = 0 if k ∈ {1, 2}.

How many PINs both start and end with three repeated digits? For long PINs, we can construct an element of A ∩ B by
choosing the repeated prefix, the repeated suffix, and the k − 6 digits in the middle, for a total of 10 · 10 · 10^{k−6} = 10^{k−4}
options. Short PINs deviate from this formula if the first three and last three digits overlap: when k ∈ {3, 4, 5}, the first
three and the last three digits overlap, so elements of A ∩ B have all of their digits identical. Thus

|A ∩ B| = 10^{k−4} if k ≥ 6;  |A ∩ B| = 10 if k ∈ {3, 4, 5};  and |A ∩ B| = 0 if k ∈ {1, 2}.

(As it happens, the formulas for the first two cases match for k = 5.) Thus, by Inclusion–Exclusion, the number of valid
PINs is 10^k − |A ∪ B| (there are 10^k total k-digit numbers), which equals 10^k − (|A| + |B| − |A ∩ B|). That is:

number of valid PINs = 10^k − 10^{k−2} − 10^{k−2} + 10^{k−4} = 9801 · 10^{k−4}   if k ≥ 6
number of valid PINs = 10^k − 10^{k−2} − 10^{k−2} + 10 = 98 · 10^{k−2} + 10      if k ∈ {3, 4, 5}
number of valid PINs = 10^k                                                       if k ∈ {1, 2}.

Plugging in a few small values of k to this formula yields that there are 10 valid one-digit PINs; 100 valid two-digit PINs;
990 valid three-digit PINs; 9810 valid four-digit PINs; 98,010 valid five-digit PINs; 980,100 valid six-digit PINs; and
9,801,000 valid seven-digit PINs.
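A brute-force check of the case analysis, counting all k-digit strings (leading zeros allowed) for small k:

```python
# A PIN is valid if it neither starts nor ends with three repeated digits.
def valid(pin):
    return not (len(pin) >= 3 and
                (pin[0] == pin[1] == pin[2] or pin[-1] == pin[-2] == pin[-3]))

for k, expected in [(1, 10), (2, 100), (3, 990), (4, 9810), (5, 98010), (6, 980100)]:
    assert sum(valid(str(n).zfill(k)) for n in range(10**k)) == expected
```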

9.45 A king can be on any of the 32 shaded squares. (There are 4 shaded squares in each of the 8 rows, for a total of 32.)
A piece can be on any of 28 shaded squares, excluding row 8 for Black and row 1 for Red.
Thus there are 32 + 32 + 28 + 28 = 120 Checkers positions with exactly one token: any of 32 squares for a Black
king, any of 32 squares for a Red king, any of 28 squares for a Black piece, and any of 28 for a Red piece.

9.46 There are 32 places where the Black king can go. Once it is placed, there are 31 remaining choices of squares for the
Red king. Thus there are 32 · 31 = 992 positions with one Red king and one Black king.

9.47 (This calculation is a preview of combinations, found in Section 9.4.) By the same logic as in Exercise 9.46, there
are 32 · 31 = 992 pairs of occupiable squares for Red kings—but we’ve erroneously counted each position exactly twice
(once in each order). Thus the number of positions with two Red kings is 32·31/2 = 992/2 = 496.
Here’s another way to count these positions. Index the 32 shaded cells from 0 to 31. If we place the higher-indexed
king in cell #i, then there are exactly i positions where the lower-indexed king can go—namely, any of the cells before
it in this numerical order. Thus the total number of positions is ∑_{i=0}^{31} i = 31·(31+1)/2 = 496 by Theorem 5.3.

9.48 Similar to Exercise 9.47: there are 28 valid squares for Black pieces, so two Black pieces can go in 28·27/2 = 378
positions.
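Since both of these counts are previews of combinations, Python’s math.comb offers a one-line confirmation of each:

```python
import math

# Two same-color kings on 32 shaded squares; two same-color pieces on 28.
assert math.comb(32, 2) == 32 * 31 // 2 == 496
assert math.comb(28, 2) == 28 * 27 // 2 == 378
```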

9.49 Imagine placing the Red piece first. There are two cases: the Red piece might be in row 8 (where a Black piece can’t
be anyway), or it can be in rows 2 through 7 (where it occupies one of the cells that a Black piece might have gone in).

• There are 4 places for the Red piece in row 8. In this case, the Black piece can go in any of the 28 valid Black piece
positions.
• There are 24 places for the Red piece in the other rows. (A Red piece can’t go in row 1, so there are 6 valid rows, each
with 4 legal positions.) In this case, the Black piece can go in any of the 27 remaining valid Black piece positions.

So the total number of positions for a piece of each color is 4 · 28 + 24 · 27 = 112 + 648 = 760.

9.50 We could solve this exercise as in Exercise 9.49, by dividing into two cases: is the Red king in row 1 (where a Red
piece can’t be anyway), or is it elsewhere (where it blocks one of the Red piece’s possible positions)? But it’s easier to
solve this problem by considering the pieces in the opposite order. There are 28 places for the Red piece. Wherever it is
placed, there are 31 remaining places for the Red king. Thus there are 28 · 31 = 868 positions.

9.51 There are 28 places for the Red piece, and 31 remaining places for the Black king. Thus there are 28 · 31 = 868
choices. Alternatively, imagine placing the Black king first. There are two cases:

• There are 4 places for the Black king in row 1 (where the Red piece already isn’t allowed). In this case, the Red piece
can go in any of the 28 valid Red piece positions.
• There are 28 places for the Black king in rows 2 through 8. In this case, the Red piece can go in any of the 27 remaining
valid Red piece positions.

So, again, the total number of positions with one Black king and one Red piece is 4 · 28 + 28 · 27 = 112 + 756 = 868.

9.52 The first bookkeeping challenge is tracking the combinations of pieces that are possible: there are three all-Black pairs
(king/king, king/piece, piece/piece) and three all-Red pairs, plus four Black–Red pairs (king/king, king/piece, piece/king,
piece/piece). The ten cases, with the exercise that computes each count, are:

  B king + B king     496   (Exercise 9.47)
  B king + B piece    868   (Exercise 9.50)
  B piece + B piece   378   (Exercise 9.48)
  B king + R king     992   (Exercise 9.46)
  B king + R piece    868   (Exercise 9.51)
  B piece + R king    868   (Exercise 9.51)
  B piece + R piece   760   (Exercise 9.49)
  R king + R king     496   (Exercise 9.47)
  R king + R piece    868   (Exercise 9.50)
  R piece + R piece   378   (Exercise 9.48)

Thus the number of Checkers board positions with exactly two tokens is

496 + 868 + 378 + 992 + 868 + 868 + 760 + 496 + 868 + 378 = 6972.

9.53 Figure S.9.2 shows a simple solution in Python, which uses brute force to try every possible combination of token
types and every possible pair of valid board positions for those types. The values printed out by this program match those
computed in Exercise 9.52.

from collections import defaultdict

class Token:
    def __init__(self, name, legal_locations):
        self.name = name
        self.legal_locations = legal_locations
    def getLocations(self): return self.legal_locations
    def getName(self): return self.name

board = [row + pos for row in "12345678" for pos in "ABCD"]
black_piece = Token("B piece", [x for x in board if x[0] != '8'])  # no black pieces in Row 8
black_king = Token("B king ", board)
red_piece = Token("R piece", [x for x in board if x[0] != '1'])  # no red pieces in Row 1
red_king = Token("R king ", board)
token_types = [black_piece, black_king, red_piece, red_king]

counts, total = defaultdict(list), 0
for i in range(len(token_types)):
    for j in range(i, len(token_types)):
        types = token_types[i].getName() + " + " + token_types[j].getName()
        for posA in token_types[i].getLocations():
            for posB in token_types[j].getLocations():
                if posA != posB and (i != j or (posB, posA) not in counts[types]):
                    # the positions are different, and we haven't already recorded this pair
                    counts[types].append((posA, posB))
                    total += 1

# The printed output matches the values for Exercises 9.46--9.52.
for pair_type in counts.keys():
    print(len(counts[pair_type]), "\t", pair_type)
print("Total: ", total)

Figure S.9.2 Counting Checkers positions with exactly two tokens.

9.54 It turns out that there are 676 prefix-free subsets of {0, 1}^1 ∪ {0, 1}^2 ∪ {0, 1}^3. (One of them is the empty set.) Some
Python code to calculate the list of them is shown in Figure S.9.3. After running this code, len(prefix_free_codes)
is 676. The smallest prefix-free subset is the empty set; the largest prefix-free subset contains eight strings—all eight of
the 3-bit strings in {0, 1}^3.

9.55 There are 11 · 8 · 2 = 176 consonants.

def power_set(S):
    if len(S) == 0:
        return [[]]
    else:
        result = power_set(S[1:])
        return result + [[S[0]] + L for L in result]

def is_prefix(x, y):
    return len(x) < len(y) and x == y[0:len(x)]

def is_prefix_free(S):
    for x in S:
        if any([is_prefix(x, y) for y in S]):
            return False
    return True

L = ['0','1','00','01','10','11','000','001','010','011','100','101','110','111']
prefix_free_codes = [C for C in power_set(L) if is_prefix_free(C)]

Figure S.9.3 Counting prefix-free codes.



9.56 There are 2 · 3 · 3 = 18 vowels.

9.57 There are 25 · 5 · 26 = 3250 legal syllables in Japanese according to this description: 25 onset consonants, 5 vowels,
26 codas (25 consonants plus 1 “none”).

9.58 According to this description, there are (25 + 25^2) · 16 · 26 = 270,400 legal syllables in English.

9.59 There are (25 + 25^2) · 16 = 10,400 “first halves” of a syllable, and 16 · 26 = 416 “second halves” of a syllable. So
the total number of demisyllables is 10,400 + 416 = 10,816.

9.3 Using Functions to Count


9.60 By counting in the table in Example 9.23, we find 16 such bitstrings. Alternatively, define encode′ : {0, 1}^4 → {0, 1}^7
as the function

encode′(⟨a, b, c, d⟩) = ⟨a, b, c, d, ¬(b ⊕ c ⊕ d), ¬(a ⊕ b ⊕ d), ¬(a ⊕ b ⊕ d)⟩.

(Here ¬0 = 1 and ¬1 = 0.) Every bitstring output by this function fails all three of the requirements for the Hamming
code, and there are 2^4 different outputs (because there are 2^4 different inputs, all of which generate a different output).

9.61 There are 13 different possible states for any particular square on the chess board: empty, one of the six White pieces
(pNRBQK), or one of the six Black pieces. Define the function that maps a sequence of 64 elements of {1, 2, . . . , 13}
to a board position by going from square A1 to square H8, in order. Some of these positions aren’t achievable in a chess
game (for example, 64 Black pieces and 0 White), but no legal position is described by more than one such sequence.
Thus, by the Mapping Rule, |P| ≤ 13^64.

9.62 There are 27^n total strings of length n. For the number with two words: we choose the number i of letters that go in the first word (between 1 and n − 2); there are n − i − 1 letters in the second word. Thus the count is

∑_{i=1}^{n−2} 26^i · 26^{n−i−1} = ∑_{i=1}^{n−2} 26^{n−1} = (n − 2) · 26^{n−1}.

As a quick check to make sure this formula is reasonable: if n = 3, we must put the space in the middle, giving 26^2 possibilities; if n = 4, we choose one of 26 single-letter words and 26^2 two-letter words (26^3 total), and choose which goes first (2 · 26^3).
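As an extra sanity check (not from the original solution), the same argument works for any alphabet size m in place of 26, giving (n − 2) · m^(n−1). Here is a small brute-force sketch; the helper names count_two_word_strings and two_word_formula are made up for this check:

```python
from itertools import product

def count_two_word_strings(m, n):
    """Brute-force count of length-n strings over an m-letter alphabet plus a
    space character that consist of two nonempty words separated by one space."""
    alphabet = [chr(ord('a') + i) for i in range(m)] + [' ']
    count = 0
    for symbols in product(alphabet, repeat=n):
        s = ''.join(symbols)
        # Exactly one space, with nonempty words on both sides:
        if s.count(' ') == 1 and all(len(w) >= 1 for w in s.split(' ')):
            count += 1
    return count

def two_word_formula(m, n):
    # The formula from the solution, with 26 replaced by the alphabet size m:
    return (n - 2) * m ** (n - 1)
```

For example, with m = 3 and n = 4 both the brute force and the formula give 2 · 3^3 = 54.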

9.63 Let's think about the positions of the spaces: we have n positions, and must choose 2 of them as the positions for the spaces. There are n(n − 1)/2 total such choices, by Example 9.4. Some of them are illegal, though: let F, L, and A denote the set of these space positions that, respectively, contain a space in the first position; in the last position; and in adjacent positions. Then |F| = |L| = |A| = n − 1 (one space has a fixed position), and |F ∩ A| = |L ∩ A| = |F ∩ L| = 1, and |F ∩ L ∩ A| = 0 for n ≥ 2. Thus the total number of illegal space positions is, by Inclusion–Exclusion,

(n − 1) + (n − 1) + (n − 1) − 1 − 1 − 1 + 0 = 3n − 6.

Thus the number of legal space positions is n(n − 1)/2 − 3(n − 2). There is a bijection between a particular two-space positioning and an (n − 2)-symbol string without spaces. Thus each space position has 26^{n−2} possibilities. In total, then:

26^{n−2} · (n(n − 1)/2 − 3(n − 2)).

9.64 The function f simply translates every [ into a 0 and every ] into a 1. Because f(s) = f(s′) only if s = s′, we have that f is one-to-one, and so |B_n| ≤ |{0,1}^n| = 2^n.

9.65 The function g simply translates every 0 into [][] and every 1 into [[]]. For any bitstring x, we have that g(x) ∈ B_{4|x|}. Because g(x) = g(x′) only if x = x′, we have that g is one-to-one, and so 2^{n/4} = |{0,1}^{n/4}| ≤ |B_n|.

9.66 There are 26^15 total passwords. By pasting together three 5-letter words, there are 8636^3 of these passwords. (About 2^70 vs. 2^40.)

9.67 There are 8636 choices for the first word, then 8635 choices for the second, then 8634 choices for the last. Thus there
are 8636 · 8635 · 8634 of these passwords: there were 644,077,163,456, and now there are 643,853,439,240—almost no
change.

9.68 The function f : P → A that sorts is a 6-to-1 function: for any x ∈ A, there are 3! = 6 possible orderings y_1, ..., y_6 of the three words in x such that f(y_i) = x. Thus there are 8636 · 8635 · 8634 / 6 of these passwords: 107,308,906,540, still about 2^36 possibilities.

9.69 There are a total of 32 + 16 + 8 + 4 + 2 + 1 = 63 games. Each game chooses one of its two teams as the winner. Thus we can view a tournament as a sequence from {0,1}^63; thus there are 2^63 = 9.2233··· × 10^18 possible tournaments.

9.70 By the Generalized Product Rule, there are 26 · 26 · 26 · 1 · 1 · 1 = 26^3 such palindromes. (Symbol #4 must match #3, so there's only one choice; similarly for #5 and #2, and #6 and #1.)

9.71 By the Generalized Product Rule, there are 26 · 26 · 26 · 26 · 1 · 1 · 1 = 26^4 such palindromes. (Symbol #5 must match #3, so there's only one choice; similarly for #6 and #2, and #7 and #1.)

9.72 Define k = ⌈n/2⌉. Define f : P_n → Σ^k as simply selecting the first k symbols of its input: for example, f(NOON) = NO and f(RACECAR) = RACE.
The function f is onto: any x ∈ Σ^k is the output for the palindrome x_1 x_2 ··· x_k x_k ··· x_2 x_1 (for n even) or for x_1 x_2 ··· x_k ··· x_2 x_1 (for n odd).
The function f is also one-to-one: if f(x) = f(y), then x_{1...k} = y_{1...k}, and therefore x = y (because the second half of x equals the reverse of the first half of x, which equals the reverse of the first half of y, which equals the second half of y).
Thus by the bijection rule, |P_n| = |Σ^{⌈n/2⌉}| = |Σ|^{⌈n/2⌉}.
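The bijection can be sanity-checked by brute force over a small alphabet; this sketch (with the hypothetical helper name count_palindromes) confirms that the number of length-n palindromes over m symbols is m^⌈n/2⌉:

```python
from itertools import product

def count_palindromes(m, n):
    """Brute-force count of the length-n palindromes over an m-symbol alphabet."""
    return sum(1 for s in product(range(m), repeat=n)
               if list(s) == list(reversed(s)))
```

For instance, count_palindromes(3, 4) equals 3^⌈4/2⌉ = 9.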

9.73 100 has the factors {1, 2, 4, 5, 10, 25, 50, 100}. Of these, the squarefree factors are {1, 2, 5, 10}.

9.74 Note that

12! = 12 · 11 · 10 · 9 · 8 · 7 · 6 · 5 · 4 · 3 · 2 · 1
    = (3 · 2 · 2) · 11 · (5 · 2) · (3 · 3) · (2 · 2 · 2) · 7 · (3 · 2) · 5 · (2 · 2) · 3 · 2
    = 11 · 7 · 5^2 · 3^5 · 2^10.

A positive integer factor is just the choice of 0 or 1 factors of eleven; 0 or 1 of seven; 0 or 1 or 2 of five; etc. (For example, choosing 1 eleven, 1 seven, 2 twos, and 0 of everything else corresponds to the factor 11^1 · 7^1 · 5^0 · 3^0 · 2^2 = 11 · 7 · 4 = 308.) In total, that's 2 · 2 · 3 · 6 · 11 = 792 factors.
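A quick way to double-check the count of 792 is to count the divisors of 12! directly by trial division; the helper name count_divisors below is made up for this check:

```python
import math

def count_divisors(n):
    """Count the positive divisors of n by trial division up to sqrt(n)."""
    count = 0
    d = 1
    while d * d <= n:
        if n % d == 0:
            # d and n // d are both divisors (they coincide when d * d == n).
            count += 2 if d * d != n else 1
        d += 1
    return count
```

Running count_divisors(math.factorial(12)) returns 792, matching the product 2 · 2 · 3 · 6 · 11.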

9.75 There are 5 distinct primes that divide 12!. We can choose any subset of these 5 primes, so there are 2^5 = 32 squarefree factors.

9.76 A solution in Python is shown in Figure S.9.4. Running the square_free function on 12! (via import math and
computing math.factorial(12)) yields 32 squarefree factors: 1, 2, 3, 5, 6, 7, 10, 11, 14, 15, 21, 22, 30, 33, 35, 42, 55,
66, 70, 77, 105, 110, 154, 165, 210, 231, 330, 385, 462, 770, 1155, and 2310.

9.77 This logic is a badly twisted version of the Mapping Rule, and it's false. For example, consider A = {1, 2, 3} and B = {1, 2}. The function f(1) = 1 and f(2) = 1 and f(3) = 1 is a non-onto function from A to B, but |A| is not less than |B|. The mistake in the logic is that the existence of a non-onto function does not imply the nonexistence of an onto function!

9.78 Each of the 5 inputs has 5 possible outputs. By the product rule, there are 5^5 = 3125 such functions.

def prime_factors(n):
    candidate = 2
    factors = []
    while n > 1:
        if n % candidate == 0:
            factors.append(candidate)
            while n % candidate == 0:
                n = n // candidate
        candidate += 1
    return factors

def square_free(n):
    result = [1]
    for factor in prime_factors(n):
        result = result + [x * factor for x in result]
    return sorted(result)

Figure S.9.4 Finding squarefree factors in Python.

9.79 Once we’ve chosen an output for 1, we can’t use it again because f is one-to-one. So there are 5 choices for f(1), then
4 for f(2), etc. By the Generalized Product Rule, there are 5! = 120 such functions.

9.80 Every one-to-one function f : {1, 2, . . . , 5} → {1, 2, . . . , 5} is a bijection, because the domain and codomain have
the same size. Thus there are 5! = 120 bijections.

9.81 There are m choices for each of n inputs, so |G| = m^n. A function is one-to-one if there are no repetitions, so there are m · (m − 1) · (m − 2) ··· (m − n + 1) choices for the outputs—or, more compactly, m!/(m − n)! one-to-one functions. There is a bijection only if m = n, in which case all the one-to-one functions are bijections—so there are

0 if m ≠ n, and m! if m = n

bijections.

9.82 Let F denote the set of bijections f : A → B and let G denote the set of bijections g : B → A. The function-inverse operation takes a function f ∈ F and produces a function g ∈ G: by definition, f^{−1} is a function from B to A; it's one-to-one because f is onto, and it's onto because f is a function. Thus inverse : F → G is a bijection from F to G, and thus |F| = |G|.

9.83 Let Z ⊆ {0, ..., 9}^12 be the set of valid UPCs. Define f : {0, ..., 9}^11 → Z as follows:

f(x) = ⟨x_1, x_2, ..., x_11, y⟩, where y = [10 − 3x_11 − ∑_{i=1}^{5} (3x_{2i−1} + x_{2i})] mod 10.

It's clear that f is one-to-one (if f(x) = f(x′), then certainly f(x) and f(x′) have the same first 11 digits—which means that x = x′). And f is onto too: for any integer b, there is one and only one value x_12 such that b + x_12 ≡_10 0—namely, the additive inverse of b. Thus there is a bijection between Z and {0, ..., 9}^11, so |Z| = 10^11.

9.84 There are 2^1022 such increasing sequences. There is a bijection between the following two sets:
• strictly increasing sequences of integers starting with 1 and ending with 1024.
• subsets of the integers {2, 3, ..., 1023}.
(The internal entries of the sequence are simply the elements of the subset, in sorted order.) And there are |P({2, 3, 4, ..., 1023})| = 2^1022 such subsets.

9.85 Each subset of {1, 2, ..., n} corresponds to a subsequence—that is, each index is either included in the subsequence, or it's not. Thus there are 2^n subsequences.

9.86 The issue is that some of the 2^n subsequences are identical, and we need to avoid counting them twice. The subsequences that are counted twice are those that include none of the elements between the repeated entries, and precisely one of the two repeated entries. (For example, PYTHA₁N and PYTHA₂N.) There are 2^{n−k−2} such repeated subsequences—we can choose to include any of the n elements aside from the repeated elements and the k elements between them—so we need to correct for those that we've double counted. Thus the final number of subsequences is 2^n − 2^{n−k−2}.
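The formula 2^n − 2^{n−k−2} can be checked by brute force on short strings with a single letter repeated twice; the helper name count_distinct_subsequences below is made up for this check:

```python
from itertools import combinations

def count_distinct_subsequences(s):
    """Count the distinct subsequences of s (including the empty one) by brute force."""
    subs = set()
    for r in range(len(s) + 1):
        for idxs in combinations(range(len(s)), r):
            subs.add(''.join(s[i] for i in idxs))
    return len(subs)
```

For "ABA" (n = 3, k = 1 letters between the two A's) the formula gives 2^3 − 2^0 = 7, and for "ABCA" (n = 4, k = 2) it gives 2^4 − 2^0 = 15; the brute force agrees.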

9.87 For n = 7 and k = 4, we have that

|P({1, 2, ..., n − k})| = |P({1, 2, 3})| = 2^3 = 8 and |{0, 1, 2, ..., n}| = |{0, 1, 2, ..., 7}| = 8.

Because the cardinalities of the two sets are the same, by the Mapping Rule we know that a bijection decode : P({1, 2, ..., n − k}) → {0, 1, 2, ..., n} exists.

9.88 No! For n = 9 and k = 4, the cardinality of P({1, 2, ..., n − k}) is 2^{n−k} = 2^5 = 32 and the cardinality of {0, 1, 2, ..., n} is 10. But 10 ≠ 32, so by the Mapping Rule no bijection exists.

9.89 For n = 31, the cardinality of P({1, 2, ..., n − k}) is 2^{31−k}, and the cardinality of {0, 1, 2, ..., n} is 32 = 2^5. So, by the Mapping Rule, for a bijection to exist we need 2^{31−k} = 2^5, so we need 31 − k = 5, or k = 26.

9.90 Because |P({1, 2, ..., n − k})| is always a power of 2, for a bijection to exist we need |{0, 1, 2, ..., n}| to be a power of 2 as well. But that's true only if n + 1 is a power of 2.

9.91 25n: there are n positions that can be changed, and 25 possible replacement letters.

9.92 There are n + 1 positions where an insertion can occur, and 26 possible inserted letters, so there are 26(n + 1) resulting strings. However, some of these are counted twice: for example, in the word ABC, inserting a B before or after the current B yields the same word, ABBC. For each position i, inserting another copy of the ith letter before or after position i yields the same word, so we've double-counted precisely n results—one for each letter in the original string. So the total number of results is 26(n + 1) − n = 25n + 26.

9.93 n − 1 − k, where k = |{i : x_i = x_{i+1}}| is the number of adjacent letters that match: there are n − 1 adjacent pairs of letters, and swapping any non-identical adjacent pair results in a (different) change.

9.94 n − k, where k = |{i : x_i = x_{i+1}}| is the number of adjacent letters that match: there are n letters that we can delete, but deleting either of two adjacent identical letters results in the same change.

9.95 For PASCAL: 6!/(1! · 1! · 1! · 1! · 2!) = 360.

9.96 For GRACEHOPPER: 11!/(1! · 1! · 1! · 1! · 1! · 2! · 2! · 2!) = 4,989,600.

9.97 For ALANTURING: 10!/(1! · 1! · 1! · 1! · 1! · 1! · 2! · 2!) = 907,200.

9.98 For CHARLESBABBAGE: 14!/(1! · 1! · 1! · 1! · 1! · 2! · 2! · 3!) = 1,210,809,600.

9.99 For ADALOVELACE: 11!/(1! · 1! · 1! · 1! · 2! · 2! · 3!) = 1,663,200.

9.100 For PEERTOPEERSYSTEM: 16!/(1! · 1! · 1! · 2! · 2! · 2! · 2! · 5!) = 10,897,286,400.

9.101 A solution in Python is shown in Figure S.9.5. It correctly verifies the counts for the strings in the last few exercises,
except for CHARLESBABBAGE and PEERTOPEERSYSTEM, which were too long (and crashed my laptop if I tried to compute
the rearrangements using the given brute-force algorithm).

9.102 An implementation in Python is shown in Figure S.9.6. Using this code, I found that the largest x_n is achieved by n = 7560 = 2^3 · 3^3 · 5 · 7, which has x_7560 = 8!/(3! · 3! · 1! · 1!) = 1120 factorizations.

9.103 There are 9 choices for O’s first move. Then there are 8 choices for X. Then 7 for O. And so forth. So there are
9! = 362,880 total leaves in the tree.

9.104 The player that goes first (O) places 5 symbols; the player that goes second (X) places 4—that is, there's a leaf for every set of five squares where O goes. We can think of each distinct board as being counted 5! · 4! times in the solution to the previous exercise, so the total number of boards is 9!/(5! · 4!). Using the notation of Section 9.4, that is C(9, 5) = 126.

9.105 There is 1 board after 0 moves. For each board B after k moves, there are 9 − k squares remaining unfilled in the board, so B has 9 − k children. Thus the number of boards after k moves is 9!/(9 − k)!. Therefore our total number of boards is

∑_{k=0}^{9} 9!/(9 − k)! = ∑_{k=0}^{9} 9!/k! = 986,410.

9.106 After k moves, there are ⌈k/2⌉ moves made by O, and ⌊k/2⌋ moves made by X. We can think of a board as a sequence of 9 symbols: ⌈k/2⌉ are O, ⌊k/2⌋ are X, and the rest are blank. Thus the number of distinct nodes is

∑_{k=0}^{9} 9! / (⌈k/2⌉! · ⌊k/2⌋! · (9 − k)!) = 6046.

import math

def count_rearrangements_formula(s):
    denominator = 1
    for a in set(s):
        denominator *= math.factorial(s.count(a))
    return math.factorial(len(s)) // denominator

def count_rearrangements_brute_force(s):  # Sloooow!
    if len(s) > 12:
        return "??? [too long for brute force on my laptop!]"
    orderings = set([s[0]])
    for ch in s[1:]:
        orderings = {ordering[:i] + ch + ordering[i:]
                     for ordering in orderings for i in range(len(ordering) + 1)}
    return len(orderings)

for s in ["PASCAL", "GRACEHOPPER", "ALANTURING",
          "CHARLESBABBAGE", "ADALOVELACE", "PEERTOPEERSYSTEM"]:
    print(s, count_rearrangements_formula(s), count_rearrangements_brute_force(s))

Figure S.9.5 Computing the ways to rearrange a given string’s letters.



import math

def count_prime_factorization_multiplicities(n):
    '''
    Computes a list of the exponents of the prime factors in the prime factorization of n.
    For example, the prime factorization of 600 is 2^3 * 3^1 * 5^2, so the result is [3, 1, 2].
    '''
    candidate = 2
    factorCounts = []
    while n > 1:
        count = 0
        while n % candidate == 0:
            n = n // candidate
            count += 1
        if count >= 1:
            factorCounts.append(count)
        candidate += 1
    return factorCounts

def count_prime_factorizations(n):
    factorCounts = count_prime_factorization_multiplicities(n)
    result = math.factorial(sum(factorCounts))
    for count in factorCounts:
        result = result // math.factorial(count)
    return result

most_factorizations = max(range(10001), key=count_prime_factorizations)
print(most_factorizations, "has", count_prime_factorizations(most_factorizations), "factorizations")

Figure S.9.6 Counting prime factorizations.

9.107 There are only 850 unique boards:

    number of moves made    number of unique boards
            0                          1
            1                          3
            2                         12
            3                         38
            4                        108
            5                        174
            6                        228
            7                        174
            8                         89
            9                         23

See Figures S.9.7 and S.9.8 for the code, especially the rotate, reflect, and matches_equivalent methods.

9.108 There are 765 boards in the tree. See Figure S.9.8, especially the stop_once_won optional argument.

9.109 Each iteration removes 2 people, so the ith person who gets to choose a partner is choosing when there are n − 2i + 2 unassigned people, including herself. So she has n − 2i + 1 choices. There are n/2 rounds of choices, so the total number of choices is

∏_{i=1}^{n/2} (n − 2i + 1) = ∏_{j=0}^{(n/2)−1} (2j + 1).
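This product is the odd double factorial 1 · 3 · 5 ··· (n − 1), which we can sanity-check against a direct recursive count of the ways to pair people up; both helper names below are made up for this sketch:

```python
def count_pairings(people):
    """Count the ways to split an even-sized list of people into unordered pairs."""
    if not people:
        return 1
    first, rest = people[0], people[1:]
    total = 0
    for i in range(len(rest)):
        # Pair `first` with rest[i]; recursively pair up everyone else.
        total += count_pairings(rest[:i] + rest[i + 1:])
    return total

def odd_double_factorial(n):
    """The product 1 * 3 * 5 * ... * (n - 1), for even n."""
    result = 1
    for j in range(n // 2):
        result *= 2 * j + 1
    return result
```

For n = 6 people, both computations give 1 · 3 · 5 = 15 pairings.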

import copy

class Board:
    def __init__(self, *args):
        '''
        Construct a new empty board (no arguments), a copy of a given board (one argument),
        or (four arguments: oldBoard, col, row, mover) a new board that extends oldBoard
        but with an additional move at (col, row) by mover.
        '''
        if len(args) == 0:
            self.cells = [[None, None, None], [None, None, None], [None, None, None]]
        elif len(args) == 1:
            self.cells = args[0].copy()
        elif len(args) == 4:
            oldBoard, xMove, yMove, mover = args
            self.cells = copy.deepcopy(oldBoard.cells)
            self.cells[xMove][yMove] = mover

    def rotate(self):
        '''Create a new Board, rotated 90 degrees from this one.'''
        return Board([[self.cells[i][0] for i in [2, 1, 0]],
                      [self.cells[i][1] for i in [2, 1, 0]],
                      [self.cells[i][2] for i in [2, 1, 0]]])

    def reflect(self):
        '''Create a new Board, reflected horizontally from this one.'''
        return Board([[self.cells[0][i] for i in [2, 1, 0]],
                      [self.cells[1][i] for i in [2, 1, 0]],
                      [self.cells[2][i] for i in [2, 1, 0]]])

    def has_winner(self):
        '''Has anyone won this game yet?'''
        return any([self.cells[xA][yA] == self.cells[xB][yB] == self.cells[xC][yC] != None
                    for ((xA, yA), (xB, yB), (xC, yC)) in
                    [((0,0), (0,1), (0,2)), ((1,0), (1,1), (1,2)), ((2,0), (2,1), (2,2)),  # rows
                     ((0,0), (1,0), (2,0)), ((0,1), (1,1), (2,1)), ((0,2), (1,2), (2,2)),  # columns
                     ((0,0), (1,1), (2,2)), ((2,0), (1,1), (0,2))]])                       # diagonals

    def get_subsequent_boards(self, mover):
        '''Creates new Boards for every possible move that mover can make in this board.'''
        return [Board(self, x, y, mover)
                for x in range(3) for y in range(3) if self.cells[x][y] == None]

    def matches_exactly(self, other_boards):
        '''Is this Board exactly identical to any Board in other_boards?'''
        return any([self.cells == other.cells for other in other_boards])

    def matches_equivalent(self, other_boards):
        '''Is this Board equivalent (under rotation/reflection) to any Board in other_boards?'''
        return any([board.matches_exactly(other_boards) or
                    board.reflect().matches_exactly(other_boards)
                    for board in [self, self.rotate(), self.rotate().rotate(),
                                  self.rotate().rotate().rotate()]])

Figure S.9.7 Building a game tree for Tic-Tac-Toe.

9.110 Following the hint, we will start from the right-hand side n!/((n/2)! · 2^{n/2}) and rewrite both factorials and 2^{n/2} using product notation:

n!/((n/2)! · 2^{n/2}) = [∏_{j=1}^{n} j] / ([∏_{j=1}^{n/2} j] · [∏_{j=1}^{n/2} 2])    (definition of factorial/exponentiation)

def build_game_tree(prevent_duplicates=False, prevent_equivalents=False, stop_once_won=False):
    '''
    Builds the full game tree for Tic-Tac-Toe.
    If prevent_duplicates: do not create multiple copies of the same board
      (which arise from the same moves in different orders).
    If prevent_equivalents: do not create multiple copies of equivalent boards
      (including by rotation and reflection).
    If stop_once_won: do not create child boards if the game is already over
      (because one of the players has won already).
    '''
    total_board_count = 0
    current_layer = [Board()]
    for move in range(10):
        print("frontier", move, "has length", len(current_layer))
        total_board_count += len(current_layer)
        next_layer = []
        for board in current_layer:
            if not stop_once_won or not board.has_winner():
                for child in board.get_subsequent_boards(move % 2):
                    if (prevent_duplicates and not child.matches_exactly(next_layer)) \
                            or (prevent_equivalents and not child.matches_equivalent(next_layer)) \
                            or (not prevent_duplicates and not prevent_equivalents):
                        next_layer.append(child)
        current_layer = next_layer
    print(total_board_count, "total boards.")

build_game_tree()                                               # Exercises 9.103 and 9.105
build_game_tree(prevent_duplicates=True)                        # Exercises 9.104 and 9.106
build_game_tree(prevent_equivalents=True)                       # Exercise 9.107
build_game_tree(stop_once_won=True, prevent_equivalents=True)   # Exercise 9.108
Figure S.9.8 Building a game tree for Tic-Tac-Toe, continued.

= ([∏_{j=0}^{(n/2)−1} (2j + 1)] · [∏_{j=1}^{n/2} 2j]) / ([∏_{j=1}^{n/2} j] · [∏_{j=1}^{n/2} 2])    (splitting the numerator into even and odd multiplicands)
= ∏_{j=0}^{(n/2)−1} (2j + 1)    (cancelling 2j/(2 · j) for each j = 1 ... n/2)
= ∏_{i=1}^{n/2} (n − 2i + 1).    (reindexing, with i = n/2 − j)
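The identity can also be confirmed numerically for small even n; the helper names in this sketch are made up:

```python
import math

def left_side(n):
    """n! / ((n/2)! * 2^(n/2)), for even n."""
    return math.factorial(n) // (math.factorial(n // 2) * 2 ** (n // 2))

def right_side(n):
    """The product of the odd numbers 1 * 3 * ... * (n - 1), for even n."""
    product = 1
    for j in range(n // 2):
        product *= 2 * j + 1
    return product
```

For example, left_side(4) = 24/(2 · 4) = 3 = 1 · 3 = right_side(4).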

9.111 999 prefix reversals: choose any positive number (other than 1) of genes to reverse.

9.112 If we choose i as the first index of the range we're reversing, then there are 1000 − i choices for j: any of the elements from {i + 1, i + 2, ..., 1000} (which contains all but the first i choices). We cannot choose i = 1000. Thus the total number is

∑_{i=1}^{999} (1000 − i) = ∑_{i′=1}^{999} i′    (reindexing, with i′ = 1000 − i)
                         = 999 · 1000/2 = 499,500.

9.113 If we choose j as the last index of the range we're moving to the right, then there are j choices for the first moved element i: any spot from {1, 2, ..., j}. Having chosen j, there are 1000 − j choices for the element to the right of which the transposed segment will be placed: any spot from {j + 1, j + 2, ..., 1000}. Thus the total number of transpositions is

∑_{j=1}^{1000} j(1000 − j) = 1000 · ∑_{j=1}^{1000} j − ∑_{j=1}^{1000} j^2
                           = (1000 · 1000 · 1001)/2 − (1000 · 1001 · 2001)/6
                           = 500,500,000 − 333,833,500
                           = 166,666,500.
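A few lines of Python confirm that the direct sum and the closed form agree:

```python
n = 1000
# Direct sum over j, the last index of the transposed range:
direct = sum(j * (n - j) for j in range(1, n + 1))
# Closed form from the solution:
closed = n * (n * (n + 1) // 2) - n * (n + 1) * (2 * n + 1) // 6
print(direct, closed)
```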

9.114 There are 20 · 20 = 400 different opening move combinations. The total number of games in the tournament is
150 · 7/2 = 525, which exceeds 400. Thus by the Pigeonhole Principle, at least two games have the same opening pair
of moves.

9.115 A sequence of 7 wins and losses is an element of {W, L}^7. This set has cardinality 2^7 = 128. There are 150 players, and 150 > 128. Thus by the pigeonhole principle there must be two players with precisely the same sequence of wins and losses.

9.116 There are only 8 different possible records—a player can win k games for any k ∈ {0, 1, . . . , 7}, which has size 8.
Thus there are 8 categories of players, and there must be a category with at least ⌈150/8⌉ = 19 players in it.

9.117 If a competitor has k wins, then there are 8 − k possibilities for their number of losses: anywhere between 0 and all 7 − k remaining games. Thus the number of possible win–loss–draw records is

∑_{k=0}^{7} (8 − k) = ∑_{j=1}^{8} j = (8 · 9)/2 = 36.

Thus there are 36 categories of players, and so there must be a category with at least ⌈150/36⌉ = 5 players in it.
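Enumerating the records directly confirms both counts; the set comprehension below builds all (wins, losses, draws) triples summing to 7:

```python
# Every possible record is a triple (wins, losses, draws) with w + l + d = 7:
records = {(w, l, 7 - w - l) for w in range(8) for l in range(8 - w)}
print(len(records))
```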

9.118 An update rule is a function f from the states of cell u and cell u's eight neighbors to the new state of cell u. Denote a cell's state as an element of {0, 1}, where 0 denotes inactive and 1 denotes active. Thus, more formally, f is a function f : {0,1}^9 → {0,1}. The set of functions from A to B has cardinality |B|^|A|—for each input element a ∈ A, we specify which element b ∈ B is the corresponding output—so there are

|{0,1}|^{|{0,1}^9|}

possible update rules. Because |{0,1}^9| = 2^9 = 512, there are a total of 2^512 ≈ 1.34078079 × 10^154 update rules.

9.119 A strictly cardinal update rule is a function f : {0, 1} × {0, 1, ..., 8} → {0, 1}. Thus there are 18 different input values (on + 0, on + 1, ..., on + 8; off + 0, ..., off + 8), and therefore 2^18 = 262,144 different strictly cardinal update rules.

9.120 In a 10-by-10 lattice, there are only 100 cells, each of which can be in one of two states—so there are “only” 2^100 different configurations of the system. So, let K = 2^100 + 1. By the pigeonhole principle, there must be a configuration
that appears more than once in the sequence M0 , . . . , MK . Suppose that the configurations Mx and My are the same, for
x < y. This entire system’s evolution is deterministic, so Mx+1 = My+1 too, and Mx+2 = My+2 , and so forth. Thus we’re
stuck in an infinite loop: once we reach Mx , later (y − x steps later, to be precise), we’re back in the same configuration.
Another y − x steps later, we’re back to the same configuration again.
By the pigeonhole principle argument, we know that MK is part of this infinite loop. If MK = MK+1 , then the infinite
loop has “period” one: we just repeat MK forever. (And thus we have eventual convergence.) If MK ̸= MK+1 then the
infinite loop contains at least two distinct configurations, and the system oscillates.

9.121 The hash function maps A → B, with |A| = 1234 and |B| = 17. By the division rule, there must be a bucket with at
least ⌈|A|/|B|⌉ elements—that is, with ⌈1234/17⌉ = 73 elements. If the hash function divides elements perfectly evenly,
then 73 would be the maximum occupancy.

9.122 The condition is simply that |A| ≥ 202|B|. If this condition holds, then by the generalized version of the Pigeonhole
Principle, there are |B| categories of objects (the |B| different possible output values of f), and some category must therefore
have at least ⌈|A|/|B|⌉ ≥ ⌈202|B|/|B|⌉ = 202 elements.

9.123 There are 5 categories and n inputs, so there must be an output that's assigned at least n/5 inputs. If that output k_i is precisely in the middle of the range of values mapped to it, then there must be a point that's n/10 away.

9.4 Combinations and Permutations


9.124 C(9, 4) = C(9, 5) = 126: there are a total of 9 characters, of which 4 come from BACK.

9.125 C(8, 3) = C(8, 5) = 56: there are a total of 8 characters, of which 3 come from DAY.

9.126 C(12, 6) = 924: the only repetition (of P and D) comes within one of the words, and that doesn't affect the number of shuffles.

9.127 There are C(9, 4) = 126 shuffles, but any shuffle starting with a shuffle of LIF and D and ending with EEATH is counted twice—once with LIFE's E first, and once with DEATH's E first. There are C(4, 3) = 4 such shuffles, so the total number of shuffles of LIFE and DEATH is C(9, 4) − 4 = 122.

9.128 There are only two shuffles, because the resulting string must start with an O and end with an N, so the only choices
are ONON and OONN.

9.129 The result must start with O. If its form is OO????, then there are two shuffles, corresponding to the two shuffles of
UT and UT, just as in the previous exercise. If its form is OU????, then there are three shuffles, corresponding to the three
shuffles of T and OUT, depending on whether nothing, the O, or the OU come before the first T. Thus the total number of
shuffles is 5.

9.130 Here is a solution in Python:

def all_shuffles_with_duplicates(x, y):
    '''Compute a list of all shuffles (with duplicates allowed) of the strings x and y.'''
    if len(x) == 0:
        return [y]
    elif len(y) == 0:
        return [x]
    else:
        return [x[0] + rest for rest in all_shuffles_with_duplicates(x[1:], y)] \
             + [y[0] + rest for rest in all_shuffles_with_duplicates(x, y[1:])]

def count_shuffles(x, y):
    return len(set(all_shuffles_with_duplicates(x, y)))
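A memoized variant (the name distinct_shuffles is made up here) computes the set of distinct shuffles directly, and can be used to confirm the answers to Exercises 9.127–9.129:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def distinct_shuffles(x, y):
    """The frozenset of all distinct shuffles of strings x and y, memoized on (x, y)."""
    if len(x) == 0:
        return frozenset([y])
    if len(y) == 0:
        return frozenset([x])
    # Every shuffle starts with the first symbol of x or the first symbol of y:
    return frozenset({x[0] + rest for rest in distinct_shuffles(x[1:], y)} |
                     {y[0] + rest for rest in distinct_shuffles(x, y[1:])})
```

For example, len(distinct_shuffles("LIFE", "DEATH")) is 122, matching Exercise 9.127.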

9.131 If no letters are shared between the two strings, then the total number of shuffles is C(|x| + |y|, |x|). The largest binomial coefficient C(n, k) is for k ∈ {⌊n/2⌋, ⌈n/2⌉}, so the largest possible number of shuffles for two strings of total length n is C(n, ⌊n/2⌋) = C(n, ⌈n/2⌉).

9.132 Suppose that x is AAA· · · A and y is AAA· · · A. Then, no matter the order in the shuffle, the resulting shuffle is simply
n As—no matter the relative lengths of x and y. In this case, then, there is only one possible shuffle.

9.133 Suppose all symbols are distinct. Then, for each position in the shuffle, we have to choose which of the three input strings it comes from. The easiest way to write this: of the a + b + c slots, we choose a that come from the first string, and then of the b + c remaining slots we choose which b come from the second string. Thus the total number of shuffles is C(a + b + c, a) · C(b + c, b).

9.134 C(42, 16)

9.135 C(23, 0) + C(23, 1) + C(23, 2) + C(23, 3) = 1 + 23 + 253 + 1771 = 2048 = 2^11
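The arithmetic in Exercise 9.135 can be confirmed with math.comb:

```python
import math

# The sum of the four binomial coefficients from the solution to 9.135:
total = sum(math.comb(23, i) for i in range(4))
print(total)
```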

9.136 In a 32-bit string, having a number of ones within ±2 of the number of zeros means having 15, 16, or 17 ones (and 17, 16, or 15 zeros). Thus the number of such strings is C(32, 15) + C(32, 16) + C(32, 17).

9.137 C(64, 32) is roughly 1.8326 × 10^18. The first k for which ∑_{i=0}^{k} C(64, i) is within a factor of twenty of that number is k = 23:

∑_{i=0}^{23} C(64, i) ≈ 3.022 × 10^17
∑_{i=0}^{24} C(64, i) ≈ 5.529 × 10^17
∑_{i=0}^{25} C(64, i) ≈ 9.539 × 10^17
∑_{i=0}^{26} C(64, i) ≈ 1.555 × 10^18
∑_{i=0}^{27} C(64, i) ≈ 2.402 × 10^18

So there are about as many 32-one strings as there are (≤ 27)-one strings in {0,1}^64.

9.138 We seek the smallest n for which C(n, n/2) ≤ 0.10 · 2^n. It turns out that n = 64 is the breaking point:

C(62, 31)/2^62 = 0.1009··· and C(64, 32)/2^64 = 0.0993···.

9.139 52 − 13 = 39 cards are nonspades. We must choose all 13 cards from this set, so there are C(39, 13) such hands.

 
9.140 52 − 13 = 39 cards are nonhearts. We must choose 12 cards from this set, and one heart, so there are C(39, 12) · C(13, 1) such hands.

9.141 There are 48 nonkings, and 9 cards left to choose, so there are C(48, 9) such hands.

9.142 There are 48 nonqueens, and 13 cards left to choose, so there are C(48, 13) such hands.

 
9.143 There are 48 nonjacks, and 4 jacks. We choose 11 of the former and two of the latter, so there are C(48, 11) · C(4, 2) such hands.

9.144 There are 44 nonjack-nonqueens, 4 jacks, and 4 queens. We choose 9 of the first, and two each of the latter two categories. Thus there are C(44, 9) · C(4, 2) · C(4, 2) such hands.

9.145 We choose a suit for the honors, and then the remaining 8 cards from the other 47 cards in the deck. So there are C(4, 1) · C(47, 8) choices. However, this overcounts, because it's possible to have honors in more than one suit at the same time. The number of hands with double high honors can be thought of this way: choose two suits in which to have honors, then choose the remaining three cards from the other 42 cards in the deck: C(4, 2) · C(42, 3). It's impossible to have triple honors—that would require 15 cards.
Thus the total number of hands with high honors is C(4, 1) · C(47, 8) − C(4, 2) · C(42, 3).

9.146 Simply, we can't have any face cards. There are 9 non-face cards in each suit, for 36 total—thus we have C(36, 13) such hands.

9.147 A zero-point hand has only non-face cards, and has three or more cards in each suit. That is, the hand must have (3, 3, 3, 4) cards per suit. Thus we can represent a zero-point hand as follows: first we choose which suit gets four cards. Then we choose 4 of the 9 non-face cards in the 4-card suit, and in each 3-card suit we choose 3 of the 9 non-face cards. In total, that's 4 · C(9, 3)^3 · C(9, 4) hands.
There are C(52, 13) total bridge hands. Therefore the total fraction of hands that have zero points is 0.0004704···. I seem to get them a lot more often than that.
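The fraction 0.0004704··· can be reproduced with math.comb (a sketch; the variable names are made up):

```python
import math

# 4 choices of the 4-card suit, C(9,4) non-face cards there, C(9,3) in each 3-card suit:
zero_point_hands = 4 * math.comb(9, 3) ** 3 * math.comb(9, 4)
all_hands = math.comb(52, 13)
fraction = zero_point_hands / all_hands
print(fraction)
```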

9.148 202^32

9.149 202!/(202 − 32)! = 202!/170!

9.150 C(202 + 32 − 1, 32) = C(233, 32) = 233!/(201! · 32!)

9.151 C(202, 32) = 202!/((202 − 32)! · 32!) = 202!/(170! · 32!)
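The four counts in Exercises 9.148–9.151 are the four standard regimes for choosing 32 items out of 202, and can be computed directly with math.perm and math.comb (a sketch with made-up variable names):

```python
import math

# Order matters, repetition allowed:
sequences = 202 ** 32
# Order matters, no repetition:
arrangements = math.perm(202, 32)        # 202!/170!
# Order doesn't matter, repetition allowed:
multisets = math.comb(202 + 32 - 1, 32)  # C(233, 32)
# Order doesn't matter, no repetition:
subsets = math.comb(202, 32)             # C(202, 32)
```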

9.152 C(10, 5) = 252

9.153 C(14, 5) = 2002

9.154 C(29, 10) = 20,030,010

9.155 C(20, 10) = 184,756

9.156 C(2n, n)

9.157 2n · (2n − 1) · (2n − 2) ··· (n + 1) = (2n)!/n!

9.158 All that matters is how many of the n elements of x are matched by each element of y; the alignment follows immediately. We can think of this process, then, as choosing n elements out of a set of 2n options, where repetition is allowed but order doesn't matter. Thus there are C(2n + n − 1, n) = C(3n − 1, n) ways to make this choice.

9.159 All that matters is which n elements of y we match; the alignment follows immediately. There are C(2n, n) ways of doing this.

9.160 This question is equivalent to choosing 202 elements from {a, b, c}, where order doesn't matter and repetition is allowed. So the total number of choices is C(204, 202) = 20,706.

9.161 This question is equivalent to choosing 8 elements from {a, b, c, d, e}, where order doesn't matter and repetition is allowed, so the number of ways is C(12, 8) = 495.

9.162 This question is equivalent to choosing 88 elements from {a, b, c, d, e}, where order doesn't matter and repetition is allowed, so the number of ways is C(92, 88) = 2,794,155.


9.163 Suppose b = 64 − i; then a + c = 2i; by Theorem 9.16 there are C(2i + 1, 2i) = 2i + 1 ways to choose a and c to make a + c = 2i (one for each a ∈ {0, 1, ..., 2i}). Thus the number of choices is

∑_{i=0}^{64} C(2i + 1, 2i) = ∑_{i=0}^{64} (2i + 1)    (above discussion)
                           = 65 + 2 · ∑_{i=1}^{64} i
                           = 65 + 2 · 32 · 65
                           = 65^2.

9.164 C(141, 3)

9.165 10^3

9.166 C(29, 20)


9.167 C(11, 5) locks—one to ensure that each subset of size 5 is locked out. (For a subset A of size 5, everyone other than the members of A gets the key associated with subset A; the members of A are the ones who do not get that key.) Each scientist needs a key for every subset of size 6 she's in, so she must carry C(10, 5) keys—there are 10 other scientists, of which 5 are chosen in a particular subset.

9.168 There are n! orderings of the data, and then we divide by the number of orderings within each group ((n/10)! each), and the number of orderings of the groups (10!):

n! / [((n/10)!)^10 · 10!].

9.169 One way to think about this problem: we write down n zeros, which leaves n + 1 "gaps" before/between/after the zeros. We must choose which k of these gaps to fill with a one—so there are C(n + 1, k) ways to do so.

Alternatively, let T(n, k) denote the number of valid bitstrings for a given k and n. We'll prove for any n ≥ 0 and k ≥ 0 that T(n, k) = C(n + 1, k) by induction on n.

Base case (n = 0). If k = 0, there's only one valid string (the empty string); if k = 1, there's still one valid string (1); if k ≥ 2, there's no valid string (because we have no zeros to separate the ones). Indeed C(1, 0) = C(1, 1) = 1 and C(1, k) = 0 for k ≥ 2.

Inductive case (n ≥ 1). We assume the inductive hypothesis that, for any k ≥ 0, we have T(n − 1, k) = C(n, k). We must prove that T(n, k) = C(n + 1, k) for any k ≥ 0. Every (n, k)-valid string either starts with 0 and continues with an (n − 1, k)-valid suffix, or it starts with 1, followed immediately by a 0 (otherwise there'd be two consecutive ones), and then continues with an (n − 1, k − 1)-valid suffix. In other words, T(n, k) = T(n − 1, k) + T(n − 1, k − 1). Thus we have

T(n, k) = T(n − 1, k) + T(n − 1, k − 1)        [above discussion]
        = C(n, k) + C(n, k − 1)                [inductive hypothesis]
        = C(n + 1, k).                         [Pascal's Identity (Theorem 9.19)]
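Both arguments can be spot-checked by brute force for small n and k; here is a quick sketch, taking a "valid bitstring" to be one with n zeros, k ones, and no two consecutive ones (as in the inductive case above):

```python
from itertools import product
from math import comb

def T(n, k):
    """Count length-(n + k) bitstrings with n zeros, k ones, and no adjacent ones."""
    return sum(1 for bits in product("01", repeat=n + k)
               if bits.count("0") == n and "11" not in "".join(bits))

for n in range(7):
    for k in range(7):
        assert T(n, k) == comb(n + 1, k)
```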

9.170 There is a bijection between bitstrings x ∈ {0, 1}^(n+k) with n zeros and k ones in even-length blocks and strings y ∈ {0, 2}^(n+(k/2)) with n zeros and k/2 twos, where the bijection replaces each 11 with a 2. The latter set has size C(n + (k/2), n) by definition.

9.171 The combinatorial proof: suppose we have a team of n players, and we need to choose a starting lineup of k of them, of whom one will be the captain. The quantity k · C(n, k) represents first choosing the k starters—that's C(n, k) choices; then we pick one of them as the captain, with C(k, 1) = k options. The quantity n · C(n − 1, k − 1) represents first choosing a captain—with C(n, 1) = n choices; then we pick the k − 1 remaining starters from the n − 1 remaining players, with C(n − 1, k − 1) options.
9.4 Combinations and Permutations 183

The algebraic proof:

k · C(n, k) = k · n! / ((n − k)! · k!)                                       [definition of binomial coefficient]
            = k · n · (n − 1)! / ([(n − 1) − (k − 1)]! · k · (k − 1)!)       [definition of factorial; n − k = (n − 1) − (k − 1)]
            = n · (n − 1)! / ([(n − 1) − (k − 1)]! · (k − 1)!)               [cancellation]
            = n · C(n − 1, k − 1).                                           [definition of binomial coefficient]

9.172 We will prove sum_{i=0}^{n} C(n, i) = 2^n for all n ≥ 0 by induction on n.

Base case (n = 0). sum_{i=0}^{0} C(0, i) = C(0, 0) = 1, and indeed 2^0 = 1.

Inductive case (n ≥ 1). We assume the inductive hypothesis sum_{i=0}^{n−1} C(n − 1, i) = 2^(n−1). We must prove that sum_{i=0}^{n} C(n, i) = 2^n.

sum_{i=0}^{n} C(n, i)
  = C(n, 0) + C(n, n) + sum_{i=1}^{n−1} C(n, i)                                            [definition of summations]
  = C(n, 0) + C(n, n) + sum_{i=1}^{n−1} [C(n − 1, i) + C(n − 1, i − 1)]                    [Pascal's identity]
  = C(n, 0) + C(n, n) + [sum_{i=1}^{n−1} C(n − 1, i)] + [sum_{i=1}^{n−1} C(n − 1, i − 1)]  [splitting the summation]
  = C(n, 0) + [sum_{i=1}^{n−1} C(n − 1, i)] + [sum_{i=0}^{n−2} C(n − 1, i)] + C(n, n)      [reindexing and reordering]
  = C(n − 1, 0) + [sum_{i=1}^{n−1} C(n − 1, i)] + [sum_{i=0}^{n−2} C(n − 1, i)] + C(n − 1, n − 1)
                                                    [C(n, 0) = 1 = C(n − 1, 0) and C(n, n) = 1 = C(n − 1, n − 1)]
  = [sum_{i=0}^{n−1} C(n − 1, i)] + [sum_{i=0}^{n−1} C(n − 1, i)]                          [definition of summations]
  = 2^(n−1) + 2^(n−1)                                                                      [inductive hypothesis]
  = 2^n.

9.173 I'm running a baking contest with 2n chefs, of whom n are professionals and n are amateurs. Everyone makes a chocolate cake, and n winners are chosen. How many ways of selecting the n winners are there?

• There are 2n competitors, and n winners. Thus there are C(2n, n) ways of selecting the winners.
• Suppose that precisely k of the n amateurs win. Then there are C(n, k) ways to pick the amateur winners. That leaves n − k winners out of the n professionals, and there are C(n, n − k) ways to pick the professional winners. Thus there are C(n, k) · C(n, n − k) ways to pick the winners if there are k amateur winners. The value of k can range from 0 to n, so the total number of ways to pick the winners is

sum_{k=0}^{n} C(n, k) · C(n, n − k),

which equals

sum_{k=0}^{n} C(n, k)^2

by Theorem 9.18, because C(n, k) = C(n, n − k) and so C(n, k) · C(n, n − k) = C(n, k) · C(n, k).

Because we've counted the same set both ways, we have C(2n, n) = sum_{k=0}^{n} C(n, k)^2.
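The identity can be verified directly with Python's math.comb (a quick numerical check, not part of the original solution):

```python
from math import comb

# sum of C(n, k)^2 over k should equal C(2n, n) for every n
for n in range(25):
    assert sum(comb(n, k) ** 2 for k in range(n + 1)) == comb(2 * n, n)
```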

9.174 Here's an algebraic proof:

C(n, m) · C(m, k) = [n! / ((n − m)! · m!)] · [m! / ((m − k)! · k!)]
                  = n! / ((n − m)! · (m − k)! · k!)
                  = [n! / (k! · (n − k)!)] · [(n − k)! / ((n − m)! · (m − k)!)]
                  = [n! / (k! · (n − k)!)] · [(n − k)! / (([n − k] − [m − k])! · (m − k)!)]    [n − m = (n − k) − (m − k)]
                  = C(n, k) · C(n − k, m − k).

9.175 Let S denote the set of all choices of pairs ⟨T, M⟩ where: T ⊆ {1, . . . , n} is a team chosen from a set of n candidates, |T| = m; M ⊆ T is a set of managers chosen from the chosen team; and |M| = k. We calculate |S| in two different ways:

• choose m team members from the set of n candidates, then choose k of those m to be managers. By the generalized product rule, then, we have |S| = C(n, m) · C(m, k).
• choose k managers from the set of n candidates, then choose m − k non-manager team members from the remaining n − k unchosen candidates. Then we have |S| = C(n, k) · C(n − k, m − k).

Therefore C(n, m) · C(m, k) = C(n, k) · C(n − k, m − k).
.

9.176 Here is a combinatorial proof. There is a binder containing n + 1 pages of tattoo designs, one design per page. You intend to get m + 1 tattoos. Here are two ways to choose:

• Just pick m + 1 of the n + 1 pages (repetition forbidden; order irrelevant). There are C(n + 1, m + 1) ways to do so.
• Choose a page k such that the design on page k + 1 will be the last tattoo you choose. Select m of the pages {1, 2, . . . , k} as the other tattoos you'll select. There are C(k, m) ways of choosing m additional tattoos from the first k pages. Your choice of k can range from 0 to n (so that page k + 1 exists in the binder).

Thus C(n + 1, m + 1) = sum_{k=0}^{n} C(k, m).
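This "hockey stick"-style identity is easy to spot-check numerically; note that math.comb(k, m) is 0 whenever k < m, matching the convention used above.

```python
from math import comb

for n in range(15):
    for m in range(n + 1):
        assert comb(n + 1, m + 1) == sum(comb(k, m) for k in range(n + 1))
```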

9.177 We'll prove that the property

sum_{k=0}^{⌊n/2⌋} C(n − k, k) = f_{n+1},

which we'll call P(n), holds for all n ≥ 0 by strong induction on n.

Base cases (n = 0 and n = 1). For n = 0 and n = 1, we have ⌊n/2⌋ = 0. So

sum_{k=0}^{⌊n/2⌋} C(n − k, k) = sum_{k=0}^{0} C(n − k, k) = C(n, 0) = 1.

And f_1 = f_2 = 1. Thus P(0) and P(1) hold.

Inductive case (n ≥ 2). We assume the inductive hypotheses, namely P(0), P(1), . . . , P(n − 1). We must prove P(n). Observe that

sum_{k=0}^{⌊n/2⌋} C(n − k, k)
  = sum_{k=0}^{⌊n/2⌋} C(n − 1 − k, k) + sum_{k=0}^{⌊n/2⌋} C(n − 1 − k, k − 1)    [Pascal's Identity]
  = sum_{k=0}^{⌊n/2⌋} C(n − 1 − k, k) + sum_{k=1}^{⌊n/2⌋} C(n − 1 − k, k − 1)    [C(n − 1 − 0, 0 − 1) = 0]
  = sum_{k=0}^{⌊n/2⌋} C(n − 1 − k, k) + sum_{j=0}^{⌊n/2⌋−1} C(n − 2 − j, j)      [reindexing the second term with j := k − 1]
  = sum_{k=0}^{⌊n/2⌋} C(n − 1 − k, k) + sum_{j=0}^{⌊(n−2)/2⌋} C((n − 2) − j, j)  [⌊n/2⌋ − 1 = ⌊(n − 2)/2⌋]
  = sum_{k=0}^{⌊n/2⌋} C(n − 1 − k, k) + f_{n−1}.                                 [inductive hypothesis]

We'd be done if the first term is equal to f_n, because f_n + f_{n−1} = f_{n+1} by definition of the Fibonacci numbers. We'll show this in two cases, depending on the parity of n:

Case A: n is odd. Then there's not much to do:

sum_{k=0}^{⌊n/2⌋} C(n − 1 − k, k) = sum_{k=0}^{⌊(n−1)/2⌋} C((n − 1) − k, k)      [⌊n/2⌋ = ⌊(n − 1)/2⌋ when n is odd]
                                  = f_n.                                          [inductive hypothesis]

Case B: n is even. There's slightly more work here:

sum_{k=0}^{⌊n/2⌋} C(n − 1 − k, k) = sum_{k=0}^{⌊(n−1)/2⌋+1} C((n − 1) − k, k)    [n/2 = ⌊n/2⌋ = ⌊(n − 1)/2⌋ + 1 when n is even]
                                  = sum_{k=0}^{⌊(n−1)/2⌋} C((n − 1) − k, k)      [for k = n/2, C((n − 1) − k, k) = C(n/2 − 1, n/2) = 0]
                                  = f_n.                                          [inductive hypothesis]

In either case, we've now shown that

sum_{k=0}^{⌊n/2⌋} C(n − k, k) = sum_{k=0}^{⌊n/2⌋} C(n − 1 − k, k) + f_{n−1}      [original argument above]
                              = f_n + f_{n−1}                                     [even/odd case-based argument above]
                              = f_{n+1}.                                          [definition of the Fibonaccis]
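A numerical check of the identity (assuming, as above, the Fibonacci indexing f_1 = f_2 = 1; the placeholder fib[0] = 0 is only there to make the list indices line up):

```python
from math import comb

fib = [0, 1, 1]                       # fib[i] = f_i, with f_1 = f_2 = 1
while len(fib) < 40:
    fib.append(fib[-1] + fib[-2])

# the "diagonal sums" of Pascal's triangle are Fibonacci numbers
for n in range(38):
    assert sum(comb(n - k, k) for k in range(n // 2 + 1)) == fib[n + 1]
```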

9.178 Suppose you have a deck of n red cards and m black cards, from which you choose a hand of k total cards.

• There are n + m total cards of which you choose k, so there are C(n + m, k) ways to do it.
• If you choose r red cards, then you must choose k − r black cards. There are C(n, r) · C(m, k − r) ways to do it. You can choose any r between 0 and k, so there are sum_{r=0}^{k} C(n, r) · C(m, k − r) ways in total.

(Note that this formula is a generalization of Exercise 9.173.)
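This is Vandermonde's identity; a quick numerical spot-check over small values:

```python
from math import comb

# C(n + m, k) equals the sum over r of C(n, r) * C(m, k - r);
# math.comb returns 0 when the lower index exceeds the upper one.
for n in range(8):
    for m in range(8):
        for k in range(n + m + 1):
            assert comb(n + m, k) == sum(comb(n, r) * comb(m, k - r)
                                         for r in range(k + 1))
```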

9.179 There are 2^n subsequences of x and 2^n subsequences of y. Thus we execute the a = b test 2^n · 2^n = 2^(2n) = 4^n times.

 
9.180 There are C(n, i) subsequences of x of length i, and there are C(n, i) subsequences of y of length i. So the total number of executions of the a = b test is

sum_{i=0}^{n} C(n, i) · C(n, i) = C(2n, n),

by Exercise 9.173.

9.181 The performance in Figure 9.38b is

C(2n, n) = (2n)! / (n! · n!)
         ≈ [sqrt(2π · 2n) · (2n/e)^(2n)] / [sqrt(2πn) · (n/e)^n · sqrt(2πn) · (n/e)^n]
         = [2 · sqrt(πn) · 4^n · (n/e)^(2n)] / [2πn · (n/e)^(2n)]
         = 4^n / sqrt(πn).

Thus the latter is an improvement by a factor of ≈ sqrt(πn).
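A quick numerical look at how good the approximation C(2n, n) ≈ 4^n / sqrt(πn) is (the relative error shrinks as n grows, roughly like 1/(8n)):

```python
from math import comb, pi, sqrt

for n in [10, 50, 100]:
    exact = comb(2 * n, n)
    approx = 4 ** n / sqrt(pi * n)
    # the ratio of the exact value to the estimate approaches 1 as n grows
    assert abs(exact / approx - 1) < 1 / n
```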

9.182 We are asked to prove sum_{k=0}^{n} (−1)^k · C(n, k) = 0 using the Binomial Theorem:

sum_{k=0}^{n} (−1)^k · C(n, k) = sum_{k=0}^{n} (−1)^k · 1^(n−k) · C(n, k)
                               = (1 + (−1))^n        [Binomial Theorem]
                               = 0^n
                               = 0.

9.183 We are asked to prove sum_{k=0}^{n} C(n, k)/2^k = (3/2)^n using the Binomial Theorem:

sum_{k=0}^{n} C(n, k)/2^k = sum_{k=0}^{n} (1/2)^k · 1^(n−k) · C(n, k)
                          = ((1/2) + 1)^n        [Binomial Theorem]
                          = (3/2)^n.

9.184 Consider an element x ∈ A_1 ∪ A_2 ∪ · · · ∪ A_k. Suppose that x appears in exactly x_ℓ of the sets A_i—in other words, suppose that |{i : x ∈ A_i}| = x_ℓ. Observe that the inner summation in the given formula, over all i-set intersections, can equivalently be written as a summation over elements:

sum_{i=1}^{k} (−1)^(i+1) · [sum_{j_1 < j_2 < ··· < j_i} |A_{j_1} ∩ A_{j_2} ∩ ··· ∩ A_{j_i}|]
  = sum_{i=1}^{k} (−1)^(i+1) · [sum_{j_1 < ··· < j_i} sum_{x ∈ A_{j_1} ∩ ··· ∩ A_{j_i}} 1]
  = sum_{i=1}^{k} (−1)^(i+1) · [sum_{x ∈ A_1 ∪ ··· ∪ A_k} (the number of i-set intersections containing x)]
                                       [reindexing to sum over elements instead of sets]
  = sum_{i=1}^{k} (−1)^(i+1) · [sum_{x ∈ A_1 ∪ ··· ∪ A_k} C(x_ℓ, i)]
                                       [x appears in x_ℓ sets, so there are C(x_ℓ, i) ways to select i sets that all contain x]
  = sum_{x ∈ A_1 ∪ ··· ∪ A_k} [sum_{i=1}^{x_ℓ} (−1)^(i+1) · C(x_ℓ, i)]
                                       [pushing constants into the summation and reversing the order of the sums; C(x_ℓ, i) = 0 for i > x_ℓ]
  = sum_{x ∈ A_1 ∪ ··· ∪ A_k} [1 − sum_{i=0}^{x_ℓ} (−1)^i · C(x_ℓ, i)]
                                       [adding and subtracting the i = 0 term, and factoring out −1]
  = sum_{x ∈ A_1 ∪ ··· ∪ A_k} [1 − 0]  [Exercise 9.182]
  = |A_1 ∪ A_2 ∪ · · · ∪ A_k|.

9.185 We can write

sum_{k=0}^{|S|} C(|S|, k) · 2^k = sum_{k=0}^{|S|} C(|S|, k) · 2^k · 1^(|S|−k).

By the Binomial Theorem, this quantity is

(2 + 1)^|S|.

Thus there are 3^|S| pairs in the subset relation on S. (Indeed, in Example 8.4, we saw that the subset relation on {1, 2} had 9 elements.)
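A brute-force check of the 3^n count (the helper name below is just for this sketch, not from the text):

```python
from itertools import chain, combinations

def count_subset_pairs(n):
    """Count pairs (A, B) with A a subset of B and B a subset of {0, ..., n-1}."""
    S = range(n)
    subsets = list(chain.from_iterable(combinations(S, r) for r in range(n + 1)))
    return sum(1 for A in subsets for B in subsets if set(A) <= set(B))

for n in range(6):
    assert count_subset_pairs(n) == 3 ** n
```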
10 Probability

10.2 Probability, Outcomes, and Events


10.1 {0, 1, . . . , 100}

10.2 (1/2)^100 ≈ 7.8886 × 10^(−31)

10.3 There are C(100, 50) sequences that achieve exactly 50 heads, each of which occurs with probability (1/2)^100. So the probability is

C(100, 50) · (1/2)^100 = 0.079589···.

10.4 C(100, 64) sequences of flips have exactly 64 heads, and each of them occurs with probability (1/2)^100, so the probability is

C(100, 64) · (1/2)^100 = 0.001559···.
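These probabilities are easy to recompute with Python's math.comb:

```python
from math import comb

# probability of exactly k heads in 100 fair flips: C(100, k) / 2^100
p_exactly_50 = comb(100, 50) / 2 ** 100
p_exactly_64 = comb(100, 64) / 2 ** 100
assert abs(p_exactly_50 - 0.079589) < 1e-5
assert abs(p_exactly_64 - 0.001559) < 1e-5
```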

10.5 There are more heads than tails in 2 flips only if the number of heads is 2, and Pr[2 heads] = 1/4.

10.6 There are more heads than tails in 3 flips if the number of heads is 2 or 3, which occurs with probability [C(3, 3) + C(3, 2)] · (1/8), which is (1 + 3) · (1/8) = 1/2.

10.7 Following the hint, observe that the probability of getting k heads is precisely the same as the probability of getting k tails because the coin is fair, so Pr[k] = Pr[1001 − k]. Thus

Pr[(strictly) more heads than tails]
  = sum_{k=501}^{1001} Pr[k] / ( [sum_{k=0}^{500} Pr[k]] + [sum_{k=501}^{1001} Pr[k]] )
  = sum_{k=501}^{1001} Pr[k] / ( [sum_{k=0}^{500} Pr[1001 − k]] + [sum_{k=501}^{1001} Pr[k]] )    [the hint]
  = sum_{k=501}^{1001} Pr[k] / ( [sum_{k=501}^{1001} Pr[k]] + [sum_{k=501}^{1001} Pr[k]] )        [reindexing]
  = 1/2.

Thus the probability is 1/2.

10.8 By the same logic as in the previous exercise, we have that the probability of having strictly more heads than tails is identical to the probability of having strictly more tails than heads. Thus it's precisely a half—but half of the probability of a non-tie, not necessarily half of one! In particular,

Pr[(strictly) more heads than tails]
  = (1/2) · [1 − Pr[exactly the same number of heads and tails]]
  = (1/2) · [1 − Pr[n/2 heads]]          if n is even, and 1/2 if n is odd
  = (1/2) · [1 − C(n, n/2) · 2^(−n)]     if n is even, and 1/2 if n is odd.
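The piecewise formula agrees with a brute-force enumeration over all 2^n equally likely flip sequences (a small-n verification sketch):

```python
from math import comb
from itertools import product

def more_heads_brute(n):
    # fraction of flip sequences with strictly more heads than tails
    return sum(1 for flips in product("HT", repeat=n)
               if flips.count("H") > n / 2) / 2 ** n

def more_heads_formula(n):
    if n % 2 == 0:
        return (1 - comb(n, n // 2) * 2 ** -n) / 2
    return 1 / 2

for n in range(1, 13):
    assert abs(more_heads_brute(n) - more_heads_formula(n)) < 1e-12
```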

10.9 The fraction of hands that include both cards, namely

C(50, 11) / C(52, 13) = 0.05882···.

(There are C(50, 11) ways to choose the remaining 11 cards from the 50 other cards in the deck.)

10.10 This is just the fraction of A♣ hands that also include the A♢, which is

C(50, 11) / C(51, 12) = 0.23529···.

10.11 In the card ranking system, there are 12 cards lower than A♣, and 39 cards higher than A♣. A♣ is the fourth-from-the-right if and only if there are 9 higher-ranked and 3 lower-ranked cards. Thus the number of hands that meet the stated criterion is

C(39, 9) · 1 · C(12, 3) = 46,621,329,040.

The A♢ is higher than the A♣, so the number of these hands that also include A♢ is

C(38, 8) · 1 · C(12, 3) = 10,758,768,240.

Thus the probability that Bridget has the A♢ too is the ratio of these numbers, which is 0.23076···.

10.12 C(39, 12) hands have A♣ as the lowest card; C(38, 11) of these hands contain the A♢. Thus the probability is

C(38, 11) / C(39, 12) = 0.30769···.

10.13 The probability is zero! If A♣ is the highest card, then none of the higher cards (including the A♢) can be in the
hand.

10.14 There are 4 · C(13, 4) hands of all the same suit: choose a suit, then choose which four cards in that suit to have. Thus the probability is

4 · C(13, 4) / C(52, 4) = 2860/270,725 = 0.01056···.

10.15 Choose a lowest rank from {A, . . . , 10}. Choose a suit for each of the four elements in the run. There are

10 · 4 · 4 · 4 · 4 = 2560

such hands. There are C(52, 4) = 270,725 hands, so the probability is 2560/270,725 = 0.00945···.

10.16 Choose a lowest rank r from {A, . . . , J}. Choose which of the three ranks (r, r + 1, or r + 2) to duplicate. Choose one suit for each of the other two elements in the run. Choose two suits for the duplicated rank. There are

11 · C(3, 1) · C(4, 1) · C(4, 1) · C(4, 2) = 3168

such hands. There are C(52, 4) hands, so the probability is 3168/270,725 = 0.01170···.

10.17 Let's consider two cases: the lowest rank of the run is A or J, or not.

If the lowest rank is A, then there are 4 · 4 · 4 choices of suits for the A, 2, and 3. The fourth card can be anything but {A, 2, 3, 4}, so there are 36 choices. Thus there are 4^3 · 36 = 2304 such runs. Similarly, if the lowest rank is J, then there are 4 · 4 · 4 choices of suits for the J, Q, and K, and 36 choices for the fourth card (A through 9), so there are again 4^3 · 36 = 2304 such runs.

Otherwise, the lowest rank r is between 2 and 10, inclusive. There are 9 such ranks. There are 4 · 4 · 4 choices for the suits in the run. The fourth card cannot have rank between r − 1 and r + 3, so there are only 8 ranks left, or 32 cards. Thus there are 9 · 4^3 · 32 = 18,432 such runs.

Thus the total number of runs is 2304 + 2304 + 18,432 = 23,040. There are C(52, 4) hands, so the probability is 23,040/270,725 = 0.08510···.

10.18 The hand has no pair if all ranks are different. There are C(13, 4) quadruples of distinct ranks, and 4^4 valid choices of suits for the cards (4 for the lowest card, 4 for the next-lowest card, etc.), so the total is C(13, 4) · 4^4 = 183,040 hands. There are C(52, 4) hands, so the probability of no pair is 183,040/270,725 = 0.67611···; the probability of a pair is

(270,725 − 183,040)/270,725 = 87,685/270,725 = 0.32388···.

10.19 There are several ways to get two or more pairs:

• Hands with two distinct pairs: C(13, 2) · C(4, 2) · C(4, 2) = 2808. (Choose two ranks, and two suits for each rank.)
• Hands with three of a kind: 13 · C(4, 3) · 12 · C(4, 1) = 2496. (Choose the tripled rank; choose three suits of that rank; choose the other rank; choose the other suit.)
• Hands with four of a kind: C(13, 1) = 13. (Choose the quadrupled rank.)

The total is 2808 + 2496 + 13 = 5317; the probability of multiple pairs is 5317/270,725 = 0.01963···.
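These counts (183,040 no-pair hands from Exercise 10.18, and 5317 multiple-pair hands here) can be confirmed by enumerating all C(52, 4) = 270,725 hands, in the same spirit as the code in Figures S.10.1 and S.10.2:

```python
from itertools import combinations

rank_of = [c % 13 for c in range(52)]   # identify each card only by its rank
no_pair = multi_pair = 0
for hand in combinations(range(52), 4):
    ranks = [rank_of[c] for c in hand]
    # count card-pairs within the hand sharing a rank (3-of-a-kind gives 3, etc.)
    pairs = sum(1 for i in range(4) for j in range(i) if ranks[i] == ranks[j])
    no_pair += (pairs == 0)
    multi_pair += (pairs >= 2)

assert no_pair == 183040
assert multi_pair == 5317
```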

10.20 A solution in Python is shown in Figures S.10.1 and S.10.2.

10.21 See Figures S.10.1 and S.10.2, especially the all_hands call to verify.

10.22 See Figures S.10.1 and S.10.2, especially the count_fifteens method (and call in verify). Running this code reveals that there are 127,193 hands with at least one fifteen, so the probability of getting at least one fifteen is 127,193/270,725 = 0.4698···.

10.23 The resulting string matches x if and only if one of the following two things happens:

• neither α-ray makes any change, or
• the α-rays hit the same bit, and the second ray undoes the change made by the first one.

These events are disjoint; the first happens with probability (1/2) · (1/2) = 1/4 (no change from either ray), and the second with probability (1/2) · (1/2) · (1/5) = 1/20 (both rays make changes, and the second ray hits the same bit as the first).

The overall probability is therefore 1/4 + 1/20 = 3/10.

10.24 We need two conditions to be true:

• pivot − 1 ≤ 3n/4, and
• n − pivot ≤ 3n/4.

That is, we need n/4 ≤ pivot ≤ 3n/4 + 1. Thus the first n/4 − 1 and the last n/4 − 1 choices of pivot are bad. The remaining n/2 + 2 are good. Thus the probability that |L| ≤ 3n/4 and |R| ≤ 3n/4 is 1/2 + 2/n.

10.25 There are (1 − α)n − 1 pivots that are too small, and (1 − α)n − 1 pivots that are too big. If α < 1/2, then every pivot is bad; otherwise, there are (2α − 1)n + 2 good pivots, and the probability of a good pivot is

2α − 1 + 2/n    if α ≥ 1/2
0               otherwise.

1 import random
2
3 class Card:
4 def __init__(self, rank, suit):
5 self.suit = suit
6 if rank.isdigit():
7 self.value, self.rank = int(rank), int(rank)
8 elif rank == "A":
9 self.value, self.rank = 1, 1
10 elif rank in "TJQK":
11 self.value, self.rank = 10, 10 + "TJQK".index(rank)
12
13 def getSuit(self): return self.suit
14 def getRank(self): return self.rank
15 def getValue(self): return self.value
16
17 class Hand:
18 def __init__(self, cards):
19 self.cards = cards
20
21 def contains_flush(self):
22 suits_in_hand = set([card.getSuit() for card in self.cards])
23 return len(suits_in_hand) == 1
24
25 def contains_one_4_run(self):
26 # A hand whose lowest rank is x contains a run of length 4 if its ranks are x, x+1, x+2, x+3.
27 ranks_in_hand = sorted(set([card.getRank() for card in self.cards]))
28 lowest_rank = ranks_in_hand[0]
29 return ranks_in_hand == [lowest_rank, lowest_rank + 1, lowest_rank + 2, lowest_rank + 3]
30
31 def contains_two_3_runs(self):
32 # A hand whose lowest rank is x contains two runs of length 3 if its ranks are x, x+1, x+2
33 # (and thus one of those three ranks is repeated in the hand).
34 ranks_in_hand = sorted(set([card.getRank() for card in self.cards]))
35 lowest_rank = ranks_in_hand[0]
36 return ranks_in_hand == [lowest_rank, lowest_rank + 1, lowest_rank + 2]
37
38 def contains_one_3_run(self):
39 # A hand with its lowest rank == x and its highest == y contains a single run of length 3
40 # if it contains either ranks {x, x+1, x+2} or ranks {y, y-1, y-2} (and y - x >= 4).
41 ranks_in_hand = sorted(set([card.getRank() for card in self.cards]))
42 lowest_rank = ranks_in_hand[0]
43 highest_rank = ranks_in_hand[-1]
44 return highest_rank - lowest_rank >= 4 and \
45 ((lowest_rank + 1 in ranks_in_hand and lowest_rank + 2 in ranks_in_hand) or \
46 (highest_rank - 1 in ranks_in_hand and highest_rank - 2 in ranks_in_hand))
47
48 def count_pairs(self):
49 count = 0
50 for i in range(len(self.cards)):
51 for j in range(i):
52 count += self.cards[i].getRank() == self.cards[j].getRank()
53 return count
54
55 def count_fifteens(self):
56 count = 0
57 for i in range(len(self.cards)):
58 for j in range(i):
59 count += (self.cards[i].getValue() + self.cards[j].getValue() == 15)
60 return count

Figure S.10.1 Counting the number of Cribbage hands with certain properties. (See also Figure S.10.2.)

61 def verify(hands):
62 print("Fraction with flushes: ", sum([h.contains_flush() for h in hands]) / len(hands))
63 print("Fraction with 4-run: ", sum([h.contains_one_4_run() for h in hands]) / len(hands))
64 print("Fraction with two 3-runs: ", sum([h.contains_two_3_runs() for h in hands]) / len(hands))
65 print("Fraction with one 3-run: ", sum([h.contains_one_3_run() for h in hands]) / len(hands))
66 print("Fraction with >=1 pair: ", sum([h.count_pairs() >= 1 for h in hands]) / len(hands))
67 print("Fraction with >=2 pair: ", sum([h.count_pairs() >= 2 for h in hands]) / len(hands))
68 print("Fraction with >=1 15: ", sum([h.count_fifteens() >= 1 for h in hands]) / len(hands))
69
70 deck = [Card(r,s) for r in "A23456789TJQK" for s in ["clubs", "diamonds", "hearts", "spades"]]
71 random_hands = [Hand(random.sample(deck, 4)) for i in range(1000000)]
72 all_hands = [Hand([deck[i], deck[j], deck[k], deck[l]])
73 for i in range(len(deck))
74 for j in range(i)
75 for k in range(j)
76 for l in range(k)]
77
78 print("----- by sampling -------")
79 verify(random_hands)
80 print("----- exhaustively -------")
81 verify(all_hands)

Figure S.10.2 Counting the number of Cribbage hands with certain properties. (See also Figure S.10.1.)

10.26 We lose if the median is in the first quarter of the indices. This occurs if two of the three sampled indices are in the first quarter, which occurs with probability

(1/4)^3 + C(3, 2) · (1/4)^2 · (3/4) = 1/64 + 9/64 = 10/64,

where the first term covers all three samples landing in the first quarter and the second covers exactly two of the three. There's a similar problem if the median is in the last quarter. Thus we have a success probability of 1 − 10/64 − 10/64 = 44/64 = 11/16.

10.27 We lose if the median is in the first (1 − α)n of the indices, which occurs if two of the three sampled indices are in this range, which occurs with probability

(1 − α)^3 + C(3, 2) · (1 − α)^2 · α = (1 − α)^2 · (1 + 2α),

where the first term covers all three samples landing in the first (1 − α)n and the second covers exactly two of the three. There's a similar problem if the median is in the last (1 − α)n. Thus we have a success probability of 1 − 2(1 − α)^2 (1 + 2α).

10.28 Suppose Emacs wins each game with probability p. The tree diagram is shown in Figure S.10.3. Emacs wins the series in half of these branches. You can add up the probabilities, or see it this way:

p^3 + 3p^3 (1 − p) + 6p^3 (1 − p)^2,

where p^3 covers three straight wins (AAA), 3p^3 (1 − p) covers one loss (BAAA, ABAA, AABA), and 6p^3 (1 − p)^2 covers two losses (AABBA, ABABA, ABBAA, BAABA, BABAA, BBAAA).

For p = 0.6, this probability is 2133/5^5 = 0.68256.

10.29 Suppose Emacs wins each game with probability p. Then Emacs wins in 5 games with probability 6p^3 (1 − p)^2 and loses in 5 games with probability 6p^2 (1 − p)^3. Thus the series goes 5 games with probability

6p^2 (1 − p)^2 (p + 1 − p) = 6p^2 (1 − p)^2.

For p = 0.6, this probability is 216/5^4 = 0.3456.
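Both formulas can be double-checked by summing over all 2^5 win/loss patterns, imagining that all five games are played even after the series is decided (early clinching doesn't change who reaches three wins first):

```python
from itertools import product

def series_stats(p):
    """Return (Pr[Emacs wins the best-of-5], Pr[the series goes 5 games])."""
    win = game5 = 0.0
    for pattern in product([1, 0], repeat=5):   # 1 = Emacs wins that game
        prob = 1.0
        for g in pattern:
            prob *= p if g else 1 - p
        if sum(pattern) >= 3:                   # Emacs reaches 3 wins first
            win += prob
        if sum(pattern[:4]) == 2:               # tied 2-2 after four games
            game5 += prob
    return win, game5

win, game5 = series_stats(0.6)
assert abs(win - 0.68256) < 1e-9
assert abs(game5 - 0.3456) < 1e-9
```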

[Tree diagram omitted: each node branches with probability p (Emacs wins the game) or 1 − p (Emacs loses); the leaves are the completed series AAA, AABA, ABAA, ABBB, BAAA, BABB, BBAB, BBB, and the twelve five-game sequences AABBA, AABBB, ABABA, ABABB, ABBAA, ABBAB, BAABA, BAABB, BABAA, BABAB, BBAAA, BBAAB.]

Figure S.10.3 A tree diagram for Exercise 10.28.



10.30 Plugging p = 0.70 into the formulas from the last two exercises, we have that the probabilities are

p^3 · [1 + 3(1 − p) + 6(1 − p)^2] = (343/1000) · [1 + 9/10 + 54/100] = 0.83692
6p^2 (1 − p)^2 = 6 · (49/100) · (9/100) = 0.2646.

10.31 The probability of the series having a fifth game is, as computed in the previous exercises, 6p^2 (1 − p)^2. Differentiating, we have that the extreme points occur at solutions to

0 = 12p(1 − p)(1 − 2p).

The solutions to this equation are p ∈ {0, 1/2, 1}. The values p = 0 and p = 1 are minima; p = 1/2 is a maximum, and the probability here is 6/2^4 = 3/8 = 0.375.

10.32 There is a fourth game if there's not a sweep—that is, if neither team wins the first three games. So the probability is 1 − p^3 − (1 − p)^3. Differentiating and solving for p when the derivative is zero yields that p ∈ {0, 1/2, 1} are the extreme points. Again, we get that p = 0 and p = 1 are minima and p = 1/2 is a maximum. Here, then, the probability that the fourth game occurs is 1 − (1/8) − (1/8) = 3/4 = 0.75.

10.33 The probability of a fourth game is 1 − p^3 − (1 − p)^3; Emacs wins it with probability p · [1 − p^3 − (1 − p)^3] = p − p^4 − p(1 − p)^3. Multiplying out and combining terms, we have that the probability in question is

3p^2 − 3p^3.

Differentiating, we have 0 = 6p − 9p^2 = 3p(2 − 3p). We have a minimum at p = 0 and a maximum at p = 2/3, when the probability is 4/9.
10.34 Let s ∈ S be an outcome. Then sum_{x ∈ S} Pr[x] = 1 by assumption, and therefore

Pr[s] + sum_{x ∈ S − {s}} Pr[x] = 1.

Subtracting all other probabilities from both sides, we have

Pr[s] = 1 − sum_{x ∈ S − {s}} Pr[x]
      ≤ 1 − sum_{x ∈ S − {s}} 0
      = 1,

again by the assumption (Definition 10.2).

10.35 For any event A ⊆ S, we have

1 = sum_{x ∈ S} Pr[x]                            [by Definition 10.2]
  = sum_{x ∈ A ∪ Ā} Pr[x]                        [S = A ∪ Ā]
  = sum_{x ∈ A} Pr[x] + sum_{x ∈ Ā} Pr[x]        [A and Ā are disjoint]
  = Pr[A] + Pr[Ā].                               [definition of probability of an event]

Thus Pr[Ā] = 1 − Pr[A].

10.36 Let A, B ⊆ S be events. We must prove that Pr[A ∪ B] = Pr[A] + Pr[B] − Pr[A ∩ B]. This property is just Inclusion–Exclusion, weighted by the probability function Pr: by definition,

Pr[A ∪ B] = sum_{x ∈ A ∪ B} Pr[x],

and we can rewrite A ∪ B as A ∪ (B − A), which are disjoint and therefore Pr[A ∪ (B − A)] = Pr[A] + Pr[B − A]. And Pr[B − A] = Pr[B] − Pr[B ∩ A] because B ∩ A ⊆ B. Thus

Pr[A ∪ B] = Pr[A ∪ (B − A)]
          = Pr[A] + Pr[B − A]
          = Pr[A] + Pr[B] − Pr[B ∩ A].

10.37 The Union Bound follows immediately from the previous exercise and induction on n. More specifically: because Pr[A ∪ B] = Pr[A] + Pr[B] − Pr[A ∩ B] and Pr[A ∩ B] ≥ 0, we have Pr[A ∪ B] ≤ Pr[A] + Pr[B]. By applying this property to the two events A_n and (A_1 ∪ A_2 ∪ · · · ∪ A_{n−1}), the Union Bound follows. (It's a good exercise to write out the rigorous inductive proof, which we've omitted here.)

10.38 Machine i tries to broadcast with probability p. All n − 1 other machines choose not to try to broadcast with probability (1 − p)^(n−1). So machine i broadcasts successfully with probability p(1 − p)^(n−1). There are n different possible choices of i, so the overall probability is np(1 − p)^(n−1).

10.39 The probability np(1 − p)^(n−1) from the previous exercise is maximized when its derivative is zero. And

d/dp [np(1 − p)^(n−1)] = n(1 − p)^(n−1) − n(n − 1)p(1 − p)^(n−2).

So we choose p such that

0 = n(1 − p)^(n−1) − n(n − 1)p(1 − p)^(n−2)
n(n − 1)p(1 − p)^(n−2) = n(1 − p)^(n−1)
(n − 1)p = 1 − p
n − 1 = 1/p − 1
p = 1/n.

So we maximize the probability of success when p = 1/n. (Incidentally, p = 1/n is precisely the probability setting for which we would expect exactly one machine to try to broadcast.)
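A quick grid search confirms that p = 1/n maximizes np(1 − p)^(n−1) (a verification sketch, not part of the original solution):

```python
def success_probability(n, p):
    # probability that exactly one of the n machines broadcasts
    return n * p * (1 - p) ** (n - 1)

for n in [2, 5, 10, 50]:
    grid = [i / 1000 for i in range(1, 1000)]
    best_p = max(grid, key=lambda p: success_probability(n, p))
    # the grid maximizer should land on (or next to) p = 1/n
    assert abs(best_p - 1 / n) < 1e-3 + 1e-9
```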

10.40 The probability of success is

np(1 − p)^(n−1) = n · (1/n) · (1 − 1/n)^(n−1) = (1 − 1/n)^n / (1 − 1/n).

By the given fact, this quantity is approximately e^(−1)/(1 − 1/n), which tends to 1/e as n grows large.

10.41 There are 10 · 9 · 8 = 720 triples from {1, . . . , 10}^3 with no duplications, and all 10^3 triples are equally likely. Thus the probability of no collisions is 720/1000.

10.42 All 10^3 triples from {1, . . . , 10}^3 are equally likely, and only the ten triples {⟨1, 1, 1⟩, ⟨2, 2, 2⟩, . . . , ⟨10, 10, 10⟩} have all the hash values equal. Thus the probability of all three elements having the same hash value is 10/1000 = 1/100.

10.43 Once x1 is placed, say in slot s, there's a 0.3 probability that x2 ends up in a slot within ±1 of x1: slot s − 1, slot s, or slot s + 1. (A collision between x1 and x2 leads to an adjacent slot being filled, as does x2 hashing to either unoccupied spot immediately to the left or right of x1.)

Thus there's a 0.7 chance that x1 and x2 are placed without occupying adjacent cells. In this case, 6 of the 10 slots are within ±1 of x1 or x2—and so there's a 0.6 probability of an adjacency occurring when x3 is placed.

Therefore there's a 0.3 chance of the first type of adjacency (between x1 and x2), and a 0.7 · 0.6 chance that there's no x1-and-x2 adjacency but there is an adjacency between x3 and either x1 or x2. Overall, then, the probability of two adjacent cells being filled is 0.3 + 0.7 · 0.6 = 0.72.

10.44 Place x1 ; its location doesn’t matter. (The cells are completely symmetric, so the situation is identical regardless of
where it’s placed.)
If x2 is adjacent to it (probability 0.2) or collides with it (probability 0.1), then there are two adjacent filled cells when
x3 is placed. There’s a 0.4 chance x3 is within one of the pair of filled cells: the two adjacent cells to the pair, or a collision
with either of the two.
If x2 leaves a one-cell gap from x1 (probability 0.2), then there’s a 0.2 chance x3 fills in the gap, by hitting it directly
or hitting the left of the two filled cells.
Otherwise—if x1 and x2 are separated by more than one cell—then there’s no chance that a contiguous block of three
cells will end up being filled.
In sum, then, the probability that 3 adjacent slots are filled is

0.2 · 0.2 + 0.3 · 0.4 = 0.16,

where the first term is the probability that x1 and x2 are separated by one cell (and x3 fills that gap), and the second is the probability that x1 and x2 are adjacent or colliding (and x3 lands next to one of them).

10.45 Hashing to any of the k full cells of the block, or to either unoccupied adjacent cell, extends the block. Thus there are k + 2 cells within or immediately adjacent to the block; if the next element hashes into one of those, then the block grows. In other words, the block is extended with probability (k + 2)/n.

10.46 A solution in Python is shown in Figure S.10.4. When I ran the tester for LinearProbingHashTable, I got statistics
of x5000 moving about 1.5 slots on average (with some small variation, but again less than about ±0.1 slots across runs),
with a maximum movement varying more widely but around 30 to 40 slots.

10.47 Again, see Figure S.10.4. When I ran the tester for QuadraticProbingHashTable, I got statistics of x5000 moving
about 1.1 slots on average (with some small variation, about ±0.1 slots across runs), with a maximum movement varying
between around 10 to 15 slots.

10.48 The major difference between clustering and secondary clustering is that every element in a clustered block with
linear probing has the same probe sequence: no matter where you “enter” the block, you “exit” on the right-hand side,
one slot at a time. Suppose slot a is full: although any two elements with h(x) = h(y) = a have the same probe sequence,
if h(x) = a and h(y) = a + 1 then x and y have different probe sequences in quadratic probing. Thus the clustering in
linear probing is worse than the clustering in quadratic probing.

10.49 Once again, a solution in Python is shown in Figure S.10.4. The tester for DoubleHashingHashTable gave me the
statistics of x5000 moving between 0.9 and 1.0 slots on average, with a maximum movement of around 10 to 12 slots.

10.50 The problem of clustering drops by a large proportion with double hashing: the only time x and y have the same probe sequence is if both hash functions line up (that is, h(x) = h(y) and g(x) = g(y)), which will happen only 1/n^2 of the time instead of 1/n of the time. But it comes at a cost: we have to spend the time to compute two hash functions instead of one, so it's noticeably slower when there aren't too many collisions.

10.51 We differentiate and set the result to zero:

d/dp [C(n, k) · p^k · (1 − p)^(n−k)] = 0
⇔ C(n, k) · [k p^(k−1) (1 − p)^(n−k) + (n − k) p^k (1 − p)^(n−k−1) · (−1)] = 0
⇔ k p^(k−1) (1 − p)^(n−k) = (n − k) p^k (1 − p)^(n−k−1)
⇔ k(1 − p) = (n − k)p
⇔ k − kp = np − kp
⇔ k = np.

Thus k = np, or p = k/n, maximizes this probability. (You can check that this extreme point is a maximum.)

import random

class HashTable:
    def __init__(self, slots):
        self.table = [None for i in range(slots)]

    def next_slot(self, x, starting_slot, attempt_number):
        raise NotImplementedError("No probe sequence operation was implemented")

    def add_element(self, x):
        '''Adds the element x to a randomly chosen cell in this table;
        Returns the **number** of cells probed (beyond the original).'''
        starting_slot = random.randint(0, len(self.table) - 1)
        slot = starting_slot
        attempt = 0
        while self.table[slot] != None:
            attempt += 1
            slot = self.next_slot(x, starting_slot, attempt)
        self.table[slot] = x
        return attempt

class LinearProbingHashTable(HashTable):
    def next_slot(self, x, starting_slot, attempt_number):
        return (starting_slot + attempt_number) % len(self.table)

class QuadraticProbingHashTable(HashTable):
    def next_slot(self, x, starting_slot, attempt_number):
        return (starting_slot + attempt_number**2) % len(self.table)

class DoubleHashingHashTable(HashTable):
    def __init__(self, slots):
        self.table = [None for i in range(slots)]
        self.stepper = {}

    def next_slot(self, x, starting_slot, attempt_number):
        if attempt_number == 1:
            self.stepper[x] = random.randint(1, len(self.table) - 1)
        return (starting_slot + attempt_number * self.stepper[x]) % len(self.table)

def sample(hash_table_class, elements, slots, samples):
    movements = []
    for run in range(samples):
        T = hash_table_class(slots)
        for x in range(elements):
            movement = T.add_element(x)
            movements.append(movement)
    # average movement per inserted element, across all runs
    print(str(hash_table_class), "average =", sum(movements) / len(movements), "; max =", max(movements))

sample(LinearProbingHashTable, 5000, 10007, 2048)
sample(QuadraticProbingHashTable, 5000, 10007, 2048)
sample(DoubleHashingHashTable, 5000, 10007, 2048)

Figure S.10.4 Testing how far elements move when resolving collisions in various ways in a hash table.

10.52 We differentiate and set the result to zero:

    d/dp [ (1 − p)^(n−1) · p ] = p·(1 − p)^(n−2)·(−1)·(n − 1) + (1 − p)^(n−1) = 0
    ⇔ (n − 1)·p = 1 − p
    ⇔ np − p = 1 − p
    ⇔ np = 1.

(Again, you can check that this extreme point is a maximum.) Thus the maximum likelihood estimate occurs when np = 1,
that is, when p = 1/n.

10.3 Independence and Conditional Probability


10.53 The events in question are:
• Even: {2, 4, 6, 8, 10, 12}. (Probability 1/2.)
• Divisible by 3: {3, 6, 9, 12}. (Probability 1/3.)
• Both: {6, 12}. (Probability 1/6.)
Because 1/6 = 1/2 · 1/3, the events are independent.

10.54 The events in question are:

• Even: {2, 4, 6, 8, 10, 12}. (Probability 1/2.)
• Divisible by 5: {5, 10}. (Probability 1/6.)
• Both: {10}. (Probability 1/12.)
Because 1/12 = 1/2 · 1/6, the events are independent.

10.55 The events in question are:

• Even: {2, 4, 6, 8, 10, 12}. (Probability 1/2.)
• Divisible by 6: {6, 12}. (Probability 1/6.)
• Both: {6, 12}. (Probability 1/6.)
Because 1/6 ≠ 1/12 = 1/2 · 1/6, the events are not independent.

10.56 The events in question are:

• Even: {2, 4, 6, 8, 10, 12}. (Probability 1/2.)
• Divisible by 7: {7}. (Probability 1/12.)
• Both: none. (Probability 0.)
Because 0 ≠ 1/24 = 1/2 · 1/12, the events are not independent.

10.57 The events in question are:

• R: {1, 2, 3, 4, 9, 10, 11, 12}. (Probability 8/12.)
• Odd: {1, 2, 3, 5, 7, 8, 10, 12}. (Probability 8/12.)
• Both: {1, 2, 3, 10, 12}. (Probability 5/12.)
Because 5/12 ≠ 4/9 = 8/12 · 8/12, the events are not independent.
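The independence checks in 10.53–10.57 can be replayed by enumerating the twelve equally likely rolls. This is an illustrative sketch (the helper names are mine, not the book's):

```python
from fractions import Fraction

die = range(1, 13)  # a fair 12-sided die

def pr(event):
    '''Exact probability of an event (a predicate on rolls) under a uniform roll.'''
    return Fraction(sum(1 for r in die if event(r)), 12)

even = lambda r: r % 2 == 0
for d, expect_independent in [(3, True), (5, True), (6, False), (7, False)]:
    div = lambda r, d=d: r % d == 0
    independent = pr(lambda r: even(r) and div(r)) == pr(even) * pr(div)
    assert independent == expect_independent
```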

10.58 The events in question are:

• Even: 0, 2, 4, or 6 heads. Probability:

    (1/2^6) · [C(6,0) + C(6,2) + C(6,4) + C(6,6)] = (1 + 15 + 15 + 1)/64 = 1/2.

• Divisible by 3: 0, 3, or 6 heads. Probability:

    (1/2^6) · [C(6,0) + C(6,3) + C(6,6)] = (1 + 20 + 1)/64 = 22/64.

• Both: 0 or 6 heads. Probability:

    (1/2^6) · [C(6,0) + C(6,6)] = (1 + 1)/64 = 1/32.

Because 1/32 ≠ 11/64 = 1/2 · 22/64, the events are not independent.

10.59 The events in question are:

• Even: 0, 2, 4, or 6 heads. Probability:

    (1/2^6) · [C(6,0) + C(6,2) + C(6,4) + C(6,6)] = (1 + 15 + 15 + 1)/64 = 1/2.

• Divisible by 4: 0 or 4 heads. Probability:

    (1/2^6) · [C(6,0) + C(6,4)] = (1 + 15)/64 = 1/4.

• Both: 0 or 4 heads. The probability is still 1/4.

Because 1/4 ≠ 1/8 = 1/2 · 1/4, the events are not independent.

10.60 The events in question are:

• Even: 0, 2, 4, or 6 heads. Probability:

    (1/2^6) · [C(6,0) + C(6,2) + C(6,4) + C(6,6)] = (1 + 15 + 15 + 1)/64 = 1/2.

• Divisible by 5: 0 or 5 heads. Probability:

    (1/2^6) · [C(6,0) + C(6,5)] = (1 + 6)/64 = 7/64.

• Both: 0 heads only (5 heads is an odd number of heads). Probability:

    (1/2^6) · C(6,0) = 1/64.

Because 1/64 ≠ 7/128 = 1/2 · 7/64, the events are not independent.
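Each of the three conclusions in 10.58–10.60 can be re-derived by summing binomial coefficients (an illustrative check; the helper name is mine):

```python
from math import comb
from fractions import Fraction

def pr_heads(pred):
    '''Probability that the number of heads in 6 fair flips satisfies pred.'''
    return Fraction(sum(comb(6, k) for k in range(7) if pred(k)), 2**6)

p_even = pr_heads(lambda k: k % 2 == 0)
for d in (3, 4, 5):
    p_div = pr_heads(lambda k: k % d == 0)
    p_both = pr_heads(lambda k: k % 2 == 0 and k % d == 0)
    assert p_both != p_even * p_div  # none of these pairs is independent
```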

10.61 Here’s a table of the sample space and the two events:

outcome number of heads in {a, b} is odd? number of heads in {b, c} is odd?


HHH 7 7
HHT 7 3
HTH 3 3
HTT 3 7
THH 3 7
THT 3 3
TTH 7 3
TTT 7 7

Thus each event happens individually in 4 of 8 outcomes (probability 21 ), and they happen together in 2 of 8 outcomes
(probability 14 ). Because 14 = 12 · 12 , the events are independent.

10.62 The outcomes are the same as in the previous exercise, but the probabilities have changed:

outcome    {a, b} odd?    {b, c} odd?    probability
HHH        ✗              ✗              p^3
HHT        ✗              ✓              p^2(1 − p)
HTH        ✓              ✓              p^2(1 − p)
HTT        ✓              ✗              p(1 − p)^2
THH        ✓              ✗              p^2(1 − p)
THT        ✓              ✓              p(1 − p)^2
TTH        ✗              ✓              p(1 − p)^2
TTT        ✗              ✗              (1 − p)^3

Thus the probabilities of the events are:

    Pr [{a, b} has odd number of heads] = 2p^2(1 − p) + 2p(1 − p)^2
                                        = 2p(1 − p)·[p + 1 − p]
                                        = 2p(1 − p).
    Pr [{b, c} has odd number of heads] = 2p(1 − p).
    Pr [both {a, b} and {b, c} have odd numbers of heads] = p^2(1 − p) + p(1 − p)^2
                                                          = p(1 − p)·[p + 1 − p]
                                                          = p(1 − p).

The events are independent if and only if the product of the individual probabilities equals the probability of both occurring
together. That is, the events are independent if and only if

    p(1 − p) = 2p(1 − p) · 2p(1 − p)
    ⇔ p(1 − p)·[4p(1 − p) − 1] = 0.

This equation is true if and only if p = 0, p = 1, or 4p(1 − p) − 1 = 0, which occurs when

    0 = 4p^2 − 4p + 1 = (2p − 1)^2.

Thus the events are independent if and only if p ∈ {0, 1/2, 1}.
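The characterization p ∈ {0, 1/2, 1} can be spot-checked by enumerating the eight outcomes with exact arithmetic (an illustrative sketch; the helper name is mine):

```python
from itertools import product
from fractions import Fraction

def odd_parity_probs(p):
    '''Pr[odd # heads in {a,b}], Pr[odd # heads in {b,c}], and Pr[both],
    for three independent coins that each come up heads with probability p.'''
    pr_ab = pr_bc = pr_both = Fraction(0)
    for flips in product((True, False), repeat=3):  # flips of (a, b, c)
        weight = Fraction(1)
        for heads in flips:
            weight *= p if heads else 1 - p
        ab = flips[0] != flips[1]  # odd # of heads among two coins: exactly one head
        bc = flips[1] != flips[2]
        if ab: pr_ab += weight
        if bc: pr_bc += weight
        if ab and bc: pr_both += weight
    return pr_ab, pr_bc, pr_both

for p, expect_independent in [(Fraction(0), True), (Fraction(1, 2), True),
                              (Fraction(1), True), (Fraction(1, 3), False)]:
    a, b, both = odd_parity_probs(p)
    assert (both == a * b) == expect_independent
```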

10.63 Pr [A] = 6/8, Pr [B] = 2/8, and Pr [A ∩ B] = 0/8. Thus Pr [A ∩ B] < Pr [A] · Pr [B], so the events are dependent.
Specifically, they are negatively correlated.

10.64 Pr [A] = 6/8, Pr [C] = 6/8, and Pr [A ∩ C] = 6/8. Thus Pr [A ∩ C] > Pr [A] · Pr [C], so the events are dependent.
Specifically, they are positively correlated.

10.65 Pr [B] = 2/8, Pr [C] = 6/8, and Pr [B ∩ C] = 0/8. Thus Pr [B ∩ C] < Pr [B] · Pr [C], so the events are dependent.
Specifically, they are negatively correlated.

10.66 Pr [A] = 6/8, Pr [D] = 4/8, and Pr [A ∩ D] = 3/8. Thus Pr [A ∩ D] = Pr [A] · Pr [D], so the events are independent.

10.67 Pr [A] = 6/8, Pr [E] = 4/8, and Pr [A ∩ E] = 3/8. Thus Pr [A ∩ E] = Pr [A] · Pr [E], so the events are independent.

10.68 Pr [A ∩ B] = 0/8, Pr [E] = 4/8, and Pr [(A ∩ B) ∩ E] = 0/8. Thus Pr [(A ∩ B) ∩ E] = Pr [A ∩ B] · Pr [E], so the events
are independent.

10.69 Pr [A ∩ C] = 6/8, Pr [E] = 4/8, and Pr [(A ∩ C) ∩ E] = 3/8. Thus Pr [(A ∩ C) ∩ E] = Pr [A ∩ C] · Pr [E], so the events
are independent.

10.70 Pr [A ∩ D] = 3/8, Pr [E] = 4/8, and Pr [(A ∩ D) ∩ E] = 0/8. Thus Pr [(A ∩ D) ∩ E] < Pr [A ∩ D] · Pr [E], so the events
are dependent. Specifically, they are negatively correlated.

10.71 Events A and B are independent if and only if Pr [A ∩ B] = Pr [A] · Pr [B]. But Pr [B] = 0 implies that
Pr [A ∩ B] = 0, because A ∩ B ⊆ B. Thus

    Pr [A ∩ B] = 0                    above discussion
               = Pr [A] · 0           x · 0 = 0
               = Pr [A] · Pr [B].     by assumption Pr [B] = 0

Thus A and B are independent, by definition.



10.72 Proving one direction suffices: the converse is the same claim, applied to the events A and B̄ (because the
complement of B̄ is B itself).
Assume that A and B are independent. We'll prove that A and B̄ are independent:

    Pr [A ∩ B̄] = Pr [A] − Pr [A ∩ B]        law of total probability: Pr [A] = Pr [A ∩ B] + Pr [A ∩ B̄]
               = Pr [A] − Pr [A] · Pr [B]    A and B are independent
               = Pr [A] · (1 − Pr [B])       factoring
               = Pr [A] · Pr [B̄].            Theorem 10.4

10.73 A solution in Python is shown as the SubstitutionCipher class and get_random_substitution_cipher
function in Figure S.10.5.

10.74 A solution in Python is shown as the get_letter_frequencies function in Figure S.10.5.

10.75 A solution in Python is shown as the decrypt_substitution_cipher_by_frequency function in Figure S.10.5.

10.76 A solution in Python is shown as the CaesarCipher class and get_random_caesar_cipher function in
Figure S.10.6.

10.77 A solution in Python is shown as the decrypt_caesar_cipher_by_frequency function in Figure S.10.6.

10.78 There are 2^n outcomes, each of which is equally likely. Of these outcomes, A_{i,j} occurs in precisely 2^{n−1}: for either
value of the ith flip, one and only one value of the jth flip causes A_{i,j} to occur; all other flips can be either heads or tails.
Thus Pr [A_{i,j}] = 2^{n−1}/2^n = 1/2.

10.79 Either {i, j} and {i′, j′} share a single index, or they share none. We'll consider these two cases separately:
Case 1: {i, j} and {i′, j′} share one index. Write {i, j} = {x, y} and {i′, j′} = {y, z}. Then (restricting our attention
to these three flips x, y, and z) the outcomes/events are:

xyz    x ⊕ y    y ⊕ z
HHH
HHT             ✓
HTH    ✓        ✓
HTT    ✓
THH    ✓
THT    ✓        ✓
TTH             ✓
TTT

Thus Pr [x ⊕ y | y ⊕ z] = 2/4 = 0.5 and Pr [x ⊕ y | not (y ⊕ z)] = 2/4 = 0.5.

import random

ALPHABET = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
alphabet = 'abcdefghijklmnopqrstuvwxyz'

class SubstitutionCipher:
    def __init__(self, replacement_alphabet):
        self.mapping = replacement_alphabet

    def encode(self, string):
        output = ""
        for ch in string:
            if ch in ALPHABET:
                output += self.mapping[ALPHABET.index(ch)]
            elif ch in alphabet:
                output += self.mapping[alphabet.index(ch)].lower()
            else:
                output += ch
        return output

def get_random_substitution_cipher():
    permutation = list(ALPHABET)
    random.shuffle(permutation)
    return SubstitutionCipher(permutation)

def get_letter_frequencies(ciphertext):
    counts = {ch : 0 for ch in ALPHABET}
    letter_count = 0
    for ch in ciphertext:
        if ch.upper() in ALPHABET:
            counts[ch.upper()] += 1
            letter_count += 1
    return {ch : counts[ch] / letter_count for ch in ALPHABET}

def decrypt_substitution_cipher_by_frequency(ciphertext, reference_text):
    cipher_frequencies = get_letter_frequencies(ciphertext)
    cipher_by_usage = sorted(ALPHABET, key=lambda ch: cipher_frequencies[ch])

    reference_frequencies = get_letter_frequencies(reference_text)
    reference_by_usage = sorted(ALPHABET, key=lambda ch: reference_frequencies[ch])

    decryption_alphabet = [reference_by_usage[cipher_by_usage.index(ch)] for ch in ALPHABET]
    return SubstitutionCipher("".join(decryption_alphabet))

Figure S.10.5 A substitution cipher in Python.

Case 2: {i, j} and {i′, j′} do not share an index. Then the two events don't even refer in any way to the same coin flip.
Writing {i, j} = {w, x} and {i′, j′} = {y, z}, we have

wxyz    w ⊕ x    y ⊕ z
HHHH
HHHT             ✓
HHTH             ✓
HHTT
HTHH    ✓
HTHT    ✓        ✓
HTTH    ✓        ✓
HTTT    ✓
THHH    ✓
THHT    ✓        ✓
THTH    ✓        ✓
THTT    ✓
TTHH
TTHT             ✓
TTTH             ✓
TTTT

class CaesarCipher(SubstitutionCipher):
    def __init__(self, shift):
        self.mapping = ALPHABET[shift:] + ALPHABET[:shift]

def get_random_caesar_cipher():
    shift = random.randint(0, 25)
    return CaesarCipher(shift)

def decrypt_caesar_cipher_by_frequency(ciphertext, reference_text):
    cipher_frequencies = get_letter_frequencies(ciphertext)
    reference_frequencies = get_letter_frequencies(reference_text)
    best_shift, best_difference, best_decrypter = None, None, None
    for shift in range(26):
        decrypter = CaesarCipher(shift)
        difference = 0
        for ch in ALPHABET:
            error = cipher_frequencies[ch] - reference_frequencies[decrypter.encode(ch)]
            difference += abs(error)
        if best_shift == None or difference < best_difference:
            best_shift, best_difference, best_decrypter = shift, difference, decrypter
    return best_decrypter

Figure S.10.6 A Caesar cipher in Python (relying on the code in Figure S.10.5).

Thus Pr [w ⊕ x | y ⊕ z] = 4/8 = 0.5 and Pr [w ⊕ x | not (y ⊕ z)] = 4/8 = 0.5.

10.80 Consider the three events x ⊕ y, x ⊕ z, and y ⊕ z. (For example, A_{1,2} and A_{1,3} and A_{2,3}.) Knowing the values of x ⊕ y
and x ⊕ z will allow you to infer y ⊕ z with certainty:

xyz    x ⊕ y    x ⊕ z    y ⊕ z
HHH
HHT             ✓        ✓
HTH    ✓                 ✓
HTT    ✓        ✓
THH    ✓        ✓
THT    ✓                 ✓
TTH             ✓        ✓
TTT

Thus Pr [x ⊕ z | x ⊕ y, y ⊕ z] = Pr [x ⊕ z | not (x ⊕ y), not (y ⊕ z)] = 0/2 = 0,
but Pr [x ⊕ z | x ⊕ y, not (y ⊕ z)] = Pr [x ⊕ z | not (x ⊕ y), y ⊕ z] = 2/2 = 1.
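Exercises 10.78–10.80 together say that these XOR events are pairwise independent but not mutually independent, which a direct enumeration confirms (an illustrative sketch; the helper names are mine):

```python
from itertools import product
from fractions import Fraction

outcomes = list(product('HT', repeat=3))  # the flips (x, y, z)

def pr(event):
    return Fraction(sum(1 for o in outcomes if event(o)), 8)

def xor(i, j):
    return lambda o: (o[i] == 'H') != (o[j] == 'H')

A, B, C = xor(0, 1), xor(0, 2), xor(1, 2)  # the events x⊕y, x⊕z, y⊕z

pairwise = pr(lambda o: A(o) and B(o)) == pr(A) * pr(B)                  # True
mutual = pr(lambda o: A(o) and B(o) and C(o)) == pr(A) * pr(B) * pr(C)   # False: A and B determine C
```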

10.81 4/4 = 1

10.82 4/8 = 0.5

10.83 2/3

10.84 2/4 = 0.5

10.85 2/4 = 0.5

10.86 2/7

10.87 4/7

10.88 Let’s draw a tree to see what’s happening:


204 Probability

1 1
12 12
···
(no threes present) 1 1 1 1
2 2 2 2

Let A denote the event “the top half of the drawn domino is 3.” Let B denote the event “doubles.” Then we have Pr [A] =
1
24
+ 241 + 241 = 81 —the three left-most drawn domino leaves in the tree. And Pr [A ∩ B] = 241 + 241 = 121 —the two
left-most drawn domino leaves in the tree. Thus
Pr [A ∩ B] 1/8 2
Pr [B|A] = = = .
Pr [A] 1/12 3

10.89 By definition, A and B are independent if Pr [A ∩ B] = Pr [A] · Pr [B]. Because A ∩ B = ∅, we know that
Pr [A ∩ B] = 0. Thus A and B are independent if and only if Pr [A] · Pr [B] = 0, which occurs precisely when Pr [A] = 0
or Pr [B] = 0. Thus, even though Pr [A ∩ B] = 0, A and B are independent if and only if either A or B occurs with
probability zero.

10.90 There’s not enough information. For example, flip two fair coins, and define the following events:
A = “the first coin came up heads” B = “the second coin came up heads” C = “both coins came up tails.”
Then Pr [A|B] = Pr [B|A] = 1
2
, and A and B are independent. On the other hand, Pr [A|C] = Pr [C|A] = 0, but A and
C are not independent.

10.91 (n − k)/n

10.92 The probability that the (k + 1)st element x_{k+1} does not cause a collision, conditioned on no collisions occurring
before that element is inserted, is (n − k)/n, by Exercise 10.91. Say that x_i is "safe" if no collision occurs when it's
inserted. Then the probability that no collision occurs is

    Pr [x_1 safe] · Pr [x_2 safe | x_1 safe] · Pr [x_3 safe | x_1, x_2 safe] · · · Pr [x_n safe | x_1, x_2, ..., x_{n−1} safe]
      = ∏_{k=0}^{n−1} (n − k)/n
      = n!/n^n.
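The product form of this probability can be checked directly against n!/n^n (a small sketch; the function name is mine):

```python
from math import factorial

def pr_no_collision(n):
    '''Probability that n elements hashed into n slots produce no collision,
    computed as the product of the per-insertion conditional probabilities (n-k)/n.'''
    p = 1.0
    for k in range(n):
        p *= (n - k) / n
    return p

# The telescoping product equals n!/n^n.
exact = factorial(10) / 10**10
```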

10.93 Let D be the event that x has the disease. By the law of total probability, we have

    Pr [error] = Pr [error | D] · Pr [D] + Pr [error | D̄] · Pr [D̄]
               = (0.01)(0.001) + (0.03)(0.999)
               = 0.02998.

10.94 Let D be the event that x has the disease. By the law of total probability, we have

    Pr [error] = Pr [error | D] · Pr [D] + Pr [error | D̄] · Pr [D̄]
               = (0.999)(0.001) + (0.001)(0.999)
               = 0.001998.
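Both error rates are instances of the same law-of-total-probability computation, which can be sketched as follows (the function name is mine, not the book's):

```python
def overall_error_rate(p_disease, p_error_given_disease, p_error_given_healthy):
    # Law of total probability, conditioning on whether x has the disease.
    return (p_error_given_disease * p_disease
            + p_error_given_healthy * (1 - p_disease))

rate_10_93 = overall_error_rate(0.001, 0.01, 0.03)    # 0.02998
rate_10_94 = overall_error_rate(0.001, 0.999, 0.001)  # 0.001998
```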

10.95 Bob receives a message with an even number of ones if there are zero, two, or four flips in transmission. The
conditional probability of zero flips is therefore

    (1 − p)^4 / [(1 − p)^4 + C(4,2)·p^2·(1 − p)^2 + p^4] = (1 − p)^4 / [(1 − p)^4 + 6p^2(1 − p)^2 + p^4].

For p = 0.01, this probability is

    (0.99)^4 / [(0.99)^4 + 6(0.01)^2(0.99)^2 + (0.01)^4] = 0.999388···.

10.96 The conditional probability of receiving the correct message is

    (0.9)^4 / [(0.9)^4 + 6(0.1)^2(0.9)^2 + (0.1)^4] = 0.930902···.
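Both conditional probabilities come from the same computation, which can be sketched as follows (the helper name is mine):

```python
from math import comb

def pr_zero_flips_given_even(p):
    '''Conditional probability that none of the 4 bits flipped, given that an
    even number of them (0, 2, or 4) flipped in transmission.'''
    term = {k: comb(4, k) * p**k * (1 - p)**(4 - k) for k in (0, 2, 4)}
    return term[0] / (term[0] + term[2] + term[4])
```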

10.97 Let biased and fair denote the events of pulling out the p-biased coin and the fair coin, respectively. Let H denote
the event of flipping heads. Then:

    Pr [biased|H] = Pr [H|biased] · Pr [biased] / Pr [H]                                               Bayes' Rule
                  = Pr [H|biased] · Pr [biased] / (Pr [H|biased] · Pr [biased] + Pr [H|fair] · Pr [fair])
                                                                                          Law of Total Probability
                  = (2/3 · 1/2) / (2/3 · 1/2 + 1/2 · 1/2)     the given process of choosing a coin; the coins are p-biased and fair
                  = 4/7.

10.98 Again let biased, fair, and H denote the events of choosing the p-biased coin, the fair coin, and flipping heads. Then:

    Pr [biased|HHHT] = Pr [HHHT|biased] · Pr [biased] / Pr [HHHT]                                      Bayes' Rule
                     = Pr [HHHT|biased] · Pr [biased] / (Pr [HHHT|biased] · Pr [biased] + Pr [HHHT|fair] · Pr [fair])
                                                                                          Law of Total Probability
                     = (3/4)^3 (1/4) · (1/2) / [(3/4)^3 (1/4) · (1/2) + (1/2)^3 (1/2) · (1/2)]         the given process
                     = (27/256) / (27/256 + 16/256)
                     = 27/43.

10.99 Again let biased, fair, and H denote the events of choosing the p-biased coin, the fair coin, and flipping heads. Then:

    Pr [biased|HTTTHT] = Pr [HTTTHT|biased] · Pr [biased] / Pr [HTTTHT]                                Bayes' Rule
                       = Pr [HTTTHT|biased] · Pr [biased]
                           / (Pr [HTTTHT|biased] · Pr [biased] + Pr [HTTTHT|fair] · Pr [fair])         Law of Total Probability
                       = (3/4)^2 (1/4)^4 · (1/2) / [(3/4)^2 (1/4)^4 · (1/2) + (1/2)^2 (1/2)^4 · (1/2)]   the given process
                       = 9/73.
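These posterior calculations all follow one Bayes'-Rule template, sketched here with exact arithmetic (the function and its name are my own, not the book's):

```python
from fractions import Fraction

def posterior_biased(flips, p_heads_biased):
    '''Posterior probability that the biased coin (heads-probability p_heads_biased)
    was the one drawn, given equal priors on it and a fair coin, and the flips seen.'''
    def likelihood(p):
        result = Fraction(1)
        for f in flips:
            result *= p if f == 'H' else 1 - p
        return result
    weighted_biased = likelihood(p_heads_biased) * Fraction(1, 2)
    weighted_fair = likelihood(Fraction(1, 2)) * Fraction(1, 2)
    return weighted_biased / (weighted_biased + weighted_fair)
```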

10.100 The slot is full with probability 1/m, and it's empty with probability 1 − 1/m.

10.101 The probability that hash function i does not fill this cell is 1 − 1/m. We need all k of these non-filling events to
occur, one per hash function. The events are independent, so the probability is (1 − 1/m)^k.

10.102 The event of a particular element leaving cell i empty is independent across elements, and we just computed that
its probability is (1 − 1/m)^k. Thus the probability of this cell remaining empty is

    [(1 − 1/m)^k]^n.

10.103 Here is the sample space (that is, the set of all 16 possible hash values for x and y under h1 and h2):

h1(x)  h2(x)  h1(y)  h2(y)    false positive?
1      1      1      1        ✓
1      1      1      2
1      1      2      1
1      1      2      2
1      2      1      1        ✓
1      2      1      2        ✓
1      2      2      1        ✓
1      2      2      2        ✓
2      1      1      1        ✓
2      1      1      2        ✓
2      1      2      1        ✓
2      1      2      2        ✓
2      2      1      1
2      2      1      2
2      2      2      1
2      2      2      2        ✓

Each outcome is equally likely, so there is a 10/16 probability of a false positive: the chance that both y hash values are
contained in {h1(x), h2(x)}.

10.104 The calculated false-positive rate is

    [1 − ((1 − 1/m)^k)^n]^k = [1 − (1 − 1/2)^2]^2 = (3/4)^2 = 9/16.

Thus the actual false-positive rate (10/16) is slightly higher than the calculated false-positive rate (9/16).
Another way of seeing this issue from the table: observe that Pr [B2] = 3/4 = 0.75. But Pr [B2|B1] = 10/12 = 0.8333···,
so Pr [B2|B1] ≠ Pr [B2].
The reason for this discrepancy is that the slot h1 (y) being occupied (B1 ) is positively correlated with h2 (y) being
occupied (B2 ), for the following somewhat subtle reason: the fact that h1 (y) is occupied makes it more likely that both
cells are full because of x (that is, that h1 (x) ̸= h2 (x))—and therefore more likely that h2 (y) will also be occupied.

10.105 The (mis)calculated error rate is 0.05457···; my program varied across multiple runs, but across 10 runs its error
rate was above that number 5 times and below it 5 times (ranging from 0.05297 to 0.05566). My average observed error
rate was 0.05444···, just a little lower than predicted. The Python code is shown in Figure S.10.7.

10.4 Random Variables


10.106 The outcomes and the corresponding probabilities are:

import random

def create_random_function(input_size, output_size):
    '''Returns a random function {0, 1, ..., input_size - 1} --> {0, 1, ..., output_size - 1}.'''
    return [random.randint(0, output_size - 1) for i in range(input_size)]

def bloom_filter_false_positive_rate_empirical(n, m, k):
    '''Builds a Bloom filter in a m-cell hash table with k hash functions, and inserts n elements.
    Returns the observed fraction of false positives (computed for n non-inserted elements).'''
    table = [False for i in range(m)]
    hash_functions = [create_random_function(2 * n, m) for i in range(k)]

    for x in range(n):
        for h in hash_functions:
            table[h[x]] = True

    false_positives = 0
    for y in range(n, 2*n):  # y is an uninserted element!
        false_positives += all([table[h[y]] for h in hash_functions])
    return false_positives / n

def bloom_filter_false_positive_rate_calculated(n, m, k):
    expected_error_rate = (1 - (1 - 1/m)**(k*n))**k
    return expected_error_rate

Figure S.10.7 Calculating the rate of false positives in a Bloom filter.

outcome      probability    L    V
Computers    9/44           9    3
are          3/44           3    2
useless      7/44           7    3
They         4/44           4    1
can          3/44           3    1
only         4/44           4    1
give         4/44           4    2
you          3/44           3    2
answers      7/44           7    2

10.107 Pr [L = 4] = 12/44 and E [V | L = 4] = 4/3.

10.108 No, they are not independent; for example, Pr [L = 7] = 14/44 and Pr [V = 1] = 11/44, but

    Pr [L = 7 and V = 1] = 0 ≠ 14/44 · 11/44.

10.109 By definition, E [L] and E [V] are

    E [L] = (9/44)·9 + (3/44)·3 + (7/44)·7 + (4/44)·4 + (3/44)·3 + (4/44)·4 + (4/44)·4 + (3/44)·3 + (7/44)·7 = 254/44, and
    E [V] = (9/44)·3 + (3/44)·2 + (7/44)·3 + (4/44)·1 + (3/44)·1 + (4/44)·1 + (4/44)·2 + (3/44)·2 + (7/44)·2 = 93/44.

10.110 The variance of L is

    var(L) = E [L^2] − (E [L])^2
           = [(9/44)·81 + (3/44)·9 + (7/44)·49 + (4/44)·16 + (3/44)·9 + (4/44)·16 + (4/44)·16 + (3/44)·9 + (7/44)·49] − 254^2/44^2
           = (729 + 27 + 343 + 64 + 27 + 64 + 64 + 27 + 343)/44 − 64516/1936
           = 74272/1936 − 64516/1936
           = 9756/1936 = 5.039256···.

10.111 The variance of V is

    var(V) = E [V^2] − (E [V])^2
           = [(9/44)·9 + (3/44)·4 + (7/44)·9 + (4/44)·1 + (3/44)·1 + (4/44)·1 + (4/44)·4 + (3/44)·4 + (7/44)·4] − 93^2/44^2
           = (81 + 12 + 63 + 4 + 3 + 4 + 16 + 12 + 28)/44 − 8649/1936
           = 9812/1936 − 8649/1936
           = 1163/1936 = 0.600723···.
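The numbers in 10.109–10.111 can be reproduced mechanically from the word list; the sketch below (my own helper names) exploits the fact that each word's probability is its length divided by 44:

```python
from fractions import Fraction

words = ["Computers", "are", "useless", "They", "can", "only", "give", "you", "answers"]
total_letters = sum(len(w) for w in words)  # 44

def expectation(f):
    '''E[f(word)] when each word is drawn with probability len(word)/44.'''
    return sum(Fraction(len(w), total_letters) * f(w) for w in words)

L = lambda w: len(w)
V = lambda w: sum(ch in "aeiou" for ch in w.lower())

E_L, E_V = expectation(L), expectation(V)
var_L = expectation(lambda w: L(w)**2) - E_L**2
var_V = expectation(lambda w: V(w)**2) - E_V**2
```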

10.112 We have H = 0 only if all coins come up tails, which happens with probability 1/2^16. Otherwise, with probability
1 − 1/2^16, we have H = 1. Thus E [H] = Pr [H = 1] = 1 − 1/2^16.

10.113 Figure S.10.8 shows Python code to compute E [R] by generating all 2^16 sequences of coin flips, and then calculating
the length of the longest run in each. I get an expected length of 4.23745···.

10.114 No: Pr [R = 8] > 0 and Pr [H = 0] = 1/2^16 > 0. But if H = 0 then all 16 flips came up tails, so R = 16; thus

    Pr [H = 0 and R = 8] = 0 ≠ Pr [R = 8] · Pr [H = 0].

10.115 There are precisely 2 of the 18 faces (among the three dice) with each number in {1, ..., 9}. Thus the probability
of rolling any of those numbers is exactly 2/18 = 1/9.

10.116 By the previous exercise and the fact that the two rolls are independent, we know that there's a 1/81 probability for
each ⟨X, Y⟩ ∈ {1, 2, ..., 9}^2. The function f(x, y) = 9x − y is a bijection between {1, 2, ..., 9}^2 and {0, 1, ..., 80}.
Thus Pr [9X − Y = k] = 1/81 for each k ∈ {0, 1, ..., 80}.
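The bijection argument can be confirmed by enumerating all 81 pairs (an illustrative check):

```python
from itertools import product
from collections import Counter

# Each roll is uniform on {1, ..., 9}, so each pair (X, Y) has probability 1/81.
counts = Counter(9 * x - y for x, y in product(range(1, 10), repeat=2))
# Every value in {0, ..., 80} is achieved by exactly one pair, so 9X - Y is uniform.
```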

10.117 The expected value for each die is 5:

    E [B] = (1 + 2 + 5 + 6 + 7 + 9)/6 = 30/6 = 5
    E [R] = (1 + 3 + 4 + 5 + 8 + 9)/6 = 30/6 = 5
    E [K] = (2 + 3 + 4 + 6 + 7 + 8)/6 = 30/6 = 5.

10.118 First, we'll show that Pr [B > R], Pr [R > K], and Pr [K > B] are all equal to 17/36:

    Pr [B > R] = Pr [B > R|R = 1] · Pr [R = 1] + Pr [B > R|R = 3] · Pr [R = 3] + · · ·    and so forth over R's faces
               = Pr [B > 1] · Pr [R = 1] + Pr [B > 3] · Pr [R = 3] + Pr [B > 4] · Pr [R = 4]
                   + Pr [B > 5] · Pr [R = 5] + Pr [B > 8] · Pr [R = 8] + Pr [B > 9] · Pr [R = 9]
               = (1/6) · [ Pr [B > 1] + Pr [B > 3] + Pr [B > 4] + Pr [B > 5] + Pr [B > 8] + Pr [B > 9] ]
               = (5 + 4 + 4 + 3 + 1 + 0)/36 = 17/36.
    Pr [R > K] = (1/6) · [ Pr [R > 2] + Pr [R > 3] + Pr [R > 4] + Pr [R > 6] + Pr [R > 7] + Pr [R > 8] ]

def longest_run_length(x):
    zeros, ones = "0", "1"
    while zeros in x or ones in x:
        zeros = zeros + "0"
        ones = ones + "1"
    return len(zeros) - 1

# format(index, '016b') zero-pads each index to a full 16-bit string.
run_lengths = [longest_run_length(format(index, '016b')) for index in range(2**16)]
print("The average longest run length is", sum(run_lengths) / len(run_lengths))
# Output: The average longest run length is 4.237457275390625

Figure S.10.8 A Python implementation to calculate the expected length of the longest run in a 16-bit string.

               = (5 + 4 + 3 + 2 + 2 + 1)/36 = 17/36.
    Pr [K > B] = (1/6) · [ Pr [K > 1] + Pr [K > 2] + Pr [K > 5] + Pr [K > 6] + Pr [K > 7] + Pr [K > 9] ]
               = (6 + 5 + 3 + 2 + 1 + 0)/36 = 17/36.

Also observe that Pr [B = R] = 3/36, because there are three numbers shared between the two dice (and each pair comes
up with probability 1/36). Similarly, Pr [R = K] = 3/36 and Pr [K = B] = 3/36. Thus

    Pr [B > R|B ≠ R] = Pr [B > R and B ≠ R] / Pr [B ≠ R]
                     = Pr [B > R] / Pr [B ≠ R]         if B > R then B ≠ R
                     = (17/36) / (33/36) = 17/33.

By precisely the same argument, we have Pr [R > K|R ≠ K] = 17/33 and Pr [K > B|K ≠ B] = 17/33.
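The 17/36 and 17/33 values can be verified by enumerating the 36 equally likely face pairs (a checking sketch; the face lists come from the exercise, and the function names are mine):

```python
from itertools import product
from fractions import Fraction

B = [1, 2, 5, 6, 7, 9]
R = [1, 3, 4, 5, 8, 9]
K = [2, 3, 4, 6, 7, 8]

def beats(a, b):
    '''Pr[die a rolls strictly higher than die b], over the 36 face pairs.'''
    return Fraction(sum(x > y for x, y in product(a, b)), 36)

def beats_given_distinct(a, b):
    '''Same probability, conditioned on the two rolls being different.'''
    wins = sum(x > y for x, y in product(a, b))
    ties = sum(x == y for x, y in product(a, b))
    return Fraction(wins, 36 - ties)
```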

10.119 All three dice are fair, so the expectation of each is 1/6 of the sum of its faces. And 5 · 3 + 6 = 3 · 2 + 3 · 5 =
1 + 5 · 4 = 21. Thus all three random variables have expectation 21/6 = 3.5.

10.120 K > L unless we observe K = 3 and L = 5, which happens with probability (5/6) · (1/2) = 5/12.
L > M unless we observe L = 2 and M = 4, which happens with probability (1/2) · (5/6) = 5/12.
M > K unless we get M = 1 or K = 6, which happens with probability (1/6) + (1/6) − (1/36) = 11/36. (We had
to subtract the 1/36 because we double-counted the ⟨1, 6⟩ outcome.)
Thus Pr [K > L] = Pr [L > M] = 7/12 and Pr [M > K] = 25/36, all of which are strictly greater than 1/2.

10.121 Each die has expectation 21/6. By linearity of expectation, the sum of any two of the random variables V1 and V2
has expectation E [V1 + V2] = E [V1] + E [V2] = 42/6 = 7.

10.122 Here’s a solution in Python:


K = [3,3,3,3,3,6]
L = [2,2,2,5,5,5]
M = [1,4,4,4,4,4]

KK = [k1 + k2 for k1 in K for k2 in K]
LL = [l1 + l2 for l1 in L for l2 in L]
MM = [m1 + m2 for m1 in M for m2 in M]

print("K beats L:", len([(kk,ll) for kk in KK for ll in LL if kk > ll]))  # output: K beats L: 531
print("L beats M:", len([(ll,mm) for ll in LL for mm in MM if ll > mm]))  # output: L beats M: 531
print("M beats K:", len([(mm,kk) for mm in MM for kk in KK if mm > kk]))  # output: M beats K: 625

There’s a 531/1296 = 0.40972 · · · chance that K1 + K2 > L1 + L2 . There’s a 531/1296 = 0.40972 · · · chance that
L1 + L2 > M1 + M2 . There’s a 625/1296 = 0.48225 · · · chance that M1 + M2 > K1 + K2 . Indeed, all three of these
probabilities are strictly less than 1/2. (And the probability of a tie—for example, of K1 + K2 = L1 + L2 —is still zero.)

10.123 The probabilities that we have P pairs in a five-card hand are:

    Pr [P = 6] = 13 · 12 · 4 / C(52,5)                      choose a 4-of-a-kind rank; choose any other rank (and suit)
    Pr [P = 5] = 0                                          impossible to have exactly five pairs
    Pr [P = 4] = 13 · C(4,3) · 12 · C(4,2) / C(52,5)        choose a 3-of-a-kind rank (and three suits); choose another rank for a pair (and two suits)
    Pr [P = 3] = 13 · C(4,3) · C(12,2) · 4^2 / C(52,5)      choose a 3-of-a-kind rank (and three suits); choose two other ranks (of any suit)
    Pr [P = 2] = C(13,2) · C(4,2) · C(4,2) · 11 · 4 / C(52,5)   choose two pair ranks (and two suits each); choose any other rank (and suit)
    Pr [P = 1] = 13 · C(4,2) · C(12,3) · 4^3 / C(52,5)      choose a pair rank (and two suits); choose three other ranks (and any suit)
    Pr [P = 0] = C(13,5) · 4^5 / C(52,5)                    choose five distinct ranks (and any suit for each)

The expected number of pairs is Σ_{i=0}^{6} Pr [P = i] · i, which, using the above calculations, comes out to be 0.5882···.

10.124 Define an indicator random variable R_{i,j} that's 1 if and only if cards #i and #j are a pair. We have
Pr [R_{i,j} = 1] = 3/51: no matter what card #i is, there are precisely 3 cards of the remaining 51 that pair with it. Thus
the total number of pairs in the hand is Σ_{i=1}^{5} Σ_{j=1}^{i−1} R_{i,j}, which, by linearity of expectation, has expectation

    C(5,2) · E [R_{i,j}] = 10 · 3/51 = 30/51 = 0.5882···.
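The case analysis in 10.123 and the indicator-variable shortcut in 10.124 agree, which can be confirmed with math.comb (a verification sketch, not part of the original solution):

```python
from math import comb

deals = comb(52, 5)
pr = {
    6: 13 * 12 * 4 / deals,                                     # four of a kind
    5: 0,                                                       # impossible
    4: 13 * comb(4, 3) * 12 * comb(4, 2) / deals,               # full house
    3: 13 * comb(4, 3) * comb(12, 2) * 4**2 / deals,            # three of a kind
    2: comb(13, 2) * comb(4, 2) * comb(4, 2) * 11 * 4 / deals,  # two pair
    1: 13 * comb(4, 2) * comb(12, 3) * 4**3 / deals,            # one pair
    0: comb(13, 5) * 4**5 / deals,                              # five distinct ranks
}
expected_pairs = sum(i * p for i, p in pr.items())  # matches 10 · 3/51 = 30/51
```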

10.125 Let H_i denote the high-card points of the ith card in the hand. Then we have

    E [H_i] = 4 · Pr [A] + 3 · Pr [K] + 2 · Pr [Q] + 1 · Pr [J]
            = 4 · (1/13) + 3 · (1/13) + 2 · (1/13) + 1 · (1/13)
            = 10/13.

By linearity of expectation, then,

    E [Σ_{i=1}^{13} H_i] = Σ_{i=1}^{13} E [H_i] = 13 · 10/13 = 10.

The expected number of high-card points in a bridge hand is 10.

10.126 There are 13 hearts and 39 nonhearts. Thus

• The number of hands void in hearts is C(39,13): choose 13 nonhearts.
• The number of hands with one heart is C(13,1) · C(39,12): choose 1 heart, and 12 nonhearts.
• The number of hands with two hearts is C(13,2) · C(39,11): choose 2 hearts, and 11 nonhearts.

Thus the expected number of hearts distribution points is

    3 · C(39,13)/C(52,13) + 2 · C(13,1) · C(39,12)/C(52,13) + 1 · C(13,2) · C(39,11)/C(52,13) = 0.404369···.

10.127 By linearity of expectation, the expected number of points in a hand is

    10 + 4 · 0.404369··· = 11.617479···,

where 4 · 0.404369··· is the expected number of distribution points (multiplying the result from the previous exercise by
four, because there are four suits in which it's possible to get distribution points).
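The distribution-point expectation in 10.126 and the total in 10.127 can be recomputed with math.comb (a checking sketch; the variable names are mine):

```python
from math import comb

hands = comb(52, 13)
p_void = comb(39, 13) / hands                 # no hearts: 3 distribution points
p_one = comb(13, 1) * comb(39, 12) / hands    # one heart: 2 points
p_two = comb(13, 2) * comb(39, 11) / hands    # two hearts: 1 point

points_one_suit = 3 * p_void + 2 * p_one + 1 * p_two  # about 0.404369
expected_total_points = 10 + 4 * points_one_suit      # about 11.617479
```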

10.128 The second version of the summation for expectation in Definition 10.17 results from collecting each outcome x
that has the same value of the random variable X(x). Here's an algebraic proof of their equivalence:

    Σ_{x∈S} X(x) · Pr [x]
      = Σ_{y∈R} Σ_{x∈S: X(x)=y} X(x) · Pr [x]       collecting all terms in which X(x) = y
      = Σ_{y∈R} y · Σ_{x∈S: X(x)=y} Pr [x]          for these terms X(x) = y, so we can pull out the y
      = Σ_{y∈R} y · Pr [X = y].                     Pr [X = y] is precisely Σ_{x∈S: X(x)=y} Pr [x]

10.129 Let X and Y be independent random variables. Then

    E [X · Y] = Σ_{s∈S} (X · Y)(s) · Pr [s]                     definition of expectation
              = Σ_{x,y} (x · y) · Pr [X = x and Y = y]          summing over values of X, Y instead of outcomes
              = Σ_x Σ_y (x · y) · Pr [X = x] · Pr [Y = y]       definition of independent random variables
              = Σ_x x · Pr [X = x] · Σ_y y · Pr [Y = y]         factoring
              = Σ_x x · Pr [X = x] · E [Y]                      definition of expectation
              = E [Y] · Σ_x x · Pr [X = x]                      factoring
              = E [Y] · E [X].                                  definition of expectation

10.130 Flip two fair coins. Let X denote the number of heads in those tosses, and let Y denote the number of tails in those
tosses. Then E [X] = E [Y] = 1. But

    E [X · Y] = X(HH) · Y(HH) · Pr [HH] + X(HT) · Y(HT) · Pr [HT] + X(TH) · Y(TH) · Pr [TH] + X(TT) · Y(TT) · Pr [TT]
              = 2 · 0 · (1/4) + 1 · 1 · (1/4) + 1 · 1 · (1/4) + 0 · 2 · (1/4)
              = 1/2.

Thus these random variables form an example instance in which E [X · Y] ≠ E [X] · E [Y].

10.131 Consider the following sample space and corresponding values for the random variables X and Y:
outcome probability X Y
HH 0.25 0 1
HT 0.25 1 0
TH 0.25 −1 0
TT 0.25 0 −1
Then E [X · Y] = E [X] · E [Y]:
E [X] = 0.25(0 + 1 − 1 + 0) = 0
E [Y] = 0.25(1 + 0 + 0 − 1) = 0
E [XY] = 0.25(0 + 0 + 0 + 0) = 0.

But X and Y are not independent because, for example, Pr [X = 0] = 1/2 and Pr [Y = 0] = 1/2, but

    Pr [X = 0 and Y = 0] = 0 ≠ 1/4 = Pr [X = 0] · Pr [Y = 0].

Thus X and Y are an example of dependent random variables for which E [X · Y] = E [X] · E [Y].

10.132 Following the hint, define a random variable X_i denoting the number of coin flips after the (i − 1)st Heads before
you get another Heads. We have to wait Σ_{i=1}^{1000} X_i flips before we get the 1000th head, and we have E [X_i] = 1/p.
By linearity of expectation, we have to wait 1000/p flips before we get the 1000th head.

10.133 Roughly, this process is flipping a single coin that comes up heads with probability p^2, so we'd expect a waiting
time of about 1/p^2 flips. (But there's a bit of subtlety, as the fact that flips #1 and #2 didn't produce two consecutive heads
makes it a bit less likely that flips #2 and #3 would.) To solve this problem more precisely, let F denote the expected
number of flips before getting two consecutive heads. Then:

• there's a 1 − p chance the first flip is tails; if so, we need F more flips (after the first) to get two consecutive heads.
• there's a p · (1 − p) chance the first flip is heads but the second is tails, in which case we need F more flips (after the
  first two) to get two consecutive heads.
• there's a p^2 chance the first two flips are heads, in which case we're done.

Therefore the number of flips is

    2        with probability p^2
    1 + F    with probability 1 − p
    2 + F    with probability p(1 − p).

In other words, we know that F satisfies

    F = 2p^2 + (1 + F) · (1 − p) + (2 + F) · p · (1 − p)
      = 2p^2 + 1 − p + F·(1 − p + p − p^2) + 2p − 2p^2
      = 1 + p + F·(1 − p^2)

    or p^2·F = 1 + p              subtracting F·(1 − p^2) from both sides
    and thus F = (1 + p)/p^2.     dividing both sides by p^2

Another way to think about it: a p-fraction of the time that the coin comes up Heads (which happens every 1/p flips in
expectation), we're done with one additional flip; a (1 − p)-fraction of the time we must start over. So the expected
number F of flips satisfies

    F = p · (1/p + 1) + (1 − p) · (1/p + 1 + F),

which also leads to F = (1 + p)/p^2.
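The closed form (1 + p)/p^2 can be sanity-checked by simulation (a sketch with an arbitrary fixed seed; for p = 1/2 the formula predicts 6 flips):

```python
import random

def flips_until_two_heads(p, rng):
    '''Flip a p-biased coin until two consecutive heads appear; return the flip count.'''
    flips, run = 0, 0
    while run < 2:
        flips += 1
        run = run + 1 if rng.random() < p else 0
    return flips

rng = random.Random(0)  # fixed seed, so the estimate is reproducible
p = 0.5
trials = 100_000
average = sum(flips_until_two_heads(p, rng) for _ in range(trials)) / trials
predicted = (1 + p) / p**2  # = 6 for p = 1/2
```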

10.134 The ith element of the array is swapped all the way back to the beginning of the array if and only if A[i] is the
smallest element in A[1 ... i]. The probability that the ith element is the smallest out of these i elements is precisely 1/i.

10.135 The number of comparisons P is precisely the number of swapped pairs plus all failed (non-swap-inducing)
comparisons. Write the latter as ni=1 Yi , where
(
1 if element i is not swapped all the way to the beginning
Yi =
0 if it is.

Observe that E [Yi ] = 1 − 1


i
by the previous exercise, and so

P
n P
n
E [Y] = E [Yi ] = n − 1
i
= n − Hn ,
i=1 i=1
10.4 Random Variables 213

where Hn is the nth harmonic number. We've already shown that the expected number of swapped pairs is (n choose 2)/2 = n(n − 1)/4. Thus the total number of comparisons, in expectation, is

    n(n − 1)/4 + n − Hn .
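The formula can be checked exhaustively for small n (an illustrative sketch, not from the text): averaging this swap-based insertion sort's comparison count over all 6! permutations matches n(n − 1)/4 + n − Hn exactly.

```python
from fractions import Fraction
from itertools import permutations

def insertion_sort_comparisons(a):
    """Count comparisons made by swap-based insertion sort."""
    a = list(a)
    count = 0
    for i in range(1, len(a)):
        j = i
        while j > 0:
            count += 1                      # compare a[j-1] against a[j]
            if a[j - 1] > a[j]:
                a[j - 1], a[j] = a[j], a[j - 1]
                j -= 1
            else:
                break                       # a failed (non-swap) comparison
    return count

n = 6
total = sum(insertion_sort_comparisons(p) for p in permutations(range(n)))
avg = Fraction(total, 720)                  # 6! = 720 permutations
Hn = sum(Fraction(1, i) for i in range(1, n + 1))
print(avg == Fraction(n * (n - 1), 4) + n - Hn)  # True
```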

10.136 The expected number of collisions is n(n − 1)/200,000. That value first exceeds 1 for n = 448, as 448 · 447 = 200,256 but 447 · 446 = 199,362. For 100,000 collisions, we need n(n − 1) ≥ 200,000 · 100,000, which first occurs for n = 141,422.

10.137 Using the Python implementation in Figure S.10.9, I saw 1.022 collisions on average with n = 448 elements, and
99996.74 collisions on average with n = 141,422 elements.

10.138 There are i − 1 filled slots, so there is a (i − 1)/m chance that the next element goes into an already filled slot, and a (m − i + 1)/m chance that the next element goes into an unfilled slot. Thus

    E[Xi] = 1 / Pr[get a new type in a draw if you have i − 1 types already]    Example 10.41
          = m/(m − i + 1).                                                      above discussion

10.139 Let X denote the number of elements hashed before all m slots are full. Let X1 denote the time until you have one full slot; X2 denote the time between having one full slot and two full slots; X3 denote the time between having two full slots and three full slots; etc. Observe that X = X1 + X2 + X3 + · · · + Xm . But now

    E[X] = E[∑_{i=1}^m Xi]                definition of the Xi s
         = ∑_{i=1}^m E[Xi]                linearity of expectation
         = ∑_{i=1}^m m/(m − i + 1)        Exercise 10.138
         = m · ∑_{i=1}^m 1/(m − i + 1)    factoring
         = m · ∑_{j=1}^m 1/j              change of index of summation
         = m · Hm .                       definition of harmonic numbers

import random

def count_collisions(slots, elements):
    '''
    Count the number of collisions in a hash table with number of cells==slots
    and number of hashed items==elements, under a simple uniform hashing assumption.
    '''
    T = [0 for i in range(slots)]
    collisions = 0
    for x in range(elements):
        slot = random.randint(0, slots - 1)  # choose the random slot.
        collisions += T[slot]
        T[slot] += 1
    return collisions

for (num_trials, num_slots, num_elements) in [(1000, 100000, 448), (100, 100000, 141422)]:
    total_collisions = sum([count_collisions(num_slots, num_elements) for i in range(num_trials)])
    print("for", num_elements, "average number of collisions =", total_collisions / num_trials)

Figure S.10.9 Counting collisions when using chaining to resolve collisions in a hash table.
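As an empirical cross-check of the m · Hm answer (illustrative, not from the book; the function name draws_to_fill is assumed):

```python
import random

def draws_to_fill(m, rng):
    """Hash random elements into m slots; count elements until every slot is nonempty."""
    filled, draws = set(), 0
    while len(filled) < m:
        draws += 1
        filled.add(rng.randrange(m))  # each element lands in a uniformly random slot
    return draws

rng = random.Random(1)
m = 20
H_m = sum(1 / i for i in range(1, m + 1))
trials = 50_000
avg = sum(draws_to_fill(m, rng) for _ in range(trials)) / trials
print(avg, m * H_m)  # the two values should be close (about 71.95)
```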

10.140 There are 20^20 equally probable possible sequences of responses; we're happy with any of the 20! permutations of the elements. Thus we get 20 different answers in the first 20 trials with probability

    20!/20^20 ≈ 0.0000000232019 · · · .

10.141 We expect to need 20 · H20 trials. The exact value of H20 is 1 + 1/2 + 1/3 + · · · + 1/20 = 3.5977 · · · , so we'd expect to need 71.95479 · · · trials to collect all 20 answers.
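The exact value is easy to confirm with rational arithmetic (a quick check, not part of the original solution):

```python
from fractions import Fraction

# Exact computation of H_20 and of 20 * H_20, the expected number of survey trials.
H20 = sum(Fraction(1, i) for i in range(1, 21))
expected_trials = 20 * H20
print(float(H20), float(expected_trials))  # about 3.59774 and 71.95479
```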

10.142 Each trial needs to fail to get the particular answer, which happens with probability 19/20 on any particular trial. We'd need all 200 trials to fail, which happens with probability (19/20)^200 ≈ 0.00003505.
10.143 Let Ei be the event that we've missed answer i. We're looking to show that the probability of ⋃_{i=1}^{20} Ei is small.

    Pr[⋃_{i=1}^{20} Ei] ≤ ∑_{i=1}^{20} Pr[Ei]          Union Bound
                        = ∑_{i=1}^{20} (19/20)^200     previous exercise
                        = 20 · (19/20)^200
                        = 0.00070105 · · · .

Thus the probability of failure is upper bounded by ≈ 0.07%.
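The two numbers above can be reproduced in one line each (an illustrative check):

```python
# Failure probabilities from Exercises 10.142-10.143.
p_miss_one = (19 / 20) ** 200   # probability a fixed answer is never seen in 200 trials
union_bound = 20 * p_miss_one   # union bound over all 20 answers
print(p_miss_one, union_bound)  # about 3.505e-05 and 0.00070105
```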

10.144 Consider an n-bit number. If the error occurs in bit i (numbering the bits 0 through n − 1), which occurs with probability 1/n, then the error is of size 2^i. Thus the expected error is

    ∑_{i=0}^{n−1} (1/n) · 2^i = (1/n) · (2^n − 1).

For n = 32, the expected error is therefore of size (2^32 − 1)/32 = 2^27 − 1/32.

10.145 The probability is 1/n: there are n slots, and i is equally likely to go into any of them.
10.146 Let Xi be an indicator random variable where Xi = 1 if and only if πi = i. Then X = ∑_{i=1}^n Xi , and, by linearity of expectation and the previous exercise,

    E[∑_{i=1}^n Xi] = ∑_{i=1}^n E[Xi] = ∑_{i=1}^n Pr[Xi = 1] = ∑_{i=1}^n 1/n = n · (1/n) = 1.

10.147 Let X be a random variable that is always nonnegative. Let α ≥ 1 be arbitrary. Note that X either satisfies X ≥ α
or 0 ≤ X < α. Thus:
E [X] = E [X|X ≥ α] · Pr [X ≥ α] + E [X|X < α] · Pr [X < α] the “law of total expectation” (Theorem 10.21)
≥ α · Pr [X ≥ α] + 0 · Pr [X < α] X ≥ α and X ≥ 0 in the two terms, respectively
= α · Pr [X ≥ α] .
Dividing both sides by α yields that Pr[X ≥ α] ≤ E[X]/α.

10.148 By Markov’s inequality with α = (the median of X), we have


E [X]
Pr [X ≥ median] ≤ .
median

And, by definition of median, we also have

    Pr[X ≥ median] ≥ 1/2.

Combining these inequalities, we have 1/2 ≤ E[X]/median and thus, rearranging, median ≤ 2 · E[X].

10.149 The probability that it takes k flips to get the first heads is 1/2^k. Thus your expected payment from the game is

    ∑_{i=1}^∞ (1/2^i) · (3/2)^i = ∑_{i=1}^∞ (3/4)^i = (3/4) · ∑_{i=0}^∞ (3/4)^i .

Using the formula for a geometric sum, this value is

    (3/4) · 1/(1 − 3/4) = (3/4) · 4 = 3.

You should be willing to pay $3 to play.
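Summing a long prefix of the series confirms the value (an illustrative check, not from the book):

```python
# Partial sum of the series for the (3/2)^i-payout game; it converges to 3.
expected_payment = sum((1 / 2) ** i * (3 / 2) ** i for i in range(1, 200))
print(expected_payment)  # very close to 3
```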

10.150 The probability that it takes k flips to get the first heads is 1/2^k. Thus your expected payment from the game is

    ∑_{i=1,2,...} (1/2^i) · 2^i = ∑_{i=1,2,...} 1.

In other words, this game has infinite expected payment! You should be willing to pay any amount y to play (. . . assuming
you care only about expectation, which is certainly not true of your true preferences for dollars. For example, suppose I
gave you the choice between (i) a 50% chance at winning two trillion dollars [and a 50% chance of winning nothing], and
(ii) a guaranteed one trillion dollars. You are not neutral between (i) and (ii), even though the two scenarios have exactly
the same expected value!)

10.151 Note that X = i with probability (4 choose i)/16 for each i, which works out to 1/16, 4/16, 6/16, 4/16, 1/16 for X = 0, 1, 2, 3, 4. Thus:

    var(X) = E[X²] − [E[X]]²
           = (0²·1 + 1²·4 + 2²·6 + 3²·4 + 4²·1)/16 − [(0·1 + 1·4 + 2·6 + 3·4 + 4·1)/16]²
           = (0 + 4 + 24 + 36 + 16)/16 − [(0 + 4 + 12 + 12 + 4)/16]²
           = 80/16 − 2² = 5 − 4 = 1.

The variance is 1.
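The same computation, done exactly (illustrative code, not from the text):

```python
from fractions import Fraction
from math import comb

# Distribution of X = number of heads in 4 fair flips: Pr[X = i] = C(4, i)/16.
pr = [Fraction(comb(4, i), 16) for i in range(5)]
mean = sum(i * p for i, p in zip(range(5), pr))
var = sum(i**2 * p for i, p in zip(range(5), pr)) - mean**2
print(mean, var)  # 2 and 1
```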

10.152 Note that E[Y] = 3.5, just as for a single roll: write Y = Y1 + Y2, where Yi is half the result of the ith of two independent rolls of the dice; then E[Yi] = 0.5 · 3.5 = 1.75, and, by linearity of expectation, E[Y] = E[Y1] + E[Y2] = 3.5. As for E[Y²], there are 6 − i ways to get a total of 7 ± i, so

    E[Y²] = (1/36) · [1(2/2)² + 2(3/2)² + 3(4/2)² + 4(5/2)² + 5(6/2)² + 6(7/2)² + 5(8/2)² + 4(9/2)² + 3(10/2)² + 2(11/2)² + 1(12/2)²]
          = 987/72 ≈ 13.70833.

Thus var(Y) = 987/72 − (7/2)² = 105/72 ≈ 1.45833, half of what it was before.
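Enumerating all 36 outcomes reproduces these values exactly (an illustrative check):

```python
from fractions import Fraction
from itertools import product

# Y is the average of two independent fair dice; enumerate all 36 equally likely outcomes.
outcomes = [Fraction(a + b, 2) for a, b in product(range(1, 7), repeat=2)]
mean = sum(outcomes) / 36
var = sum(y**2 for y in outcomes) / 36 - mean**2
print(mean, var)  # 7/2 and 35/24 (= 105/72)
```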

10.153 Let a ∈ R, and let X be a random variable. Then

    E[a · X] = ∑_{x∈S} Pr[x] · (a · X)(x)    definition of expectation
             = ∑_{x∈S} Pr[x] · a · X(x)      definition of the random variable a · X
             = a · ∑_{x∈S} Pr[x] · X(x)      algebra
             = a · E[X].                     definition of expectation

10.154 Let a ∈ R, and let X be a random variable. Then

    var(a · X) = E[(a · X)²] − (E[a · X])²      Theorem 10.23
               = E[a² · X²] − (E[a · X])²       algebra
               = a² · E[X²] − (a · E[X])²       linearity of expectation/Exercise 10.153
               = a² · E[X²] − a² · (E[X])²      algebra
               = a² · [E[X²] − (E[X])²]         factoring
               = a² · var(X).                   Theorem 10.23

10.155 Let X and Y be two independent random variables. Then:

    var(X + Y) = E[(X + Y)²] − [E[X + Y]]²                                      definition of variance
               = E[(X + Y)²] − [E[X] + E[Y]]²                                   linearity of expectation
               = E[X² + 2XY + Y²] − [E[X]]² − 2E[X]E[Y] − [E[Y]]²               algebra
               = E[X²] + E[2XY] + E[Y²] − [E[X]]² − 2E[X]E[Y] − [E[Y]]²         linearity of expectation
               = E[X²] − [E[X]]² + E[Y²] − [E[Y]]² + E[2XY] − 2E[X]E[Y]         rearranging terms
               = var(X) + var(Y) + E[2XY] − 2E[X]E[Y]                           definition of variance
               = var(X) + var(Y) + E[2XY] − E[2XY]                              Exercise 10.129
               = var(X) + var(Y).

10.156 Define indicator random variables X1 , X2 , . . . , Xn that are 1 if the ith flip comes up heads, so that X, the number of heads found in n flips of a p-biased coin, is given by X = ∑_{i=1}^n Xi . Note that these indicator random variables are independent, E[Xi] = p, and var(Xi) = E[Xi²] − (E[Xi])² = p − p² = p(1 − p). Thus

    E[X] = E[∑_{i=1}^n Xi] = ∑_{i=1}^n E[Xi] = np

by linearity of expectation and the above discussion, and

    var(X) = var(∑_{i=1}^n Xi) = ∑_{i=1}^n var(Xi) = np(1 − p)

by Exercise 10.155 and the above discussion.
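Computing the mean and variance directly from the binomial pmf confirms np and np(1 − p) (illustrative code, with n and p chosen arbitrarily):

```python
from math import comb

# Exact mean and variance of Binomial(n, p), computed from its pmf.
n, p = 10, 0.3
pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
mean = sum(k * q for k, q in enumerate(pmf))
var = sum(k**2 * q for k, q in enumerate(pmf)) - mean**2
print(mean, var)  # about 3.0 (= np) and about 2.1 (= np(1-p))
```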

10.157 Define an indicator random variable Xi that's 1 if the ith flip came up heads. Then we can write

    Y = (∑_{i=1}^n Xi)/n = (1/n) · X,

where X is the number of heads found in n flips of a p-biased coin. Thus

    E[Y] = E[(1/n) · X]
         = (1/n) · E[X]        linearity of expectation
         = (1/n) · pn          Exercise 10.156
         = p,

and

    var(Y) = var((1/n) · X)
           = (1/n²) · var(X)        Exercise 10.154
           = (1/n²) · np(1 − p)     Exercise 10.156
           = p(1 − p)/n.

10.158 Recall the formula for a geometric summation

    ∑_{i=0}^n r^i = (r^{n+1} − 1)/(r − 1).                                            (1)

Thus the first derivatives of both sides must be equal, and likewise the second derivatives. Taking the first and second derivatives of the left-hand side of (1), we have

    (d/dr) ∑_{i=0}^n r^i = ∑_{i=0}^n i · r^{i−1}                                      (2)
    (d²/dr²) ∑_{i=0}^n r^i = ∑_{i=0}^n i(i − 1) · r^{i−2}.                            (3)

Differentiating the right-hand side of (1), we have

    (d/dr) [(r^{n+1} − 1)/(r − 1)] = [(r − 1)(n + 1)r^n − (r^{n+1} − 1)]/(r − 1)²
                                   = [n · r^{n+1} − (n + 1)r^n + 1]/(r − 1)²
                                   → 1/(r − 1)²    as n → ∞, for r ∈ [0, 1).          (4)

Similarly, taking the derivative of [n · r^{n+1} − (n + 1)r^n + 1]/(r − 1)², we see that

    (d²/dr²) [(r^{n+1} − 1)/(r − 1)] → −2/(r − 1)³ = 2/(1 − r)³    as n → ∞, for r ∈ [0, 1).    (5)

Assembling these observations (and taking the limit as n → ∞), we have

    ∑_{i=0}^∞ i · r^{i−1} = 1/(1 − r)²                     by (1), (2), and (4)
    ∑_{i=0}^∞ i · r^i = r/(1 − r)²,                        multiplying both sides by r

which was the first formula we were asked to show. Similarly,

    ∑_{i=0}^∞ i(i − 1) · r^{i−2} = 2/(1 − r)³              by (1), (3), and (5)
    ∑_{i=0}^∞ i(i − 1) · r^i = 2r²/(1 − r)³.               multiplying both sides by r²

Finally, adding the two sides of our first result to this equality, we derive our second desired formula:

    ∑_{i=0}^∞ i(i − 1) · r^i + ∑_{i=0}^∞ i · r^i = 2r²/(1 − r)³ + r/(1 − r)²
    ∑_{i=0}^∞ i² · r^i = r/(1 − r)² + 2r²/(1 − r)³         i + i(i − 1) = i²
    ∑_{i=0}^∞ i² · r^i = [r(1 − r) + 2r²]/(1 − r)³         algebra
    ∑_{i=0}^∞ i² · r^i = r(1 + r)/(1 − r)³.                algebra
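These identities are easy to spot-check numerically (an illustrative check, not part of the solution):

```python
# Numerically check the two series identities for several r in (0, 1),
# using long partial sums as stand-ins for the infinite series.
for r in (0.1, 0.5, 0.9):
    s1 = sum(i * r**i for i in range(2000))
    s2 = sum(i * i * r**i for i in range(2000))
    assert abs(s1 - r / (1 - r) ** 2) < 1e-6
    assert abs(s2 - r * (1 + r) / (1 - r) ** 3) < 1e-6
print("series identities check out")
```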

10.159 Let X be a geometric random variable with parameter p. Then

    var(X) = E[X²] − [E[X]]²
           = [∑_{i=1}^∞ (1 − p)^{i−1} p · i²] − (1/p)²               probability of needing i flips
           = (p/(1 − p)) · [∑_{i=1}^∞ (1 − p)^i · i²] − 1/p²
           = (p/(1 − p)) · [(1 − p)(2 − p)/p³] − 1/p²                previous exercise
           = (1 − p)/p².
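A truncated-series check of this formula, for an arbitrary choice of p (illustrative, not from the text):

```python
# A geometric variable with p = 1/4 should have E[X] = 1/p = 4 and var(X) = (1 - p)/p^2 = 12.
p = 0.25
pmf = [(1 - p) ** (i - 1) * p for i in range(1, 3000)]
ex = sum(i * q for i, q in enumerate(pmf, start=1))
ex2 = sum(i * i * q for i, q in enumerate(pmf, start=1))
print(ex, ex2 - ex**2)  # about 4 and about 12
```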

10.160 Each clause has three literals. There's a 1/2 probability that any one of those literals fails to be set in the way that makes the clause true; the clause is false only if all three literals fail to be set in the way that makes the clause true. The three literals fail independently unless a variable and its negation both appear in the same clause, so the probability that a clause isn't satisfied is 1/2 · 1/2 · 1/2 = 1/8. (If a variable and its negation both appear in the same clause, then the clause must be satisfied by any assignment.) Thus the clause is set to true with probability 7/8 or greater.

10.161 Let Xi be an indicator random variable that's 1 if the ith clause is satisfied, and 0 if the ith clause is not satisfied. By the previous exercise, Pr[Xi = 1] ≥ 7/8, and thus E[Xi] ≥ 7/8. The number of satisfied clauses is X = ∑_{i=1}^m Xi ; by linearity of expectation, we can conclude that E[X] = ∑_{i=1}^m E[Xi] ≥ 7m/8.

10.162 Let X be any random variable. Suppose for a contradiction that Pr[X ≥ E[X]] = 0. Then

    E[X] = E[X | X ≥ E[X]] · Pr[X ≥ E[X]] + E[X | X < E[X]] · Pr[X < E[X]]    the "law of total expectation" (Theorem 10.21)
         = E[X | X ≥ E[X]] · 0 + E[X | X < E[X]] · 1
         = E[X | X < E[X]]
         < E[X].    X (and thus its expectation) is always less than E[X] if we condition on X < E[X]!

That's a contradiction: we can't have E[X] < E[X]! So Pr[X ≥ E[X]] > 0.

The expected number of satisfied clauses in a random assignment for an m-clause 3CNF proposition φ is at least 7m/8, by Exercise 10.161. Thus by the above fact, the probability that a random assignment satisfies at least 7m/8 clauses is nonzero—and if there's a nonzero chance of choosing an assignment in a set S, then S ≠ ∅!
11 Graphs and Trees

11.2 Formal Introduction


11.1 The graph is naturally undirected. It is not simple: there’s a self-loop on 1. Here’s the graph:

    [figure: nodes 1–10 drawn in a circle, with the edges of the graph (including the self-loop at node 1)]

11.2 The graph is directed, and not simple: every node has a self-loop.

    [figure: nodes 1–10 drawn in a circle, with the directed edges of the graph (every node has a self-loop)]

11.3 The graph is directed and simple (there are no self-loops nor parallel edges).

    [figure: nodes 1–10 drawn in a circle, with the directed edges of the graph]


11.4 Edges: {A, B}, {B, C}, {B, E}, {C, D}, {C, E}, {D, E}, {D, G}, {E, F}, {F, H}, {G, H}. The highest-degree node is E,
with degree 4.

11.5 Edges: {A, H}, {B, E}, {B, D}, {C, H}, {C, E}, {D, B}, {E, F}, {G, F}. The node of highest degree is E, with degree 3.

11.6 The node of highest in-degree is H, with in-degree 3. The nodes of highest out-degree are A, B, C, and D, all with
out-degree 2.

11.7 The nodes of highest in-degree are actually all of the nodes, which are tied with in-degree 1. The node of highest out-degree is A, with out-degree 3.


11.8 |E| can be as large as (n choose 2) = n(n − 1)/2: every unordered pair of nodes can be joined by an edge.
|E| can be as small as 0: there's no requirement that there be any edges.

11.9 |E| can be as large as n(n − 1): every ordered pair of distinct nodes can be joined by an edge. (Or, to say it another
way: every unordered pair can be joined in 2 directions.)
|E| can still be as small as 0: there’s no requirement that there be any edges.

11.10 |E| can now be as large as n²: all pairs in V × V are legal edges, because the joined nodes no longer have to be
distinct. |E| can still be as small as 0.

11.11 |E| can now be arbitrarily large: for any integer k, we can have k edges from one node to another. So there’s no upper
bound in terms of n. |E| can still be as small as 0.

11.12 There can be as many as 150 + 150 · 149 = 22,500 people in |S|. Here is a graph with this structure: each of Alice’s
150 friends has 149 distinct friends other than Alice, and so there are 150 people at distance 1 and 150 · 149 = 22,350
people at distance 2. Writing A1 , A2 , . . . , A150 to denote Alice’s friends, the structure of the network that we just described
is as follows:

Alice

A1 A2 ... A150

1 2 ... 149 1 2 ... 149 1 2 ... 149

11.13 There can be as few as 150 people in |S|. If all of Alice’s friends are also friends with each other, then each of these
people knows the 149 others and Alice, for a total of 150 friends. Here is a scaled-down version of the structure of the
network (shown here with only 15 nodes for readability): [figure omitted]

11.14 The number of people Bob knows via a chain of k or fewer intermediate friends can be as many as

    ∑_{i=0}^k 150 · 149^i = 150 · (149^{k+1} − 1)/(149 − 1) = (150/148) · (149^{k+1} − 1)

using Theorem 5.5. (To check: when k = 0, this value is (150/148) · (149 − 1) = 150. When k = 1, this value is (150/148) · (149² − 1) = (150/148) · (149 − 1)(149 + 1) = 150².)

At each layer k ≥ 1, each of the 150 · 149^{k−1} people who are k − 1 hops removed from Bob has precisely 149 additional new friends. There are 150 people at layer 0. (The rest of the argument matches the tree from Exercise 11.12, just with more layers.)

11.15 There can still be as few as 150 people in |Sk |. If all of Bob’s friends are also friends with each other, then no
matter how many “layers” outward from Bob we go, we never encounter any new people. (The picture is identical to
Exercise 11.13.)

11.16 Each of u’s neighbors v has degree at least one, because u is a neighbor of v. Thus
X X
degree(v) ≥ 1 = |neighbors(u)| = degree(u).
v∈neighbors(u) v∈neighbors(u)

11.17 By Theorem 11.8, the quantity ∑_{u∈V} degree(u) is 2|E|, which is an even number. But a sum with an odd number of odd-valued terms would be odd, so we must have that the number of odd terms in the sum—that is, nodd —is even.
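Both facts can be checked mechanically on a random graph (illustrative, seeded code; not part of the original solution):

```python
import random

# Build a random undirected simple graph and verify: degree sum = 2|E|,
# so the number of odd-degree nodes must be even.
rng = random.Random(7)
n = 30
edges = {(u, v) for u in range(n) for v in range(u + 1, n) if rng.random() < 0.2}
degree = [0] * n
for u, v in edges:
    degree[u] += 1
    degree[v] += 1
print(sum(degree) == 2 * len(edges))             # True
print(sum(1 for d in degree if d % 2 == 1) % 2)  # 0: the count of odd degrees is even
```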

11.18 Imagine looping over each edge to compute all nodes’ in- and out-degrees:

1 initialize inu and outu to 0 for each node u


2 initialize m to 0
3 for each edge {u, v} ∈ E:
4 outu := outu + 1
5 inv := inv + 1
6 m := m + 1

By precise analogy to the proof of Theorem 11.8, at the end of the for loop we have that inu = in-degree(u) and outu = out-degree(u) and m = |E|. Furthermore, it's clear that ∑u inu = ∑u outu = m, because each of these three quantities increased by 1 in each iteration. The claim follows.

11.19 head

11.20 head

11.21 There can’t be parallel edges because there’s one and only one outgoing edge from any particular node, but there
can be self loops:

head

11.22 Here’s the idea of the algorithm: we’re going to remember two nodes, called one and two. Start them both as the
head of the list. We’ll repeatedly advance them down the list, but we’ll move one forward by one step in the list, and two

forward by two. If we ever reach the end of the list (a null pointer), then the list is not circular. If one is ever equal to
two after the initialization, then there’s a loop! Here’s the algorithm:

1 if L has fewer than two elements then


2 return “circular” or “noncircular” (as appropriate)
3 one := L.head.next
4 two := L.head.next.next
5 while one ̸= two
6 if L has < 2 elements following one or two then
7 return “circular” or “noncircular” (as appropriate)
8 one := one.next
9 two := two.next.next
10 return “circular”

(Testing whether there are fewer than 2 nodes following x is done as follows: remember x, x.next, and x.next.next,
testing that none of these values are null [in which case the list is noncircular] and that none of them are identical [in
which case the list is circular].)
One direction of correctness is easy. Suppose that L is not circular. Number the nodes of L as 1, 2, . . . , n. It’s a
straightforward proof by induction that one always points to a smaller-index node than two so that we always have one ̸=
two. And, after at most n/2 iterations, we reach the end of the list (two.next.next is null) and the algorithm correctly
returns “noncircular.”
For the other direction, suppose that L is circular—with a k-node circle after the initial “tail” of n − k nodes. Then after
k iterations of the loop, both one and two must be in the circle. If two is ℓ nodes behind one on the circle after k iterations,
then after an additional ℓ mod k iterations, the values of one and two will coincide and the algorithm will correctly return
“circular.”
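The tortoise-and-hare idea above can be sketched concretely in Python (the Node class and function name are illustrative, not from the text):

```python
class Node:
    """A minimal singly linked-list node, just for testing the algorithm."""
    def __init__(self, value):
        self.value = value
        self.next = None

def is_circular(head):
    """Advance one pointer by 1 and another by 2; they meet iff the list loops."""
    one = two = head
    while two is not None and two.next is not None:
        one = one.next
        two = two.next.next
        if one is two:
            return True
    return False

# straight list 1 -> 2 -> 3
a, b, c = Node(1), Node(2), Node(3)
a.next, b.next = b, c
print(is_circular(a))  # False
c.next = b             # now 3 loops back to 2
print(is_circular(a))  # True
```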

11.23 [figure: the five nodes arranged in a circle]

11.24 Let n = 1. Then node 1’s next pointer is back to 1, because 1 mod n + 1 = 1. (Likewise, node 1’s prev pointer
is back to 1.)

prev
1 next

11.25 Let n = 2. Then node 1’s next pointer is to 2, and 2’s next pointer is to 1 because 2 mod n + 1 = 1. Thus 1’s
next pointer is to 2 and 1’s previous pointer is 2.

next
prev next
1 2
prev

11.26 A:B
B : A, C, E
C : B, D, E
D : C, E, G
E : B, C, D, F
F : E, H
G : D, H
H : D, G

11.27 A:H
B : D, E
C : E, H
D:B
E : B, C, F
F : E, G
G:F
H : A, C

11.28 A : B, C
B : E, H
C : D, H
D : G, H
E:B
F:
G:F
H:

11.29 A : B, C, D
B:
C:
D:A

11.30 A B C D E F G H
A 0 1 0 0 0 0 0 0
B 0 1 0 1 0 0 0
C 0 1 1 0 0 0
D 0 1 0 1 0
E 0 1 0 0
F 0 0 1
G 0 1
H 0

11.31 A B C D E F G H
A 0 0 0 0 0 0 0 1
B 0 0 1 1 0 0 0
C 0 0 1 0 0 1
D 0 0 0 0 0
E 0 1 0 0
F 0 1 0
G 0 0
H 0

11.32 A B C D E F G H
A 0 1 1 0 0 0 0 0
B 0 0 0 0 1 0 0 0
C 0 0 0 1 0 0 0 1
D 0 0 0 0 0 0 1 1
E 0 1 0 0 0 0 0 0
F 0 0 0 0 0 0 0 0
G 0 0 0 0 0 1 0 0
H 0 0 0 0 0 0 0 0

11.33 A B C D
A 0 1 1 1
B 0 0 0 0
C 0 0 0 0
D 1 0 0 0

11.34 Yes, it must be a directed graph.


Suppose that G is an undirected graph with n nodes. There are n possible degrees—0, 1, . . . , n − 1 (with the last degree
possible only if every other node is a neighbor). But it is not possible to have a node u of degree 0 and a node v of degree
n − 1: either {u, v} ∈ E (in which case u has nonzero degree) or {u, v} ∈ / E (in which case v’s degree is less than n − 1).
Thus there are only n − 1 possible degrees in G, and n nodes. By the pigeonhole principle, two nodes must have the same
degree.
Thus we have a contradiction of the assumption, and so G cannot be an undirected graph.

11.35 We have n nodes {0, 1, 2, . . . , n − 1}, with edges ⟨x, y⟩ when x < y. Thus node i has in-degree i (from the nodes
0, 1, . . . , i − 1) and out-degree n − 1 − i (to the nodes i + 1, i + 2, . . . , n − 1). Thus every node has a unique out-degree, as
required:

0 1 2 3 4 5 ···
n−1

11.36 There are n − 1 edges out of (n choose 2) = n(n − 1)/2 possible edges. The density is thus

    (n − 1)/(n(n − 1)/2) = 2(n − 1)/(n(n − 1)) = 2/n.

11.37 There are n edges out of (n choose 2) = n(n − 1)/2 possible edges. The density is thus

    n/(n(n − 1)/2) = 2n/(n(n − 1)) = 2/(n − 1).

11.38 There are 3 edges in each triangle, and n/3 triangles. So there are n edges, and the density is (just as for a cycle)

    n/(n(n − 1)/2) = 2n/(n(n − 1)) = 2/(n − 1).

11.39 All nodes have identical degree in this graph, and all have the same number of potential neighbors too. Thus the density of the graph as a whole is equal to the fraction of actual neighbors to possible neighbors for any particular node. Consider any single node. It has n/3 − 1 neighbors. It could have n − 1 neighbors. Thus the density of the graph is

    (n/3 − 1)/(n − 1) = (n − 3)/(3n − 3).

11.40 [figure: the 3-dimensional hypercube on the eight 3-bit strings 000–111, with an edge between strings that differ in exactly one bit]

11.41 0000 : 0001, 0010, 0100, 1000


0001 : 0000, 0011, 0101, 1001
0010 : 0011, 0000, 0110, 1010
0011 : 0010, 0001, 0111, 1011
0100 : 0101, 0110, 0000, 1100
0101 : 0100, 0111, 0001, 1101
0110 : 0111, 0100, 0010, 1110
0111 : 0110, 0101, 0011, 1111
1000 : 1001, 1010, 1100, 0000
1001 : 1000, 1011, 1101, 0001
1010 : 1011, 1000, 1110, 0010
1011 : 1010, 1001, 1111, 0011
1100 : 1101, 1110, 1000, 0100
1101 : 1100, 1111, 1001, 0101
1110 : 1111, 1100, 1010, 0110
1111 : 1110, 1101, 1011, 0111

11.42
0000  0 1 1 0 1 0 0 0 1 0 0 0 0 0 0 0
0001  1 0 0 1 0 1 0 0 0 1 0 0 0 0 0 0
0010  1 0 0 1 0 0 1 0 0 0 1 0 0 0 0 0
0011  0 1 1 0 0 0 0 1 0 0 0 1 0 0 0 0
0100  1 0 0 0 0 1 1 0 0 0 0 0 1 0 0 0
0101  0 1 0 0 1 0 0 1 0 0 0 0 0 1 0 0
0110  0 0 1 0 1 0 0 1 0 0 0 0 0 0 1 0
0111  0 0 0 1 0 1 1 0 0 0 0 0 0 0 0 1
1000  1 0 0 0 0 0 0 0 0 1 1 0 1 0 0 0
1001  0 1 0 0 0 0 0 0 1 0 0 1 0 1 0 0
1010  0 0 1 0 0 0 0 0 1 0 0 1 0 0 1 0
1011  0 0 0 1 0 0 0 0 0 1 1 0 0 0 0 1
1100  0 0 0 0 1 0 0 0 1 0 0 0 0 1 1 0
1101  0 0 0 0 0 1 0 0 0 1 0 0 1 0 0 1
1110  0 0 0 0 0 0 1 0 0 0 1 0 1 0 0 1
1111  0 0 0 0 0 0 0 1 0 0 0 1 0 1 1 0

11.43 The degree of each node is n: there is one bit to be flipped per bit position, and there are n bit positions. There are 2^n nodes. So ∑_u degree(u) = 2^n · n, and thus there are n · 2^{n−1} edges. So the density is

    (n · 2^{n−1}) / (2^n(2^n − 1)/2) = n/(2^n − 1).
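A small illustrative script (names assumed, not from the text) can confirm the degree and edge counts for, say, n = 4:

```python
from itertools import product

# Build the n-dimensional hypercube over n-bit strings and count degrees/edges.
n = 4
nodes = ["".join(bits) for bits in product("01", repeat=n)]

def neighbors(v):
    """All strings obtained from v by flipping exactly one bit."""
    return {v[:i] + ("1" if v[i] == "0" else "0") + v[i + 1:] for i in range(n)}

edge_count = sum(len(neighbors(v)) for v in nodes) // 2  # each edge counted twice
print(edge_count, n * 2 ** (n - 1))  # 32 32
```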

11.44 Yes, these graphs are isomorphic. (Actually they’re both the hypercube H3 .) One way to see it is via the following
mapping (you can verify that all the edges match up using this mapping).

A B C D E F G H
↕ ↕ ↕ ↕ ↕ ↕ ↕ ↕
J O P I L M K N

11.45 No, they’re not: there are four degree-3 nodes in the first graph (B, D, E, and G) but only two in the second (H, M).

11.46 No, they’re not isomorphic. The nodes 11 and 13 have degree zero in G1 , but only node 23 has degree zero in G2 .

11.47 The claim is true: there must be precisely two edges (the sum of the nodes' degrees is 4), and they must connect two distinct pairs of nodes—otherwise there'd be a node with degree 2. The only graph of this form is two disjoint edges on the four nodes.

11.48 The claim is true. There can be only one pair of nodes missing an edge. Just mapping the two degree-three nodes to
each other establishes the isomorphism.

11.49 The claim is false. Consider these graphs:

In the latter case, the two nodes of degree three are neighbors. In the former, they’re not.

11.50 The claim is false. Consider these graphs:

In the former case, there’s a way to get from every node to every other node. (In the language that we’ll see soon: the
graph is connected.) That’s not true in the latter graph, and therefore they can’t be isomorphic.

11.51 There’s a complete subgraph of size 4: D, G, E, and I.

11.52 There’s a complete subgraph of size 5: B, C, E, H, and J.

11.53 There are no triples of nodes that are all joined by an edge, so the largest clique in this graph is of size two. Any
pair of nodes with an edge between them forms a complete subgraph of size 2.

11.54 There could be as few as five movies, but not fewer. Each of the three “big” cliques needs at least one movie to
generate it, and the curved edges can only have been the result of a 2-person movie (because they do not participate in
any larger cliques).

11.55 We can’t be certain. Any additional movie m that involved a subset of the actors of some movie m′ described in the
previous exercise wouldn’t create any additional edges. (That is, if m ⊆ m′ then adding m to a list of movies including
m′ would have no effect on the collaboration graph.) So we can’t tell from this representation whether any such movie
exists!

11.56 This graph is bipartite for any n. All edges join an even-numbered node to an odd-numbered node, so we can separate
the nodes into two columns based on parity.

11.57 This graph is bipartite for any even n. In this case, all edges join an odd-numbered node to an even-numbered node
(because ⟨n − 1, (n − 1) + 1 mod n⟩ joins an odd node to 0, an even node). So for even n we can separate the nodes into
two columns based on parity.
But the graph is not bipartite for odd n, for essentially the same reason: 0 goes in one column, then 1 must go in the
other, then 2 must go into 0’s column, etc. But we get stuck when we try to join n − 1 (which is even) to 0 (also even).
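The parity argument can be checked mechanically with a 2-coloring sketch (illustrative code, not from the text):

```python
# 2-color the cycle 0-1-...-(n-1)-0 by depth-first propagation;
# the cycle is bipartite iff no edge joins two same-colored nodes.
def cycle_is_bipartite(n):
    color = [None] * n
    color[0] = 0
    stack = [0]
    while stack:
        u = stack.pop()
        for v in ((u + 1) % n, (u - 1) % n):
            if color[v] is None:
                color[v] = 1 - color[u]
                stack.append(v)
            elif color[v] == color[u]:
                return False
    return True

print([n for n in range(3, 9) if cycle_is_bipartite(n)])  # [4, 6, 8]
```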

11.58 This is bipartite if and only if n ∈ {1, 2}. Once n = 3, we’re stuck: there’s a triangle!

11.59 This graph is always bipartite. Put nodes {0, 1, . . . , n − 1} in the first column and nodes {n, n + 1, . . . , 2n − 1}
in the second column.

11.60 Yes, it is bipartite. Here’s a rearrangement of the edges into two columns:

A H
C E
B D
F G

11.61 This graph is not bipartite. If we put B in the left column, then we must put both C and E in the right column—but
there’s an edge between them, so we’ve failed.

11.62 True: every edge connects a node in L to a node in R. Thus both the sum of the degrees of the nodes in L and the
sum of the degrees of the nodes in R must equal |E|.

11.63 False. For example, consider L = {1} and R = {2} and there’s an edge {1, 2}. Then the sum of the degrees of the
nodes in L is 1.

11.64 True! In fact, this claim is true even if the graph isn’t bipartite, by Theorem 11.8.

11.65 There are |L| · |R| = |L| · (n − |L|) edges in the graph. You can show in many different ways (calculus, or just plotting this quantity) that this quantity is maximized when |L| = n/2. In this case, the number of edges in the graph is (n/2) · (n/2) = n²/4.

11.66 Zero! There’s nothing in the definition that prevents us from having |L| = 0 and |R| = n. And the complete bipartite
graph K0,n has zero edges!

11.67 False. For example, the following pentagon is not bipartite, but contains no triangle as a subgraph:

    [figure: the pentagon, i.e., the 5-cycle on nodes 1–5]

The correct statement, incidentally: any graph that does not contain any odd-length cycle (see Section 11.4) as a subgraph
is bipartite.

11.68 The total number of edges is precisely the sum of the in-degrees and is also equal to the sum of the out-degrees. (See Exercise 11.18.) That is,

    ∑_{u∈V} din = |V| · din = |E|    and    ∑_{u∈V} dout = |V| · dout = |E|.

Therefore din = dout = |E|/|V|.

11.69 Here is the same graph, with the nodes rearranged so that no edges cross:

    [figure: a planar redrawing of the graph on nodes A–G]

11.70 Here is the same graph, with the nodes rearranged so that no edges cross:

    [figure: a planar redrawing of the graph on nodes A–I]

11.71 We’ll prove it by giving an algorithm to place the nodes without edges crossing.

1 y := 0
2 while there are undrawn nodes:
3 choose an arbitrary undrawn node v0 and place it at the point ⟨0, y⟩
4 let v1 be one of v0 ’s neighbors, chosen arbitrarily, and place it at the point ⟨1, y⟩
5 x := 1
6 while vx ’s other neighbor (other than vx−1 ) is not v0 :
7 place vx at the point ⟨x, y⟩
8 let vx+1 be vx ’s other neighbor
9 x := x + 1
10 draw the edge from vx to v0 [without the vertical position of the arc going above y + 0.5]
11 y := y + 1

Consider an iteration for a particular value of y. Each newly encountered node must have either a “new” second neighbor,
or it must have v0 as its second neighbor. (It cannot be any other previously encountered node, because all of those nodes
already have their two neighbors accounted for!) When we find the node that reaches back to v0 —and we must, because
the set of nodes is finite—we simply draw a curved arc back to v0 .
There may still be nodes left, undrawn, at this point; we just repeat the above algorithm (shifted far enough up so that the newly drawn nodes won't be anywhere near the previously drawn ones) for a newly chosen v0 .

We were asked to prove that any 2-regular graph is planar, and we’ve now done so constructively: we’ve given an
algorithm that finds a planar layout of any such graph!

11.3 Paths, Connectivity, and Distances


11.72 ⟨D, E, B⟩

11.73 ⟨C, H⟩ and ⟨C, D, E, B, H⟩

11.74 ⟨C, D, B⟩

11.75 ⟨A, G, H⟩ and ⟨A, G, B, H⟩

11.76 ⟨D, B, C, E, G, B, H⟩

11.77 ⟨B, A, F, D, C⟩

11.78 ⟨B, C, F⟩

11.79 ⟨B, E, H, F, D, C⟩ (among others)

11.80 {A, B, C, F}.

11.81 All of them: {A, B, C, D, E, F, G, H}.

11.82 The graph is not connected; there’s no path between A and B, for example. The connected components are:

• {A, F, G}
• {B, C, D, E, H}

11.83 The graph is not strongly connected; there’s no path from any node ̸= A to A, for example. The strongly connected
components are:

• { A}
• { F}
• { H}
• {B, C, D, E, G}

11.84 The graph is connected. We can go from B to every other node except E via prefixes of the path ⟨B, A, F, D, C⟩, and
get to E on the path ⟨B, A, E⟩. The graph is undirected, so each of these paths can also be followed in reverse. That’s
enough to get from any node u to any node v: go from u to B as above, then go from B to v.

11.85 The graph is not strongly connected; there's no path from A to D, for example. The strongly connected
components are

• {A, B, C, F}
• {D, E}

11.86 The adjacency matrix is symmetric, so we might as well view the graph as undirected (though it could be directed,
too, just with every edge matched by a corresponding one that goes in the opposite direction). The graph is connected,
which is easiest to see via picture:

    [figure: a drawing of the graph on nodes A–H, which is visibly connected]

11.87 Suppose there’s a path from s to t. By definition, that is, there is a sequence ⟨u0 , u1 , . . . , uk ⟩ where:
• u0 = s and uk = t, and
• for each i ∈ {0, 1, . . . , k − 1} we have ⟨ui , ui+1 ⟩ ∈ E.
Then the sequence ⟨uk , uk−1 , . . . , u0 ⟩ is a path from t to s: its first entry is t = uk ; its last entry is s = u0 ; and every pair
⟨ui+1 , ui ⟩ ∈ E because the graph is undirected (and thus ⟨ui , ui+1 ⟩ ∈ E if and only if ⟨ui+1 , ui ⟩ ∈ E).

11.88 In short: we can snip out the part between visits to the same node.
Consider any non-simple path P = ⟨u0 , u1 , . . . , uk ⟩ from u0 to uk . Let v be the first node repeated in P, and say that v = ui and v = uj for j > i. That is,

    P = ⟨u0 , u1 , . . . , ui−1 , v, ui+1 , . . . , uj−1 , v, uj+1 , . . . , uk ⟩.

Then we claim that

    P′ = ⟨u0 , u1 , . . . , ui−1 , v, uj+1 , . . . , uk ⟩

is also a path from u0 to uk (all adjacent pairs in P′ were also adjacent pairs in P, so the edges exist), and it's shorter than P. Thus P could not have been a shortest path.

11.89 In the complete graph Kn , we have

    d(s, t) = 1   if s ≠ t
              0   if s = t.

Thus the diameter can be as small as 1 (if n ≥ 2) and it can be 0 (if n = 1).

11.90 The largest possible diameter is n − 1, which occurs on the graph that’s just an n-node path:

It can’t be larger, because the shortest path is always simple, and thus includes at most n nodes (and thus n − 1) edges.
And the distance between the two ends of the path is exactly n − 1.

11.91 There can be as many as n/4 connected components: each connected component is a K4 , which gives each node an
edge to the other 3 nodes in its component. There are precisely 4 nodes in each component, and thus n/4 total connected
components.

11.92 There can be as few as one single connected component in such a graph. For example, define the graph with nodes
V = {0, 1, . . . , n − 1} and edges {⟨u, (u + 1) mod n⟩ : u ∈ V}. This graph is 2-regular (every node has an edge to
the node “one higher” and to the node “one lower”), and is connected. Add an arbitrary matching to these edges—for
example, add the edges {⟨u, u + (n/2)⟩ : 0 ≤ u < n/2}—and the resulting graph is 3-regular, and still connected.

11.93 Here’s a picture of such a graph (figure omitted; its structure is described in words below).

We have n/4 groups of 4 nodes each. The graph is 3-regular: the two middle nodes in each group have three intragroup
neighbors; the top and bottom nodes have two intragroup and one intergroup neighbors. The groups are arranged in a
circle; the distance from a node s in group #1 to a node t in group #(n/8) is at least n/8 (because we have to go group by
group from s to t, and either direction around the circle passes through n/8 groups).

11.94 Create a complete binary tree of depth k. Add a cycle in all subtrees of four leaves to give all the leaves a degree
of 3. Add one extra leaf from the bottom-most right-most node, and include that extra node in the last cycle; then add an
edge from that extra node back to the root. Every node now has degree 3 (figure omitted).

We’ll argue that there is an upper bound of 8 log n on every distance. To get from any node s to any other node t, consider the path that goes via the root, using only tree edges. That path has length at most 2k. There are 1 + 2 + 4 + · · · + 2^k = 2^(k+1) − 1 nodes. Observe that

  8 log n = 8 log(2^(k+1) − 1) ≥ 8 log(2^k) = 8k ≥ 2k.

And we showed that path lengths were at most 2k.

11.95 False. The claim fails on the graph L = {1, 3} and R = {2, 4} and E = {{1, 2} , {3, 4}}.

1 2 3 4

11.96 ABCDEJHFGI (among others).

11.97 We can choose any order of the n − 2 nodes other than s and t, and visit them in that order. So there are (n − 2)! different Hamiltonian paths from s to t in the n-node complete graph.
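This count is easy to confirm by brute force for small n (a sketch of ours, assuming the nodes are numbered 0 to n − 1):

```python
from itertools import permutations
from math import factorial

def hamiltonian_paths(n, s, t):
    """All Hamiltonian paths from s to t in the n-node complete graph:
    because every pair of nodes is adjacent, any ordering of the other
    n - 2 nodes gives a valid path."""
    others = [v for v in range(n) if v not in (s, t)]
    return [(s,) + middle + (t,) for middle in permutations(others)]

for n in range(2, 8):
    assert len(hamiltonian_paths(n, 0, n - 1)) == factorial(n - 2)
```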

11.98 Suppose s is on the left side. After 2k − 1 steps, we will have visited k nodes on the left, and k − 1 nodes on the
right. Thus we can only find a Hamiltonian path in the following two cases:
• each side of graph has the same number of nodes (that is, n = m), and furthermore s and t are on opposite sides.
• s’s side of the graph has one more node than the other side (that is, n = m + 1), and furthermore s and t are both on
the larger (n-node) side of the graph.
In the former case, we can visit all n − 1 other nodes on the s-side in any order, and all m − 1 other nodes on the t-side
in any order—though we must alternate sides. So there are (n − 1)! · (m − 1)! orderings.
In the latter case, we can visit all m nodes on the m-node side in any order, and all n − 2 other nodes on the n-node
side in any order after we start at s and before we finish at t. Again we must alternate sides. So there are (n − 2)! · m!
orderings.
In any other case, there are no Hamiltonian paths from s to t.

11.99 Fix a node u. There are 2 nodes at distance 1 from u (its neighbors), there are 2 nodes at distance 2 from u (its
neighbors’ neighbors), and so forth—up to 2 nodes at distance (n − 1)/2. Thus

  the average distance from u to other nodes
    = [2 · (1 + 2 + 3 + · · · + (n−1)/2)] / (n − 1)            (the above argument)
    = 2 · [((n−1)/2) · ((n−1)/2 + 1) / 2] / (n − 1)            (arithmetic summation (Theorem 5.3))
    = ((n−1)/2 + 1) / 2 = (n + 1)/4.                           (algebraic simplification)

11.100 Fix a node u. There are 2 nodes at distance 1 from u (its neighbors), there are 2 nodes at distance 2 from u (its
neighbors’ neighbors), and so forth—up to 2 nodes at distance (n − 2)/2. There is one additional node at distance n/2
from u. Thus

  the average distance from u to other nodes
    = [2 · (1 + 2 + 3 + · · · + (n−2)/2) + n/2] / (n − 1)      (the above argument)
    = [2 · ((n−2)/2) · ((n−2)/2 + 1) / 2 + n/2] / (n − 1)      (arithmetic summation (Theorem 5.3))
    = n² / (4(n − 1)).                                         (algebraic simplification)

11.101 Following the hint, we’ll compute the number of pairs of nodes in an n-node path separated by distance k, for any particular integer k. The largest node-to-node distance is certainly ≤ n, and there are (n choose 2) pairs of nodes. Thus:

  the average distance between nodes in an n-node path = [Σ_{k=1}^{n} (the number of pairs separated by distance k) · k] / (n choose 2).

Looking at the nodes from left to right, observe that the first n − k nodes have a node to their right at distance k. (The node that’s k hops from the right-hand edge of the path does not have a k-hops-to-the-right node, because that would-be node is beyond the end of the chain.) Thus there are precisely n − k pairs of nodes that are separated by distance k, and therefore the average distance is

  [Σ_{k=1}^{n} (n − k) · k] / (n choose 2)                              (the above argument)
    = [n · Σ_{k=1}^{n} k − Σ_{k=1}^{n} k²] / (n choose 2)               (multiplying out and factoring)
    = [n · n(n+1)/2 − n(n+1)(2n+1)/6] / (n choose 2)                    (sum of the first n (squares of) integers (Theorem 5.3/Exercise 5.1))
    = n · (n + 1) · [n/2 − (2n+1)/6] · 2/(n(n−1))                       (factoring; definition of (n choose 2))
    = n · (n + 1) · [(3n − 2n − 1)/6] · 2/(n(n−1))
    = n · (n + 1) · [(n − 1)/6] · 2/(n(n−1))
    = (n + 1)/3.                                                        (algebra)

11.102 A solution in Python is shown in Figure S.11.1, which verifies the three previous exercises’ formulas for 2 ≤ n < 512.

11.103 Observe that if G = ⟨V, E⟩ is disconnected then there must be a partition of V into disjoint sets A ∪ B with
1 ≤ |A| ≤ n − 1 such that no edge crosses from A to B. If there are k nodes in A and n − k nodes in B, then the maximum
number of possible edges is
  (k choose 2) + (n − k choose 2) = [k(k − 1) + (n − k)(n − k − 1)] / 2
                                  = [k(k − 1) + n² − n(2k + 1) + k(k + 1)] / 2
                                  = [k² − k + n² − 2kn − n + k² + k] / 2
                                  = [2k² − 2kn + n² − n] / 2.

def average_distance(nodes, distance_function):
    '''
    Computes the average distance in a graph with the given nodes
    and the distance between node U and node V given by distance_function(U,V).
    '''
    all_distances = [distance_function(x, y) for x in nodes for y in nodes if x != y]
    num_pairs = [1 for x in nodes for y in nodes if x != y]
    return sum(all_distances) / sum(num_pairs)

def average_distance_in_path(n):
    '''
    Computes the average distance in a graph (1) -- (2) -- (3) -- ... -- (n-1) -- (n).
    '''
    distance = lambda u, v: abs(u - v)  # The distance between node U and node V is |U - V|.
    return average_distance(range(1, n+1), distance)

def average_distance_in_cycle(n):
    '''
    Computes the average distance in a graph (1) -- (2) -- (3) -- ... -- (n-1) -- (n).
                                              |____________________________________|
    '''
    # The shortest path length between U and V: the smaller of
    # the clockwise distance (V - U) % n and the counterclockwise distance (U - V) % n.
    distance = lambda u, v: min((v - u) % n, (u - v) % n)
    return average_distance(range(1, n+1), distance)

for n in range(2, 512):  # All tests pass!
    assert average_distance_in_path(n) == (n + 1) / 3
    if n % 2 == 0:
        assert average_distance_in_cycle(n) == n**2 / (4 * (n - 1))
    if n % 2 == 1:
        assert average_distance_in_cycle(n) == (n + 1) / 4
Figure S.11.1 Verifying the average distance in a path and a cycle.

There are multiple ways to identify the values of k that cause extreme values, but differentiating is the easiest. This quantity
has extreme points at the endpoints of the range (k = 1 and k = n − 1) and the value of k where

  0 = (d/dk) [(2k² − 2kn + n² − n) / 2]  ⇔  0 = (4k − 2n)/2  ⇔  k = n/2.
We can check that k = n/2 is a minimum, and k ∈ {1, n − 1} are the maxima we seek. In other words: G can be an
(n − 1)-node clique Kn−1 , plus one additional node. That graph has (n − 1) · (n − 2) / 2
edges, and we argued above that we can’t have more edges in a disconnected graph.
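The calculus above can be double-checked numerically (our own sketch, not from the text):

```python
def edges_with_split(n, k):
    # C(k, 2) + C(n - k, 2) = (2k^2 - 2kn + n^2 - n) / 2: the most edges a
    # disconnected graph with a k / (n - k) split can have.
    return (2 * k * k - 2 * k * n + n * n - n) // 2

for n in range(2, 200):
    best = max(edges_with_split(n, k) for k in range(1, n))
    # The maximum is attained at k = 1 (or k = n - 1): a K_{n-1} plus
    # one isolated node, with (n - 1)(n - 2) / 2 edges.
    assert best == (n - 1) * (n - 2) // 2
```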

11.104 We claim that the smallest possible number of edges in an n-node connected graph is n − 1. An n-node path has
this many edges, so we only need to show that fewer isn’t possible. Let P(n) denote the claim that no (n − 2)-edge n-node
graph is connected; we’ll prove P(n) for all n ≥ 2 by induction on n.
base case (n = 2). Obviously, if there are zero edges, the graph is disconnected. Thus P(2) holds.
inductive case (n ≥ 3). We assume the inductive hypothesis P(n − 1), and we must prove P(n).
Let G be an arbitrary connected n-node graph with (n − 2) or fewer edges. First we claim that there must be a node
u with degree 1. (If every node has degree at least 2, then the sum of the degrees is at least 2|V| and thus |E| ≥ n, by
Theorem 11.8. If any node has degree 0, the graph is not connected.)

Now let G′ be G with both the node u and the lone edge incident to u deleted. Then G′ has n − 1 nodes and n − 3
or fewer edges. But, by the inductive hypothesis P(n − 1), then G′ is not connected. Adding back in node u cannot
bridge components because u has only one neighbor. Therefore G is also disconnected, and P(n) holds.

11.105 Define the graph with nodes V = {0, 1, . . . , n − 1} and edges {⟨u, (u + 1) mod n⟩ : u ∈ V}. In other words, the
graph consists of a circle of n nodes, with a directed edge from each node to its clockwise neighbor. This graph is strongly
connected (we can get from i to j by “going around the loop”) and has n edges.
Now we’ll argue that there cannot be fewer edges. For any n ≥ 2, if a graph has fewer than n edges, then there’s a
node u with out-degree equal to 0, and no other nodes are reachable from u—so the graph isn’t strongly connected.

11.106 Define the graph with nodes V = {1, 2, . . . , n} and edges {⟨u, v⟩ : u < v}. Every node is alone in its own SCC
(there are no “left-pointing” edges), and there are Σ_{i=1}^{n} (n − i) = Σ_{i=0}^{n−1} i = n(n − 1)/2 edges.
Now we’ll argue that there cannot be more edges. If a graph has more than n(n − 1)/2 edges, then there’s a pair
of nodes u and v with an edge ⟨u, v⟩ and another edge ⟨v, u⟩. (The argument is by the pigeonhole principle: there are
n(n − 1)/2 unordered pairs of nodes and more than n(n − 1)/2 edges, so there must be an unordered pair of nodes
containing more than one edge.) These nodes u and v are in the same SCC.

11.107 Let dG (u, v) denote the distance (shortest path length) between nodes u ∈ V and v ∈ V for a graph G = ⟨V, E⟩.
Then:
Reflexivity: For any node u ∈ V, there’s a path of length 0 from u to u, namely ⟨u⟩. There’s no path of length 0 from u to
v ̸= u, because any such path has the form ⟨u, . . . , v⟩, which contains at least two nodes.
Symmetry: Symmetry follows immediately from Exercise 11.87.
Triangle inequality: Let dG (u, z) = k and dG (z, v) = ℓ. Consider a shortest path ⟨u, p1 , p2 , . . . , pk−1 , z⟩ from u to z and
a shortest path ⟨z, q1 , q2 , . . . , qℓ−1 , v⟩ from z to v. Then the sequence
⟨u, p1 , p2 , . . . , pk−1 , z, q1 , q2 , . . . , qℓ−1 , v⟩
is a path from u to v of length k + ℓ (after we delete any segments that repeat edges, as in Exercise 11.88). This path
may not be a shortest path, but it is a path, and therefore the shortest path’s length cannot exceed this path’s length.
So dG (u, v) ≤ k + ℓ.

11.108 Consider G = ⟨V, E⟩ where V = {A, B, C} and E = {⟨A, B⟩, ⟨B, C⟩, ⟨C, A⟩}. Then dG (A, B) = 1 via the path
⟨A, B⟩ but dG (B, A) = 2 via the path ⟨B, C, A⟩. Thus symmetry is violated even though the graph is strongly connected.

11.109 The quantification is over all pairs in C. If there’s a path from A ∈ C to B ∈ C but not from B to A, then under the
old definition C doesn’t meet the specification for ⟨s, t⟩ = ⟨A, B⟩ or for ⟨s, t⟩ = ⟨B, A⟩. Under the new definition C only
fails for ⟨s, t⟩ = ⟨B, A⟩—but it fails nonetheless!

11.110 Let R(u, v) denote mutual reachability in a directed graph G. Then:


Reflexivity: For any node u, we have that ⟨u⟩ is a path from u to u. Thus R(u, u).
Symmetry: For any nodes u and v, if R(u, v) then (i) there’s a path from u to v, and (ii) a path from v to u. But (ii) and (i),
respectively, establish that R(v, u).
Transitivity: For any nodes u and v and x, if R(u, v) and R(v, x), then there’s a directed path P1 from u to v, a directed path
P2 from v to u, a directed path Q1 from v to x, and a directed path Q2 from x to v. Stapling together P1 Q1 and Q2 P2
yields directed paths from u to x and from x to u.

11.111 The SCCs are:


• {A, B, F}
• {C, E, D}
• { G}

11.112 The graph is strongly connected. The following list of edges describes a way to go from node 1 back to node 1
visiting every node along the way: 1 → 5 → 1 → 9 → 1 → 2 → 10 → 4 → 7 → 8 → 12 → 8 → 11 → 6 → 7 →
0 → 3 → 1. This list means that there’s a directed path from every node to every other node, and the graph has only one
SCC (containing all nodes).

11.113 Node D: it’s at distance 4, and no other node is that far away from A.

11.114 Nodes G and E: both are at distance 3, and no other nodes are that far away from A.

11.115 Nodes 2, 5, 9, 10, 11, and 12 are all distance 3 away from 0.

11.116 Nodes 5 and 9 are both distance 8 away from 12; no other nodes are that far away.

11.117 Here’s the modification to BFS (as described in Figure 11.46b) to compute distances. Changes to the pseudocode
are underlined.

Breadth-First Search (BFS):


Input: a graph G = ⟨V, E⟩ and a source node s ∈ V
Output: the set of nodes reachable from s in G, and their distances

1 Frontier := ⟨⟨s, 0⟩⟩ // Frontier will be a list of nodes to process, in order.


2 Known := ∅ // Known will be the list of already-processed nodes.
3 while Frontier is nonempty:
4 ⟨u, d⟩ := the first node and distance in Frontier
5 remove ⟨u, d⟩ from Frontier
6 for every neighbor v of u:
7 if v is in neither Frontier nor Known then
8 add ⟨v, d + 1⟩ to the end of Frontier
9 add ⟨u, d⟩ to Known
10 return Known
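The same modification, sketched in Python (our own code; a set records which nodes have ever been enqueued, playing the role of the “neither Frontier nor Known” test):

```python
from collections import deque

def bfs_distances(graph, s):
    """graph maps each node to a list of its neighbors.  Returns a
    dictionary of distances from s: each node enters the frontier
    tagged with its distance, exactly as in the modified BFS."""
    known = {}
    frontier = deque([(s, 0)])
    enqueued = {s}
    while frontier:
        u, d = frontier.popleft()
        for v in graph[u]:
            if v not in enqueued:
                enqueued.add(v)
                frontier.append((v, d + 1))
        known[u] = d
    return known

# A 4-node path a - b - c - d:
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
print(bfs_distances(path, "a"))  # {'a': 0, 'b': 1, 'c': 2, 'd': 3}
```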

11.118 The list is always of the form

(block of nodes at distance d) + (block of nodes at distance d + 1).

We can prove this fact inductively: to start, there’s only one distance (zero) stored in the list, and, while we’re processing
a node in the distance d block, we can only add nodes of distance d + 1 to the end of Frontier. (We could write out a fully
rigorous proof by induction; it’s good practice to do so, but we’ll omit it here.)

11.119 Consider the following graph:

(figure omitted: a graph on the nodes A–H, arranged in levels, with the structure described below)

Starting DFS at A will discover, say, A, then B, then F. (If the order in which nodes are visited is slightly different, we’d
have to tweak the example slightly—to include nodes at H’s “level” of the graph that are neighbors of each of D, E, and
G—but the idea is precisely the same.) At this point, all of C, D, and H will be in Frontier, and their distances are 1, 2, and
3, respectively.
P
11.120 The ⟨i, j⟩th entry of the product is k Mi,k Mk,j , which denotes the number of entries k such that Mi,k = Mk,j = 1:
that is, the number of nodes k such that i → k and k → j are both edges in the graph. Thus (MM)i,j denotes the number
of paths of length 2 from i to j.

11.121 A solution in Python is shown in Figure S.11.2.



def distance(wordA, wordB):
    if len(wordA) != len(wordB):
        return None
    else:
        return sum([wordA[i] != wordB[i] for i in range(len(wordA))])

def word_ladder(start, finish):
    valid = [w for w in words if len(start) == len(w)]
    def neighbors(w):
        return [x for x in valid if distance(w, x) == 1]

    previous_layer = [start]
    known = {start: None}  # known[word] = the word in the previous layer one different from word
    while finish not in known:
        next_layer = []
        for parent in previous_layer:
            for child in neighbors(parent):
                if child not in known:
                    known[child] = parent
                    next_layer.append(child)
        previous_layer = next_layer

    # Reconstruct the path by tracing backwards from finish back to start.
    path = [finish]
    while path[-1] != start:
        path.append(known[path[-1]])
    return path

with open("/usr/share/dict/words") as f:
    words = [x.strip() for x in f]
print(word_ladder("smile", "frown"))

Figure S.11.2 Finding a word ladder using BFS.

11.4 Trees
11.122 ABCA, BCEB, DEGFD, ABECA

11.123 ABGECA, BDFHGB, ABDFHGECA

11.124 CEC, FHF, ABGECA, ABDFHGECA

11.125 A simple cycle can visit each node once and only once, and thus traverses at most n edges. And a simple cycle
traversing n edges is achievable, as in the graph that links all n nodes into a single ring (figure omitted).

11.126 A cycle must enter and leave each node it visits the same number of times. Thus the number of edges incident to
any node u that are traversed by the cycle must be an even number. This solution is closely related to Eulerian paths,
which traverse each edge of a graph once and only once. (Compare them to Hamiltonian paths, which go through each
node once and only once.) Consider Kn for an odd number of nodes n. Then every node has even degree, and it turns out
that we can actually traverse all of the edges in a cycle. Here’s the algorithm:

• start walking from node u by repeatedly choosing an arbitrary neighbor of the current node, deleting edges as they’re
traversed, and stopping when the path is at a node with no remaining edges leaving it. Call C the resulting cycle.

(Note that the only node where we can have ended is u: every node has even degree, and entering a node leaves it with
odd remaining degree.) We may not have traversed every edge in the graph, but every node still has even degree. Also,
we can’t be stuck anywhere unless we’ve visited every node, as the original graph was complete. So, after computing C:

• choose a node v with nonzero remaining degree. Walk from it as before until you return to v. “Splice” in the edges you
just found into C, just after the first time C passes through v.
• repeat until there are no edges left.

This process will traverse every edge—that is, all (n choose 2) edges.

In an n-node graph where n is even, it’s not possible to use all (n choose 2) edges, because each node’s degree is odd, so each node must have one untraversed edge. But we can proceed exactly as above, counting a node as unvisitable as soon as it has remaining degree 1. This cycle will leave exactly n/2 unvisited edges, for a total cycle length of (n choose 2) − n/2.
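The walk-and-splice procedure above is essentially Hierholzer’s algorithm; here is a sketch of ours for K_n with n odd (the function names are invented, not from the text):

```python
def eulerian_circuit_in_Kn(n):
    """Walk-and-splice (Hierholzer's algorithm) on the complete graph
    K_n.  For odd n every node has even degree, so the resulting
    circuit uses all n(n-1)/2 edges."""
    remaining = {u: set(v for v in range(n) if v != u) for u in range(n)}

    def walk_from(u):
        # Walk until stuck, deleting edges as they're traversed; with
        # all degrees even we can only get stuck back at the start.
        circuit = [u]
        while remaining[circuit[-1]]:
            here = circuit[-1]
            there = remaining[here].pop()
            remaining[there].remove(here)
            circuit.append(there)
        return circuit

    circuit = walk_from(0)
    # Splice in a detour at any node that still has unused edges.
    i = 0
    while i < len(circuit):
        if remaining[circuit[i]]:
            detour = walk_from(circuit[i])  # starts and ends at circuit[i]
            circuit = circuit[:i] + detour + circuit[i + 1:]
        else:
            i += 1
    return circuit

for n in (3, 5, 7):
    c = eulerian_circuit_in_Kn(n)
    assert c[0] == c[-1] and len(c) - 1 == n * (n - 1) // 2
```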

11.127 Choose any subset S of the nodes other than u. If |S| ≥ 2, then there is a cycle involving u and the nodes of S. In fact,
there are |S|! different orderings of those other nodes, defining |S|! different cycles—except that we’ve double counted in a
slightly subtle way. Specifically, we’ve counted each cycle ⟨u, s1 , s2 , . . . , s|S| , u⟩ and ⟨u, s|S| , s|S|−1 , . . . , s1 , u⟩ separately,
going “clockwise” and “counterclockwise,” while they’re actually the same cycle.
How many such subsets are there? We have n − 1 other nodes, and we can choose any number k ≥ 2 of them. There are (n − 1 choose k) subsets of k other nodes, and thus the number of cycles in which u participates is

  Σ_{k=2}^{n−1} (n − 1 choose k) · (k!/2)
    = Σ_{k=2}^{n−1} [(n − 1)! / (k! · (n − k − 1)!)] · (k!/2)      (definition of choose)
    = [(n − 1)!/2] · Σ_{k=2}^{n−1} 1/(n − k − 1)!
    = [(n − 1)!/2] · Σ_{j=0}^{n−3} 1/j!.                           (reindexing: j = n − k − 1)

As n gets large, it turns out that Σ_{i=0}^{n} 1/i! approaches e = 2.7182818 · · · . (If you happen to have seen Taylor series in calculus and you also happen to remember Taylor series from calculus, you can use them to show that this summation approaches e.) So for large n the number of cycles involving a particular node is approximately e · (n − 1)!/2.
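For small n, the closed-form sum can be checked against a brute-force enumeration (our sketch; a cycle and its reversal are identified by keeping the lexicographically smaller tuple):

```python
from itertools import combinations, permutations
from math import comb, factorial

def cycles_through_node(n):
    """Brute-force count of the simple cycles through node 0 in K_n.
    A cycle <0, s1, ..., sk, 0> is recorded as the tuple (s1, ..., sk);
    min(order, reversed order) collapses the two directions into one."""
    cycles = set()
    for k in range(2, n):
        for subset in combinations(range(1, n), k):
            for order in permutations(subset):
                cycles.add(min(order, order[::-1]))
    return len(cycles)

for n in range(3, 8):
    formula = sum(comb(n - 1, k) * factorial(k) // 2 for k in range(2, n))
    assert cycles_through_node(n) == formula
```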

11.128 The only difference in the directed graph is that there’s no double counting, because ⟨u, s1 , s2 , . . . , s|S| , u⟩ and ⟨u, s|S| , s|S|−1 , . . . , s1 , u⟩ are different cycles. Thus the answer is twice the previous exercise’s solution:

  (n − 1)! · Σ_{j=0}^{n−3} 1/j!,

which tends to e · (n − 1)! as n gets large.

11.129 The problem here is with the neighbors of s themselves: they have edges that go back to s, which falsely leads to
the algorithm claiming that a cycle exists. For example, in the two-node graph with an edge from A to B, start running the
algorithm with s = A. After the A iteration, Known = {A} and Frontier = ⟨B⟩. And s = A is a neighbor of the first node
in Frontier and so we claim there’s a cycle, but there isn’t one.
Here’s perhaps the simplest fix: if s has k neighbors u1 , u2 , . . . , uk , let’s run BFS k times looking for s. In the ith
run, we start the search from ui but exclude the edge {s, ui } from the graph. Note that if we find any other uj for j ̸= i in
the BFS starting at ui , then there’s a cycle. Thus these BFS runs are on disjoint subsets of the nodes, and therefore their
total running time is unchanged from the original BFS, and the algorithm still runs in O(|V| + |E|) time.

11.130 We can use the same algorithm as in the proof of Lemma 11.33:

1 let u0 be an arbitrary node in the graph, and let i := 0


2 while the current node ui has an unvisited out-neighbor:
3 let ui+1 be an out-neighbor of ui that has not previously been visited.
4 increment i

(The only change from Lemma 11.33 is underlined.) This process must terminate in at most |V| iterations, because we
must visit a new node in each step. If the last node that we visited does not have out-degree zero, then we have found a
cycle (just as in Lemma 11.33).

11.131 Choose any edge ⟨u, v⟩, and run the algorithm from the proof of Lemma 11.33 from u: that is, starting at u,
repeatedly move to an unvisited neighbor of the current node; stop when the current node has no unvisited neighbors.
You’ll find a degree-1 node x when you stop, because the graph is acyclic (and you’ll have gone at least one step—to v
if nowhere else). Now restart the same process from x. For precisely the same reasons, you’ll terminate at a node y with
degree 1, and y cannot equal x for the same reason that x ̸= u.

11.132 No simple path can contain a repeated node, including s—so a path from s to s isn’t a valid simple path unless it
has length 0. (Under this definition, there’s no such thing as a cycle of nonzero length!) Even aside from that problem,
this definition also forbids other nodes being repeated in the middle of the cycle, which is allowed in non-simple cycles.

11.133 This definition says that A,B,A is a cycle in the 2-node graph with V = {A, B} and E = {{A, B}}. (Thus, under
this definition, no graph with an edge would be acyclic!)

11.134 This definition says that ⟨s⟩ (a 0-length path from s to itself) is a cycle. (Thus, under this definition, no graph at all
would be acyclic!)

11.135 This definition says that A,B,C,B,A is a cycle in the path graph A–B–C. But this graph is acyclic!

11.136 This definition says that A,B,C,D,B,A is a cycle in the graph with edges A–B, B–C, C–D, and D–B. But A is not involved in a cycle!

11.137 If there’s a cycle c = ⟨s, u1 , u2 , . . . , uk , s⟩, then repeat the following until c is simple: find an i and j > i such that
ui = uj , and redefine c as c′ = ⟨s, u1 , . . . , ui−1 , ui , uj+1 , . . . , s⟩. This c′ is still a cycle, and it’s shorter by at least one
edge than c was. After at most |E| repetitions of this shrinking process, the cycle c is simple. (A loose upper bound on the
number of shrinking rounds is the number of edges in the graph, since the original c can traverse each edge at most once
and we shrink by at least one edge in each round.)
Thus we’ve argued that if there’s a cycle in the graph, then there’s a simple cycle in the graph too. Conversely, if there’s
a simple cycle, then there’s certainly a cycle—simple cycles are cycles. Thus G has a cycle if and only if G has a simple
cycle.

11.138 The two graphs are the 2-node, 1-edge graph (A–B), and the 1-node, 0-edge graph (the lone node A).

11.139 By Lemma 11.33, there must be a node in a tree with degree 0 or degree 1. So the graph is either 0-regular or
1-regular.

• For a 0-regular graph, by Theorem 11.8, we have that the number of edges is |E| = (1/2) · |V| · 0 = 0. By Theorem 11.35, we need |E| = |V| − 1. Thus |V| = 1.
• For a 1-regular graph, by Theorem 11.8, we have that the number of edges is |E| = (1/2) · |V| · 1 = |V|/2. By Theorem 11.35, we need |E| = |V| − 1. Thus |V|/2 = |V| − 1, and so |V| = 2.

11.140 In Kn , every unordered quadruple of nodes forms multiple squares: for the quadruple ABCD, the squares are ABCDA, ABDCA, and ACBDA. (We might as well start from A; the other orderings of BCD are the same as the above three cycles but going around in the opposite direction.) There are (n choose 4) quadruples of nodes, so there are 3 · (n choose 4) squares.

 
11.141 In Kn , every unordered trio of nodes forms a distinct triangle. There are (n choose 3) trios of nodes, so there are (n choose 3) triangles.

11.142 Let S denote the non-v neighbors of u and let T denote the non-u neighbors of v. If there exists a node z ∈ S ∩ T,
then there’s a triangle of nodes {u, v, z}, contradicting the assumption that the graph is triangle-free; thus S ∩ T = ∅. Thus

  degree(u) + degree(v) = (|S| + 1) + (|T| + 1)      (definition of S and T)
                        = |S ∪ T| + 2                (S and T are disjoint by the above, so |S ∪ T| = |S| + |T|)
                        ≤ (n − 2) + 2                (both S ⊆ V − {u, v} and T ⊆ V − {u, v}, so |S ∪ T| ≤ n − 2)
                        = n.

11.143 Let P(n) denote the claim that any n-node triangle-free graph has at most n²/4 edges. We’ll prove that P(n) holds for all n ≥ 1 by strong induction on n.

base case #1 (n = 1): There are no edges in a 1-node graph, and indeed 0 ≤ 1²/4 = 0.25.
base case #2 (n = 2): There is at most one edge in a 2-node graph, and indeed 1 ≤ 2²/4 = 1.
inductive case (n ≥ 3): We assume the inductive hypotheses P(n′ ) for 1 ≤ n′ ≤ n − 1, and we must prove P(n). Let G = ⟨V, E⟩ be an arbitrary n-node triangle-free graph. If there are no edges in G, we’re done. If there is an edge ⟨u, v⟩ ∈ E, then let G′ be the induced subgraph on V − {u, v}. Note that G′ is triangle-free and contains n − 2 nodes. Then

  |E| = (number of edges in G′) + (number of edges involving u or v)
      ≤ (n − 2)²/4 + (number of edges involving u or v)     (inductive hypothesis P(n − 2))
      = (n − 2)²/4 + degree(u) + degree(v) − 1              (degree(u) + degree(v) double-counts the edge between u and v)
      ≤ (n − 2)²/4 + n − 1                                  (previous exercise)
      = (n² − 4n + 4 + 4n − 4)/4
      = n²/4.

11.144 The complete bipartite graph Kn/2,n/2 contains n nodes, is triangle-free because it’s bipartite, and contains (n/2) · (n/2) = n²/4 edges.

11.145 No, it’s not connected. For example, there’s no path from A to C.

11.146 Yes (figure omitted: the induced subgraph on the nodes A, B, C, D, and F).

11.147 Yes; it looks like this: the lone node A.

11.148 No, it’s not acyclic. For example, ACFA is a cycle.

11.149 False; the following graph is a counterexample (figure omitted).

11.150 False; the following graph is a counterexample (figure omitted).

11.151 False; the following graph is a counterexample (figure omitted).

11.152 It breaks on the 1-node, 0-edge graph: the root of that tree is a leaf (though it has no parent).

11.153 C, E, H, I

11.154 A, B, D, F, G

11.155 The parent is A, the children are E and F, and the only sibling is B.

11.156 E, F, G, H, I

11.157 A, D

11.158 4

11.159 False: for example, take the tree with root A and children B and C. Rerooting it at B yields the path-shaped tree with root B, child A, and grandchild C. The former has two leaves; the latter has only one.

11.160 Let P(h) denote the claim that a complete binary tree of height h contains precisely 2^(h+1) − 1 nodes. We’ll prove P(h) for all h ≥ 0 by induction on h.

base case (h = 0): A complete binary tree of height 0 is a single node, so it indeed has 2^1 − 1 = 1 node.

inductive case (h ≥ 1): We assume the inductive hypothesis P(h − 1) and we must prove P(h). Note that a complete binary tree of height h is a root with two subtrees, both of which are complete binary trees of height h − 1. Thus

  number of nodes in a complete binary tree of height h
    = 1 + 2 · (number of nodes in a complete binary tree of height h − 1)     (above discussion)
    = 1 + 2 · (2^h − 1)                                                       (inductive hypothesis P(h − 1))
    = 2^(h+1) − 1.

Thus P(h) holds.
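The recurrence used in the inductive case can be checked directly (a sketch of ours):

```python
def nodes_in_complete_tree(h):
    """Number of nodes in a complete binary tree of height h:
    a root plus two complete subtrees of height h - 1."""
    if h == 0:
        return 1
    return 1 + 2 * nodes_in_complete_tree(h - 1)

for h in range(12):
    assert nodes_in_complete_tree(h) == 2 ** (h + 1) - 1
```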

11.161 The largest number occurs in a complete binary tree of height h, which has 2^h leaves.
The smallest number occurs in a nearly complete binary tree in which there is only one node at distance h from the root: this tree has the 2^(h−1) leaves of a complete binary tree of height h − 1, except the leftmost node at level h − 1 is an internal node, not a leaf, because it has one child (the only level-h node, which is itself a leaf). Thus the number of leaves in this tree is 2^(h−1) − 1 + 1 = 2^(h−1).

11.162 If the height of a nearly complete binary tree is h, then there are between 1 and 2^h nodes in the bottom row of the
tree T. There is a leaf in the left subtree of T, but the crucial question is: is there also a leaf in the right subtree of T? If so,
the diameter is 2h because the only path from the leftmost leaf to the rightmost leaf goes through the root, and traverses
h edges to get to the root and h edges from the root. If not, the rightmost leaf is only h − 1 away from the root, and the
diameter is h + (h − 1) = 2h − 1.

11.163 Each leaf is 2h away from any leaf on the “other side” of the tree (that is, in the root’s opposite subtree), as per
Exercise 11.162. Thus, as each layer of the rerooted tree is at an increasing distance from the root, the new height is 2h.

11.164 The choice of root has no effect on the diameter. It’s still the distance between the two nodes that are farthest
apart—namely 2h.

11.165 No node of degree 2 or 3 can be a leaf, and every non-root node of degree 1 must be a leaf. Thus the only change
is that the new root used to be a leaf and isn’t anymore, so there are 2^h − 1 leaves.

11.166 Arrange the nodes in a line, so node 1 is the root, and each node i has one child, node i + 1. Every node has one
child (except for the 1000th node), and the height of the tree is 999.

11.167 A nearly complete binary tree of height 9 can fit 1000 nodes, so it’s possible to have height 9.
Can we have a binary tree of height 8 that contains 1000 nodes? No: a complete binary tree of height 8 contains precisely Σ_{i=0}^{8} 2^i nodes, and Σ_{i=0}^{8} 2^i = 2^9 − 1 = 511, which is less than 1000.

11.168 There can’t be more than 500 leaves in a 1000-node graph, by Example 5.19/Example 5.20. And the nearly complete binary tree from the previous exercise has 500 leaves, which therefore achieves the maximum possible. (Figure omitted; in words: the first 9 layers are full, containing 1 + 2 + 4 + 8 + 16 + 32 + 64 + 128 + 256 = 511 nodes, and the last layer contains the remaining 489 nodes, all of them leaves. Of the 256 nodes in the 9th full layer, from left to right, 244 are internal nodes with 2 children, 1 is an internal node with 1 child, and 11 are leaves, for 489 + 11 = 500 leaves in all.)

11.169 The “straight line” graph—every node except the 1000th has precisely one child—has only one leaf. And no acyclic
1000-node graph can have fewer than one leaf.

11.170 Consider a tree of height h in which no node that’s a left child of its parent has any children, and every node that’s a right child of its parent has two children (figure omitted: a “vine” descending along right children, sprouting a childless left child at every level). Such a tree of height h has 2 nodes at depth d, for all d ≥ 1. Thus there are 2h + 1 nodes in a tree of height h, so the
height of this n-node tree is (n − 1)/2. That’s the largest possible height for a binary tree that contains no nodes with one
child.

11.171 ABCDEGHIF

11.172 CBAHGIEDF

11.173 CBHIGEFDA

11.174 The tree with 1 at the root, the nodes 4 and 3 on the next level, and the nodes 2 and 5 on the bottom level (edges as drawn in the original figure).

11.175 1

11.176 The key will be to define a tree in which there’s no node with a nonempty left subtree. In this case, the following two
orderings are equivalent:
• traverse left; visit root; traverse right.
• visit root; traverse left; traverse right.
(Both are equivalent to “visit root; traverse right.”) Here is an example of such a tree:

the node 1 whose right child is 2, whose right child is 3, whose right child is 4 (every left subtree is empty).

Thus pre-order and in-order traversals of this tree are both 1, 2, 3, . . . , n.

11.177 If there’s no node with a nonempty right subtree, then these two orderings are equivalent:
• traverse left; visit root; [traverse right].
• traverse left; [traverse right]; visit root.

(Both are equivalent to “traverse left; visit root.”) Here’s such a tree:

the node 1 whose left child is 2, whose left child is 3, whose left child is 4 (every right subtree is empty).

Thus post-order and in-order traversals of this tree are both n, n − 1, . . . , 2, 1.

11.178 In both of the following trees, the pre-order traversal is CAB and the post-order traversal is BAC:

the chain C, A, B descending entirely along left children, and the chain C, A, B descending entirely along right children.

11.179 Here is such an algorithm:

reconstruct(i1...n , p1...n ):
Input: i1...n is the in-order traversal of some tree T; p1...n is the post-order traversal of some tree T
Output: the tree T.
1 if n = 0 then
2 return an empty tree.
3 else
4 root := pn
5 j := the index such that ij = root.
6 L := reconstruct(i1,2,...,j−1 , p1,2,...,j−1 )
7 R := reconstruct(ij+1,...,n , pj,...,n−1 )
8 return the tree with root root, left subtree L, and right subtree R.
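A direct Python transcription of this pseudocode (my own sketch; the nested-tuple tree representation is an assumption, and it relies on all node labels being distinct):

```python
def reconstruct(inord, postord):
    # Rebuild a binary tree (value, left, right) from its in-order and
    # post-order traversals; None represents an empty tree.
    if not inord:
        return None
    root = postord[-1]        # line 4: root := p_n
    j = inord.index(root)     # line 5: the index j with i_j = root
    left = reconstruct(inord[:j], postord[:j])
    right = reconstruct(inord[j + 1:], postord[j:-1])
    return (root, left, right)

# Example: the tree with root 1, left child 2, and right child 3 has
# in-order traversal [2, 1, 3] and post-order traversal [2, 3, 1].
assert reconstruct([2, 1, 3], [2, 3, 1]) == (1, (2, None, None), (3, None, None))
```

The slicing mirrors the pseudocode exactly: the j − 1 labels before the root in the in-order traversal form the left subtree, and the corresponding prefix of the post-order traversal (everything before the root’s entry) splits at the same index.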

11.180 We must include all but two edges from G in the spanning tree, and a 5-node tree contains 4 edges, by Theorem 11.35. We must include the edge to the lone node of degree 1 in the graph. There are five other edges, of which we keep three, and there are (5 choose 3) = 10 ways to do that. However, two of those ways leave the graph disconnected: we cannot leave out both of the edges incident to either of the degree-2 vertices.

11.181 We must remove two edges:

• BG and any one of AB/BD/DF/FH/HG/GE/EC/CA; or


• any one of AB/GE/EC/CA and any one of BD/DF/FH/HG.

That’s a total of 1 · 8 = 8 plus 4 · 4 = 16 spanning trees, for a grand total of 24.

11.182 We must remove three edges:

• AC and BF and any one of AE/EC/CG/GF/FD/DB/BA; or


• AC and one of BD/DF and any one of AE/EC/CG/GF/FB/BA; or
• one of AC/AE and BF and any one of AC/CG/GF/FD/DB/BA; or
• one of AC/AE and one of BD/DF and any one of AC/CG/GF/FB/BA.

That’s a total of 1 · 1 · 7 = 7 plus 1 · 2 · 6 = 12 plus 2 · 1 · 6 = 12 plus 2 · 2 · 5 = 20 spanning trees, for a grand total of
7 + 12 + 12 + 20 = 51.

11.5 Weighted Graphs


11.183 ACDE, length 2 + 1 + 3 = 6.

11.184 ABE, length 1 + 3 = 4.

11.185 ACE or ABE, both length 1 + 3 = 4.

11.186 ABFH, length 1 + 2 + 4 = 7.

11.187 ACDFH, length 3 + 1 + 2 + 10 = 16.

11.188 We’ll define a graph with 1 + 3k nodes and k “diamonds”; in each diamond, both the upper path and the lower path have the same weight, so there are 2 shortest choices per diamond, and thus 2^k total shortest paths. The graph is a chain of diamonds from s to t; in the ith diamond, the upper path uses edges of weight 2i − 1 and n + 2i, and the lower path uses edges of weight 2i and n + 2i − 1 (so both paths through the ith diamond have total weight n + 4i − 1).

We chose the weights of the edges carefully so that all the weights are different. Because there are n = 3k + 1 nodes in
this k-diamond graph, we have that (n − 1)/3 = k. Thus there are

2^k = 2^((n−1)/3) = (2^(1/3))^(n−1) ≈ 1.2599^(n−1)

shortest paths through the graph.
If n is not one more than a multiple of 3, then we can simply add one or two additional nodes in a chain between s and
the first diamond. There are thus at least 1.2599^(n−3) = 0.5 · 1.2599^n shortest s-to-t paths in the graph.

11.189 d(A, D) = 8

11.190 d(A, D) = 8, then d(A, E) = 9, and then d(A, H) = 17.

11.191 Running Dijkstra’s algorithm from H finds the following nodes and distances, in the order listed:
node distance
H 0
F 10
D 12
B 16
A 17
C 17
E 17
(The last three nodes could have been listed in any order.)

11.192 When we’re proving that the distance from s to the chosen node u∗ is at least the number du∗ , we consider an arbitrary path P from s to u∗ , and we consider the portion of P up to the first exit from the set S and the portion after. The argument claims that the portion of P after the first exit has length at least zero—but that can fail if the graph has negative edge weights. For example, consider this graph:

(a graph with nodes S, A, and B: an edge S–A of weight 98 and an edge S–B of weight 99)

We say d(S, S) = 0 in the first iteration, and then we say d(S, A) = 98 in the second iteration. But if there’s an edge of
length −10 from B to A, then in fact there’s a path of length 99 − 10 = 89 to A, via B. But Dijkstra’s algorithm would never
update its recorded value for A.

11.193 We’ll take the second option, by describing a graph G′ so that Dijkstra’s algorithm run on G′ finds an SMP in G.
G′ has the same nodes and edges as G, but with different weights: for an edge e with cost w_e in G, the edge e in G′ has
cost w′_e = log2 w_e. Then on any particular path P, we have

Σ_{e∈P} w′_e = Σ_{e∈P} log2 w_e = log2 (Π_{e∈P} w_e).        Theorem 2.10 (logb xy = logb x + logb y)

Finding the path in G with the smallest log2 (Π_{e∈P} w_e) is equivalent to finding the path in G with the smallest Π_{e∈P} w_e,
because x < y ⇔ log x < log y. Thus finding the path with the smallest Σ_{e∈P} w′_e solves the SMP problem. And therefore
we can just run Dijkstra’s algorithm on the graph with the modified weights w′.
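A sketch of this reduction in Python (my own illustration; the graph format and names are assumptions), running a standard heap-based Dijkstra on the log2-transformed weights:

```python
import heapq
from math import log2

def min_product_path(graph, s, t):
    # Run Dijkstra on weights w' = log2(w); the resulting shortest-path
    # distance d satisfies 2**d = the smallest product of edge weights
    # over any s-to-t path. Assumes every weight w >= 1, so log2(w) >= 0.
    dist = {s: 0.0}
    pq = [(0.0, s)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float('inf')):
            continue  # stale heap entry
        for v, w in graph.get(u, []):
            nd = d + log2(w)
            if nd < dist.get(v, float('inf')):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return 2 ** dist[t]

g = {'s': [('a', 2), ('b', 3)], 'a': [('t', 8)], 'b': [('t', 4)]}
# Products: s-a-t costs 2 * 8 = 16; s-b-t costs 3 * 4 = 12.
assert abs(min_product_path(g, 's', 't') - 12) < 1e-9
```

The transformation happens in a single line (`d + log2(w)`); everything else is unmodified Dijkstra.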

11.194 We need all the logarithmic costs to be nonnegative, which means that we need all pre-logged costs to be greater
than or equal to 1. (Recall that log x < 0 if and only if x < 1.) Thus we require that every we ≥ 1.

11.195 Include the edges AB, AC, CD, DE.

11.196 Include the edges AB, BD, BE, CE.

11.197 Include any two of the edges AB, BC, AC; include one of the edges BE, CE, and include the edge DE. Thus the
following are all minimum spanning trees:
• AB, BC, BE, DE
• AB, BC, CE, DE
• AB, AC, BE, DE
• AB, AC, CE, DE
• AC, BC, BE, DE
• AC, BC, CE, DE

11.198 Just include the six cheapest edges and we get a tree: AB, AE, BF, CD, CE, FH.

11.199 There is no minimum spanning tree, because the graph isn’t connected!

11.200 If the 8 cheapest edges form a simple path, then there is an MST that uses only these edges, and thus has cost

Σ_{i=1}^{8} i = 8 · (8 + 1) / 2 = 36.

11.201 If the 8 most expensive edges are all incident to the same node, then we’ll have to choose one of them, so there
can be an MST with an edge of weight as high as 29 (leaving out the edges 30, 31, . . . , 36).

11.202 It’s possible to arrange the edge costs so that the MST has cost 1 + 2 + 4 + 7 + 11 + 16 + 22 + 29 = 92. Here’s
how: call the nodes {0, 1, . . . , 8}. Arrange the edges so that, for each k = 0, 1, . . . , 8, the (k + 1 choose 2) cheapest edges are all
among the nodes {0, 1, . . . , k}. (This arrangement causes the largest possible number of cheapest edges to be “wasted”
by being part of even cheaper cycles.)
Another way to describe this arrangement: connect the 8 most expensive edges to node 8. As argued in the previous
exercise, we must include the cheapest of these edges, namely the one of weight 29. Now there are 8 remaining nodes.
Connect the 7 most expensive remaining edges to node 7. Again we must include the cheapest of these edges, namely the
one of weight 22. And so forth. This yields an MST of cost 29 + 22 + 16 + 11 + 7 + 4 + 2 + 1 = 92.
11.203 The cheapest possible MST consists of the n − 1 cheapest edges, which has cost Σ_{i=1}^{n−1} i = n(n − 1)/2.
We’ll phrase the solution for the most expensive possible MST with a recurrence relation. Let C(n) denote the cost
of the most expensive MST on K_n with edge weights {1, 2, . . . , (n choose 2)}. The most expensive MST has a single node
that has the n − 1 most expensive edges as its neighbors, and the other n − 1 nodes have the heaviest possible minimum
spanning tree on K_{n−1}. That is,

C(1) = 0
C(n) = [(n choose 2) − (n − 2)] + C(n − 1).        (n choose 2) − (n − 2) is the cost of the (n − 1)st-most expensive edge

Note that, for n ≥ 2, we have

(n choose 2) − (n − 2) = n(n − 1)/2 − (n − 1) + 1 = (n − 2)(n − 1)/2 + 1 = (n − 1 choose 2) + 1.        (∗)

Therefore, iterating the recurrence, we see that

C(n) = Σ_{k=2}^{n} [(k choose 2) − (k − 2)]        the recurrence, as described above, iterated out
     = Σ_{k=2}^{n} [(k − 1 choose 2) + 1]          by (∗)
     = n − 1 + Σ_{k=1}^{n−1} (k choose 2)          pulling out the constant; reindexing
     = n − 1 + (n choose 3)                        by Exercise 9.176 ((n − 1 + 1 choose 2 + 1) = (n choose 3))
     = n − 1 + n(n − 1)(n − 2)/6.

Indeed, for n = 9, we have 8 + (9 · 8 · 7)/6 = 8 + 504/6 = 8 + 84 = 92, exactly as in Exercise 11.202.

11.204 The graph is symmetric, so let’s just argue that the probability that we’re at node A after one step is still 1/3:

Pr [u1 = A] = Pr [u1 = A|u0 = A] · Pr [u0 = A] + Pr [u1 = A|u0 = B] · Pr [u0 = B] + Pr [u1 = A|u0 = C] · Pr [u0 = C]
            = 0 · Pr [u0 = A] + (1/2) · Pr [u0 = B] + (1/2) · Pr [u0 = C]
                  we choose the next step randomly from the current node’s neighbors
            = 0 · (1/3) + (1/2) · (1/3) + (1/2) · (1/3)
                  the first node u0 was chosen according to p—that is, p(A) = p(B) = p(C) = 1/3
            = 0 + 1/6 + 1/6
            = 1/3.
Thus Pr [u1 = A] = 1/3 = p(A). Similarly, Pr [u1 = B] = 1/3 = p(B) and Pr [u1 = C] = 1/3 = p(C). Thus p is a
stationary distribution for this graph.
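The same one-step calculation can be carried out exactly with rational arithmetic; here is a sketch (my own illustration) confirming that the uniform distribution is stationary on the triangle:

```python
from fractions import Fraction

# Random walk on the triangle A-B-C: from each node, step to each of
# its two neighbors with probability 1/2.
neighbors = {'A': ['B', 'C'], 'B': ['A', 'C'], 'C': ['A', 'B']}
p = {v: Fraction(1, 3) for v in neighbors}

# One step of the walk: p'(u) = sum over neighbors v of u of p(v) / degree(v).
p_next = {u: sum(p[v] * Fraction(1, len(neighbors[v])) for v in neighbors[u])
          for u in neighbors}
assert p_next == p  # the uniform distribution is stationary
```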

11.205 Any distribution of the following form, for any q ∈ [0, 1], is stationary:

p(D) = q/3    p(E) = q/3    p(F) = q/3    p(G) = (1 − q)/3    p(H) = (1 − q)/3    p(I) = (1 − q)/3.

The same calculation as in the previous exercise shows that the probability is stationary within each triangle. But the stationary distribution isn’t unique: for example, if q = 1 then we’re uniform on the triangle {D, E, F}, and if q = 0 then we’re uniform on the triangle {G, H, I}.

11.206 A solution in Python is shown in Figure S.11.3, including testing on the graph from Figure 11.100a. I saw the
statistics {'a': 671, 'c': 675, 'b': 654}. (The theoretical prediction says that we’d expect 2000/3 ≈ 666.7
each.)

11.207 In even-numbered steps, you’ll always be on the “side” of the bipartite graph where you started; in odd-numbered
steps you’ll be on the opposite “side.” So in this graph p1000 (J) = p1000 (L) ≈ 1/2 each, and p1000 (K) = p1000 (M) = 0 if we
start at J. But if we start at K, the probabilities are reversed. Thus we haven’t converged to a unique stationary distribution.

import random

def random_walk(V, E, v_zero, steps):
    '''
    Performs a random walk on the (undirected) graph G = (V, E) starting at node v_zero,
    continuing for steps steps, and returns the node at which the walk ends.
    '''
    current = v_zero
    for i in range(steps):
        neighbors = [x for x in V if set([x, current]) in E]
        current = random.choice(neighbors)
    return current

# The graph from Figure 11.100a.
V = ['a', 'b', 'c']
E = [set(['a','b']), set(['a','c']), set(['b','c'])]

# Performs 2000 instances of a 2000-step random walk, starting from node a.
v_zero = 'a'
counts = {'a' : 0, 'b' : 0, 'c' : 0}
for i in range(2000):
    counts[random_walk(V, E, v_zero, 2000)] += 1
print(counts)

Figure S.11.3 Performing a 2000-step random walk 2000 times.

11.208 To show that p is a stationary distribution, we need to prove the following claim: if the probability distribution of
where we are in step k is p, then the probability distribution of where we are in step k + 1 is p too.
Consider a particular node u. Let v1 , v2 , . . . , vℓ denote the neighbors of u, where ℓ = degree(u). The key question is
this: what is the probability that we will be at node u in step k + 1? Note that to be at node u at step k + 1, we had to be
at one of u’s neighbors at step k and we had to follow the edge from that neighbor to u in step k + 1. And the probability
that we’re at node vi in step k is exactly p(vi ), by the assumption, and the probability that we moved from vi to u in step
k + 1 is 1/degree(vi ) because vi is a neighbor of u, and we moved from vi with equal probability to each of its degree(vi )
neighbors. Thus:

X

Pr [uk+1 = u] = Pr [uk = vi and uk+1 = u] law of total probability; we can only reach u from a neighbor
i=1

X

= Pr [uk = vi ] · Pr [uk+1 = u|uk = vi ]
i=1

X

1
= p(vi ) · assumption/definition of the random walk (see above discussion)
i=1
degree(vi )
Xℓ
degree(vi ) 1
= · definition of p
i=1
2 · m degree(vi )
X

1
=
i=1
2·m
ℓ degree(u)
= = = p(u). definition of ℓ = degree(u) and p
2·m 2·m
Thus the probability that we’re at node u in step k + 1 is precisely degree(u)/2m, which is just p(u) by definition. This
proves the theorem: we assumed that p was the distribution at step k and proved that p was therefore the distribution at
step k + 1, and therefore showed that p is a stationary distribution.
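To see the claim concretely on a graph that isn’t regular, here is a sketch (my own; the example graph is an assumption) checking that p(u) = degree(u)/2m survives one step of the walk:

```python
from fractions import Fraction

# A non-regular graph: a triangle a-b-c plus a pendant edge c-d.
adj = {'a': ['b', 'c'], 'b': ['a', 'c'], 'c': ['a', 'b', 'd'], 'd': ['c']}
m = sum(len(ns) for ns in adj.values()) // 2  # number of edges (here, 4)

# The claimed stationary distribution: p(u) = degree(u) / (2m).
p = {u: Fraction(len(adj[u]), 2 * m) for u in adj}

# One step of the walk: to be at u next, we must be at a neighbor v of u
# and then choose u among v's degree(v) neighbors.
p_next = {u: sum(p[v] * Fraction(1, len(adj[v])) for v in adj[u]) for u in adj}
assert p_next == p  # p is stationary even though the degrees differ
```

Node c has probability 3/8 and node d only 1/8, so this also illustrates that the stationary distribution weights nodes by degree rather than uniformly.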