Lin Alg Book
László Babai
in collaboration with Noah Halford
This text offers a guided tour of discovery of the foundations of Linear Algebra. The text is written
in IBL style (Inquiry-based learning): we introduce concepts and results, but most of the proofs
are left to the reader, who will build up the techniques and the theory through a series of exercises.
Further creative exercises are designed to enhance the experience of discovery. One of my favorites:
any non-zero polynomial can be multiplied by some other non-zero polynomial such that in the
product, only terms with prime number exponents can have non-zero coefficients.
Some of the surprising key results of the basic theory are highlighted as “miracles.” One of
the several equivalent formulations of the First Miracle of Linear Algebra is the impossibility of
boosting linear independence: among all linear combinations of m vectors, we shall never find m + 1
that are linearly independent. Another, equivalent formulation is that the dimension of Rn is n,
not more. The Second Miracle is that the row-rank and the column-rank of a matrix are equal, a
fact at which I never cease to marvel. What on earth does linear independence of columns have to do with
linear independence of rows? They do not even live in the same universe; the matrix does not even need
to be square. The Third Miracle is related to the second: if the columns of a real matrix form an
orthonormal basis, then so do the rows. What do the dot products of the rows have to do with the
dot products of the columns? While rushing through these basic facts, we tend to overlook their
magical quality.
∗ ∗ ∗
Linear algebra deals with objects called “vectors.” The basic operation is linear combination of
vectors, i. e., expressions of the form α1 v1 + · · · + αn vn where the vi are vectors and the αi are
“scalars.” The domain of scalars is a prespecified field. Examples of fields include Q, the set of
rational numbers, R, the set of real numbers, C, the field of complex numbers, and the finite fields
Fq . The reader who is not comfortable with the general concept of fields can assume, for most of
the material in this book, that the scalars are real numbers and in some cases, complex numbers.
© 2016 László Babai.
R and C are the two most important fields for most applications of linear algebra, although finite
fields have also been of great significance for discrete mathematics and digital communications
engineering (error correcting codes). The general concept of fields, including their characteristic,
and specifically finite fields, as well as other elements of basic abstract algebra, are introduced in
Chapter 14, at the beginning of Part II. The material of Chapter 14 should suffice for the reader
to appreciate the material of the entire book in the context of an arbitrary field as the domain of
scalars.
In the book, F denotes any field. Each chapter is marked with a subset of the symbols F, R,
C. If a chapter or section is marked R, it means that the material of that unit is restricted to real
coefficients. The title of Chapter 1 is marked (F, R), meaning that some (in fact, most) sections in
this chapter talk about an arbitrary field, but some part of the material (in this case, Section 1.5)
applies to the case of real scalars only. Chapters marked (R, C) indicate that some of the material
applies to real scalars only, and other parts of the material only to complex scalars.
The book requires a degree of mathematical maturity—a good understanding of set notation,
familiarity with proofs.
But Part I, “Matrix Theory,” is relatively hands-on; it does not require abstract algebra. Chapter
6, “Determinants,” is a cornerstone of Part I, a theory that seems to be somewhat neglected in
undergraduate linear algebra courses.
In Part II we take a more abstract approach; we discuss linear algebra in the framework of the
general concept of vector spaces, introduced in Chapter 15.
∗ ∗ ∗
The first draft of this book was written up by Noah Halford, then an undergraduate, in 2016, based
on my lectures and detailed instructions. I owe a debt of gratitude to Noah for jumpstarting this
project. However, polishing the presentation remains an ongoing effort and will take some more
time; meanwhile I advise the reader to read the material critically. I will appreciate any warnings
of the, no doubt numerous, errors.
László Babai
University of Chicago
Contents
Notation x
I Matrix Theory 1
Introduction to Part I 2
1 Column Vectors 3
1.1 Column vector basics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1.1 The domain of scalars . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.2 Subspaces and span . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.3 Linear independence and the First Miracle of Linear Algebra . . . . . . . . . . . . 11
1.4 Dot product . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.5 Dot product over R . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
1.6 Additional exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2 Matrices 20
2.1 Matrix basics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.2 Matrix multiplication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.3 Arithmetic of diagonal and triangular matrices . . . . . . . . . . . . . . . . . . . . 31
2.4 Permutation Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.5 Additional exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3 Matrix Rank 39
3.1 Column and row rank . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
6 The Determinant 61
6.1 Motivation: solving systems of linear equations . . . . . . . . . . . . . . . . . . . . 61
6.2 Permutations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
6.3 Defining the determinant . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
6.4 Properties of determinants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
6.5 Expressing rank via determinants . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
6.6 Dependence of the rank on the field of scalars . . . . . . . . . . . . . . . . . . . . . 69
6.7 Cofactor expansion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
6.8 Determinantal formula for the inverse matrix . . . . . . . . . . . . . . . . . . . . . 73
6.9 Cramer’s Rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
6.10 Additional exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
9 Orthogonal Matrices 91
9.1 Orthogonal matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
9.2 Orthogonal similarity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
14 Algebra 118
14.1 Basic concepts of arithmetic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
14.1.1 Arithmetic of sets of integers . . . . . . . . . . . . . . . . . . . . . . . . . . 118
24 Hints 217
24.1 Column Vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217
24.2 Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 220
24.3 Matrix Rank . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 222
24.4 Qualitative Theory of Systems of Linear Equations . . . . . . . . . . . . . . . . . . 223
24.5 Affine and Convex Combinations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223
24.6 The Determinant . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223
24.7 Eigenvectors and Eigenvalues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 224
25 Solutions 230
25.1 Column Vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 230
25.2 Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231
25.3 Matrix Rank . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231
25.4 Theory of Systems of Linear Equations I: Qualitative Theory . . . . . . . . . . . . 231
25.5 Affine and Convex Combinations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231
25.6 The Determinant . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231
25.7 Theory of Systems of Linear Equations II: Cramer’s Rule . . . . . . . . . . . . . . 231
25.8 Eigenvectors and Eigenvalues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231
25.9 Orthogonal Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231
25.10 The Spectral Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231
25.11 Bilinear and Quadratic Forms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233
25.12 Complex Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233
25.13 Matrix Norms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233
25.14 Basic Algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233
25.15 Vector Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233
25.16 Linear Maps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233
25.17 Minimal Polynomials of Matrices and Linear Transformations . . . . . . . . . . . . 233
25.18 Euclidean Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233
25.19 Hermitian Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233
25.20 Finite Markov Chains . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233
Notation
Symbol Meaning
[n] the set {1, . . . , n} where n is a non-negative integer
N the set {1, 2, 3, . . . } of natural numbers
N0 the set {0, 1, 2, . . . } of non-negative integers
Z the set {. . . , −2, −1, 0, 1, 2, . . . } of integers
Q the field of rational numbers
R the field of real numbers
R× the set of real numbers excluding zero
C the field of complex numbers
C× the set of complex numbers excluding zero
F an arbitrary field
F× the field F excluding zero
Zm the set of residue classes modulo m
Z×m the set of those residue classes modulo m that are relatively prime to m
Fk×n the set of k × n matrices over the field F
Mn (F) the set of n × n matrices over the field F
diag(α1 , . . . , αn ) n × n diagonal matrix with the αi in the diagonal
im(ϕ) image of the linear map ϕ
⊕ direct sum
≤ subspace
∅ the empty set
⊥ “perp” – orthogonality, orthogonal complement
Part I
Matrix Theory
Introduction to Part I
TO BE WRITTEN.
Chapter 1

Column Vectors

1.1 Column vector basics
Definition 1.1.1 (Column vector). A column vector of height k is a list of k numbers arranged in a
column, written as
$$\begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_k \end{pmatrix}$$
The k numbers in the column are referred to as the entries of the column vector; we will normally
use lower case Greek letters such as α, β, and ζ to denote these numbers. We denote column vectors
by bold letters such as u, v, w, x, y, b, e, f , etc., so we may write
$$\mathbf{v} = \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_k \end{pmatrix} .$$
∗ ∗ ∗
Definition 1.1.4 (The space Fk ). For a domain F of scalars, we define the space Fk of column vectors
of height k over F by
$$F^k := \left\{ \left.\begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_k \end{pmatrix} \;\right|\; \alpha_i \in F \right\} \tag{1.1}$$
We often write 0 instead of 0k when the height of the vector is clear from context.
Definition 1.1.6 (All-ones vector). The all-ones vector in Fk is the vector
$$\mathbf{1}_k := \begin{pmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix} \tag{1.3}$$
Exercise 1.1.9. Verify that vector addition is commutative, i. e., for v, w ∈ Fk , we have
v + w = w + v . (1.5)
Exercise 1.1.10. Verify that vector addition is associative, i. e., for u, v, w ∈ Fk , we have
u + (v + w) = (u + v) + w . (1.6)
Column vectors also carry with them the notion of “scaling” by an element of F.
Definition 1.1.11 (Multiplication of a column vector by a scalar). Let v ∈ Fk and let λ ∈ F. Then
the vector λv is the vector v after each entry has been scaled (multiplied) by a factor of λ, i. e.,
$$\lambda \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_k \end{pmatrix} = \begin{pmatrix} \lambda\alpha_1 \\ \lambda\alpha_2 \\ \vdots \\ \lambda\alpha_k \end{pmatrix} . \tag{1.7}$$
Example 1.1.12. Let $\mathbf{v} = \begin{pmatrix} 2 \\ -1 \\ 1 \end{pmatrix}$. Then $3\mathbf{v} = \begin{pmatrix} 6 \\ -3 \\ 3 \end{pmatrix}$.
Definition 1.1.13 (Linear combination). Let v1 , . . . , vm ∈ Fn . Then a linear combination of the vectors v1 , . . . , vm is a sum of the form $\sum_{i=1}^{m} \alpha_i v_i$ where α1 , . . . , αm ∈ F.
will be generalized in Ex. 1.2.7.
Exercise 1.1.18. To what does a linear combination of the empty list of vectors evaluate? ( Con-
vention 1.1.3)
Let us now consider the system of linear equations
α11 x1 + α12 x2 + · · · + α1n xn = β1
α21 x1 + α22 x2 + · · · + α2n xn = β2
.. (1.8)
.
αk1 x1 + αk2 x2 + · · · + αkn xn = βk
Given the αij and the βi , we need to find x1 , . . . , xn that satisfy these equations. This is arguably
one of the most fundamental problems of applied mathematics. We can rephrase this problem in
terms of the vectors a1 , . . . , an , b, where
$$a_j := \begin{pmatrix} \alpha_{1j} \\ \alpha_{2j} \\ \vdots \\ \alpha_{kj} \end{pmatrix} \qquad\text{and the vector}\qquad b := \begin{pmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_k \end{pmatrix}$$
represents the right-hand side. With this notation, our system of linear equations takes the more
concise form
x1 a1 + x2 a2 + · · · + xn an = b . (1.9)
The problem of solving the system of equations (1.8) therefore is equivalent to expressing the vector
b as a linear combination of the ai .
(b) If u, v ∈ W , then u + v ∈ W
Exercise 1.2.4. Show that, if W is a nonempty subset of Fn , then (c) implies (a).
Exercise 1.2.5. Show
(a) {0} ≤ Fk ;
(b) Fk ≤ Fk .
We refer to these as the trivial subspaces of Fk .
Exercise 1.2.6.
(a) The set $\left\{ \left.\begin{pmatrix} \alpha_1 \\ 0 \\ \alpha_3 \end{pmatrix} \;\right|\; \alpha_1 = 2\alpha_3 \right\}$ is a subspace of R3 .

(b) The set $\left\{ \left.\begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix} \;\right|\; \alpha_2 = \alpha_1 + 7 \right\}$ is not a subspace of R2 .
Exercise 1.2.7. Let
$$W_k = \left\{ \left.\begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_k \end{pmatrix} \in F^k \;\right|\; \sum_{i=1}^{k} \alpha_i = 0 \right\}$$
(b) The intersection of any (finite or infinite) collection of subspaces of Fn is also a subspace of
Fn .
(a) span S ⊇ S;
♦
This theorem tells us that the span exists. The next theorem constructs all the elements of the
span.
Theorem 1.2.13. For S ⊆ Fn , span S is the set of all linear combinations of the finite subsets of
S. (Note that this is true even when S is empty. Why?) ♦
Proposition 1.2.14. Let S ⊆ Fn . Then S ≤ Fn if and only if S = span(S).
Let S ⊆ Fn .
Proposition 1.2.15. Prove that span(span(S)) = span(S). Prove this
(a) based on the definition;
For subsets A, B ⊆ Fn , their sum is defined as
A + B = {a + b | a ∈ A, b ∈ B} . (1.10)
Fact 1.3.20. Every sublist of a linearly independent list of vectors is linearly independent.
The following lemma is central to the proof of the First Miracle of Linear Algebra ( Theorem
1.3.40) as well as to our characterization of bases as maximal linearly independent sets ( Prop.
1.3.37).
Lemma 1.3.21. Suppose (v1 , . . . , vk ) is a linearly independent list of vectors and the list (v1 , . . . , vk+1 )
is linearly dependent. Then vk+1 ∈ span(v1 , . . . , vk ). ♦
Proposition 1.3.22. The vectors v1 , . . . , vk are linearly dependent if and only if there is some j
such that
vj ∈ span(v1 , . . . , vj−1 , vj+1 , . . . , vk ) .
Exercise 1.3.24. Prove that a list of vectors with repeated elements (the same vector occurs more
than once) is linearly dependent. (This follows from combining which two previous exercises?)
Definition 1.3.25 (Parallel vectors). Let u, v ∈ Fn . We say that u and v are parallel if there exists
a scalar α such that u = αv or v = αu. Note that 0 is parallel to all vectors, and the relation of
being parallel is an equivalence relation ( Def. 14.2.12) on the set of nonzero vectors.
Exercise 1.3.26. Let u, v ∈ Fn . Show that the list (u, v) is linearly dependent if and only if u
and v are parallel.
Definition 1.3.28 (Rank). The rank of a set S ⊆ Fn , denoted rk S, is the size of the largest linearly
independent subset of S. The rank of a list is the rank of the set formed by its elements.
rk(S, T ) ≤ rk S + rk T (1.11)
where rk(S, T ) is the rank of the list obtained by concatenating the lists S and T .
Definition 1.3.34 (Standard basis of Fk ). The standard basis of Fk is the basis (e1 , . . . , ek ), where
ei is the column vector which has its i-th component equal to 1 and all other components equal to
0. The vectors e1 , . . . , ek are sometimes called the standard unit vectors.
For example, the standard basis of F3 is
$$\left( \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} \right) .$$
Exercise 1.3.38. Prove: every subspace W ≤ Fk has a basis. You may assume the fact, to be proved shortly ( Cor. 1.3.49), that every linearly independent list of vectors in Fk has size at most k.
Next we state a central result to which the entire field of linear algebra arguably owes its
character.
Theorem 1.3.40 (First Miracle of Linear Algebra). Let v1 , . . . , vk be linearly independent with
vi ∈ span(w1 , . . . , wm ) for all i. Then k ≤ m.
The proof of this theorem requires the following lemma.
Lemma 1.3.41 (Steinitz exchange lemma). Let (v1 , . . . , vk ) be a linearly independent list of vectors
such that vi ∈ span(w1 , . . . , wm ) for all i. Then there exists j (1 ≤ j ≤ m) such that the list
(wj , v2 , . . . , vk ) is linearly independent. ♦
Exercise 1.3.42. Use the Steinitz exchange lemma to prove the First Miracle of Linear Algebra.
Corollary 1.3.43. Let W ≤ Fk . Every basis of W has the same size (same number of vectors).
Exercise 1.3.44. Prove: Cor. 1.3.43 is equivalent to the First Miracle, i. e., infer the First Miracle
from Cor. 1.3.43.
Corollary 1.3.45. Every basis of Fk has size k.
The following result is essentially a restatement of the First Miracle of Linear Algebra.
Corollary 1.3.46. rk(v1 , . . . , vk ) = dim (span(v1 , . . . , vk )).
Exercise 1.3.47. Prove Cor. 1.3.46 is equivalent to the First Miracle, i. e., infer the First Miracle
from Cor. 1.3.46.
Corollary 1.3.48. dim Fk = k.
Corollary 1.3.49. Every linearly independent list of vectors in Fk has size at most k.
Corollary 1.3.50. Let W ≤ Fk and let L be a linearly independent list of vectors in W . Then L
can be extended to a basis of W .
Exercise 1.3.51. Let U1 , U2 ≤ Fn with U1 ∩ U2 = {0}. Let v1 , . . . , vk ∈ U1 and w1 , . . . , wℓ ∈ U2 . If the lists (v1 , . . . , vk ) and (w1 , . . . , wℓ ) are linearly independent, then so is the concatenated list (v1 , . . . , vk , w1 , . . . , wℓ ).
Proposition 1.3.52. Let U1 , U2 ≤ Fn with U1 ∩ U2 = {0}. Then
dim U1 + dim U2 ≤ n . (1.12)
Proposition 1.3.53 (Modular equation). Let U1 , U2 ≤ Fk . Then
dim(U1 + U2 ) + dim(U1 ∩ U2 ) = dim U1 + dim U2 . (1.13)
x·y =y·x .
Exercise 1.4.3. Show that the dot product is distributive, that is,
x · (y + z) = x · y + x · z .
Exercise 1.4.4. Show that the dot product is bilinear ( Def. 6.4.9), i. e., for x, y, z ∈ Fn and
α ∈ F, we have
(x + z) · y = x · y + z · y (1.15)
(αx) · y = α(x · y) (1.16)
x · (y + z) = x · y + x · z (1.17)
x · (αy) = α(x · y) (1.18)
Exercise 1.4.5. Show that the dot product preserves linear combinations, i. e., for x1 , . . . , xk , y ∈
Fn and α1 , . . . , αk ∈ F, we have
$$\left( \sum_{i=1}^{k} \alpha_i x_i \right) \cdot y = \sum_{i=1}^{k} \alpha_i (x_i \cdot y) \tag{1.19}$$
(a) 1k · 1k for k ≥ 1.
(b) x · x where $x = \begin{pmatrix} \alpha_0 \\ \alpha_1 \\ \vdots \\ \alpha_k \end{pmatrix} \in F^{k+1}$
Exercise 1.4.12.
Theorem 1.4.13. If v1 , . . . , vk are pairwise orthogonal and non-isotropic non-zero vectors, then
they are linearly independent. ♦
Exercise 1.6.3. Let A, B ⊆ {1, . . . , n}. Express the dot product vA · vB in terms of the sets A
and B.
Chapter 2

Matrices
We may write M = (αi,j )k×n to indicate a matrix whose entry in position (i, j) (i-th row, j-th
column) is αi,j . For typographical convenience we usually omit the comma separating the row
index and the column index and simply write αij instead of αi,j ; we use the comma if its omission
would lead to ambiguity. So we write M = (αij )k×n , or simply M = (αij ) if the values k and
n are clear from context. We also write (M )ij to indicate the (i, j) entry of the matrix M , i. e.,
αij = (M )ij .
Definition 2.1.3 (The space Fk×n ). The set of k × n matrices with entries from the domain F of
scalars is denoted by Fk×n . Recall that F always denotes a field. Square matrices (k = n) have
special significance, so we write Mn (F) := Fn×n . We identify M1 (F) with F and omit the matrix notation, i. e., we write α rather than (α). An integral matrix is a matrix with integer entries.
Naturally, Zk×n will denote the set of k × n integral matrices, and Mn (Z) = Zn×n . Recall that Z is
not a field.
Example 2.1.4. For example, $\begin{pmatrix} 0 & -1 & 4 & 7 \\ -3 & 5 & 6 & 8 \end{pmatrix} \in R^{2\times 4}$ is a 2 × 4 matrix and $\begin{pmatrix} 2 & 6 & 9 \\ 3 & -4 & -2 \\ -5 & 1 & 4 \end{pmatrix} \in M_3(R)$ is a 3 × 3 matrix.
Observe that the column vectors of height k introduced in Chapter 1 are k × 1 matrices, so
Fk = Fk×1 . Moreover, every statement about column vectors in Chapter 1 applies analogously to
1 × n matrices (“row vectors”).
Notation 2.1.5. When writing row vectors, we use commas to avoid ambiguity, so we write, for example, (3, 5, −1) instead of $\begin{pmatrix} 3 & 5 & -1 \end{pmatrix}$.
Notation 2.1.6 (Zero matrix). The k × n matrix with all of its entries equal to 0 is called the zero
matrix and is denoted by 0k×n , or simply by 0 if k and n are clear from context.
Notation 2.1.7 (All-ones matrix). The k × n matrix with all of its entries equal to 1 is denoted by
Jk×n or J. We write Jn for Jn×n .
Definition 2.1.8 (Diagonal matrix). A matrix A = (αij ) ∈ Mn (F) is diagonal if αij = 0 whenever i ≠ j. The n × n diagonal matrix with entries λ1 , . . . , λn is denoted by diag(λ1 , . . . , λn ).
Example 2.1.9.
$$\operatorname{diag}(5, 3, 0, -1, 5) = \begin{pmatrix} 5 & 0 & 0 & 0 & 0 \\ 0 & 3 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & 0 & 5 \end{pmatrix}$$
Notation 2.1.10. To avoid filling most of a matrix with the number “0”, we often write matrices
like the one above as
$$\operatorname{diag}(5, 3, 0, -1, 5) = \begin{pmatrix} 5 & & & & \mbox{\Large $0$} \\ & 3 & & & \\ & & 0 & & \\ & & & -1 & \\ \mbox{\Large $0$} & & & & 5 \end{pmatrix}$$
where the big 0 symbol means that every entry in the triangles above or below the diagonal is 0.
Definition 2.1.11 (Upper and lower triangular matrices). A matrix A = (αij ) ∈ Mn (F) is upper
triangular if αij = 0 whenever i > j. A is said to be strictly upper triangular if αij = 0 whenever
i ≥ j. Lower triangular and strictly lower triangular matrices are defined analogously.
Examples 2.1.12.

(a) $\begin{pmatrix} 5 & 2 & 0 & 7 & 2 \\ 0 & 3 & 0 & -4 & 0 \\ 0 & 0 & 0 & 6 & 0 \\ 0 & 0 & 0 & -1 & -3 \\ 0 & 0 & 0 & 0 & 5 \end{pmatrix}$ is upper triangular.

(b) $\begin{pmatrix} 0 & 2 & 0 & 7 & 2 \\ 0 & 0 & 0 & -4 & 0 \\ 0 & 0 & 0 & 6 & 0 \\ 0 & 0 & 0 & 0 & -3 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}$ is strictly upper triangular.

(c) $\begin{pmatrix} 5 & 0 & 0 & 0 & 0 \\ 2 & 3 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 7 & -4 & 6 & -1 & 0 \\ 2 & 0 & 0 & -3 & 5 \end{pmatrix}$ is lower triangular.

(d) $\begin{pmatrix} 0 & 0 & 0 & 0 & 0 \\ 2 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 7 & -4 & 6 & 0 & 0 \\ 2 & 0 & 0 & -3 & 0 \end{pmatrix}$ is strictly lower triangular.
Fact 2.1.13. The diagonal matrices are the matrices which are simultaneously upper and lower
triangular.
Definition 2.1.14 (Matrix transpose). The transpose of a k × ` matrix M = (αij ) is the ` × k matrix
(βij ) defined by
βij = αji (2.1)
and is denoted M T . (We flip it across its main diagonal, so the rows of A become the columns of
AT and vice versa.)
Examples 2.1.15.
(a) $\begin{pmatrix} 3 & 1 & 4 \\ 1 & 5 & 9 \end{pmatrix}^{T} = \begin{pmatrix} 3 & 1 \\ 1 & 5 \\ 4 & 9 \end{pmatrix}$
(c) In Examples 2.1.12, the matrix (c) is the transpose of (a), and (d) is the transpose of (b).
Fact 2.1.16. Let A be a matrix. Then $(A^T)^T = A$.
Definition 2.1.17 (Symmetric matrix). A matrix M is symmetric if M = M T .
Note that if a matrix M ∈ Fk×ℓ is symmetric then k = ℓ (M is square).
Example 2.1.18. The matrix $\begin{pmatrix} 1 & 3 & 0 \\ 3 & 5 & -2 \\ 0 & -2 & 4 \end{pmatrix}$ is symmetric.
Definition 2.1.19 (Matrix addition). Let A = (αij ) and B = (βij ) be k × n matrices. Then the sum A + B is the k × n matrix with entries (A + B)ij = αij + βij .
Proposition 2.1.23 (Associativity). Matrix addition obeys the associative law: if A, B, C ∈ Fk×n ,
then (A + B) + C = A + (B + C).
Definition 2.1.24 (The negative of a matrix). Let A ∈ Fk×n be a matrix. Then −A is the k × n
matrix defined by (−A)ij = −(A)ij .
Definition 2.1.26 (Multiplication of a matrix by a scalar). Let A = (αij ) ∈ Fk×n , and let ζ ∈ F.
Then ζA is the k × n matrix whose (i, j) entry is ζ · αij .
Example 2.1.27.
$$3 \begin{pmatrix} 1 & 2 \\ -3 & 4 \\ 0 & 6 \end{pmatrix} = \begin{pmatrix} 3 & 6 \\ -9 & 12 \\ 0 & 18 \end{pmatrix}$$
$(AB)^T = B^T A^T$ . (2.4)
Proposition 2.2.3 (Distributivity). Matrix multiplication obeys the right distributive law: if A ∈
Fk×n and B, C ∈ Fn×` , then A(B + C) = AB + AC. Analogously, it obeys the left distributive law:
if A, B ∈ Fk×n and C ∈ Fn×` , then (A + B)C = AC + BC.
Proposition 2.2.4 (Associativity). Matrix multiplication obeys the associative law: if A, B, and
C are matrices with compatible dimensions, then (AB)C = A(BC).
Proposition 2.2.5 (Matrix multiplication vs. scaling). Let A ∈ Fk×n , B ∈ Fn×ℓ , and α ∈ F. Then (αA)B = A(αB) = α(AB).
Numerical exercise 2.2.6. For each of the following triples of matrices, compute the products
AB, AC, and A(B + C). Self-check : verify that A(B + C) = AB + AC.
(a) $A = \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix}$, $B = \begin{pmatrix} 3 & 1 & 0 \\ -4 & 2 & 5 \end{pmatrix}$, $C = \begin{pmatrix} 1 & -7 & -4 \\ 5 & 3 & -6 \end{pmatrix}$

(b) $A = \begin{pmatrix} 2 & 5 \\ 1 & 1 \\ 3 & -3 \end{pmatrix}$, $B = \begin{pmatrix} 4 & 6 & 3 & 1 \\ 3 & 3 & -5 & 4 \end{pmatrix}$, $C = \begin{pmatrix} 1 & -4 & -1 & 5 \\ 2 & 4 & 10 & -7 \end{pmatrix}$

(c) $A = \begin{pmatrix} 3 & 1 & 2 \\ 4 & -2 & 4 \\ 1 & -3 & -2 \end{pmatrix}$, $B = \begin{pmatrix} -1 & 4 \\ 3 & 2 \\ -5 & -2 \end{pmatrix}$, $C = \begin{pmatrix} 2 & -3 \\ -6 & -2 \\ 0 & 1 \end{pmatrix}$
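For readers who like to check such computations by machine, here is a minimal NumPy sketch (an illustration added for this purpose, not part of the exercise) that verifies A(B + C) = AB + AC for the triple in part (a):

    import numpy as np

    # Matrices from part (a) of Numerical exercise 2.2.6
    A = np.array([[1, 2], [2, 1]])
    B = np.array([[3, 1, 0], [-4, 2, 5]])
    C = np.array([[1, -7, -4], [5, 3, -6]])

    lhs = A @ (B + C)          # left-hand side of the right distributive law
    rhs = A @ B + A @ C        # right-hand side
    print(lhs)
    print(np.array_equal(lhs, rhs))   # expected: True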
Exercise 2.2.7. Let A ∈ Fk×n , and let $E_{ij}^{(\ell\times m)}$ be the ℓ × m matrix with a 1 in the (i, j) position and 0 everywhere else.

(a) What is $E_{ij}^{(k\times k)} A$?

(b) What is $A\, E_{ij}^{(n\times n)}$?
Definition 2.2.8 (Rotation matrix). The rotation matrix Rθ is the matrix defined by
$$R_\theta = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} . \tag{2.6}$$
As we shall see, this matrix is intimately related to the rotation of the Euclidean plane by θ
( Example 16.5.2).
Exercise 2.2.9. Prove Rα+β = Rα Rβ . Your proof may use the addition theorems for the trigono-
metric functions. Later when we learn about the connection between matrices and linear transfor-
mations, we shall give a direct proof of this fact which will imply the addition theorems ( Ex.
16.5.11).
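A quick numerical sanity check of this identity is easy to run (the angles below are arbitrary choices; agreement up to floating-point error illustrates, but of course does not prove, the statement):

    import numpy as np

    def rotation(theta):
        # the 2x2 rotation matrix R_theta of (2.6)
        return np.array([[np.cos(theta), -np.sin(theta)],
                         [np.sin(theta),  np.cos(theta)]])

    alpha, beta = 0.7, 1.9
    print(np.allclose(rotation(alpha + beta), rotation(alpha) @ rotation(beta)))   # True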
Definition 2.2.10 (Identity matrix). The n × n identity matrix, denoted In or I, is the diagonal
matrix whose diagonal entries are all 1, i. e.,
$$I = \operatorname{diag}(1, 1, \dots, 1) = \begin{pmatrix} 1 & & & \mbox{\Large $0$} \\ & 1 & & \\ & & \ddots & \\ \mbox{\Large $0$} & & & 1 \end{pmatrix} \tag{2.7}$$
This is also written as I = (δij ), where the Kronecker delta symbol δij is defined by
$$\delta_{ij} = \begin{cases} 1 & i = j \\ 0 & i \neq j \end{cases} \tag{2.8}$$
Fact 2.2.11. The columns of I are the standard unit vectors ( Def. 1.3.34).
For every A ∈ Fk×n ,
Ik A = AIn = A . (2.9)
Definition 2.2.13 (Scalar matrix). The matrix A ∈ Mn (F) is a scalar matrix if A = αI for some
α ∈ F.
Exercise 2.2.14. Let A ∈ Fk×n and let B ∈ Fℓ×k . Let D = diag(λ1 , . . . , λk ). Show

(a) DA is the matrix obtained by scaling the i-th row of A by λi for each i;

(b) BD is the matrix obtained by scaling the j-th column of B by λj for each j.
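The following hedged NumPy sketch (with arbitrarily chosen λi , A, and B) illustrates the two claims of this exercise:

    import numpy as np

    lams = np.array([2.0, -1.0, 3.0])
    D = np.diag(lams)                      # D = diag(2, -1, 3)
    A = np.arange(12.0).reshape(3, 4)      # a 3x4 matrix
    B = np.arange(6.0).reshape(2, 3)       # a 2x3 matrix

    print(np.allclose(D @ A, lams[:, None] * A))   # DA: row i of A scaled by lambda_i
    print(np.allclose(B @ D, B * lams[None, :]))   # BD: column j of B scaled by lambda_j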
Definition 2.2.15 (ℓ-th power of a matrix). Let A ∈ Mn (F) and ℓ ≥ 0. We define Aℓ , the ℓ-th power of A, inductively:

(i) A0 = I ;

(ii) Aℓ+1 = A · Aℓ .

So
$$A^\ell = \underbrace{A \cdots A}_{\ell\ \text{times}} . \tag{2.10}$$
(a) Aℓ+m = Aℓ Am ;

(b) (Aℓ )m = Aℓm .
Definition 2.2.18 (Nilpotent matrix). The matrix N ∈ Mn (F) is nilpotent if there exists an integer
k such that N k = 0.
So every strictly upper triangular matrix is nilpotent. Later we shall see that a matrix is nilpotent if and only if it is “similar” ( Def. 8.2.1) to a strictly upper triangular matrix ( Ex. 8.2.4).
Notation 2.2.20. We denote by Nn the n × n matrix defined by
$$N_n = \begin{pmatrix} 0 & 1 & & & \\ & 0 & 1 & & \\ & & \ddots & \ddots & \\ & & & 0 & 1 \\ & & & & 0 \end{pmatrix} . \tag{2.11}$$
Exercise 2.2.21. Find $N_n^k$ for k ≥ 0.
Fact 2.2.22. In Section 1.4, we defined the dot product of two vectors ( Def. 1.4.1). The dot product may also be defined in terms of matrix multiplication. Let x, y ∈ Fk . Then
x · y = xT y . (2.12)
A = [a1 | · · · | an ]
Exercise 2.2.25 (Extracting columns and elements of a matrix via multiplication). Let ei be the
i-th column of I (i. e., the i-th standard unit vector), and let A = (αij ) = [a1 | · · · | an ]. Then
(a) Aej = aj ;
Exercise 2.2.26. [Linear combination as a matrix product] Let A = [a1 | · · · | an ] ∈ Fk×n and let
x = (α1 , . . . , αn )T ∈ Fn . Show that
Ax = α1 a1 + · · · + αn an . (2.13)
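A small NumPy illustration of (2.13) (the matrix and coefficients below are made-up sample data):

    import numpy as np

    A = np.array([[1.0, 4.0, 0.0],
                  [2.0, 5.0, 1.0]])        # columns a_1, a_2, a_3
    x = np.array([3.0, -1.0, 2.0])         # coefficients alpha_1, alpha_2, alpha_3

    combo = sum(x[j] * A[:, j] for j in range(A.shape[1]))
    print(np.allclose(A @ x, combo))       # True: Ax = alpha_1 a_1 + ... + alpha_n a_n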
Exercise 2.2.27 (Left multiplication acts column by column). Let A ∈ Fk×n and let B = [b1 | · · · | bℓ ] ∈ Fn×ℓ . (The bi are the columns of B.) Then
AB = [Ab1 | · · · | Abℓ ] .
Proposition 2.2.28 (No cancellation). For all n ≥ 2 and k ≥ 1 and for all x ∈ Fn , there exist
k × n matrices A and B such that Ax = Bx but A ≠ B.
Corollary 2.2.30 (Cancellation). If A, B ∈ Fk×n are matrices such that Ax = Bx for all x ∈ Fn ,
then A = B. Note: compare with Prop. 2.2.28.
Proposition 2.2.31 (No double cancellation). Let k ≥ 2. Then for all n and for all x ∈ Fk and
y ∈ Fn , there exist k × n matrices A and B such that xT Ay = xT By but A ≠ B.
Corollary 2.2.33 (Double cancellation). If A, B ∈ Fk×n are matrices such that xT Ay = xT By for
all x ∈ Fk and y ∈ Fn , then A = B.
Definition 2.2.34 (Trace). The trace of a square matrix A = (αij ) ∈ Mn (F) is the sum of its diagonal
entries, that is,
$$\operatorname{Tr}(A) = \sum_{i=1}^{n} \alpha_{ii} \tag{2.15}$$
Examples 2.2.35.
(a) Tr(In ) = n
(b) $\operatorname{Tr} \begin{pmatrix} 3 & 1 & 2 \\ 4 & -2 & 4 \\ 1 & -3 & -2 \end{pmatrix} = -1$
For v, w ∈ Fk ,
Tr(vwT ) = vT w . (2.16)
Exercise 2.2.39. Show that the trace of a product is invariant under a cyclic permutation of the
terms, i. e., if A1 , . . . , Ak are matrices such that the product A1 · · · Ak is defined and is a square
matrix, then
Tr(A1 · · · Ak ) = Tr(Ak A1 · · · Ak−1 ) . (2.18)
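The cyclic invariance is easy to spot-check numerically; in the sketch below (an added illustration, with random matrices whose sizes are chosen so that the product is square), the trace is unchanged when the last factor is moved to the front:

    import numpy as np

    rng = np.random.default_rng(0)
    A1 = rng.standard_normal((2, 3))
    A2 = rng.standard_normal((3, 4))
    A3 = rng.standard_normal((4, 2))       # A1 @ A2 @ A3 is 2x2, hence square

    lhs = np.trace(A1 @ A2 @ A3)
    rhs = np.trace(A3 @ A1 @ A2)           # cyclic shift of the factors
    print(np.isclose(lhs, rhs))            # True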
Exercise 2.2.40. Show that the trace of a product is not invariant under all permutations of the
terms. In particular, find 2 × 2 matrices A, B, and C such that
Tr(ABC) ≠ Tr(BAC) .
Exercise 2.2.41 (Trace cancellation). Let B, C ∈ Mn (F). Show that if Tr(AB) = Tr(AC) for all
A ∈ Mn (F), then B = C.
Let A = diag(α1 , . . . , αn ) and B = diag(β1 , . . . , βn ) be diagonal matrices and let λ ∈ F. Then
A + B = diag(α1 + β1 , . . . , αn + βn ) (2.19)
λA = diag(λα1 , . . . , λαn ) (2.20)
AB = diag(α1 β1 , . . . , αn βn ) . (2.21)
Proposition 2.3.2. Let A = diag(α1 , . . . , αn ) be a diagonal matrix. Then $A^k = \operatorname{diag}(\alpha_1^k, \dots, \alpha_n^k)$ for all k.
Definition 2.3.3 (Substitution of a matrix into a polynomial). Let f ∈ F[t] be the polynomial ( Def.
8.3.1) defined by
f = α0 + α1 t + · · · + αd td .
Just as we may substitute ζ ∈ F for the variable t in f to obtain a value f (ζ) ∈ F, we may also
“plug in” the matrix A ∈ Mn (F) to obtain f (A) ∈ Mn (F). The only thing we have to be careful
about is what we do with the scalar term α0 ; we replace it with α0 times the identity matrix, so
f (A) := α0 I + α1 A + · · · + αd Ad . (2.22)
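A short NumPy sketch of this substitution (the polynomial f and the matrix A below are arbitrary choices for illustration); the constant term α0 is replaced by α0 I, exactly as in (2.22):

    import numpy as np

    def poly_at_matrix(coeffs, A):
        # coeffs = [alpha_0, alpha_1, ..., alpha_d]; returns f(A) as in (2.22)
        n = A.shape[0]
        result = np.zeros((n, n))
        power = np.eye(n)                  # A^0 = I
        for alpha in coeffs:
            result = result + alpha * power
            power = power @ A
        return result

    A = np.array([[2.0, 1.0], [0.0, 3.0]])
    f = [5.0, -1.0, 2.0]                   # f(t) = 5 - t + 2 t^2
    print(poly_at_matrix(f, A))
    print(5 * np.eye(2) - A + 2 * np.linalg.matrix_power(A, 2))   # same matrix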
In our discussion of the arithmetic of triangular matrices, we focus on the diagonal entries.
Notation 2.3.5. For the remainder of this section, the symbol ∗ in a matrix will represent an arbitrary
value with which we will not concern ourselves. We write
$$\begin{pmatrix} \alpha_1 & & & \mbox{\Large $*$} \\ & \alpha_2 & & \\ & & \ddots & \\ \mbox{\Large $0$} & & & \alpha_n \end{pmatrix}$$
for
$$\begin{pmatrix} \alpha_1 & * & \cdots & * \\ & \alpha_2 & \cdots & * \\ & & \ddots & \vdots \\ \mbox{\Large $0$} & & & \alpha_n \end{pmatrix} .$$
Proposition 2.3.6. Let
$$A = \begin{pmatrix} \alpha_1 & & & \mbox{\Large $*$} \\ & \alpha_2 & & \\ & & \ddots & \\ \mbox{\Large $0$} & & & \alpha_n \end{pmatrix} \qquad\text{and}\qquad B = \begin{pmatrix} \beta_1 & & & \mbox{\Large $*$} \\ & \beta_2 & & \\ & & \ddots & \\ \mbox{\Large $0$} & & & \beta_n \end{pmatrix}$$
be upper triangular matrices and let λ ∈ F. Then
$$A + B = \begin{pmatrix} \alpha_1+\beta_1 & & & \mbox{\Large $*$} \\ & \alpha_2+\beta_2 & & \\ & & \ddots & \\ \mbox{\Large $0$} & & & \alpha_n+\beta_n \end{pmatrix} \tag{2.24}$$
$$\lambda A = \begin{pmatrix} \lambda\alpha_1 & & & \mbox{\Large $*$} \\ & \lambda\alpha_2 & & \\ & & \ddots & \\ \mbox{\Large $0$} & & & \lambda\alpha_n \end{pmatrix} \tag{2.25}$$
$$AB = \begin{pmatrix} \alpha_1\beta_1 & & & \mbox{\Large $*$} \\ & \alpha_2\beta_2 & & \\ & & \ddots & \\ \mbox{\Large $0$} & & & \alpha_n\beta_n \end{pmatrix} . \tag{2.26}$$
Consequently, for A as in Prop. 2.3.6,
$$A^k = \begin{pmatrix} \alpha_1^k & & & \mbox{\Large $*$} \\ & \alpha_2^k & & \\ & & \ddots & \\ \mbox{\Large $0$} & & & \alpha_n^k \end{pmatrix} \tag{2.27}$$
for all k.
Proposition 2.3.8. Let f ∈ F[t] be a polynomial and let A be as in Prop. 2.3.6. Then
$$f(A) = \begin{pmatrix} f(\alpha_1) & & & \mbox{\Large $*$} \\ & f(\alpha_2) & & \\ & & \ddots & \\ \mbox{\Large $0$} & & & f(\alpha_n) \end{pmatrix} . \tag{2.28}$$
FIGURE HERE
Fact 2.4.2. The number of rook arrangements on an n × n chessboard is n!.
Definition 2.4.3 (Permutation matrix). A permutation matrix is a square matrix with the following
properties.
(a) Every nonzero entry is equal to 1 .
(b) Each row and column has exactly one nonzero entry.
Observe that rook arrangements correspond to permutation matrices where each rook is placed
on a 1. Permutation matrices will be revisited in Chapter 6 where we discuss the determinant.
Recall that we denote by [n] the set of integers {1, . . . , n}.
Example 2.4.5. Consider the permutation σ of the set [6] given by the table

i   1 2 3 4 5 6
iσ  3 6 4 1 5 2

The same permutation is also given by the table

i   4 6 1 5 2 3
iσ  1 2 3 5 6 4

(the order of the columns does not matter).
Moreover, the permutation can be represented by a diagram, where the arrow i 7→ iσ means
that i maps to iσ .
FIGURE HERE
Definition 2.4.6 (Composition of permutations). Let σ, τ : [n] → [n] be permutations. The composition of τ with σ, denoted στ , is the permutation which maps i to iστ defined by
iστ := (iσ )τ .
Example 2.4.7. Let σ be the permutation of Example 2.4.5 and let τ be the permutation given
in the table
i 1 2 3 4 5 6
iτ 4 1 6 5 2 3
Then we can find the table representing the permutation στ by rearranging the table for τ so
that its first row is in the same order as the second row of the table for σ, i. e.,
i 3 6 4 1 5 2
iτ 6 3 5 4 2 1
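The bookkeeping can also be done mechanically. The Python sketch below (added as an illustration) stores each permutation as a dictionary i ↦ iσ and composes the σ and τ of this example by applying σ first and then τ:

    # permutations of {1,...,6}, stored as dictionaries i -> image of i
    sigma = {1: 3, 2: 6, 3: 4, 4: 1, 5: 5, 6: 2}
    tau   = {1: 4, 2: 1, 3: 6, 4: 5, 5: 2, 6: 3}

    def compose(s, t):
        # the permutation "s followed by t": i -> t(s(i))
        return {i: t[s[i]] for i in s}

    print(compose(sigma, tau))   # {1: 6, 2: 3, 3: 5, 4: 4, 5: 2, 6: 1}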
Definition 2.5.2 (Commutator). Let A, B ∈ Mn (F). The commutator of A and B is the matrix
[A, B] := AB − BA.
Definition 2.5.3. Two matrices A, B ∈ Mn (F) commute if AB = BA, i. e., if [A, B] = 0.
Exercise 2.5.4. (a) Find an example of two 2 × 2 matrices that do not commute.
Exercise 2.5.5. Let D ∈ Mn (F) be a diagonal matrix such that all diagonal entries are distinct.
Show that if A ∈ Mn (F) commutes with D then A is a diagonal matrix.
Exercise 2.5.6. Show that only the scalar matrices ( Def. 2.2.13) commute with all matrices
in Mn (F). (A scalar matrix is a matrix of the form λI.)
Exercise 2.5.7.
(a) Show that the commutator of two matrices over C is never the identity matrix.
Exercise 2.5.8 (Submatrix sum). Let I1 ⊆ [k] and I2 ⊆ [n], and let B be the submatrix ( Def.
3.3.11) of A ∈ Fk×n with entries αij for i ∈ I1 , j ∈ I2 . Find vectors a and b such that aT Ab equals
the sum of the entries of B.
Exercise 2.5.10. Let A be a Vandermonde matrix generated by distinct αi . Show that the rows
of A are linearly independent. Do not use determinants.
Exercise 2.5.11. Prove that polynomials of a matrix commute: let A be a square matrix and let
f, g ∈ F[t]. Then f (A) and g(A) commute. In particular, A commutes with f (A).
Definition 2.5.12 (Circulant matrix). The circulant matrix generated by the sequence (α0 , α1 , . . . , αn−1 ) of scalars is the n × n matrix
$$C(\alpha_0, \alpha_1, \dots, \alpha_{n-1}) = \begin{pmatrix} \alpha_0 & \alpha_1 & \cdots & \alpha_{n-1} \\ \alpha_{n-1} & \alpha_0 & \cdots & \alpha_{n-2} \\ \vdots & \vdots & \ddots & \vdots \\ \alpha_1 & \alpha_2 & \cdots & \alpha_0 \end{pmatrix} \tag{2.31}$$
Exercise 2.5.13. Prove that all circulant matrices commute. Prove this
(a) directly,
(b) in a more elegant way, by showing that all circulant matrices are polynomials of a particular
circulant matrix.
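A numerical illustration of part (a) (it verifies the claim on one sample pair; it does not prove it): the function below builds a circulant matrix as in (2.31), and the two sample circulants indeed commute.

    import numpy as np

    def circulant(row):
        # C(alpha_0, ..., alpha_{n-1}) as in (2.31): each row is the previous
        # row shifted one position to the right
        row = np.asarray(row, dtype=float)
        return np.array([np.roll(row, i) for i in range(len(row))])

    A = circulant([1.0, 2.0, 3.0, 4.0])
    B = circulant([0.0, -1.0, 5.0, 2.0])
    print(np.allclose(A @ B, B @ A))   # True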
Definition 2.5.14 (Jordan block). For λ ∈ C and n ≥ 1, the Jordan block J(n, λ) is the matrix
J(n, λ) := λI + Nn (2.32)
Exercise 2.5.15. Let f ∈ F[t]. Prove that f (J(n, λ)) is the matrix
$$\begin{pmatrix}
f(\lambda) & f'(\lambda) & \tfrac{1}{2!} f^{(2)}(\lambda) & \cdots & \tfrac{1}{(n-1)!} f^{(n-1)}(\lambda) \\
 & f(\lambda) & f'(\lambda) & \cdots & \tfrac{1}{(n-2)!} f^{(n-2)}(\lambda) \\
 & & \ddots & \ddots & \vdots \\
 & & & f(\lambda) & f'(\lambda) \\
\mbox{\Large $0$} & & & & f(\lambda)
\end{pmatrix} \tag{2.33}$$
Exercise 2.5.16. The converse of the second statement in Ex. 2.5.11 would be:
(∗) The only matrices that commute with A are the polynomials of A.
(c) Prove: (∗) is true for Jordan blocks.
(d) Characterize the matrices over C for which (∗) is true, in terms of their Jordan blocks ( ??).
Project 2.5.17. For A ∈ Mn (R), let f (A, k) denote the largest absolute value of all entries of Ak ,
and define
Mn(λ) (R) := {A ∈ Mn (R) | f (A, 1) ≤ λ} (2.34)
(the matrices where all entries have absolute value ≤ λ). Define
| rkcol (A) − rkcol (B)| ≤ rkcol (A + B) ≤ rkcol (A) + rkcol (B) . (3.1)
Exercise 3.1.6. Let A be a k × n matrix and let B be an n × ` matrix. Then col(AB) ≤ col(A).
Exercise 3.1.7. Let A and B be matrices such that the product matrix AB is defined. Then
rkcol (AB) ≤ rkcol A.
We shall see that this terminology is consistent with the general definition of nonsingular matrices ( Def. 3.3.10).
Proposition 3.2.5. Let A be a square matrix with linearly independent rows. Then by performing
a series of row operations followed by a permutation of the columns, we can transform the matrix
into a nonsingular diagonal matrix.
Proposition 3.2.6. Let A be a matrix with linearly independent rows. Then by performing a
series of row operations and by permuting rows and columns, we can bring the matrix into the 1 × 2
block matrix form [D | B] where D is a nonsingular diagonal matrix.
Proposition 3.2.7. By a series of elementary row operations followed by a permutation of the
columns and a permutation of the rows, any matrix can be transformed into a 2 × 2 block matrix
where the top left block is a nonsingular diagonal matrix and both bottom blocks are 0.
The process of transforming a matrix into this form by performing a series of elementary row
operations followed by a permutation of the rows and a permutation of the columns is called
Gaussian elimination.
Corollary 3.2.8. By a series of elementary row operations and elementary column operations
followed by a permutation of the columns and a permutation of the rows, any matrix can be
transformed into a 2 × 2 block matrix where the top left block is a nonsingular diagonal matrix and
the other three blocks are 0.
Theorem 3.3.1 (Second Miracle of Linear Algebra). The row-rank of a matrix is equal to its
column-rank.
This result will be an immediate consequence of the following two lemmas, together with Corol-
lary 3.2.8.
Lemma 3.3.2. Elementary column operations do not change the column-rank of a matrix. In fact,
elementary column operations do not change the column space of a matrix. ♦
Lemma 3.3.3. Elementary row operations do not change the column-rank of a matrix.
Exercise 3.3.4. Use these two lemmas, together with Corollary 3.2.8, to prove Theorem 3.3.1.
The proof of Lemma 3.3.2 is very simple. The proof of Lemma 3.3.3 is somewhat more involved;
we will break it into exercises.
The following exercise demonstrates why the proof of Lemma 3.3.3 is not as straightforward as
the proof of Lemma 3.3.2.
Exercise 3.3.5. An elementary row operation can change the column space of a matrix.
Proposition 3.3.6. Let A = [a1 | · · · | an ] ∈ Fk×n be a matrix, and suppose the linear relation
$$\sum_{i=1}^{n} \alpha_i a_i = 0$$
holds. Let A′ = [a′1 | · · · | a′n ] be obtained from A by an elementary row operation. Then the same relation holds among the columns of A′, i. e., $\sum_{i=1}^{n} \alpha_i a'_i = 0$.
Corollary 3.3.7. If the columns vi1 , . . . , viℓ are linearly independent, then this remains true after an elementary row operation.
∗ ∗ ∗
Because the Second Miracle of Linear Algebra establishes that the row-rank and column-rank
of a matrix A are equal, it is no longer necessary to differentiate between them; this quantity is
simply referred to as the rank of A, denoted rk(A).
Definition 3.3.9 (Full rank). Let A ∈ Mn (F) be a square matrix. We say that A has full rank if
rk A = n.
Observe that only in the case of square matrices is the concept of “full rank” defined. On the other hand, the notions of full column-rank and full row-rank ( Def. 3.1.4) retain their importance.
Definition 3.3.10. [Nonsingular matrix] Let A be square matrix. We say that A is nonsingular if
it has full rank. Otherwise A is singular.
Definition 3.3.11 (Submatrix). Let A ∈ Fk×n be a matrix. Then the matrix B is a submatrix of A
if it can be obtained by deleting some rows and columns from A.
In other words, a submatrix of a matrix A is a matrix obtained by taking the intersection of a
set of rows of A with a set of columns of A.
Theorem 3.3.12 (Rank vs. nonsingular submatrices). Let A ∈ Fk×n be a matrix. Then rk A is
the largest value of r such that A has a nonsingular r × r submatrix. ♦
Exercise 3.3.13. Show that for all k, the intersection of k linearly independent rows with k linearly
independent columns can be singular. In fact, for any k, it can be the zero matrix.
Exercise 3.3.14. Let A be a matrix of rank r. Show that the intersection of any r linearly
independent rows with any r linearly independent columns is a nonsingular r × r submatrix of
A. (Note: this exercise is more difficult than Theorem 3.3.12 and is not needed for the proof of
Theorem 3.3.12.)
Exercise 3.3.15. Let A be a matrix. Show that if the intersection of k linearly independent columns with ℓ linearly independent rows of A has rank s, then rk(A) ≥ k + ℓ − s.
(a) Show that A has a right inverse if and only if A has full row-rank, i. e., rk A = k.
(b) Show that A has a left inverse if and only if A has full column-rank, i. e., rk A = n.
Note in particular that if A has a right inverse, then k ≤ n, and if A has a left inverse, then
k ≥ n.
Corollary 3.4.3. Let A be a nonsingular square matrix. Then A has both a right and a left inverse.
Exercise 3.4.4. For all k < n, find a k × n matrix that has infinitely many right inverses.
Definition 3.4.5 (Two-sided inverse). Let A ∈ Mn (F). Then the matrix B ∈ Mn (F) is a (two-sided)
inverse of A if AB = BA = In . The inverse of A is denoted A−1 . If A has an inverse, then A is
said to be invertible.
Proposition 3.4.6. Let A be a matrix. If A has a left inverse as well as a right inverse, then A
has a unique two-sided inverse and it has no left or right inverse other than the two-sided inverse.
The proof of this lengthy statement is just one line, based solely on the associativity of matrix
multiplication. The essence of the proof is in the next lemma.
Lemma 3.4.7. Let A ∈ Fk×n be a matrix with a right inverse B and a left inverse C. Then B = C
is a two-sided inverse of A and k = n. ♦
Corollary 3.4.8. Under the conditions of Lemma 3.4.7, k = n and B = C is a two-sided inverse.
Moreover, if C1 is also a left inverse, then C1 = C; analogously, if B1 is also a right inverse, then
B1 = B.
Corollary 3.4.9. Let A be a matrix with a left inverse. Then A has at most one right inverse.
Corollary 3.4.10. A matrix A has an inverse if and only if A is a nonsingular square matrix.
(a) A is nonsingular
A more detailed version of Theorem 3.4.12 appears later as Theorem 6.4.16. The most important
addition to the list of equivalent conditions is the determinant condition (f) stated above.
Exercise 3.4.13. Assume F is infinite and let A ∈ Fk×n where n > k. If A has a right inverse,
then A has infinitely many right inverses.
Definition 3.5.7 (Null space). The null space or kernel of a matrix A ∈ Fk×n , denoted null(A), is
the set
null(A) = {v ∈ Fn | Av = 0} . (3.6)
Exercise 3.5.8. Let A ∈ Fk×n . Show that
rk A = codim(null(A)) . (3.7)
Proposition 3.5.9. Let U ≤ Fn and let W ≤ Fk such that dim W = codim U = `. Then there is
a matrix A ∈ Fk×n such that null(A) = U and col A = W .
Definition 3.5.10 (Corank of a matrix). Let A ∈ Fk×n . We define the corank of A as the corank of
its column space, i. e.,
corank A := k − rk A . (3.8)
Exercise 3.5.11. When is corank A = corank AT ?
Exercise 3.5.12. Let A ∈ Fk×n and let B ∈ Fn×` . Show that
Proposition 3.6.8. Let A ∈ Fk×n . Then rk A is the smallest r such that there exist matrices
B ∈ Fk×r and C ∈ Fr×n with A = BC.
Proposition 3.6.9. Show that rk(A) is the smallest integer r such that A can be expressed as the
sum of r matrices of rank 1.
Proposition 3.6.10 (Characterization of matrices of rank 1). Let A ∈ Fk×n . Show that rk A = 1
if and only if there exist column vectors a ∈ Fk and b ∈ Fn such that A = abT .
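As a quick numerical illustration of the easy direction (with made-up vectors a and b), the outer product abT below indeed has rank 1:

    import numpy as np

    a = np.array([[1.0], [2.0], [-1.0]])        # a in F^3, written as a column
    b = np.array([[3.0], [0.0], [5.0], [2.0]])  # b in F^4, written as a column
    A = a @ b.T                                  # the 3x4 matrix a b^T
    print(np.linalg.matrix_rank(A))              # 1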
(c) Let f be a polynomial of degree d, and let Af = (f (αij )). Prove that $\operatorname{rk}(A_f) \le \binom{r+d}{d}$.

(d) Show that each of these bounds is tight for all r and d, i. e., for every r and d

(i) there exists a matrix A such that the rank of the corresponding matrix D is $\operatorname{rk}(D) = \binom{r+d-1}{d}$, and

(ii) there exists a matrix A and a polynomial f of degree d such that $\operatorname{rk}(A_f) = \binom{r+d}{d}$.
Chapter 4
Here, the αij and βi are scalars, while the xj are unknowns. In Section 1.1, we represented this
system as a linear combination of column vectors. Matrices allow us to write this system even more
concisely as Ax = b, where
$$A = \begin{pmatrix} \alpha_{11} & \alpha_{12} & \cdots & \alpha_{1n} \\ \alpha_{21} & \alpha_{22} & \cdots & \alpha_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ \alpha_{k1} & \alpha_{k2} & \cdots & \alpha_{kn} \end{pmatrix}, \qquad x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}, \qquad b = \begin{pmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_k \end{pmatrix} . \tag{4.1}$$
We know that the simplest linear equation is of the form ax = b (one equation in one unknown);
remarkably, thanks to the power of our matrix formalism, essentially the same equation now de-
scribes the far more complex systems of linear equations. The first question we ask about any
system of equations is its solvability.
Definition 4.1.1 (Solvable system of linear equations). Given a matrix A ∈ Fk×n and a vector
b ∈ Fk , we say that the system Ax = b of linear equations is solvable if there exists a vector x ∈ Fn
that satisfies Ax = b.
Definition 4.1.2 (Homogeneous system of linear equations). The system Ax = 0 is called a homo-
geneous system of linear equations.
Every system of homogeneous linear equations is solvable.
Definition 4.1.3 (Trivial solution to a homogeneous system of linear equations). The trivial solution
to the homogeneous system of linear equations Ax = 0 is the solution x = 0.
So when presented with a homogeneous system of linear equations, the question we ask is not,
“Is this system solvable?” but rather, “Does this system have a nontrivial solution?”
Theorem 4.1.4. Let A ∈ Fk×n . The system Ax = 0 has a nontrivial solution if and only if the columns of A are linearly dependent, i. e., A does not have full column rank ( Def. 3.1.4). ♦
Definition 4.1.5 (Solution space). Let A ∈ Fk×n . The set of solutions to the homogeneous system
of linear equations Ax = 0 is the set U = {x ∈ Fn | Ax = 0} and is called the solution space of
Ax = 0. Ex. 4.1.8 explains the terminology.
Definition 4.1.6 (Null space). The null space or kernel of a matrix A ∈ Fk×n , denoted null(A), is
the set
null(A) = {v ∈ Fn | Av = 0} . (4.2)
Definition 4.1.7. The nullity of a matrix A is the dimension of its null space.
For the following three exercises, let A ∈ Fk×n and let U ≤ Fn be the solution space of the
system Ax = 0.
(d) dim U = n − rk A
An immediate consequence of (d) is the Rank-Nullity Theorem, which will be crucial in our
study of linear maps in Chapter 16.1
Corollary 4.1.11 (Rank–Nullity Theorem). Let A ∈ Fk×n be a matrix. Then rk A + nullity(A) = n.
An explanation of (d) is that dim U measures the number of coordinates of x that we can choose
independently. This quantity is referred to by physicists as the “degree of freedom” left in our choice
of x after imposing the constraints Ax = b. If there are no constraints, the degree of freedom of the
system is equal to n. It is plausible that each constraint reduces the degree of freedom by 1, which
would suggest dim U = n − k, but effectively there are only rk A constraints because every equation
that is a linear combination of previous equations can be thrown out. This makes it plausible that
the degree of freedom is n − rk A. This argument is not a proof, however.
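For readers who want to experiment, the sketch below (assuming NumPy and SciPy are available; the matrix is a made-up example) computes a basis of the solution space numerically and checks that its dimension is n − rk A:

    import numpy as np
    from scipy.linalg import null_space

    A = np.array([[1.0, 2.0, 3.0, 4.0],
                  [2.0, 4.0, 6.0, 8.0],   # twice the first row, so rk A = 2
                  [0.0, 1.0, 0.0, 1.0]])
    k, n = A.shape

    rank = np.linalg.matrix_rank(A)       # 2
    N = null_space(A)                     # columns form a basis of {x : Ax = 0}
    print(rank, N.shape[1], rank + N.shape[1] == n)   # 2 2 True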
Proposition 4.1.12. Let A ∈ Fk×n and consider the homogeneous system Ax = 0 of linear
equations. Let U be the solution space of Ax = 0. Prove that the following are equivalent.
(a) Ax = 0 has no nontrivial solution, i. e., U = {0}
Proposition 4.1.13. Let A be a square matrix. Then Ax = 0 has no nontrivial solution if and
only if A is nonsingular.
Definition 4.2.2 (Augmented matrix). When speaking of the system Ax = b, we call A the matrix of the system and [A | b] (the matrix A with the column b appended) the augmented matrix of the system.
Proposition 4.2.3. The system Ax = b of linear equations is solvable if and only if the matrix of
the system and the augmented matrix have the same rank, i. e., rk A = rk[A | b].
For a set S ⊆ Fn and a vector v ∈ Fn , the set
S + v = {s + v | s ∈ S} (4.4)
is called the translate of S by v. Such an object is called an affine subspace of Fn ( Def. 5.1.3).
Our discussion of the determinant in the next chapter will allow us to add a particularly impor-
tant additional property:
(n) det A ≠ 0
Chapter 5

Affine and Convex Combinations (Optional)
The reader comfortable with abstract vector spaces can skip to Chapter ?? for a more general
discussion of this material.
5.1 (F) Affine combinations
In Section 1.1, we defined linear combinations of column vectors ( Def. 1.1.13). We now consider
affine combinations.
Definition 5.1.1 (Affine combination). An affine combination of the vectors v1 , . . . , vk ∈ Fn is a linear combination $\sum_{i=1}^{k} \alpha_i v_i$ where $\sum_{i=1}^{k} \alpha_i = 1$.
Example 5.1.2.
$$2v_1 - \tfrac{1}{2} v_2 - \tfrac{1}{3} v_4 - \tfrac{1}{6} v_5$$
is an affine combination of the column vectors v1 , . . . , v5 .
Definition 5.1.3 (Affine-closed set). The set S ⊆ Fn is affine-closed if it is closed under affine
combinations.
Fact 5.1.4. The empty set is affine-closed (why?).
Throughout this book, the term “subspace” refers to subsets that are closed under linear combinations ( Def. 1.2.1). Subspaces are also referred to as “linear subspaces.” This (redundant) longer term is especially useful in contexts where affine subspaces are discussed, in order to distinguish linear subspaces from affine subspaces.
Definition 5.1.7 (Affine hull). The affine hull of a subset S ⊆ Fn , denoted aff(S), is the smallest
affine-closed set containing S, i. e.,
(a) aff(S) ⊇ S;
Theorem 5.1.10. For S ⊆ Fn , aff(S) is the set of all affine combinations of the finite subsets of
S. ♦
Proposition 5.1.14.
Proposition 5.1.15. The intersection of a (finite or infinite) family of affine subspaces is either
empty or equal to a translate of the intersection of their corresponding linear subspaces.
∗ ∗ ∗
Next we connect these concepts with the theory of systems of linear equations.
Exercise 5.1.16. Let A ∈ Fk×n and let b ∈ Fk . Then the set of solutions to the system Ax = b of
linear equations is an affine-closed subset of Fn .
Exercise 5.1.17. Every affine-closed subset of Fn is the set of solutions to the system Ax = b of
linear equations for some A ∈ Fk×n and b ∈ Fk .
Proposition 5.1.18 (General vs. homogeneous systems of linear equations). Let A ∈ Fk×n and
b ∈ Fn . Let S = {x ∈ Fn | Ax = b} be the set of solutions of the system Ax = b and let
U = {x ∈ Fn | Ax = 0} be the set of solutions of the corresponding system of homogeneous linear
equations. Then either S is empty or S is a translate of U .
∗ ∗ ∗
Proposition 5.1.19. The span of the set S ⊆ Fn is the affine hull of S ∪ {0}.
Definition 5.1.21 (Dimension of an affine subspace). The (affine) dimension of an affine subspace
U ≤aff Fn , denoted dimaff U , is the dimension of its corresponding linear subspace (of which it is
a translate). In order to assign a dimension to all affine-closed sets, we adopt the convention that
dim ∅ = −1.
Fact 5.1.26. Any single vector is affine-independent and affine-closed at the same time.
(a) For k ≥ 0, the vectors v1 , . . . vk are linearly independent if and only if the vectors 0, v1 , . . . , vk
are affine-independent.
(b) For k ≥ 1, the vectors v1 , . . . , vk are affine-independent if and only if the vectors v2 −
v1 , . . . , vk − v1 are linearly independent.
Definition 5.1.29 (Affine basis). An affine basis of an affine subspace W ≤aff Fn is an affine-
independent set S such that aff(S) = W .
Proposition 5.1.30. Let W be an affine subspace of Fn . Every affine basis of W has 1 + dim W
elements.
$$\dim_{\mathrm{aff}} \left( \operatorname{aff}\{W_1, \dots, W_k\} \right) \le (k-1) + \sum_{i=1}^{k} \dim_{\mathrm{aff}} W_i . \tag{5.2}$$
Definition 5.2.1 (Linear hyperplane). A linear hyperplane of Fn is a subspace of Fn of codimension 1 ( Def. 3.5.1).
Definition 5.2.2 (Codimension of an affine subspace). The (affine) codimension of an affine subspace
U ≤aff Fn , denoted codimaff U , is the codimension of its corresponding linear subspace (of which it
is a translate).
Definition 5.2.3 (Hyperplane). A hyperplane is an affine subspace of codimension 1.
Proposition 5.2.4. Let S ⊆ Fn be a hyperplane. Then there exist a nonzero vector a ∈ Fn and
β ∈ F such that aT v = β if and only if v ∈ S.
The vector a whose existence is guaranteed by the preceding proposition is called the normal
vector of the hyperplane S.
Example 5.3.2.
$$\tfrac{1}{2} v_1 + \tfrac{1}{4} v_2 + \tfrac{1}{6} v_4 + \tfrac{1}{12} v_5$$
is a convex combination of the vectors v1 , . . . , v5 . Note that the affine combination in Example
5.1.2 is not convex.
Definition 5.3.4 (Convex set). A convex set is a subset S ⊆ Rn that is closed under convex combi-
nations.
Proposition 5.3.5. The intersection of a (finite or infinite) family of convex sets is convex.
Definition 5.3.6. The convex hull of a subset S ⊆ Rn , denoted conv(S), is the smallest convex set
containing S, i. e.,
(a) conv(S) ⊇ S;
Theorem 5.3.8. For S ⊆ Rn , conv(S) is the set of all convex combinations of the finite subsets of
S. ♦
Proposition 5.3.12. The set S ⊆ Rn is convex if and only if it contains the straight-line segment
connecting u and v for every u, v ∈ S.
Definition 5.3.13 (Dimension). The dimension of a convex set C ⊆ Rn , denoted dimconv C, is the
dimension of its affine hull. A convex subset C of Rn is full-dimensional if aff C = Rn .
Definition 5.3.14 (Half-space). A closed half-space is a region of Rn defined as {v ∈ Rn | aT v ≥ β}
for some nonzero a ∈ Rn and β ∈ R and is denoted H(a, β). An open half-space is a region of Rn
defined as {v ∈ Rn | aT v > β} for some nonzero a ∈ Rn and β ∈ R and is denoted Ho (a, β).
Exercise 5.3.15. Let a ∈ Rn be a nonzero vector and let β ∈ R. Prove: the set {v ∈ Rn | aT v ≤ β}
is also a (closed) half-space.
Fact 5.3.16. Let S ⊆ Rn be a hyperplane defined by aT v = β. Then S divides Rn into two
(b) If S ⊆ Rn is finite then conv(S) is the intersection of a finite number of closed half-spaces.
Exercise 5.3.18. Find a convex set which is not the intersection of any number of open or closed half-spaces.
Project 5.5.1. Define a “partially open half-space” which can be defined by 2n − 1 parameters2 so
that every convex set is the intersection of the partially open half-spaces and the closed half-spaces
containing it.
¹ In fact, in a well-defined sense, half-spaces can be defined by n real parameters.
² In fact, this object can be defined by 2n − 3 real parameters.
Chapter 6
The Determinant
6.1 Motivation: solving systems of linear equations

For a system of two linear equations in two unknowns, elimination yields
$$x_1 = \frac{\alpha_{22}\beta_1 - \alpha_{12}\beta_2}{\alpha_{22}\alpha_{11} - \alpha_{12}\alpha_{21}} \tag{6.3}$$
$$x_2 = \frac{\alpha_{11}\beta_2 - \alpha_{21}\beta_1}{\alpha_{22}\alpha_{11} - \alpha_{12}\alpha_{21}} \tag{6.4}$$
and for a system of three equations in three unknowns,
$$x_1 = \frac{N_{31}}{D_3} \tag{6.5}$$
$$x_2 = \frac{N_{32}}{D_3} \tag{6.6}$$
$$x_3 = \frac{N_{33}}{D_3} \tag{6.7}$$
where
D3 = α11 α22 α33 + α12 α23 α31 + α13 α21 α32 (6.8)
− α11 α23 α32 − α12 α21 α33 − α13 α22 α31 (6.9)
N31 = α22 α33 β1 + α12 α23 β3 + α13 α32 β2 (6.10)
− α23 α32 β1 − α12 α33 β2 − α13 α22 β3 (6.11)
N32 = α11 α33 β2 + α23 α31 β1 + α13 α21 β3 (6.12)
− α11 α23 β3 − α21 α33 β1 − α13 α31 β2 (6.13)
N33 = α11 α22 β3 + α12 α31 β2 + α21 α32 β1 (6.14)
− α11 α32 β2 − α12 α21 β3 − α22 α31 β1 (6.15)
In particular, the numerators and the denominator of these expressions each have six terms,
half of which have a negative sign. For a system of 4 equations in 4 unknowns, the numerators and
denominator each have 24 terms; again, half of them have a negative sign. For general n we get n!
terms in the numerators and denominator, again half of them with a negative sign.
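To see the two-unknown formulas (6.3)–(6.4) in action, here is a small numerical sketch (the coefficients are an arbitrary example); it evaluates the formulas and confirms that the resulting x1 , x2 satisfy the system:

    import numpy as np

    # an arbitrary 2x2 system: alpha_ij x_j = beta_i
    a11, a12, a21, a22 = 2.0, 1.0, 5.0, 3.0
    b1, b2 = 4.0, 7.0

    den = a22 * a11 - a12 * a21            # the common denominator in (6.3)-(6.4)
    x1 = (a22 * b1 - a12 * b2) / den
    x2 = (a11 * b2 - a21 * b1) / den

    A = np.array([[a11, a12], [a21, a22]])
    print(np.allclose(A @ np.array([x1, x2]), [b1, b2]))   # True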
The denominator of these expressions is known as the determinant of the matrix A (where our
system of linear equations is written as Ax = b). Texts often focus on the rules of how to calculate
the determinant. Before discussing those rules, however, we need to define what exactly it is that we wish to calculate.
The determinant is a function from the space of n × n matrices to numbers, that is, det :
Mn (F) → F. Before formulating the definition of this function, we need to discuss permutations.
6.2 Permutations
Definition 6.2.1 (Permutation). A permutation of a set Ω is a bijection f : Ω → Ω. The set Ω is
called the permutation domain.
Definition 6.2.2 (Symmetric group). The symmetric group of degree n, denoted Sn , is the set of all
permutations of the set {1, . . . , n}.
Definition 6.2.3 (Inversion). Let σ ∈ Sn be a permutation of the set {1, . . . , n}, and let 1 ≤ i, j ≤ n with i ≠ j. We say that the pair {i, j} is inverted by σ if i < j and σ(i) > σ(j), or i > j and σ(i) < σ(j). We denote by Inv(σ) the number of inversions of σ, that is, the number of pairs {i, j} which are inverted by σ.
Definition 6.2.5 (Even and odd permutations). If Inv(σ) is even, then we say that σ is an even
permutation, and if Inv(σ) is odd, then σ is an odd permutation.
for all a ∈ Ω.
We also refer to the composition of σ with τ as the product of σ and τ .
Definition 6.2.9 (Transposition). Let Ω be a set. The transposition of the elements a ≠ b ∈ Ω is the permutation that swaps a and b and fixes every other element. Formally, it is the permutation τ defined by
$$\tau(x) := \begin{cases} b & x = a \\ a & x = b \\ x & \text{otherwise} \end{cases} \tag{6.18}$$
This permutation is denoted τ = (a, b). Note that in this notation, (a, b) = (b, a).
In the light of this result, we could use its conclusion as the definition of even permutations.
The advantage of this definition is that it can be applied to any set Ω, not just the ordered set
{1, . . . , n}.
Corollary 6.2.12. While the number of inversions of a permutation depends on the ordering of
the permutation domain, its parity (being even or odd) does not.
Definition 6.2.13 (Neighbor transposition). A neighbor transposition of the set {1, . . . , n} is a trans-
position of the form τ = (i, i + 1).
Exercise 6.2.14. Let σ ∈ Sn and let τ be a neighbor transposition. Show
| Inv(σ) − Inv(στ )| = 1 .
Corollary 6.2.15. Let σ ∈ Sn and let τ be a neighbor transposition. Then sgn(στ ) = − sgn(σ).
Proposition 6.2.16. Every transposition is the composition of an odd number of neighbor trans-
positions.
Proposition 6.2.17. Neighbor transpositions generate the symmetric group. That is, every ele-
ment of Sn can be expressed as the composition of neighbor transpositions.
Proposition 6.2.18. Composition with a transposition changes the parity of a permutation.
Corollary 6.2.19. Let σ ∈ Sn be a permutation. Then σ is even if and only if σ is the product of
an even number of transpositions.
Theorem 6.2.20. Let σ, τ ∈ Sn . Then sgn(στ ) = sgn(σ) sgn(τ ). ♦
Definition 6.2.21 (k-cycle). A k-cycle is a permutation that cyclically permutes k elements {a1 , . . . , ak }
and fixes all others (FIGURE). That is, σ is a k-cycle if there are distinct elements a1 , . . . , ak such that
σ(ai ) = ai+1 for all i (where ak+1 = a1 ) and σ(x) = x if x ∉ {a1 , . . . , ak }. We denote this permutation by (a1 , a2 , . . . , ak ).
Note that (a1 , a2 , . . . , ak ) = (a2 , a3 , . . . , ak , a1 ).
In particular, transpositions are 2-cycles. Observe that our notation is consistent with the
notation for transpositions given in Def. 6.2.9.
Definition 6.2.22 (Disjoint cycles). Let σ and τ be cycles with permutation domain Ω. Then σ and
τ are disjoint if no element of Ω is permuted by both σ and τ .
Exercise 6.2.24. Let σ be a k-cycle. Show that σ is an even permutation if and only if k is odd.
Corollary 6.2.25. Let σ be a permutation. Then σ is even if and only if its cycle decomposition
includes an even number of even cycles.
So the determinant is a sum of n! terms, called the expansion terms, each of which is a product
of n entries of the matrix, with a ± sign attached. The n entries in each expansion term form a
"rook configuration": the term contains exactly one entry from each row and exactly one from each
column. Note that there is a bijection between permutations and rook configurations.
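To make the n!-term sum concrete, here is a minimal computational sketch (our own code, not part of the text), using the sign convention sgn(σ) = (−1)^Inv(σ); it is hopelessly slow beyond small n and serves only to illustrate the definition.

    from itertools import permutations

    def perm_expansion_det(A):
        """Determinant of a square matrix A (given as a list of rows) via the
        n!-term expansion: sum over sigma of sgn(sigma) * prod_i A[i][sigma(i)]."""
        n = len(A)
        total = 0
        for sigma in permutations(range(n)):
            sign = 1                      # sgn(sigma) = (-1)^Inv(sigma)
            for i in range(n):
                for j in range(i + 1, n):
                    if sigma[i] > sigma[j]:
                        sign = -sign
            term = sign
            for i in range(n):            # one entry from each row and each column
                term *= A[i][sigma[i]]
            total += term
        return total

    assert perm_expansion_det([[1, 2, 3], [4, 5, 6], [7, 8, 10]]) == -3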
Notation 6.3.2. Another notation for the determinant is that we put the matrix entries between
vertical bars:
    det A = |A| = | α11 · · · α1n |
                  |  .   ..    .  |
                  | αn1 · · · αnn |    (6.21)
Exercise 6.3.3. Let A ∈ Mn (Z) be an n × n integral matrix. Show that det A is an integer.
Note that this fact would not be evident if all we knew was a practical algorithm, such as
Gaussian elimination, to compute the determinant: in the course of that computation we inevitably
run into fractions. On the other hand, the impractical n!-term sum that defines the determinant
makes the conclusion immediate.
For a 3 × 3 matrix A = (αij ), this sum is the familiar six-term expansion
    det A = α11 α22 α33 + α12 α23 α31 + α13 α21 α32 − α11 α23 α32 − α12 α21 α33 − α13 α22 α31 .
Proposition 6.4.1 (Transpose). Show that det(AT ) = det(A). This fact follows from what prop-
erty of inversions?
Proposition 6.4.3 (Common factor of a column). Let B be a matrix obtained from A by multi-
plying every element of a column by c. Then det(B) = c · det(A).
Proposition 6.4.4 (Swapping two columns). Let B be the matrix obtained by swapping two
columns of A. Then det(B) = − det(A).
Proposition 6.4.5 (Equal columns). If two columns of A are equal, then det A = 0.
Warning: prove this without using the fact that elementary column operations do not change
the determinant (Prop. 6.4.11). This is easier to prove if the characteristic of the field is not
2, in other words, if 1 + 1 ≠ 0 in F (Def. 14.4.8). But the statement holds over all fields.
Proposition 6.4.6 (Diagonal matrices). The determinant of a diagonal matrix is the product of
its diagonal entries.
Proposition 6.4.7 (Triangular matrices). The determinant of an upper triangular matrix is the
product of its diagonal entries.
Example 6.4.8.
    det [ 5 1 7 ]
        [ 0 2 6 ]  =  30    (6.22)
        [ 0 0 3 ]
and this value does not depend on the three entries in the upper-right corner.
Definition 6.4.9 (k-linearity). A function f : V × · · · × V → W (k factors) is linear in the i-th component if,
whenever we fix x1 , . . . , xi−1 , xi+1 , . . . , xk , the function
    g(y) := f (x1 , . . . , xi−1 , y, xi+1 , . . . , xk )
is linear, i. e.,
    g(y1 + y2 ) = g(y1 ) + g(y2 )    (6.23)
    g(αy) = αg(y)    (6.24)
The function f is k-linear if it is linear in all k components. A function which is 2-linear is said
to be bilinear.
Proposition 6.4.10 (Multilinearity). The determinant is multilinear in the columns of A.
Proposition 6.4.11 (Elementary column operations vs. determinant). Performing elementary
column operations (Def. 3.2.1) does not change the determinant of a matrix.
We are ready for a more complete list of conditions equivalent to nonsingularity, augmenting
Theorem 3.4.12. The most significant addition is the first item: the determinantal characterization
of nonsingularity.
(a) det(A) ≠ 0.
(c) The columns of A are linearly independent, i. e., A has full column rank.
(d) The rows of A are linearly independent, i. e., A has full row rank.
♦
Numerical exercise 6.4.17. Let
    A = [ 2  1 −3 ]
        [ 4 −1  0 ]
        [ 2  5 −1 ] .
(a) Use the formula derived in part (b) of Prop. 6.3.4 to compute det A.
(b) Compute the matrix A0 obtained by performing the column operation (1, 2, −4) on A.
(c) Self-check : Use the same formula to compute det A0 , and verify that det A0 = det A.
We now turn to characterizing the rank of a matrix via determinants. Recall Theorem 3.3.12.
Theorem 6.5.2. Let A ∈ Fk×n be a matrix. Then rk A is the largest value of r such that A has a
nonsingular r × r submatrix. ♦
Since the rank of a matrix may depend on the field of scalars, we temporarily use the notation rkF (A) and rkG (A) to denote the rank of A with respect
to the corresponding fields. We will also write rkp to mean rkFp , and we shall compare the rank of
an integral matrix in characteristic zero and in characteristic p.
Lemma 6.6.1 (Nonsingularity is insensitive to field extensions). Let F be a subfield of G, and let
A ∈ Mn (F). Then A is nonsingular over F if and only if A is nonsingular over G. ♦
Corollary 6.6.2 (Rank insensitivity to field extension). Let F be a subfield of G, and let A ∈ Fk×n .
Then
rkF (A) = rkG (A) .
Integral matrices (matrices with integer entries) can be interpreted as matrices over any field.
Corollary 6.6.3. Let A be an integral matrix. Then rkF (A) only depends on the characteristic of
F.
In particular, rkQ (A) = rkR (A).
Notation 6.6.4. Let A be an integral matrix and p a prime number or zero. We write rkp (A) to
denote the rank of A over any field of characteristic p.
Exercise 6.6.5. Show that this notation is sound, i. e., rkF (A) only depends on char(F).
Exercise 6.6.6. Let A ∈ Zk×n be an integral matrix.
(a) Show that rkp (A) ≤ rk(A).
(b) For every prime number p, find a (0, 1) matrix A (that is, a matrix whose entries are only 0
and 1) where this inequality is strict, i. e., rkp (A) < rk(A).
Next we consider the rank of real matrices.
Theorem 6.6.7. Let A be a matrix with real entries. Then rk AT A = rk(A). ♦
Exercise 6.6.8. (a) Find a 2 × 2 matrix A over C such that rk AT A < rk(A).
(b) For every prime number p find n ∈ N and a matrix A ∈ Mn (Fp ) such that rkp AT A < rkp (A).
Minimize n. Show that n = 2 suffices if p = 2 or p ≡ 1 (mod 4) and n = 3 suffices for all p.
(c) For every prime number p find n ∈ N and a (0, 1)-matrix A such that rkp AT A < rkp (A).
What is the smallest value of n as a function of p you can find?
Exercise 6.6.9.
Theorem 6.7.2 (Cofactor expansion). Let A = (αij ) be an n × n matrix, and let Cij be the (i, j)
cofactor of A. Then for all i,
    det A = Σ_{j=1}^{n} αij Cij .    (6.27)
This is the cofactor expansion of A along the i-th row. Similarly, for all j,
    det A = Σ_{i=1}^{n} αij Cij .    (6.28)
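A minimal recursive sketch of expansion along the first row (our own code, not part of the text; it takes the (i, j) cofactor to be (−1)^(i+j) times the determinant of the matrix obtained by deleting row i and column j):

    def minor(A, i, j):
        """Matrix obtained from A by deleting row i and column j (0-indexed)."""
        return [row[:j] + row[j+1:] for k, row in enumerate(A) if k != i]

    def cofactor_det(A):
        """Determinant by cofactor expansion along the first row."""
        n = len(A)
        if n == 1:
            return A[0][0]
        return sum((-1) ** j * A[0][j] * cofactor_det(minor(A, 0, j))
                   for j in range(n))

    # The triangular matrix of Example 6.4.8:
    assert cofactor_det([[5, 1, 7], [0, 2, 6], [0, 0, 3]]) == 30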
Numerical exercise 6.7.3. Compute the determinants of the following matrices by cofactor ex-
pansion (a) along the first row and (b) along the second column. Self-check : your answers should
be the same.
(a) [ 2  3  1 ]
    [ 0 −4 −1 ]
    [ 1 −3  4 ]
(b) [ 3 −3  2 ]
    [ 4  7 −1 ]
    [ 6 −4  2 ]
(c) [ 1  3  2 ]
    [ 3 −1  0 ]
    [ 0  6  5 ]
(d) [ 6  2 −1 ]
    [ 0  4  1 ]
    [ −3 1  1 ]
Exercise 6.7.4. Compute the determinants of the following matrices.
(a) [ α β β · · · β β ]
    [ β α β · · · β β ]
    [ β β α · · · β β ]
    [ .  .  .  . . .  . ]
    [ β β β · · · α β ]
    [ β β β · · · β α ]
(b) [ 1 1 0 0 · · · 0 0 0 ]
    [ 1 1 1 0 · · · 0 0 0 ]
    [ 0 1 1 1 · · · 0 0 0 ]
    [ .  .  .  .  . . .  . ]
    [ 0 0 0 0 · · · 1 1 1 ]
    [ 0 0 0 0 · · · 0 1 1 ]
(c) [  1  1  0  0 · · ·  0  0 0 ]
    [ −1  1  1  0 · · ·  0  0 0 ]
    [  0 −1  1  1 · · ·  0  0 0 ]
    [  .   .   .   .  . . .   . ]
    [  0  0  0  0 · · · −1  1 1 ]
    [  0  0  0  0 · · ·  0 −1 1 ]
Exercise 6.7.5. Compute the determinant of the Vandermonde matrix (Def. 2.5.9) generated
by α1 , . . . , αn .
Definition 6.7.6 (Fixed point). Let Ω be a set, and let f : Ω → Ω be a permutation of Ω. We say
that x ∈ Ω is a fixed point of f if f (x) = x. We say that f is fixed-point-free, or a derangement, if
f has no fixed points.
♥ Exercise 6.7.7. Let Fn denote the number of fixed-point-free permutations of the set {1, . . . , n}.
Decide, for each n, whether Fn is even or odd (the answer will depend on n).
Definition 6.8.2 (Adjugate of a matrix). Let A ∈ Mn (F). Then the adjugate of A, denoted adj(A),
is the matrix whose (i, j) entry is the (j, i) cofactor of A.
Theorem 6.8.3 (Explicit form of the matrix inverse). Let A be a nonsingular n × n matrix. Then
    A−1 = (1 / det A) · adj(A) .    (6.30)
♦
Corollary 6.8.4. Let A ∈ Mn (Z) be an integral n × n matrix. A has an integral inverse A−1 ∈
Mn (Z) if and only if det(A) = ±1.
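The theorem and the corollary are easy to test with exact rational arithmetic; a minimal sketch (our own helper names, reusing the cofactor machinery from the sketch above):

    from fractions import Fraction

    def minor(A, i, j):
        return [row[:j] + row[j+1:] for k, row in enumerate(A) if k != i]

    def det(A):
        if len(A) == 1:
            return A[0][0]
        return sum((-1) ** j * A[0][j] * det(minor(A, 0, j)) for j in range(len(A)))

    def adjugate(A):
        """adj(A): the (i, j) entry is the (j, i) cofactor of A (Def. 6.8.2)."""
        n = len(A)
        return [[(-1) ** (i + j) * det(minor(A, j, i)) for j in range(n)]
                for i in range(n)]

    def inverse(A):
        """A^{-1} = (1 / det A) adj(A) for nonsingular A (Theorem 6.8.3)."""
        d = Fraction(det(A))
        return [[Fraction(entry) / d for entry in row] for row in adjugate(A)]

    A = [[2, 1], [1, 1]]          # det A = 1, so A^{-1} is integral (Cor. 6.8.4)
    assert inverse(A) == [[1, -1], [-1, 2]]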
Exercise 6.8.5. Let n be odd and let A ∈ Mn (Z) be a nonsingular symmetric matrix whose
diagonal entries are all 0. Show that A−1 is not integral.
Numerical exercise 6.8.6. Compute the inverses of the following matrices. Self-check : multiply
your answer by the original matrix to get the identity.
(a) [  1 −3 ]
    [ −2  4 ]
(b) [ −4  2 ]
    [ −1 −1 ]
(c) [  3  4 −7 ]
    [ −2  1 −4 ]
    [  0 −2  5 ]
(d) [ −1  4  2 ]
    [ −3  2 −3 ]
    [  1  0  2 ]
Proposition 6.9.1. Let A be a nonsingular square matrix. Then the system Ax = b of n linear
equations in n unknowns is solvable and has a unique solution.
Exercise 6.9.2. Prove Prop. 6.9.1 by showing that x = A−1 b is the unique solution.
Theorem 6.9.3 (Cramer’s Rule). Let A be a nonsingular n × n matrix over F, and let b ∈ Fn .
Let a = (α1 , . . . , αn )T = A−1 b denote the unique solution of the system Ax = b of linear equations.
Then
det Ai
αi = (6.31)
det A
where Ai is the matrix obtained by replacing the i-th column of A by b. ♦
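A direct transcription of Cramer's Rule as a sketch (our own code, again with the naive determinant and exact rational arithmetic):

    from fractions import Fraction

    def det(A):
        if len(A) == 1:
            return A[0][0]
        return sum((-1) ** j * A[0][j]
                   * det([row[:j] + row[j+1:] for row in A[1:]])
                   for j in range(len(A)))

    def cramer_solve(A, b):
        """Solve Ax = b for nonsingular square A: alpha_i = det(A_i) / det(A),
        where A_i is A with its i-th column replaced by b (Theorem 6.9.3)."""
        d = Fraction(det(A))
        n = len(A)
        solution = []
        for i in range(n):
            Ai = [row[:i] + [b[k]] + row[i+1:] for k, row in enumerate(A)]
            solution.append(Fraction(det(Ai)) / d)
        return solution

    # A small system (not one of the numerical exercises): 2x + y = 3, x + 3y = 4.
    assert cramer_solve([[2, 1], [1, 3]], [3, 4]) == [1, 1]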
Numerical exercise 6.9.4. Use Cramer’s Rule to solve the following systems of linear equations.
Self-check : plug your answers back into the original equations.
(a)
x1 + 2x2 = 3
x1 − x2 = 6
(b)
−x1 + x2 − x3 = 4
2x1 + 3x2 + 4x3 = −2
−x1 − x2 − 3x3 = 3
Definition 6.10.1 (Skew-symmetric matrix). A matrix A ∈ Mn (F) is skew-symmetric if AT = −A
and all diagonal elements of A are zero.
Exercise 6.10.2. Observe that if char(F) ≠ 2, in other words, if 1 + 1 ≠ 0 in F (Def. 14.4.8),
then the second condition is redundant: if AT = −A then all diagonal elements are automatically
zero. However, this conclusion is false if char(F) = 2; in that case, the condition AT = −A just
means that A is symmetric (AT = A).
Exercise 6.10.3. Let F be a field.
(a) Show that if A ∈ Mn (F) is skew-symmetric and n is odd then A is singular.
The parallelogram spanned by the vectors v1 , v2 ∈ R2 is the set
    parallelogram(v1 , v2 ) := {αv1 + βv2 | 0 ≤ α, β ≤ 1} .    (6.33)
More generally, the parallelepiped spanned by the vectors v1 , . . . , vn ∈ Rn is the set
    parallelepiped(v1 , . . . , vn ) := { Σ_{i=1}^{n} αi vi | 0 ≤ αi ≤ 1 } .    (6.34)
Exercise 6.10.9. Let a, b ∈ R2 . Show that the area of the parallelogram spanned by these vectors
(PICTURE) is | det A|, where A = [a | b] ∈ M2 (R).
The following theorem is a generalization of Ex. 6.10.9.
Theorem 6.10.10. If v1 , . . . , vn ∈ Rn , then the volume of parallelepiped(v1 , . . . , vn ) is | det A|,
where A is the n × n matrix whose i-th column is vi . ♦
Notation 6.10.11. Let A ∈ Fk×n and B ∈ Fn×k be matrices, and let I ⊆ [n]. The matrix AI is the
matrix whose columns are the columns of A which correspond to the elements of I. The matrix I B
is ((BT )I )T , i. e., the matrix whose rows are the rows of B which correspond to the elements of I.
Example 6.10.12. Let
    A = [ 3 1 7 ] ,    B = [ −4 1 ]
        [ 2 3 2 ]          [  6 2 ]
                           [ −1 0 ]
and let I = {1, 3}. Then
    AI = [ 3 7 ]    and    I B = [ −4 1 ] .
         [ 2 2 ]                 [ −1 0 ]
Theorem 6.10.13 (Cauchy-Binet formula). Let A ∈ Fk×n and let B ∈ Fn×k . Show that
    det(AB) = Σ_{I⊆[n], |I|=k} det(AI ) det(I B) .    (6.35)
♦
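The formula is easy to verify numerically for small sizes. A minimal sketch (our own code, not part of the text), using the matrices of Example 6.10.12:

    from itertools import combinations

    def det(A):
        if len(A) == 1:
            return A[0][0]
        return sum((-1) ** j * A[0][j]
                   * det([row[:j] + row[j+1:] for row in A[1:]])
                   for j in range(len(A)))

    def matmul(A, B):
        return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

    def cauchy_binet_rhs(A, B):
        """Sum of det(A_I) * det(_I B) over all k-element subsets I of [n]."""
        k, n = len(A), len(A[0])
        total = 0
        for I in combinations(range(n), k):
            A_I = [[row[j] for j in I] for row in A]   # columns of A indexed by I
            I_B = [B[j] for j in I]                    # rows of B indexed by I
            total += det(A_I) * det(I_B)
        return total

    A = [[3, 1, 7], [2, 3, 2]]
    B = [[-4, 1], [6, 2], [-1, 0]]
    assert det(matmul(A, B)) == cauchy_binet_rhs(A, B)   # both sides equal -144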
Chapter 7
Chapter 8
Eigenvectors and Eigenvalues
Exercise 8.1.8. Show that the only eigenvalue of a nilpotent matrix (Def. 2.2.18) is 0.
Proposition 8.1.9. Show that eigenvectors of a matrix corresponding to distinct eigenvalues are
linearly independent.
Exercise 8.1.10. Let A ∈ Mn (F) be a matrix and let v1 , . . . , vk be eigenvectors to distinct eigen-
values, where k ≥ 2. Then v1 + · · · + vk is not an eigenvector.
Definition 8.1.17 (Left eigenvector). Let A ∈ Mn (F). Then x ∈ F1×n is a left eigenvector if x ≠ 0
and there exists λ ∈ F such that xA = λx.
Definition 8.1.18 (Left eigenvalue). Let A ∈ Mn (F). Then λ ∈ F is a left eigenvalue of A if there
exists a nonzero row vector v ∈ F1×n such that vA = λv.
Convention 8.1.19. When we use the word “eigenvector” without a modifier, we refer to a right
eigenvector; the term “right eigenvector” is occasionally used for clarity.
Exercise 8.1.20. Prove that the left eigenvalues and the right eigenvalues of a matrix A ∈ Mn (F)
are the same, and if λ is an eigenvalue then its right and left geometric multiplicities are the same.
We shall see that the left eigenvalues and the right eigenvalues are the same.
Exercise 8.1.21. Let A ∈ Mn (F). Show that if x is a right eigenvector to eigenvalue λ and yT is
a left eigenvector to eigenvalue µ, and λ ≠ µ, then y · x = 0, i. e., x ⊥ y.
For A ∈ Mn (F) and λ ∈ F we write
    Uλ (A) := {v ∈ Fn | Av = λv} .    (8.1)
We simply write Uλ for Uλ (A) if the matrix A is clear from the context.
Exercise 8.1.23. (a) Let A ∈ Mn (F) and λ ∈ F. Then Uλ = ker(λI − A). In particular, Uλ ≤ Fn
(subspace).
Definition 8.1.24 (Eigenspace and geometric multiplicity). Let A ∈ Mn (F) be a square matrix and
let λ be an eigenvalue of A. Then we call Uλ the eigenspace corresponding to the eigenvalue λ and
we call the dimension of Uλ the geometric multiplicity of λ.
The next exercise provides a method of calculating the geometric multiplicity.
Exercise 8.1.25. Let λ be an eigenvalue of the n × n matrix A. Then the geometric multiplicity
of λ is n − rk(λI − A).
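For a numerical matrix this is a one-line computation with any rank routine; a minimal sketch, assuming NumPy is available (the function name is ours):

    import numpy as np

    def geometric_multiplicity(A, lam):
        """dim ker(lam*I - A) = n - rk(lam*I - A)  (Exercise 8.1.25)."""
        A = np.asarray(A, dtype=float)
        n = A.shape[0]
        return n - np.linalg.matrix_rank(lam * np.eye(n) - A)

    # A 2x2 "Jordan-type" block: eigenvalue 2 with geometric multiplicity 1.
    assert geometric_multiplicity([[2, 1], [0, 2]], 2) == 1
    # For 2I, the eigenvalue 2 has geometric multiplicity 2.
    assert geometric_multiplicity([[2, 0], [0, 2]], 2) == 2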
Exercise 8.1.26. We can analogously define left eigenspaces and left geometric multiplicity. Prove:
λ ∈ F is a right eigenvalue if and only if it is a left eigenvalue, and if so, λ has the same right and
left geometric multiplicity.
(c) Find a nonsingular matrix A and a nilpotent matrix N such that A + N is singular.
Definition 8.3.1 (Polynomial). A polynomial over the field F in the variable t is an expression of
the form
    f (t) = α0 + α1 t + α2 t2 + · · · + αn tn .    (8.2)
We omit (t) from the name of the polynomial if the name of the variable is either not relevant to
our discussion or it is clear from the context.
The αi are the coefficients of f . We may omit any terms with zero coefficient.
Definition 8.3.11 (Root of a polynomial). Let f ∈ F[t] be a polynomial. Then ζ ∈ F is a root of
f if f (ζ) = 0. The roots of a polynomial are also often (confusingly) referred to as the zeros of the
polynomial.
Definition 8.3.13 (Multiplicity of a root). Let f be a polynomial and let ζ be a root of f . The
multiplicity of the root ζ is the largest k for which (t − ζ)k | f .
Proposition 8.3.14. Let f be a polynomial of degree n. Then f has at most n distinct roots.
Moreover, the sum of the multiplicities of the roots is still at most n.
Definition 8.3.15. Let f ∈ F[t] be a nonzero polynomial. We say that f splits into linear factors
over F if f can be written in the form
    f (t) = αk ∏_{i=1}^{k} (t − ζi ) .    (8.4)
Exercise 8.3.16. This factoring, if it exists, is unique up to the ordering of the factors.
Next we study the relation between the roots and the coefficients of a polynomial that splits
into linear factors.
Then, for 0 ≤ ℓ ≤ n,
    αn−ℓ = (−1)ℓ Σ_{i1 <···< iℓ} λi1 λi2 · · · λiℓ .    (8.6)
In particular,
    αn−1 = − Σ_{i=1}^{n} λi , and    (8.7)
    α0 = (−1)n ∏_{i=1}^{n} λi .    (8.8)
Definition 8.3.18 (Algebraically closed field). We say that the field F is algebraically closed if every
polynomial of degree ≥ 1 over F has at least one root in F.
The field of real numbers is not algebraically closed: the polynomial t2 + 1 has no real roots.
Theorem 8.3.19 (Fundamental Theorem of Algebra). The field of complex numbers is algebraically
closed. ♦
Proposition 8.3.20. Let F be an algebraically closed field. Let f ∈ F[t] be a nonzero polynomial.
Then f splits into linear factors over F.
Exercise 8.3.21. If F is algebraically closed and f ∈ F[t] is a nonzero polynomial then the sum of
the multiplicities of the roots of f is deg f .
To exploit all the consequences of Prop. 8.3.20, it is often helpful to extend our field to an
algebraically closed field.
Exercise 8.4.3. Observe that the characteristic polynomial of an n×n matrix is a monic polynomial
of degree n.
Theorem 8.4.4. The eigenvalues of a square matrix A are precisely the roots of its characteristic
polynomial. ♦
In Section 8.1, we defined the geometric multiplicity of an eigenvalue (Def. 8.1.24). We
now define the algebraic multiplicity of an eigenvalue.
Definition 8.4.5 (Algebraic multiplicity). Let A be a square matrix and let λ be an eigenvalue of
A. The algebraic multiplicity of λ is its multiplicity as a root of the characteristic polynomial of A.
Proposition 8.4.6. Let A be a square matrix and let λ be an eigenvalue of A. Then the geometric
multiplicity of λ is less than or equal to the algebraic multiplicity of λ.
Remark 8.4.7. We shall often make the assumption that the characteristic polynomial of the matrix
A ∈ Mn (F) splits into linear factors over F. We should point out that this condition is automatically
satisfied if
(i) A is triangular, or
(ii) F is algebraically closed.
♦
Proposition 8.4.13. Let F be algebraically closed. Let A be a square matrix over F. Then A
is diagonalizable over F if and only if for every eigenvalue λ of A, the geometric and algebraic
multiplicities of λ are equal.
8.5 The Cayley-Hamilton Theorem
In Section 2.3, we defined how to substitute a square matrix into a polynomial (Def. 2.3.3).
We repeat that definition here.
Definition 8.5.1 (Substitution of a matrix into a polynomial). Let f ∈ F[t] be the polynomial
(Def. 8.3.1) defined by
    f = α0 + α1 t + · · · + αd td .
Just as we may substitute ζ ∈ F for the variable t in f to obtain a value f (ζ) ∈ F, we may also
“plug in” the matrix A ∈ Mn (F) to obtain f (A) ∈ Mn (F). The only thing we have to be careful
about is what we do with the constant term α0 ; we replace it with α0 times the identity matrix, so
f (A) := α0 I + α1 A + · · · + αd Ad . (8.14)
Exercise 8.5.2. Let A and B be similar matrices, and let f be a polynomial. Show that f (A) ∼
f (B).
The main result of this section is the following theorem.
Theorem 8.5.3 (Cayley-Hamilton Theorem). Let A be an n × n matrix. Then fA (A) = 0.
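Before turning to the proof, it is instructive to check the statement numerically on a random matrix. A minimal sketch, assuming NumPy is available (np.poly returns the coefficients of the characteristic polynomial, leading coefficient first):

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((4, 4))

    coeffs = np.poly(A)        # coefficients of f_A(t) = det(tI - A), degree 4 down to 0

    # Evaluate f_A(A) by Horner's rule.
    F = np.zeros_like(A)
    for c in coeffs:
        F = F @ A + c * np.eye(4)

    print(np.max(np.abs(F)))   # essentially 0 (of the order of machine rounding)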
Exercise 8.5.4. What is wrong with the following "proof" of the Cayley-Hamilton Theorem?
"Substituting t = A into fA (t) = det(tI − A) gives fA (A) = det(A · I − A) = det(A − A) = det(0) = 0."
Exercise 8.5.5. Use the Cayley–Hamilton Theorem and Theorem 8.3.22 to prove that an n × n
matrix is nilpotent if and only if its characteristic polynomial is tn .
Exercise 8.5.6. Prove the Cayley–Hamilton Theorem by brute force for 2 × 2 matrices.
Our proof strategy. We shall proceed in three phases.
(a) Prove the Cayley–Hamilton Theorem for diagonalizable matrices.
(b) Prove the Cayley–Hamilton Theorem over C by a continuity argument, observing that the
diagonalizable matrices are dense within Mn (C).
(c) We invoke the Identity Principle to transfer the result from matrices over C to matrices over
any commutative unital ring (ring with identity).
We will first prove the Cayley–Hamilton Theorem for diagonal matrices, and then more generally
for diagonalizable matrices (Def. 8.2.6).
Exercise 8.5.7. Let D = diag(λ1 , . . . , λn ) be an n × n diagonal matrix over F and let f ∈ F[t].
Then f (D) = diag(f (λ1 ), . . . , f (λn )) .
As an immediate consequence, we can prove the Cayley–Hamilton Theorem for diagonal matri-
ces.
Exercise 8.5.8. Let D be a diagonal matrix, and let fD be its characteristic polynomial. Then
fD (D) = 0.
Proposition 8.5.9. Let A ∼ B, say B = S −1 AS. Then
tI − B = S −1 (tI − A) S . (8.16)
Theorem 8.5.10. If A ∼ B, then the characteristic polynomials of A and B are equal, i. e.,
fA = fB . ♦
Proposition 8.5.11. Let g ∈ F[t], and let A, S ∈ Mn (F) with S nonsingular. Then g (S −1 AS) =
S −1 g(A)S.
Corollary 8.5.12. Let g ∈ F[t] and let A, B ∈ Mn (F) with A ∼ B. Then g(A) ∼ g(B).
Putting all the above together, we obtain the Cayley–Hamilton Theorem for diagonalizable
matrices over any field.
Proposition 8.5.13. The Cayley-Hamilton Theorem holds for diagonalizable matrices.
Let us now focus on matrices over C. All norms being equivalent, we use the simplest one:
elementwise max-norm.
Definition 8.5.14. For A = (aij ) ∈ Cr×s , let ‖A‖max = maxi,j |aij |. For A, B ∈ Cr×s , we write
ρ(A, B) = ‖A − B‖max . For a sequence {Ak } of matrices in Cr×s , let us say that limk→∞ Ak = B
if limk→∞ ρ(Ak , B) = 0.
Analogously, for polynomials f (t) = Σ_{i=0}^{n} αi ti we use the norm ‖f ‖max = maxi |αi | and the
corresponding distance measure.
Proposition 8.5.15. The diagonalizable matrices are dense in Mn (C). In other words, if B ∈
Mn (C) then for any ε > 0 there exists a diagonalizable A ∈ Mn (C) such that ρ(A, B) ≤ ε.
Hint. First prove this for triangular matrices, using Ex. 8.4.10. Then combine this with Theo-
rem 8.4.9.
The next observation follows by continuity.
Proposition 8.5.16. Let {Ak } be a sequence of n × n matrices and {fk } a sequence of monic
polynomials of degree n over C such that fk → g and Ak → B. Then fk (Ak ) → g(B).
Our final observation is that if Ak → B then we have the corresponding limit relation for
their characteristic polynomials: fAk → fB . So to prove the Cayley–Hamilton Theorem for an
arbitrary B ∈ Mn (C), we take a sequence of diagonalizable matrices Ak approaching B; and then
0 = fAk (Ak ) → fB (B). This completes the proof of the Cayley–Hamilton Theorem for matrices
over C.
Exercise 8.5.17. The ring R = Z[x1 , . . . , xm ] of multivariate polynomials over Z is isomorphic to
a subring of C.
So in particular we proved the Cayley–Hamilton Theorem over this ring R. The following
observation provides the final step of the proof of the Cayley–Hamilton Theorem over any field and
in fact over any commutative unital ring (ring with identity).
Exercise 8.5.18 (Identity principle). Let T be a commutative unital ring with m generators. Then
T is a quotient ring of R = Z[x1 , . . . , xm ]. Therefore any polynomial identity that holds over R also
holds over T .
So we have proved the Cayley–Hamilton Theorem over T , noting that this theorem asserts a set
of polynomial identities.
where αn = 1. Then the companion matrix of f is the matrix C(f ) ∈ Mn (F) defined as
    C(f ) := [ 0 0 0 · · · 0  −α0   ]
             [ 1 0 0 · · · 0  −α1   ]
             [ 0 1 0 · · · 0  −α2   ]
             [ .  .  .  . . .  .    ]
             [ 0 0 0 · · · 0  −αn−2 ]
             [ 0 0 0 · · · 1  −αn−1 ]    (8.18)
Proposition 8.6.3. Let f be a monic polynomial and let A = C(f ) be its companion matrix.
Then the characteristic polynomial of A is equal to f .
Corollary 8.6.4. Every monic polynomial f ∈ Q[t] is the characteristic polynomial of a rational
matrix.
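A quick numerical way to convince yourself of Prop. 8.6.3 is to build C(f) for a concrete monic f and compare. A minimal sketch, assuming NumPy is available (the function name is ours; np.poly again returns characteristic polynomial coefficients):

    import numpy as np

    def companion(alphas):
        """Companion matrix C(f), following (8.18), of the monic polynomial
        f = alphas[0] + alphas[1] t + ... + alphas[n-1] t^(n-1) + t^n."""
        n = len(alphas)
        C = np.zeros((n, n))
        C[1:, :-1] = np.eye(n - 1)          # subdiagonal of 1s
        C[:, -1] = [-a for a in alphas]     # last column: -alpha_0, ..., -alpha_{n-1}
        return C

    # f(t) = t^3 - 2t^2 - 5t + 6 = (t - 1)(t + 2)(t - 3)
    C = companion([6, -5, -2])
    print(np.poly(C))                          # approximately [1, -2, -5, 6]
    print(sorted(np.linalg.eigvals(C).real))   # approximately [-2, 1, 3]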
Exercise 8.6.5. Determine the eigenvalues and their (geometric and algebraic) multiplicities of the
all-ones matrix Jn over R.
Exercise 8.6.6. (a) Let n be odd. Then every matrix A ∈ Mn (R) has a real eigenvector.
(b) Let n be even. Find a matrix B ∈ Mn (R) that has no real eigenvector.
Prove that f (A) converges if |λi | < r for all eigenvalues λi of A. In particular, eA always converges.
Exercise 8.6.8.
Chapter 9
Orthogonal Matrices
9.1 Orthogonal matrices
In Section 1.4, we defined the standard dot product (Def. 1.4.1) and the notion of orthogonal
vectors (Def. 1.4.8). In this chapter we study orthogonal matrices, matrices whose columns
form an orthonormal basis (Def. 1.5.6) of Rn .
Definition 9.1.1 (Orthogonal matrix). The matrix A ∈ Mn (R) is orthogonal if AT A = I. The set
of orthogonal n × n matrices is denoted by O(n).
Fact 9.1.2. A ∈ Mn (R) is orthogonal if and only if its columns form an orthonormal basis of Rn .
Proposition 9.1.3. O(n) is a group (Def. 14.3.2) under matrix multiplication (it is called the
orthogonal group).
Exercise 9.1.4. Which diagonal matrices are orthogonal?
Theorem 9.1.5 (Third Miracle of Linear Algebra). Let A ∈ Mn (R). Then the columns of A are
orthonormal if and only if the rows of A are orthonormal. ♦
Proposition 9.1.6. Let A ∈ O(n). Then all eigenvalues of A have absolute value 1.
Exercise 9.1.7. The matrix A ∈ Mn (R) is orthogonal if and only if A preserves the dot product,
i. e., for all v, w ∈ Rn , we have (Av)T (Aw) = vT w.
Exercise 9.1.8. The matrix A ∈ Mn (R) is orthogonal if and only if A preserves the norm, i. e., for
all v ∈ Rn , we have ‖Av‖ = ‖v‖.
Definition 9.1.9 (Hadamard matrix). The matrix A = (αij ) ∈ Mn (R) is an Hadamard matrix if
αij = ±1 for all i, j, and the columns of A are orthogonal. We denote by H the set of Hadamard matrices.
Proposition 9.2.5. Let A ∈ Mn (R). Then A ∈ O(n) if and only if it is orthogonally similar to a
matrix which is the diagonal sum of some of the following: an identity matrix, a negative identity
matrix, 2 × 2 rotation matrices (compare with Prop. 16.4.44).
Examples 9.2.6. The following are examples of the matrices described in the preceding proposition.
(a) [ −1    0      0    ]
    [  0  1/√2  −1/√2  ]
    [  0  1/√2   1/√2  ]
(b) the 9 × 9 matrix which is the diagonal sum of the 1 × 1 blocks (1), (−1), (−1), (−1), (1), (−1), (1)
    and the 2 × 2 rotation block
    [ √3/2  −1/2 ]
    [ 1/2   √3/2 ] .
Chapter 10
The Spectral Theorem
Theorem 10.1.1 (The Spectral Theorem for real symmetric matrices). Let A ∈ Mn (R) be a real
symmetric matrix. Then A has an orthonormal eigenbasis. ♦
The Spectral Theorem can be restated in terms of orthogonal similarity (Def. 9.2.1).
Theorem 10.1.2 (The Spectral Theorem for real symmetric matrices, restated). Let A ∈ Mn (R)
be a real symmetric matrix. Then A is orthogonally similar to a diagonal matrix. ♦
Exercise 10.1.3. Verify that these two formulations of the Spectral Theorem are equivalent.
Corollary 10.1.5. Let A be a real symmetric matrix. Then all of the eigenvalues of A are real.
Proposition 10.2.1. If two symmetric matrices are similar then they are orthogonally similar.
Exercise 10.2.2. Let A be a symmetric real n × n matrix, and let v ∈ Rn . Let b = (b1 , . . . , bn)
be an orthonormal eigenbasis of A. Express vT Av in terms of the eigenvalues and the coordinates
of v with respect to b.
Definition 10.2.3 (Positive definite matrix). An n × n real matrix A ∈ Mn (R) is positive definite if
for all x ∈ Rn (x 6= 0), we have xT Ax > 0.
Proposition 10.2.4. Let A ∈ Mn (R) be a real symmetric n × n matrix. Then A is positive definite
if and only if all eigenvalues of A are positive.
Definition 10.2.5 (Rayleigh quotient). Let A ∈ Mn (R) be a symmetric matrix and let v ∈ Rn , v ≠ 0.
The Rayleigh quotient of v is
    RA (v) = vT Av / ‖v‖2 .    (10.1)
Recall that ‖v‖2 = vT v (Def. 1.5.2).
Proposition 10.2.6 (Rayleigh's Principle). Let A be an n × n real symmetric matrix with eigen-
values λ1 ≥ · · · ≥ λn . Then
    λ1 = max_{v≠0} RA (v)    and    λn = min_{v≠0} RA (v) .
♦
λ1 ≥ µ1 ≥ λ2 ≥ µ2 ≥ · · · ≥ λn−1 ≥ µn−1 ≥ λn .
♦
Chapter 11
Bilinear and Quadratic Forms
Definition 11.1.3 (Dual space). The set of linear forms f : Fn → F is called the dual space of Fn
and is denoted (Fn )∗ .
For a fixed vector a ∈ Fn , the function
    f (x) = aT x    (11.1)
is a linear form.
Theorem 11.1.5 (Representation Theorem for Linear Forms). Every linear form f : Fn → F has
the form (11.1) for some column vector a ∈ Fn . ♦
Definition 11.1.6 (Bilinear form). A bilinear form is a function f : Fn × Fn → F with the following
properties.
(a) f (x1 + x2 , y) = f (x1 , y) + f (x2 , y)
For instance, the standard dot product f (x, y) = xT y is a bilinear form. More generally, for any matrix A ∈ Mn (F), the expression
    f (x, y) = xT Ay = Σ_{i=1}^{n} Σ_{j=1}^{n} αij xi yj    (11.4)
is a bilinear form.
Theorem 11.1.10 (Representation Theorem for bilinear forms). Every bilinear form f has the
form (11.4) for some matrix A. ♦
Definition 11.1.11 (Nonsingular bilinear form). We say that the bilinear form f (x, y) = xT Ay is
nonsingular if the matrix A is nonsingular.
Exercise 11.1.13.
(a) Assume F does not have characteristic 2, i. e., 1 + 1 ≠ 0 in F (see Section 14.4 for more about
the characteristic of a field). Prove: if n is odd and the bilinear form f : Fn × Fn → F satisfies
f (x, x) = 0 for all x ∈ Fn , then f is singular.
(b) Over every field F, find a nonsingular bilinear form f over F2 such that f (x, x) = 0 for all
x ∈ F2 .
Definition 11.2.1 (Monomial). A monomial in the variables x1 , . . . , xn is an expression of the form
f = α · x1^k1 x2^k2 · · · xn^kn for some nonzero scalar α and exponents ki ≥ 0. The degree of this monomial is
    deg f := Σ_{i=1}^{n} ki .    (11.5)
Examples 11.2.2. The following expressions are multivariate monomials of degree 4 in the variables
x1 , . . . , x6 : x1² x2 x3 , 5 x4 x5³ , −3 x6⁴ .
Definition 11.2.4 (Monic monomial). We call the monomials of the form ∏_{i=1}^{n} xi^ki monic. We define
the monic part of the monomial f = α ∏_{i=1}^{n} xi^ki to be the monomial ∏_{i=1}^{n} xi^ki .
Examples 11.2.6. The following are multivariate polynomials in the variables x1 , . . . , x7 (in stan-
dard form).
(c) 0
Note that by our convention, these rules remain valid if f or g is the zero polynomial.
Definition 11.2.9 (Homogeneous multivariate polynomial). The multivariate polynomial f is a ho-
mogeneous polynomial of degree k if every monomial in the standard form expression of f has degree
k.
Fact 11.2.11.
¹ This is in accordance with the natural convention that the maximum of the empty list is −∞.
Exercise 11.2.12. For all n and k, count the monic monomials of degree k in the variables
x1 , . . . , xn . Note that this is the dimension (Def. 15.3.6) of the space of homogeneous polyno-
mials of degree k.
Fact 11.2.14. Linear forms are exactly the homogeneous polynomials of degree 1.
In the next section we explore quadratic forms, which are the homogeneous polynomials of
degree 2 (see (11.6)).
Proposition 11.3.3. For all A ∈ Mn (R), there is a unique B ∈ Mn (R) such that B is a symmetric
matrix and QA = QB , i. e., for all x ∈ Rn , we have
xT Ax = xT Bx .
where x = (x1 , x2 )T .
(e) Q is indefinite if it is neither positive semidefinite nor negative semidefinite, i. e., there exist
x, y ∈ Rn such that Q(x) > 0 and Q(y) < 0.
Definition 11.3.6. We say that a matrix A ∈ Mn (R) is positive definite if it is symmetric and its
associated quadratic form QA is positive definite. Positive semidefinite, negative definite, negative
semidefinite, and indefinite symmetric matrices are defined analogously.
Notice that we shall not call a non-symmetric matrix A positive definite, etc., even if the
quadratic form QA is positive definite, etc.
Exercise 11.3.7. Categorize diagonal matrices according to their definiteness, i. e., tell which
diagonal matrices are positive definite, etc.
Exercise 11.3.8. Let A ∈ Mn (R). Assume QA is positive definite and λ is a real eigenvalue of A.
Prove λ > 0. Note that A is not necessarily symmetric.
Corollary 11.3.9. If A is a positive definite (and therefore symmetric by definition) matrix and λ
is an eigenvalue of A then λ > 0.
(e) QA is indefinite if there exist i and j such that λi > 0 and λj < 0.
Exercise 11.3.12. Show that if A and B are symmetric n × n matrices and A ∼ B and A is
positive definite, then so is B.
Definition 11.3.13 (Corner matrix). Let A = (αij )ni,j=1 ∈ Mn (R) and define for k = 1, . . . , n, the
corner matrix Ak := (αij )ki,j=1 to be the k × k submatrix of A, obtained by taking the intersection
of the first k rows of A with the first k columns of A. In particular, An = A. The k-th corner
determinant of A is det Ak .
Exercise 11.3.14.
(a) If A is positive definite then all of its corner matrices are positive definite.
(b) If A is positive definite then all of its corner determinants are positive.
The next theorem says that (b) is actually a necessary and sufficient condition for positive
definiteness.
Theorem 11.3.15. Let A ∈ Mn (R) be a symmetric matrix. A is positive definite if and only if all
of its corner determinants are positive. ♦
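Theorem 11.3.15 gives a practical positive-definiteness test. A minimal sketch, assuming NumPy is available (the function name is ours; for large matrices one would use a Cholesky factorization instead):

    import numpy as np

    def is_positive_definite(A):
        """Test a symmetric real matrix via its corner determinants (Theorem 11.3.15)."""
        A = np.asarray(A, dtype=float)
        n = A.shape[0]
        return all(np.linalg.det(A[:k, :k]) > 0 for k in range(1, n + 1))

    A = np.array([[ 2.0, -1.0,  0.0],
                  [-1.0,  2.0, -1.0],
                  [ 0.0, -1.0,  2.0]])    # corner determinants 2, 3, 4
    assert is_positive_definite(A)
    assert not is_positive_definite(-A)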
Exercise 11.3.16. Show that the following statement is false for every n ≥ 2: The n×n symmetric
matrix A is positive semidefinite if and only if all corner determinants are nonnegative.
Exercise 11.3.17. Show that the statement of Ex. 11.3.16 remains false for n ≥ 3 if in addition
we require all diagonal elements to be positive.
∗ ∗ ∗
In the next sequence of exercises we study the definiteness of quadratic forms QA where A is
not necessarily symmetric.
Exercise 11.3.18. Let A ∈ Mn (R) be an n × n matrix such that QA is positive definite. Show that
A need not have all of its corner determinants positive. Give a 2 × 2 counterexample. Contrast this
with part (b) of Ex. 11.3.14.
Proposition 11.3.19. Let A ∈ Mn (R). If QA is positive definite, then so is QAT ; analogous re-
sults hold for positive semidefinite, negative definite, negative semidefinite, and indefinite quadratic
forms.
Exercise 11.3.20. Let A, B, C ∈ Mn (R). Assume B = C T AC. Prove: if QA is positive semidefinite
then so is QB .
Exercise 11.3.21. Let A, B ∈ Mn (R). Show that if A ∼o B (A and B are orthogonally similar,
Def. 9.2.1) and QA is positive definite then QB is positive definite.
Exercise 11.3.22. Let
    A = [ α β ]
        [ γ δ ] .
Prove that QA is positive definite if and only if α, δ > 0 and (β + γ)² < 4αδ.
Exercise 11.3.23. Let A, B ∈ Mn (R) be matrices that are not necessarily symmetric.
(a) Prove that if A ∼ B and QA is positive definite, then QB cannot be negative definite.
(b) Find 2 × 2 matrices A and B such that A ∼ B and QA is positive definite but QB is indefinite.
(Contrast this with Ex. 11.3.12.)
S ⊥ := {v ∈ Fn | v ⊥ S}. (11.7)
∗ ∗ ∗
In particular, if U ≤ Fn then
    (U ⊥ )⊥ = U .    (11.10)
Corollary 11.4.15 (to Theorem 11.4.9). If U ≤ Fn is a totally isotropic subspace then dim U ≤ ⌊n/2⌋.
Exercise 11.4.16. For even n, find an (n/2)-dimensional totally isotropic subspace in Cn , (F5 )n , and (F2 )n .
Exercise 11.4.17. Let F be a field and let k ≥ 2. Consider the following statement.
Prove:
Exercise 11.4.18. Prove: if Stm(F, 2) is true then every maximal totally isotropic subspace of Fn
has dimension ⌊n/2⌋. In particular, this conclusion holds for F = F2 and for F = C.
Exercise 11.4.19. Prove: for all finite fields F, every maximal totally isotropic subspace of Fn has
dimension ≥ n/2 − 1.
Chapter 12
Complex Matrices
\overline{z1 · z2} = \overline{z1} · \overline{z2} .    (12.2)
Definition 12.1.6 (Magnitude of a complex number). Let z ∈ C. Then the magnitude, norm, or
absolute value of z is |z| = √(z z̄). If |z| = 1, then z is said to have unit norm.
Proposition 12.1.7. Let z ∈ C have unit norm. Then z can be expressed in the form z = cos θ + i sin θ for some θ ∈ R.
(A + B)∗ = A∗ + B ∗ . (12.4)
Exercise 12.2.3. Let A ∈ Ck×n and let B ∈ Cn×` . Show that (AB)∗ = B ∗ A∗ .
Fact 12.2.4. Let λ ∈ C and A ∈ Ck×n . Then (λA)∗ = λ̄ A∗ .
In Section 1.4, we defined the standard dot product (Def. 1.4.1). We now define the
standard Hermitian dot product for vectors in Cn .
Definition 12.2.5 (Standard Hermitian dot product). Let v, w ∈ Cn . Then the Hermitian dot
product of v with w is
    v · w := v∗ w = Σ_{i=1}^{n} ᾱi βi    (12.5)
where v = (α1 , . . . , αn )T and w = (β1 , . . . , βn )T .
In particular, observe that v∗ v is real and positive for all v 6= 0. The following pair of exercises
show some of the things that would go wrong if we did not conjugate.
Exercise 12.2.6. Find a nonzero vector v ∈ Cn such that vT v = 0.
This norm is also referred to as the (complex) Euclidean norm or the ℓ2 norm.
Theorem 12.3.3. Let A be a Hermitian matrix. Then all eigenvalues of A are real. ♦
Exercise 12.3.4 (Alternative proof of the real Spectral Theorem). The key part of the proof of the
Spectral Theorem (Theorem 19.4.4) given in Section 19.4 is the following lemma (Lemma 19.4.7):
Let A be a symmetric real matrix. Then A has an eigenvector. Derive this lemma from
Theorem 12.3.3.
Recall that a square matrix A is orthogonal (Def. 9.1.1) if AT A = I. Unitary matrices are
the complex generalization of orthogonal matrices.
Definition 12.3.5 (Unitary matrix). The matrix A ∈ Mn (C) is unitary if A∗ A = I. The set of
unitary n × n matrices is denoted by U (n).
Fact 12.3.6. A ∈ Mn (C) is unitary if and only if its columns form an orthonormal basis of Cn .
Proposition 12.3.7. U (n) is a group under matrix multiplication (it is called the unitary group).
Theorem 12.3.9 (Third Miracle of Linear Algebra). Let A ∈ Mn (C). Then the columns of A are
orthonormal if and only if the rows of A are orthonormal. ♦
Proposition 12.3.10. Let A ∈ U (n). Then all eigenvalues of A have absolute value 1.
Exercise 12.3.11. The matrix A ∈ Mn (C) is unitary if and only if A preserves the Hermitian dot
product, i. e., for all v, w ∈ Cn , we have (Av)∗ (Aw) = v∗ w.
Exercise∗ 12.3.12. The matrix A ∈ Mn (C) is unitary if and only if A preserves the norm, i. e., for
all v ∈ Cn , we have ‖Av‖ = ‖v‖.
Warning. The proof of this is trickier than in the real case (Ex. 9.1.8).
Exercise 12.3.13. Find an n × n unitary circulant matrix (Def. 2.5.12) with no zero entries.
Definition 12.4.3 (Unitary similarity). Let A, B ∈ Mn (C). We say that A is unitarily similar to B,
denoted A ∼u B, if there exists a unitary matrix U such that A = U −1 BU .
Note that U −1 BU = U ∗ BU because U is unitary.
(a) A is normal;
We note that all of these implications are actually "if and only if," as we shall demonstrate
(Theorem 12.4.14).
Proposition 12.4.6. A ∈ Mn (C) has an orthonormal eigenbasis if and only if A is unitarily similar
to a diagonal matrix.
We now show (Cor. 12.4.8) that this condition is equivalent to A being normal.
Lemma 12.4.7. Let A, B ∈ Mn (C) with A ∼u B. Then A is normal if and only if B is normal. ♦
Corollary 12.4.8. If A is unitarily similar to a diagonal matrix, then A is normal.
Unitary similarity is a powerful tool to study matrices, owing largely to the following theorem.
Theorem 12.4.9 (Schur). Every matrix A ∈ Mn (C) is unitarily similar to a triangular matrix.
♦
Definition 12.4.10 (Dense subset). A subset S ⊆ Fk×n is dense in Fk×n if for every A ∈ Fk×n and every
ε > 0, there exists B ∈ S such that every entry of the matrix A − B has absolute value less than ε.
Proposition 12.4.11. Diagonalizable matrices are dense in Mn (C).
Exercise 12.4.12. Complete the proof of the Cayley-Hamilton Theorem over C (Theorem 8.5.3)
using Prop. 12.4.11.
Theorem 12.4.18 (Real version of Schur’s Theorem). If A ∈ Mn (R) and all eigenvalues of A are
real, then A is orthogonally similar to an upper triangular matrix. ♦
Chapter 13
Matrix Norms
The operator norm of a matrix A ∈ Rk×n is defined as
    ‖A‖ = max_{x∈Rn, x≠0} ‖Ax‖ / ‖x‖    (13.1)
where the ‖ · ‖ notation on the right-hand side represents the Euclidean norm.
Proposition 13.1.5 (Submultiplicativity). Let A ∈ Rk×n and B ∈ Rn×ℓ . Then ‖AB‖ ≤ ‖A‖ · ‖B‖.
Exercise 13.1.6. Let A = [a1 | · · · | an ] ∈ Rk×n . (The ai are the columns of A.) Show ‖A‖ ≥ ‖ai ‖
for every i.
Exercise 13.1.7. Let A = (αij ) ∈ Rk×n . Show that ‖A‖ ≥ |αij | for every i and j.
Proposition 13.1.9. Let A ∈ Rk×n . Let S ∈ O(k) and T ∈ O(n) be orthogonal matrices. Then
‖SAT ‖ = ‖A‖.
For a symmetric real matrix A with eigenvalues λ1 , . . . , λn , we have ‖A‖ = maxi |λi |.
Exercise 13.1.11. Let A ∈ Rk×n . Show that AT A is positive semidefinite (Def. 11.3.6).
Exercise 13.1.12. Let A ∈ Rk×n and let AT A have eigenvalues λ1 ≥ · · · ≥ λn . Show that
‖A‖ = √λ1 .
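Taking the statement of Exercise 13.1.12 for granted, the operator norm can be computed from the largest eigenvalue of ATA. A minimal sketch, assuming NumPy is available (the function name is ours), with a brute-force sanity check over random vectors:

    import numpy as np

    def operator_norm(A):
        """||A|| = sqrt(lambda_1), lambda_1 the largest eigenvalue of A^T A
        (Exercise 13.1.12)."""
        eigenvalues = np.linalg.eigvalsh(A.T @ A)   # A^T A is symmetric
        return float(np.sqrt(max(eigenvalues.max(), 0.0)))

    rng = np.random.default_rng(1)
    A = rng.standard_normal((3, 5))
    ratios = [np.linalg.norm(A @ x) / np.linalg.norm(x)
              for x in rng.standard_normal((1000, 5))]
    assert max(ratios) <= operator_norm(A) + 1e-9   # no ratio ||Ax||/||x|| exceeds ||A||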
Proposition 13.1.13. For all A ∈ Rk×n , we have ‖AT ‖ = ‖A‖.
Exercise 13.1.14.
(a) Find a stochastic matrix (Def. 22.1.2) of norm greater than 1.
(b) Find an n × n stochastic matrix of norm √n.
(c) Show that an n × n stochastic matrix cannot have norm greater than √n.
Numerical exercise 13.1.15.
(a) Let A = [ 1 1 ]
            [ 0 1 ] . Calculate ‖A‖.
(b) Let B = [ 0 1 ]
            [ 1 1 ] . Calculate ‖B‖.
Proposition 13.2.2. Let A ∈ Rk×n . Then ‖A‖F = √(Tr(AT A)).
Proposition 13.2.4 (Triangle inequality). Let A, B ∈ Rk×n . Then ‖A + B‖F ≤ ‖A‖F + ‖B‖F .
Proposition 13.2.5 (Submultiplicativity). Let A ∈ Rk×n and B ∈ Rn×ℓ . Then ‖AB‖F ≤ ‖A‖F ·
‖B‖F .
Proposition 13.2.6. If A ∈ O(k) then ‖A‖F = √k.
Proposition 13.2.7. Let A ∈ Rk×n . Let S ∈ O(k) and T ∈ O(n) be orthogonal matrices. Then
‖SAT ‖F = ‖A‖F .
Exercise 13.2.10. Prove ‖A‖ = ‖A‖F if and only if rk A = 1. Use the Singular Value Decompo-
sition (Theorem 21.1.2).
Exercise 13.2.11. Let A be a symmetric real n × n matrix. Show that ‖A‖F = √n · ‖A‖ if and only if
A = λR for some reflection matrix ( ??) R.
Exercise 13.3.2. Generalize the definition of the Frobenius norm and statements 13.2.2-13.2.10 to
C.
Exercise 13.3.3. Let A be a normal matrix with eigenvalues λ1 , . . . , λn . Show that ‖A‖F = √n · ‖A‖
if and only if |λ1 | = · · · = |λn |.
Introduction to Part II
TO BE WRITTEN.
Chapter 14
Algebra
Notice that (a, b) in the above definition is an ordered pair. In particular, if a1 , a2 ∈ A and
a1 ≠ a2 then (a1 , a2 ) and (a2 , a1 ) are distinct elements of A × A.
Notation 14.1.6 (Cardinality). For a set A we denote the cardinality of A (the number of elements
of A) by |A|.
For instance, |{4, 5, 4, 6, 6, 4}| = 3.
The following fact is not a theorem, but rather a definition, namely, the definition of multipli-
cation of non-negative integers.
Exercise 14.1.8. Let A and B be finite sets of integers of respective sizes |A| = n and |B| = m.
Show that
(a) |A + B| ≤ mn
(b) if mn ≠ 0 then |A + B| ≥ m + n − 1
(c) |A + A| ≤ (n+1 choose 2)
(d) |A + A + A| ≤ (n+2 choose 3)
Prove that each of these inequalities is tight for all values of n, m, and k, i. e., for all m, n, k there
exist sets A, B for which equality holds.
Theorem 14.1.9 (Division Theorem). Let a, b ∈ Z, b ≠ 0. Then there exist q, r ∈ Z such that
    a = qb + r   and   0 ≤ r < |b| .    (14.5)
Exercise 14.1.10. Fix the value b. Prove the Division Theorem for every a ≥ 0 by induction on
a. Then reduce the case of negative a to the case of positive a.
14.1.2 Divisibility
Divisibility is the central concept of arithmetic.
Definition 14.1.16 (Divisibility). Let a, b ∈ Z. We say that a divides b (notation: a | b) if there
exists x ∈ Z such that b = ax. The same circumstance is also expressed by the phrases “a is a
divisor of b” and “b is a multiple of a.”
In the following sequence of exercises, we are building on basic identities of arithmetic (commu-
tativity, associativity, distributivity).
For item (b), point out what basic arithmetic identity you are using.
Remark 14.1.18. Note that in particular, 0 | 0. Why does this not violate the rock-hard prohibition
against division by zero?
(a) a | −a
(b) a | 0
(c) 1 | a and −1 | a
(b) if for a fixed value a, the relation x | a holds for all x ∈ Z then a = 0
(c) if for a fixed value a, the relation a | y holds for all y ∈ Z then a = ±1
Definition 14.1.25. Let S ⊆ Z be a set of integers and e ∈ S. We say that e is a common multiple
of S if (∀a ∈ S)(a | e).
We write Mult(S) to denote the set of common multiples of S. If S is explicitly listed as S =
{a1 , a2 , . . . } then we write Mult(a1 , a2 , . . . ) for Mult({a1 , a2 , . . . }) (we omit the braces).
• Mult(−4) = 4Z,
• Mult(7Z) = 7Z.
Exercise 14.1.36 (elimination of repetitions). Let a1 , . . . , an ∈ Z. Some items on this list may be
repeated; let S be a maximal repetition-free sublist (i. e., every element of the list appears exactly
once in S). The set of integer linear combinations of the ai is the same as the set of integer linear
combinations of S.
Since the order of the ai does not matter, we can view S as a set, rather than an (ordered) list of
elements.
Definition 14.1.37. If S is an infinite subset of Z then by the integer linear combinations of S we
mean the integer linear combinations of the finite subsets of S.
Exercise 14.1.38. 0 is an integer linear combination of any set of integers (even of the empty set,
because an empty sum is zero by definition).
Theorem 14.1.39 (Existence of gcd and Bezout’s Lemma). Let S ⊆ Z. Then a gr.c.div. of S
exists and can be written as an integer linear combination of S.
The second part of this statement (about integer linear combinations) is usually called Bezout’s
Lemma.
Examples: −6 is a gr.c.div. of 12 and 90. It can be written as −6 = (−8) · 12 + 1 · 90. The gcd
of 21, 30, and 35 is 1. It can be written as 1 = 21 − 3 · 30 + 2 · 35.
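For two integers, a gr.c.div. together with Bezout coefficients can be computed by the extended Euclidean algorithm; a minimal sketch (our own function name, not part of the text):

    def extended_gcd(a, b):
        """Return (d, x, y) with d = gcd(a, b) >= 0 and d = x*a + y*b."""
        old_r, r = a, b
        old_x, x = 1, 0
        old_y, y = 0, 1
        while r != 0:
            q = old_r // r
            old_r, r = r, old_r - q * r
            old_x, x = x, old_x - q * x
            old_y, y = y, old_y - q * y
        if old_r < 0:                      # normalize the sign of the gcd
            old_r, old_x, old_y = -old_r, -old_x, -old_y
        return old_r, old_x, old_y

    d, x, y = extended_gcd(12, 90)
    assert d == 6 and x * 12 + y * 90 == 6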
The proof is immediate from Theorem 14.1.14 and the following two observations.
Exercise 14.1.40. Let S ⊆ Z. Then the set of integer linear combinations of S is a subgroup of Z.
Exercise 14.1.41. Let S ⊆ Z. Assume the set of integer linear combinations of S is the cyclic
subgroup dZ. Then d is a gr.c.div. of S.
Note that this completes the proof of Theorem 14.1.39.
The following lemma, a critical step towards a proof of the Fundamental Theorem of Arithmetic
(uniqueness of prime factorization), is an immediate consequence of Bezout's Lemma.
Exercise 14.1.42. Let S ⊆ Z and k ∈ Z. Then gcd(kS) = |k| gcd(S).
Definition 14.1.43. We say that the integers a, b are relatively prime if gcd(a, b) = 1.
Exercise 14.1.44. Let a, b, c ∈ Z. If c | ab and b and c are relatively prime then c | a.
The trick is, we need to prove this without using the Fundamental Theorem of Arithmetic: this
exercise will be key to proving the FTA.
Hint. Multiply the equation gcd(b, c) = 1 by a and use Ex. 14.1.42.
Exercise 14.1.46. Show that p ∈ Z is a prime number if and only if p ≥ 0 and | Div(p)| = 4.
Exercise 14.1.48. Every positive integer n can be written as a product of prime numbers.
Hint. Induction on n. The base case: n = 1 which is the product of the empty list of primes.
(The product of an empty list of numbers is 1 by definition.)
The FTA states the uniqueness of this factorization.
Theorem 14.1.49 (Fundamental Theorem of Arithmetic). Prime factorization is unique up to the
order of the factors. More precisely, let n be a positive integer and assume n = ∏_{i=1}^{k} pi = ∏_{j=1}^{ℓ} qj
where the pi and the qj are prime numbers. Then k = ℓ and there is a permutation σ : [k] → [k]
such that (∀i ∈ [k])(qi = pσ(i) ).
This result appears in Euclid's Elements (ca. 300 BCE). The proof we give here is essentially
Euclid’s, in modern language. We have already made the bulk of preparations for the proof.
Definition 14.1.50. Let n ∈ Z. We say that n has the prime property if n ∉ {±1} and, for all
a, b ∈ Z, n | ab implies n | a or n | b.
Exercise 14.1.51. (a) Composite numbers and their negatives do not have the prime property.
Exercise 14.1.52 (Euclid’s Lemma). All prime numbers have the prime property.
This result is the key lemma toward proving the FTA. It follows immediately from Ex. 14.1.44.
Miscellaneous exercises
(a) a ≡ b (mod 1)
(b) a ≡ b (mod 2)
(c) a ≡ b (mod 0)
(a) a + b ≡ c + d (mod m)
(b) a − b ≡ c − d (mod m)
(d) ab ≡ cd (mod m)
For each item, find an elegant proof based on what we have already learned about divisibility and
congruence; point out what you are using.
Exercise 14.2.4. Let k ≥ 0. Show that if a ≡ b (mod m), then ak ≡ bk (mod m).
Proceed by induction on k, using the multiplication rule for congruences (Ex. 14.2.3, item (d)).
Definition 14.2.5. Let a, x, m ∈ Z. We say that x is a multiplicative inverse of a modulo m if ax ≡ 1
(mod m).
For example, 13 is a multiplicative inverse of −8 modulo 21 because (−8) · 13 = −104 ≡ 1
(mod 21).
Exercise 14.2.6. Let a, m ∈ Z. Then a has a multiplicative inverse modulo m if and only if a and
m are relatively prime, i. e., gcd(a, m) = 1. In particular, if p is a prime then every integer that is
not divisible by p has a multiplicative inverse modulo p.
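In practice, a multiplicative inverse is found via Bezout's Lemma: if 1 = xa + ym then x is an inverse of a modulo m. A minimal sketch (our own helper names; in Python 3.8+ the built-in pow(a, -1, m) does the same thing):

    def extended_gcd(a, b):
        """Return (d, x, y) with d = gcd(a, b) and d = x*a + y*b (for a, b >= 0)."""
        if b == 0:
            return a, 1, 0
        d, x, y = extended_gcd(b, a % b)
        return d, y, x - (a // b) * y

    def mod_inverse(a, m):
        """A multiplicative inverse of a modulo m, if gcd(a, m) = 1 (Ex. 14.2.6)."""
        d, x, _ = extended_gcd(a % m, m)
        if d != 1:
            raise ValueError("a and m are not relatively prime")
        return x % m

    assert mod_inverse(-8, 21) == 13       # the example in the text
    assert (-8) * 13 % 21 == 1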
Exercise 14.2.8 (Multiplicative inverse unique modulo m). Let x be a multiplicative inverse of
a modulo m. Then an integer y is a multiplicative inverse of a modulo m if and only if x ≡ y
(mod m).
Remark 14.2.10. Note that day x and day y of the month fall on the same day of the week exactly
if x ≡ y (mod 7). For instance, if August 3 is a Wednesday, then August 24 is also a Wednesday,
because 3 ≡ 24 (mod 7). For this reason, when modular arithmetic is taught to kids, it is sometimes
referred to as “calendar arithmetic.”
Definition 14.2.12 (Equivalence relation). Let ∼ be a binary relation on a set A. The relation ∼ is
said to be an equivalence relation if the following conditions hold for all a, b, c ∈ A.
(a) a ∼ a (reflexivity)
(b) If a ∼ b then b ∼ a (symmetry)
(c) If a ∼ b and b ∼ c, then a ∼ c (transitivity)
Proposition 14.2.13. For any fixed m ∈ Z, “congruence modulo m” is an equivalence relation.
Definition 14.2.14 (Equivalence classes). Let A be a set with an equivalence relation ∼, and let
a ∈ A. The equivalence class of a with respect to ∼, denoted [a], is the set
[a] = {b ∈ A | a ∼ b} (14.7)
Note that the equivalence classes are not empty since a ∈ [a].
Theorem 14.2.15 (Fundamental Theorem of Equivalence Relations). Let ∼ be an equivalence
relation on the set A. The equivalence classes partition A, i. e.,
(i) if [a] ≠ [b] then [a] ∩ [b] = ∅
(ii) ⋃_{a∈A} [a] = A.
Exercise 14.2.16. Let ∼ be an equivalence relation on A. Then for all a, b ∈ A, the following are
equivalent:
(a) a ∼ b
(b) a ∈ [b]
(c) [a] = [b]
(d) [a] ∩ [b] ≠ ∅
Definition 14.2.17. Let ∼ be an equivalence relation on the set A. Any element of an equivalence
class R is a representative of R. A set T ⊆ A is a complete set of representatives of the equivalence
classes if T includes exactly one element from each equivalence class. In particular, |T | is the number
of equivalence classes.
Definition 14.2.18. The equivalence classes of the equivalence relation “congruence modulo m” in
Z are called residue classes modulo m or modulo m residue classes.
Exercise 14.2.19. Show that the residue classes modulo m are precisely the sets a + mZ (a ∈ Z)
(see Definitions 14.1.2 and 14.1.3).
Exercise 14.2.20. Given m ∈ Z, how many modulo m residue classes are there? Do not forget
the case m = 0.
Next we define addition and multiplication of residue classes. We use the notation [a]m := a+mZ.
Definition 14.2.21 (Sum of residue classes). Let [a]m and [b]m be residue classes modulo m. We
define their sum as
[a]m + [b]m = [a + b]m (14.8)
Definition 14.2.22 (Product of residue classes). Let [a]m and [b]m be residue classes modulo m. We
define their product as
[a]m · [b]m = [a · b]m (14.9)
Exercise 14.2.23. Show that the sum and the product of residue classes are well defined, that is,
that they do not depend on our choice of representative for each residue class. In other words, if
[a]m = [a0 ]m and [b]m = [b0 ]m then [a + b]m = [a0 + b0 ]m and [ab]m = [a0 b0 ]m .
These equations are immediate consequences of certain facts we have learned about congruences.
Point out, which ones.
Definition 14.2.24. A complete set of representatives of the residue classes modulo m is called a
complete set of residues modulo m.
Exercise 14.2.25. Let m ≥ 1. Then the set {0, 1, . . . , m−1} is a complete set of residues modulo m.
It is called the set of least non-negative residues modulo m.
Exercise 14.2.26. Let T be a complete set of residues modulo m and let c ∈ Z be relatively prime
to m. Then the dilation cT is again a complete set of residues modulo m.
Next we observe that Ex. 14.2.9 allows us to speak of the gcd of m and a residue class modulo m.
Definition 14.2.27. We say that gcd([a]m , m) = d if d = gcd(a, m). In particular, we say that [a]m
and m are relatively prime if gcd(a, m) = 1.
Remark 14.2.28. This definition is sound, i. e., the gcd we defined does not depend on the specific
choice of the representative of the residue class [a]m . This is the content of Ex. 14.2.9.
Definition 14.2.29. A reduced set of residues modulo m is a set of representatives of those residue
classes that are relatively prime to m.
Definition 14.2.30 (Euler’s ϕ function). For a positive integer m let ϕ(m) denote the number of
x ∈ [m] that are relatively prime to m.
Exercise 14.2.33. If R is a reduced set of residues modulo m and gcd(c, m) = 1 then cR is also a
reduced set of residues modulo m.
Exercise 14.2.34. If R and S are reduced sets of residues modulo m then
    ∏_{x∈R} x ≡ ∏_{y∈S} y (mod m).
Theorem (Euler–Fermat congruence). Let m ≥ 1 and let a ∈ Z be relatively prime to m. Then
aϕ(m) ≡ 1 (mod m).
Proof. Let R be a reduced set of residues modulo m. Then S = aR is also a reduced set of residues
modulo m. Let P denote the product of the elements of R and Q the product of the elements of
S. Then Q = aϕ(m) P . On the other hand, by Ex. 14.2.34, P ≡ Q (mod m), i. e., P ≡ aϕ(m) · P
(mod m). Now an application of the Cancellation law (Ex. 14.2.7) yields the desired conclusion.
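A small computational check of the congruence just proved, with a naive ϕ (our own code; pow(a, k, m) computes a^k mod m):

    from math import gcd

    def euler_phi(m):
        """Number of x in {1, ..., m} relatively prime to m (Def. 14.2.30)."""
        return sum(1 for x in range(1, m + 1) if gcd(x, m) == 1)

    # a^phi(m) is congruent to 1 modulo m whenever gcd(a, m) = 1:
    for m in range(2, 50):
        for a in range(1, m):
            if gcd(a, m) == 1:
                assert pow(a, euler_phi(m), m) == 1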
14.3 Groups
Definition 14.3.1. A binary operation on the set A is a function A × A → A. If the operation is
denoted by “◦” then we have a map (a, b) 7→ a ◦ b (a, b ∈ A).
Examples of binary operations include addition and multiplication in Z, Q, R, C, Zm .
Definition 14.3.2 (Group). A group is a set G along with a binary operation ◦ that satisfies the
following axioms.
(c) (neutral element) There exists a neutral element e ∈ G such that for all a ∈ G, e◦a = a◦e = a
(d) (inverses) For each a ∈ G, there exists b ∈ G such that a ◦ b = b ◦ a = e. The element b is
called the inverse of a
The first of these axioms is redundant; it just declares that ◦ is a binary operation.
Remark 14.3.3. Axiom (c) ensures that if (G, ◦) is a group then G cannot be the empty set.
Exercise 14.3.4. Prove that the neutral element is the unique element of the group that satisfies
the equation x ◦ x = x.
Exercise 14.3.5. We say that c ∈ G is a left inverse of a ∈ G if ca = e. Right inverses are defined
analogously. A two-sided inverse of a is an element that is both a left inverse and a right inverse.
Prove that every element of G has a unique left inverse, a unique right inverse, and these are equal;
in particular, every element has a unique two-sided inverse, to which we refer as the inverse.
Strictly speaking, G denotes a set, and this set, along with the binary operation ◦, constitutes
the group (G, ◦). However, we often omit ◦ and refer to G as the group when the binary operation
is clear from context.
Groups satisfying the additional axiom that a ◦ b = b ◦ a for all a, b ∈ G (commutativity) are called abelian groups.
Convention 14.3.6. There are two common notational conventions for groups: additive and mul-
tiplicative. The additive notation is only used for abelian groups.
In additive notation, the operation is written as (a, b) 7→ a + b and the neutral element is called
zero, denoted 0G , or simply 0 if the group is clear from the context. The additive inverse of a ∈ G,
called the negative of a, is denoted −a, and the element a + (−b) is denoted a − b. We call an
abelian group an additive group if we use the additive notation.
In multiplicative notation, the operation is written as (a, b) 7→ a · b or a × b or simply ab.
The neutral element is called the identity, denoted 1G , or simply 1 if the group is clear from the
context. The multiplicative inverse of a ∈ G is denoted a−1 . If the group is abelian, we also call the
multiplicative inverse the reciprocal and sometimes denote it 1/a, and we also denote the element
a · b−1 as the quotient a/b.
Definition 14.3.7 (Order of a group). Let G be a group. The order of G is its cardinality, |G|.
Exercise 14.3.8. The following are examples of abelian groups: (Z, +), (Q, +), (R, +), (C, +).
Exercise 14.3.9. (Zm , +) is an abelian group of order m.
The additive groups listed in the preceding two exercises do not form groups with respect to
multiplication because 0 has no multiplicative inverse.
Definition 14.3.10 (Semigroup, monoid). A semigroup (S, ◦) is a set with an associative binary
operation (axioms (a) and (b) in Def. 14.3.2). A monoid is a semigroup with a neutral element
(axiom (c)). An element a ∈ S of the monoid S is invertible if it satisfies axiom (d).
Exercise 14.3.11. The following sets are examples of (commutative) monoids with respect to
multiplication: Q, R, C, Zm (m ≥ 1).
Notation 14.3.12. Let (S, ×) be a monoid, where the operation is written as multiplication. We
write S × to denote the set of invertible elements of S.
Exercise 14.3.13. If (S, ×) is a monoid then (S × , ×) is a group.
Exercise 14.3.14. Z× = {1, −1}. This set is a multiplicative abelian group of order 2.
Exercise 14.3.15. Q× := Q \ {0}, R× := R \ {0}, C× := C \ {0}. These sets are multiplicative
abelian groups.
Exercise 14.3.16. Z×_m is the set of those residue classes modulo m that are relatively prime to
m. (See Def. 14.2.27 and the subsequent remark.) Z×_m is an abelian group with respect to
multiplication.
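As an illustration (not part of the text; the helper name units_mod is ours), a few lines of Python list the elements of Z×_m and spot-check closure and the existence of inverses:

from math import gcd

def units_mod(m):
    # residue classes mod m relatively prime to m, represented by integers 1..m-1
    return [a for a in range(1, m) if gcd(a, m) == 1]

m = 12
U = units_mod(m)                                        # [1, 5, 7, 11]
assert all((a * b) % m in U for a in U for b in U)      # closure under multiplication
assert all(any((a * b) % m == 1 for b in U) for a in U) # every element has an inverse mod m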
Convention 14.3.19 (Empty sum). Let (G, +) be an abelian group. The empty sum in (G, +) is
defined to be equal to 0, that is,
∑_{a ∈ ∅} a = ∑_{i=1}^{0} a_i = 0 .          (14.12)
Exercise 14.3.20. Let (G, +) be an abelian group. Let I be a finite index set and ai ∈ G for i ∈ I.
Let I = A ∪̇ B, where the dot indicates that A and B are disjoint: A ∩ B = ∅. Then
∑_{i ∈ I} a_i = ∑_{i ∈ A} a_i + ∑_{i ∈ B} a_i .          (14.13)
Note that the validity of this identity depends on the Convention about empty sums, and our
convention is the only possible way to maintain this identity.
Convention 14.3.21 (Empty product). Let (S, ×) be a commutative monoid. The empty product
in (S, ×) is defined to be equal to 1, that is,
∏_{a ∈ ∅} a = ∏_{i=1}^{0} a_i = 1 .          (14.14)
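These conventions agree with what most programming environments do. A two-line Python illustration (not part of the text):

from math import prod

assert sum([]) == 0     # the empty sum is 0
assert prod([]) == 1    # the empty product is 1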
Exercise 14.3.22. State and prove the multiplicative analogue of Ex. 14.3.20. Note that it is
sufficient to talk about monoids; the structure does not need to be a group.
Definition 14.3.23 (Sumset). We define shift and sumsets in an abelian group exactly as we did in
Z (Definitions 14.1.2 and 14.1.4).
Exercise 14.3.24. (i) Prove that the inequalities listed in Ex. 14.1.8 hold in any additive abelian
group.
(ii) Find an infinite abelian group in which inequality (c) in Ex. 14.1.8 is not tight.
Definition 14.3.25 (Subgroup). Let G be a group. If H ⊆ G is a group under the same operation
as G, we say that H is a subgroup of G, denoted H ≤ G.
Proposition 14.3.26. The relation ≤ is transitive, that is, if K ≤ H and H ≤ G, then K ≤ G.
Proposition 14.3.27. The intersection of any collection of subgroups of a group G is itself a
subgroup of G.
Proposition 14.3.28. Let G be a group and let H, K ≤ G. Then H ∪ K ≤ G if and only if H ⊆ K
or K ⊆ H.
Proposition 14.3.29. Let G be a group and let H ≤ G. Then
(a) The identity of H is the same as the identity of G.
(b) Let a ∈ H. The inverse of a in H is the same as the inverse of a in G.
Notation 14.3.30. Let G be an additive abelian group, and let H, K ⊆ G. We write −H for the set
−H = {−h | h ∈ H}, and H − K for the set
H − K = H + (−K) = {h − k | h ∈ H, k ∈ K} (14.15)
Proposition 14.3.31. Let G be an additive abelian group and H ⊆ G. Then H ≤ G if and only if
(a) 0 ∈ H
(b) −H ⊆ H (closed under inverses)
(c) H + H ⊆ H (closed under addition)
Proposition 14.3.32. Let G be an additive abelian group and let H ⊆ G. Then H ≤ G if and only if
H ≠ ∅ and H − H ⊆ H (that is, H is closed under subtraction).
14.4 Fields
Definition 14.4.1. A field is a set F with two binary operations, addition and multiplication, such
that
(a) (F, +) is an abelian group with zero element 0F (also simply denoted as 0 if the field F is clear
from the context);
(b) (F \ {0}, ·) is an abelian group with identity element 1F (also simply denoted as 1);
(c) multiplication distributes over addition: a · (b + c) = a · b + a · c for all a, b, c ∈ F.
For all k, ℓ ∈ Z and a, b ∈ F (where k · a denotes the integer multiple of a, defined by repeated addition), we have
(a) (k + ℓ) · a = k · a + ℓ · a
(b) k · (a + b) = k · a + k · b
Definition 14.4.8. The characteristic of a field F is the smallest positive integer k such that k·1F = 0F .
If no such k exists, we say that F has characteristic 0. We denote the characteristic of F by char(F).
We say that F has finite characteristic if char(F) 6= 0.
Remark 14.4.9. Note that “infinite” and “zero” are treated as synonyms in this context, and in
many other contexts in number theory, for a good reason. Indeed, 0 is on the top of the divisibility
hierarchy of integers (0 is divisible by all integers). This fact explains the term “finite characteristic,”
see Ex. 14.4.12.
Exercise 14.4.11. If the characteristic of a field is not zero then it is a prime number.
Theorem 14.4.13 (Galois). All finite fields have finite characteristic. If the characteristic of a
finite field F is the prime number p then the order of F is a power of p. ♦
This result has a simple linear algebra reason, as we shall see later; that argument will prove the easy
part of Galois’s Theorem 14.4.13: all finite fields have prime power order. It will not prove either
the existence or the uniqueness of those fields.
Definition 14.4.14. Consider the formal quotients of formal univariate polynomials (polynomials in
one variable) over the field F, where the denominator is not the zero polynomial. (Note that even
over finite fields, there are infinitely many formal polynomials, see Sec. 14.5.) Let us say that two
such quotients, f1 /g1 and f2 /g2 are equivalent if f1 g2 = f2 g1 . The equivalence classes under the
natural operations form a field, called the function field over F and denoted by F(t), where t is
the name of the variable.
This construction is analogous to the way we build Q from Z.
Definition 14.4.15 (Subfield). Let F, G be fields. We say that F is a subfield of G if F ⊆ G and
the operations in F are the same as those in G, restricted to F. In this case we also say that G is
an extension of F.
Exercise 14.4.16. Let G be a field and F a subset of G. Then F is a subfield if and only if 1G ∈ F
and F is closed under subtraction and division by nonzero elements, i. e., F − F ⊆ F and for a ∈ F
and b ∈ F× we have a/b ∈ F.
Exercise 14.4.17. If F is a subfield of G then they have the same characteristic.
Note in particular that Fp is not a subfield of R.
Exercise 14.4.18. The field F is a subfield of the function field F(t). In particular, they have the
same characteristic.
This shows that function fields are examples of infinite fields of every characteristic.
Exercise 14.4.19. If F has characteristic p then F has a subfield isomorphic to Fp . If F has
characteristic zero then F has a subfield isomorphic to Q.
For this reason, the fields Fp and Q are called prime fields.
14.5 Polynomials
As mentioned in Section 8.3, “polynomials” in this book refer to univariate polynomials (polyno-
mials in one variable), unless we expressly refer to multivariate polynomials.
In Section 8.3, we developed a basic theory of polynomials. In that section, however, we viewed
polynomials as functions. We now develop a more formal theory of polynomials, viewing a poly-
nomial f as a formal expression whose coefficients are taken from a field F of scalars, rather than
as a function f : F → F. This makes no real difference if F is infinite but the difference is signifi-
cant when the field is finite. For starters, there are only finitely many functions Fq → Fq , namely,
there are q^q of them, but there are infinitely many formal polynomials over Fq ; for instance, the
polynomials xn (n ∈ N) are all formally different.
Definition 14.5.1 (Polynomial). A polynomial over the field F is an expression1 of the form
f = α0 + α1 t + α2 t2 + · · · + αn tn (14.16)
where the coefficients αi are scalars (elements of F), and t is a symbol. The set of all polynomials
over F is denoted F[t]. Two expressions, (14.16) and
g = β0 + β1 t + β2 t2 + · · · + βm tm (14.17)
¹Strictly speaking, a polynomial is an equivalence class of such expressions, as explained in the next sentence.
define the same polynomial if they only differ in leading zero coefficients, i. e., there is some k for
which α0 = β0 , . . . , αk = βk , and all coefficients αj , βj are zero for j > k. We may omit any terms
with zero coefficient, e. g.,
3 + 0t + 2t2 + 0t3 = 3 + 2t2 (14.18)
Definition 14.5.2 (Zero polynomial). The polynomial in which all coefficients are zero is called the
zero polynomial and is denoted by 0.
Definition 14.5.3 (Leading term). The leading term of a polynomial f = α0 + α1 t + · · · + αn tn is
the term corresponding to the highest power of t with a nonzero coefficient, that is, the term αk tk
where αk ≠ 0 and αj = 0 for all j > k. The zero polynomial does not have a leading term.
Definition 14.5.4 (Leading coefficient). The leading coefficient of a polynomial f is the coefficient
of the leading term of f .
For example, the leading term of the polynomial 3 + 2t2 + 5t7 is 5t7 and the leading coefficient
is 5.
Definition 14.5.5 (Monic polynomial). A polynomial is monic if its leading coefficient is 1.
Definition 14.5.6 (Degree of a polynomial). The degree of a polynomial f = α0 + α1 t + · · · + αn tn ,
denoted deg f , is the exponent of its leading term.
For example, deg (3 + 2t2 + 5t7 ) = 7.
Convention 14.5.7. The zero polynomial has degree −∞.
Notation 14.5.9. We denote the set of polynomials of degree at most n over F by Pn [F].
Definition 14.5.10 (Sum and difference of polynomials). Let f = α0 + α1 t + · · · + αn t^n and g =
β0 + β1 t + · · · + βn t^n be polynomials. Then the sum of f and g is defined as
f + g = (α0 + β0 ) + (α1 + β1 )t + · · · + (αn + βn )t^n .
Note that f and g need not be of the same degree; we can add on leading zeros if necessary.
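As a concrete illustration (ours, not part of the text), a polynomial can be represented by its coefficient list, index i holding the coefficient of t^i; padding with zeros in the high-degree positions implements the “leading zeros” above. A minimal Python sketch:

def poly_add(f, g):
    # f, g are coefficient lists; f[i] is the coefficient of t**i
    n = max(len(f), len(g))
    f = f + [0] * (n - len(f))      # pad with leading-zero coefficients
    g = g + [0] * (n - len(g))
    return [a + b for a, b in zip(f, g)]

# (3 + 2t^2) + (1 + t + t^2) = 4 + t + 3t^2
assert poly_add([3, 0, 2], [1, 1, 1]) == [4, 1, 3]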
Numerical exercise 14.5.11. Let
f = 2t + t^2
g = 3 + t + 2t^2 + 3t^3
h = 5 + t^3 + t^4 .
Compute each of the following.
(a) e1 = f − g
(b) e2 = g − h
(c) e3 = h − f
Proposition 14.5.12. Addition of polynomials is (a) commutative and (b) associative, that is, if
f, g, h ∈ F[t] then
(a) f + g = g + f
(b) f + (g + h) = (f + g) + h
Using the same polynomials f , g, and h, compute each of the following.
(a) e4 = f · g
(b) e5 = f · h
(c) e6 = f · (g + h)
Given ζ ∈ F, we may substitute ζ for t in f and obtain the value
f (ζ) = α0 + α1 ζ + · · · + αn ζ^n ∈ F          (14.22)
The substitution t ↦ ζ defines a mapping F[t] → F which assigns the value f (ζ) to f . This gives
an F → F function ζ ↦ f (ζ), the polynomial function associated with f ; the polynomial function is
a function, while f itself is a formal expression. If A is an n × n matrix, then we define
f (A) = α0 I + α1 A + · · · + αn A^n .
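A small numerical illustration of substituting a matrix into a polynomial (ours, assuming the numpy library; the helper name poly_at_matrix is hypothetical):

import numpy as np

def poly_at_matrix(coeffs, A):
    # coeffs[i] is the coefficient of t**i; returns coeffs[0]*I + coeffs[1]*A + ...
    n = A.shape[0]
    result = np.zeros((n, n))
    power = np.eye(n)
    for c in coeffs:
        result = result + c * power
        power = power @ A
    return result

A = np.array([[1.0, 1.0], [0.0, 1.0]])
# f = (t - 1)^2 = 1 - 2t + t^2 sends A to the zero matrix (cf. Chapter 18)
print(poly_at_matrix([1.0, -2.0, 1.0], A))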
Theorem (Division Theorem). Let f, g ∈ F[t] with g ≠ 0. Then there exist unique polynomials
q, r ∈ F[t] such that deg r < deg g and
f = qg + r .          (14.24)
In particular, dividing f by t − ζ we can write f = (t − ζ)q + ξ for some q ∈ F[t] and scalar ξ;
substituting ζ for t shows that ξ = f (ζ).
Definition 14.5.24 (Ideal). Let I ⊆ F[t]. Then I is an ideal if the following three conditions hold.
(a) 0 ∈ I
(b) I + I ⊆ I (I is closed under addition)
(c) for all f ∈ I and g ∈ F[t] we have f g ∈ I (I is closed under multiplication by arbitrary polynomials)
Notation 14.5.25. Let f ∈ F[t]. We denote by (f ) the set of all multiples of f , i. e.,
(f ) = {f g | g ∈ F[t]} .
Definition 14.5.27 (Principal ideal). Let f ∈ F[t]. The set (f ) is called the principal ideal generated
by f , and f is said to be a generator of this ideal.
Definition (Greatest common divisor). Let f1 , . . . , fk ∈ F[t]. We call g ∈ F[t] a greatest common
divisor (gcd) of f1 , . . . , fk if
(a) g is a common divisor of the fi , i. e., g | fi for all i;
(b) g is a common multiple of all common divisors of the fi , i. e., for all e ∈ F[t], if e | fi for all i,
then e | g.
Proposition 14.5.32. Let f1 , . . . , fk , d, d1 , d2 ∈ F[t].
(a) Let ζ be a nonzero scalar. If d is a gcd of f1 , . . . , fk , then ζd is also a gcd of f1 , . . . , fk .
(b) If d1 and d2 are both gcds of f1 , . . . , fk , then there exists ζ ∈ F× = F \ {0} such that d2 = ζd1 .
Exercise 14.5.33. Let f1 , . . . , fk ∈ F[t]. Show that gcd(f1 , . . . , fk ) = 0 if and only if f1 = · · · =
fk = 0.
Proposition 14.5.34. Let f1 , . . . , fk ∈ F[t] and suppose not all of the fi are 0. Then among all of
the greatest common divisors of f1 , . . . , fk , there is a unique monic polynomial.
For the sake of uniqueness of the gcd notation, we write d = gcd(f1 , . . . , fk ) if, in addition to
(a) and (b),
(c) d is monic or d = 0.
Theorem 14.5.35 (Existence of gcd). Let f1 , . . . , fk ∈ F[t]. Then gcd(f1 , . . . , fk ) exists and,
moreover, there exist polynomials g1 , . . . , gk such that
gcd(f1 , . . . , fk ) = ∑_{i=1}^{k} f_i g_i          (14.25)
♦
Lemma 14.5.36 (Euclid’s Lemma). Let f, g, h ∈ F[t]. Then Div(f, g) = Div(f − gh, g), where Div(f, g) denotes the set of common divisors of f and g. ♦
By applying the above theorem and Euclid’s Lemma, we arrive at Euclid’s Algorithm for deter-
mining the gcd of polynomials.
Euclid’s algorithm
Input: polynomials f0 , g0
Output: gcd(f0 , g0 )
f ← f0
g ← g0
while g ≠ 0
Find q and r such that
f = gq + r and deg(r) < deg(g)
(Division Theorem)
f ←g
g←r
end(while)
return f
Theorem 14.5.39. Euclid’s Algorithm returns a greatest common divisor of the two input polyno-
mials in at most deg(g0 ) rounds of the while loop. ♦
Div(f, g) = · · ·
= Div(t^2 − 1, −7t + 7)
= Div(t^2 − 1 + (t/7)(−7t + 7), −7t + 7)
= Div(t − 1, −7t + 7)
= Div(t − 1 + (1/7)(−7t + 7), −7t + 7)
= Div(0, −7t + 7)
= Div(−7t + 7)
Thus −7t + 7 is a gcd of f and g, and we can multiply by −1/7 to get a monic polynomial. In
particular, t − 1 is the gcd of f and g, and we may write gcd(f, g) = t − 1.
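The pseudocode above translates directly into a short program. A minimal Python sketch (ours, not part of the text) over Q, using coefficient lists with index i holding the coefficient of t^i, and exact fractions to avoid rounding:

from fractions import Fraction

def deg(f):
    # degree; the zero polynomial (empty list) gets -1 as a stand-in for -infinity
    return len(f) - 1 if f else -1

def poly_divmod(f, g):
    # division with remainder: f = q*g + r with deg(r) < deg(g)
    f, q = f[:], [Fraction(0)] * max(deg(f) - deg(g) + 1, 1)
    while deg(f) >= deg(g):
        k = deg(f) - deg(g)
        c = f[-1] / g[-1]
        q[k] = c
        for i, gi in enumerate(g):
            f[i + k] -= c * gi
        while f and f[-1] == 0:
            f.pop()
    return q, f

def poly_gcd(f, g):
    while g:
        _, r = poly_divmod(f, g)
        f, g = g, r
    return [c / f[-1] for c in f] if f else f   # make the gcd monic

# gcd(t^2 + t - 2, t^2 + 3t + 2) = t + 2  (cf. Exercise 14.5.41 below)
f1 = [Fraction(-2), Fraction(1), Fraction(1)]
f2 = [Fraction(2), Fraction(3), Fraction(1)]
print(poly_gcd(f1, f2))   # coefficients of t + 2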
Exercise 14.5.41. Let
f1 = t2 + t − 2 (14.26)
f2 = t2 + 3t + 2 (14.27)
f3 = t3 − 1 (14.28)
f4 = t4 − t2 − 2t − 1 (14.29)
Compute each of the following.
(a) gcd(f1 , f2 )
(b) gcd(f1 , f3 )
(c) gcd(f1 , f3 , f4 )
(d) gcd(f1 , f2 , f3 , f4 )
Proposition 14.5.42. Let f, g, h ∈ F[t]. Then gcd(f g, f h) = f d, where d = gcd(g, h).
Proposition 14.5.43. If f | gh and gcd(f, g) = 1, then f | h.
Exercise 14.5.44. Determine gcd(f, f ′ ) where f = t^n + t + 1 (over R).
In particular, for all f ∈ F[t] and ζ ∈ F we have t − ζ | f − f (ζ) .
Definition 14.5.56 (Multiplicity). The multiplicity of a root ζ of a polynomial f ∈ F[t] is the largest
k for which (t − ζ)^k | f .
Exercise 14.5.57. Let f ∈ R[t]. Show that f (√−1) = 0 if and only if t^2 + 1 | f .
Proposition 14.5.58. Let f be a polynomial of degree n. Then f has at most n roots (counting
multiplicity).
Theorem 14.5.59 (Fundamental Theorem of Algebra). Let f ∈ C[t]. If deg f ≥ 1, then f has a
complex root, i. e., there exists ζ ∈ C such that f (ζ) = 0. ♦
Proposition 14.5.64. Let f ∈ R[t] be of odd degree. Then f has a real root.
Proposition 14.5.65. Let f ∈ R[t] and ζ ∈ C. Then f (ζ̄) is the complex conjugate of f (ζ). Conclude that if ζ is a complex
root of f , then so is ζ̄.
Exercise 14.5.66. Let f ∈ R[t]. Show that if f is irreducible over R, then deg f ≤ 2.
Exercise 14.5.67. Let f ∈ R[t] and f ≠ 0. Show that f can be written as f = ∏ g_i where each g_i
has degree 1 or 2.
Definition 14.5.68 (Formal derivative of a polynomial). Let f = ∑_{i=0}^{n} α_i t^i . Then the formal derivative
of f is defined to be
f ′ = ∑_{k=1}^{n} k α_k t^{k−1}          (14.31)
That is,
f ′ = α1 + 2α2 t + · · · + nαn t^{n−1} .          (14.32)
Note that this definition works even over finite fields. We write f^(k) to mean the k-th derivative of
f , defined inductively as f^(0) = f and f^(k+1) = (f^(k))′ .
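In the coefficient-list representation used in the sketches above (ours, not part of the text), the formal derivative is a one-line operation and works over any field; here it is shown with integer coefficients:

def poly_derivative(f):
    # f[i] is the coefficient of t**i; the derivative of a constant is the zero polynomial []
    return [i * f[i] for i in range(1, len(f))]

# d/dt (3 + 2t^2 + 5t^7) = 4t + 35t^6
assert poly_derivative([3, 0, 2, 0, 0, 0, 0, 5]) == [0, 4, 0, 0, 0, 0, 35]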
Proposition 14.5.69 (Linearity of differentiation). Let f and g be polynomials and let ζ be a
scalar. Then
(a) (f + g)′ = f ′ + g ′ ,
(b) (ζf )′ = ζf ′ .
Proposition 14.5.70 (Product rule). Let f and g be polynomials. Then
(f g)′ = f ′ g + f g ′          (14.33)
Definition 14.5.71 (Composition of polynomials). Let f and g be polynomials. Then the composition
of f with g, denoted f ◦ g, is the polynomial obtained by replacing all occurrences of the symbol t
in the expression for f with g, i. e., f ◦ g = f (g) (we “substitute g into f ”).
Proposition 14.5.72. Let f and g be polynomials and let ζ be a scalar. Then
(f ◦ g)(ζ) = f (g(ζ)) . (14.34)
Proposition 14.5.73 (Chain Rule). Let f, g ∈ F[t] and let h = f ◦ g. Then
h′ = (f ′ ◦ g) · g ′ .          (14.35)
Proposition 14.5.74.
(a) Let² F be Q, R, or C. Let f ∈ F[t] and let ζ ∈ F. Then (t − ζ)^k | f if and only if
f (ζ) = f ′ (ζ) = · · · = f^(k−1) (ζ) = 0
²This exercise holds for all subfields of C and more generally for all fields of characteristic 0.
Proposition 14.5.75. Let f ∈ C[t]. Then f has no multiple roots if and only if gcd(f, f ′ ) = 1.
Exercise 14.5.76. Let n ≥ 1. Prove that the polynomial f = t^n + t + 1 has no multiple roots in C.
Chapter 15
Vector Spaces: Basic Concepts
15.1 Vector Spaces
The zero vector in a vector space V is written as 0V , but we often just write 0 when the context is
clear.
Property (b2) is referred to as “pseudo-associativity,” because it is a form of associativity in
which we are dealing with different operations (multiplication in F and scaling of vectors). Prop-
erties (b3) and (b4) are two types of distributivity: scaling of a vector distributes both over the
addition of scalars and the addition of vectors.
Proposition 15.1.2. Let V be a vector space over the field F. For all v ∈ V , α ∈ F:
(a) 0 · v = 0.
(b) α0 = 0.
Exercise 15.1.3. Let V be a vector space and let x ∈ V . Show that x + x = x if and only if x = 0.
Example 15.1.4 (Euclidean geometry). The most natural examples of vector spaces are the geo-
metric spaces G2 and G3 . We write G2 for the plane and G3 for the “space” familiar from Euclidean
geometry. We think of G2 and G3 as having a special point called the origin. We view the points
of G2 and G3 as “vectors” (line segments from the origin to the point). Addition is defined by the
parallelogram rule and scalar multiplication (over R) by scaling. Observe that G2 and G3 are vector
spaces over R. These classical geometries form the foundation of our intuition about vector spaces.
Note that G2 is not the same as R2 . Vectors in G2 are directed segments (geometric objects),
while the vectors in R2 are pairs of numbers. The connection between these two is one of the great
discoveries of the mathematics of the modern era (Descartes).
Examples 15.1.5. Show that the following are vector spaces over R.
(a) Rn
(b) Mn (R)
(c) Rk×n
(i) The space RΩ of functions f : Ω → R where Ω is an arbitrary set.
In Section 11.1, we defined the notion of a linear form over Fn (Def. 11.1.1). This generalizes
immediately to vector spaces over F.
Definition 15.1.6 (Linear form). Let V be a vector space over F. A linear form is a function
f : V → F with the following properties.
(a) f (x + y) = f (x) + f (y) for all x, y ∈ V ;
(b) f (λx) = λf (x) for all x ∈ V and λ ∈ F.
Definition 15.1.7 (Dual space). Let V be a vector space over F. The set of linear forms f : V → F
is called the dual space of V and is denoted V ∗ .
Exercise 15.1.8. Let V be a vector space over F. Show that V ∗ is also a vector space over F.
In Section 1.1, we defined linear combinations of column vectors (Def. 1.1.13). This is
easily generalized to linear combinations of vectors in any vector space.
Definition 15.1.9 (Linear combination). Let V be a vector space over F, and let v1 , . . . , vk ∈ V ,
α1 , . . . , αk ∈ F. Then ∑_{i=1}^{k} α_i v_i is called a linear combination of the vectors v1 , . . . , vk . The linear
combination for which all coefficients are zero is the trivial linear combination.
Exercise 15.1.10 (Empty linear combination). What is the linear combination of the empty set?
Convention 14.3.21 explains our convention for the empty sum.
Exercise 15.1.11.
(a) Express the polynomial t − 1 as a linear combination of the polynomials t2 − 1, (t − 1)2 ,
t2 − 3t + 2.
(b) Give an elegant proof that the polynomial t2 + 1 cannot be expressed as a linear combination
of the polynomials t2 − 1, (t − 1)2 , t2 − 3t + 2.
Exercise 15.1.12. For α ∈ R, express cos(t + α) as a linear combination of cos t and sin t.
15.2 Subspaces and Span
(b) span(S) ≤ V ;
15.3 Linear Independence and Bases
Let V be a vector space. In Section 1.3, we defined the notion of linear independence of matrices
(Def. 1.3.5). We now generalize this to linear independence of a list (Def. 1.3.1) of vectors
in a general vector space.
Definition 15.3.1 (Linear independence). The list (v1 , . . . , vk ) of vectors in V is said to be linearly
independent over F if the only linear combination equal to 0 is the trivial linear combination. The
list (v1 , . . . , vk ) is linearly dependent if it is not linearly independent.
Definition 15.3.2. If a list (v1 , . . . , vk ) of vectors is linearly independent (dependent), we say that
the vectors v1 , . . . , vk are linearly independent (dependent).
Definition 15.3.3. We say that a set of vectors is linearly independent if a list formed by its elements
(in any order and without repetitions) is linearly independent.
Exercise 15.3.4. Show that the following sets are linearly independent over Q.
(a) 1, √2, √3
(b) {√x | x is square-free} (an integer n is square-free if there is no perfect square k ≠ 1 such
that k | n).
Definition 15.3.6 (Rank and dimension). The rank of a set of vectors is the maximum number of
linearly independent vectors among them. For a vector space V , the dimension of V is its rank,
that is, dim V = rk V .
Exercise 15.3.7.
Definition 15.3.8. We say that the vector w depends on the list (v1 , . . . , vk ) of vectors if w ∈
span(v1 , . . . , vk ), i. e., if w can be expressed as a linear combination of the vi .
Definition 15.3.9 (Linear independence of an infinite list). We say that an infinite list (vi | i ∈ I)
(where I is an index set) is linearly independent if every finite sublist (vi | i ∈ J) (where J ⊆ I and
|J| < ∞) is linearly independent.
Exercise 15.3.10. Verify that Exercises 1.3.11-1.3.26 hold in general vector spaces (replace Fn by
V where necessary).
156 CHAPTER 15. VECTOR SPACES: BASIC CONCEPTS
Example 15.3.11. For k = 0, 1, 2, . . . , let fk be a polynomial of degree k. Show that the infinite
list (f0 , f1 , f2 , . . . ) is linearly independent.
Exercise 15.3.12. Find three nonzero vectors in G3 that are linearly dependent but no two are
parallel.
Exercise 15.3.13. Prove that for all α, β ∈ R, the functions sin(t), sin(t + α), sin(t + β) are linearly
dependent (as members of the function space RR ).
Exercise 15.3.14. Let α1 < α2 < · · · < αn ∈ R. Consider the vectors
v_i = (α_1^i , α_2^i , . . . , α_n^i )^T          (15.1)
for i ≥ 0 (recall the convention that α^0 = 1 even if α = 0). Show that (v0 , . . . , vn−1 ) is linearly
independent.
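A quick numerical sanity check of this exercise (ours, assuming the numpy library): for distinct α’s, the matrix with columns v_0 , . . . , v_{n−1} is a Vandermonde matrix, and its determinant is nonzero.

import numpy as np

alpha = np.array([-1.0, 0.5, 2.0, 3.0])                 # distinct alpha_1 < ... < alpha_n
n = len(alpha)
V = np.column_stack([alpha ** i for i in range(n)])     # columns are v_0, ..., v_{n-1}
print(np.linalg.det(V))                                 # nonzero, so the v_i are linearly independent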
Exercise 15.3.15 (Moment curve). Find a continuous curve in Rn , i. e., a continuous injective
function f : R → Rn , such that every set of n points on the curve is linearly independent. The
simplest example is called the “moment curve,” and we bet you will find it.
Exercise 15.3.16. Let α1 < α2 < · · · < αn ∈ R, and define the degree-n polynomial
f = ∏_{j=1}^{n} (t − α_j )
Definition 15.3.18 (Finite-dimensional vector space). We say that the vector space V is finite di-
mensional if V has a finite set of generators. A vector space which is not finite dimensional is
infinite dimensional.
Note that a list of vectors is not the same as a set of vectors, but a list of vectors which is
linearly independent necessarily has no repeated elements. Note further that lists carry with them
an inherent ordering; that is, bases are ordered.
Examples 15.3.21.
Examples 15.3.22. For each of the following sets S, describe the vectors in span(S) and give a
basis for span(S).
Example 15.3.23. Show that the polynomials t2 , (t + 1)2 , and (t + 2)2 form a basis for P2 [F].
Express the polynomial t in terms of this basis, and write its coordinate vector.
Exercise 15.3.24. For α ∈ R, write the coordinate vector of cos(t + α) in the basis (cos t, sin t).
Exercise 15.3.25. Find a basis of the 0-weight subspace of Fk (the 0-weight subspace is defined
in Ex. 1.2.7).
Exercise 15.3.26.
(a) Find a basis of Mn (F).
Definition 15.3.28 (Maximal linearly independent set). A linearly independent set S ⊆ V is maximal
if, for all v ∈ V \ S, S ∪ {v} is not linearly independent.
Proposition 15.3.29. Let e be a list of vectors in a vector space V . Then e is a basis of V if and
only if it is a maximal linearly independent set.
Proposition 15.3.30. Let V be a vector space. Then V has a basis (Zorn’s lemma is needed for
the infinite-dimensional case).
Proposition 15.3.31. Let e be a linearly independent list of vectors in V . Then it is possible to extend e to a basis of
V , that is, there exists a basis of V which has e as a sublist.
Proposition 15.3.32. Let V be a vector space and let S ⊆ V be a set of generators of V . Then
there exists a list e of vectors in S such that e is a basis of V .
Definition 15.3.33 (Coordinates). The coefficients α1 , . . . , αn of Ex. 15.3.27 are called the coordi-
nates of v with respect to the basis b.
Definition 15.3.34 (Coordinate vector). Let b = (b1 , . . . , bk ) be a basis of the vector space V ,
and let v ∈ V . Then the column vector representation of v with respect to the basis b, or the
coordinatization of v with respect to b, denoted by [v]b , is obtained by arranging the coordinates of
v with respect to b in a column, i. e.,
[v]_b = (α1 , α2 , . . . , αk )^T          (15.2)
where v = ∑_{i=1}^{k} α_i b_i .
15.4 The First Miracle of Linear Algebra
In Section 1.3, we proved the First Miracle of Linear Algebra for Fn (Theorem 1.3.40). This
generalizes immediately to abstract vector spaces.
Theorem 15.4.1 (First Miracle of Linear Algebra). Let v1 , . . . , vk be linearly independent with
vi ∈ span(w1 , . . . , wm ) for all i. Then k ≤ m.
The proof of this theorem requires the following lemma.
Lemma 15.4.2 (Steinitz exchange lemma). Let (v1 , . . . , vk ) be a linearly independent list such that
vi ∈ span(w1 , . . . , wm ) for all i. Then there exists j (1 ≤ j ≤ m) such that the list (wj , v2 , . . . , vk )
is linearly independent. ♦
Corollary 15.4.3. Let V be a vector space. All bases of V have the same cardinality.
This is an immediate corollary to the First Miracle.
The following theorem is essentially a restatement of the First Miracle of Linear Algebra.
Theorem 15.4.4.
(a) Use the First Miracle to derive the fact that rk(v1 , . . . , vk ) = dim (span(v1 , . . . , vk )).
(b) Derive the First Miracle from the statement that rk(v1 , . . . , vk ) = dim (span(v1 , . . . , vk )).
Exercise 15.4.5. Let V be a vector space of dimension n, and let v1 , . . . , vn ∈ V . The following
are equivalent:
(a) the vectors v1 , . . . , vn are linearly independent
(b) the list (v1 , . . . , vn ) is a basis of V
(c) V = span(v1 , . . . , vn )
Exercise 15.4.8. Show that dim(Pk ) = k + 1, where Pk is the space of polynomials of degree at
most k.
Exercise 15.4.10. Show that, if f is a polynomial of degree n, then (f (t), f (t + 1), . . . , f (t + n))
is a basis of Pn [F].
Exercise 15.4.11. Show that any list of polynomials, one of each degree 0, . . . , n, forms a basis of
Pn [F].
Proposition 15.4.12. Let V be an n-dimensional vector space with subspaces U1 , U2 such that
U1 ∩ U2 = {0}. Then
dim U1 + dim U2 ≤ n . (15.3)
Proposition 15.4.13 (Modular equation). Let V be a vector space, and let U1 , U2 ≤ V . Then
dim(U1 + U2 ) + dim(U1 ∩ U2 ) = dim U1 + dim U2 .
Exercise 15.4.14. Let A = (αij ) ∈ Rn×n , and assume the columns of A are linearly independent.
Prove that it is always possible to change the value of an entry in the first row so that the columns
of A become linearly dependent.
Exercise 15.4.15. Call a sequence (a0 , a1 , a2 , . . . ) “Fibonacci-like” if for all n, an+2 = an+1 + an .
(a) Prove that Fibonacci-like sequences form a 2-dimensional vector space.
(b) Find a basis for the space of Fibonacci-like sequences.
(c) Express the Fibonacci sequence (0, 1, 1, 2, 3, 5, 8, . . . ) as a linear combination of these basis vec-
tors.
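A numerical illustration of the preceding exercise (ours, not a solution): a Fibonacci-like sequence is determined by its first two terms, which already suggests where the dimension 2 comes from, and the sum of two Fibonacci-like sequences is again Fibonacci-like.

def fibonacci_like(a0, a1, n):
    # first n terms of the sequence with a_{k+2} = a_{k+1} + a_k
    seq = [a0, a1]
    for _ in range(n - 2):
        seq.append(seq[-1] + seq[-2])
    return seq

f = fibonacci_like(0, 1, 10)     # the Fibonacci sequence
g = fibonacci_like(2, 1, 10)     # the Lucas sequence
s = [x + y for x, y in zip(f, g)]
assert s == fibonacci_like(s[0], s[1], 10)   # closure under addition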
♥ Exercise 15.4.16. Let f be a polynomial. Prove that f has a multiple g = f · h ≠ 0 in which
every exponent is prime, i. e.,
g = ∑_{p prime} α_p x^p          (15.5)
The following exercise shows that this could actually be used as the definition of the direct sum
in finite-dimensional spaces, but that statement in fact holds in infinite-dimensional spaces as well.
Then
∑_{i=1}^{k} U_i = ⊕_{i=1}^{k} U_i .
Proposition 15.5.4. Let V be a vector space and let U1 , . . . , Uk ≤ V . Then W = ∑_{i=1}^{k} U_i is a direct
sum if and only if for every choice of k vectors ui (i = 1, . . . , k) where ui ∈ Ui \ {0}, the vectors
u1 , . . . , uk are linearly independent.
We note that the notion of direct sum extends the notion of linear independence to subspaces.
Chapter 16
Linear Maps
Example 16.1.10. Interpolation is the map f ↦ L(f ) ∈ Pn (R) where L(f ) = p is the unique
polynomial with the property that p(αi ) = f (αi ) (i = 0, . . . , n) for some fixed α0 , . . . , αn . This is a
linear map C[0, 1] → Pn (R) (verify!).
Example 16.1.11. The map ϕ : Pn (R) → Tn = span{1, cos t, . . . , cos nt} defined by
ϕ (α0 + α1 t + · · · + αn t^n ) = α0 + α1 cos t + · · · + αn cos^n t          (16.4)
Definition 16.1.14 (Composition of linear maps). Let U , V , and W be vector spaces, and let ϕ :
U → V and ψ : V → W be linear maps. Then the composition of ψ with ϕ, denoted by ψ ◦ ϕ or
ψϕ, is the map η : U → W defined by η(x) = ψ(ϕ(x)) for all x ∈ U .
Linear maps are uniquely determined by their action on a basis, and we are free to choose this
action arbitrarily. This is more formally expressed in the following theorem.
Theorem 16.1.16 (Degree of freedom of linear maps). Let V and W be vector spaces with e =
(e1 , . . . , ek ) a basis of V , and w1 , . . . , wk arbitrary vectors in W . Then there exists a unique linear
map ϕ : V → W such that ϕ(ei ) = wi for 1 ≤ i ≤ k. ♦
In Section 15.3 we represented vectors by the column vectors of their coordinates with respect
to a given basis. As the next step in translating geometric objects to tables of numbers, we assign
matrices to linear maps relative to given bases in the domain and the target space. Our key tool
for this endeavor is Theorem 16.1.16.
16.2 Isomorphisms
Let V and W be vector spaces over the same field.
Definition 16.2.1 (Isomorphism). A linear map ϕ ∈ Hom(V, W ) is said to be an isomorphism if it
is a bijection. If there exists an isomorphism between V and W , then V and W are said to be
isomorphic. “V is isomorphic to W ” is denoted V ≅ W .
Fact 16.2.3. Isomorphisms preserve linear independence and, moreover, map bases to bases.
Fact 16.2.4. Let V and W be vector spaces, let ϕ : V → W be an isomorphism, and let v1 , . . . , vk ∈
V . Then
rk(v1 , . . . , vk ) = rk(ϕ(v1 ), . . . , ϕ(vk ))
The relation ≅ is an equivalence relation:
(a) V ≅ V (reflexive)
(b) If V ≅ W then W ≅ V (symmetric)
(c) If U ≅ V and V ≅ W then U ≅ W (transitive)
Proposition 16.2.7. Two vector spaces over the same field are isomorphic if and only if they have
the same dimension.
Definition 16.3.4 (Rank of a linear map). The rank of a linear map ϕ is defined as
rk(ϕ) := dim(im(ϕ)) .
The nullity of a linear map ϕ is defined as
nullity(ϕ) := dim(ker(ϕ)) .
Theorem (Rank–nullity theorem). Let ϕ : V → W be a linear map, where V is finite-dimensional.
Then rk(ϕ) + nullity(ϕ) = dim V . ♦
Exercise 16.3.8. Let n = k + `. Find a linear map ϕ : Rn → Rn which has rank k, and therefore
has nullity `.
Examples 16.3.9. Find the rank and nullity of each of the linear maps in Examples 16.1.6-16.1.11.
Examples 16.4.5. The following are linear transformations of the 3-dimensional geometric space
G3 (verify!).
For α ∈ R, the shift operator Sα maps a function f to the function g defined by
g(t) := f (t + α) .          (16.9)
Example 16.4.9. Let V = span(sin t, cos t). Differentiation is a linear transformation of V (verify!).
The difference operator ∆ is defined by ∆f := Sα (f ) − f .
In Chapter 8, we discussed the notion of eigenvectors and eigenvalues of square matrices. This
is easily generalized to eigenvectors and eigenvalues of linear transformations.
Definition 16.4.11 (Eigenvector). Let ϕ : V → V be a linear transformation. Then v ∈ V is an
eigenvector of ϕ if v ≠ 0 and there exists λ ∈ F such that ϕ(v) = λv.
Definition 16.4.12 (Eigenvalue). Let ϕ : V → V be a linear transformation. Then λ ∈ F is an
eigenvalue of ϕ if there exists a nonzero vector v ∈ V such that ϕ(v) = λv.
It is easy to see that eigenvectors and eigenvalues as we defined them for square matrices are
just a special case of these definitions; in particular, if A is a square matrix, then its eigenvectors
and eigenvalues are the same as those of ϕA (Example 16.1.5), the map which takes the column
vector x to Ax.
Definition 16.4.16 (Eigenspace). Let V be a vector space and let ϕ : V → V be a linear transfor-
mation. We denote by Uλ the set Uλ := {v ∈ V | ϕ(v) = λv} .
Exercise 16.4.17. Let V be a vector space over F and let ϕ : V → V be a linear transformation.
Show that, for all λ ∈ F, Uλ is a subspace of V .
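A numerical illustration of eigenvalues and eigenvectors of the map ϕA : x ↦ Ax for a small matrix (ours, assuming the numpy library):

import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
eigenvalues, eigenvectors = np.linalg.eig(A)   # columns of eigenvectors are eigenvectors of A
print(eigenvalues)                             # 3.0 and 1.0 (in some order)
for lam, v in zip(eigenvalues, eigenvectors.T):
    assert np.allclose(A @ v, lam * v)         # A v = lambda v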
Examples 16.4.19. Determine the rank, nullity, eigenvalues (and their geometric multiplicities),
and eigenvectors of each of the transformations in Examples 16.4.4-16.4.10.
Exercise 16.4.22. Let ϕ : G3 → G3 be a rotation about the vertical axis through the origin. What
are the ϕ-invariant subspaces?
Exercise 16.4.23. Let π : G3 → G3 be the projection onto the horizontal plane. What are the
π-invariant subspaces?
Exercise 16.4.24.
Exercise 16.4.26. Over every field F, find an infinite-dimensional vector space V and a linear trans-
formation ϕ : V → V that has no finite-dimensional invariant subspaces other than {0}.
(c) all 1-dimensional subspaces of V are ϕ-invariant;
(d) all hyperplanes (Def. 5.2.1) are ϕ-invariant.
Exercise 16.4.32. Let S be the shift operator (Example 16.4.8 (a)) on the space RN of
sequences of real numbers, defined by S(a0 , a1 , a2 , . . . ) = (a1 , a2 , a3 , . . . ).
(a) In Ex. 15.4.15, we defined the space of Fibonacci-like sequences. Show that this is an S-
invariant subspace of RN .
(c) Use the result of part (b) to find an explicit formula for the n-th Fibonacci number.
Definition 16.4.33 (Minimal invariant subspace). Let ϕ : V → V be a linear transformation. Then
U ≤ V is a minimal invariant subspace of ϕ if the only invariant subspace properly contained in U
is {0}.
Exercise 16.4.34. Let ρ be the n × n matrix
ρ = [ 0 0 · · · 0 1 ]
    [ 1 0 · · · 0 0 ]
    [ 0 1 · · · 0 0 ]
    [ .. ..  . . .. ]
    [ 0 0 · · · 1 0 ]
(a) Count the invariant subspaces of ρ over C
f = α0 + α1 t + · · · + αn tn .
for all i), the Ui together with W do not form a chain.
Exercise 16.4.40. Let d/dt : Pn (R) → Pn (R) be the derivative linear transformation (Def.
14.5.68) of the space of real polynomials of degree at most n.
(a) Prove that the number of d/dt-invariant subspaces is n + 2;
♥ Exercise 16.4.41. Let V = Fp^Fp (the space of functions Fp → Fp ), and let ϕ be the shift-by-1 operator, i. e., ϕ(f )(t) = f (t + 1).
What are the invariant subspaces of ϕ? Prove that they form a maximal chain.
Proposition 16.4.42. Let ϕ : V → V be a linear transformation, and let f ∈ F[t]. Then ker f (ϕ)
and im f (ϕ) are invariant subspaces.
The next exercise shows that invariant subspaces generalize the notion of eigenvectors.
Proposition 16.4.44. Let V be a vector space over R and let ϕ : V → V be a linear transformation.
Then ϕ has an invariant subspace of dimension at most 2.
(a) There exists a maximal chain of subspaces, all of which are invariant.
Proposition 16.4.46. Let V be a finite-dimensional vector space with basis b and let ϕ : V → V
be a linear transformation.
(a) Let b be a basis of V . Then [ϕ]b is triangular if and only if every initial segment of b spans a
ϕ-invariant subspace of V .
Proposition 16.4.47. Let V be a finite-dimensional vector space with basis b and let ϕ : V → V
be a linear transformation. Then [ϕ]b is diagonal if and only if b is an eigenbasis of ϕ.
Exercise 16.4.48. Infer Schur’s Theorem (Theorem 12.4.9) and the real version of Schur’s
Theorem (Theorem 12.4.18) from the preceding exercise.
16.5 Coordinatization
In Section 15.3, we defined coordinatization of a vector with respect to some basis (Def.
15.3.34). We now extend this to the notion of coordinatization of linear maps.
Definition 16.5.1 (Coordinatization). Let V be an n-dimensional vector space with basis e =
(e1 , . . . , en ), let W be an m-dimensional vector space with basis f = (f 1 , . . . , f m ), and let ϕ : V → W
be a linear map. Let αij (1 ≤ i ≤ m, 1 ≤ j ≤ n) be coefficients such that ϕ(ej ) = ∑_{i=1}^{m} α_ij f i .
Then the matrix representation or coordinatization of ϕ with respect to the bases e and f is the m × n matrix
[ϕ]e,f := (α_ij ) = [ α11 · · · α1n ]
                   [  ..   ..  ..  ]          (16.17)
                   [ αm1 · · · αmn ]
unit vectors at an angle θ.
(b) Compare the trace and determinant (Def. 6.3.1) of the matrix corresponding to the same
linear transformation in the basis of the preceding exercise (two perpendicular unit vectors).
Example 16.5.4. Write the matrix representation of each of the linear transformations in Ex.
16.4.5 in the basis consisting of three mutually perpendicular unit vectors.
Example 16.5.5. Write the matrix representation of each of the linear transformations in Ex.
16.4.7 in the basis (1, t, t2 , . . . , tn ) of the polynomial space Pn (F).
The next exercise demonstrates that under our rules of coordinatization, the action of a linear
map corresponds to multiplying a column vector by a matrix.
The next exercise shows that under our coordinatization, composition of linear maps (Def.
16.1.14) corresponds to matrix multiplication. This gives a natural explanation of why we multiply
matrices the way we do, and why this operation is associative.
Proposition 16.5.8. Let U , V , and W be vector spaces with bases e, f , and g, respectively, and
let ϕ : U → V and ψ : V → W be linear maps. Then
[ψ ◦ ϕ]e,g = [ψ]f ,g · [ϕ]e,f .
(b) Use matrix multiplication to derive the addition formulas for sin and cos.
Example 16.5.12. Let V be an n-dimensional vector space with basis b, and let A ∈ Fk×n . Define
ϕ : V → Fk by x ↦ A[x]b .
(a) Show that ϕ is a linear map.
Definition 16.5.13. Let V be a finite-dimensional vector space over R and let b be a basis of V . Let
ϕ : V → V be a nonsingular linear transformation. We say that ϕ is sense-preserving if det[ϕ]b > 0
and ϕ is sense-reversing if det[ϕ]b < 0.
Proposition 16.5.14.
(a) The sense-preserving transformations of the plane are rotations about the origin.
(b) The sense-reversing transformations of the plane are reflections about a line through the origin.
Proposition 16.5.17.
(a) The sense-preserving linear transformations of 3-dimensional space are rotations about an
axis.
(b) The sense-reversing linear transformations of 3-dimensional space are rotational reflections.
Proposition 16.6.2. Let V be a vector space with bases e and e′ , and let σ : V → V be the change
of basis transformation from e to e′ . Then [σ]e = [σ]e′ .
For this reason, we often denote the matrix representation of the change of basis transformation
σ by [σ] rather than by, e. g., [σ]e .
Notation 16.6.4. Let V be a vector space with bases e and e′ . When changing basis from e to e′ , we
sometimes refer to e as the “old” basis and e′ as the “new” basis. So if v ∈ V is a vector, we often
write [v]old in place of [v]e and [v]new in place of [v]e′ . Likewise, if W is a vector space with bases f
and f ′ and we change bases from f to f ′ , we consider f the “old” basis and f ′ the “new” basis. So
if ϕ : V → W is a linear map, we write [ϕ]old in place of [ϕ]e,f and [ϕ]new in place of [ϕ]e′ ,f ′ .
Proposition 16.6.5. Let v ∈ V and let e and e′ be bases of V . Let σ be the change of basis
transformation from e to e′ . Then
[v]new = [σ]−1 [v]old . (16.19)
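Equation (16.19) can be tried out numerically (ours, assuming the numpy library; the particular basis is only an illustration). The columns of S = [σ] are the old coordinates of the new basis vectors.

import numpy as np

# old basis: the standard basis of R^2; new basis: the columns of S, written in old coordinates
S = np.array([[1.0, 1.0],
              [0.0, 1.0]])
v_old = np.array([3.0, 2.0])          # coordinates of v in the old basis
v_new = np.linalg.solve(S, v_old)     # [v]_new = S^{-1} [v]_old
print(v_new)                          # [1., 2.]
assert np.allclose(S @ v_new, v_old)  # self-check, as in the numerical exercise below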
Numerical exercise 16.6.6. For each of the following vector spaces V , compute the change of
basis matrix from e to e′ . Self-check : pick some v ∈ V , determine [v]e and [v]e′ , and verify that
Equation (16.19) holds.
(a) V = G2 , e = (e1 , e2 ) is two perpendicular unit vectors, and e′ = (e1 , e′2 ), where e′2 is e1 rotated
by θ
Proposition 16.6.7. Let ϕ : V → W be a linear map. Then
[ϕ]new = T^{−1} [ϕ]old S ,
where S is the change of basis matrix from e to e′ and T is the change of basis matrix from f to
f ′.
Proposition 16.6.8. Let N be a nilpotent matrix. Then the linear transformation defined by
x ↦ N x has a chain of invariant subspaces.
Proposition 16.6.9. Every matrix A ∈ Mn (C) is similar to a triangular matrix.
Chapter 17
Block Matrices (Optional)
17.1 Block Matrix Basics
Example 17.1.2. The matrix
A = [ 1  2 −3  4 ]
    [ 2  1  6 −1 ]
    [ 0 −3 −1  2 ]
may be symbolically written as the block matrix
A = [ A11 A12 ]
    [ A21 A22 ]
where A11 = [ 1 2 ; 2 1 ], A12 = [ −3 4 ; 6 −1 ], A21 = (0, −3), and A22 = (−1, 2).
A block matrix A is called block-diagonal, written A = diag(A1 , . . . , Ak ), if its diagonal blocks
A1 , . . . , Ak are square and all of its off-diagonal blocks are zero matrices. In this case, we say that A is the diagonal sum of the matrices A1 , . . . , Ak .
The matrix
A = [  1  2  0  0  0  0 ]
    [ −3  7  0  0  0  0 ]
    [  0  0  6  0  0  0 ]
    [  0  0  0  4  6  0 ]
    [  0  0  0  2 −1  3 ]
    [  0  0  0  3  2  5 ]
is a block-diagonal matrix. We may write A = diag(A1 , A2 , A3 ) where
A1 = [ 1 2 ; −3 7 ],   A2 = (6),   and   A3 = [ 4 6 0 ; 2 −1 3 ; 3 2 5 ].
Exercise 17.1.6. Show that block matrices multiply in the same way as matrices. More precisely,
suppose that A = (Aij ) and B = (Bjk ) are block matrices where Aij ∈ Fri ×sj and Bjk ∈ Fsj ×tk . Let
C = AB (why is this product defined?). Show that C = (Cik ) where Cik ∈ Fri ×tk and
C_ik = ∑_j A_ij B_jk .          (17.1)
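A quick numerical check of the block multiplication rule (ours, assuming the numpy library; the blocks of A are those of Example 17.1.2):

import numpy as np

A11, A12 = np.array([[1., 2.], [2., 1.]]), np.array([[-3., 4.], [6., -1.]])
A21, A22 = np.array([[0., -3.]]), np.array([[-1., 2.]])
B11, B12 = np.eye(2), np.array([[1., 0.], [0., -1.]])
B21, B22 = np.zeros((2, 2)), np.array([[2., 0.], [0., 2.]])

A = np.block([[A11, A12], [A21, A22]])
B = np.block([[B11, B12], [B21, B22]])

# block (1,1) of AB equals A11 B11 + A12 B21, and similarly for the other blocks
assert np.allclose((A @ B)[:2, :2], A11 @ B11 + A12 @ B21)
assert np.allclose((A @ B)[:2, 2:], A11 @ B12 + A12 @ B22)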
17.2 Arithmetic of Block-diagonal and Block-triangular Matrices
Proposition 17.2.1. Let A = diag(A1 , . . . , An ) and B = diag(B1 , . . . , Bn ) be block-diagonal matrices
with blocks of the same size, and let λ ∈ F. Then
A + B = diag(A1 + B1 , . . . , An + Bn )          (17.2)
λA = diag(λA1 , . . . , λAn )          (17.3)
AB = diag(A1 B1 , . . . , An Bn ) .          (17.4)
Proposition 17.2.2. Let A = diag(A1 , . . . , An ) be a block-diagonal matrix. Then A^k = diag(A_1^k , . . . , A_n^k )
for all k.
In our discussion of the arithmetic of block-triangular matrices, we are interested only in the
diagonal blocks.
Proposition 17.2.4. Let
A = [ A1      *  ]            B = [ B1      *  ]
    [    A2     ]     and         [    B2     ]
    [       ⋱   ]                 [       ⋱   ]
    [ 0      An ]                 [ 0      Bn ]
be block-upper triangular matrices with blocks of the same size and let λ ∈ F. Then
A + B = [ A1 + B1        *      ]
        [         ⋱             ]          (17.6)
        [ 0         An + Bn     ]

λA =    [ λA1       *   ]
        [      ⋱        ]          (17.7)
        [ 0       λAn   ]

AB =    [ A1 B1        *    ]
        [        ⋱          ]          (17.8)
        [ 0        An Bn    ]

Moreover,

A^k =   [ A1^k       *   ]
        [       ⋱        ]          (17.9)
        [ 0       An^k   ]

for all k.
Proposition 17.2.6. Let f ∈ F[t] be a polynomial and let A be as in Prop. 17.2.4. Then
f (A) = [ f (A1 )        *     ]
        [          ⋱           ]  .          (17.10)
        [ 0         f (An )    ]
Chapter 18
18.1 The Minimal Polynomial
Consider the diagonal matrix
A = diag(3, 3, 7, 7, 7) .
Find a minimal polynomial for A. Compare your answer to the characteristic polynomial fA . Recall
(Prop. 2.3.4) that for all f ∈ F[t], we have
f (diag(λ1 , . . . , λn )) = diag(f (λ1 ), . . . , f (λn )) .
Proposition 18.1.6. Let A ∈ Mn (F) and let m be a minimal polynomial of A. Then for all
g ∈ F[t], we have g(A) = 0 if and only if m | g.
Corollary 18.1.7. Let A ∈ Mn (F). Then the minimal polynomial of A is unique up to nonzero
scalar factors.
Convention 18.1.8. When discussing “the” minimal polynomial of a matrix, we refer to the unique
monic minimal polynomial, denoted mA .
Corollary 18.1.9 (Cayley-Hamilton restated). The minimal polynomial of a matrix divides its
characteristic polynomial.
Corollary 18.1.10. Let A ∈ Mn (F). Then deg mA ≤ n.
Example 18.1.11. mI = t − 1 .
Exercise 18.1.12. Let A = [ 1 1 ; 0 1 ]. Prove mA = (t − 1)^2 .
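One can check the assertion numerically (ours, assuming the numpy library): A − I is nonzero while (A − I)^2 is the zero matrix, so t − 1 does not annihilate A but (t − 1)^2 does.

import numpy as np

A = np.array([[1.0, 1.0],
              [0.0, 1.0]])
I = np.eye(2)
print(A - I)                                               # nonzero
assert np.allclose((A - I) @ (A - I), np.zeros((2, 2)))    # (t - 1)^2 annihilates A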
Exercise 18.1.13. Find two 2 × 2 matrices with the same characteristic polynomial but different
minimal polynomials.
Exercise 18.1.14. Let
A = diag(λ1 , . . . , λ1 , λ2 , . . . , λ2 , . . . , λk , . . . , λk )
where the λi are distinct. Prove that
mA = ∏_{i=1}^{k} (t − λi ) .          (18.1)
Exercise 18.1.15. Let A be a block-diagonal matrix, say A = diag(A1 , A2 , . . . , Ak ). Give a simple
expression for mA in terms of the minimal polynomials of the diagonal blocks Ai .
Exercise 18.1.17. Prove: similar matrices have the same minimal polynomial, i. e., if A ∼ B then
mA = mB .
Proposition 18.1.18. Let A ∈ Mn (C). If mA does not have multiple roots then A is diagonalizable.
In this section, V is an n-dimensional vector space over F.
In Section 16.4.1 we defined what it means to plug a linear transformation into a polynomial
(Def. 16.4.36).
Exercise 18.2.1. Define what it means for the polynomial f ∈ F[t] to annihilate the linear trans-
formation ϕ : V → V .
Convention 18.2.7. When discussing “the” minimal polynomial of a linear transformation, we shall
refer to the unique monic minimal polynomial, denoted mϕ .
(a) Let b be a basis of V and let ϕ : V → V be a linear transformation. Let A = [ϕ]b . Show that
mϕ = mA .
(b) Use this to give a second proof of Ex. 18.1.17 (similar matrices have the same minimal
polynomial).
Recall the definition of an invariant subspace of the linear transformation ϕ (Def. 16.4.21).
Definition 18.2.13 (Minimal invariant subspace). Let ϕ : V → V and let W ≤ V . We say that W is
a minimal ϕ-invariant subspace if W is a ϕ-invariant subspace, W ≠ {0}, and the only ϕ-invariant
subspaces of W are W and {0}.
mϕ = lcm_i m_i .          (18.2)
(a) Prove this without using Ex. 18.1.15, i. e., without translating the problem to matrices.
Proposition 18.2.18. Let ϕ : V → V be a linear transformation. Let f | mϕ and let W = ker f (ϕ).
Then
(a) mϕW | f ;
(b) if gcd(f, mϕ /f ) = 1, then mϕW = f .
Theorem 18.2.20. Let A ∈ Mn (C). Then A is diagonalizable if and only if mA does not have
multiple roots over C. ♦
Theorem 18.2.22. Let A ∈ Mn (F). Then there is a matrix B ∈ Mn (F) such that A ∼ B and B is
the diagonal sum of matrices whose minimal polynomials are powers of irreducible polynomials. ♦
Chapter 19
Euclidean Spaces
19.1 Inner Products
Definition 19.1.1 (Euclidean space). A Euclidean space V is a vector space over R endowed with
an inner product h·, ·i : V × V → R which is positive definite, symmetric, and bilinear. That is, for
all u, v, w ∈ V and α ∈ R, we have
(a) hv, vi ≥ 0, and hv, vi = 0 if and only if v = 0 (positive definite)
(b) hv, wi = hw, vi (symmetric)
(c) hv, w + αui = hv, wi + αhv, ui and hv + αu, wi = hv, wi + αhu, wi (bilinear)
Observe that the standard dot product of Rn that was introduced in Section 1.4 has all of these
properties and, in particular, the vector space Rn endowed with this inner product is a Euclidean
space.
Exercise 19.1.2. Let V be a Euclidean space with inner product h·, ·i. Show that for all v ∈ V ,
we have
hv, 0i = h0, vi = 0 . (19.1)
Examples 19.1.3. The following vector spaces together with the specified inner products are
Euclidean spaces (verify this).
(a) V = R[t], hf, gi = ∫_{−∞}^{∞} f (t)g(t)ρ(t) dt where ρ(t) is a nonnegative continuous function which is
not identically 0 and has the property that ∫_{−∞}^{∞} ρ(t)t^{2n} dt < ∞ for all nonnegative integers n
(such a function ρ is called a weight function)
(b) V = C[0, 1] (the space of continuous functions f : [0, 1] → R) with hf, gi = ∫_0^1 f (t)g(t) dt
(c) V = Rk×n , hA, Bi = Tr(AB^T )
(d) V = Rn , and hx, yi = x^T Ay where A ∈ Mn (R) is a symmetric positive definite (Def.
10.2.3) n × n real matrix
Notice that the same vector space can be endowed with different inner products (for example,
different weight functions for the inner product on R[t]), so that there are many Euclidean spaces
with the same underlying vector space.
Because they have inner products, Euclidean spaces carry with them the notion of distance
(“norm”) and the notion of two vectors being perpendicular (“orthogonality”). Just as inner prod-
ucts generalize the standard dot product in Rn , these concepts generalize the definitions of norm
and orthogonality presented for Rn (with respect to the standard dot product) in Section 1.4.
Definition 19.1.4 (Norm). Let V be a Euclidean space, and let v ∈ V . Then the norm of v, denoted
kvk, is
kvk := √hv, vi .          (19.2)
The notion of a norm allows us to easily define the distance between two vectors.
Definition 19.1.5. Let V be a Euclidean space, and let v, w ∈ V . Then the distance between the
vectors v and w, denoted d(v, w), is
d(v, w) := kv − wk . (19.3)
The following two theorems show that distance in Euclidean spaces behaves the way we are used
to it behaving in Rn .
Theorem 19.1.6 (Cauchy-Schwarz inequality). Let V be a Euclidean space, and let v, w ∈ V .
Then
|hv, wi| ≤ kvk · kwk . (19.4)
♦
Theorem 19.1.7 (Triangle inequality). Let V be a Euclidean space, and let v, w ∈ V . Then
kv + wk ≤ kvk + kwk .          (19.5)
Exercise 19.1.8. Show that the triangle inequality is equivalent to the Cauchy-Schwarz inequality.
Definition 19.1.9 (Angle between vectors). Let V be a Euclidean space, and let v, w ∈ V . The
angle θ between v and w is defined by
θ := arccos ( hv, wi / (kvk · kwk) ) .          (19.6)
Note that, by the Cauchy-Schwarz inequality,
−1 ≤ hv, wi / (kvk · kwk) ≤ 1 ,          (19.7)
so the angle is well defined.
Exercise 19.1.11. Let V be a Euclidean space. What vectors are orthogonal to every vector?
Exercise 19.1.12. Let V = C[0, 2π] be the space of continuous functions f : [0, 2π] → R, endowed
with the inner product
hf, gi = ∫_0^{2π} f (t)g(t) dt .          (19.8)
Show that the set {1, cos t, sin t, cos(2t), sin(2t), cos(3t), . . . } is an orthogonal set in this Euclidean
space.
Definition 19.1.15 (Gram matrix). Let V be a Euclidean space, and let v1 , . . . , vk ∈ V . The Gram
matrix of v1 , . . . , vk is the k × k matrix whose (i, j) entry is hvi , vj i, that is,
G(v1 , . . . , vk ) := (hvi , vj i)_{i,j=1}^{k} .
Exercise 19.1.16. Let V be a Euclidean space. Show that the vectors v1 , . . . , vk ∈ V are linearly
independent if and only if det G(v1 , . . . , vk ) ≠ 0.
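A numerical illustration of this exercise in R³ with the standard dot product (ours, assuming the numpy library; the helper name gram is hypothetical):

import numpy as np

def gram(vectors):
    # Gram matrix: entry (i, j) is the inner product <v_i, v_j>
    V = np.array(vectors, dtype=float)
    return V @ V.T

independent = [(1, 0, 1), (1, 1, 0), (-1, 1, 2)]
dependent = [(1, 0, 1), (1, 1, 0), (2, 1, 1)]     # third = first + second
print(np.linalg.det(gram(independent)))          # nonzero
print(np.linalg.det(gram(dependent)))            # (numerically) zero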
Proposition 19.1.20. Let V be a Euclidean space with orthonormal basis b. Then for all v, w ∈ V ,
hv, wi = [v]_b^T [w]_b .          (19.11)
Proposition 19.1.21. Let V be a Euclidean space. Every linear form f : V → R (Def. 15.1.6)
can be written as
f (x) = ha, xi (19.12)
for a unique a ∈ V .
19.2 Gram-Schmidt Orthogonalization
Theorem 19.2.1. Every finite-dimensional Euclidean space has an orthonormal basis. In fact,
every orthonormal system extends to an orthonormal basis. ♦
Before we prove this theorem, we will first develop Gram-Schmidt orthogonalization, an online
procedure that takes a list of vectors as input and produces a list of orthogonal vectors satisfying
certain conditions. We formalize this below.
Lemma 19.2.3. Assume GS(k) holds. Then, for all j ≤ k, we have span(e1 , . . . , ej ) = Uj . ♦
Exercise 19.2.5. Let k ≥ 2 and assume GS(k − 1) is true. Look for ek in the form
ek = vk − ∑_{i=1}^{k−1} α_i e_i          (19.13)
Prove that the only possible vector ek satisfying the conditions of GS(k) is the ek for which
α_i = hvk , ei i / kei k^2          (19.14)
except in the case where ei = 0 (in that case, αi can be chosen arbitrarily).
Exercise 19.2.6. Prove that ek as constructed in the previous exercise satisfies GS(k).
This completes the proof of Theorem 19.2.2.
Proposition 19.2.7. ei = 0 if and only if vi ∈ span(v1 , . . . , vi−1 ).
Proposition 19.2.8. The Gram-Schmidt procedure preserves linear independence.
Proposition 19.2.9. If (v1 , . . . , vk ) is a basis of V , then so is (e1 , . . . , ek ).
Exercise 19.2.10. Let e = (e1 , . . . , en ) be an orthogonal basis of V . From e, construct an or-
thonormal basis e′ = (e′1 , . . . , e′n ) of V .
Exercise 19.2.11. Conclude that every finite-dimensional Euclidean space has an orthonormal
basis.
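Here is a minimal Python sketch of the Gram-Schmidt procedure in R^n with the standard dot product (ours, assuming the numpy library; the text's construction works in any Euclidean space). It follows (19.13)-(19.14), skipping any e_i that came out zero, and normalizes at the end as in Exercise 19.2.10.

import numpy as np

def gram_schmidt(vectors):
    # orthogonalization: e_k = v_k - sum_i <v_k, e_i>/<e_i, e_i> * e_i
    es = []
    for v in map(np.asarray, vectors):
        e = v.astype(float)
        for prev in es:
            if np.dot(prev, prev) > 1e-12:     # skip e_i = 0, as in Exercise 19.2.5
                e = e - (np.dot(v, prev) / np.dot(prev, prev)) * prev
        es.append(e)
    return es

b = [(1, 0, 1), (1, 1, 0), (-1, 1, 2)]            # the basis of Numerical exercise 19.2.12(a)
es = gram_schmidt(b)
orthonormal = [e / np.linalg.norm(e) for e in es]  # here all e_i are nonzero since b is a basis
for i, u in enumerate(orthonormal):
    for j, w in enumerate(orthonormal):
        assert abs(np.dot(u, w) - (1.0 if i == j else 0.0)) < 1e-9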
Numerical exercise 19.2.12. Apply the Gram-Schmidt procedure to find an orthonormal basis
for each of the following Euclidean spaces V from the basis b. Self-check : once you have applied
Gram-Schmidt, verify that you have obtained an orthonormal set of vectors.
(a) V = R3 with the standard dot product, b = ((1, 0, 1)^T , (1, 1, 0)^T , (−1, 1, 2)^T )
(b) V = P2 [R], with hf, gi = ∫_{−∞}^{∞} e^{−t²/2} f (t)g(t) dt, and b = (1, t, t^2 ) (Hermite polynomials)
(c) V = P2 [R], with hf, gi = ∫_{−1}^{1} (1/√(1 − t^2 )) f (t)g(t) dt, and b = (1, t, t^2 ) (Chebyshev polynomials of
the first kind)
Theorem 19.3.2. If V and W are finite dimensional Euclidean spaces, then they are isometric if
and only if dim V = dim W . ♦
Proposition 19.3.3. Let V and W be Euclidean spaces. Then ϕ : V → W is an isometry if and
only if it maps an orthonormal basis of V to an orthonormal basis of W .
Proposition 19.3.4. Let ϕ : V → W be an isomorphism that preserves orthogonality (so v ⊥ w
if and only if ϕ(v) ⊥ ϕ(w)). Show that there is an isometry ψ and a nonzero scalar λ such that
ϕ = λψ.
The geometric notion of congruence is captured by the concept of orthogonal transformations.
Definition 19.3.5 (Orthogonal transformation). Let V be a Euclidean space. A linear transformation
ϕ : V → V is called an orthogonal transformation if it is an isometry. The set of orthogonal
transformations of V is denoted by O(V ), and is called the orthogonal group of V .
Proposition 19.3.6. The set O(V ) is a group (Def. 14.3.2) under composition.
Exercise 19.3.7. The linear transformation ϕ : V → V is orthogonal if and only if ϕ preserves the
norm, i. e., for all v ∈ V , we have kϕvk = kvk.
Theorem 19.3.8. Let ϕ ∈ O(V ). Then all eigenvalues of ϕ are ±1. ♦
Proposition 19.3.9. Let V be a Euclidean space and let e = (e1 , . . . , en ) be an orthonormal basis
of V . Then ϕ : V → V is an orthogonal transformation if and only if (ϕ(e1 ), . . . , ϕ(en )) is an
orthonormal basis.
Proposition 19.3.10 (Consistency of translation). Let V be a Euclidean space with orthonormal
basis b, and let ϕ : V → V be a linear transformation. Then ϕ is orthogonal if and only if [ϕ]b is
an orthogonal matrix (Def. 9.1.1).
Definition 19.3.11. Let V be a Euclidean space and let S, T ⊆ V . For v ∈ V , we say that v is
orthogonal to S (notation: v ⊥ S) if for all s ∈ S, we have v ⊥ s. Moreover, we say that S is
orthogonal to T (notation: S ⊥ T ) if s ⊥ t for all s ∈ S and t ∈ T .
Definition 19.3.12. Let V be a Euclidean space and let S ⊆ V . Then S ⊥ (“S perp”) is the set of
vectors orthogonal to S, i. e.,
S ⊥ := {v ∈ V | v ⊥ S}. (19.16)
Proposition 19.3.13. For all subsets S ⊆ V , we have S ⊥ ≤ V .
Proposition 19.3.14. Let S ⊆ V . Then S ⊆ (S ⊥ )⊥ .
Exercise 19.3.15. Verify
(a) {0}⊥ = V
(b) ∅⊥ = V
(c) V ⊥ = {0}
The next theorem says that the direct sum (Def. 15.5.1) of a subspace and its perp is the
entire space.
Theorem 19.3.16. If dim V < ∞ and W ≤ V , then V = W ⊕ W ⊥ . ♦
The proof of this theorem requires the following lemma.
Lemma 19.3.17. Let V be a vector space with dim V = n, and let W ≤ V . Then
dim W + dim W ⊥ = n .          (19.17)
♦
Proposition 19.3.18. Let V be a finite-dimensional Euclidean space and let S ⊆ V . Then
(S ⊥ )⊥ = span(S) .          (19.18)
We now study the linear map analogue of the transpose of a matrix, known as the adjoint of
the linear map.
Theorem 19.3.20. Let V and W be Euclidean spaces, and let ϕ : V → W be a linear map. Then
there exists a unique linear map ψ : W → V such that for all v ∈ V and w ∈ W , we have
hϕv, wi = hv, ψwi .          (19.19)
♦
Note that the inner product above refers to inner products in two different spaces. To be more
specific, we should have written
hϕv, wiW = hv, ψwiV . (19.20)
Definition 19.3.21. The linear map ψ whose existence is guaranteed by Theorem 19.3.20 is called
the adjoint of ϕ and is denoted ϕ∗ . So for all v ∈ V and w ∈ W , we have
hϕv, wi = hv, ϕ∗ wi .          (19.21)
The next exercise shows the relationship between the coordinatization of ϕ and of ϕ∗ . The
reason we denote the adjoint of the linear map ϕ by ϕ∗ rather than by ϕT will become clear in
Section 20.4.
Proposition 19.3.22. Let V , W , and ϕ be as in the statement of Theorem 19.3.20. Let b1 be an
orthonormal basis of V and let b2 be an orthonormal basis of W . Then
[ϕ∗ ]b2 ,b1 = ([ϕ]b1 ,b2 )^T .
19.4 First Proof of the Spectral Theorem
Theorem 19.4.4 (Spectral Theorem). Let V be a finite-dimensional Euclidean space and let ϕ :
V → V be a symmetric linear transformation. Then ϕ has an orthonormal eigenbasis. ♦
Proposition 19.4.5. Let ϕ be a symmetric linear transformation of the Euclidean space V , and
let W ≤ V . If W is ϕ-invariant (Def. 16.4.21) then W ⊥ is ϕ-invariant.
The heart of the proof of the Spectral Theorem is the following lemma.
Main Lemma 19.4.7. Let ϕ be a symmetric linear transformation of a Euclidean space of dimension
≥ 1. Then ϕ has an eigenvector.
Exercise 19.4.8. Assuming Lemma 19.4.7, prove the Spectral Theorem by induction on dim V .
Theorem 19.4.15. Let V be a Euclidean space and let S ⊆ V be a compact and nonempty subset.
If f : S → R is continuous, then f attains its maximum and its minimum. ♦
Proposition 19.4.18. For all linear transformations ϕ : V → V , the Rayleigh quotient Rϕ attains
its maximum and its minimum.
Definition 19.4.20 (arg max). Let S be a set and let f : S → R be a function which attains its
maximum. Then arg max f := {s0 ∈ S | f (s0 ) ≥ f (s) for all s ∈ S}.
Convention 19.4.21. Let S be a set and let f : S → R be a function which attains its maximum.
We often write arg max f to refer to any (arbitrarily chosen) element of the set arg max f , rather
than the set itself.
Proposition 19.4.22. Let ϕ be a symmetric linear transformation of the Euclidean space V , and
let v0 = arg max Rϕ (v). Then v0 is an eigenvector of ϕ.
Prop. 19.4.22 completes the proof of the Main Lemma and thereby the proof of the Spectral
Theorem.
Proposition 19.4.23. If two symmetric matrices are similar then they are orthogonally similar.
Chapter 20
Hermitian Spaces
In Chapter 19, we discussed Euclidean spaces, whose underlying vector spaces were real. We now
generalize this to the notion of Hermitian spaces, whose underlying vector spaces are complex.
20.1 Hermitian Spaces
Definition 20.1.4 (Hermitian form). Let V be a complex vector space. The function f : V × V → C
is Hermitian if for all v, w ∈ V , the value f (v, w) is the complex conjugate of f (w, v):
f (v, w) = \overline{f (w, v)} .          (20.3)
A sesquilinear form that is Hermitian is called a Hermitian form.
Exercise 20.1.5. For what matrices A ∈ Mn (C) is the sesquilinear form f (v, w) = v∗ Aw Hermi-
tian?
Exercise 20.1.6. Show that for Hermitian forms, (b) follows from (a) and (d) follows from (c) in
Def. 20.1.1.
Fact 20.1.7. Let V be a vector space over C and let f : V × V → C be a Hermitian form. Then
f (v, v) ∈ R for all v ∈ V .
Definition 20.1.8. Let V be a vector space over C, and let f : V × V → C be a Hermitian form.
(a) If f (v, v) > 0 for all v ≠ 0, then f is positive definite.
(b) If f (v, v) ≥ 0 for all v ∈ V , then f is positive semidefinite.
(c) If f (v, v) < 0 for all v ≠ 0, then f is negative definite.
(d) If f (v, v) ≤ 0 for all v ∈ V , then f is negative semidefinite.
(e) If there exist v, w ∈ V such that f (v, v) > 0 and f (w, w) < 0, then f is indefinite.
Exercise 20.1.9. For what A ∈ Mn (C) is v∗ Aw positive definite, positive semidefinite, etc.?
Exercise 20.1.10. Let V be the space of continuous functions f : [0, 1] → C, and let ρ : [0, 1] → C
be a continuous “weight function.” Define
F (f, g) := ∫_0^1 f (t)g(t)ρ(t) dt .          (20.4)
Exercise 20.1.11. Let r = (ρ0 , ρ1 , . . . ) be an infinite sequence of complex numbers. Consider the
space V of infinite sequences (α0 , α1 , . . . ) of complex numbers such that
∑_{i=0}^{∞} |αi |^2 |ρi | < ∞ .          (20.5)
where a = (α0 , α1 , . . . ) and b = (β0 , β1 , . . . ) is a Hermitian space. This is one of the standard
representations of the complex “separable Hilbert space.”
Euclidean spaces generalize the geometric concepts of distance and perpendicularity via the
notions of norm and orthogonality, respectively; these are easily extended to complex Hermitian
spaces.
Definition 20.1.16 (Norm). Let V be a Hermitian space, and let v ∈ V . Then the norm of v,
denoted kvk, is defined to be
kvk := √hv, vi .          (20.9)
Just as in Euclidean spaces, the notion of a norm allows us to define the distance between two
vectors in a Hermitian space.
Definition 20.1.17. Let V be a Hermitian space, and let v, w ∈ V . Then the distance between the
vectors v and w, denoted d(v, w), is
d(v, w) := kv − wk . (20.10)
Distance in Hermitian spaces obeys the same properties that we are used to in Euclidean spaces.
Theorem 20.1.18 (Cauchy-Schwarz inequality). Let V be a Hermitian space, and let v, w ∈ V .
Then
|⟨v, w⟩| ≤ ‖v‖ · ‖w‖ . (20.11)
♦
Theorem 20.1.19 (Triangle inequality). Let V be a Hermitian space, and let v, w ∈ V . Then
‖v + w‖ ≤ ‖v‖ + ‖w‖ . ♦
Again, as in Euclidean spaces, norms carry with them the notion of angle; however, because ⟨v, w⟩ is not necessarily real, our definition of angle is not identical to the definition of angle presented in Section 19.1.
Definition 20.1.20 (Orthogonality). Let V be a Hermitian space. Then we say that v, w ∈ V are
orthogonal (notation: v ⊥ w) if ⟨v, w⟩ = 0.
Exercise 20.1.21. Let V be a Hermitian space. What vectors are orthogonal to every vector?
An orthonormal basis of the Hermitian space V is an orthonormal system that is a basis of V .
Exercise 20.1.29. Generalize the Gram-Schmidt orthogonalization procedure ( Section 19.2)
to complex Hermitian spaces. Theorem 19.2.2 will hold verbatim, replacing the word “Euclidean”
by “Hermitian.”
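Here is a minimal numerical sketch of the procedure in Cn with the standard Hermitian dot product ⟨v, w⟩ = v∗w (conjugate-linear in the first argument); this is only an illustration, the exercise asks for the general argument:

import numpy as np

# Gram-Schmidt orthogonalization in C^n with the Hermitian dot product <v, w> = v* w.
def gram_schmidt(vectors):
    """Turn a linearly independent list of complex vectors into an orthonormal list."""
    orthonormal = []
    for v in vectors:
        w = np.array(v, dtype=complex)
        for e in orthonormal:
            w = w - np.vdot(e, w) * e   # subtract the component along e (np.vdot conjugates its first argument)
        w = w / np.linalg.norm(w)       # normalize
        orthonormal.append(w)
    return orthonormal

rng = np.random.default_rng(0)
vs = [rng.standard_normal(3) + 1j * rng.standard_normal(3) for _ in range(3)]
es = gram_schmidt(vs)
# check orthonormality: <e_i, e_j> = delta_ij
print(np.allclose([[np.vdot(a, b) for b in es] for a in es], np.eye(3)))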
Proposition 20.1.30. Every finite-dimensional Hermitian space has an orthonormal basis. In fact,
every orthonormal list of vectors can be extended to an orthonormal basis.
Proposition 20.1.31. Let V be a Hermitian space with orthonormal basis b. Then for all v, w ∈ V ,
⟨v, w⟩ = [v]_b^* [w]_b . (20.15)
Proposition 20.1.32. Let V be a Hermitian space. Every linear form f : V → C ( Def.
15.1.6) can be written as
f (x) = ⟨a, x⟩ (20.16)
for a unique a ∈ V .
20.2 Hermitian transformations
A linear transformation ϕ : V → V of a Hermitian space V is Hermitian if ⟨ϕv, w⟩ = ⟨v, ϕw⟩ holds for all v, w ∈ V . The theory of Hermitian transformations largely parallels the theory of symmetric transformations of Euclidean spaces, but not every statement is analogous. In particular, while a linear transformation of a Euclidean space that has an orthonormal
eigenbasis is necessarily symmetric, the analogous statement in complex spaces involves “normal
transformations” ( Def. 20.5.1), as opposed to Hermitian transformations.
Exercise 20.2.5. Find a linear transformation of Cn that has an orthonormal eigenbasis but is not
Hermitian.
However, the Spectral Theorem does extend to Hermitian linear transformations.
Theorem 20.2.6 (Spectral Theorem for Hermitian transformations). Let V be a Hermitian space
and let ϕ : V → V be a Hermitian linear transformation. Then
(a) ϕ has an orthonormal eigenbasis;
(b) all eigenvalues of ϕ are real.
Exercise 20.2.7. Prove the converse of the Spectral Theorem for Hermitian transformations: If
ϕ : V → V satisfies (a) and (b), then ϕ is Hermitian.
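A numerical illustration of the theorem (a sketch only; numpy's eigh routine is designed for exactly this situation):

import numpy as np

# Numerical sketch (illustration only): a Hermitian matrix has an orthonormal
# eigenbasis and real eigenvalues.
rng = np.random.default_rng(4)
B = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
A = (B + B.conj().T) / 2                      # Hermitian: A == A.conj().T

eigenvalues, U = np.linalg.eigh(A)            # columns of U: orthonormal eigenvectors
print(np.allclose(U.conj().T @ U, np.eye(4))) # (a) the eigenbasis is orthonormal
print(np.all(np.isreal(eigenvalues)))         # (b) the eigenvalues are real
print(np.allclose(A @ U, U @ np.diag(eigenvalues)))   # these really are eigenpairs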
In Section 20.6, we shall see a more general form of the Spectral Theorem which extends part (a) to normal transformations ( Theorem 20.6.1).
20.3 Unitary transformations
In Section 19.3, we introduced orthogonal transformations ( Def. 19.3.5), which captured the
geometric notion of congruence in Euclidean spaces. The complex analogues of real orthogonal
transformations are called unitary transformations.
Definition 20.3.1 (Unitary transformation). Let V be a Hermitian space. Then the transformation ϕ : V → V is unitary if it preserves the inner product, i. e.,
⟨ϕv, ϕw⟩ = ⟨v, w⟩
for all v, w ∈ V . The set of unitary transformations ϕ : V → V is denoted by U (V ).
Exercise 20.3.3. The linear transformation ϕ : V → V is unitary if and only if ϕ preserves the norm, i. e., for all v ∈ V , we have ‖ϕv‖ = ‖v‖.
Warning. The proof of this is trickier than in the real case ( Ex. 19.3.7).
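A small numerical check (illustration only): a unitary matrix, here obtained as the Q factor of a QR decomposition of a random complex matrix, preserves the standard Hermitian inner product and hence the norm.

import numpy as np

rng = np.random.default_rng(2)
B = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
U, _ = np.linalg.qr(B)                      # the Q factor of a QR decomposition is unitary

v = rng.standard_normal(3) + 1j * rng.standard_normal(3)
w = rng.standard_normal(3) + 1j * rng.standard_normal(3)

print(np.allclose(U.conj().T @ U, np.eye(3)))                 # U is unitary
print(np.isclose(np.vdot(U @ v, U @ w), np.vdot(v, w)))       # inner product preserved
print(np.isclose(np.linalg.norm(U @ v), np.linalg.norm(v)))   # norm preserved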
Theorem 20.3.6 (Spectral Theorem for unitary transformations). Let V be a Hermitian space and let ϕ : V → V be a unitary transformation. Then
(a) ϕ has an orthonormal eigenbasis;
(b) all eigenvalues of ϕ have absolute value 1.
Exercise 20.3.7. Prove the converse of the Spectral Theorem for unitary transformations: If
ϕ : V → V satisfies (a) and (b), then ϕ is unitary.
20.4 Adjoint transformations in Hermitian spaces
Theorem 20.4.1. Let V and W be Hermitian spaces, and let ϕ : V → W be a linear map. Then there exists a unique linear map ψ : W → V such that for all v ∈ V and w ∈ W , we have
⟨ϕv, w⟩ = ⟨v, ψw⟩ .
Note that the inner product above refers to inner products in two different spaces. To be more
specific, we should have written
⟨ϕv, w⟩_W = ⟨v, ψw⟩_V . (20.20)
Definition 20.4.2. The linear map ψ whose existence is guaranteed by Theorem 20.4.1 is called the adjoint of ϕ and is denoted ϕ∗ . So for all v ∈ V and w ∈ W , we have
⟨ϕv, w⟩ = ⟨v, ϕ∗w⟩ .
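For the map v ↦ Av between the standard Hermitian spaces Cn and Ck , the adjoint is the conjugate transpose of A; the following sketch (illustration only) checks the defining identity numerically.

import numpy as np

# Numerical sketch (illustration only): <Av, w> = <v, A*w> with <x, y> = x* y.
rng = np.random.default_rng(3)
A = rng.standard_normal((4, 3)) + 1j * rng.standard_normal((4, 3))
v = rng.standard_normal(3) + 1j * rng.standard_normal(3)
w = rng.standard_normal(4) + 1j * rng.standard_normal(4)

lhs = np.vdot(A @ v, w)            # <Av, w> in C^4  (np.vdot conjugates its first argument)
rhs = np.vdot(v, A.conj().T @ w)   # <v, A*w> in C^3
print(np.isclose(lhs, rhs))        # True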
TO BE WRITTEN.
Chapter 21
The Singular Value Decomposition
In this chapter, we discuss matrices over C, but every statement of this chapter holds over R as well, if we replace matrix adjoints by transposes and the word “unitary” by “orthogonal.”
For σ1 , . . . , σr , we write diag_{k×n}(σ1 , . . . , σr ) for the k × n matrix that has σ1 , . . . , σr in the first r diagonal positions and zeros everywhere else:
$$\mathrm{diag}_{k\times n}(\sigma_1, \dots, \sigma_r) = \begin{pmatrix} \sigma_1 & & & \\ & \ddots & & \\ & & \sigma_r & \\ & & & 0 \end{pmatrix} .$$
Note that such a matrix is a “diagonal” matrix which is not necessarily square.
Theorem 21.1.2 (Singular Value Decomposition). Let A ∈ Ck×n . Then there exist unitary matrices S ∈ U (k) and T ∈ U (n) such that
S∗AT = diag_{k×n}(σ1 , . . . , σr ) ,
where r = rk A and σ1 ≥ · · · ≥ σr > 0.
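A numerical sketch of the theorem (illustration only): numpy returns the factorization in the form A = U diag(σ) V ∗ , which can be rearranged to match the statement above.

import numpy as np

# Numerical sketch (illustration only) with k = 3, n = 5: setting S = U and
# T = Vh.conj().T makes S and T unitary and S* A T = diag_{k x n}(sigma).
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 5)) + 1j * rng.standard_normal((3, 5))

U, sigma, Vh = np.linalg.svd(A)        # sigma: singular values in decreasing order
S, T = U, Vh.conj().T                  # unitary k x k and n x n matrices

D = S.conj().T @ A @ T                 # a "diagonal" (non-square) matrix
print(np.allclose(D[:, :3], np.diag(sigma), atol=1e-10))   # diagonal block equals diag(sigma)
print(np.allclose(D[:, 3:], 0, atol=1e-10))                # remaining columns are zero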
21.2 Low-rank approximation
In this section, we use the Singular Value Decomposition to find low-rank approximations to matrices, that is, the matrix of a given rank which is “closest” to a specified matrix under the operator norm ( Def. 13.1.1).
Definition 21.2.1 (Truncated matrix). Let D = diag_{k×n}(σ1 , . . . , σr ) be a rank-r matrix (so σ1 , . . . , σr ≠ 0). The rank-ℓ truncation (ℓ ≤ r) of D, denoted Dℓ , is the k × n matrix Dℓ = diag_{k×n}(σ1 , . . . , σℓ ).
The next theorem explains how the Singular Value Decomposition helps us find low-rank ap-
proximations to matrices.
Theorem 21.2.2 (Nearest low-rank matrix). Let A ∈ Ck×n be a matrix of rank r with singular
values σ1 ≥ · · · ≥ σr > 0. Define S, T , and D as guaranteed by the Singular Value Decomposition
Theorem so that
S ∗ AT = diagk×n (σ1 , . . . , σr ) = D . (21.2)
Given ℓ ≤ r, let
Dℓ = diag_{k×n}(σ1 , . . . , σℓ ) (21.3)
and define Bℓ = SDℓ T ∗ . Then Bℓ is the matrix of rank at most ℓ which is nearest to A under the operator norm, i. e., rk Bℓ = ℓ and for all B ∈ Ck×n , if rk B ≤ ℓ, then ‖A − Bℓ ‖ ≤ ‖A − B‖.
(a) ‖A − Bℓ ‖ = ‖D − Dℓ ‖ ;
(b) ‖D − Dℓ ‖ = σℓ+1 .
As with the proof of the Singular Value Decomposition Theorem, Theorem 21.2.2 is easier to
prove in terms of linear maps. We restate the theorem as follows.
Theorem 21.2.4. Let V and W be Hermitian spaces with orthonormal bases e and f , respectively, and let ϕ : V → W be a linear map such that ϕei = σi f i for i = 1, . . . , r. Define the truncated map ϕℓ : V → W by
ϕℓ ei = σi f i for 1 ≤ i ≤ ℓ, and ϕℓ ei = 0 otherwise. (21.4)
Then whenever ψ : V → W is a linear map of rank ≤ ℓ, we have ‖ϕ − ϕℓ ‖ ≤ ‖ϕ − ψ‖.
Exercise 21.2.6. Let ϕ and ϕℓ be as in the statement of the preceding theorem. Show that ‖ϕ − ϕℓ ‖ = σℓ+1 .
It follows that in order to prove Theorem 21.2.4, it suffices to show that for all linear maps ψ : V → W of rank ≤ ℓ, we have ‖ϕ − ψ‖ ≥ σℓ+1 .
Exercise 21.2.7. Let ψ : V → W be a linear map of rank ≤ ℓ. Show that there exists a nonzero v ∈ ker ψ such that
‖(ϕ − ψ)v‖ / ‖v‖ ≥ σℓ+1 . (21.5)
Exercise 21.2.8. Complete the proof of Theorem 21.2.4, hence of Theorem 21.2.2.
Exercise 21.2.9. Show that the matrix Bℓ whose existence is guaranteed by Theorem 21.2.2 is unique, i. e., if B ∈ Ck×n is a matrix of rank at most ℓ such that ‖A − B‖ ≤ ‖A − B′‖ for all matrices B′ ∈ Ck×n of rank at most ℓ, then B = Bℓ .
In fact, the rank-ℓ matrix guaranteed by Theorem 21.2.2 to be nearest to A under the operator norm is also the rank-ℓ matrix nearest to A under the Frobenius norm ( Def. 13.2.1).
Theorem 21.2.10. The statement of Theorem 21.2.2 holds for the same matrix Bℓ when the operator norm is replaced by the Frobenius norm. That is, we also have ‖A − Bℓ ‖F ≤ ‖A − B‖F for all rank-ℓ matrices B ∈ Ck×n . ♦
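The following numerical sketch (illustration only) assembles Bℓ from a truncated SVD and checks the claims of Theorem 21.2.2, Exercise 21.2.6, and Theorem 21.2.10 on a random matrix.

import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 4)) + 1j * rng.standard_normal((6, 4))

U, sigma, Vh = np.linalg.svd(A)
ell = 2
D_ell = np.zeros((6, 4))
D_ell[:ell, :ell] = np.diag(sigma[:ell])       # keep only the ell largest singular values
B_ell = U @ D_ell @ Vh                          # the rank-ell truncation of A

print(np.linalg.matrix_rank(B_ell))             # 2
# operator-norm error equals the next singular value
# (sigma[ell] is sigma_{ell+1} in the 1-based notation of the text)
print(np.isclose(np.linalg.norm(A - B_ell, 2), sigma[ell]))
# any other rank-ell matrix is at least as far from A, in both norms
C = rng.standard_normal((6, ell)) @ rng.standard_normal((ell, 4))
print(np.linalg.norm(A - B_ell, 2) <= np.linalg.norm(A - C, 2) + 1e-12)
print(np.linalg.norm(A - B_ell, 'fro') <= np.linalg.norm(A - C, 'fro') + 1e-12)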
Chapter 22
Finite Markov Chains
Exercise 22.1.4. Let A ∈ Rk×n and B ∈ Rn×m . Prove that if A and B are stochastic matrices
then AB is a stochastic matrix.
Exercise 22.1.5.
(a) Let A be a stochastic matrix. Prove that A1 = 1.
(b) Show that the converse is false.
(c) Show that A is stochastic if and only if A is nonnegative and A1 = 1.
Definition 22.1.6 (Probability distribution). A probability distribution is a list of nonnegative num-
bers which add to 1.
Fact 22.1.7. Every row of a stochastic matrix is a probability distribution.
FIGURE HERE
Fact 22.2.3. The transition matrix T of a finite Markov Chain is a stochastic matrix, and every
stochastic matrix is the transition matrix of a finite Markov Chain.
Notation 22.2.4 (r-step transition probability). The r-step transition probability from state i to state j, denoted p_ij^(r) , is defined by
p_ij^(r) = P (Xt+r = j | Xt = i) . (22.2)
Exercise 22.2.5 (Evolution of Markov Chains, I). Let T = (pij ) be the transition matrix corresponding to a finite Markov Chain. Show that T^r = (p_ij^(r)), where p_ij^(r) is the r-step transition probability ( Notation 22.2.4).
Definition 22.2.6. Let qt,i be the probability that the particle is in state i at time t. We define
qt = (qt,1 , . . . , qt,n ) to be the distribution of the particle at time t.
Fact 22.2.7. For every t ≥ 0, Σ_{i=1}^{n} qt,i = 1.
Exercise 22.2.8 (Evolution of Markov Chains, II). Let T be the transition matrix of a finite
Markov Chain. Show that qt+1 = qt T and conclude that qt = q0 T^t .
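A short numerical sketch of the evolution rule (illustration only, with an arbitrarily chosen 3-state transition matrix):

import numpy as np

T = np.array([[0.9, 0.1, 0.0],
              [0.2, 0.5, 0.3],
              [0.0, 0.4, 0.6]])        # rows sum to 1, so T is stochastic
q0 = np.array([1.0, 0.0, 0.0])         # start in state 1 with probability 1

q = q0
for _ in range(10):
    q = q @ T                          # one step of the chain: q_{t+1} = q_t T
print(np.allclose(q, q0 @ np.linalg.matrix_power(T, 10)))   # q_10 = q_0 T^10
print(np.isclose(q.sum(), 1.0))                             # still a probability distribution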
Definition 22.2.9 (Stationary distribution). The probability distribution q is a stationary distribution if q = qT , i. e., if q is a left eigenvector ( Def. 8.1.17) of T with eigenvalue 1.
Proposition 22.2.10. Let A ∈ Mn (R) be a stochastic matrix. Show that A has a left eigenvector
with eigenvalue 1.
The preceding proposition in conjunction with the next theorem shows that every stochastic
matrix has a stationary distribution.
Recall ( Def. 22.1.1) that a nonnegative matrix is a matrix whose entries are all nonnegative;
a nonnegative vector is defined similarly.
Do not prove the following theorem.
Theorem 22.2.11 (Perron-Frobenius). Every nonnegative square matrix has a nonnegative eigen-
vector.
Corollary 22.2.12. Every finite Markov Chain has a stationary distribution.
Proposition 22.2.13. If T is the transition matrix of a finite Markov Chain and lim_{r→∞} T^r = L exists, then every row of L is a stationary distribution.
In order to determine which Markov Chains have transition matrices that converge, we study
the directed graphs associated with finite Markov Chains.
22.3 Digraphs
TO BE WRITTEN.
FIGURE HERE
Definition 22.4.1 (Irreducible Markov Chain). A finite Markov Chain is irreducible if its associated
digraph is strongly connected.
Proposition 22.4.2. If T is the transition matrix of an irreducible finite Markov Chain and lim_{r→∞} T^r = L exists, then all rows of L are the same, so rk L = 1.
Definition 22.4.3 (Ergodic Markov Chain). A finite Markov Chain is ergodic if its associated digraph
is strongly connected and aperiodic.
The following theorem establishes a sufficient condition under which the probability distribution
of a finite Markov Chain converges to the stationary distribution. For irreducible Markov Chains,
this is necessary and sufficient.
Theorem 22.4.4. If T is the transition matrix of an ergodic finite Markov Chain, then lim_{r→∞} T^r exists.
Proposition 22.4.5. For irreducible Markov Chains, the stationary distribution is unique.
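A numerical sketch (illustration only, using an arbitrarily chosen ergodic 3-state chain) of Theorem 22.4.4 together with Propositions 22.2.13, 22.4.2, and 22.4.5:

import numpy as np

T = np.array([[0.9, 0.1, 0.0],
              [0.2, 0.5, 0.3],
              [0.0, 0.4, 0.6]])         # strongly connected and aperiodic, hence ergodic

L = np.linalg.matrix_power(T, 200)      # T^r for large r, essentially the limit
q = L[0]                                # any row of the (numerical) limit
print(np.allclose(L, np.tile(q, (3, 1))))   # all rows coincide, so rk L = 1
print(np.allclose(q, q @ T))                # q is stationary: q = qT
print(np.isclose(q.sum(), 1.0))             # and q is a probability distribution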
More Chapters
TO BE WRITTEN.
Chapter 24
Hints
(b) Consider the sums of the entries in each column vector.
vector produces a zero-weight vector.
What are the subspaces of R^n which are spanned by one vector?
Pick w1 ∈ W1 \ W2 and w2 ∈ W2 \ W1 . Where is w1 + w2 ?
Prove (a) and (b) together. Prove that the subspace defined by (b) satisfies the definition.
Preceding exercise.
Show that span(T ) ≤ span(S).
A linear combination from U1 ∪ U2 can always be written as a sum u1 + u2 for some u1 ∈ U1 and u2 ∈ U2 .
This gives you a system of linear equations. Find a nonzero solution.
Assume Σ αi vi = 0. What condition on αj allows you to express vj in terms of the other vectors?
What does the empty sum evaluate to?
Find a nontrivial linear combination of v and v that evaluates to 0.
The only linear combinations of the list (v) are of the form αv for α ∈ F.
1.3.17: Ex. 1.3.13.
If Σ αi vi = 0 and not all the αi are 0, then it must be the case that αk+1 ≠ 0 (why?).
Prop. 1.3.11.
What is the simplest nontrivial linear combination of such a list which evaluates to zero?
Combine Prop. 1.3.15 and Fact 1.3.20.
Begin with a basis. How can you add one more vector that will satisfy the condition?
a contradiction.
end, wi1 , . . . , wik are linearly independent.
Use the standard basis of F^k and the preceding exercise.
than k linearly independent vectors.
Show that if this list were not linearly independent, then U1 ∩ U2 would contain a nonzero vector.
1.3.52: Ex. 1.3.51.
Start with a basis of U1 ∩ U2 .
Only the zero vector. Why?
Express 1 · x in terms of the entries of x.
coefficients must be 0 by taking the dot product of W with each member of S.
24.2 Matrices
(d3) The size is the sum of the squares of the entries. This is the Frobenius norm of the matrix.
Show, by induction on k, that the (i, j) entry of A^k is zero whenever j ≤ i + k − 1.
Write B as a sum of matrices of the form [0 | · · · | 0 | bi | 0 | · · · | 0] and then use distributivity.
Apply the preceding exercise to B = I.
Consider (A − B)x.
2.2.31: Prop. 2.2.28.
2.2.32: Prop. 2.2.29.
2.2.33: Prop. 2.2.29.
What is the (i, i) entry of AB?
2.2.39: Preceding exercise.
If Tr(AB) = 0 for all A, then B = 0.
Induction on k. Use the preceding exercise.
Induction on the degree of f .
Observe that the off-diagonal entries do not affect the diagonal entries.
Induction on k. Use the preceding exercise.
2.5.1: Ex. 2.2.2.
Ex. 2.2.14.
Multiply by the matrix with a 1 in the (i, j) position and 0 everywhere else.
(a) Trace.
Incidence vectors ( Def. 1.6.2).
rkcol (A) + rkcol (B). Use the upper bound to derive the lower bound.
Consider the k × r submatrix formed by taking r linearly independent columns.
Extend a basis of U to a basis of F^n , and consider the action of the matrix A on this basis.
3.6.7: Ex. 2.2.19.
4.1.4: Immediate from Ex. 2.2.26.
5.1.14: (b) Let v0 ∈ S and let U = {s − v0 | s ∈ S}. Show that
(i) U ≤ F^n ;
(ii) S is a translate of U .
6.4.12: The columns of A are linearly dependent ( Def. 3.3.10). Use this fact to find elementary column operations that create an all-zero column.
6.4.15: Necessity was established in the preceding exercise ( Cor. 6.4.14). Sufficiency follows from Prop. 3.2.7.
determinant of a familiar matrix.
Sufficiency follows from the determinantal expression of A^{−1} . Necessity follows from the multiplicativity of the determinant ( Prop. 6.4.13).
6.8.5: Ex. 6.10.5.
Identity matrix.
8.4.9: Induction on n.
(α1 , . . . , αn )^T works.
Generalize the hint to Theorem 11.1.5.
(c) Skew symmetric matrices. Ex. 6.10.3.
11.3.3: Give a very simple expression for B in terms of A.
11.3.9: Use the fact that λ is real.
11.3.15: Interlacing.
11.3.23: (b) Triangular matrices.
(a)
(b) Prove that your example must have rank 1.
are of this form. The λi are the eigenvalues of the resulting circulant matrix.
12.4.9: Induction on n.
12.4.15: Prop. 12.4.6.
12.4.19: Theorem 12.4.18.
13.1.2: $\max_{\|x\|=1} \|Ax\|$ .
15.3.14: Use the fact that a polynomial of degree n has at most n roots.
15.4.9: Ex. 14.5.57.
15.4.10: Ex. 16.4.10.
Start with a basis of U1 ∩ U2 .
It is an immediate result of Cor. 15.4.3 that every list of k + 1 vectors in R^k is linearly dependent.
Coordinate vector.
is a basis for im(ϕ).
(a)
(b)
(d)
16.6.7: Write A = [ϕ]old and A′ = [ϕ]new . Show that for all x ∈ F^n (n = dim V ), we have A′x = T^{−1}ASx.
19.1.7: Derive this from the Cauchy-Schwarz inequality.
Theorem 11.1.5.
19.3.20: Prop. 19.1.21.
19.4.19: f′(0) = 0.
19.4.22: Define U = {v0}⊥ , and for each u ∈ U \ {0}, consider the function fu : R → R defined by fu (t) = Rϕ (v0 + tu). Apply the preceding exercise to this function.
20.1.19: Derive this from the Cauchy-Schwarz inequality.
20.1.32: Theorem 11.1.5.
20.2.2: For an eigenvalue λ, you need to show that $\lambda = \overline{\lambda}$. The proof is just one line. Consider the expression x∗Ax.
20.2.5: Which diagonal matrices are Hermitian?
20.4.1: Prop. 20.1.32.
22.1.5: For a general matrix B ∈ Mn (R), what is the i-th entry of B1?
22.2.10: Prop. ??.
Chapter 25
Solutions
$$\begin{pmatrix} -5 \\ -1 \\ 11 \end{pmatrix} = 2 \begin{pmatrix} -2 \\ 1 \\ 7 \end{pmatrix} - \frac{1}{2} \begin{pmatrix} 2 \\ 6 \\ 6 \end{pmatrix}$$
25.2 Matrices
8.1.3: Every vector in F^n is an eigenvector of In with eigenvalue 1.
$$
\begin{aligned}
v^T A v &= \left(\sum_{i=1}^{n} \alpha_i b_i^T\right) A \left(\sum_{i=1}^{n} \alpha_i b_i\right)
        = \left(\sum_{i=1}^{n} \alpha_i b_i^T\right) \left(\sum_{i=1}^{n} \alpha_i A b_i\right)
        = \left(\sum_{i=1}^{n} \alpha_i b_i^T\right) \left(\sum_{i=1}^{n} \alpha_i \lambda_i b_i\right) \\
       &= \sum_{i=1}^{n} \sum_{j=1}^{n} \lambda_i \alpha_i \alpha_j\, b_j^T b_i
        = \sum_{i=1}^{n} \sum_{j=1}^{n} \lambda_i \alpha_i \alpha_j\, (b_j \cdot b_i)
        = \sum_{i=1}^{n} \sum_{j=1}^{n} \lambda_i \alpha_i \alpha_j\, \delta_{ij}
        = \sum_{i=1}^{n} \lambda_i \alpha_i^2 ,
\end{aligned}
$$
so A is positive definite.
11.3.3: B = (A + A^T)/2 . Verify that this and only this matrix works.
18.2.10: $m_A = \prod'_{i=1}^{n} (t - \lambda_i)$, where $\prod'$ means that we take each factor only once (so eigenvalues λi with multiplicity greater than 1 only contribute one factor of t − λi ).
Index
Fk , 4, 21 Hyperplane, 57, 57
basis of, 15, 52 intersection of, 57
dimension of, 15 linear, see Linear hyperplane
standard basis of, 14, 26, 28
subspace of, 8, 8–10, 57 I, see Identity matrix
codimension of, 45, 45 Identity matrix, 26, 26, 43, 78
dimension of, 13, 13–15 Incidence vector, 18, 18–19
disjoint, 15
J matrix, 21, 28
intersection of, 9
Jordan block, 37
totally isotropic, 105, 105–106
trivial, 9 k-cycle, 64, 65
union of, 9 Kronecker delta, 26
Fk×n , 21
F[t], 82 `2 norm, 109
Field, 4 `2 norm, 18
of characteristic 2, 75 Linear combination
field affine, see Affine combination
algebraically closed, 84 of column vectors, 6, 6–8, 10, 16, 29
First Miracle of Linear Algebra, 15, 15 trivial, 11
Frobenius norm, 114, 114–115 Linear dependence
submultiplicativity of, 115 of column vectors, 11, 11–69
Fundamental Theorem of Algebra, 84 Linear form
over Fn , 97, 97, 101
Gaussian elimination, 40 Linear hyperplane, 57
Generalized Fisher Inequality, 19 intersection of, 57
Group Linear independence
orthogonal, see Orthogonal group maximal, 14, 14
symmetric, see Symmetric group of column vectors, 11, 11–13, 15, 18, 37,
unitary, see Unitary group 42, 46, 51, 79
List, 11
Hadamard matrix, 92, 92 concatenation of, 11
Hadamard’s Inequality, 75, 76 empty, 12
Half-space, 59, 59 List of generators, 14, 14
Helly’s Theorem, 59, 59
Hermitian dot product, 108, 108–110 Mn (F), 21
Third Miracle of Linear Algebra, 91, 110 Zero matrix, 21, 29, 42
Trace Zero polynomial, 82, 82
of a matrix, 29, 29–30, 86 Zero vector, see 0