
Discover Linear Algebra

Incomplete Preliminary Draft

Last updated: April 3, 2024

László Babai
in collaboration with Noah Halford

All rights reserved.


Approved for instructional use only.
Commercial distribution prohibited.
© 2016, 2020, 2023 László Babai.
Preface

This text offers a guided tour of discovery of the foundations of Linear Algebra. The text is written
in IBL style (Inquiry-based learning): we introduce concepts and results, but most of the proofs
are left to the reader who will build up the techniques and the theory through series of exercises.
Further creative exercises are designed to enhance the experience of discovery. One of my favorites:
any non-zero polynomial can be multiplied by some other non-zero polynomial such that in the
product, only terms with prime number exponents can have non-zero coefficients.
Some of the surprising key results of the basic theory are highlighted as “miracles.” One of
the several equivalent formulations of the First Miracle of Linear Algebra is the impossibility of
boosting linear independence: among all linear combinations of m vectors, we shall never find m + 1
that are linearly independent. Another, equivalent formulation is that the dimension of Rn is n,
not more. The Second Miracle is that the row-rank and the column-rank of a matrix are equal, a
fact at which I cannot cease to marvel. What on earth does linear independence of columns have to do with
linear independence of rows? They do not even live in the same universe; the matrix need not
be square. The Third Miracle is related to the second: if the columns of a real matrix form an
orthonormal basis, then so do the rows. What do the dot products of the rows have to do with the
dot products of the columns? While rushing through these basic facts, we tend to overlook their
magical quality.

∗ ∗ ∗

Linear algebra deals with objects called “vectors.” The basic operation is linear combination of
vectors, i. e., expressions of the form α1 v1 + · · · + αn vn where the vi are vectors and the αi are
“scalars.” The domain of scalars is a prespecified field. Examples of fields include Q, the set of
rational numbers, R, the set of real numbers, C, the field of complex numbers, and the finite fields
Fq . The reader who is not comfortable with the general concept of fields can assume, for most of
the material in this book, that the scalars are real numbers and in some cases, complex numbers.


R and C are the two most important fields for most applications of linear algebra, although finite
fields have also been of great significance for discrete mathematics and digital communications
engineering (error correcting codes). The general concept of fields, including their characteristic,
and specifically finite fields, as well as other elements of basic abstract algebra, are introduced in
Chapter 14, at the beginning of Part II. The material of Chapter 14 should suffice for the reader
to appreciate the material of the entire book in the context of an arbitrary field as the domain of
scalars.
In the book, F denotes any field. Each chapter is marked with a subset of the symbols F, R,
C. If a chapter or section is marked R, it means that the material of that unit is restricted to real
coefficients. The title of Chapter 1 is marked (F, R), meaning that some (in fact, most) sections in
this chapter talk about an arbitrary field, but some part of the material (in this case, Section 1.5)
applies to the case of real scalars only. Chapters marked (R, C) indicate that some of the material
applies to real scalars only, and other parts of the material only to complex scalars.
The book requires a degree of mathematical maturity: a good understanding of set notation and familiarity with proofs.
But Part I, "Matrix Theory," is relatively hands-on; it does not require abstract algebra. Chapter
6, “Determinants,” is a cornerstone of Part I, a theory that seems to be somewhat neglected in
undergraduate linear algebra courses.
In Part II we take a more abstract approach; we discuss linear algebra in the framework of the
general concept of vector spaces, introduced in Chapter 15.

∗ ∗ ∗

The first draft of this book was written up by Noah Halford, then an undergraduate, in 2016, based
on my lectures and detailed instructions. I owe a debt of gratitude to Noah for jumpstarting this
project. However, polishing the presentation remains an ongoing effort and will take some more
time; meanwhile I advise the reader to read the material critically. I will appreciate any warnings
of the, no doubt numerous, errors.

January 15, 2023

László Babai
University of Chicago
Contents

Notation x

I Matrix Theory 1
Introduction to Part I 2

1 Column Vectors 3
1.1 Column vector basics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1.1 The domain of scalars . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.2 Subspaces and span . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.3 Linear independence and the First Miracle of Linear Algebra . . . . . . . . . . . . 11
1.4 Dot product . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.5 Dot product over R . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
1.6 Additional exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

2 Matrices 20
2.1 Matrix basics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.2 Matrix multiplication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.3 Arithmetic of diagonal and triangular matrices . . . . . . . . . . . . . . . . . . . . 31
2.4 Permutation Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.5 Additional exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

3 Matrix Rank 39
3.1 Column and row rank . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39


3.2 Elementary operations and Gaussian elimination . . . . . . . . . . . . . . . . . . . 40


3.3 Invariance of column and row rank, the Second Miracle of Linear Algebra . . . . . 41
3.4 Matrix rank and invertibility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
3.5 Codimension (optional) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
3.6 Additional exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

4 Qualitative Theory of Systems of Linear Equations 48


4.1 Homogeneous systems of linear equations . . . . . . . . . . . . . . . . . . . . . . . 48
4.2 General systems of linear equations . . . . . . . . . . . . . . . . . . . . . . . . . . . 51

5 Affine and Convex Combinations (optional) 53


5.1 Affine combinations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
5.2 Hyperplanes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
5.3 Convex combinations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
5.4 Helly’s Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
5.5 Additional exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60

6 The Determinant 61
6.1 Motivation: solving systems of linear equations . . . . . . . . . . . . . . . . . . . . 61
6.2 Permutations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
6.3 Defining the determinant . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
6.4 Properties of determinants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
6.5 Expressing rank via determinants . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
6.6 Dependence of the rank on the field of scalars . . . . . . . . . . . . . . . . . . . . . 69
6.7 Cofactor expansion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
6.8 Determinantal formula for the inverse matrix . . . . . . . . . . . . . . . . . . . . . 73
6.9 Cramer’s Rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
6.10 Additional exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75

7 Theory of Systems of Linear Equations II: Cramer’s Rule 77

8 Eigenvectors and Eigenvalues 78


8.1 Eigenvector and eigenvalue basics . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
8.2 Similar matrices and diagonalizability . . . . . . . . . . . . . . . . . . . . . . . . . 81
8.3 Polynomial basics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
8.4 The characteristic polynomial . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84

8.5 The Cayley-Hamilton Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86


8.6 Additional exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89

9 Orthogonal Matrices 91
9.1 Orthogonal matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
9.2 Orthogonal similarity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92

10 The Spectral Theorem 94


10.1 Statement of the Spectral Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
10.2 Applications of the Spectral Theorem . . . . . . . . . . . . . . . . . . . . . . . . . 95

11 Bilinear and Quadratic Forms 97


11.1 Linear and bilinear forms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
11.2 Multivariate polynomials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
11.3 Quadratic forms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
11.4 Geometric algebra (optional) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104

12 Complex Matrices 107


12.1 Complex numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
12.2 Hermitian dot product in Cn . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
12.3 Hermitian and unitary matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
12.4 Normal matrices and unitary similarity . . . . . . . . . . . . . . . . . . . . . . . . . 111

13 Matrix Norms 113


13.1 Operator norm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
13.2 Frobenius norm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
13.3 Complex Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115

II Linear Algebra of Vector Spaces 116


Introduction to Part II 117

14 Algebra 118
14.1 Basic concepts of arithmetic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
14.1.1 Arithmetic of sets of integers . . . . . . . . . . . . . . . . . . . . . . . . . . 118

14.1.2 Divisibility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120


14.1.3 Greatest common divisor . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
14.1.4 Fundamental Theorem of Arithmetic . . . . . . . . . . . . . . . . . . . . . 125
14.1.5 Least common multiple . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
14.2 Modular arithmetic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
14.2.1 Congruences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
14.2.2 Equivalence relations, residue classes . . . . . . . . . . . . . . . . . . . . . 128
14.3 Groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
14.4 Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
14.5 Polynomials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138

15 Vector Spaces: Basic Concepts 150


15.1 Vector spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
15.2 Subspaces and span . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
15.3 Linear independence and bases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
15.4 The First Miracle of Linear Algebra . . . . . . . . . . . . . . . . . . . . . . . . . . 159
15.5 Direct sums . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161

16 Linear Maps 163


16.1 Linear map basics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
16.2 Isomorphisms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
16.3 The Rank-Nullity Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
16.4 Linear transformations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
16.4.1 Invariant subspaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170
16.5 Coordinatization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174
16.6 Change of basis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 176

17 Block Matrices (optional) 178


17.1 Block matrix basics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
17.2 Arithmetic of block-diagonal and block-triangular matrices . . . . . . . . . . . . . . 180

18 Minimal Polynomials of Matrices and Linear Transformations (optional) 182


18.1 The minimal polynomial . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182
18.2 Minimal polynomials of linear transformations . . . . . . . . . . . . . . . . . . . . . 184

19 Euclidean Spaces 187


19.1 Inner products . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187
19.2 Gram-Schmidt orthogonalization . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
19.3 Isometries and orthogonality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192
19.4 First proof of the Spectral Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . 195

20 Hermitian Spaces 198


20.1 Hermitian spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 198
20.2 Hermitian transformations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
20.3 Unitary transformations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204
20.4 Adjoint transformations in Hermitian spaces . . . . . . . . . . . . . . . . . . . . . . 205
20.5 Normal transformations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205
20.6 The Complex Spectral Theorem for normal transformations . . . . . . . . . . . . . 206

21 The Singular Value Decomposition 207


21.1 The Singular Value Decomposition . . . . . . . . . . . . . . . . . . . . . . . . . . . 207
21.2 Low-rank approximation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 208

22 Finite Markov Chains 211


22.1 Stochastic matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211
22.2 Finite Markov Chains . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 212
22.3 Digraphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 214
22.4 Digraphs and Markov Chains . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 214
22.5 Finite Markov Chains and undirected graphs . . . . . . . . . . . . . . . . . . . . . 214
22.6 Additional exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 215

23 More Chapters 216

24 Hints 217
24.1 Column Vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217
24.2 Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 220
24.3 Matrix Rank . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 222
24.4 Qualitative Theory of Systems of Linear Equations . . . . . . . . . . . . . . . . . . 223
24.5 Affine and Convex Combinations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223
24.6 The Determinant . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223
24.7 Eigenvectors and Eigenvalues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 224

24.8 Orthogonal Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 224


24.9 The Spectral Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 224
24.10 Bilinear and Quadratic Forms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 224
24.11 Complex Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225
24.12 Matrix Norms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 226
24.13 Basic Algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 226
24.14 Vector Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 226
24.15 Linear Maps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227
24.16 Minimal Polynomials of Matrices and Linear Transformations . . . . . . . . . . . . 227
24.17 Euclidean Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227
24.18 Hermitian Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 228
24.19 The Singular Value Decomposition . . . . . . . . . . . . . . . . . . . . . . . . . . . 229
24.20 Finite Markov Chains . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 229

25 Solutions 230
25.1 Column Vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 230
25.2 Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231
25.3 Matrix Rank . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231
25.4 Theory of Systems of Linear Equations I: Qualitative Theory . . . . . . . . . . . . 231
25.5 Affine and Convex Combinations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231
25.6 The Determinant . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231
25.7 Theory of Systems of Linear Equations II: Cramer’s Rule . . . . . . . . . . . . . . 231
25.8 Eigenvectors and Eigenvalues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231
25.9 Orthogonal Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231
25.10 The Spectral Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231
25.11 Bilinear and Quadratic Forms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233
25.12 Complex Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233
25.13 Matrix Norms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233
25.14 Basic Algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233
25.15 Vector Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233
25.16 Linear Maps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233
25.17 Minimal Polynomials of Matrices and Linear Transformations . . . . . . . . . . . . 233
25.18 Euclidean Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233
25.19 Hermitian Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233
25.20 Finite Markov Chains . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233
Notation

Symbol Meaning
[n] the set {1, . . . , n} where n is a non-negative integer
N the set {1, 2, 3, . . . } of natural numbers
N0 the set {0, 1, 2, . . . } of non-negative integers
Z the set {. . . , −2, −1, 0, 1, 2, . . . } of integers
Q the field of rational numbers
R the field of real numbers
R× the set of real numbers excluding zero
C the field of complex numbers
C× the set of complex numbers excluding zero
F an arbitrary field
F× the field F excluding zero
Zm the set of residue classes modulo m
Z×m the set of those residue classes modulo m that are relatively prime to m
Fk×n the set of k × n matrices over the field F
Mn (F) the set of n × n matrices over the field F
diag(α1 , . . . , αn ) n × n diagonal matrix with the αi in the diagonal
im(ϕ) image of the linear map ϕ
⊕ direct sum
≤ subspace
∅ the empty set
⊥ “perp” – orthogonality, orthogonal complement

Part I

Matrix Theory

Introduction to Part I

TO BE WRITTEN.

Chapter 1

(F, R) Column Vectors

1.1 (F) Column vector basics


We begin with a discussion of column vectors.

Definition 1.1.1 (Column vector). A column vector of height k is a list of k numbers arranged in a column, written as
\[
\begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_k \end{pmatrix}.
\]

The k numbers in the column are referred to as the entries of the column vector; we will normally
use lower case Greek letters such as α, β, and ζ to denote these numbers. We denote column vectors
by bold letters such as u, v, w, x, y, b, e, f , etc., so we may write

 
\[
\mathbf{v} = \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_k \end{pmatrix}.
\]


1.1.1 The domain of scalars


In general, the entries of column vectors will be taken from a “field,” denoted by F. We shall refer
to the elements of the field F as “scalars,” and we will normally denote scalars by lowercase Greek
letters. We will discuss fields in detail in Section 14.4. Informally, a field is a set endowed with
the operations of addition and multiplication which obey the rules familiar from the arithmetic of
real numbers (commutativity, associativity, inverses, etc.). Examples of fields are Q, R, C, and
Fp , where Q is the set of rational numbers, R is the set of real numbers, C is the set of complex
numbers, and Fp is the “integers modulo p” for a prime p, so Fp is a finite field of “order” p (order
= number of elements).
The reader who is not comfortable with finite fields or with the abstract concept of fields may
ignore the exercises related to them and always take F to be Q, R, or C. In fact, taking F to be R
suffices for all sections except those specifically related to C.
We shall also consider integral vectors, i. e., vectors whose entries are integers. Z denotes the
set of integers and Zk the set of integral vectors of length k (integral vectors with k components),
so Zk ⊆ Qk ⊆ Rk ⊆ Ck . Note that Z is not a field (division does not work within Z). Our notation
F for the domain of scalars always refers to a field and therefore does not include Z.
Notation 1.1.2. Let α1 , . . . , αk ∈ F. The expression $\sum_{i=1}^{k} \alpha_i$ denotes the sum α1 + · · · + αk . More generally, for an index set I = {i1 , i2 , . . . , iℓ }, the expression $\sum_{i \in I} \alpha_i$ denotes the sum αi1 + · · · + αiℓ .

Convention 1.1.3 (Empty sum). The empty sum, denoted $\sum_{i=1}^{0} \alpha_i$ or $\sum_{i \in \emptyset} \alpha_i$, evaluates to zero.

∗ ∗ ∗

Definition 1.1.4 (The space Fk ). For a domain F of scalars, we define the space Fk of column vectors of height k over F by
\[
F^k := \left\{ \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_k \end{pmatrix} \;\middle|\; \alpha_i \in F \right\} \tag{1.1}
\]

Definition 1.1.5 (Zero vector). The zero vector in Fk is the vector
\[
\mathbf{0}_k := \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix} \tag{1.2}
\]

We often write 0 instead of 0k when the height of the vector is clear from context.
Definition 1.1.6 (All-ones vector). The all-ones vector in Fk is the vector
\[
\mathbf{1}_k := \begin{pmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix} \tag{1.3}
\]

We sometimes write 1 instead of 1k .


Definition 1.1.7 (Addition of column vectors). Addition of column vectors of the same height is
defined elementwise, i. e.,
\[
\begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_k \end{pmatrix} + \begin{pmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_k \end{pmatrix} = \begin{pmatrix} \alpha_1 + \beta_1 \\ \alpha_2 + \beta_2 \\ \vdots \\ \alpha_k + \beta_k \end{pmatrix}. \tag{1.4}
\]

Example 1.1.8. Let $\mathbf{v} = \begin{pmatrix} 2 \\ 6 \\ -1 \end{pmatrix}$ and let $\mathbf{w} = \begin{pmatrix} 0 \\ -3 \\ 2 \end{pmatrix}$. Then $\mathbf{v} + \mathbf{w} = \begin{pmatrix} 2 \\ 3 \\ 1 \end{pmatrix}$.

Exercise 1.1.9. Verify that vector addition is commutative, i. e., for v, w ∈ Fk , we have

v+w =w+v . (1.5)

Exercise 1.1.10. Verify that vector addition is associative, i. e., for u, v, w ∈ Fk , we have

u + (v + w) = (u + v) + w . (1.6)

Column vectors also carry with them the notion of “scaling” by an element of F.
Definition 1.1.11 (Multiplication of a column vector by a scalar). Let v ∈ Fk and let λ ∈ F. Then
the vector λv is the vector v after each entry has been scaled (multiplied) by a factor of λ, i. e.,
   
\[
\lambda \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_k \end{pmatrix} = \begin{pmatrix} \lambda\alpha_1 \\ \lambda\alpha_2 \\ \vdots \\ \lambda\alpha_k \end{pmatrix}. \tag{1.7}
\]

Example 1.1.12. Let $\mathbf{v} = \begin{pmatrix} 2 \\ -1 \\ 1 \end{pmatrix}$. Then $3\mathbf{v} = \begin{pmatrix} 6 \\ -3 \\ 3 \end{pmatrix}$.
Definition 1.1.13 (Linear combination). Let v1 , . . . , vm ∈ Fn . Then a linear combination of the vectors v1 , . . . , vm is a sum of the form $\sum_{i=1}^{m} \alpha_i \mathbf{v}_i$ where α1 , . . . , αm ∈ F.

Example 1.1.14. The following is a linear combination.
\[
\begin{pmatrix} 2 \\ -3 \\ 6 \\ 1 \end{pmatrix} + 4\begin{pmatrix} -1 \\ 2 \\ 0 \\ -3 \end{pmatrix} - \frac{1}{2}\begin{pmatrix} 6 \\ 2 \\ 10 \\ -8 \end{pmatrix} = \begin{pmatrix} -5 \\ 4 \\ 1 \\ -7 \end{pmatrix}.
\]
Numerical exercise 1.1.15. The following linear combinations are of the form αa + βb + γc.
Evaluate in two ways, as (αa + βb) + γc and as αa + (βb + γc). Self-check : you must get the same
answer.
     
(a) $\begin{pmatrix} 2 \\ 4 \end{pmatrix} + 3\begin{pmatrix} 3 \\ -1 \end{pmatrix} - 2\begin{pmatrix} -7 \\ 2 \end{pmatrix}$

(b) $-2\begin{pmatrix} 1 \\ 6 \\ 3 \end{pmatrix} + \frac{1}{2}\begin{pmatrix} -4 \\ -2 \\ 6 \end{pmatrix} + \begin{pmatrix} 1 \\ 1 \\ 4 \end{pmatrix}$

(c) $-\begin{pmatrix} 3 \\ 7 \\ -1 \end{pmatrix} - 4\begin{pmatrix} -2 \\ 3 \\ 0 \end{pmatrix} + \begin{pmatrix} 7 \\ -4 \\ 3 \end{pmatrix}$

  
Exercise 1.1.16. Express the vector $\begin{pmatrix} -5 \\ -1 \\ 11 \end{pmatrix}$ as a linear combination of the vectors $\begin{pmatrix} -2 \\ 1 \\ 7 \end{pmatrix}$ and $\begin{pmatrix} 2 \\ 6 \\ 6 \end{pmatrix}$.
Exercise 1.1.17.

(a) Express $\begin{pmatrix} 3 \\ 1 \\ -2 \end{pmatrix}$ as a linear combination of the vectors
\[
\begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}, \quad \begin{pmatrix} -3 \\ 3 \\ 0 \end{pmatrix}, \quad \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}.
\]
First describe the nature of the problem you need to solve.

(b) Give an "aha" proof that $\begin{pmatrix} 3 \\ 1 \\ -2 \end{pmatrix}$ cannot be expressed as a linear combination of
\[
\begin{pmatrix} -3 \\ 1 \\ 2 \end{pmatrix}, \quad \begin{pmatrix} 2 \\ -2 \\ 0 \end{pmatrix}, \quad \begin{pmatrix} 3 \\ -8 \\ 5 \end{pmatrix}.
\]
An "aha" proof may not be easy to find but it has to be immediately convincing. This problem will be generalized in Ex. 1.2.7.
Exercise 1.1.18. To what does a linear combination of the empty list of vectors evaluate? (See Convention 1.1.3.)

Let us now consider the system of linear equations
\[
\begin{aligned}
\alpha_{11} x_1 + \alpha_{12} x_2 + \cdots + \alpha_{1n} x_n &= \beta_1 \\
\alpha_{21} x_1 + \alpha_{22} x_2 + \cdots + \alpha_{2n} x_n &= \beta_2 \\
&\;\;\vdots \\
\alpha_{k1} x_1 + \alpha_{k2} x_2 + \cdots + \alpha_{kn} x_n &= \beta_k
\end{aligned} \tag{1.8}
\]

Given the αij and the βi , we need to find x1 , . . . , xn that satisfy these equations. This is arguably
one of the most fundamental problems of applied mathematics. We can rephrase this problem in
terms of the vectors a1 , . . . , an , b, where
 
\[
\mathbf{a}_j := \begin{pmatrix} \alpha_{1j} \\ \alpha_{2j} \\ \vdots \\ \alpha_{kj} \end{pmatrix}
\]
is the column of coefficients of xj and the vector
\[
\mathbf{b} := \begin{pmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_k \end{pmatrix}
\]

represents the right-hand side. With this notation, our system of linear equations takes the more
concise form
x1 a1 + x2 a2 + · · · + xn an = b . (1.9)
The problem of solving the system of equations (1.8) therefore is equivalent to expressing the vector
b as a linear combination of the ai .
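For readers who like to experiment on a computer, here is a small numerical sketch of this equivalence (in Python, assuming the NumPy library; the columns and right-hand side below are made up for illustration): solving the system means finding the coefficients that combine the columns into b.

import numpy as np

# Hypothetical columns a_1, a_2 and right-hand side b (not taken from the text).
a1 = np.array([1.0, -1.0, 2.0])
a2 = np.array([3.0, 1.0, 0.0])
A = np.column_stack([a1, a2])
b = np.array([5.0, 3.0, -2.0])

# Solve A x = b (least squares handles the overdetermined shape).
x, *_ = np.linalg.lstsq(A, b, rcond=None)

# The solution expresses b as a linear combination of the columns a_1, a_2.
print(np.allclose(x[0] * a1 + x[1] * a2, b))   # True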

1.2 (F) Subspaces and span


Definition 1.2.1 (Subspace). A set W ⊆ Fn is a subspace of Fn (denoted W ≤ Fn ) if it is closed
under linear combinations.
Exercise 1.2.2. Let W ≤ Fk . Show that 0 ∈ W . (Why is the empty set not a subspace?)
Proposition 1.2.3. W ≤ Fn if and only if
(a) 0 ∈ W

(b) If u, v ∈ W , then u + v ∈ W

(c) If v ∈ W and α ∈ F, then αv ∈ W .



Exercise 1.2.4. Show that, if W is a nonempty subset of Fn , then (c) implies (a).
Exercise 1.2.5. Show
(a) {0} ≤ Fk ;

(b) Fk ≤ Fk .
We refer to these as the trivial subspaces of Fk .
Exercise 1.2.6.

(a) The set $\left\{ \begin{pmatrix} \alpha_1 \\ 0 \\ \alpha_3 \end{pmatrix} \;\middle|\; \alpha_1 = 2\alpha_3 \right\}$ is a subspace of R3 .

(b) The set $\left\{ \begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix} \;\middle|\; \alpha_2 = \alpha_1 + 7 \right\}$ is not a subspace of R2 .
Exercise 1.2.7. Let
\[
W_k = \left\{ \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_k \end{pmatrix} \in F^k \;\middle|\; \sum_{i=1}^{k} \alpha_i = 0 \right\}
\]

Show that Wk ≤ Fk . This is the 0-weight subspace of Fk .


Exercise 1.2.8. Prove that, for n ≥ 2, the space Rn has infinitely many subspaces.
Proposition 1.2.9. Let W1 , W2 ≤ Fn . Then
(a) W1 ∩ W2 ≤ Fn

(b) The intersection of any (finite or infinite) collection of subspaces of Fn is also a subspace of
Fn .

(c) W1 ∪ W2 ≤ Fn if and only if W1 ⊆ W2 or W2 ⊆ W1

Definition 1.2.10 (Span). Let v1 , . . . , vm ∈ Fk . Then the span of S = {v1 , . . . , vm }, denoted


span(v1 , . . . , vm ), is the smallest subspace of Fk containing S, i. e.,

(a) span S ⊇ S;

(b) span S is a subspace of Fk ;

(c) for every subspace W ≤ Fn , if S ⊆ W then span S ≤ W .


Fact 1.2.11. span(∅) = {0}.
Theorem 1.2.12. Let S ⊆ Fn . Then
(a) span S exists and is unique;
(b) $\mathrm{span}(S) = \bigcap_{S \subseteq W \leq F^n} W$. This is the intersection of all subspaces containing S.


This theorem tells us that the span exists. The next theorem constructs all the elements of the
span.
Theorem 1.2.13. For S ⊆ Fn , span S is the set of all linear combinations of the finite subsets of
S. (Note that this is true even when S is empty. Why?) ♦
Proposition 1.2.14. Let S ⊆ Fn . Then S ≤ Fn if and only if S = span(S).
Proposition 1.2.15. Let S ⊆ Fn . Prove that span(span(S)) = span(S). Prove this
(a) based on the definition;

(b) (linear combinations of linear combinations) based on Theorem 1.2.13.


Proposition 1.2.16 (Transitivity of span). Suppose R ⊆ span(T ) and T ⊆ span(S). Then R ⊆
span(S).
Definition 1.2.17 (Sum of sets). Let A, B ⊆ Fn . Then A + B is the set

A + B = {a + b | a ∈ A, b ∈ B} . (1.10)

Proposition 1.2.18. Let U1 , U2 ≤ Fn . Then U1 + U2 = span(U1 ∪ U2 ).



1.3 (F) Linear independence and the First Miracle of Linear Algebra
In this section, v1 , v2 , . . . will denote column vectors of height k, i. e., vi ∈ Fk .
Definition 1.3.1 (List). A list of objects ai is a function whose domain is an “index set” I; we write
the list as (ai | i ∈ I). Most often our index set will be I = {1, . . . , n}. In this case, we write the
list as (a1 , . . . , an ). The size of the list is |I|.
Notation 1.3.2 (Concatenation of lists). Let L = (v1 , v2 , . . . , v` ) and M = (w1 , w2 , . . . , wn ) be lists.
We denote by (L, M ) the list obtained by concatenating L and M , i. e., the list (v1 , v2 , . . . , vℓ , w1 , w2 , . . . , wn ). If the list M has only one element, we omit the parentheses around it, that is, we write (L, w) rather than (L, (w)).

Definition 1.3.3 (Trivial linear combination). The trivial linear combination of the vectors v1 , . . . , vm is the linear combination (see Def. 1.1.13) 0v1 + · · · + 0vm (all coefficients are 0).
Fact 1.3.4. The trivial linear combination evaluates to zero.
Definition 1.3.5 (Linear independence). The list (v1 , . . . , vn ) is said to be linearly independent if the
only linear combination that evaluates to zero is the trivial linear combination. The list (v1 , . . . , vn )
is linearly dependent if it is not linearly independent, i. e., if there exist scalars α1 , . . . , αn , not all zero, such that $\sum_{i=1}^{n} \alpha_i \mathbf{v}_i = \mathbf{0}$.
Definition 1.3.6. If a list (v1 , . . . , vn ) of vectors is linearly independent (dependent), we say that
the vectors v1 , . . . , vn are linearly independent (dependent).
Definition 1.3.7. We say that a set of vectors is linearly independent if a list formed by its elements
(in any order and without repetitions) is linearly independent.
Example 1.3.8. Let
\[
\mathbf{v}_1 = \begin{pmatrix} 1 \\ -1 \\ 2 \end{pmatrix}, \quad \mathbf{v}_2 = \begin{pmatrix} 3 \\ 1 \\ 0 \end{pmatrix}, \quad \mathbf{v}_3 = \begin{pmatrix} -4 \\ 1 \\ 2 \end{pmatrix}.
\]
Then v1 , v2 , v3 are linearly independent vectors. It follows that the set {v1 , v2 , v3 , v2 } is linearly
independent while the list (v1 , v2 , v3 , v2 ) is linearly dependent.
Note that the list (v1 , v2 , v3 , v2 ) has four elements, but the set {v1 , v2 , v3 , v2 } has three elements.
    
Exercise 1.3.9. Show that the vectors $\begin{pmatrix} 1 \\ -3 \\ 2 \end{pmatrix}, \begin{pmatrix} 4 \\ 2 \\ 1 \end{pmatrix}, \begin{pmatrix} -2 \\ -8 \\ 3 \end{pmatrix}$ are linearly dependent.
Definition 1.3.10. We say that the vector w depends on the list (v1 , . . . , vk ) of vectors if w ∈
span(v1 , . . . , vk ), i. e., if w can be expressed as a linear combination of the vi .
Proposition 1.3.11. The vectors v1 , . . . , vk are linearly dependent if and only if there is some i
such that vi depends on the other vectors in the list.
Exercise 1.3.12. Show that the list (v, w, v + w) is linearly dependent.
Exercise 1.3.13. Is the empty list linearly independent?
Exercise 1.3.14. Show that if a list is linearly independent, then any permutation of it is linearly
independent.
Proposition 1.3.15. The list (v, v), consisting of the vector v ∈ Fn listed twice, is linearly depen-
dent.
Exercise 1.3.16. Is there a vector v such that the list (v), consisting of a single item, is linearly
dependent?
Exercise 1.3.17. Which vectors depend on the empty list?
Definition 1.3.18 (Sublist). A sublist of a list L is a list M consisting of some of the elements of L,
in the same order in which they appear in L.
Examples 1.3.19. Let L = (a, b, c, b, d, e).
(a) The empty list is a sublist of L.

(b) L is a sublist of itself.

(c) The list L1 = (a, b, d) is a sublist of L.

(d) The list L2 = (b, b) is a sublist of L.

(e) The list L3 = (b, b, b) is not a sublist of L.

(f) The list L4 = (a, d, c, e) is not a sublist of L.



Fact 1.3.20. Every sublist of a linearly independent list of vectors is linearly independent.

The following lemma is central to the proof of the First Miracle of Linear Algebra (see Theorem 1.3.40) as well as to our characterization of bases as maximal linearly independent sets (see Prop. 1.3.37).

Lemma 1.3.21. Suppose (v1 , . . . , vk ) is a linearly independent list of vectors and the list (v1 , . . . , vk+1 )
is linearly dependent. Then vk+1 ∈ span(v1 , . . . , vk ). ♦

Proposition 1.3.22. The vectors v1 , . . . , vk are linearly dependent if and only if there is some j
such that
vj ∈ span(v1 , . . . , vj−1 , vj+1 , . . . , vk ) .

Exercise 1.3.23. Prove that no list of vectors containing 0 is linearly independent.

Exercise 1.3.24. Prove that a list of vectors with repeated elements (the same vector occurs more
than once) is linearly dependent. (This follows from combining which two previous exercises?)

Definition 1.3.25 (Parallel vectors). Let u, v ∈ Fn . We say that u and v are parallel if there exists a scalar α such that u = αv or v = αu. Note that 0 is parallel to all vectors, and the relation of being parallel is an equivalence relation (see Def. 14.2.12) on the set of nonzero vectors.

Exercise 1.3.26. Let u, v ∈ Fn . Show that the list (u, v) is linearly dependent if and only if u
and v are parallel.

Exercise 1.3.27. Find n + 1 vectors in Fn such that every n of them are linearly independent. Over infinite fields, a much stronger statement holds (see Ex. 15.3.15).

Definition 1.3.28 (Rank). The rank of a set S ⊆ Fn , denoted rk S, is the size of the largest linearly
independent subset of S. The rank of a list is the rank of the set formed by its elements.

Proposition 1.3.29. Let S and T be lists. Show that

rk(S, T ) ≤ rk S + rk T (1.11)

where rk(S, T ) is the rank of the list obtained by concatenating the lists S and T .

Definition 1.3.30 (Dimension). Let W ≤ Fn be a subspace. The dimension of W , denoted dim W ,


is its rank, that is, dim W = rk W .

Definition 1.3.31 (List of generators). Let W ≤ Fn . The list L = (v1 , . . . , vk ) ⊆ W is said to be a


list of generators of W if span(v1 , . . . , vk ) = W . In this case, we say that v1 , . . . , vk generate W .
Definition 1.3.32 (Basis). A list b = (b1 , . . . , bn ) is a basis of the subspace W ≤ Fk if b is a linearly independent list of generators of W .

Fact 1.3.33. If W ≤ Fn and b is a list of vectors of W then b is a basis of W if and only if it is


linearly independent and span(b) = W .

Definition 1.3.34 (Standard basis of Fk ). The standard basis of Fk is the basis (e1 , . . . , ek ), where
ei is the column vector which has its i-th component equal to 1 and all other components equal to
0. The vectors e1 , . . . , ek are sometimes called the standard unit vectors.
For example, the standard basis of F3 is
\[
\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \quad \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}, \quad \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}.
\]

Exercise 1.3.35. The standard basis of Fk is a basis.

Note that subspaces of Fk do not have a “standard basis.”


Definition 1.3.36 (Maximal linearly independent list). Let W ≤ Fn . A list L = (v1 , . . . , vk ) of
vectors in W is a maximal linearly independent list in W if L is linearly independent but for all
w ∈ W , the list (L, w) is linearly dependent.

Proposition 1.3.37. Let W ≤ Fk and let b = (b1 , . . . , bk ) be a list of vectors in W . Then b is a


basis of W if and only if it is a maximal linearly independent list.

Exercise 1.3.38. Prove: every subspace W ≤ Fk has a basis. You may assume the fact, to be proved shortly (see Cor. 1.3.49), that every linearly independent list of vectors in Fk has size at most k.

Proposition 1.3.39. Let L be a finite list of generators of W ≤ Fk . Then L contains a basis of


W.

Next we state a central result to which the entire field of linear algebra arguably owes its
character.

Theorem 1.3.40 (First Miracle of Linear Algebra). Let v1 , . . . , vk be linearly independent with
vi ∈ span(w1 , . . . , wm ) for all i. Then k ≤ m.
The proof of this theorem requires the following lemma.
Lemma 1.3.41 (Steinitz exchange lemma). Let (v1 , . . . , vk ) be a linearly independent list of vectors
such that vi ∈ span(w1 , . . . , wm ) for all i. Then there exists j (1 ≤ j ≤ m) such that the list
(wj , v2 , . . . , vk ) is linearly independent. ♦
Exercise 1.3.42. Use the Steinitz exchange lemma to prove the First Miracle of Linear Algebra.
Corollary 1.3.43. Let W ≤ Fk . Every basis of W has the same size (same number of vectors).
Exercise 1.3.44. Prove: Cor. 1.3.43 is equivalent to the First Miracle, i. e., infer the First Miracle
from Cor. 1.3.43.
Corollary 1.3.45. Every basis of Fk has size k.
The following result is essentially a restatement of the First Miracle of Linear Algebra.
Corollary 1.3.46. rk(v1 , . . . , vk ) = dim (span(v1 , . . . , vk )).
Exercise 1.3.47. Prove Cor. 1.3.46 is equivalent to the First Miracle, i. e., infer the First Miracle
from Cor. 1.3.46.
Corollary 1.3.48. dim Fk = k.
Corollary 1.3.49. Every linearly independent list of vectors in Fk has size at most k.
Corollary 1.3.50. Let W ≤ Fk and let L be a linearly independent list of vectors in W . Then L
can be extended to a basis of W .
Exercise 1.3.51. Let U1 , U2 ≤ Fn with U1 ∩ U2 = {0}. Let v1 , . . . , vk ∈ U1 and w1 , . . . , w` ∈ U2 .
If the lists (v1 , . . . , vk ) and (w1 , . . . , w` ) are linearly independent, then so is the concatenated list
(v1 , . . . , vk , w1 , . . . , w` ).
Proposition 1.3.52. Let U1 , U2 ≤ Fn with U1 ∩ U2 = {0}. Then
dim U1 + dim U2 ≤ n . (1.12)
Proposition 1.3.53 (Modular equation). Let U1 , U2 ≤ Fk . Then
dim(U1 + U2 ) + dim(U1 ∩ U2 ) = dim U1 + dim U2 . (1.13)

1.4 (F) Dot product


We now define the dot product, an operation which takes two vectors as input and outputs a scalar.
Definition 1.4.1 (Dot product). Let x, y ∈ Fn with x = (α1 , α2 , . . . , αn )T and y = (β1 , β2 , . . . , βn )T .
(Note that these are column vectors). Then the dot product of x and y, denoted x · y, is
\[
\mathbf{x} \cdot \mathbf{y} := \sum_{i=1}^{n} \alpha_i \beta_i \in F . \tag{1.14}
\]

Proposition 1.4.2. The dot product is symmetric, that is,

x·y =y·x .

Exercise 1.4.3. Show that the dot product is distributive, that is,

x · (y + z) = x · y + x · z .

Exercise 1.4.4. Show that the dot product is bilinear (see Def. 6.4.9), i. e., for x, y, z ∈ Fn and
α ∈ F, we have

(x + z) · y = x · y + z · y (1.15)
(αx) · y = α(x · y) (1.16)
x · (y + z) = x · y + x · z (1.17)
x · (αy) = α(x · y) (1.18)

Exercise 1.4.5. Show that the dot product preserves linear combinations, i. e., for x1 , . . . , xk , y ∈
Fn and α1 , . . . , αk ∈ F, we have
\[
\left( \sum_{i=1}^{k} \alpha_i \mathbf{x}_i \right) \cdot \mathbf{y} = \sum_{i=1}^{k} \alpha_i (\mathbf{x}_i \cdot \mathbf{y}) \tag{1.19}
\]
and for x, y1 , . . . , yℓ ∈ Fn and β1 , . . . , βℓ ∈ F,
\[
\mathbf{x} \cdot \left( \sum_{i=1}^{\ell} \beta_i \mathbf{y}_i \right) = \sum_{i=1}^{\ell} \beta_i (\mathbf{x} \cdot \mathbf{y}_i) . \tag{1.20}
\]

Numerical exercise 1.4.6. Let
\[
\mathbf{x} = \begin{pmatrix} 3 \\ 4 \\ -2 \end{pmatrix}, \quad \mathbf{y} = \begin{pmatrix} 1 \\ 0 \\ 2 \end{pmatrix}, \quad \mathbf{z} = \begin{pmatrix} 3 \\ 2 \\ -4 \end{pmatrix}.
\]
Compute x · y, x · z, and x · (y + z). Self-check : verify that x · (y + z) = (x · y) + (x · z).
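(As an optional aid, not part of the exercise: the self-check can also be carried out in Python, assuming NumPy, as in the sketch below.)

import numpy as np

x = np.array([3, 4, -2])
y = np.array([1, 0, 2])
z = np.array([3, 2, -4])

# Dot products as in Definition 1.4.1: sums of coordinatewise products.
print(x @ y, x @ z, x @ (y + z))
print(x @ (y + z) == x @ y + x @ z)   # distributivity self-check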

Exercise 1.4.7. Compute the following dot products.

(a) 1k · 1k for k ≥ 1.
 
(b) x · x where x = (α1 , α2 , . . . , αk )T ∈ Fk

Definition 1.4.8 (Orthogonality). Let x, y ∈ Fk . Then x and y are orthogonal (notation: x ⊥ y) if


x · y = 0.

Exercise 1.4.9. What vectors are orthogonal to every vector?

Exercise 1.4.10. Which vectors are orthogonal to 1k ?

Definition 1.4.11 (Isotropic vector). The vector v ∈ Fn is isotropic if v ≠ 0 and v · v = 0.

Exercise 1.4.12.

(a) Show that there are no isotropic vectors in Rn .

(b) Find an isotropic vector in C2 .

(c) Find an isotropic vector in F22 .

Theorem 1.4.13. If v1 , . . . , vk are pairwise orthogonal and non-isotropic non-zero vectors, then
they are linearly independent. ♦

1.5 (R) Dot product over R


We now specialize our discussion of the dot product to the space Rn .
Exercise 1.5.1. Let x ∈ Rn . Show x · x ≥ 0, with equality holding if and only if x = 0.
This “positive definiteness” of the dot product allows us to define the “norm” of a vector, a
generalization of length.
Definition 1.5.2 (Norm). The norm of a vector x ∈ Rn , denoted ‖x‖, is defined as
\[
\|\mathbf{x}\| := \sqrt{\mathbf{x} \cdot \mathbf{x}} . \tag{1.21}
\]

This norm is also referred to as the Euclidean norm or the ℓ2 norm.
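(A quick computational sketch, assuming Python with NumPy; the vector is a made-up example.)

import numpy as np

x = np.array([3.0, 4.0])
norm = np.sqrt(x @ x)                          # ||x|| = sqrt(x . x), as in Definition 1.5.2
print(norm)                                    # 5.0
print(np.isclose(norm, np.linalg.norm(x)))     # agrees with NumPy's built-in Euclidean norm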


Definition 1.5.3 (Orthogonal system). An orthogonal system in Rn is a list of (pairwise) orthogonal
nonzero vectors in Rn .
Exercise 1.5.4. Let S ⊆ Rk be an orthogonal system in Rk . Prove that S is linearly independent.
Definition 1.5.5 (Orthonormal system). An orthonormal system in Rn is a list of (pairwise) orthogonal vectors in Rn , all of which have unit norm.
Definition 1.5.6 (Orthonormal basis). An orthonormal basis of W ≤ Rn is an orthonormal system
that is a basis of W .
Exercise 1.5.7.
(a) Find an orthonormal basis of Rn .
(b) Find all orthonormal bases of R2 .
Orthogonality is studied in more detail in Chapter 19.

1.6 (R) Additional exercises


Exercise 1.6.1. Let v1 , . . . , vn be vectors such that ‖vi‖ > 1 for all i, and vi · vj = 1 whenever i ≠ j. Show that v1 , . . . , vn are linearly independent.
Definition 1.6.2 (Incidence vector). Let A ⊆ {1, . . . , n}. Then the incidence vector vA ∈ Rn is the
vector whose i-th coordinate is 1 if i ∈ A and 0 otherwise.

Exercise 1.6.3. Let A, B ⊆ {1, . . . , n}. Express the dot product vA · vB in terms of the sets A
and B.
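(A small experiment, not a solution: the sketch below, assuming Python with NumPy and made-up sets A and B, computes vA · vB so the reader can compare the value with the sets.)

import numpy as np

n = 8
A = {1, 3, 5, 7}
B = {3, 4, 5}

# Incidence vectors as in Definition 1.6.2: coordinate i is 1 iff i belongs to the set.
vA = np.array([1 if i in A else 0 for i in range(1, n + 1)])
vB = np.array([1 if i in B else 0 for i in range(1, n + 1)])

print(vA @ vB)   # compare this number with the sets A and B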

♥ Exercise 1.6.4 (Generalized Fisher Inequality). Let λ ≥ 1 and let A1 , . . . , Am ⊆ {1, . . . , n} be distinct sets such that for all i ≠ j we have |Ai ∩ Aj | = λ. Then m ≤ n.
distinct sets such that for all i 6= j we have |Ai ∩ Aj | = λ. Then m ≤ n.
Chapter 2

Matrices

2.1 Matrix basics


Definition 2.1.1 (Matrix). A k × n matrix is a table of numbers arranged in k rows and n columns,
written as  
\[
\begin{pmatrix} \alpha_{11} & \alpha_{12} & \cdots & \alpha_{1n} \\ \vdots & \vdots & & \vdots \\ \alpha_{k1} & \alpha_{k2} & \cdots & \alpha_{kn} \end{pmatrix}
\]

We may write M = (αi,j )k×n to indicate a matrix whose entry in position (i, j) (i-th row, j-th
column) is αi,j . For typographical convenience we usually omit the comma separating the row
index and the column index and simply write αij instead of αi,j ; we use the comma if its omission
would lead to ambiguity. So we write M = (αij )k×n , or simply M = (αij ) if the values k and
n are clear from context. We also write (M )ij to indicate the (i, j) entry of the matrix M , i. e.,
αij = (M )ij .

Example 2.1.2. This is a 3 × 5 matrix.


 
\[
\begin{pmatrix} 1 & 3 & -2 & 0 & -1 \\ 6 & 5 & 4 & -3 & -6 \\ 2 & 7 & 1 & 1 & 5 \end{pmatrix}
\]

In this example, we have α22 = 5 and α15 = −1.


Definition 2.1.3 (The space Fk×n ). The set of k × n matrices with entries from the domain F of
scalars is denoted by Fk×n . Recall that F always denotes a field. Square matrices (k = n) have
special significance, so we write Mn (F) := Fn×n . We identify M1 (F) with F and omit the matrix notation, i. e., we write α rather than (α). An integral matrix is a matrix with integer entries.
Naturally, Zk×n will denote the set of k × n integral matrices, and Mn (Z) = Zn×n . Recall that Z is
not a field.

Example 2.1.4. For example, $\begin{pmatrix} 0 & -1 & 4 & 7 \\ -3 & 5 & 6 & 8 \end{pmatrix} \in R^{2\times 4}$ is a 2 × 4 matrix and $\begin{pmatrix} 2 & 6 & 9 \\ 3 & -4 & -2 \\ -5 & 1 & 4 \end{pmatrix} \in M_3(R)$ is a 3 × 3 matrix.
Observe that the column vectors of height k introduced in Chapter 1 are k × 1 matrices, so
Fk = Fk×1 . Moreover, every statement about column vectors in Chapter 1 applies analogously to
1 × n matrices (“row vectors”).
Notation 2.1.5. When writing row vectors,
 we use commas to avoid ambiguity, so we write, for
example, (3, 5, −1) instead of 3 5 −1 .
Notation 2.1.6 (Zero matrix). The k × n matrix with all of its entries equal to 0 is called the zero
matrix and is denoted by 0k×n , or simply by 0 if k and n are clear from context.
Notation 2.1.7 (All-ones matrix). The k × n matrix with all of its entries equal to 1 is denoted by
Jk×n or J. We write Jn for Jn×n .
Definition 2.1.8 (Diagonal matrix). A matrix A = (αij ) ∈ Mn (F) is diagonal if αij = 0 whenever i ≠ j. The n × n diagonal matrix with entries λ1 , . . . , λn is denoted by diag(λ1 , . . . , λn ).
Example 2.1.9.
\[
\mathrm{diag}(5, 3, 0, -1, 5) = \begin{pmatrix} 5 & 0 & 0 & 0 & 0 \\ 0 & 3 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & 0 & 5 \end{pmatrix}
\]
Notation 2.1.10. To avoid filling most of a matrix with the number “0”, we often write matrices
like the one above as
\[
\mathrm{diag}(5, 3, 0, -1, 5) = \begin{pmatrix} 5 & & & & \text{\Large 0} \\ & 3 & & & \\ & & 0 & & \\ & & & -1 & \\ \text{\Large 0} & & & & 5 \end{pmatrix}
\]

where the big 0 symbol means that every entry in the triangles above or below the diagonal is 0.
Definition 2.1.11 (Upper and lower triangular matrices). A matrix A = (αij ) ∈ Mn (F) is upper
triangular if αij = 0 whenever i > j. A is said to be strictly upper triangular if αij = 0 whenever
i ≥ j. Lower triangular and strictly lower triangular matrices are defined analogously.
Examples 2.1.12.

(a) $\begin{pmatrix} 5 & 2 & 0 & 7 & 2 \\ 0 & 3 & 0 & -4 & 0 \\ 0 & 0 & 0 & 6 & 0 \\ 0 & 0 & 0 & -1 & -3 \\ 0 & 0 & 0 & 0 & 5 \end{pmatrix}$ is upper triangular.

(b) $\begin{pmatrix} 0 & 2 & 0 & 7 & 2 \\ 0 & 0 & 0 & -4 & 0 \\ 0 & 0 & 0 & 6 & 0 \\ 0 & 0 & 0 & 0 & -3 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}$ is strictly upper triangular.

(c) $\begin{pmatrix} 5 & 0 & 0 & 0 & 0 \\ 2 & 3 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 7 & -4 & 6 & -1 & 0 \\ 2 & 0 & 0 & -3 & 5 \end{pmatrix}$ is lower triangular.

(d) $\begin{pmatrix} 0 & 0 & 0 & 0 & 0 \\ 2 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 7 & -4 & 6 & 0 & 0 \\ 2 & 0 & 0 & -3 & 0 \end{pmatrix}$ is strictly lower triangular.
Fact 2.1.13. The diagonal matrices are the matrices which are simultaneously upper and lower
triangular.
Definition 2.1.14 (Matrix transpose). The transpose of a k × ` matrix M = (αij ) is the ` × k matrix
(βij ) defined by
βij = αji (2.1)

and is denoted M T . (We flip M across its main diagonal, so the rows of M become the columns of M T and vice versa.)
Examples 2.1.15.
 
(a) $\begin{pmatrix} 3 & 1 & 4 \\ 1 & 5 & 9 \end{pmatrix}^T = \begin{pmatrix} 3 & 1 \\ 1 & 5 \\ 4 & 9 \end{pmatrix}$

(b) $\mathbf{1}_k = (\underbrace{1, 1, \ldots, 1}_{k \text{ times}})^T$

(c) In Examples 2.1.12, the matrix (c) is the transpose of (a), and (d) is the transpose of (b).
Fact 2.1.16. Let A be a matrix. Then $(A^T)^T = A$.
Definition 2.1.17 (Symmetric matrix). A matrix M is symmetric if M = M T .
Note that if a matrix M ∈ Fk×ℓ is symmetric then k = ℓ (M is square).
 
Example 2.1.18. The matrix $\begin{pmatrix} 1 & 3 & 0 \\ 3 & 5 & -2 \\ 0 & -2 & 4 \end{pmatrix}$ is symmetric.
Definition 2.1.19 (Matrix addition). Let A = (αij ) and B = (βij ) be k × n matrices. Then the sum
A + B is the k × n matrix with entries

(A + B)ij = αij + βij (2.2)

That is, addition is defined elementwise.


Example 2.1.20.
\[
\begin{pmatrix} 2 & 1 \\ 4 & -2 \\ 0 & 5 \end{pmatrix} + \begin{pmatrix} 1 & -6 \\ 2 & 3 \\ 1 & 4 \end{pmatrix} = \begin{pmatrix} 3 & -5 \\ 6 & 1 \\ 1 & 9 \end{pmatrix}
\]

Fact 2.1.21 (Adding zero). For any matrix A ∈ Fk×n , A + 0 = 0 + A = A.


Proposition 2.1.22 (Commutativity). Matrix addition obeys the commutative law: if A, B ∈
Fk×n , then A + B = B + A.

Proposition 2.1.23 (Associativity). Matrix addition obeys the associative law: if A, B, C ∈ Fk×n ,
then (A + B) + C = A + (B + C).

Definition 2.1.24 (The negative of a matrix). Let A ∈ Fk×n be a matrix. Then −A is the k × n
matrix defined by (−A)ij = −(A)ij .

Proposition 2.1.25. Let A ∈ Fk×n . Then A + (−A) = 0.

Definition 2.1.26 (Multiplication of a matrix by a scalar). Let A = (αij ) ∈ Fk×n , and let ζ ∈ F.
Then ζA is the k × n matrix whose (i, j) entry is ζ · αij .

Example 2.1.27.
\[
3 \begin{pmatrix} 1 & 2 \\ -3 & 4 \\ 0 & 6 \end{pmatrix} = \begin{pmatrix} 3 & 6 \\ -9 & 12 \\ 0 & 18 \end{pmatrix}
\]

Fact 2.1.28. −A = (−1)A.

2.2 Matrix multiplication


Definition 2.2.1 (Matrix multiplication). Let A = (αij ) be an r × s matrix and B = (βjk ) be an
s × t matrix. Then the matrix product C = AB is the r × t matrix C = (γik ) defined by
\[
\gamma_{ik} = \sum_{j=1}^{s} \alpha_{ij} \beta_{jk} \tag{2.3}
\]
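(As a computational illustration of this definition, here is a sketch in Python, assuming NumPy; the matrices are made up, and the triple loop follows formula (2.3) directly.)

import numpy as np

def matmul_from_definition(A, B):
    # gamma_ik = sum over j of alpha_ij * beta_jk, as in (2.3)
    r, s = A.shape
    s2, t = B.shape
    assert s == s2, "inner dimensions must agree"
    C = np.zeros((r, t))
    for i in range(r):
        for k in range(t):
            C[i, k] = sum(A[i, j] * B[j, k] for j in range(s))
    return C

A = np.array([[1.0, 2.0], [3.0, 4.0], [0.0, -1.0]])   # 3 x 2
B = np.array([[2.0, 0.0, 1.0], [1.0, -1.0, 3.0]])     # 2 x 3
print(np.allclose(matmul_from_definition(A, B), A @ B))   # True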

Exercise 2.2.2. Let A ∈ Fk×n and let B ∈ Fn×m . Show that

(AB)T = B T AT . (2.4)

Proposition 2.2.3 (Distributivity). Matrix multiplication obeys the right distributive law: if A ∈
Fk×n and B, C ∈ Fn×` , then A(B + C) = AB + AC. Analogously, it obeys the left distributive law:
if A, B ∈ Fk×n and C ∈ Fn×` , then (A + B)C = AC + BC.

Proposition 2.2.4 (Associativity). Matrix multiplication obeys the associative law: if A, B, and
C are matrices with compatible dimensions, then (AB)C = A(BC).

Proposition 2.2.5 (Matrix multiplication vs. scaling). Let A ∈ Fk×n , B ∈ Fn×` , and α ∈ F. Then

A(αB) = α(AB) = (αA)B. (2.5)

Numerical exercise 2.2.6. For each of the following triples of matrices, compute the products AB, AC, and A(B + C). Self-check : verify that A(B + C) = AB + AC.

(a) $A = \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix}$, $B = \begin{pmatrix} 3 & 1 & 0 \\ -4 & 2 & 5 \end{pmatrix}$, $C = \begin{pmatrix} 1 & -7 & -4 \\ 5 & 3 & -6 \end{pmatrix}$

(b) $A = \begin{pmatrix} 2 & 5 \\ 1 & 1 \\ 3 & -3 \end{pmatrix}$, $B = \begin{pmatrix} 4 & 6 & 3 & 1 \\ 3 & 3 & -5 & 4 \end{pmatrix}$, $C = \begin{pmatrix} 1 & -4 & -1 & 5 \\ 2 & 4 & 10 & -7 \end{pmatrix}$

(c) $A = \begin{pmatrix} 3 & 1 & 2 \\ 4 & -2 & 4 \\ 1 & -3 & -2 \end{pmatrix}$, $B = \begin{pmatrix} -1 & 4 \\ 3 & 2 \\ -5 & -2 \end{pmatrix}$, $C = \begin{pmatrix} 2 & -3 \\ -6 & -2 \\ 0 & 1 \end{pmatrix}$
Exercise 2.2.7. Let A ∈ Fk×n , and let $E_{ij}^{(\ell\times m)}$ be the ℓ × m matrix with a 1 in the (i, j) position and 0 everywhere else.

(a) What is $E_{ij}^{(k\times k)} A$?

(b) What is $A\, E_{ij}^{(n\times n)}$?

Definition 2.2.8 (Rotation matrix). The rotation matrix Rθ is the matrix defined by
 
\[
R_\theta = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} . \tag{2.6}
\]

As we shall see, this matrix is intimately related to the rotation of the Euclidean plane by θ (see Example 16.5.2).

Exercise 2.2.9. Prove Rα+β = Rα Rβ . Your proof may use the addition theorems for the trigonometric functions. Later, when we learn about the connection between matrices and linear transformations, we shall give a direct proof of this fact which will imply the addition theorems (see Ex. 16.5.11).
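(Not a proof, of course, but a numerical sanity check is easy, assuming Python with NumPy; the angles below are arbitrary.)

import numpy as np

def R(theta):
    # The rotation matrix of Definition 2.2.8
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

alpha, beta = 0.7, 1.9
print(np.allclose(R(alpha + beta), R(alpha) @ R(beta)))   # True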

Definition 2.2.10 (Identity matrix). The n × n identity matrix, denoted In or I, is the diagonal matrix whose diagonal entries are all 1, i. e.,
\[
I = \mathrm{diag}(1, 1, \ldots, 1) = \begin{pmatrix} 1 & & & \\ & 1 & & \\ & & \ddots & \\ & & & 1 \end{pmatrix} \tag{2.7}
\]
This is also written as I = (δij ), where the Kronecker delta symbol δij is defined by
\[
\delta_{ij} = \begin{cases} 1 & i = j \\ 0 & i \neq j \end{cases} \tag{2.8}
\]

Fact 2.2.11. The columns of I are the standard unit vectors (see Def. 1.3.34).

Proposition 2.2.12. For all A ∈ Fk×n ,

Ik A = AIn = A . (2.9)

Definition 2.2.13 (Scalar matrix). The matrix A ∈ Mn (F) is a scalar matrix if A = αI for some
α ∈ F.

Exercise 2.2.14. Let A ∈ Fk×n and let B ∈ Fℓ×k . Let D = diag(λ1 , . . . , λk ). Show

(a) DA is the matrix obtained by scaling the i-th row of A by λi for each i;

(b) BD is the matrix obtained by scaling the j-th column of B by λj for each j.
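(A numerical sketch of the two statements, assuming Python with NumPy; the matrices are made up.)

import numpy as np

A = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])   # 2 x 3
D = np.diag([10.0, -1.0])                           # will scale the two rows of A
E = np.diag([1.0, 0.0, 2.0])                        # will scale the three columns of A

print(D @ A)   # row i of A multiplied by the i-th diagonal entry of D
print(A @ E)   # column j of A multiplied by the j-th diagonal entry of E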

Definition 2.2.15 (ℓ-th power of a matrix). Let A ∈ Mn (F) and ℓ ≥ 0. We define Aℓ , the ℓ-th power of A, inductively:

(i) A0 = I ;

(ii) Aℓ+1 = A · Aℓ .

So
\[
A^\ell = \underbrace{A \cdots A}_{\ell \text{ times}} . \tag{2.10}
\]

Exercise 2.2.16. Let A ∈ Mn (F). Show that for all ℓ, m ≥ 0, we have

(a) $A^{\ell+m} = A^\ell A^m$ ;

(b) $(A^\ell)^m = A^{\ell m}$ .

Exercise 2.2.17. For k ≥ 0, compute

(a) $A^k$ for $A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$ ;

♥ (b) $B^k$ for $B = \begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix}$ ;

(c) $C^k$ for $C = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}$ ;

(d) Interpret and verify the following statements:

(d1) $A^k$ grows linearly

(d2) $B^k$ grows exponentially

(d3) $C^k$ stays of constant "size." (Define "size" in this statement.)

Definition 2.2.18 (Nilpotent matrix). The matrix N ∈ Mn (F) is nilpotent if there exists an integer
k such that N k = 0.

Exercise 2.2.19. Show that if A ∈ Mn (F) is strictly upper triangular, then An = 0.

So every strictly upper triangular matrix is nilpotent. Later we shall see that a matrix is nilpotent if and only if it is "similar" (see Def. 8.2.1) to a strictly upper triangular matrix (see Ex. 8.2.4).

Notation 2.2.20. We denote by Nn the n × n matrix defined by
\[
N_n = \begin{pmatrix} 0 & 1 & & & \\ & 0 & 1 & & \\ & & \ddots & \ddots & \\ & & & 0 & 1 \\ & & & & 0 \end{pmatrix} . \tag{2.11}
\]
That is, Nn = (αij ) where
\[
\alpha_{ij} = \begin{cases} 1 & \text{if } j = i + 1 \\ 0 & \text{otherwise.} \end{cases}
\]

Exercise 2.2.21. Find $N_n^k$ for k ≥ 0.

Fact 2.2.22. In Section 1.4, we defined the dot product of two vectors (see Def. 1.4.1). The dot product may also be defined in terms of matrix multiplication. Let x, y ∈ Fk . Then

x · y = xT y . (2.12)

Exercise 2.2.23. If v ⊥ 1 then Jv = 0.

Notation 2.2.24. For a matrix A ∈ Fk×n , we sometimes write

A = [a1 | · · · | an ]

where ai is the i-th column of A.

Exercise 2.2.25 (Extracting columns and elements of a matrix via multiplication). Let ei be the
i-th column of I (i. e., the i-th standard unit vector), and let A = (αij ) = [a1 | · · · | an ]. Then

(a) Aej = aj ;

(b) eTi Aej = αij .

Exercise 2.2.26. [Linear combination as a matrix product] Let A = [a1 | · · · | an ] ∈ Fk×n and let
x = (α1 , . . . , αn )T ∈ Fn . Show that

Ax = α1 a1 + · · · + αn an . (2.13)
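(The identity (2.13) is easy to check numerically; a sketch assuming Python with NumPy and a made-up matrix follows.)

import numpy as np

A = np.array([[1.0, 4.0, -2.0],
              [-3.0, 2.0, -8.0],
              [2.0, 1.0, 3.0]])
x = np.array([2.0, -1.0, 0.5])

# alpha_1 a_1 + ... + alpha_n a_n, built column by column
lin_comb = sum(x[j] * A[:, j] for j in range(A.shape[1]))
print(np.allclose(A @ x, lin_comb))   # True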

Exercise 2.2.27 (Left multiplication acts column by column). Let A ∈ Fk×n and let B = [b1 | · · · | bℓ ] ∈ Fn×ℓ . (The bi are the columns of B.) Then

AB = [Ab1 | · · · | Abℓ ] . (2.14)

Proposition 2.2.28 (No cancellation). For all n ≥ 2 and k ≥ 1 and for all x ∈ Fn , there exist
k × n matrices A and B such that Ax = Bx but A ≠ B.

Proposition 2.2.29. Let A ∈ Fk×n . If Ax = 0 for all x ∈ Fn , then A = 0.

Corollary 2.2.30 (Cancellation). If A, B ∈ Fk×n are matrices such that Ax = Bx for all x ∈ Fn ,
then A = B. Note: compare with Prop. 2.2.28.

Proposition 2.2.31 (No double cancellation). Let k ≥ 2. Then for all n and for all x ∈ Fk and
y ∈ Fn , there exist k × n matrices A and B such that xT Ay = xT By but A ≠ B.

Proposition 2.2.32. Let A ∈ Fk×n . If xT Ay = 0 for all x ∈ Fk and y ∈ Fn , then A = 0.

Corollary 2.2.33 (Double cancellation). If A, B ∈ Fk×n are matrices such that xT Ay = xT By for
all x ∈ Fk and y ∈ Fn , then A = B.

Definition 2.2.34 (Trace). The trace of a square matrix A = (αij ) ∈ Mn (F) is the sum of its diagonal
entries, that is,
\[
\mathrm{Tr}(A) = \sum_{i=1}^{n} \alpha_{ii} \tag{2.15}
\]

Examples 2.2.35.

(a) Tr(In ) = n
 
(b) $\mathrm{Tr}\begin{pmatrix} 3 & 1 & 2 \\ 4 & -2 & 4 \\ 1 & -3 & -2 \end{pmatrix} = -1$

Fact 2.2.36 (Linearity of the trace). (a) Tr(A + B) = Tr(A) + Tr(B) ;

(b) for λ ∈ F, Tr(λA) = λ Tr(A) ;


(c) for αi ∈ F, $\mathrm{Tr}\left( \sum_i \alpha_i A_i \right) = \sum_i \alpha_i \,\mathrm{Tr}(A_i)$ .

Exercise 2.2.37. Let v, w ∈ Fn . Then

\[
\mathrm{Tr}\left( \mathbf{v}\mathbf{w}^T \right) = \mathbf{v}^T \mathbf{w} . \tag{2.16}
\]

Note that vwT ∈ Mn (F) and vT w ∈ F.

Proposition 2.2.38 (Trace commutativity). Let A ∈ Fk×n and B ∈ Fn×k . Then

Tr(AB) = Tr(BA) . (2.17)
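(A quick numerical check of (2.17), assuming Python with NumPy and random matrices.)

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 5))
B = rng.standard_normal((5, 3))

print(np.isclose(np.trace(A @ B), np.trace(B @ A)))   # True: Tr(AB) = Tr(BA)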

Exercise 2.2.39. Show that the trace of a product is invariant under a cyclic permutation of the
terms, i. e., if A1 , . . . , Ak are matrices such that the product A1 · · · Ak is defined and is a square
matrix, then
Tr(A1 · · · Ak ) = Tr(Ak A1 · · · Ak−1 ) . (2.18)

Exercise 2.2.40. Show that the trace of a product is not invariant under all permutations of the
terms. In particular, find 2 × 2 matrices A, B, and C such that

Tr(ABC) ≠ Tr(BAC) .

Exercise 2.2.41 (Trace cancellation). Let B, C ∈ Mn (F). Show that if Tr(AB) = Tr(AC) for all
A ∈ Mn (F), then B = C.

2.3 Arithmetic of diagonal and triangular matrices


In this section we turn our attention to special properties of diagonal and triangular matrices.
Proposition 2.3.1. Let A = diag(α1 , . . . , αn ) and B = diag(β1 , . . . , βn ) be n × n diagonal matrices
and let λ ∈ F. Then

A + B = diag(α1 + β1 , . . . , αn + βn ) (2.19)
λA = diag(λα1 , . . . , λαn ) (2.20)
AB = diag(α1 β1 , . . . , αn βn ) . (2.21)

Proposition 2.3.2. Let A = diag(α1 , . . . , αn ) be a diagonal matrix. Then A^k = diag(α1^k , . . . , αn^k ) for all k.

Definition 2.3.3 (Substitution of a matrix into a polynomial). Let f ∈ F[t] be the polynomial (Def.
8.3.1) defined by
f = α0 + α1 t + · · · + αd td .
Just as we may substitute ζ ∈ F for the variable t in f to obtain a value f (ζ) ∈ F, we may also
“plug in” the matrix A ∈ Mn (F) to obtain f (A) ∈ Mn (F). The only thing we have to be careful
about is what we do with the scalar term α0 ; we replace it with α0 times the identity matrix, so

f (A) := α0 I + α1 A + · · · + αd Ad . (2.22)

Proposition 2.3.4. Let f ∈ F[t] be a polynomial and let A = diag(α1 , . . . , αn ) be a diagonal


matrix. Then
f (A) = diag(f (α1 ), . . . , f (αn )) . (2.23)
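A sanity check of Prop. 2.3.4 (Python/NumPy sketch; the polynomial and the diagonal matrix are arbitrary choices): substituting a diagonal matrix into f simply applies f to each diagonal entry.

import numpy as np

def poly_of_matrix(coeffs, A):
    """Evaluate f(A) = c0*I + c1*A + c2*A^2 + ... for coeffs = [c0, c1, c2, ...]."""
    result = np.zeros_like(A, dtype=float)
    power = np.eye(A.shape[0])
    for c in coeffs:
        result += c * power
        power = power @ A
    return result

f = [2.0, -3.0, 1.0]                   # f(t) = 2 - 3t + t^2
D = np.diag([1.0, 4.0, -2.0])
lhs = poly_of_matrix(f, D)
rhs = np.diag([2 - 3*t + t**2 for t in [1.0, 4.0, -2.0]])
print(np.allclose(lhs, rhs))           # True: f(diag) = diag(f of the entries)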

In our discussion of the arithmetic of triangular matrices, we focus on the diagonal entries.
Notation 2.3.5. For the remainder of this section, the symbol ∗ in a matrix will represent an arbitrary
value with which we will not concern ourselves. We write

\begin{pmatrix} α_1 & & & ∗ \\ & α_2 & & \\ & & \ddots & \\ 0 & & & α_n \end{pmatrix}

for

\begin{pmatrix} α_1 & ∗ & \cdots & ∗ \\ & α_2 & \cdots & ∗ \\ & & \ddots & \vdots \\ 0 & & & α_n \end{pmatrix} .

Proposition 2.3.6. Let

A = \begin{pmatrix} α_1 & & & ∗ \\ & α_2 & & \\ & & \ddots & \\ 0 & & & α_n \end{pmatrix}
\quad and \quad
B = \begin{pmatrix} β_1 & & & ∗ \\ & β_2 & & \\ & & \ddots & \\ 0 & & & β_n \end{pmatrix}

be upper triangular matrices and let λ ∈ F. Then

A + B = \begin{pmatrix} α_1 + β_1 & & & ∗ \\ & α_2 + β_2 & & \\ & & \ddots & \\ 0 & & & α_n + β_n \end{pmatrix}    (2.24)

λA = \begin{pmatrix} λα_1 & & & ∗ \\ & λα_2 & & \\ & & \ddots & \\ 0 & & & λα_n \end{pmatrix}    (2.25)

AB = \begin{pmatrix} α_1 β_1 & & & ∗ \\ & α_2 β_2 & & \\ & & \ddots & \\ 0 & & & α_n β_n \end{pmatrix} .    (2.26)


Proposition 2.3.7. Let A be as in Prop. 2.3.6. Then

A^k = \begin{pmatrix} α_1^k & & & ∗ \\ & α_2^k & & \\ & & \ddots & \\ 0 & & & α_n^k \end{pmatrix}    (2.27)

for all k.
Proposition 2.3.8. Let f ∈ F[t] be a polynomial and let A be as in Prop. 2.3.6. Then

f(A) = \begin{pmatrix} f(α_1) & & & ∗ \\ & f(α_2) & & \\ & & \ddots & \\ 0 & & & f(α_n) \end{pmatrix} .    (2.28)

2.4 Permutation Matrices


Definition 2.4.1 (Rook arrangement). A rook arrangement is an arrangement of n rooks on an n × n
chessboard such that no pair of rooks attack each other. In other words, there is exactly one rook
in each row and column.

FIGURE HERE
Fact 2.4.2. The number of rook arrangements on an n × n chessboard is n!.
Definition 2.4.3 (Permutation matrix). A permutation matrix is a square matrix with the following
properties.
(a) Every nonzero entry is equal to 1 .

(b) Each row and column has exactly one nonzero entry.
Observe that rook arrangements correspond to permutation matrices where each rook is placed
on a 1. Permutation matrices will be revisited in Chapter 6 where we discuss the determinant.
Recall that we denote by [n] the set of integers {1, . . . , n}.

Definition 2.4.4 (Permutation). A permutation is a bijection σ : [n] → [n]. We write σ : i ↦ iσ .

Example 2.4.5. A permutation can be represented by a 2 × n table like this.

i 1 2 3 4 5 6
iσ 3 6 4 1 5 2

This permutation takes 1 to 3, 2 to 6, 3 to 4, etc.


Note that any table obtained by rearranging columns represents the same permutation. For
example, we may also represent the permutation σ by the table below.

i 4 6 1 5 2 3
iσ 1 2 3 5 6 4

Moreover, the permutation can be represented by a diagram, where the arrow i ↦ iσ means
that i maps to iσ .

FIGURE HERE

Definition 2.4.6 (Composition of permutations). Let σ, τ : [n] → [n] be permutations. The compo-
sition of τ with σ, denoted στ , is the permutation which maps i to iστ defined by

iστ := (iσ )τ . (2.29)

This may be represented as

i ↦ iσ ↦ iστ ,

where the first arrow is given by σ and the second by τ .

Example 2.4.7. Let σ be the permutation of Example 2.4.5 and let τ be the permutation given
in the table
i 1 2 3 4 5 6
iτ 4 1 6 5 2 3

Then we can find the table representing the permutation στ by rearranging the table for τ so
that its first row is in the same order as the second row of the table for σ, i. e.,

i 3 6 4 1 5 2
iτ 6 3 5 4 2 1

and then combining the two tables:


i 1 2 3 4 5 6
iσ 3 6 4 1 5 2
iστ 6 3 5 4 2 1
That is, the table corresponding to the permutation στ is

i   1 2 3 4 5 6
iστ 6 3 5 4 2 1
Definition 2.4.8 (Identity permutation). The identity permutation id : [n] → [n] is the permutation
defined by iid = i for all i ∈ [n].
Definition 2.4.9 (Inverse of a permutation). Let σ : [n] → [n] be a permutation. Then the inverse
of σ is the permutation σ −1 such that σσ −1 = id. So σ −1 takes iσ to i.
Example 2.4.10. To find the table corresponding to the inverse of a permutation σ, we first switch
the rows of the table corresponding to σ and then rearrange the columns so that they are in natural
order. For example, when we switch the rows of the table in Example 2.4.5, we have
i    3 6 4 1 5 2
iσ⁻¹ 1 2 3 4 5 6

Rearranging the columns so that they are in natural order gives us the table

i    1 2 3 4 5 6
iσ⁻¹ 4 6 1 3 5 2
Definition 2.4.11. Let σ : [n] → [n] be a permutation. The permutation matrix corresponding to σ,
denoted Pσ , is the n × n matrix whose (i, j) entry is 1 if σ(i) = j and 0 otherwise.
Exercise 2.4.12. Let σ, τ : [n] → [n] be permutations, let A ∈ Fk×n , and let B ∈ Fn×` .
(a) What is APσ ?
(b) What is Pσ B?
(c) What is Pσ−1 ?
(d) Show that Pσ Pτ = Pτ σ (note the conflict in conventions for composition of permutations and
matrix multiplication).
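For experimentation with Exercise 2.4.12, here is a Python/NumPy sketch. It follows Def. 2.4.11 with 0-based indices (entry (i, j) of Pσ is 1 when σ(i) = j); the permutations are those of Examples 2.4.5 and 2.4.7.

import numpy as np

def perm_matrix(sigma):
    """P[i, sigma[i]] = 1, a 0-based version of Def. 2.4.11."""
    n = len(sigma)
    P = np.zeros((n, n), dtype=int)
    for i, j in enumerate(sigma):
        P[i, j] = 1
    return P

sigma = [2, 5, 3, 0, 4, 1]        # Example 2.4.5, shifted to 0-based labels
tau   = [3, 0, 5, 4, 1, 2]        # Example 2.4.7, shifted to 0-based labels
Ps, Pt = perm_matrix(sigma), perm_matrix(tau)

B = np.arange(36).reshape(6, 6)
print(np.array_equal(Ps @ B, B[sigma, :]))          # row i of P_sigma B is row sigma(i) of B
print(np.array_equal(Ps.T @ Ps, np.eye(6, dtype=int)))  # P_sigma^{-1} = P_sigma^T

rho = [tau[sigma[i]] for i in range(6)]             # i -> tau(sigma(i)), i.e., sigma first, then tau
print(np.array_equal(Ps @ Pt, perm_matrix(rho)))    # which composition the matrix product encodes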

2.5 Additional exercises


Exercise 2.5.1. Let A ∈ Fk×n be a matrix. Show, without calculation, that AT A is symmetric.

Definition 2.5.2 (Commutator). Let A, B ∈ Mn (F). The commutator of A and B is the matrix
[A, B] := AB − BA.
Definition 2.5.3. Two matrices A, B ∈ Mn (F) commute if AB = BA, i. e., if [A, B] = 0.

Exercise 2.5.4. (a) Find an example of two 2 × 2 matrices that do not commute.

(b) (Project) Interpret and prove the following statement:

Almost all pairs of 2 × 2 integral matrices (matrices in M2 (Z)) do not commute.

Exercise 2.5.5. Let D ∈ Mn (F) be a diagonal matrix such that all diagonal entries are distinct.
Show that if A ∈ Mn (F) commutes with D then A is a diagonal matrix.

Exercise 2.5.6. Show that only the scalar matrices (Def. 2.2.13) commute with all matrices
in Mn (F). (A scalar matrix is a matrix of the form λI.)

Exercise 2.5.7.

(a) Show that the commutator of two matrices over C is never the identity matrix.

(b) Find A, B ∈ Mp (Fp ) such that their commutator is the identity.

Exercise 2.5.8 (Submatrix sum). Let I1 ⊆ [k] and I2 ⊆ [n], and let B be the submatrix (Def.
3.3.11) of A ∈ Fk×n with entries αij for i ∈ I1 , j ∈ I2 . Find vectors a and b such that aT Ab equals
the sum of the entries of B.

Definition 2.5.9 (Vandermonde matrix). The Vandermonde matrix generated by α1 , . . . , αn is the
n × n matrix

V (α_1 , . . . , α_n ) = \begin{pmatrix} 1 & 1 & \cdots & 1 \\ α_1 & α_2 & \cdots & α_n \\ α_1^2 & α_2^2 & \cdots & α_n^2 \\ \vdots & \vdots & \ddots & \vdots \\ α_1^{n−1} & α_2^{n−1} & \cdots & α_n^{n−1} \end{pmatrix} .    (2.30)

Exercise 2.5.10. Let A be a Vandermonde matrix generated by distinct αi . Show that the rows
of A are linearly independent. Do not use determinants.

Exercise 2.5.11. Prove that polynomials of a matrix commute: let A be a square matrix and let
f, g ∈ F[t]. Then f (A) and g(A) commute. In particular, A commutes with f (A).

Definition 2.5.12 (Circulant matrix). The circulant matrix generated by the sequence (α0 , α1 , . . . , αn−1 )
of scalars is the n × n matrix

C(α_0 , α_1 , . . . , α_{n−1} ) = \begin{pmatrix} α_0 & α_1 & \cdots & α_{n−1} \\ α_{n−1} & α_0 & \cdots & α_{n−2} \\ \vdots & \vdots & \ddots & \vdots \\ α_1 & α_2 & \cdots & α_0 \end{pmatrix} .    (2.31)

Exercise 2.5.13. Prove that all circulant matrices commute. Prove this

(a) directly,

(b) in a more elegant way, by showing that all circulant matrices are polynomials of a particular
circulant matrix.
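The “more elegant way” of Exercise 2.5.13(b) can be previewed numerically (Python/NumPy sketch, arbitrary coefficient sequences): every circulant is a polynomial of the basic shift circulant S = C(0, 1, 0, . . . , 0), and any two circulants commute.

import numpy as np

def circulant(c):
    """C(c_0, ..., c_{n-1}) as in Def. 2.5.12: entry (i, j) is c_{(j - i) mod n}."""
    n = len(c)
    return np.array([[c[(j - i) % n] for j in range(n)] for i in range(n)])

a = [1.0, 2.0, 0.0, -1.0]
b = [3.0, 0.0, 5.0, 2.0]
A, B = circulant(a), circulant(b)
print(np.allclose(A @ B, B @ A))                 # circulants commute

S = circulant([0.0, 1.0, 0.0, 0.0])              # the basic cyclic shift matrix
poly = sum(a[k] * np.linalg.matrix_power(S, k) for k in range(4))
print(np.allclose(A, poly))                      # C(a) = sum_k a_k S^k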

Definition 2.5.14 (Jordan block). For λ ∈ C and n ≥ 1, the Jordan block J(n, λ) is the matrix

J(n, λ) := λI + Nn (2.32)

where Nn is the matrix defined in Notation 2.2.20.

Exercise 2.5.15. Let f ∈ F[t]. Prove that f (J(n, λ)) is the matrix

\begin{pmatrix} f(λ) & f'(λ) & \frac{1}{2!} f^{(2)}(λ) & \cdots & \frac{1}{(n−1)!} f^{(n−1)}(λ) \\ & f(λ) & f'(λ) & \cdots & \frac{1}{(n−2)!} f^{(n−2)}(λ) \\ & & \ddots & \ddots & \vdots \\ & & & f(λ) & f'(λ) \\ 0 & & & & f(λ) \end{pmatrix} .    (2.33)

Exercise 2.5.16. The converse of the second statement in Ex. 2.5.11 would be:

(∗) The only matrices that commute with A are the polynomials of A.

(a) Find a matrix A for which (∗) is false.

(b) For which diagonal matrices is (∗) true?

(c) Prove: (∗) is true for Jordan blocks.

(d) Characterize the matrices over C for which (∗) is true, in terms of their Jordan blocks (??).

Project 2.5.17. For A ∈ Mn (R), let f (A, k) denote the largest absolute value of all entries of A^k ,
and define
M_n^{(λ)}(R) := {A ∈ Mn (R) | f (A, 1) ≤ λ}    (2.34)
(the matrices where all entries have absolute value ≤ λ). Define

f_1(n, k) = \max_{A \in M_n^{(1)}(R)} f(A, k) ,

f_2(n, k) = \max_{\substack{A \in M_n^{(1)}(R) \\ A \text{ nilpotent}}} f(A, k) ,

f_3(n, k) = \max_{\substack{A \in M_n^{(1)}(R) \\ A \text{ strictly upper triangular}}} f(A, k) .

Find the rate of growth of these functions in terms of k and n.


Chapter 3

(F) Matrix Rank

3.1 Column and row rank


In Section 1.3, we defined the rank of a list of column or row vectors. We now view this concept in
terms of the columns and rows of a matrix.
Definition 3.1.1 (Column- and row-rank). The column-rank of a matrix is the rank of the list of its
column vectors. The row-rank of a matrix is the rank of the list of its row vectors. We denote the
column-rank of A by rkcol (A) and the row-rank of A by rkrow (A).
Definition 3.1.2 (Column and row space). The column space of a matrix A, denoted col A, is the
span of its columns. The row space of a matrix A, denoted row A, is the span of its rows.
So if A ∈ Fk×n then col(A) ≤ Fk and row(A) ≤ Fn .
Proposition 3.1.3. The column-rank of a matrix is equal to the dimension of its column space,
and the row-rank of a matrix is equal to the dimension of its row space.
Definition 3.1.4 (Full column- and row-rank). A k × n matrix A has full column-rank if its column-
rank is equal to n, the number of columns. A has full row-rank if its row-rank is equal to k, the
number of rows.
Exercise 3.1.5. Let A and B be k × n matrices. Then

| rkcol (A) − rkcol (B)| ≤ rkcol (A + B) ≤ rkcol (A) + rkcol (B) . (3.1)

Exercise 3.1.6. Let A be a k × n matrix and let B be an n × ` matrix. Then col(AB) ≤ col(A).


Exercise 3.1.7. Let A and B be matrices such that the product matrix AB is defined. Then
rkcol (AB) ≤ rkcol A.

3.2 Elementary operations and Gaussian elimination


Definition 3.2.1 (Elementary column and row operations). Let A = [a1 | · · · | an ] be a matrix.
The elementary column operation denoted by (i, j, λ) is the operation which replaces ai by ai − λaj
(i ≠ j). Elementary row operations are defined analogously.
Definition 3.2.2 (Elementary matrix). We denote by Eij the n × n matrix which has 1 in the (i, j)
position and 0 in every other position. An elementary matrix B is an n × n matrix of the form
B = I − λEij for λ ∈ F.
Proposition 3.2.3. Let A ∈ Fk×n , let λ ∈ F, and let B be the n×n elementary matrix B = I −λEij .
Show that AB is the matrix obtained by performing the elementary column operation (j, i, λ) on
A. Infer (do not repeat the same argument) that an elementary row operation corresponds to
multiplication by an elementary matrix on the left.
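A concrete instance of Prop. 3.2.3 (Python/NumPy sketch; the matrix and λ are arbitrary): multiplying A on the right by I − λEij replaces column j by a_j − λa_i, i. e., performs the column operation (j, i, λ).

import numpy as np

def elementary(n, i, j, lam):
    """The elementary matrix I - lam * E_ij (0-based i, j)."""
    E = np.zeros((n, n))
    E[i, j] = 1.0
    return np.eye(n) - lam * E

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
lam, i, j = 2.0, 0, 2                # B = I - 2*E_{02}

C = A.copy()
C[:, j] = C[:, j] - lam * C[:, i]    # column operation (j, i, lam): a_j <- a_j - lam * a_i
print(np.allclose(A @ elementary(3, i, j, lam), C))   # True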
Definition 3.2.4. We call a diagonal matrix nonsingular if none of its diagonal entries is zero.

We shall see that this terminology is consistent with the general definition of nonsingular matrices
(Def. 3.3.10).

Proposition 3.2.5. Let A be a square matrix with linearly independent rows. Then by performing
a series of row operations followed by a permutation of the columns, we can transform the matrix
into a nonsingular diagonal matrix.
Proposition 3.2.6. Let A be a matrix with linearly independent rows. Then by performing a
series of row operations and by permuting rows and columns, we can bring the matrix into the 1 × 2
block matrix form [D | B] where D is a nonsingular diagonal matrix.
Proposition 3.2.7. By a series of elementary row operations followed by a permutation of the
columns and a permutation of the rows, any matrix can be transformed into a 2 × 2 block matrix
where the top left block is a nonsingular diagonal matrix and both bottom blocks are 0.
The process of transforming a matrix into this form by performing a series of elementary row
operations followed by a permutation of the rows and a permutation of the columns is called
Gaussian elimination.

Corollary 3.2.8. By a series of elementary row operations and elementary column operations
followed by a permutation of the columns and a permutation of the rows, any matrix can be
transformed into a 2 × 2 block matrix where the top left block is a nonsingular diagonal matrix and
the other three blocks are 0.

3.3 Invariance of column and row rank, the Second Miracle


of Linear Algebra
The main result of this section is the following theorem.

Theorem 3.3.1 (Second Miracle of Linear Algebra). The row-rank of a matrix is equal to its
column-rank.

This result will be an immediate consequence of the following two lemmas, together with Corol-
lary 3.2.8.

Lemma 3.3.2. Elementary column operations do not change the column-rank of a matrix. In fact,
elementary column operations do not change the column space of a matrix. ♦

Lemma 3.3.3. Elementary row operations do not change the column-rank of a matrix.

Exercise 3.3.4. Use these two lemmas, together with Corollary 3.2.8, to prove Theorem 3.3.1.

The proof of Lemma 3.3.2 is very simple. The proof of Lemma 3.3.3 is somewhat more involved;
we will break it into exercises.
The following exercise demonstrates why the proof of Lemma 3.3.3 is not as straightforward as
the proof of Lemma 3.3.2.

Exercise 3.3.5. An elementary row operation can change the column space of a matrix.

Proposition 3.3.6. Let A = [a1 | · · · | an ] ∈ Fk×n be a matrix, and suppose the linear relation

\sum_{i=1}^{n} α_i a_i = 0

holds among the columns. Let A′ = [a′_1 | · · · | a′_n ] be the result of applying an elementary row
operation to A. Then the columns of A′ obey the same linear relation, that is,

\sum_{i=1}^{n} α_i a′_i = 0 .

Corollary 3.3.7. If the columns vi1 , . . . , vi` are linearly independent, then this remains true after
an elementary row operation.

Exercise 3.3.8. Complete the proof of Lemma 3.3.3.

∗ ∗ ∗

Because the Second Miracle of Linear Algebra establishes that the row-rank and column-rank
of a matrix A are equal, it is no longer necessary to differentiate between them; this quantity is
simply referred to as the rank of A, denoted rk(A).
Definition 3.3.9 (Full rank). Let A ∈ Mn (F) be a square matrix. We say that A has full rank if
rk A = n.

Observe that only in the case of square matrices is the concept of “full rank” defined. On
the other hand, the notions of full column-rank and full row-rank (Def. 3.1.4) retain their
importance.

Definition 3.3.10 (Nonsingular matrix). Let A be a square matrix. We say that A is nonsingular if
it has full rank. Otherwise A is singular.
Definition 3.3.11 (Submatrix). Let A ∈ Fk×n be a matrix. Then the matrix B is a submatrix of A
if it can be obtained by deleting some rows and columns from A.
In other words, a submatrix of a matrix A is a matrix obtained by taking the intersection of a
set of rows of A with a set of columns of A.

Theorem 3.3.12 (Rank vs. nonsingular submatrices). Let A ∈ Fk×n be a matrix. Then rk A is
the largest value of r such that A has a nonsingular r × r submatrix. ♦

Exercise 3.3.13. Show that for all k, the intersection of k linearly independent rows with k linearly
independent columns can be singular. In fact, for any k, it can be the zero matrix.

Exercise 3.3.14. Let A be a matrix of rank r. Show that the intersection of any r linearly
independent rows with any r linearly independent columns is a nonsingular r × r submatrix of
A. (Note: this exercise is more difficult than Theorem 3.3.12 and is not needed for the proof of
Theorem 3.3.12.)

Exercise 3.3.15. Let A be a matrix. Show that if the intersection of k linearly independent
columns with ` linearly independent rows of A has rank s, then rk(A) ≥ k + ` − s.

Notice that Ex. 3.3.14 follows from Ex. 3.3.15.

3.4 Matrix rank and invertibility


Definition 3.4.1 (Left and right inverse). Let A ∈ Fk×n . The matrix B ∈ Fn×k is a left inverse of A
if BA = In . Likewise, the matrix C ∈ Fn×k is a right inverse of A if AC = Ik .

Proposition 3.4.2. Let A ∈ Fk×n .

(a) Show that A has a right inverse if and only if A has full row-rank, i. e., rk A = k.

(b) Show that A has a left inverse if and only if A has full column-rank, i. e., rk A = n.

Note in particular that if A has a right inverse, then k ≤ n, and if A has a left inverse, then
k ≥ n.

Corollary 3.4.3. Let A be a nonsingular square matrix. Then A has both a right and a left inverse.

Exercise 3.4.4. For all k < n, find a k × n matrix that has infinitely many right inverses.

Definition 3.4.5 (Two-sided inverse). Let A ∈ Mn (F). Then the matrix B ∈ Mn (F) is a (two-sided)
inverse of A if AB = BA = In . The inverse of A is denoted A−1 . If A has an inverse, then A is
said to be invertible.

Proposition 3.4.6. Let A be a matrix. If A has a left inverse as well as a right inverse, then A
has a unique two-sided inverse and it has no left or right inverse other than the two-sided inverse.

The proof of this lengthy statement is just one line, based solely on the associativity of matrix
multiplication. The essence of the proof is in the next lemma.

Lemma 3.4.7. Let A ∈ Fk×n be a matrix with a right inverse B and a left inverse C. Then B = C
is a two-sided inverse of A and k = n. ♦

Corollary 3.4.8. Under the conditions of Lemma 3.4.7, k = n and B = C is a two-sided inverse.
Moreover, if C1 is also a left inverse, then C1 = C; analogously, if B1 is also a right inverse, then
B1 = B.

Corollary 3.4.9. Let A be a matrix with a left inverse. Then A has at most one right inverse.

Corollary 3.4.10. A matrix A has an inverse if and only if A is a nonsingular square matrix.

Corollary 3.4.11. If A has a right inverse and a left inverse then k = n.

Theorem 3.4.12. For A ∈ Mn (F), the following are equivalent.

(a) A is nonsingular

(b) A has full rank (this is our definition of nonsingularity)

(c) A has a right inverse

(d) A has a left inverse

(e) A has an inverse

In Chapter 6 we shall learn an additional important equivalent condition of nonsingularity:

(f) The determinant of A is not zero (see Corollary 6.4.15)

A more detailed version of Theorem 3.4.12 appears later as Theorem 6.4.16. The most important
addition to the list of equivalent conditions is the determinant condition (f) stated above.

Exercise 3.4.13. Assume F is infinite and let A ∈ Fk×n where n > k. If A has a right inverse,
then A has infinitely many right inverses.

3.5 Codimension (optional)


The reader comfortable with abstract vector spaces can skip to Chapter ?? for a more general
discussion of this material.
Definition 3.5.1 (Codimension). Let W ≤ Fn . The codimension of W is the minimum number of
vectors that together with W generate Fn and is denoted by codim W . (Note that this
is the dimension of the quotient space Fn /W ; see Def. ??)
Proposition 3.5.2. If W ≤ Fn then codim W = n − dim W .
Proposition 3.5.3 (Dual modular equation). Let U, W ≤ Fn . Then

codim(U + W) + codim(U ∩ W) = codim U + codim W . (3.2)

Corollary 3.5.4. Let U, W ≤ Fn . Then

codim(U ∩ W ) ≤ codim U + codim W . (3.3)

Corollary 3.5.5. Let U, W ≤ Fn with dim U = r and codim W = t. Then

dim(U ∩ W ) ≥ max{r − t, 0} . (3.4)

In Section 1.3, we defined the rank of a set of column vectors (Def. 1.3.28) and then showed
this to be equal to the dimension of their span (Cor. 1.3.46). We now define the corank of
a set of vectors.
Definition 3.5.6 (Corank). The corank of a set S ⊆ Fn , denoted corank S, is defined by

corank S := codim(span S) . (3.5)

Definition 3.5.7 (Null space). The null space or kernel of a matrix A ∈ Fk×n , denoted null(A), is
the set
null(A) = {v ∈ Fn | Av = 0} . (3.6)
Exercise 3.5.8. Let A ∈ Fk×n . Show that

rk A = codim(null(A)) . (3.7)

Proposition 3.5.9. Let U ≤ Fn and let W ≤ Fk such that dim W = codim U = `. Then there is
a matrix A ∈ Fk×n such that null(A) = U and col A = W .
Definition 3.5.10 (Corank of a matrix). Let A ∈ Fk×n . We define the corank of A as the corank of
its column space, i. e.,
corank A := k − rk A . (3.8)
Exercise 3.5.11. When is corank A = corank AT ?
Exercise 3.5.12. Let A ∈ Fk×n and let B ∈ Fn×` . Show that

corank(AB) ≤ corank A + corank B . (3.9)

3.6 Additional exercises


Exercise 3.6.1. Let A = [v1 | · · · | vn ] be a matrix. True or false: if the columns v1 , . . . , v` are
linearly independent this remains true after performing elementary column operations on A.
Proposition 3.6.2. Let A be a matrix with at most one nonzero entry in each row and in each
column. Then rk A is the number of nonzero entries in A.
Numerical exercise 3.6.3. Perform a series of elementary row operations to determine the rank
of the matrix  
\begin{pmatrix} 1 & 3 & 2 \\ −5 & −2 & 3 \\ −3 & 4 & 7 \end{pmatrix} .
Self-check : use a different sequence of elementary operations and verify that you obtain the same
answer.
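For the self-check, exact rational arithmetic is preferable to floating point; the following Python/SymPy sketch row reduces the matrix and reports the rank.

from sympy import Matrix

A = Matrix([[1, 3, 2],
            [-5, -2, 3],
            [-3, 4, 7]])

R, pivots = A.rref()             # reduced row echelon form and pivot columns
print(R)                         # the nonzero rows of R reveal the rank
print(len(pivots) == A.rank())   # the two ways of counting agree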
Exercise 3.6.4. Determine the ranks of the n × n matrices
(a) A = (αij ) where αij = i + j

(b) B = (βij ) where βij = ij

(c) C = (γij ) where γij = i2 + j 2



Proposition 3.6.5. Let A ∈ Fk×` and B ∈ F`×m . Then

rk(AB) ≤ min{rk(A), rk(B)} .

Proposition 3.6.6. Let A, B ∈ Fk×` . Then rk(A + B) ≤ rk(A) + rk(B).

Exercise 3.6.7. Find an n × n matrix A of rank n − 1 such that An = 0.

Proposition 3.6.8. Let A ∈ Fk×n . Then rk A is the smallest r such that there exist matrices
B ∈ Fk×r and C ∈ Fr×n with A = BC.

Proposition 3.6.9. Show that rk(A) is the smallest integer r such that A can be expressed as the
sum of r matrices of rank 1.

Proposition 3.6.10 (Characterization of matrices of rank 1). Let A ∈ Fk×n . Show that rk A = 1
if and only if there exist column vectors a ∈ Fk and b ∈ Fn such that A = abT .

♥ Exercise 3.6.11. Let A = (αij ) be a matrix of rank r.

(a) Let B = (α_{ij}^2) be the matrix obtained by squaring every element of A. Show that rk(B) ≤ \binom{r+1}{2} .

(b) Let D = (α_{ij}^d). Show that rk(D) ≤ \binom{r+d−1}{d} .

(c) Let f be a polynomial of degree d, and let A_f = (f(α_{ij})). Prove that rk(A_f) ≤ \binom{r+d}{d} .

(d) Show that each of these bounds is tight for all r and d, i. e., for every r and d

(i) there exists a matrix A such that the rank of the corresponding matrix D is rk(D) = \binom{r+d−1}{d} , and

(ii) there exists a matrix A and a polynomial f of degree d such that rk(A_f) = \binom{r+d}{d} .
Chapter 4

Qualitative Theory of Systems of Linear


Equations

4.1 Homogeneous systems of linear equations


Matrices allow us to concisely express systems of linear equations. In particular, consider the general
system of k linear equations in n unknowns,

α11 x1 + α12 x2 + · · · + α1n xn = β1


α21 x1 + α22 x2 + · · · + α2n xn = β2
..
.
αk1 x1 + αk2 x2 + · · · + αkn xn = βk

Here, the αij and βi are scalars, while the xj are unknowns. In Section 1.1, we represented this
system as a linear combination of column vectors. Matrices allow us to write this system even more
concisely as Ax = b, where
     
A = \begin{pmatrix} α_{11} & α_{12} & \cdots & α_{1n} \\ α_{21} & α_{22} & \cdots & α_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ α_{k1} & α_{k2} & \cdots & α_{kn} \end{pmatrix} , \quad x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} , \quad b = \begin{pmatrix} β_1 \\ β_2 \\ \vdots \\ β_k \end{pmatrix} .    (4.1)


We know that the simplest linear equation is of the form ax = b (one equation in one unknown);
remarkably, thanks to the power of our matrix formalism, essentially the same equation now de-
scribes the far more complex systems of linear equations. The first question we ask about any
system of equations is its solvability.
Definition 4.1.1 (Solvable system of linear equations). Given a matrix A ∈ Fk×n and a vector
b ∈ Fk , we say that the system Ax = b of linear equations is solvable if there exists a vector x ∈ Fn
that satisfies Ax = b.
Definition 4.1.2 (Homogeneous system of linear equations). The system Ax = 0 is called a homo-
geneous system of linear equations.
Every system of homogeneous linear equations is solvable.
Definition 4.1.3 (Trivial solution to a homogeneous system of linear equations). The trivial solution
to the homogeneous system of linear equations Ax = 0 is the solution x = 0.
So when presented with a homogeneous system of linear equations, the question we ask is not,
“Is this system solvable?” but rather, “Does this system have a nontrivial solution?”

Theorem 4.1.4. Let A ∈ Fk×n . The system Ax = 0 has a nontrivial solution if and only if the
columns of A are linearly dependent, i. e., A does not have full column rank (Def. 3.1.4). ♦

Definition 4.1.5 (Solution space). Let A ∈ Fk×n . The set of solutions to the homogeneous system
of linear equations Ax = 0 is the set U = {x ∈ Fn | Ax = 0} and is called the solution space of
Ax = 0. Ex. 4.1.8 explains the terminology.
Definition 4.1.6 (Null space). The null space or kernel of a matrix A ∈ Fk×n , denoted null(A), is
the set
null(A) = {v ∈ Fn | Av = 0} . (4.2)
Definition 4.1.7. The nullity of a matrix A is the dimension of its null space.
For the following three exercises, let A ∈ Fk×n and let U ≤ Fn be the solution space of the
system Ax = 0.

Exercise 4.1.8. Prove that U ≤ Fn .

Exercise 4.1.9. Show that null(A) = U .

Proposition 4.1.10. Let A ∈ Fk×n and consider the homogeneous system Ax = 0.



(a) Let rk A = r and let n = r + d. Then it is possible to relabel the unknowns x1 , . . . , xn
as x′_1 , . . . , x′_n , so that x′_i can be represented as a linear combination of x′_{d+1} , . . . , x′_n for
i = 1, . . . , d, say

x′_i = \sum_{j=d+1}^{n} λ_{ij} x′_j .

(b) The vectors e_i + \sum_{j=d+1}^{n} λ_{ij} e_j form a basis of U′, the solution space of the relabeled system
of equations.

(c) dim U = dim U′

(d) dim U = n − rk A
An immediate consequence of (d) is the Rank-Nullity Theorem, which will be crucial in our
study of linear maps in Chapter 16.¹
Corollary 4.1.11 (Rank–Nullity Theorem). Let A ∈ Fk×n be a matrix. Then

rk(A) + nullity(A) = n . (4.3)
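The theorem is easy to verify instance by instance; the sketch below (Python/SymPy, arbitrary example matrix) computes a basis of the null space and checks that rk(A) + nullity(A) = n.

from sympy import Matrix

A = Matrix([[1, 2, 3, 4],
            [2, 4, 6, 8],
            [0, 1, 1, 1]])          # a 3x4 example; the second row is twice the first

nullity = len(A.nullspace())        # dimension of {x : Ax = 0}
print(A.rank(), nullity, A.cols)
print(A.rank() + nullity == A.cols)   # Rank-Nullity: rk(A) + nullity(A) = n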

An explanation of (d) is that dim U measures the number of coordinates of x that we can choose
independently. This quantity is referred to by physicists as the “degree of freedom” left in our choice
of x after the set Ax = b of constraints. If there are no constraints, the degree of freedom of the
system is equal to n. It is plausible that each constraint reduces the degree of freedom by 1, which
would suggest dim U = n − k, but effectively there are only rk A constraints because every equation
that is a linear combination of previous equations can be thrown out. This makes it plausible that
the degree of freedom is n − rk A. This argument is not a proof, however.
Proposition 4.1.12. Let A ∈ Fk×n and consider the homogeneous system Ax = 0 of linear
equations. Let U be the solution space of Ax = 0. Prove that the following are equivalent.
(a) Ax = 0 has no nontrivial solution, i. e., U = {0}

(b) The columns of A are linearly independent

(c) The rows of A span Fn


¹In Chapter 16, we will formulate the Rank–Nullity Theorem in terms of linear maps rather than matrices, but
the two formulations are equivalent.

(d) A has full column rank, i. e., rk A = n

(e) A has a left inverse

Proposition 4.1.13. Let A be a square matrix. Then Ax = 0 has no nontrivial solution if and
only if A is nonsingular.

4.2 General systems of linear equations


Proposition 4.2.1. The system of linear equations Ax = b is solvable if and only if b ∈ col A.

Definition 4.2.2 (Augmented matrix). When speaking of the system Ax = b, we call A the matrix
of the system and [A | b] (the column b added to A) the augmented matrix of the system.

Proposition 4.2.3. The system Ax = b of linear equations is solvable if and only if the matrix of
the system and the augmented matrix have the same rank, i. e., rk A = rk[A | b].
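Prop. 4.2.3 gives a mechanical solvability test. A Python/SymPy sketch (the matrix and the right-hand sides are arbitrary examples) comparing rk A with rk[A | b]:

from sympy import Matrix

A = Matrix([[1, 2],
            [2, 4],
            [1, 0]])
b_good = Matrix([1, 2, 3])        # lies in the column space of A
b_bad  = Matrix([1, 3, 0])        # does not

for b in (b_good, b_bad):
    augmented = A.row_join(b)     # the augmented matrix [A | b]
    print(A.rank() == augmented.rank())   # True exactly when Ax = b is solvable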

Definition 4.2.4 (Translation). Let S ⊆ Fn , and let v ∈ Fn . The set

S + v = {s + v | s ∈ S}    (4.4)

is called the translate of S by v. Such an object is called an affine subspace of Fn (Def. 5.1.3).

Proposition 4.2.5. Let S = {x ∈ Fn | Ax = b} be the set of solutions to a system of linear


equations. Then S is either empty or a translate of a subspace of Fn , namely, of the solution space
of the homogeneous system Ax = 0.

Proposition 4.2.6 (Equivalent characterizations of nonsingular matrices). Let A ∈ Mn (F). The


following are equivalent.

(a) A is nonsingular, i. e., rk(A) = n

(b) The rows of A are linearly independent

(c) The columns of A are linearly independent

(d) The rows of A span Fn



(e) The columns of A span Fn

(f) The rows of A form a basis of Fn

(g) The columns of A form a basis of Fn

(h) A has a left inverse

(i) A has a right inverse

(j) A has a two-sided inverse

(k) Ax = 0 has no nontrivial solution

(l) For all b ∈ Fn , Ax = b is solvable

(m) For all b ∈ Fn , Ax = b has a unique solution

Our discussion of the determinant in the next chapter will allow us to add a particularly impor-
tant additional property:

(n) det A ≠ 0
Chapter 5

(F, R) Affine and Convex Combinations


(optional)

The reader comfortable with abstract vector spaces can skip to Chapter ?? for a more general
discussion of this material.

5.1 (F) Affine combinations

In Section 1.1, we defined linear combinations of column vectors (Def. 1.1.13). We now consider
affine combinations.
Definition 5.1.1 (Affine combination). An affine combination of the vectors v1 , . . . , vk ∈ Fn is a
linear combination \sum_{i=1}^{k} α_i v_i where \sum_{i=1}^{k} α_i = 1.

Example 5.1.2.

2v_1 − (1/2)v_2 − (1/3)v_4 − (1/6)v_5

is an affine combination of the column vectors v1 , . . . , v5 .
Definition 5.1.3 (Affine-closed set). The set S ⊆ Fn is affine-closed if it is closed under affine
combinations.
Fact 5.1.4. The empty set is affine-closed (why?).


Definition 5.1.5 (Affine subspace). A nonempty, affine-closed subset U of Fn is called an affine


subspace of Fn (notation: U ≤aff Fn ).

Proposition 5.1.6. The intersection of a (finite or infinite) family of affine-closed subsets of Fn is


affine-closed. In other words, an intersection of affine subspaces is either empty or an affine subspace.

Throughout this book, the term “subspace” refers to subsets that are closed under linear com-
binations (Def. 1.2.1). Subspaces are also referred to as “linear subspaces.” This (redundant)
longer term is especially useful in contexts where affine subspaces are discussed in order to distin-
guish linear subspaces from affine subspaces.
Definition 5.1.7 (Affine hull). The affine hull of a subset S ⊆ Fn , denoted aff(S), is the smallest
affine-closed set containing S, i. e.,

(a) aff(S) ⊇ S;

(b) aff(S) is affine-closed;

(c) for every affine-closed set T ⊆ Fn , if T ⊇ S then T ⊇ aff(S).

Fact 5.1.8. aff(∅) = ∅.

Contrast this with the fact that span ∅ = {0}.

Theorem 5.1.9. Let S ⊆ Fn . Then aff(S) exists and is unique. ♦

Theorem 5.1.10. For S ⊆ Fn , aff(S) is the set of all affine combinations of the finite subsets of
S. ♦

Proposition 5.1.11. Let S ⊆ Fn . Then aff(aff(S)) = aff(S).

Proposition 5.1.12. Let S ⊆ Fn . Then S is affine-closed if and only if S = aff(S).

Proposition 5.1.13. Let S ⊆ Fn be affine-closed. Then S ≤ Fn if and only if 0 ∈ S.

Proposition 5.1.14.

(a) Let W ≤ Fn . All translates W + v of W (Def. 4.2.4) are affine subspaces.

(b) Every affine subspace S ≤aff Fn is the translate of a (unique) subspace of Fn .



Proposition 5.1.15. The intersection of a (finite or infinite) family of affine subspaces is either
empty or equal to a translate of the intersection of their corresponding linear subspaces.

∗ ∗ ∗

Next we connect these concepts with the theory of systems of linear equations.

Exercise 5.1.16. Let A ∈ Fk×n and let b ∈ Fk . Then the set of solutions to the system Ax = b of
linear equations is an affine-closed subset of Fn .

The next exercise shows the converse.

Exercise 5.1.17. Every affine-closed subset of Fn is the set of solutions to the system Ax = b of
linear equations for some A ∈ Fk×n and b ∈ Fk .

Proposition 5.1.18 (General vs. homogeneous systems of linear equations). Let A ∈ Fk×n and
b ∈ Fn . Let S = {x ∈ Fn | Ax = b} be the set of solutions of the system Ax = b and let
U = {x ∈ Fn | Ax = 0} be the set of solutions of the corresponding system of homogeneous linear
equations. Then either S is empty or S is a translate of U .

∗ ∗ ∗

We now study geometric features of Fn , viewed as an affine space.

Proposition 5.1.19. The span of the set S ⊆ Fn is the affine hull of S ∪ {0}.

Proposition 5.1.20. Let S ⊆ Fn be nonempty. Then for any u ∈ S, we have

aff(S) = u + span(S − u) (5.1)

where S − u is the translate of S by −u.

Definition 5.1.21 (Dimension of an affine subspace). The (affine) dimension of an affine subspace
U ≤aff Fn , denoted dimaff U , is the dimension of its corresponding linear subspace (of which it is
a translate). In order to assign a dimension to all affine-closed sets, we adopt the convention that
dim ∅ = −1.

Exercise 5.1.22. Let U ≤ Fk . Then dimaff U = dim U .



Exercise 5.1.23. What are the 0-dimensional affine subspaces?

Definition 5.1.24 (Affine independence). The vectors v1 , . . . , vk ∈ Fn are affine-independent if for
every α1 , . . . , αk ∈ F, the two conditions \sum_{i=1}^{k} α_i v_i = 0 and \sum_{i=1}^{k} α_i = 0 imply α_1 = · · · = α_k = 0.

Proposition 5.1.25. The vectors v1 , . . . , vk ∈ Fn are affine-independent if and only if none of


them belongs to the affine hull of the others.

Fact 5.1.26. Any single vector is affine-independent and affine-closed at the same time.

Proposition 5.1.27 (Translation invariance). Let v1 , . . . , vk ∈ Fn be affine-independent and let


w ∈ Fn be any vector. Then v1 + w, . . . , vk + w are also affine-independent.

Proposition 5.1.28 (Affine vs. linear independence). Let v1 , . . . , vk ∈ Fn .

(a) For k ≥ 0, the vectors v1 , . . . vk are linearly independent if and only if the vectors 0, v1 , . . . , vk
are affine-independent.

(b) For k ≥ 1, the vectors v1 , . . . , vk are affine-independent if and only if the vectors v2 −
v1 , . . . , vk − v1 are linearly independent.

Definition 5.1.29 (Affine basis). An affine basis of an affine subspace W ≤aff Fn is an affine-
independent set S such that aff(S) = W .

Proposition 5.1.30. Let W be an affine subspace of Fn . Every affine basis of W has 1 + dim W
elements.

Corollary 5.1.31. If W1 , . . . , Wk are affine subspaces of Fn , then

\dim_{aff} (aff{W1 , . . . , Wk }) ≤ (k − 1) + \sum_{i=1}^{k} \dim_{aff} W_i .    (5.2)

5.2 (F) Hyperplanes

Definition 5.2.1 (Linear hyperplane). A linear hyperplane of Fn is a subspace of Fn of codimension
1 (Def. 3.5.1).
Definition 5.2.2 (Codimension of an affine subspace). The (affine) codimension of an affine subspace
U ≤aff Fn , denoted codimaff U , is the codimension of its corresponding linear subspace (of which it
is a translate).
Definition 5.2.3 (Hyperplane). A hyperplane is an affine subspace of codimension 1.

Proposition 5.2.4. Let S ⊆ Fn be a hyperplane. Then there exist a nonzero vector a ∈ Fn and
β ∈ F such that aT v = β if and only if v ∈ S.

The vector a whose existence is guaranteed by the preceding proposition is called the normal
vector of the hyperplane S.

Proposition 5.2.5. Let W ≤ Fn . Then

(a) W is the intersection of linear hyperplanes;

(b) if dim W = k, then W is the intersection of n − k linear hyperplanes.

Proposition 5.2.6. Let W ≤aff Fn be an affine subspace. Then

(a) W is the intersection of hyperplanes;

(b) if dimaff W = k, then W is the intersection of n − k hyperplanes.

5.3 (R) Convex combinations


In the preceding section, we studied affine combinations over an arbitrary field F. We now restrict
ourselves to the case where F = R.
Definition 5.3.1 (Convex combination). A convex combination is an affine combination with non-
negative coefficients. So the expression \sum_{i=1}^{k} α_i v_i is a convex combination if \sum_{i=1}^{k} α_i = 1 and α_i ≥ 0
for all i.

Example 5.3.2.

(1/2)v_1 + (1/4)v_2 + (1/6)v_4 + (1/12)v_5

is a convex combination of the vectors v1 , . . . , v5 . Note that the affine combination in Example
5.1.2 is not convex.

Fact 5.3.3. Every convex combination is an affine combination.

Definition 5.3.4 (Convex set). A convex set is a subset S ⊆ Rn that is closed under convex combi-
nations.

Proposition 5.3.5. The intersection of a (finite or infinite) family of convex sets is convex.

Definition 5.3.6. The convex hull of a subset S ⊆ Rn , denoted conv(S), is the smallest convex set
containing S, i. e.,

(a) conv(S) ⊇ S;

(b) conv(S) is convex;

(c) for every convex set T ⊆ Rn , if T ⊇ S then T ⊇ conv(S).

Theorem 5.3.7. Let S ⊆ Rn . Then conv(S) exists and is unique. ♦

Theorem 5.3.8. For S ⊆ R, conv(S) is the set of all convex combinations of the finite subsets of
S. ♦

Proposition 5.3.9. Let S ⊆ Rn . Then S is convex if and only if S = conv(S).

Fact 5.3.10. conv(S) ⊆ aff(S).

Definition 5.3.11 (Straight-line segment). Let u, v ∈ Rn . The straight-line segment connecting u


and v is the convex hull of {u, v}, i. e., the set

conv(u, v) = {λu + (1 − λ)v | 0 ≤ λ ≤ 1} .

Proposition 5.3.12. The set S ⊆ Rn is convex if and only if it contains the straight-line segment
connecting u and v for every u, v ∈ S.

Definition 5.3.13 (Dimension). The dimension of a convex set C ⊆ Rn , denoted dimconv C, is the
dimension of its affine hull. A convex subset C of Rn is full-dimensional if aff C = Rn .
Definition 5.3.14 (Half-space). A closed half-space is a region of Rn defined as {v ∈ Rn | aT v ≥ β}
for some nonzero a ∈ Rn and β ∈ R and is denoted H(a, β). An open half-space is a region of Rn
defined as {v ∈ Rn | aT v > β} for some nonzero a ∈ Rn and β ∈ R and is denoted Ho (a, β).
Exercise 5.3.15. Let a ∈ Rn be a nonzero vector and let β ∈ R. Prove: the set {v ∈ Rn | aT v ≤ β}
is also a (closed) half-space.
Fact 5.3.16. Let S ⊆ Rn be a hyperplane defined by aT v = β. Then S divides Rn into two
half-spaces, defined by aT v ≥ β and aT v ≤ β. The intersection of these two half-spaces is S.

Proposition 5.3.17. (a) Every closed (Def. 19.4.11) convex set is the intersection of the
closed half-spaces containing it.

(b) If S ⊆ Rn is finite then conv(S) is the intersection of a finite number of closed half-spaces.

Exercise 5.3.18. Find a convex set which is not the intersection of any number of open or closed
half-spaces. (Such a set already exists in 2 dimensions.)

Proposition∗ 5.3.19. Every open (Def. 19.4.10) convex set is the intersection of the open
half-spaces containing it.

5.4 (R) Helly’s Theorem


The main result of this section is the following theorem.
Theorem 5.4.1 (Helly’s Theorem). If C1 , . . . , Ck ⊆ Rn are convex sets such that any n + 1 of them
intersect then all of them intersect.
Lemma 5.4.2 (Radon). Let S ⊆ Rn be a set of k ≥ n + 2 vectors in Rn . Then S has two disjoint
subsets S1 and S2 whose convex hulls intersect. ♦
Exercise 5.4.3. Show that the inequality k ≥ n + 2 is tight, i. e., that there exists a set S of n + 1
vectors in Rn such that for any two disjoint subsets S1 , S2 ⊆ S, we have conv(S1 ) ∩ conv(S2 ) = ∅.
Exercise 5.4.4. Use Radon’s Lemma to prove Helly’s Theorem.
Exercise 5.4.5. Show the bound in Helly’s Theorem is tight, i. e., there exist n + 1 convex subsets
of Rn such that every n of them intersect but the intersection of all of them is empty.

5.5 Additional exercises


Observe that a half-space can be defined by n + 1 real parameters,1 i. e., by defining a and β. The
following project asks you to define a similar object which can be defined by 2n − 1 real parameters.

Project 5.5.1. Define a “partially open half-space” which can be defined by 2n − 1 parameters2 so
that every convex set is the intersection of the partially open half-spaces and the closed half-spaces
containing it.

¹In fact, in a well-defined sense, half-spaces can be defined by n real parameters.
²In fact, this object can be defined by 2n − 3 real parameters.
Chapter 6

The Determinant

6.1 Motivation: solving systems of linear equations


Let us consider systems of n linear equations in n unknowns. For n = 2, the system

α11 x1 + α12 x2 = β1 (6.1)


α21 x1 + α22 x2 = β2 (6.2)

is not too cumbersome to solve in general; we obtain

x_1 = \frac{α_{22} β_1 − α_{12} β_2}{α_{22} α_{11} − α_{12} α_{21}}    (6.3)

x_2 = \frac{α_{11} β_2 − α_{21} β_1}{α_{22} α_{11} − α_{12} α_{21}}    (6.4)

For the case of n = 3 we get expressions of the form

x_1 = \frac{N_{31}}{D_3}    (6.5)

x_2 = \frac{N_{32}}{D_3}    (6.6)

x_3 = \frac{N_{33}}{D_3}    (6.7)


where

D3 = α11 α22 α33 + α12 α23 α31 + α13 α21 α32 (6.8)
− α11 α23 α32 − α12 α21 α33 − α13 α22 α31 (6.9)
N31 = α22 α33 β1 + α12 α23 β3 + α13 α32 β2 (6.10)
− α23 α32 β1 − α12 α33 β2 − α13 α22 β3 (6.11)
N32 = α11 α33 β2 + α23 α31 β1 + α13 α32 β3 (6.12)
− α11 α23 β3 − α21 α33 β1 − α13 α31 β2 (6.13)
N33 = α11 α22 β3 + α12 α31 β2 + α21 α32 β1 (6.14)
− α11 α32 β2 − α12 α21 β3 − α22 α31 β1 (6.15)

In particular, the numerators and the denominator of these expressions each have six terms,
half of which have a negative sign. For a system of 4 equations in 4 unknowns, the numerators and
denominator each have 24 terms; again, half of them have a negative sign. For general n we get n!
terms in the numerators and denominator, again half of them with a negative sign.
The denominator of these expressions is known as the determinant of the matrix A (where our
system of linear equations is written as Ax = b). Texts often focus on the rules of how to calculate
the determinant. Before discussing those rules, however, we need to give a definition, what exactly
we wish to calculate.
The determinant is a function from the space of n × n matrices to numbers, that is, det :
Mn (F) → F. Before formulating the definition of this function, we need to discuss permutations.

6.2 Permutations
Definition 6.2.1 (Permutation). A permutation of a set Ω is a bijection f : Ω → Ω. The set Ω is
called the permutation domain.
Definition 6.2.2 (Symmetric group). The symmetric group of degree n, denoted Sn , is the set of all
permutations of the set {1, . . . , n}.
Definition 6.2.3 (Inversion). Let σ ∈ Sn be a permutation of the set {1, . . . , n}, and let 1 ≤ i, j ≤ n
with i ≠ j. We say that the pair {i, j} is inverted by σ if i < j and σ(i) > σ(j) or i > j and
σ(i) < σ(j). We denote by Inv(σ) the number of inversions of σ, that is, the number of pairs {i, j}
which are inverted by σ.

Exercise 6.2.4. What is the maximum possible value of Inv(σ) for σ ∈ Sn ?

Definition 6.2.5 (Even and odd permutations). If Inv(σ) is even, then we say that σ is an even
permutation, and if Inv(σ) is odd, then σ is an odd permutation.

Proposition 6.2.6. Half of the n! permutations of {1, . . . , n} are even.

Definition 6.2.7 (Sign of a permutation). Let σ ∈ Sn be a permutation. The sign of σ, denoted


sgn σ, is defined as
sgn(σ) := (−1)^{Inv(σ)}    (6.16)

In particular, sgn(σ) = 1 if σ is an even permutation and sgn(σ) = −1 if σ is an odd permutation.


The sign of a permutation (equivalently, whether it is even or odd) is also referred to as the parity
of the permutation.
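Definitions 6.2.3 and 6.2.7 translate directly into code; in the Python sketch below, a permutation is given as the tuple of its (0-based) values.

from itertools import combinations, permutations

def inversions(sigma):
    """Number of pairs {i, j}, i < j, with sigma(i) > sigma(j) (Def. 6.2.3)."""
    return sum(1 for i, j in combinations(range(len(sigma)), 2) if sigma[i] > sigma[j])

def sign(sigma):
    """sgn(sigma) = (-1)^Inv(sigma) (Def. 6.2.7)."""
    return -1 if inversions(sigma) % 2 else 1

# Half of the 3! permutations of {0, 1, 2} are even (Prop. 6.2.6):
print(sum(1 for s in permutations(range(3)) if sign(s) == 1))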
Definition 6.2.8 (Composition of permutations). Let Ω be a set, and let σ and τ be permutations
of Ω. Then the composition of σ with τ , denoted στ , is defined by

(στ )(a) := σ(τ (a)) (6.17)

for all a ∈ Ω.
We also refer to the composition of σ with τ as the product of σ and τ .
Definition 6.2.9 (Transposition). Let Ω be a set. The transposition of the elements a ≠ b ∈ Ω is
the permutation that swaps a and b and fixes every other element. Formally, it is the permutation
τ defined by

τ(x) := \begin{cases} b & x = a \\ a & x = b \\ x & \text{otherwise} \end{cases}    (6.18)

This permutation is denoted τ = (a, b). Note that in this notation, (a, b) = (b, a).

Proposition 6.2.10. Transpositions generate Sn , i. e., every permutation σ ∈ Sn can be written


as a composition of transpositions.

Theorem 6.2.11. A permutation σ of {1, . . . , n} is even if and only if σ is a product of an even


number of transpositions. ♦

In the light of this result, we could use its conclusion as the definition of even permutations.
The advantage of this definition is that it can be applied to any set Ω, not just the ordered set
{1, . . . , n}.
Corollary 6.2.12. While the number of inversions of a permutation depends on the ordering of
the permutation domain, its parity (being even or odd) does not.
Definition 6.2.13 (Neighbor transposition). A neighbor transposition of the set {1, . . . , n} is a trans-
position of the form τ = (i, i + 1).
Exercise 6.2.14. Let σ ∈ Sn and let τ be a neighbor transposition. Show

| Inv(σ) − Inv(στ )| = 1 .

Corollary 6.2.15. Let σ ∈ Sn and let τ be a neighbor transposition. Then sgn(στ ) = − sgn(σ).
Proposition 6.2.16. Every transposition is the composition of an odd number of neighbor trans-
positions.
Proposition 6.2.17. Neighbor transpositions generate the symmetric group. That is, every ele-
ment of Sn can be expressed as the composition of neighbor transpositions.
Proposition 6.2.18. Composition with a transposition changes the parity of a permutation.
Corollary 6.2.19. Let σ ∈ Sn be a permutation. Then σ is even if and only if σ is the product of
an even number of transpositions.
Theorem 6.2.20. Let σ, τ ∈ Sn . Then

sgn(στ ) = sgn(σ) sgn(τ ) . (6.19)


Definition 6.2.21 (k-cycle). A k-cycle is a permutation that cyclically permutes k elements {a1 , . . . , ak }
and fixes all others (FIGURE). That is, σ is a k-cycle if σ(ai ) = ai+1 for some elements a1 , . . . , ak
(where ak+1 = a1 ) and σ(x) = x if x ∉ {a1 , . . . , ak }. We denote this permutation by (a1 , a2 , . . . , ak ).
Note that (a1 , a2 , . . . , ak ) = (a2 , a3 , . . . , ak , a1 ).
In particular, transpositions are 2-cycles. Observe that our notation is consistent with the
notation for transpositions given in Def. 6.2.9.

Definition 6.2.22 (Disjoint cycles). Let σ and τ be cycles with permutation domain Ω. Then σ and
τ are disjoint if no element of Ω is permuted by both σ and τ .

Proposition 6.2.23. Every permutation uniquely decomposes into disjoint cycles.

Exercise 6.2.24. Let σ be a k-cycle. Show that σ is an even permutation if and only if k is odd.

Corollary 6.2.25. Let σ be a permutation. Then σ is even if and only if its cycle decomposition
includes an even number of even cycles.

Proposition 6.2.26. Let σ be a permutation. Then Inv (σ −1 ) = Inv(σ).

6.3 Defining the determinant


 
Let A = \begin{pmatrix} α_{11} & \cdots & α_{1n} \\ \vdots & \ddots & \vdots \\ α_{n1} & \cdots & α_{nn} \end{pmatrix} be an n × n matrix. We also write A = (αij ) for short.

Definition 6.3.1 (Determinant). The determinant of an n × n matrix A = (αij ) is defined as the
sum

\det A := \sum_{σ ∈ S_n} \operatorname{sgn}(σ) \prod_{i=1}^{n} α_{i,σ(i)} .    (6.20)

So the determinant is a sum of n! terms, called the expansion terms, each of which is a product
of n entries of the matrix, with a ± sign added. The n factors of each expansion term are arranged
in a “rook configuration”: there is one term from each row and from each column. Note that there
is a bijection between permutations and rook configurations.
Notation 6.3.2. Another notation for the determinant is that we put the matrix entries between
vertical bars:
\det A = |A| = \begin{vmatrix} α_{11} & \cdots & α_{1n} \\ \vdots & \ddots & \vdots \\ α_{n1} & \cdots & α_{nn} \end{vmatrix} .    (6.21)

Exercise 6.3.3. Let A ∈ Mn (Z) be an n × n integral matrix. Show that det A is an integer.

Note that this fact would not be evident if all we knew was a practical algorithm, such as
Gaussian elimination, to compute the determinant: in the course of that computation we inevitably
run into fractions. On the other hand, the impractical n!-term sum that defines the determinant
makes the conclusion immediate.
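The n!-term definition can be transcribed literally; the Python sketch below uses exact integer arithmetic, so the point of Exercise 6.3.3 is visible on the screen: no fractions ever appear. It is hopelessly slow for large n, but fine for checking small examples.

from itertools import combinations, permutations
from math import prod

def sign(sigma):
    inv = sum(1 for i, j in combinations(range(len(sigma)), 2) if sigma[i] > sigma[j])
    return -1 if inv % 2 else 1

def det(A):
    """Determinant via the permutation expansion (6.20)."""
    n = len(A)
    return sum(sign(s) * prod(A[i][s[i]] for i in range(n))
               for s in permutations(range(n)))

A = [[1, 2, 3],
     [0, 1, 4],
     [5, 6, 0]]        # an integral matrix; the result is an integer (Ex. 6.3.3)
print(det(A))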

Proposition 6.3.4 (2 × 2 and 3 × 3 determinants). Show

(a) det \begin{pmatrix} α & β \\ γ & δ \end{pmatrix} = αδ − βγ

(b) det \begin{pmatrix} α_{11} & α_{12} & α_{13} \\ α_{21} & α_{22} & α_{23} \\ α_{31} & α_{32} & α_{33} \end{pmatrix} =
α_{11} α_{22} α_{33} + α_{12} α_{23} α_{31} + α_{13} α_{21} α_{32} − α_{11} α_{23} α_{32} − α_{12} α_{21} α_{33} − α_{13} α_{22} α_{31}

6.4 Properties of determinants


For the following exercises, let A be an n × n matrix over the field F.

Proposition 6.4.1 (Transpose). Show that det(AT ) = det(A). This fact follows from what prop-
erty of inversions?

As a consequence, in every statement below, “column” can be replaced by “row.”

Proposition 6.4.2 (Zero column). If a column of A is 0, then det A = 0.

Proposition 6.4.3 (Common factor of a column). Let B be a matrix obtained from A by multi-
plying every element of a column by c. Then det(B) = c · det(A).

Proposition 6.4.4 (Swapping two columns). Let B be the matrix obtained by swapping two
columns of A. Then det(B) = − det(A).

Proposition 6.4.5 (Equal columns). If two columns of A are equal, then det A = 0.

Warning: prove this without using the fact that elementary column operations do not change
the determinant (Prop. 6.4.11). This is easier to prove if the characteristic of the field is not
2, in other words, if 1 + 1 ≠ 0 in F (Def. 14.4.8). But the statement holds over all fields.

Proposition 6.4.6 (Diagonal matrices). The determinant of a diagonal matrix is the product of
its diagonal entries.
Proposition 6.4.7 (Triangular matrices). The determinant of an upper triangular matrix is the
product of its diagonal entries.
Example 6.4.8.

det \begin{pmatrix} 5 & 1 & 7 \\ 0 & 2 & 6 \\ 0 & 0 & 3 \end{pmatrix} = 30    (6.22)
and this value does not depend on the three entries in the upper-right corner.
Definition 6.4.9 (k-linearity). A function f : V × · · · × V → W (with k factors in the product) is
linear in the i-th component if, whenever we fix x1 , . . . , xi−1 , xi+1 , . . . , xk , the function

g(y) := f (x1 , . . . , xi−1 , y, xi+1 , . . . , xk )

is linear,¹ i. e.,

g(y1 + y2 ) = g(y1 ) + g(y2 )    (6.23)
g(αy) = αg(y)    (6.24)

The function f is k-linear if it is linear in all k components. A function which is 2-linear is said
to be bilinear.
Proposition 6.4.10 (Multilinearity). The determinant is multilinear in the columns of A. That
is,

det[a1 | · · · | ai−1 | ai + b | ai+1 | · · · | an ]


= det[a1 | · · · | an ]
+ det[a1 | · · · | ai−1 | b | ai+1 | · · · | an ] (6.25)
and

det[a1 | · · · | ai−1 | αai | ai+1 | · · · | an ]


= α det[a1 | · · · | an ] . (6.26)
¹Linear maps will be discussed in more detail in Chapter 16.

Proposition 6.4.11 (Elementary column operations vs. determinant). Performing elementary
column operations (Def. 3.2.1) does not change the determinant of a matrix.

Corollary 6.4.12. If A ∈ Mn (F) is singular then det(A) = 0.

Proposition 6.4.13 (Multiplicativity of the determinant). Let A, B ∈ Mn (F). Then det(AB) =


det(A) det(B).

Corollary 6.4.14. If A ∈ Mn (F) is nonsingular then det(A) ≠ 0 and det(A−1 ) = 1/ det(A).

Corollary 6.4.15. A ∈ Mn (F) is nonsingular if and only if det(A) ≠ 0.

We are ready for a more complete list of conditions equivalent to nonsingularity, augmenting
Theorem 3.4.12. The most significant addition is the first item: the determinantal characterization
of nonsingularity.

Theorem 6.4.16 (Equivalent conditions of nonsingularity). An n × n matrix A is nonsingular if


and only if any of the following equivalent conditions is satisfied.

(a) det(A) ≠ 0.

(b) rk(A) = n, i. e., A has full rank.

(c) The columns of A are linearly independent, i. e., A has full column rank.

(d) The rows of A are linearly independent, i. e., A has full row rank.

(e) The columns of A span Fn .

(f ) The rows of A span Fn .

(g) A has a left inverse.

(h) A has a right inverse.

(i) A has an inverse.

(j) The system Ax = 0 of homogeneous linear equation has no nontrivial solution.

(k) For every b ∈ Fn , the system Ax = b of linear equations has a solution.




 
Numerical exercise 6.4.17. Let A = \begin{pmatrix} 2 & 1 & −3 \\ 4 & −1 & 0 \\ 2 & 5 & −1 \end{pmatrix} .

(a) Use the formula derived in part (b) of Prop. 6.3.4 to compute det A.

(b) Compute the matrix A0 obtained by performing the column operation (1, 2, −4) on A.

(c) Self-check : Use the same formula to compute det A0 , and verify that det A0 = det A.

6.5 Expressing rank via determinants


Corollary 6.5.1 (Determinant vs. system of linear equations). The homogeneous system Ax = 0
of n linear equations in n unknowns has a nontrivial solution if and only if det A = 0.

We now turn to characterizing the rank of a matrix via determinants. Recall Theorem 3.3.12.

Theorem 6.5.2. Let A ∈ Fk×n be a matrix. Then rk A is the largest value of r such that A has a
nonsingular r × r submatrix. ♦

Combining this result with our determinantal characterization of nonsingularity, we obtain a


determinantal characterization of the rank of a matrix.

Corollary 6.5.3 (Determinantal characterization of rank). Let A ∈ Fk×n be a matrix. Then rk A


is the largest value of r such that A has an r × r submatrix B such that det(B) ≠ 0.

6.6 Dependence of the rank on the field of scalars


In this section we assume familiarity with the concept of a field and its characteristic (see Sec-
tion 14.4). We draw conclusions from the determinantal characterization of the rank to the depen-
dence of the rank on the field of scalars.
Let A ∈ Fk×n and let F be a subfield of the field G. Then we can also view A as a matrix
over G. However, this changes the notion of linear combinations and therefore in principle it could
affect the rank. We shall see that this is not the case, but in order to be able to reason about this

question, we temporarily use the notation rkF (A) and rkG (A) to denote the rank of A with respect
to the corresponding fields. We will also write rkp to mean rkFp and we shall compare the rank of
an integral matrix in characteristic zero and in characteristic p.
Lemma 6.6.1 (Nonsingularity is insensitive to field extensions). Let F be a subfield of G, and let
A ∈ Mn (F). Then A is nonsingular over F if and only if A is nonsingular over G. ♦
Corollary 6.6.2 (Rank insensitivity to field extension). Let F be a subfield of G, and let A ∈ Fk×n .
Then
rkF (A) = rkG (A) .
Integral matrices (matrices with integer entries) can be interpreted as matrices over any field.
Corollary 6.6.3. Let A be an integral matrix. Then rkF (A) only depends on the characteristic of
F.
In particular, rkQ (A) = rkR (A).
Notation 6.6.4. Let A be an integral matrix and p a prime number or zero. We write rkp (A) to
denote the rank of A over any field of characteristic p.
Exercise 6.6.5. Show that this notation is sound, i. e., rkF (A) only depends on char(F).
Exercise 6.6.6. Let A ∈ Zk×n be an integral matrix.
(a) Show that rkp (A) ≤ rk(A).

(b) For every prime number p, find a (0, 1) matrix A (that is, a matrix whose entries are only 0
and 1) where this inequality is strict, i. e., rkp (A) < rk(A).
Next we consider the rank of real matrices.

Theorem 6.6.7. Let A be a matrix with real entries. Then rk AT A = rk(A). ♦

Exercise 6.6.8. (a) Find a 2 × 2 matrix A over C such that rk AT A < rk(A).

(b) For every prime number p find n ∈ N and a matrix A ∈ Mn (Fp ) such that rkp AT A < rkp (A).
Minimize n. Show that n = 2 suffices if p = 2 or p ≡ 1 (mod 4) and n = 3 suffices for all p.

(c) For every prime number p find n ∈ N and a (0, 1)-matrix A such that rkp AT A < rkp (A).
What is the smallest value of n as a function of p you can find?

Exercise 6.6.9.

(a) Show that rk(Jn − In ) = n .

(b) Show that

rk_2 (Jn − In ) = \begin{cases} n & n \text{ even} \\ n − 1 & n \text{ odd.} \end{cases}

6.7 Cofactor expansion


Definition 6.7.1 (Cofactor). Let A ∈ Mn (F) be a matrix. The (i, j) cofactor of A is (−1)^{i+j} det(î Aĵ ),
where î Aĵ is the (n − 1) × (n − 1) matrix obtained by removing the i-th row and j-th column of A.

Theorem 6.7.2 (Cofactor expansion). Let A = (αij ) be an n × n matrix, and let Cij be the (i, j)
cofactor of A. Then for all i,

\det A = \sum_{j=1}^{n} α_{ij} C_{ij} .    (6.27)

This is the cofactor expansion of A along the i-th row. Similarly, for all j,

\det A = \sum_{i=1}^{n} α_{ij} C_{ij} .    (6.28)

This is the cofactor expansion of A along the j-th column. ♦

Numerical exercise 6.7.3. Compute the determinants of the following matrices by cofactor ex-
pansion (a) along the first row and (b) along the second column. Self-check : your answers should
be the same.
 
(a)  [ 2  3  1]
     [ 0 −4 −1]
     [ 1 −3  4]

(b)  [ 3 −3  2]
     [ 4  7 −1]
     [ 6 −4  2]

(c)  [ 1  3  2]
     [ 3 −1  0]
     [ 0  6  5]

(d)  [ 6  2 −1]
     [ 0  4  1]
     [−3  1  1]
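Here is a short Python sketch (function name and example matrix are ours, so as not to spoil the exercises above) of cofactor expansion along the first row, checked against NumPy.

import numpy as np

def det_cofactor(A):
    """Determinant by cofactor expansion along the first row (exponential time; for illustration only)."""
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    total = 0.0
    for j in range(n):
        minor = np.delete(np.delete(A, 0, axis=0), j, axis=1)
        total += (-1) ** j * A[0, j] * det_cofactor(minor)
    return total

A = np.array([[2., 1., 3.],
              [0., 4., 1.],
              [5., 2., 0.]])
print(det_cofactor(A), np.linalg.det(A))   # both equal -59 (up to rounding)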
Exercise 6.7.4. Compute the determinants of the following matrices.

(a) the n × n matrix with every diagonal entry equal to α and every off-diagonal entry equal to β:

     [α β β · · · β β]
     [β α β · · · β β]
     [β β α · · · β β]
     [       · · ·    ]
     [β β β · · · α β]
     [β β β · · · β α]

(b) the n × n matrix with 1 in every entry of the main diagonal and of the two adjacent diagonals, and 0 elsewhere:

     [1 1 0 0 · · · 0 0 0]
     [1 1 1 0 · · · 0 0 0]
     [0 1 1 1 · · · 0 0 0]
     [        · · ·      ]
     [0 0 0 0 · · · 1 1 1]
     [0 0 0 0 · · · 0 1 1]

(c) the matrix obtained from (b) by changing the entries below the main diagonal to −1:

     [ 1  1  0  0 · · ·  0  0  0]
     [−1  1  1  0 · · ·  0  0  0]
     [ 0 −1  1  1 · · ·  0  0  0]
     [            · · ·         ]
     [ 0  0  0  0 · · · −1  1  1]
     [ 0  0  0  0 · · ·  0 −1  1]

Exercise 6.7.5. Compute the determinant of the Vandermonde matrix (Def. 2.5.9) generated
by α1 , . . . , αn .
Definition 6.7.6 (Fixed point). Let Ω be a set, and let f : Ω → Ω be a permutation of Ω. We say
that x ∈ Ω is a fixed point of f if f (x) = x. We say that f is fixed-point-free, or a derangement, if
f has no fixed points.

♥ Exercise 6.7.7. Let Fn denote the number of fixed-point free permutations of the set {1, . . . , n}.
Decide, for each n, whether Fn is odd or even (the answer will depend on n).

6.8 Determinantal formula for the inverse matrix


We now derive an explicit form for the inverse of a nonsingular n × n matrix. Our tool for this
will be the cofactor expansion together with the skew cofactor expansion. In Theorem 6.7.2, we
took the dot product of the i-th column of a matrix with the cofactors corresponding to the i-th
column. We now instead take the dot product of the i-th column with the cofactors corresponding
to the j-th column, where j ≠ i.
Proposition 6.8.1 (Skew cofactor expansion). Let A = (αij ) be an n × n matrix, and fix i and j
(1 ≤ i, j ≤ n) such that i ≠ j. Let Ckj be the (k, j) cofactor of A. Then
                    Σ_{k=1}^{n} αki Ckj = 0 .                    (6.29)

Definition 6.8.2 (Adjugate of a matrix). Let A ∈ Mn (F). Then the adjugate of A, denoted adj(A),
is the matrix whose (i, j) entry is the (j, i) cofactor of A.
Theorem 6.8.3 (Explicit form of the matrix inverse). Let A be a nonsingular n × n matrix. Then
                    A−1 = (1 / det A) · adj(A) .                    (6.30)
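The following Python sketch (helper name and example matrix are ours) computes the adjugate from the cofactors and verifies the identity A · adj(A) = det(A) · I that lies behind Theorem 6.8.3.

import numpy as np

def adjugate(A):
    """Adjugate of a square matrix: the transpose of its cofactor matrix."""
    n = A.shape[0]
    C = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
            C[i, j] = (-1) ** (i + j) * np.linalg.det(minor)   # (i, j) cofactor
    return C.T

A = np.array([[2., 0., 1.],
              [1., 3., -1.],
              [0., 2., 4.]])
print(np.round(A @ adjugate(A), 8))                                      # det(A) * I, with det(A) = 30
print(np.allclose(np.linalg.inv(A), adjugate(A) / np.linalg.det(A)))     # True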

Corollary 6.8.4. Let A ∈ Mn (Z) be an integral n × n matrix. A has an integral inverse A−1 ∈
Mn (Z) if and only if det(A) = ±1.
Exercise 6.8.5. Let n be odd and let A ∈ Mn (Z) be a nonsingular symmetric matrix whose
diagonal entries are all 0. Show that A−1 is not integral.
Numerical exercise 6.8.6. Compute the inverses of the following matrices. Self-check : multiply
your answer by the original matrix to get the identity.
 
(a)  [ 1 −3]
     [−2  4]
 
(b)  [−4  2]
     [−1 −1]

(c)  [ 3  4 −7]
     [−2  1 −4]
     [ 0 −2  5]

(d)  [−1  4  2]
     [−3  2 −3]
     [ 1  0  2]

6.9 Cramer’s Rule


Cramer’s rule gives the explicit unique solution to n × n nonsingular systems of linear equations.
The solution is based on the explicit form of the inverse matrix in terms of determinants. This is
how the concept of determinants arose.

Proposition 6.9.1. Let A be a nonsingular square matrix. Then the system Ax = b of n linear
equations in n unknowns is solvable and has a unique solution.

Exercise 6.9.2. Prove Prop. 6.9.1 by showing that x = A−1 b is the unique solution.

Theorem 6.9.3 (Cramer’s Rule). Let A be a nonsingular n × n matrix over F, and let b ∈ Fn .
Let a = (α1 , . . . , αn )T = A−1 b denote the unique solution of the system Ax = b of linear equations.
Then
                    αi = det(Ai ) / det(A)                    (6.31)
where Ai is the matrix obtained by replacing the i-th column of A by b. ♦
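A minimal Python sketch of Cramer's Rule (our own 2 × 2 example, not one of the exercises below):

import numpy as np

def cramer(A, b):
    """Solve Ax = b for nonsingular A using Cramer's Rule."""
    d = np.linalg.det(A)
    x = np.empty(len(b))
    for i in range(len(b)):
        Ai = A.copy()
        Ai[:, i] = b                     # replace the i-th column of A by b
        x[i] = np.linalg.det(Ai) / d
    return x

A = np.array([[2., 1.],
              [1., 3.]])
b = np.array([5., 10.])
print(cramer(A, b))                      # [1. 3.]
print(np.linalg.solve(A, b))             # the same solution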

Numerical exercise 6.9.4. Use Cramer’s Rule to solve the following systems of linear equations.
Self-check : plug your answers back into the original equations.

(a)

x1 + 2x2 = 3
x1 − x2 = 6

(b)

−x1 + x2 − x3 = 4
2x1 + 3x2 + 4x3 = −2
−x1 − x2 − 3x3 = 3

6.10 Additional exercises


Definition 6.10.1 (Skew-symmetric matrix). The matrix A ∈ Mn (F) is skew-symmetric if AT = −A
and all diagonal elements of A are zero.

Exercise 6.10.2. Observe that if char(F) ≠ 2, in other words, if 1 + 1 ≠ 0 in F (Def. 14.4.8),
then the second condition is redundant: if AT = −A then all diagonal elements are automatically
zero. However, this conclusion is false if char(F) = 2; in that case, the condition AT = −A just
means that A is symmetric (AT = A).
Exercise 6.10.3. Let F be a field.
(a) Show that if A ∈ Mn (F) is skew-symmetric and n is odd then A is singular.

(b) For all even n, find a nonsingular skew-symmetric matrix A ∈ Mn (F).


Warning: item (a) is easy if char(F) ≠ 2 but not quite so simple if char(F) = 2.
Exercise 6.10.4. Show that part (a) of the preceding exercise is false over F2 (and every field of
characteristic 2).
Exercise 6.10.5. Let F be a field of characteristic 2 and let n be odd. If A ∈ Mn (F) is symmetric2
and all of its diagonal entries are 0, then A is singular.
Proposition 6.10.6 (Hadamard's Inequality). Let A = [a1 | · · · | an ] ∈ Mn (R). Prove
                    | det A| ≤ ∏_{j=1}^{n} ‖aj ‖ ,                    (6.32)
where ‖aj ‖ is the norm of the vector aj (Def. 1.5.2).


2
Observe that in a field of characteristic 2, “skew-symmetric” means the same thing as “symmetric.”

Exercise 6.10.7. When does equality hold in Hadamard’s Inequality?


Definition 6.10.8 (Parallelepiped). Let v1 , v2 ∈ R2 . The parallelogram spanned by v1 and v2 is the
set (PICTURE)
        parallelogram(v1 , v2 ) := {αv1 + βv2 | 0 ≤ α, β ≤ 1} .                    (6.33)
More generally, the parallelepiped spanned by the vectors v1 , . . . , vn ∈ Rn is the set
        parallelepiped(v1 , . . . , vn ) := { Σ_{i=1}^{n} αi vi | 0 ≤ αi ≤ 1 } .                    (6.34)

Exercise 6.10.9. Let a, b ∈ R2 . Show that the area of the parallelogram spanned by these vectors
(PICTURE) is | det A|, where A = [a | b] ∈ M2 (R).
The following theorem is a generalization of Ex. 6.10.9.
Theorem 6.10.10. If v1 , . . . , vn ∈ Rn , then the volume of parallelepiped(v1 , . . . , vn ) is | det A|,
where A is the n × n matrix whose i-th column is vi . ♦
Notation 6.10.11. Let A ∈ Fk×n and B ∈ Fn×k be matrices, and let I ⊆ [n]. The matrix AI is the
matrix whose columns are the columns of A which correspond to the elements of I. The matrix I B
is ((B T )I )T , i. e., the matrix whose rows are the rows of B which correspond to the elements of I.

Example 6.10.12. Let

     A = [3 1 7]        B = [−4 1]
         [2 3 2] ,          [ 6 2]
                            [−1 0] ,

and let I = {1, 3}. Then

     AI = [3 7]         and    I B = [−4 1]
          [2 2]                      [−1 0] .

Theorem 6.10.13 (Cauchy-Binet formula). Let A ∈ Fk×n and let B ∈ Fn×k . Show that
                    det(AB) = Σ_{I⊆[n], |I|=k} det(AI ) det(I B) .                    (6.35)
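The Cauchy–Binet formula is easy to test numerically. The Python sketch below (code and variable names are ours) verifies (6.35) for the matrices of Example 6.10.12.

import itertools
import numpy as np

A = np.array([[3., 1., 7.],
              [2., 3., 2.]])             # k x n with k = 2, n = 3
B = np.array([[-4., 1.],
              [6., 2.],
              [-1., 0.]])                # n x k
k, n = A.shape

lhs = np.linalg.det(A @ B)
rhs = sum(np.linalg.det(A[:, list(I)]) * np.linalg.det(B[list(I), :])
          for I in itertools.combinations(range(n), k))
print(lhs, rhs)                          # both equal -144 (up to rounding)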


Chapter 7

Theory of Systems of Linear Equations II: Cramer's Rule

Chapter 8

(F) Eigenvectors and Eigenvalues

8.1 Eigenvector and eigenvalue basics


Definition 8.1.1 (Eigenvector). Let A ∈ Mn (F) and let v ∈ Fn . Then v is an eigenvector of A if
v 6= 0 and there exists λ ∈ F such that Av = λv.
Definition 8.1.2 (Eigenvalue). Let A ∈ Mn (F). Then λ ∈ F is an eigenvalue of A if there exists a
nonzero vector v ∈ Fn such that Av = λv.
Exercise 8.1.3. What are the eigenvalues and eigenvectors of the n × n identity matrix?
Exercise 8.1.4. Let D = diag(λ1 , . . . , λn ) be the n × n diagonal matrix whose diagonal entries are
λ1 , . . . , λn . What are the eigenvalues and eigenvectors of D?
 
Exercise 8.1.5. Determine the eigenvalues of the 2 × 2 matrix [1 1; 1 0].
Exercise 8.1.6. Find examples of real 2 × 2 matrices with
(a) no eigenvectors in R2
(b) exactly one eigenvector in R2 , up to scaling (i. e., any eigenvector is a scalar multiple of any
other eigenvector)
(c) two eigenvectors in R2 , up to scaling
(d) infinitely many eigenvectors in R2 , none of which is a scalar multiple of any other


Exercise 8.1.7. Prove: If λ is an eigenvalue of A ∈ Mn (F) then for every k ≥ 0, λk is an eigenvalue
of Ak .

Exercise 8.1.8. Show that the only eigenvalue of a nilpotent matrix is 0 (Def. 2.2.18).

Proposition 8.1.9. Show that eigenvectors of a matrix corresponding to distinct eigenvalues are
linearly independent.

Exercise 8.1.10. Let A ∈ Mn (F) be a matrix and let v1 , . . . , vk be eigenvectors to distinct eigen-
values, where k ≥ 2. Then v1 + · · · + vk is not an eigenvector.

Exercise 8.1.11. Let A ∈ Mn (F) be a matrix to which every nonzero vector is an eigenvector.
Then A is a scalar matrix (Def. 2.2.13).

Definition 8.1.12 (Eigenbasis). Let A ∈ Mn (F) be a matrix. An eigenbasis of A is a basis of Fn


made up of eigenvectors of A.

Exercise 8.1.13. Find all eigenbases of I.

Exercise 8.1.14. Let λ1 , . . . , λn be scalars. Find an eigenbasis of diag(λ1 , . . . , λn ).


 
Exercise 8.1.15. Prove that the matrix [1 1; 0 1] has no eigenbasis.

Exercise 8.1.16. Let A ∈ Mn (F). Then A is singular if and only if 0 is an eigenvalue of A.

Definition 8.1.17 (Left eigenvector). Let A ∈ Mn (F). Then x ∈ F1×n is a left eigenvector if x 6= 0
and there exists λ ∈ F such that xA = λx.
Definition 8.1.18 (Left eigenvalue). Let A ∈ Mn (F). Then λ ∈ F is a left eigenvalue of A if there
exists a nonzero row vector v ∈ F1×n such that vA = λv.
Convention 8.1.19. When we use the word “eigenvector” without a modifier, we refer to a right
eigenvector; the term “right eigenvector” is occasionally used for clarity.

Exercise 8.1.20. Prove that the left eigenvalues and the right eigenvalues of a matrix A ∈ Mn (F)
are the same, and if λ is an eigenvalue then its right and left geometric multiplicities are the same.

We shall see that the left eigenvalues and the right eigenvalues are the same.

Exercise 8.1.21. Let A ∈ Mn (F). Show that if x is a right eigenvector to eigenvalue λ and yT is
a left eigenvector to eigenvalue µ, and λ 6= µ, then y · x = 0, i. e., x ⊥ y.

Notation 8.1.22. Let A ∈ Mn (F). We denote by Uλ (A) the set

Uλ := {v ∈ Fn | Av = λv} . (8.1)

We simply write Uλ for Uλ (A) if the matrix A is clear from the context.

Exercise 8.1.23. (a) Let A ∈ Mn (F) and λ ∈ F. Then Uλ = ker(λI − A). In particular, Uλ ≤ Fn
(subspace).

(b) Uλ consists of 0 and all eigenvectors of A to eigenvalue λ.

(c) λ is an eigenvalue if and only if dim(Uλ ) ≥ 1.

Definition 8.1.24 (Eigenspace and geometric multiplicity). Let A ∈ Mn (F) be a square matrix and
let λ be an eigenvalue of A. Then we call Uλ the eigenspace corresponding to the eigenvalue λ and
we call the dimension of Uλ the geometric multiplicity of λ.
The next exercise provides a method of calculating the geometric multiplicity.

Exercise 8.1.25. Let λ be an eigenvalue of the n × n matrix A. Then the geometric multiplicity
of λ is n − rk(λI − A).
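A quick numerical illustration of this formula (a Python sketch on our own example matrix):

import numpy as np

A = np.array([[2., 1., 0.],
              [0., 2., 0.],
              [0., 0., 2.]])
lam = 2.0
n = A.shape[0]
print(n - np.linalg.matrix_rank(lam * np.eye(n) - A))   # 2: the eigenspace U_2 has dimension 2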

Exercise 8.1.26. We can analogously define left eigenspaces and left geometric multiplicity. Prove:
λ ∈ F is a right eigenvalue if and only if it is a left eigenvalue, and if so, λ has the same right and
left geometric multiplicity.

Exercise 8.1.27. Let N be a nilpotent matrix.

(a) Show that I + N is invertible.

(b) Let D be a nonsingular diagonal matrix. Show that D + N is invertible.

(c) Find a nonsingular matrix A and a nilpotent matrix N such that A + N is singular.

8.2 Similar matrices and diagonalizability


Definition 8.2.1 (Similar matrices). Let A, B ∈ Mn (F). Then A and B are similar (notation:
A ∼ B) if there exists a nonsingular matrix S such that B = S −1 AS.

Exercise 8.2.2. Let A and B be similar matrices. Show that, for k ≥ 0, we have Ak ∼ B k .

Proposition 8.2.3. Let A ∼ B (Def. 8.2.1). Then det A = det B.
Exercise 8.2.4. Let N ∈ Mn (F) be a matrix. Then N is nilpotent if and only if it is similar to a
strictly upper triangular matrix.
Proposition 8.2.5. Every matrix is similar to a block diagonal matrix where each block has the
form αI + N where α ∈ F and N is nilpotent.
Blocks of this form are “proto-Jordan blocks.” We will discuss Jordan blocks and the Jordan
canonical form of matrices in Section ??
Definition 8.2.6 (Diagonalizability). Let A ∈ Mn (F). We say that A is diagonalizable if A is similar
to a diagonal matrix.
Exercise 8.2.7.
 
(a) Prove that [1 1; 0 1] is not diagonalizable.

(b) Prove that [1 2; 0 3] is diagonalizable.

(c) When is the matrix [α11 α12; 0 α22] diagonalizable?
Proposition 8.2.8. Let A ∈ Mn (F). Then A is diagonalizable if and only if A has an eigenbasis.

8.3 Polynomial basics


We now digress from the theory of eigenvectors and eigenvalues in order to build a foundation in
polynomials. In this book, the term “polynomials” will refer to univariate polynomials (polynomials
in one variable), unless we expressly refer to multivariate polynomials.
We will discuss polynomials in greater depth in Section 14.5, but here we establish some basics
necessary for studying the characteristic polynomial in Section 8.4.

Definition 8.3.1 (Polynomial). A polynomial over the field F in the variable t is an expression1 of
the form
f (t) = α0 + α1 t + α2 t2 + · · · + αn tn (8.2)
We omit (t) from the name of the polynomial if the name of the variable is either not relevant to
our discussion or it is clear from the context.
The αi are the coefficients of f . We may omit any terms with zero coefficient, e. g.,

6 + 0t − 3t2 + 0t3 = 6 − 3t2 (8.3)

The set of polynomials over the field F is denoted F[t].


Definition 8.3.2 (Zero polynomial). The polynomial which has all coefficients equal to zero is called
the zero polynomial and is denoted by 0.
Definition 8.3.3. The leading term of a polynomial f (t) = α0 + α1 t + · · · + αn tn is the term corre-
sponding to the highest power of t with a nonzero coefficient, that is, the term αk tk where αk 6= 0
and αj = 0 for all j > k. The zero polynomial does not have a leading term. The leading coefficient
of a polynomial f is the coefficient of the leading term of f . The degree of a polynomial f , denoted
deg f , is the exponent of its leading term. A polynomial is monic if its leading coefficient is 1.
Convention 8.3.4. The zero polynomial has degree −∞.
Example 8.3.5. Let f = 6 − 3t2 + 4t5 . Then the leading term of f is 4t5 , the leading coefficient
of f is 4, and deg f = 5.
Exercise 8.3.6. Which polynomials have degree 0?
Notation 8.3.7. We denote the set of polynomials of degree at most n over F by Pn (F).
Exercise 8.3.8. Pn (F) is a subspace of dimension n + 1 in F[t]. The polynomials 1, t, t2 , . . . , tn
form a basis of Pn (F). (See the relevant definitions in Section 15.3.)
Definition 8.3.9 (Divisibility of polynomials). Let f, g ∈ F[t]. We say that g divides f , or f is
divisible by g, written g | f , if there exists a polynomial h ∈ F[t] such that f = gh. In this case we
also say that g is a divisor of f and f is a multiple of g.
Exercise 8.3.10. Notice that 0 | 0. Why does this not violate the prohibition against division by
zero?
1
In Chapter 14, we will refine this definition and study polynomials more formally.

Definition 8.3.11 (Root of a polynomial). Let f ∈ F[t] be a polynomial. Then ζ ∈ F is a root of
f if f (ζ) = 0. The roots of a polynomial are also often (confusingly) referred to as the zeros of the
polynomial.

Proposition 8.3.12. Let f ∈ F[t] be a polynomial and let ζ ∈ F.

(a) t − ζ | f (t) − f (ζ) .

(b) ζ is a root of f if and only if t − ζ | f .

Definition 8.3.13 (Multiplicity of a root). Let f be a polynomial and let ζ be a root of f . The
multiplicity of the root ζ is the largest k for which (t − ζ)k | f .

Proposition 8.3.14. Let f be a polynomial of degree n. Then f has at most n distinct roots.
Moreover, the sum of the multiplicities of the roots is still at most n.

Definition 8.3.15. Let f ∈ F[t] be a nonzero polynomial. We say that f splits into linear factors
over F if f can be written in the form
                    f (t) = αk ∏_{i=1}^{k} (t − ζi )                    (8.4)
where αk is the leading coefficient of f and ζi ∈ F.

Exercise 8.3.16. This factoring, if it exists, is unique up to the ordering of the factors.

Next we study the relation between the roots and the coefficients of a polynomial that splits
into linear factors.

Proposition 8.3.17. Let f (t) = α0 + α1 t + · · · + αn tn ∈ F[t] be a monic polynomial of degree n
(αn = 1) that factors into linear factors over F:
                    f (t) = ∏_{i=1}^{n} (t − λi ) .                    (8.5)
Then, for 0 ≤ ℓ ≤ n,
                    αn−ℓ = (−1)^ℓ Σ_{i1 <···<iℓ} λi1 λi2 · · · λiℓ .                    (8.6)

In particular,
                    αn−1 = − Σ_{i=1}^{n} λi , and                    (8.7)
                    α0 = (−1)^n ∏_{i=1}^{n} λi .                    (8.8)

Definition 8.3.18 (Algebraically closed field). We say that the field F is algebraically closed if every
polynomial of degree ≥ 1 over F has at least one root in F.
The field of real numbers is not algebraically closed: the polynomial t2 + 1 has no real roots.

Theorem 8.3.19 (Fundamental Theorem of Algebra). The field of complex numbers is algebraically
closed. ♦

Proposition 8.3.20. Let F be an algebraically closed field. Let f ∈ F[t] be a nonzero polynomial.
Then f splits into linear factors over F.

Exercise 8.3.21. If F is algebraically closed and f ∈ F[t] is a nonzero polynomial then the sum of
the multiplicities of the roots of f is deg f .

To exploit all the consequences of Prop. 8.3.20, it is often helpful to extend our field to an
algebraically closed field.

Theorem 8.3.22. Every field is a subfield of an algebraically closed field. ♦

8.4 The characteristic polynomial


We now establish a method for finding the eigenvalues of an n × n matrix.
Definition 8.4.1 (Characteristic polynomial). Let A be an n × n matrix. Then the characteristic
polynomial of A is defined by
fA (t) := det(tI − A) . (8.9)
 
Exercise 8.4.2. Calculate the characteristic polynomial of the matrix A = [α β; γ δ].

Exercise 8.4.3. Observe that the characteristic polynomial of an n×n matrix is a monic polynomial
of degree n.
Theorem 8.4.4. The eigenvalues of a square matrix A are precisely the roots of its characteristic
polynomial. ♦

In Section 8.1, we defined the geometric multiplicity of an eigenvalue (Def. 8.1.24). We
now define the algebraic multiplicity of an eigenvalue.
Definition 8.4.5 (Algebraic multiplicity). Let A be a square matrix and let λ be an eigenvalue of
A. The algebraic multiplicity of λ is its multiplicity as a root of the characteristic polynomial of A.
Proposition 8.4.6. Let A be a square matrix and let λ be an eigenvalue of A. Then the geometric
multiplicity of λ is less than or equal to the algebraic multiplicity of λ.
Remark 8.4.7. We shall often make the assumption that the characteristic polynomial of the matrix
A ∈ Mn (F) splits into linear factors over F. We should point out that this condition is automatically
satisfied if
(i) A is triangular, or

(ii) F is algebraically closed, or

(iii) F = R and A is symmetric.


Proposition 8.4.8. Let A = (aij ) ∈ Mn (F) be a triangular matrix. Then fA (t) = ∏_{i=1}^{n} (t − aii ) .
So for a triangular matrix, the eigenvalues are exactly the diagonal entries. The following exercise
amplifies the significance of this observation.
Theorem 8.4.9. Let A ∈ Mn (F). Assume the characteristic polynomial of A splits into linear
factors over F. Then A is similar (over F) to an upper triangular matrix. ♦
♥ Exercise 8.4.10. Let A = (aij ) ∈ Mn (F) be a triangular matrix, and assume all diagonal entries
of A are distinct. Prove that A is diagonalizable and therefore A ∼ diag(a11 , a22 , . . . , ann ).
We shall use this exercise, in combination with Theorem 8.4.9, in our proof of the Cayley–
Hamilton Theorem (Section 8.5).
Next we study the relation between the matrix and the coefficients of its characteristic polyno-
mial.

Proposition 8.4.11. Let the characteristic polynomial of the matrix A ∈ Mn (F) be


fA (t) = α0 + α1 t + · · · + αn−1 tn−1 + tn . Then
αn−1 = − Tr A and (8.10)
α0 = (−1)n det A . (8.11)
We are now ready to show two key relations between the coefficients of the characteristic
polynomial and the eigenvalues. The following result is immediate by comparing Prop. 8.4.11 and
the relation between the roots and the coefficients of a polynomial described in Prop. 8.3.17.
Theorem 8.4.12. Let A ∈ Mn (F) and let fA (t) = α0 + α1 t + · · · + αn−1 tn−1 + tn . Assume fA splits
into linear factors over F, say fA (t) = ∏_{i=1}^{n} (t − λi ). Then
                    Tr A = Σ_{i=1}^{n} λi                    (8.12)
                    det A = ∏_{i=1}^{n} λi .                    (8.13)


Proposition 8.4.13. Let F be algebraically closed. Let A be a square matrix over F. Then A
is diagonalizable over F if and only if for every eigenvalue λ of A, the geometric and algebraic
multiplicities of λ are equal.

8.5 The Cayley-Hamilton Theorem

In Section 2.3, we defined how to substitute a square matrix into a polynomial (Def. 2.3.3).
We repeat that definition here.

Definition 8.5.1 (Substitution of a matrix into a polynomial). Let f ∈ F[t] be the polynomial
(Def. 8.3.1) defined by
f = α0 + α1 t + · · · + αd td .
Just as we may substitute ζ ∈ F for the variable t in f to obtain a value f (ζ) ∈ F, we may also
“plug in” the matrix A ∈ Mn (F) to obtain f (A) ∈ Mn (F). The only thing we have to be careful
about is what we do with the constant term α0 ; we replace it with α0 times the identity matrix, so
f (A) := α0 I + α1 A + · · · + αd Ad . (8.14)

Exercise 8.5.2. Let A and B be similar matrices, and let f be a polynomial. Show that f (A) ∼
f (B).
The main result of this section is the following theorem.
Theorem 8.5.3 (Cayley-Hamilton Theorem). Let A be an n × n matrix. Then fA (A) = 0.
Exercise 8.5.4. What is wrong with the following “proof” of the Cayley-Hamilton Theorem?

fA (A) = det(AI − A) = det 0 = 0 . (8.15)

Exercise 8.5.5. Use the Cayley–Hamilton Theorem and Theorem 8.3.22 to prove that an n × n
matrix is nilpotent if and only if its characteristic polynomial is tn .
Exercise 8.5.6. Prove the Cayley–Hamilton Theorem by brute force for 2 × 2 matrices.
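Before turning to the proof, here is a numerical sanity check (a Python sketch, not a proof): substitute a random matrix A into its characteristic polynomial and observe that the result is the zero matrix.

import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(-3, 4, size=(4, 4)).astype(float)

coeffs = np.poly(A)                # coefficients of f_A(t) = det(tI - A), highest power first
n = A.shape[0]
fA_of_A = sum(c * np.linalg.matrix_power(A, n - i) for i, c in enumerate(coeffs))
print(np.allclose(fA_of_A, np.zeros((n, n))))    # True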
Our proof strategy. We shall proceed in three phases.
(a) Prove the Cayley–Hamilton Theorem for diagonalizable matrices.

(b) Prove the Cayley–Hamilton Theorem over C by a continuity argument, observing that the
diagonalizable matrices are dense within Mn (C).

(c) We invoke the Identity Principle to transfer the result from matrices over C to matrices over
any commutative unital ring (ring with identity).

We will first prove the Cayley–Hamilton Theorem for diagonal matrices, and then more generally
for diagonalizable matrices (Def. 8.2.6).
Exercise 8.5.7. Let D = diag(λ1 , . . . , λn ) be an n × n diagonal matrix over F and let f ∈ F[t].
Then f (D) = diag(f (λ1 ), . . . , f (λn )) .
As an immediate consequence, we can prove the Cayley–Hamilton Theorem for diagonal matri-
ces.
Exercise 8.5.8. Let D be a diagonal matrix, and let fD be its characteristic polynomial. Then
fD (D) = 0.
Proposition 8.5.9. Let A ∼ B, so B = S −1 AS. Then

tI − B = S −1 (tI − A) S . (8.16)

Theorem 8.5.10. If A ∼ B, then the characteristic polynomials of A and B are equal, i. e.,
fA = fB . ♦
Proposition 8.5.11. Let g ∈ F[t], and let A, S ∈ Mn (F) with S nonsingular. Then g (S −1 AS) =
S −1 g(A)S.
Corollary 8.5.12. Let g ∈ F[t] and let A, B ∈ Mn (F) with A ∼ B. Then g(A) ∼ g(B).
Putting all the above together, we obtain the Cayley–Hamilton Theorem for diagonalizable
matrices over any field.
Proposition 8.5.13. The Cayley-Hamilton Theorem holds for diagonalizable matrices.
Let us now focus on matrices over C. All norms being equivalent, we use the simplest one:
elementwise max-norm.
Definition 8.5.14. For A = (aij ) ∈ Cr×s , let kAkmax = maxi,j |aij |. For A, B ∈ Cr×s , we write
ρ(A, B) = kA − Bkmax . For a sequence {Ak } of matrices in Cr×s , let us say that limk→∞ Ak = B
if limk→∞ ρ(Ak , B) = 0.
Analogously, for polynomials f (t) = Σ_{i=0}^{n} αi t^i we use the norm kf kmax = maxi |αi | and the corre-
sponding distance measure.
Proposition 8.5.15. The diagonalizable matrices are dense in Mn (C). In other words, if B ∈
Mn (C) then for any ε > 0 there exists a diagonalizable A ∈ Mn (C) such that ρ(A, B) ≤ ε.
Hint. First prove this for triangular matrices, using Ex. 8.4.10. Then combine this with Theo-
rem 8.4.9.
The next observation follows by continuity.
Proposition 8.5.16. Let {Ak } be a sequence of n × n matrices and {fk } a sequence of monic
polynomials of degree n over C such that fk → g and Ak → B. Then fk (Ak ) → g(B).
Our final observation is that if Ak → B then we have the corresponding limit relation for
their characteristic polynomials: fAk → fB . So to prove the Cayley–Hamilton Theorem for an
arbitrary B ∈ Mn (C), we take a sequence of diagonalizable matrices Ak approaching B; and then
0 = fAk (Ak ) → fB (B). So we completed the proof of the Cayley–Hamilton Theorem for matrices
over C.
Exercise 8.5.17. The ring R = Z[x1 , . . . , xm ] of multivariate polynomials over Z is isomorphic to
a subring of C.

So in particular we proved the Cayley–Hamilton Theorem over this ring R. The following
observation provides the final step of the proof of the Cayley–Hamilton Theorem over any field and
in fact over any commutative unital ring (ring with identity).

Exercise 8.5.18 (Identity principle). Let T be a commutative unital ring with m generators. Then
T is a quotient ring of R = Z[x1 , . . . , xm ]. Therefore any polynomial identity that holds over R also
holds over T .

So we have proved the Cayley–Hamilton Theorem over T , noting that this theorem asserts a set
of polynomial identities.

8.6 Additional exercises


Proposition 8.6.1. Let n ≥ k. Let A ∈ Fk×n and B ∈ Fn×k . Then fBA = fAB · tn−k .

Definition 8.6.2 (Companion matrix). Let f ∈ Pn (F) be a monic polynomial,

                    f (t) = α0 + α1 t + · · · + αn−1 tn−1 + αn tn ,                    (8.17)

where αn = 1. Then the companion matrix of f is the matrix C(f ) ∈ Mn (F) defined as

                    [0 0 0 · · · 0  −α0  ]
                    [1 0 0 · · · 0  −α1  ]
                    [0 1 0 · · · 0  −α2  ]
          C(f ) :=  [      · · ·         ]                    (8.18)
                    [0 0 0 · · · 0  −αn−2]
                    [0 0 0 · · · 1  −αn−1]

Proposition 8.6.3. Let f be a monic polynomial and let A = C(f ) be its companion matrix.
Then the characteristic polynomial of A is equal to f .

Corollary 8.6.4. Every monic polynomial f ∈ Q[t] is the characteristic polynomial of a rational
matrix.
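A Python sketch (helper name ours) that builds the companion matrix of Definition 8.6.2 and checks Prop. 8.6.3 for f (t) = t^3 − 2t^2 − 5t + 6, whose roots are 1, −2, 3:

import numpy as np

def companion(coeffs_low_to_high):
    """Companion matrix C(f) of a monic polynomial given by [a_0, a_1, ..., a_{n-1}, 1]."""
    a = np.asarray(coeffs_low_to_high, dtype=float)
    n = len(a) - 1
    C = np.zeros((n, n))
    C[1:, :-1] = np.eye(n - 1)         # 1's on the subdiagonal
    C[:, -1] = -a[:n]                  # last column: -a_0, ..., -a_{n-1}
    return C

C = companion([6., -5., -2., 1.])
print(np.poly(C))                              # [ 1. -2. -5.  6.]: the characteristic polynomial is f
print(np.sort(np.linalg.eigvals(C).real))      # [-2.  1.  3.]  (the roots are real here)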

Exercise 8.6.5. Determine the eigenvalues and their (geometric and algebraic) multiplicities of the
all-ones matrix Jn over R.

Exercise 8.6.6. (a) Let n be odd. Then every matrix A ∈ Mn (R) has a real eigenvector.

(b) Let n be even. Find a matrix B ∈ Mn (R) that has no real eigenvector.

The following two exercises are easier for diagonalizable matrices.

Exercise 8.6.7. Let f (t) = Σ_{i=0}^{∞} αi t^i be a power series with convergence radius r > 0 (i. e., it
converges for all t ∈ C such that |t| < r). Here r can be ∞. Define, for A ∈ Mn (R),
                    f (A) = Σ_{i=0}^{∞} αi A^i .                    (8.19)
Prove that f (A) converges if |λi | < r for all eigenvalues λi of A. In particular, eA always converges.

Exercise 8.6.8.

(a) Find square matrices A and B such that eA+B ≠ eA eB .

(b) Prove: if A and B commute (AB = BA) then eA+B = eA · eB .


Chapter 9

(R) Orthogonal Matrices

9.1 Orthogonal matrices

In Section 1.4, we defined the standard dot product (Def. 1.4.1) and the notions of orthogonal
vectors (Def. 1.4.8). In this chapter we study orthogonal matrices, matrices whose columns
form an orthonormal basis (Def. 1.5.6) of Rn .
Definition 9.1.1 (Orthogonal matrix). The matrix A ∈ Mn (R) is orthogonal if AT A = I. The set
of orthogonal n × n matrices is denoted by O(n).

Fact 9.1.2. A ∈ Mn (R) is orthogonal if and only if its columns form an orthonormal basis of Rn .

Proposition 9.1.3. O(n) is a group (Def. 14.3.2) under matrix multiplication (it is called the
orthogonal group).
Exercise 9.1.4. Which diagonal matrices are orthogonal?
Theorem 9.1.5 (Third Miracle of Linear Algebra). Let A ∈ Mn (R). Then the columns of A are
orthonormal if and only if the rows of A are orthonormal. ♦
Proposition 9.1.6. Let A ∈ O(n). Then all eigenvalues of A have absolute value 1.
Exercise 9.1.7. The matrix A ∈ Mn (R) is orthogonal if and only if A preserves the dot product,
i. e., for all v, w ∈ Rn , we have (Av)T (Aw) = vT w.
Exercise 9.1.8. The matrix A ∈ Mn (R) is orthogonal if and only if A preserves the norm, i. e., for
all v ∈ Rn , we have kAvk = kvk.


Definition 9.1.9 (Hadamard matrix). The matrix A = (αij ) ∈ Mn (R) is an Hadamard matrix if
αij = ±1 for all i, j, and the columns of A are orthogonal. We denote by H the set

H := {n | an n × n Hadamard matrix exists} . (9.1)


 
Example 9.1.10. The matrix [1 1; 1 −1] is an Hadamard matrix.
Exercise 9.1.11. Let A be an n × n Hadamard matrix. Create a 2n × 2n Hadamard matrix.
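The doubling in the exercise above can be tested in a few lines of Python. The sketch below shows the standard "Sylvester doubling"; whether it is the intended solution is left to the reader.

import numpy as np

def double_hadamard(H):
    """Given an n x n Hadamard matrix H, return a 2n x 2n Hadamard matrix."""
    return np.block([[H, H],
                     [H, -H]])

H2 = np.array([[1., 1.],
               [1., -1.]])
H4 = double_hadamard(H2)
print(np.allclose(H4.T @ H4, 4 * np.eye(4)))   # True: the columns of H4 are pairwise orthogonal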

Proposition 9.1.12. If n ∈ H and n > 2, then 4 | n.

Proposition 9.1.13. If p is prime and p ≡ −1 (mod 4), then p + 1 ∈ H.

Proposition 9.1.14. If k, ` ∈ H, then k` ∈ H.

Proposition 9.1.15. Let A be an n × n Hadamard matrix. Then (1/√n) A is an orthogonal matrix.

9.2 Orthogonal similarity


Definition 9.2.1 (Orthogonal similarity). Let A, B ∈ Mn (R). We say that A is orthogonally similar
to B, denoted A ∼o B, if there exists an orthogonal matrix O such that A = O−1 BO.
Note that O−1 BO = OT BO because O is orthogonal.

Proposition 9.2.2. Let A ∼o B.

(a) If A is symmetric then so is B.

(b) If A is orthogonal then so is B.

Proposition 9.2.3. Let A ∼o diag(λ1 , . . . , λn ). Then

(a) If all eigenvalues of A are real then A is symmetric.

(b) If all eigenvalues of A have unit absolute value then A is orthogonal.

Proposition 9.2.4. A ∈ Mn (R) has an orthonormal eigenbasis if and only if A is orthogonally


similar to a diagonal matrix.

Proposition 9.2.5. Let A ∈ Mn (R). Then A ∈ O(n) if and only if it is orthogonally similar to a
matrix which is the diagonal sum of some of the following: an identity matrix, a negative identity
matrix, 2 × 2 rotation matrices (compare with Prop. 16.4.44).

Examples 9.2.6. The following are examples of the matrices described in the preceding proposition.

(a)  [−1    0       0   ]
     [ 0  1/√2  −1/√2   ]
     [ 0  1/√2   1/√2   ]

(b) a 9 × 9 matrix that is the diagonal sum of seven 1 × 1 blocks, each equal to 1 or −1, and the 2 × 2 rotation block

     [√3/2  −1/2 ]
     [ 1/2   √3/2] .
Chapter 10

The Spectral Theorem

10.1 Statement of the Spectral Theorem


The Spectral Theorem is one of the most significant results of linear algebra, as well as one of the
most frequently applied mathematical results in pure math, applied math, and science. We will see
a number of different versions of this theorem, and we will not prove it until Section 19.4. However,
we now have developed the tools necessary to understand the statement of the theorem and some
of its applications.

Theorem 10.1.1 (The Spectral Theorem for real symmetric matrices). Let A ∈ Mn (R) be a real
symmetric matrix. Then A has an orthonormal eigenbasis. ♦

The Spectral Theorem can be restated in terms of orthogonal similarity (Def. 9.2.1).

Theorem 10.1.2 (The Spectral Theorem for real symmetric matrices, restated). Let A ∈ Mn (R)
be a real symmetric matrix. Then A is orthogonally similar to a diagonal matrix. ♦

Exercise 10.1.3. Verify that these two formulations of the Spectral Theorem are equivalent.

Corollary 10.1.4. Let A be a real symmetric matrix. Then A is diagonalizable.

Corollary 10.1.5. Let A be a real symmetric matrix. Then all of the eigenvalues of A are real.


10.2 Applications of the Spectral Theorem


Although we have not yet proved the Spectral Theorem, we can already begin to study some of its
many applications.

Proposition 10.2.1. If two symmetric matrices are similar then they are orthogonally similar.

Exercise 10.2.2. Let A be a symmetric real n × n matrix, and let v ∈ Rn . Let b = (b1 , . . . , bn)
be an orthonormal eigenbasis of A. Express vT Av in terms of the eigenvalues and the coordinates
of v with respect to b.

Definition 10.2.3 (Positive definite matrix). An n × n real matrix A ∈ Mn (R) is positive definite if
for all x ∈ Rn (x 6= 0), we have xT Ax > 0.

Proposition 10.2.4. Let A ∈ Mn (R) be a real symmetric n × n matrix. Then A is positive definite
if and only if all eigenvalues of A are positive.

Another consequence of the Spectral Theorem is Rayleigh’s Principle.


Definition 10.2.5 (Rayleigh quotient). Let A ∈ Mn (R). The Rayleigh quotient of A is a function
RA : Rn \ {0} → R defined by
                    RA (v) = vT Av / kvk2 .                    (10.1)
Recall that kvk2 = vT v (Def. 1.5.2).

Proposition 10.2.6 (Rayleigh’s Principle). Let A be an n × n real symmetric matrix with eigen-
values λ1 ≥ · · · ≥ λn . Then

(a) max_{v∈Rn \{0}} RA (v) = λ1

(b) min_{v∈Rn \{0}} RA (v) = λn
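The principle is easy to observe numerically. In the Python sketch below (our own random example), the Rayleigh quotient of every sampled vector lies between λn and λ1:

import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((5, 5))
A = (M + M.T) / 2                                       # a symmetric matrix

def rayleigh(A, v):
    return (v @ A @ v) / (v @ v)

eigs = np.sort(np.linalg.eigvalsh(A))[::-1]             # lambda_1 >= ... >= lambda_n
samples = [rayleigh(A, rng.standard_normal(5)) for _ in range(10000)]
print(eigs[0], max(samples))                            # every sample is at most lambda_1
print(eigs[-1], min(samples))                           # every sample is at least lambda_n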

Theorem 10.2.7 (Courant-Fischer). Let A be a symmetric real matrix with eigenvalues λ1 ≥ · · · ≥
λn . Then
                    λi = max_{U ≤ Rn, dim U = i}  min_{v∈U \{0}} RA (v) .                    (10.2)



Theorem 10.2.8 (Interlacing). Let A ∈ Mn (R) be an n × n symmetric real matrix. Let B be


the (n − 1) × (n − 1) matrix obtained by deleting the i-th column and the i-th row of A (so B is
also symmetric). Prove that the eigenvalues of A and B interlace, i. e., if the eigenvalues of A are
λ1 ≥ · · · ≥ λn and the eigenvalues of B are µ1 ≥ · · · ≥ µn−1 , then

λ1 ≥ µ1 ≥ λ2 ≥ µ2 ≥ · · · ≥ λn−1 ≥ µn−1 ≥ λn .


Chapter 11

(F, R) Bilinear and Quadratic Forms

11.1 (F) Linear and bilinear forms


Definition 11.1.1 (Linear form). A linear form is a function f : Fn → F with the following properties.

(a) f (x + y) = f (x) + f (y) for all x, y ∈ Fn ;

(b) f (λx) = λf (x) for all x ∈ Fn and λ ∈ F.

Exercise 11.1.2. Let f be a linear form. Show that f (0) = 0.

Definition 11.1.3 (Dual space). The set of linear forms f : Fn → F is called the dual space of Fn
and is denoted (Fn )∗ .

Example 11.1.4. The function f (x) = x1 + · · · + xn (where x = (x1 , . . . , xn )T ) is a linear form.
More generally, for any a = (α1 , . . . , αn )T ∈ Fn , the function
                    f (x) = aT x = Σ_{i=1}^{n} αi xi                    (11.1)
is a linear form.

Theorem 11.1.5 (Representation Theorem for Linear Forms). Every linear form f : Fn → F has
the form (11.1) for some column vector a ∈ Fn . ♦


Definition 11.1.6 (Bilinear form). A bilinear form is a function f : Fn × Fn → F with the following
properties.
(a) f (x1 + x2 , y) = f (x1 , y) + f (x2 , y)

(b) f (λx, y) = λf (x, y)

(c) f (x, y1 + y2 ) = f (x, y1 ) + f (x, y2 )

(d) f (x, λy) = λf (x, y)


Exercise 11.1.7. Let f : Fn × Fn → F be a bilinear form. Show that for all x, y ∈ F, we have

f (x, 0) = f (0, y) = 0 . (11.2)

The next result explains the term “bilinear.”


Proposition 11.1.8. The function f : Fn × Fn → F is a bilinear form exactly if

(a) for all a ∈ Fn the function fa^(1) : Fn → F defined by fa^(1) (x) = f (a, x) is a linear form, and

(b) for all b ∈ Fn the function fb^(2) : Fn → F defined by fb^(2) (x) = f (x, b) is a linear form.
Examples 11.1.9. The standard dot product
                    f (x, y) = xT y = Σ_{i=1}^{n} xi yi                    (11.3)
is a bilinear form. More generally, for any matrix A ∈ Mn (F), the expression
                    f (x, y) = xT Ay = Σ_{i=1}^{n} Σ_{j=1}^{n} αij xi yj                    (11.4)
is a bilinear form.
Theorem 11.1.10 (Representation Theorem for bilinear forms). Every bilinear form f has the
form (11.4) for some matrix A. ♦
Definition 11.1.11 (Nonsingular bilinear form). We say that the bilinear form f (x, y) = xT Ay is
nonsingular if the matrix A is nonsingular.

Exercise 11.1.12. Is the standard dot product nonsingular?

Exercise 11.1.13.

(a) Assume F does not have characteristic 2, i. e., 1 + 1 ≠ 0 in F (see Section 14.4 for more about
the characteristic of a field). Prove: if n is odd and the bilinear form f : Fn × Fn → F satisfies
f (x, x) = 0 for all x ∈ Fn , then f is singular.

(b)∗ Prove this without the assumption that 1 + 1 ≠ 0.

(c) Over every field F, find a nonsingular bilinear form f on F2 such that f (x, x) = 0 for all
x ∈ F2 .

(d) Extend this to all even dimensions.

11.2 (F) Multivariate polynomials


Definition 11.2.1 (Multivariate monomial). A monomial in the variables x1 , . . . , xn is an expression
of the form
                    f = α ∏_{i=1}^{n} xi^{ki}
for some nonzero scalar α and exponents ki . The degree of this monomial is
                    deg f := Σ_{i=1}^{n} ki .                    (11.5)

Examples 11.2.2. The following expressions are multivariate monomials of degree 4 in the variables
x1 , . . . , x6 : x1^2 x2 x3 , 5x4 x5^3 , −3x6^4 .

Definition 11.2.3 (Multivariate polynomial). A multivariate polynomial is an expression which is


the sum of multivariate monomials.
Observe that the preceding definition includes the possibility of the empty sum, corresponding
to the 0 polynomial.

Definition 11.2.4 (Monic monomial). We call the monomials of the form ∏_{i=1}^{n} xi^{ki} monic. We define
the monic part of the monomial f = α ∏_{i=1}^{n} xi^{ki} to be the monomial ∏_{i=1}^{n} xi^{ki} .

Definition 11.2.5 (Standard form of a multivariate polynomial). A multivariate polynomial is in


standard form if it is expressed as a (possibly empty) sum of monomials with distinct monic parts.
The empty sum of monomials is the zero polynomial, denoted by 0.

Examples 11.2.6. The following are multivariate polynomials in the variables x1 , . . . , x7 (in stan-
dard form).

(a) 3x1 x33 x5 + 2x2 x6 − x4

(b) 4x31 + 2x5 x27 + 3x1 x5 x7

(c) 0

Definition 11.2.7 (Degree of a multivariate polynomial). The degree of a multivariate polynomial


f is the highest degree of a monomial in the standard form expression for f . The degree of the 0
polynomial1 is defined to be −∞.

Exercise 11.2.8. Let f and g be multivariate polynomials. Then

(a) deg(f + g) ≤ max{deg f, deg g} ;

(b) deg(f g) = deg f + deg g .

Note that by our convention, these rules remain valid if f or g is the zero polynomial.
Definition 11.2.9 (Homogeneous multivariate polynomial). The multivariate polynomial f is a ho-
mogeneous polynomial of degree k if every monomial in the standard form expression of f has degree
k.

Fact 11.2.10. The 0 polynomial is a homogeneous polynomial of degree k for all k.

Fact 11.2.11.
1
This is in accordance with the natural convention that the maximum of the empty list is −∞.

(a) If f and g are homogeneous polynomials of degree k, then f + g is a homogeneous polynomial


of degree k.

(b) If f is a homogeneous polynomial of degree k and g is a homogeneous polynomial of degree


`, then f g is a homogeneous polynomial of degree k + `.

Exercise 11.2.12. For all n and k, count the monic monomials of degree k in the variables
x1 , . . . , xn . Note that this is the dimension (Def. 15.3.6) of the space of homogeneous polyno-
mials of degree k.

Exercise 11.2.13. What are the homogeneous polynomials of degree 0?

Fact 11.2.14. Linear forms are exactly the homogeneous polynomials of degree 1.

In the next section we explore quadratic forms, which are the homogeneous polynomials of
degree 2 ((11.6)).

11.3 (R) Quadratic forms


In this section we restrict our attention to the field of real numbers.
Definition 11.3.1 (Quadratic form). A quadratic form is a function Q : Rn → R where Q(x) =
f (x, x) for some bilinear form f .
Definition 11.3.2. Let A = (αij ) ∈ Mn (R) be an n × n matrix. The quadratic form associated with A,
denoted QA , is defined by
                    QA (x) = xT Ax = Σ_{i=1}^{n} Σ_{j=1}^{n} αij xi xj .                    (11.6)

Note that for all matrices A ∈ Mn (R), we have QA (0) = 0. Note further that for a symmetric
matrix B, the quadratic form QB is the numerator of the Rayleigh quotient RB (Def. 10.2.5).

Proposition 11.3.3. For all A ∈ Mn (R), there is a unique B ∈ Mn (R) such that B is a symmetric
matrix and QA = QB , i. e., for all x ∈ Rn , we have

xT Ax = xT Bx .
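Over R, one natural candidate for B is the "symmetric part" (A + AT )/2; the Python sketch below (an illustration, not a proof of the proposition) checks the identity xT Ax = xT Bx numerically. Verifying that this B works, and that it is the only symmetric matrix that does, is the content of the proposition.

import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 4))
B = (A + A.T) / 2                          # symmetric part of A
x = rng.standard_normal(4)
print(np.isclose(x @ A @ x, x @ B @ x))    # True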

Exercise 11.3.4. Find the symmetric matrix B such that

QB (x) = 3x21 − 7x1 x2 + 2x22

where x = (x1 , x2 )T .

Definition 11.3.5. Let Q be a quadratic form in n variables.

(a) Q is positive definite if Q(x) > 0 for all x 6= 0.

(b) Q is positive semidefinite if Q(x) ≥ 0 for all x.

(c) Q is negative definite if Q(x) < 0 for all x 6= 0.

(d) Q is negative semidefinite if Q(x) ≤ 0 for all x.

(e) Q is indefinite if it is neither positive semidefinite nor negative semidefinite, i. e., there exist
x, y ∈ Rn such that Q(x) > 0 and Q(y) < 0.

Definition 11.3.6. We say that a matrix A ∈ Mn (R) is positive definite if it is symmetric and its
associated quadratic form QA is positive definite. Positive semidefinite, negative definite, negative
semidefinite, and indefinite symmetric matrices are defined analogously.
Notice that we shall not call a non-symmetric matrix A positive definite, etc., even if the
quadratic form QA is positive definite, etc.

Exercise 11.3.7. Categorize diagonal matrices according to their definiteness, i. e., tell, which
diagonal matrices are positive definite, etc.

Exercise 11.3.8. Let A ∈ Mn (R). Assume QA is positive definite and λ is a real eigenvalue of A.
Prove λ > 0. Note that A is not necessarily symmetric.

Corollary 11.3.9. If A is a positive definite (and therefore symmetric by definition) matrix and λ
is an eigenvalue of A then λ > 0.

Proposition 11.3.10. Let A ∈ Mn (R) be a symmetric real matrix with eigenvalues λ1 ≥ · · · ≥ λn .

(a) QA is positive definite if and only if λi > 0 for all i.

(b) QA is positive semidefinite if and only if λi ≥ 0 for all i.



(c) QA is negative definite if and only if λi < 0 for all i.

(d) QA is negative semidefinite if and only if λi ≤ 0 for all i.

(e) QA is indefinite if and only if there exist i and j such that λi > 0 and λj < 0.

Proposition 11.3.11. If A ∈ Mn (R) is positive definite, then its determinant is positive.

Exercise 11.3.12. Show that if A and B are symmetric n × n matrices and A ∼ B and A is
positive definite, then so is B.

Definition 11.3.13 (Corner matrix). Let A = (αij )ni,j=1 ∈ Mn (R) and define for k = 1, . . . , n, the
corner matrix Ak := (αij )ki,j=1 to be the k × k submatrix of A, obtained by taking the intersection
of the first k rows of A with the first k columns of A. In particular, An = A. The k-th corner
determinant of A is det Ak .

Exercise 11.3.14. Let A ∈ Mn (R) be a symmetric matrix.

(a) If A is positive definite then all of its corner matrices are positive definite.

(b) If A is positive definite then all of its corner determinants are positive.

The next theorem says that (b) is actually a necessary and sufficient condition for positive
definiteness.

Theorem 11.3.15. Let A ∈ Mn (R) be a symmetric matrix. A is positive definite if and only if all
of its corner determinants are positive. ♦
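The criterion is easy to test numerically; here is a Python sketch on one example of our own:

import numpy as np

A = np.array([[2., -1., 0.],
              [-1., 2., -1.],
              [0., -1., 2.]])

corner_dets = [np.linalg.det(A[:k, :k]) for k in range(1, A.shape[0] + 1)]
print(np.round(corner_dets, 6))               # [2. 3. 4.] -- all positive
print(np.all(np.linalg.eigvalsh(A) > 0))      # True: all eigenvalues are positive, so A is positive definite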

Exercise 11.3.16. Show that the following statement is false for every n ≥ 2: The n×n symmetric
matrix A is positive semidefinite if and only if all corner determinants are nonnegative.

Exercise 11.3.17. Show that the statement of Ex. 11.3.16 remains false for n ≥ 3 if in addition
we require all diagonal elements to be positive.

∗ ∗ ∗

In the next sequence of exercises we study the definiteness of quadratic forms QA where A is
not necessarily symmetric.

Exercise 11.3.18. Let A ∈ Mn (R) be an n × n matrix such that QA is positive definite. Show that
A need not have all of its corner determinants positive. Give a 2 × 2 counterexample. Contrast this
with part (b) of Ex. 11.3.14.
Proposition 11.3.19. Let A ∈ Mn (R). If QA is positive definite, then so is QAT ; analogous re-
sults hold for positive semidefinite, negative definite, negative semidefinite, and indefinite quadratic
forms.
Exercise 11.3.20. Let A, B, C ∈ Mn (R). Assume B = C T AC. Prove: if QA is positive semidefinite
then so is QB .

Exercise 11.3.21. Let A, B ∈ Mn (R). Show that if A ∼o B (A and B are orthogonally similar,
Def. 9.2.1) and QA is positive definite then QB is positive definite.
 
Exercise 11.3.22. Let A = [α β; γ δ]. Prove that QA is positive definite if and only if α, δ > 0 and
(β + γ)^2 < 4αδ.
Exercise 11.3.23. Let A, B ∈ Mn (R) be matrices that are not necessarily symmetric.
(a) Prove that if A ∼ B and QA is positive definite, then QB cannot be negative definite.

(b) Find 2 × 2 matrices A and B such that A ∼ B and QA is positive definite but QB is indefinite.
(Contrast this with Ex. 11.3.12.)

11.4 (F) Geometric algebra (optional)


Let us fix a bilinear form f : Fn × Fn → F.
Definition 11.4.1 (Orthogonality). Let x, y ∈ Fn . We say that x and y are orthogonal with respect
to f (notation: x ⊥ y) if f (x, y) = 0.
Definition 11.4.2. Let S, T ⊆ Fn . For v ∈ Fn , we say that v is orthogonal to S (notation: v ⊥ S)
if for all s ∈ S, we have v ⊥ s. Moreover, we say that S is orthogonal to T (notation: S ⊥ T ) if
s ⊥ t for all s ∈ S and t ∈ T .
Definition 11.4.3. Let S ⊆ Fn . Then S ⊥ (“S perp”) is the set of vectors orthogonal to S, i. e.,

S ⊥ := {v ∈ Fn | v ⊥ S}. (11.7)

Proposition 11.4.4. For all subsets S ⊆ Fn , we have S ⊥ ≤ Fn .


Proposition 11.4.5. Let S ⊆ Fn . Then S ⊆ (S ⊥ )⊥ .

Exercise 11.4.6. Prove: (Fn )⊥ = {0} if and only if f is nonsingular.

Exercise 11.4.7. Verify {0}⊥ = Fn .

Exercise 11.4.8. What is ∅⊥ ?

∗ ∗ ∗

For the rest of this section we assume that f is nonsingular.

Theorem 11.4.9 (Dimensional complementarity). Let U ≤ Fn . Then

dim U + dim U ⊥ = n . (11.8)

Corollary 11.4.10. Let S ⊆ Fn . Then
                    (S ⊥ )⊥ = span(S) .                    (11.9)
In particular, if U ≤ Fn then
                    (U ⊥ )⊥ = U .                    (11.10)

Definition 11.4.11 (Isotropic vector). The vector v ∈ Fn is isotropic if v 6= 0 and v ⊥ v.

Exercise 11.4.12. Let A = I, i. e., f (x, y) = xT y (the standard dot product).

(a) Prove: over R, there are no isotropic vectors.

(b) Find isotropic vectors in C^2 , F_5^2 , and F_2^2 .

(c) For what primes p is there an isotropic vector in F_p^2 ?

Definition 11.4.13 (Totally isotropic subspace). The subspace U ≤ Fn is totally isotropic if U ⊥ U ,


i. e., U ≤ U ⊥ .

Exercise 11.4.14. If S ⊆ Fn and S ⊥ S then span(S) is a totally isotropic subspace.

Corollary 11.4.15 (to Theorem 11.4.9). If U ≤ Fn is a totally isotropic subspace then dim U ≤ ⌊n/2⌋.

Exercise 11.4.16. For even n, find an (n/2)-dimensional totally isotropic subspace in C^n , F_5^n , and F_2^n .

Exercise 11.4.17. Let F be a field and let k ≥ 2. Consider the following statement.

Stm(F, k): If U ≤ Fn is a k-dimensional subspace then U contains an isotropic vector.

Prove:

(a) if k ≤ ` and Stm(F, k) is true, then Stm(F, `) is also true;

(b) Stm(F, 2) is true

(b1) for F = F2 and


(b2) for F = C;

(c) Stm(F, 2) holds for all finite fields of characteristic 2;

(d) Stm(F, 2) is false for all finite fields of odd characteristic;

(e)∗ Stm(F, 3) holds for all finite fields.

Exercise 11.4.18. Prove: if Stm(F, 2) is true then every maximal totally isotropic subspace of Fn
has dimension ⌊n/2⌋. In particular, this conclusion holds for F = F2 and for F = C.

Exercise 11.4.19. Prove: for all finite fields F, every maximal totally isotropic subspace of Fn has
dimension ≥ n/2 − 1.
Chapter 12

Complex Matrices

12.1 Complex numbers


Before beginning our discussion of matrices with entries taken from the field C, we provide a refresher
of complex numbers and their properties.

Definition 12.1.1 (Complex number). A complex number is a number z ∈ C of the form z = a + bi,
where a, b ∈ R and i = √−1.
Notation 12.1.2. Let z be a complex number. Then Re z denotes the real part of z and Im z denotes
the imaginary part of z. In particular, if z = a + bi, then Re z = a and Im z = b.
Definition 12.1.3 (Complex conjugate). Let z = a + bi ∈ C. Then the complex conjugate of z,
denoted z̄, is
                    z̄ = a − bi .                    (12.1)

Exercise 12.1.4. Let z1 , z2 ∈ C. Show that the conjugate of the product is the product of the
conjugates:
                    (z1 · z2 )‾ = z̄1 · z̄2 .                    (12.2)

Fact 12.1.5. Let z = a + bi. Then z z̄ = a2 + b2 . In particular, z z̄ ∈ R and z z̄ ≥ 0.

Definition 12.1.6 (Magnitude of a complex number). Let z ∈ C. Then the magnitude, norm, or
absolute value of z is |z| = √(z z̄). If |z| = 1, then z is said to have unit norm.


Proposition 12.1.7. Let z ∈ C have unit norm. Then z can be expressed in the form

z = cos θ + i sin θ (12.3)

for some θ ∈ [0, 2π).


Proposition 12.1.8. The complex number z has unit norm if and only if z̄ = z −1 .
Until now, we have dealt with matrices over the reals or over a general field F. We now turn
our attention specifically to matrices with complex entries.

12.2 Hermitian dot product in Cn


Definition 12.2.1 (Conjugate-transpose). Let A = (αij ) ∈ Ck×` . The conjugate-transpose of A is
the ` × k matrix A∗ whose (i, j) entry is ᾱji . The conjugate-transpose is also called the (Hermitian)
adjoint.
Fact 12.2.2. Let A, B ∈ Ck×n . Then

(A + B)∗ = A∗ + B ∗ . (12.4)

Exercise 12.2.3. Let A ∈ Ck×n and let B ∈ Cn×` . Show that (AB)∗ = B ∗ A∗ .

Fact 12.2.4. Let λ ∈ C and A ∈ Ck×n . Then (λA)∗ = λ̄A∗ .

In Section 1.4, we defined the standard dot product (Def. 1.4.1). We now define the
standard Hermitian dot product for vectors in Cn .
Definition 12.2.5 (Standard Hermitian dot product). Let v, w ∈ Cn . Then the Hermitian dot
product of v with w is
                    v · w := v∗ w = Σ_{i=1}^{n} ᾱi βi                    (12.5)
where v = (α1 , . . . , αn )T and w = (β1 , . . . , βn )T .
In particular, observe that v∗ v is real and positive for all v 6= 0. The following pair of exercises
show some of the things that would go wrong if we did not conjugate.
Exercise 12.2.6. Find a nonzero vector v ∈ Cn such that vT v = 0.

Exercise 12.2.7. Let A ∈ Ck×n .

(a) Show that rk (A∗ A) = rk A.



(b) Find A ∈ M2 (C) such that rk AT A < rk(A).

The standard dot product in Rn (Def. 1.4.1) carries with it the notions of norm and
orthogonality; likewise, the standard Hermitian dot product carries these notions with it.
Definition 12.2.8 (Norm). Let v ∈ Cn . The norm of v, denoted kvk, is
                    kvk := √(v · v) .                    (12.6)
This norm is also referred to as the (complex) Euclidean norm or the ℓ2 norm.

Fact 12.2.9. If v = (α1 , . . . , αn )T , then
                    kvk = √( Σ_{i=1}^{n} |αi |2 ) .                    (12.7)

Definition 12.2.10 (Orthogonality). The vectors v, w ∈ Cn are orthogonal (notation: x ⊥ y) if


v · w = 0.
Definition 12.2.11 (Orthogonal system). An orthogonal system in Cn is a list of (pairwise) orthogonal
nonzero vectors in Cn .

Exercise 12.2.12. Let S ⊆ Ck be an orthogonal system in Cn . Prove that S is linearly independent.

Definition 12.2.13 (Orthonormal system). An orthonormal system in Cn is a list of (pairwise)


orthogonal vectors in V , all of which have unit norm. So (v1 , v2 , . . . ) is an orthonormal system if
vi · vj = δij for all i, j.
Definition 12.2.14 (Orthonormal basis). An orthonormal basis of Cn is an orthonormal system that
is a basis of Cn .

12.3 Hermitian and unitary matrices


Definition 12.3.1 (Self-adjoint matrix). The matrix A ∈ Mn (C) is said to be self-adjoint or Hermi-
tian if A∗ = A.

Exercise 12.3.2. Which diagonal matrices are Hermitian?

Theorem 12.3.3. Let A be a Hermitian matrix. Then all eigenvalues of A are real. ♦

Exercise 12.3.4 (Alternative proof of the real Spectral Theorem). The key part of the proof of the
Spectral Theorem (Theorem 19.4.4) given in Section 19.4 is the following lemma (Lemma
19.4.7): Let A be a symmetric real matrix. Then A has an eigenvector. Derive this lemma from
Theorem 12.3.3.

Recall that a square matrix A is orthogonal (Def. 9.1.1) if AT A = I. Unitary matrices are
the complex generalization of orthogonal matrices.
Definition 12.3.5 (Unitary matrix). The matrix A ∈ Mn (C) is unitary if A∗ A = I. The set of
unitary n × n matrices is denoted by U (n).

Fact 12.3.6. A ∈ Mn (C) is unitary if and only if its columns form an orthonormal basis of Cn .

Proposition 12.3.7. U (n) is a group under matrix multiplication (it is called the unitary group).

Exercise 12.3.8. Which diagonal matrices are unitary?

Theorem 12.3.9 (Third Miracle of Linear Algebra). Let A ∈ Mn (C). Then the columns of A are
orthonormal if and only if the rows of A are orthonormal. ♦

Proposition 12.3.10. Let A ∈ U (n). Then all eigenvalues of A have absolute value 1.

Exercise 12.3.11. The matrix A ∈ Mn (C) is unitary if and only if A preserves the Hermitian dot
product, i. e., for all v, w ∈ Cn , we have (Av)∗ (Aw) = v∗ w.

Exercise∗ 12.3.12. The matrix A ∈ Mn (C) is unitary if and only if A preserves the norm, i. e., for
all v ∈ Cn , we have kAvk = kvk.

Warning. The proof of this is trickier than in the real case (Ex. 9.1.8).

Exercise 12.3.13. Find an n × n unitary circulant matrix (Def. 2.5.12) with no zero entries.

trix is the n × n Vandermonde matrix (


is a primitive n-th root of unity.
R
Definition 12.3.14 (Discrete Fourier transform matrix). The discrete Fourier transform (DFT) ma-
Def. 2.5.9) F generated by 1, ω, ω 2 , . . . , ω n−1 , where ω

Exercise 12.3.15. Let F be the n × n DFT matrix. Prove that √1 F is unitary.


n
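A numerical check of this exercise for n = 8 (a Python sketch, not a proof):

import numpy as np

n = 8
omega = np.exp(2j * np.pi / n)                          # a primitive n-th root of unity
F = np.array([[omega ** (j * k) for k in range(n)] for j in range(n)])
U = F / np.sqrt(n)
print(np.allclose(U.conj().T @ U, np.eye(n)))           # True: (1/sqrt(n)) F is unitary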

Exercise 12.3.16. Which circulant matrices are unitary?

12.4 Normal matrices and unitary similarity


Definition 12.4.1 (Normal matrix). A matrix A ∈ Mn (C) is normal if it commutes with its
conjugate-transpose, i. e., AA∗ = A∗ A.

Exercise 12.4.2. Which diagonal matrices are normal?

Definition 12.4.3 (Unitary similarity). Let A, B ∈ Mn (C). We say that A is unitarily similar to B,
denoted A ∼u B, if there exists a unitary matrix U such that A = U −1 BU .
Note that U −1 BU = U ∗ BU because U is unitary.

Proposition 12.4.4. Let A ∼u B.

(a) If A is Hermitian then so is B.

(b) If A is unitary then so is B.

(c) If A is normal then so is B.

Proposition 12.4.5. Let A ∼u diag(λ1 , . . . , λn ). Then

(a) A is normal;

(b) if all eigenvalues of A are real then A is Hermitian;

(c) if all eigenvalues of A have unit absolute value then A is unitary.

We note that all of these implications are actually "if and only if," as we shall demonstrate
(Theorem 12.4.14).

Proposition 12.4.6. A ∈ Mn (C) has an orthonormal eigenbasis if and only if A is unitarily similar
to a diagonal matrix.
We now show (Cor. 12.4.8) that this condition is equivalent to A being normal.
Lemma 12.4.7. Let A, B ∈ Mn (C) with A ∼u B. Then A is normal if and only if B is normal. ♦
Corollary 12.4.8. If A is unitarily similar to a diagonal matrix, then A is normal.
Unitary similarity is a powerful tool to study matrices, owing largely to the following theorem.
Theorem 12.4.9 (Schur). Every matrix A ∈ Mn (C) is unitarily similar to a triangular matrix.

Definition 12.4.10 (Dense subset). A subset S ⊆ Fk×n is dense in Fk×n if for every A ∈ Fk×n and every
ε > 0, there exists B ∈ S such that every entry of the matrix A − B has absolute value less than ε.
Proposition 12.4.11. Diagonalizable matrices are dense in Mn (C).
Exercise 12.4.12. Complete the proof of the Cayley-Hamilton Theorem over C (Theorem
8.5.3) using Prop. 12.4.11.

We now generalize the Spectral Theorem to normal matrices.


Theorem 12.4.13 (Complex Spectral Theorem). Let A ∈ Mn (C). Then A has an orthonormal
eigenbasis if and only if A is normal. ♦
Before giving the proof, let us restate the theorem.
Theorem 12.4.14 (Complex Spectral Theorem, restated). Let A ∈ Mn (C). Then A is unitarily
similar to a diagonal matrix if and only if A is normal. ♦
Exercise 12.4.15. Prove that Theorems 12.4.13 and 12.4.14 are equivalent.
Exercise 12.4.16. Infer Theorem 12.4.14 from Schur’s Theorem (Theorem 12.4.9) via the following
lemma.
Lemma 12.4.17. If a triangular matrix is normal, then it is diagonal. ♦

Theorem 12.4.18 (Real version of Schur’s Theorem). If A ∈ Mn (R) and all eigenvalues of A are
real, then A is orthogonally similar to an upper triangular matrix. ♦

Exercise 12.4.19 (Third proof of the real Spectral Theorem). Infer the Spectral Theorem for
real symmetric matrices (Theorem 10.1.1) from the complex Spectral Theorem and the fact,
separately proved, that all eigenvalues of a symmetric matrix are real (Theorem 12.3.3).
Chapter 13

(C, R) Matrix Norms

13.1 (R) Operator norm


Definition 13.1.1 (Operator norm). The operator norm of a matrix A ∈ Rk×n is defined to be

kAk = max { kAxk / kxk : x ∈ Rn , x ≠ 0 }     (13.1)

where the k · k notation on the right-hand side represents the Euclidean norm.

Proposition 13.1.2. Let A ∈ Rk×n . Then kAk exists.

Proposition 13.1.3. Let A ∈ Rk×n and λ ∈ R. Then kλAk = |λ|kAk.

Proposition 13.1.4 (Triangle inequality). Let A, B ∈ Rk×n . Then kA + Bk ≤ kAk + kBk.

Proposition 13.1.5 (Submultiplicativity). Let A ∈ Rk×n and B ∈ Rn×` . Then kABk ≤ kAk · kBk.

Exercise 13.1.6. Let A = [a1 | · · · | an ] ∈ Rk×n . (The ai are the columns of A.) Show kAk ≥ kai k
for every i.

Exercise 13.1.7. Let A = (αij ) ∈ Rk×n . Show that kAk ≥ |αij | for every i and j.

Proposition 13.1.8. If A is an orthogonal matrix then kAk = 1.


Proposition 13.1.9. Let A ∈ Rk×n . Let S ∈ O(k) and T ∈ O(n) be orthogonal matrices. Then

kSAk = kAk = kAT k . (13.2)

Proposition 13.1.10. Let A be a symmetric real matrix with eigenvalues λ1 ≥ · · · ≥ λn . Then kAk = max{|λ1 |, . . . , |λn |}.

Exercise 13.1.11. Let A ∈ Rk×n . Show that AT A is positive semidefinite (Def. 11.3.6).
Exercise 13.1.12. Let A ∈ Rk×n and let AT A have eigenvalues λ1 ≥ · · · ≥ λn . Show that kAk = √λ1 .
Proposition 13.1.13. For all A ∈ Rk×n , we have kAT k = kAk.

Exercise 13.1.14.

(a) Find a stochastic matrix (Def. 22.1.2) of norm greater than 1.

(b) Find an n × n stochastic matrix of norm √n.

(c) Show that an n × n stochastic matrix cannot have norm greater than √n.
Numerical exercise 13.1.15.

(a) Let A be the 2 × 2 matrix with rows (1, 1) and (0, 1). Calculate kAk.

(b) Let B be the 2 × 2 matrix with rows (0, 1) and (1, 1). Calculate kBk.
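Aside (not part of the text; assumes NumPy). The operator norm in (13.1) equals the largest singular value, which np.linalg.norm(·, 2) computes, so answers can be checked by machine:

# Illustration only (assumes NumPy): operator norms for Numerical exercise 13.1.15.
import numpy as np

A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.0, 1.0], [1.0, 1.0]])
print(np.linalg.norm(A, 2))   # largest singular value of A; approximately 1.618
print(np.linalg.norm(B, 2))   # B is symmetric, so this is max |eigenvalue|; approximately 1.618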

13.2 (R) Frobenius norm


Definition 13.2.1 (Frobenius norm). Let A = (αij ) ∈ Rk×n be a matrix. The Frobenius norm of A, denoted kAkF , is defined as

kAkF = √( Σ_{i,j} |αij |^2 ) .     (13.3)

Proposition 13.2.2. Let A ∈ Rk×n . Then kAkF = √(Tr(AT A)).

Proposition 13.2.3. Let A ∈ Rk×n and λ ∈ R. Then kλAkF = |λ|kAkF .

Proposition 13.2.4 (Triangle inequality). Let A, B ∈ Rk×n . Then kA + BkF ≤ kAkF + kBkF .

Proposition 13.2.5 (Submultiplicativity). Let A ∈ Rk×n and B ∈ Rn×` . Then kABkF ≤ kAkF ·
kBkF .

Proposition 13.2.6. If A ∈ O(k) then kAkF = √k.

Proposition 13.2.7. Let A ∈ Rk×n . Let S ∈ O(k) and T ∈ O(n) be orthogonal matrices. Then

kSAkF = kAkF = kAT kF . (13.4)

Proposition 13.2.8. For all A ∈ Rk×n , we have kAT kF = kAkF .

Proposition 13.2.9. Let A ∈ Rk×n . Then

kAk ≤ kAkF ≤ √n · kAk .     (13.5)
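Aside (not part of the text; assumes NumPy). A quick random test of the inequalities in Proposition 13.2.9:

# Illustration only (assumes NumPy): check ||A|| <= ||A||_F <= sqrt(n) * ||A|| on a random matrix.
import numpy as np

k, n = 4, 7
A = np.random.randn(k, n)
op = np.linalg.norm(A, 2)            # operator norm (largest singular value)
fro = np.linalg.norm(A, 'fro')       # Frobenius norm
print(op <= fro + 1e-12, fro <= np.sqrt(n) * op + 1e-12)   # True True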

Exercise 13.2.10. Prove kAk = kAkF if and only if rk A = 1. Use the Singular Value Decomposition (Theorem 21.1.2).

Exercise 13.2.11. Let A be a symmetric real matrix. Show that kAkF = √n · kAk if and only if A = λR for some reflection matrix (??) R.

13.3 (C) Complex Matrices


Exercise 13.3.1. Generalize the definition of the operator norm and statements 13.1.2-13.1.12 to
C.

Exercise 13.3.2. Generalize the definition of the Frobenius norm and statements 13.2.2-13.2.10 to
C.

Exercise 13.3.3. Let A be a normal matrix with eigenvalues λ1 , . . . , λn . Show that kAkF = √n · kAk if and only if |λ1 | = · · · = |λn |.

Question 13.3.4. Is normality necessary?


Part II

Linear Algebra of Vector Spaces

Introduction to Part II

TO BE WRITTEN.

Chapter 14

Algebra

14.1 Basic concepts of arithmetic


14.1.1 Arithmetic of sets of integers
Notation 14.1.1. The set of integers, {. . . , −2, −1, 0, 1, 2, . . . }, is denoted by Z.
Definition 14.1.2 (Shift). Let a ∈ Z and B ⊆ Z. Then we write
a + B = {a + b | b ∈ B} . (14.1)
We say that a + B is the set B shifted by a.
Definition 14.1.3 (Dilation). Let a ∈ Z and B ⊆ Z. Then we write
aB = {ab | b ∈ B} . (14.2)
We say that aB is the set B dilated by a factor of a.
Definition 14.1.4 (Sumsets). Let A, B ⊆ Z. Then A + B is the set
A + B = {a + b | a ∈ A, b ∈ B} . (14.3)
Observe that A + B = ⋃_{a∈A} (a + B).
Definition 14.1.5 (Cartesian product). Let A and B be sets. Then the Cartesian product of A and
B is the set A × B defined by
A × B = {(a, b) | a ∈ A, b ∈ B} . (14.4)


Notice that (a, b) in the above definition is an ordered pair. In particular, if a1 , a2 ∈ A and
a1 ≠ a2 then (a1 , a2 ) and (a2 , a1 ) are distinct elements of A × A.
Notation 14.1.6 (Cardinality). For a set A we denote the cardinality of A (the number of elements
of A) by |A|.
For instance, |{4, 5, 4, 6, 6, 4}| = 3.
The following fact is not a theorem, but rather a definition, namely, the definition of multipli-
cation of non-negative integers.

Fact 14.1.7. |A × B| = |A| · |B| .

Exercise 14.1.8. Let A and B be finite sets of integers of respective sizes |A| = n and |B| = m.
Show that

(a) |A + B| ≤ mn

(b) if mn ≠ 0 then |A + B| ≥ m + n − 1

(c) |A + A| ≤ (n+1 choose 2)

(d) |A + A + A| ≤ (n+2 choose 3)

(e) Generalize the last inequality to k terms.

Prove that each of these inequalities is tight for all values of n, m, and k, i. e., for all m, n, k there
exist sets A, B for which equality holds.
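Aside (not part of the text). These bounds are easy to experiment with; the following Python sketch is an illustration only, with arbitrarily chosen example sets:

# Illustration only: sumset sizes for Exercise 14.1.8.
A = {0, 1, 2, 3}                            # n = 4
B = {0, 10, 20}                             # m = 3
print(len({a + b for a in A for b in B}))   # 12 = m*n here, since the sums do not collide
print(len({a + b for a in A for b in A}))   # 7 = 2n - 1; an arithmetic progression stays far below (n+1 choose 2) = 10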

The following result plays a crucial role in arithmetic.

Theorem 14.1.9 (Division Theorem). Let a, b ∈ Z, b 6= 0. Then there exist q, r ∈ Z such that

a = qb + r (14.5)

and 0 ≤ r < |b|.


Moreover, the integers q and r are uniquely determined by a and b.

Exercise 14.1.10. Fix the value b. Prove the Division Theorem for every a ≥ 0 by induction on
a. Then reduce the case of negative a to the case of positive a.
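Aside (not part of the text). As a sanity check, Python's built-in divmod can be adapted to produce the quotient and remainder with 0 ≤ r < |b| as in the Division Theorem; the helper div_mod below is a hypothetical name introduced only for this illustration.

# Illustration only: quotient and remainder with 0 <= r < |b|, as in Theorem 14.1.9.
def div_mod(a, b):
    q, r = divmod(a, abs(b))          # for a positive divisor, Python guarantees 0 <= r < |b|
    return (q if b > 0 else -q), r

for a, b in [(17, 5), (-17, 5), (17, -5)]:
    q, r = div_mod(a, b)
    print(f"{a} = {q}*{b} + {r}")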

Definition 14.1.11. Let M ⊆ Z. We say that M is a subgroup of Z if M ≠ ∅ and M is closed under subtraction, i. e., (∀a, b ∈ M )(a − b ∈ M ). We use the notation M ≤ Z to indicate that M is a subgroup of Z.
A more precise term would be “subgroup of the additive group of Z” (see Sec. 14.3).

Exercise 14.1.12. Verify: for every k ∈ Z, the set kZ is a subgroup of Z.

Definition 14.1.13. Subgroups of the form kZ are called cyclic subgroups.


The converse of Ex. 14.1.12 is an important result which we shall later use to establish the
existence of gcd’s and lcm’s.

Theorem 14.1.14 (subgroups are cyclic). Every subgroup of Z is cyclic.

The proof will follow from the following sequence of observations.

Exercise 14.1.15. Let M ≤ Z.


(a) 0 ∈ M .
(b) If a ∈ M then −a ∈ M . In particular, if M ≠ {0} then M contains a positive integer.
(c) M is closed under addition, i. e., (∀a, b ∈ M )(a + b ∈ M ).
(d) If k ∈ M then kZ ⊆ M .
(e) If a, b ∈ M and a = qb + r for some q, r ∈ Z then r ∈ M .
(f) Assume M ≠ {0} and let k be the smallest positive integer in M . (See item (b).) Then M = kZ.

14.1.2 Divisibility
Divisibility is the central concept of arithmetic.
Definition 14.1.16 (Divisibility). Let a, b ∈ Z. We say that a divides b (notation: a | b) if there
exists x ∈ Z such that b = ax. The same circumstance is also expressed by the phrases “a is a
divisor of b” and “b is a multiple of a.”
In the following sequence of exercises, we are building on basic identities of arithmetic (commu-
tativity, associativity, distributivity).

Exercise 14.1.17. Show that divisibility is



(a) reflexive, i. e., for all a ∈ Z we have a | a

(b) transitive, i. e., for all a, b, c ∈ Z, if a | b and b | c then a | c .

For item (b), point out what basic arithmetic identity you are using.

Remark 14.1.18. Note that in particular, 0 | 0. Why does this not violate the rock-hard prohibition
against division by zero?

Exercise 14.1.19. Show that for all a ∈ Z,

(a) a | −a

(b) a | 0

(c) 1 | a and −1 | a

Exercise 14.1.20. Show that

(a) if a | b and b | a then a = ±b

(b) if for a fixed value a, the relation x | a holds for all x ∈ Z then a = 0

(c) if for a fixed value a, the relation a | y holds for all y ∈ Z then a = ±1

Proposition 14.1.21. Let a, b, c ∈ Z. If c | a and c | b then c | a + b and c | a − b.


State what arithmetic identity you are using in the proof.

14.1.3 Greatest common divisor


This central concept is often misrepresented in textbooks.
Definition 14.1.22. Let S ⊆ Z be a set of integers and d ∈ Z. We say that d is a common divisor
of S if (∀a ∈ S)(d | a).
We write Div(S) to denote the set of common divisors of S. If S is explicitly listed as S =
{a1 , a2 , . . . } then we write Div(a1 , a2 , . . . ) for Div({a1 , a2 , . . . }) (we omit the braces).

Exercise 14.1.23. Verify the following examples.

• Div(−6, 15, 27) = {±1, ±3},



• Div(−4) = {±1, ±2, ±4},

• Div(7Z) = {±1, ±7}.

Exercise 14.1.24. (a) Show that Div(S) is never empty.

(b) Div(Z) = {−1, 1}

(c) Determine Div(∅).

(d) Find all sets S ⊆ Z such that Div(S) = Z.


(Hint: there are two such sets.)

(e) If S ⊆ Z then Div(−S) = Div(S).

(f) Prove: if A, B ⊆ Z then Div(A + B) ⊇ Div(A) ∩ Div(B).

Definition 14.1.25. Let S ⊆ Z be a set of integers and e ∈ Z. We say that e is a common multiple
of S if (∀a ∈ S)(a | e).
We write Mult(S) to denote the set of common multiples of S. If S is explicitly listed as S =
{a1 , a2 , . . . } then we write Mult(a1 , a2 , . . . ) for Mult({a1 , a2 , . . . }) (we omit the braces).

Exercise 14.1.26. Verify the following examples.

• Mult(−6, 15, 27) = 270Z,

• Mult(−4) = 4Z,

• Mult(7Z) = 7Z.

Exercise 14.1.27. In these exercises, S ⊂ Z.

(a) Show that Mult(S) is never empty.

(b) Determine Mult(∅).

(c) If Mult(S) is a finite set then Mult(S) = {0}.

(d) Characterize those subsets S ⊆ Z for which Mult(S) = {0}.



(e) Find all sets S ⊆ Z such that Mult(S) = Z.


(Hint. There are 4 such sets.)
Definition 14.1.28. Let S ⊆ Z. We say that d ∈ Z is a greatest common divisor of the set S if
(a) d is a common divisor of S, and

(b) d is a common multiple of all common divisors of S.


We shall abbreviate the term “greatest common divisor” as “gr.c.div.”
Exercise 14.1.29. Let S ⊆ Z and d ∈ Z. The integer d is a gr.c.div. of S if and only if
Div(S) = Div(d).
Note that we used the indefinite article “a” before “gr.c.div.” in Def. 14.1.28. The following exercise explains why.
Exercise 14.1.30. Let S ⊆ Z. If d ∈ Z is a gr.c.div. of S then the greatest common divisors of
S are d and −d. In particular, while usually there is no unique gr.c.div., the absolute value of the
greatest common divisors is unique.
This motivates the gcd notation.
Notation 14.1.31. If d is a gr.c.div. of the set S ⊆ Z then we write gcd(S) = |d|.
Exercise 14.1.32. For all a ∈ Z we have gcd(a, a) = gcd(a, 0) = |a| and gcd(a, 1) = 1.
Given that we defined the gr.c.divisors by a wish list, it is not evident that a gr.c.div. always
exists. We shall prove this fundamental fact further down. For now, in order to establish certain
properties of the gcd, we need to assume the existence of a gr.c.div.
Exercise 14.1.33. Assume the set S ⊆ Z has a gr.c.div.
(a) gcd(S) ≥ 0.

(b) gcd(S) = gcd(−S).


Definition 14.1.34. Let a1 , . . . , an ∈ Z. An integer linear combination of the ai is an integer that can be expressed as x1 a1 + · · · + xn an for some xi ∈ Z.
Exercise 14.1.35. The set of integer linear combinations of a1 , . . . , an ∈ Z is the set a1 Z+· · ·+an Z.

Exercise 14.1.36 (elimination of repetitions). Let a1 , . . . , an ∈ Z. Some items on this list may be
repeated; let S be a maximal repetition-free sublist (i. e., every element of the list appears exactly
once in S). The set of integer linear combinations of the ai is the same as the set of integer linear
combinations of S.
Since the order of the ai does not matter, we can view S as a set, rather than an (ordered) list of
elements.
Definition 14.1.37. If S is an infinite subset of Z then by the integer linear combinations of S we
mean the integer linear combinations of the finite subsets of S.
Exercise 14.1.38. 0 is an integer linear combination of any set of integers (even of the empty set, because an empty sum is zero by definition).
Theorem 14.1.39 (Existence of gcd and Bezout’s Lemma). Let S ⊆ Z. Then a gr.c.div. of S
exists and can be written as an integer linear combination of S.
The second part of this statement (about integer linear combinations) is usually called Bezout’s
Lemma.
Examples: −6 is a gr.c.div. of 12 and 90. It can be written as −6 = (−8) · 12 + 1 · 90. The gcd
of 21, 30, and 35 is 1. It can be written as 1 = 21 − 3 · 30 + 2 · 35.
The proof is immediate from Theorem 14.1.14 and the following two observations.
Exercise 14.1.40. Let S ⊆ Z. Then the set of integer linear combinations of S is a subgroup of Z.
Exercise 14.1.41. Let S ⊆ Z. Assume the set of integer linear combinations of S is the cyclic
subgroup dZ. Then d is a gr.c.div. of S.
Note that this completes the proof of Theorem 14.1.39.
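Aside (not part of the text). The existence proof above is non-constructive. The integer version of Euclid's algorithm, in its extended form, actually computes Bezout coefficients; the sketch below is an illustration only (the book introduces Euclid's algorithm for polynomials in Section 14.5).

# Illustration only: the extended Euclidean algorithm returns (g, x, y) with g = gcd(a, b) = x*a + y*b.
def extended_gcd(a, b):
    if b == 0:
        return (abs(a), 1 if a >= 0 else -1, 0)
    g, x, y = extended_gcd(b, a % b)
    return (g, y, x - (a // b) * y)

print(extended_gcd(12, 90))       # (6, -7, 1): indeed 6 = (-7)*12 + 1*90
print(extended_gcd(21, 30))       # (3, 3, -2): indeed 3 = 3*21 + (-2)*30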
The following lemma, a critical step towards a proof of the Fundamental Theorem of Arithmetic
(uniqueness of prime factorization) is an immediate consequence of Bezout’s Lemma.
Exercise 14.1.42. Let S ⊆ Z and k ∈ Z. Then gcd(kS) = |k| gcd(S).
Definition 14.1.43. We say that the integers a, b are relatively prime if gcd(a, b) = 1.
Exercise 14.1.44. Let a, b, c ∈ Z. If c | ab and b and c are relatively prime then c | a.
The trick is, we need to prove this without using the Fundamental Theorem of Arithmetic: this
exercise will be key to proving the FTA.
Hint. Multiply the equation gcd(b, c) = 1 by a and use Ex. 14.1.42.

14.1.4 Fundamental Theorem of Arithmetic


Definition 14.1.45. An integer p is a prime number if p ≥ 2 and Div(p) = {±1, ±p}.

Exercise 14.1.46. Show that p ∈ Z is a prime number if and only if p ≥ 0 and | Div(p)| = 4.

Definition 14.1.47. An integer n is a composite number if n ≥ 2 and n is not a prime number.


So the prime numbers are 2, 3, 5, 7, 11, 13, 17, 19, . . . . There are 25 prime numbers p ≤ 100.

Exercise 14.1.48. Every positive integer n can be written as a product of prime numbers.

Hint. Induction on n. The base case: n = 1 which is the product of the empty list of primes.
(The product of an empty list of numbers is 1 by definition.)
The FTA states the uniqueness of this factorization.

Theorem 14.1.49 (Fundamental Theorem of Arithmetic). Prime factorization is unique up to the order of the factors. More precisely, let n be a positive integer and assume n = p1 · · · pk = q1 · · · qℓ where the pi and the qj are prime numbers. Then k = ℓ and there is a permutation σ : [k] → [k] such that (∀i ∈ [k])(qi = pσ(i) ).

This result appears in Euclid’s Elements (cca. 300 BCE). The proof we give here is essentially
Euclid’s, in modern language. We have already made the bulk of preparations for the proof.
Definition 14.1.50. Let n ∈ Z. We say that n has the prime property if n ∉ {±1} and

(∀a, b ∈ Z)(n | ab ⇒ [n | a or n | b]) . (14.6)

Exercise 14.1.51. (a) Composite numbers and their negatives do not have the prime property.

(b) Does 0 have the prime property?

Exercise 14.1.52 (Euclid’s Lemma). All prime numbers have the prime property.

This result is the key lemma toward proving the FTA. It follows immediately from Ex. 14.1.44.

Exercise 14.1.53. Prove the FTA by induction on n using Euclid’s Lemma.



14.1.5 Least common multiple


We define least common multiple by switching the sides of the divisibility relations involved in the
definition of greatest common divisors.
Definition 14.1.54. Let S ⊆ Z. We say that e ∈ Z is a least common multiple of the set S if
(a) e is a common multiple of S, and
(b) e is a common divisor of all common multiples of S.
We shall abbreviate the term “least common multiple” as “l.c.mult.”
Exercise 14.1.55. Let S ⊆ Z and e ∈ Z. The integer e is a l.c.mult. of S if and only if Mult(S) = Mult(e). (Note: Mult(e) = eZ.)
Exercise 14.1.56. Let S ⊆ Z. If e ∈ Z is a l.c.mult. of S then the least common multiples of S
are e and −e. In particular, while usually there is no unique l.c.mult., the absolute value of the
least common multiples is unique.
This motivates the lcm notation.
Notation 14.1.57. If e is a l.c.mult. of the set S ⊆ Z then we write lcm(S) = |e|.
Exercise 14.1.58. For all a ∈ Z we have lcm(a, a) = lcm(a, 1) = |a| and lcm(a, 0) = 0.
Having defined the least common multiples by a wish list, we need to show they exist. This will
be immediate from the following observation.
Exercise 14.1.59. The intersection of subgroups is a subgroup.
This is true regardless of how many subgroups we intersect. This is true even if that number is
zero; the intersection of the empty set of subgroups is Z.
An immediate corollary of this exercise and Theorem 14.1.14 is the following.
Exercise 14.1.60. Let S ⊆ Z. The set of common multiples of S is a cyclic subgroup of Z, say
eZ. Then e is a l.c.mult. of S.
This establishes the existence of lcm’s.
The following observation yields an alternative proof.
Exercise 14.1.61. Let S ⊆ Z and let M denote the set of common multiples of S. Then e = lcm(S)
if and only if e = gcd(M ).
* * *

Miscellaneous exercises

Exercise 14.1.62. Let a, b ∈ Z. Then |ab| = gcd(a, b) · lcm(a, b).


Use the FTA.

Exercise 14.1.63. Let a1 , . . . , an ∈ Z.

(a) Show that a1 Z + · · · + an Z = kZ for some k ∈ Z. Determine k.

(b) Show that a1 Z ∩ · · · ∩ an Z = `Z for some ` ∈ Z. Determine `.

Exercise 14.1.64. Let S = {p2 − 1 | p ≥ 100, p is a prime number}. Determine Div(S).

14.2 Modular arithmetic


14.2.1 Congruences
Definition 14.2.1 (Congruence modulo m). Let a, b, m ∈ Z. We say that a is congruent to b modulo
m (written a ≡ b (mod m)) if m | a − b.

Exercise 14.2.2. Let a, b ∈ Z. When is it the case that

(a) a ≡ b (mod 1)

(b) a ≡ b (mod 2)

(c) a ≡ b (mod 0)

Exercise 14.2.3 (Additivity and multiplicativity of congruences). Let a, b, c, d, m ∈ Z. Show that


if a ≡ c (mod m) and b ≡ d (mod m), then

(a) a + b ≡ c + d (mod m)

(b) a − b ≡ c − d (mod m)

(c) ax ≡ cx (mod m) for every x ∈ Z

(d) ab ≡ cd (mod m)

For each item, find an elegant proof based on what we have already learned about divisibility and congruence; point out what you are using.

For item (c), use the transitivity of divisibility.


For an elegant proof of item (d), use item (c) and the transitivity of congruence.

Exercise 14.2.4. Let k ≥ 0. Show that if a ≡ b (mod m), then a^k ≡ b^k (mod m).

Proceed by induction on k, using the multiplication rule for congruences (Ex. 14.2.3, item (d)).
Definition 14.2.5. Let a, x, m ∈ Z. We say that x is a multiplicative inverse of a modulo m if ax ≡ 1
(mod m).
For example, 13 is a multiplicative inverse of −8 modulo 21 because (−8) · 13 = −104 ≡ 1
(mod 21).

Exercise 14.2.6. Let a, m ∈ Z. Then a has a multiplicative inverse modulo m if and only if a and
m are relatively prime, i. e., gcd(a, m) = 1. In particular, if p is a prime then every integer that is
not divisible by p has a multiplicative inverse modulo p.

Exercise 14.2.7 (Cancellation law). Let a, b, c, m ∈ Z. If ab ≡ ac (mod m) and gcd(a, m) = 1


then b ≡ c (mod m).

Exercise 14.2.8 (Multiplicative inverse unique modulo m). Let x be a multiplicative inverse of
a modulo m. Then an integer y is a multiplicative inverse of a modulo m if and only if x ≡ y
(mod m).

Exercise 14.2.9. Prove: if a ≡ b (mod m) then gcd(a, m) = gcd(b, m).

Remark 14.2.10. Note that day x and day y of the month fall on the same day of the week exactly
if x ≡ y (mod 7). For instance, if August 3 is a Wednesday, then August 24 is also a Wednesday,
because 3 ≡ 24 (mod 7). For this reason, when modular arithmetic is taught to kids, it is sometimes
referred to as “calendar arithmetic.”

14.2.2 Equivalence relations, residue classes


Definition 14.2.11 (Binary relation). A binary relation on the set A is a subset R ⊆ A × A. The
relation R holds for the elements a, b ∈ A if (a, b) ∈ R. In this case, we write aRb or R(a, b).

Definition 14.2.12 (Equivalence relation). Let ∼ be a binary relation on a set A. The relation ∼ is
said to be an equivalence relation if the following conditions hold for all a, b, c ∈ A.
(a) a ∼ a (reflexivity)
(b) If a ∼ b then b ∼ a (symmetry)
(c) If a ∼ b and b ∼ c, then a ∼ c (transitivity)
Proposition 14.2.13. For any fixed m ∈ Z, “congruence modulo m” is an equivalence relation.
Definition 14.2.14 (Equivalence classes). Let A be a set with an equivalence relation ∼, and let
a ∈ A. The equivalence class of a with respect to ∼, denoted [a], is the set

[a] = {b ∈ A | a ∼ b} (14.7)

Note that the equivalence classes are not empty since a ∈ [a].
Theorem 14.2.15 (Fundamental Theorem of Equivalence Relations). Let ∼ be an equivalence
relation on the set A. The equivalence classes partition A, i. e.,
(i) if [a] ≠ [b] then [a] ∩ [b] = ∅

(ii) ⋃_{a∈A} [a] = A.

Exercise 14.2.16. Let ∼ be an equivalence relation on A. Then for all a, b ∈ A, the following are
equivalent:
(a) a ∼ b
(b) a ∈ [b]
(c) [a] = [b]
(d) [a] ∩ [b] ≠ ∅
Definition 14.2.17. Let ∼ be an equivalence relation on the set A. Any element of an equivalence
class R is a representative of R. A set T ⊆ A is a complete set of representatives of the equivalence
classes if T includes exactly one element from each equivalence class. In particular, |T | is the number
of equivalence classes.

Definition 14.2.18. The equivalence classes of the equivalence relation “congruence modulo m” in
Z are called residue classes modulo m or modulo m residue classes.
Exercise 14.2.19. Show that the residue classes modulo m are precisely the sets a + mZ (a ∈ Z)
(see Definitions 14.1.2 and 14.1.3).
Exercise 14.2.20. Given m ∈ Z, how many modulo m residue classes are there? Do not forget
the case m = 0.
Next we define addition and multiplication of residue classes. We use the notation [a]m := a+mZ.

Definition 14.2.21 (Sum of residue classes). Let [a]m and [b]m be residue classes modulo m. We
define their sum as
[a]m + [b]m = [a + b]m (14.8)
Definition 14.2.22 (Product of residue classes). Let [a]m and [b]m be residue classes modulo m. We
define their product as
[a]m · [b]m = [a · b]m (14.9)
Exercise 14.2.23. Show that the sum and the product of residue classes are well defined, that is,
that they do not depend on our choice of representative for each residue class. In other words, if
[a]m = [a′ ]m and [b]m = [b′ ]m then [a + b]m = [a′ + b′ ]m and [ab]m = [a′ b′ ]m .
These equations are immediate consequences of certain facts we have learned about congruences.
Point out, which ones.
Definition 14.2.24. A complete set of representatives of the residue classes modulo m is called a
complete set of residues modulo m.
Exercise 14.2.25. Let m ≥ 1. Then the set {0, 1, . . . , m−1} is a complete set of residues modulo m.
It is called the set of least non-negative residues modulo m.
Exercise 14.2.26. Let T be a complete set of residues modulo m and let c ∈ Z be relatively prime
to m. Then the dilation cT is again a complete set of residues modulo m.
Next we observe that Ex. 14.2.9 allows us to speak of the gcd of m and a residue class modulo m.

Definition 14.2.27. We say that gcd([a]m , m) = d if d = gcd(a, m). In particular, we say that [a]m
and m are relatively prime if gcd(a, m) = 1.

Remark 14.2.28. This definition is sound, i. e., the gcd we defined does not depend on the specific
choice of the representative of the residue class [a]m . This is the content of Ex. 14.2.9.
Definition 14.2.29. A reduced set of residues modulo m is a set of representatives of those residue
classes that are relatively prime to m.
Definition 14.2.30 (Euler’s ϕ function). For a positive integer m let ϕ(m) denote the number of
x ∈ [m] that are relatively prime to m.

Exercise 14.2.31. (a) ϕ(1) = 1

(b) If p is a prime number then ϕ(p) = p − 1.

(c) If p is a prime number and k ≥ 1 then ϕ(p^k ) = p^k − p^(k−1) .

(d) If gcd(a, b) = 1 then ϕ(ab) = ϕ(a) · ϕ(b).

(e) For all positive integers m,

ϕ(m) = m · ∏_{p|m} (1 − 1/p) ,     (14.10)

where the product extends over all prime divisors of m.
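Aside (not part of the text). A short computational sketch of formula (14.10); it factors m by trial division, which is fine for small inputs, and is an illustration only.

# Illustration only: Euler's phi via the product formula (14.10).
def phi(m):
    result, n, p = m, m, 2
    while p * p <= n:
        if n % p == 0:
            result -= result // p        # multiply result by (1 - 1/p)
            while n % p == 0:
                n //= p
        p += 1
    if n > 1:                            # a remaining prime factor larger than sqrt(m)
        result -= result // n
    return result

print([phi(m) for m in range(1, 13)])    # [1, 1, 2, 2, 4, 2, 6, 4, 6, 4, 10, 4]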

Exercise 14.2.32. If R is a reduced set of residues modulo m then |R| = ϕ(m).

Exercise 14.2.33. If R is a reduced set of residues modulo m and gcd(c, m) = 1 then cR is also a
reduced set of residues modulo m.

Exercise 14.2.34. If R and S are reduced sets of residues modulo m then ∏_{x∈R} x ≡ ∏_{y∈S} y (mod m).

Theorem 14.2.35 (Euler–Fermat congruence). Let a, m ∈ Z and m ≥ 1. If a and m are relatively prime then a^ϕ(m) ≡ 1 (mod m).
In particular, if p is a prime number and p ∤ a then a^(p−1) ≡ 1 (mod p).

Proof. Let R be a reduced set of residues modulo m. Then aR is also a reduced set of residues modulo m. Let P denote the product of the elements of R and Q the product of the elements of aR. Then Q = a^ϕ(m) · P . On the other hand, by Ex. 14.2.34, P ≡ Q (mod m), i. e., P ≡ a^ϕ(m) · P (mod m). Now an application of the Cancellation law (Ex. 14.2.7) yields the desired conclusion.
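Aside (not part of the text). A brute-force numerical check of the Euler–Fermat congruence for one modulus; illustration only.

# Illustration only: verify a^phi(m) = 1 (mod m) for all a relatively prime to m.
from math import gcd

def phi(m):
    return sum(1 for x in range(1, m + 1) if gcd(x, m) == 1)

m = 20
assert all(pow(a, phi(m), m) == 1 for a in range(1, m) if gcd(a, m) == 1)
print("Euler-Fermat verified for m =", m)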

14.3 Groups
Definition 14.3.1. A binary operation on the set A is a function A × A → A. If the operation is
denoted by “◦” then we have a map (a, b) 7→ a ◦ b (a, b ∈ A).
Examples of binary operations include addition and multiplication in Z, Q, R, C, Zm .
Definition 14.3.2 (Group). A group is a set G along with a binary operation ◦ that satisfies the
following axioms.

(a) (binary operation) For all a, b ∈ G, there exists a unique element a ◦ b ∈ G

(b) (associativity) For all a, b, c ∈ G, (a ◦ b) ◦ c = a ◦ (b ◦ c)

(c) (neutral element) There exists a neutral element e ∈ G such that for all a ∈ G, e◦a = a◦e = a

(d) (inverses) For each a ∈ G, there exists b ∈ G such that a ◦ b = b ◦ a = e. The element b is
called the inverse of a

The first of these axioms is redundant; it just declares that ◦ is a binary operation.
Remark 14.3.3. Axiom (c) ensures that if (G, ◦) is a group then G cannot be the empty set.

Exercise 14.3.4. Prove that the neutral element is the unique element of the group that satisfies
the equation x ◦ x = x.

Exercise 14.3.5. We say that c ∈ G is a left inverse of a ∈ G if ca = e. Right inverses are defined
analogously. A two-sided inverse of a is an element that is both a left inverse and a right inverse.
Prove that every element of G has a unique left inverse, a unique right inverse, and these are equal;
in particular, every element has a unique two-sided inverse, to which we refer as the inverse.

Strictly speaking, G denotes a set, and this set, along with the binary operation ◦, constitutes
the group (G, ◦). However, we often omit ◦ and refer to G as the group when the binary operation
is clear from context.
Groups satisfying the additional axiom

(e) a ◦ b = b ◦ a for all a, b ∈ G (commutativity)

are called commutative groups or abelian groups.



Convention 14.3.6. There are two common notational conventions for groups: additive and mul-
tiplicative. The additive notation is only used for abelian groups.
In additive notation, the operation is written as (a, b) 7→ a + b and the neutral element is called
zero, denoted 0G , or simply 0 if the group is clear from the context. The additive inverse of a ∈ G,
called the negative of a, is denoted −a, and the element a + (−b) is denoted a − b. We call an
abelian group an additive group if we use the additive notation.
In multiplicative notation, the operation is written as (a, b) 7→ a · b or a × b or simply ab.
The neutral element is called the identity, denoted 1G , or simply 1 if the group is clear from the
context. The multiplicative inverse of a ∈ G is denoted a−1 . If the group is abelian, we also call the
multiplicative inverse the reciprocal and sometimes denote it 1/a, and we also denote the element
a · b−1 as the quotient a/b.
Definition 14.3.7 (Order of a group). Let G be a group. The order of G is its cardinality, |G|.
Exercise 14.3.8. The following are examples of abelian groups: (Z, +), (Q, +), (R, +), (C, +).
Exercise 14.3.9. (Zm , +) is an abelian group of order m.
The additive groups listed in the preceding two exercises do not form groups with respect to
multiplication because 0 has no multiplicative inverse.
Definition 14.3.10 (Semigroup, monoid). A semigroup (S, ◦) is a set with an associative binary
operation (axioms (a) and (b) in Def. 14.3.2). A monoid is a semigroup with a neutral element
(axiom (c)). An element a ∈ S of the monoid S is invertible if it satisfies axiom (d).
Exercise 14.3.11. The following sets are examples of (commutative) monoids with respect to
multiplication: Q, R, C, Zm (m ≥ 1).
Notation 14.3.12. Let (S, ×) be a monoid, where the operation is written as multiplication. We
write S × to denote the set of invertible elements of S.
Exercise 14.3.13. If (S, ×) is a monoid then (S × , ×) is a group.
Exercise 14.3.14. Z× = {1, −1}. This set is a multiplicative abelian group of order 2.
Exercise 14.3.15. Q× := Q \ {0}, R× := R \ {0}, C× := C \ {0}. These sets are multiplicative
abelian groups.

Exercise 14.3.16. Z×m is the set of those residue classes modulo m that are relatively prime to m. (See Def. 14.2.27 and the subsequent remark.) Z×m is an abelian group with respect to multiplication.
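Aside (not part of the text). A tiny sketch listing the elements of Z×m for a small modulus and checking closure under multiplication; illustration only.

# Illustration only: the group of residues relatively prime to m.
from math import gcd

m = 12
units = [a for a in range(m) if gcd(a, m) == 1]
print(units)                                                     # [1, 5, 7, 11]
print(all((a * b) % m in units for a in units for b in units))   # True: closed under multiplication mod m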

Examples 14.3.17. Show that the following are abelian groups.


(a) The space RR of real functions f : R → R with respect to addition, (RR , +), where addition
is defined pointwise, that is, (f + g)(ζ) = f (ζ) + g(ζ) for all ζ ∈ R.
(b) The complex n-th roots of unity, that is, the set
{ζ ∈ C | ζ^n = 1}
with multiplication. This is a finite abelian group of order n.
Definition 14.3.18. We define the sum Σ_{i=1}^{k} ai by induction. For the base case of k = 1, Σ_{i=1}^{1} ai = a1 , and for k > 1,

Σ_{i=1}^{k} ai = ak + Σ_{i=1}^{k−1} ai .     (14.11)

Convention 14.3.19 (Empty sum). Let (G, +) be an abelian group. The empty sum in (G, +) is defined to be equal to 0, that is,

Σ_{a∈∅} a = Σ_{i=1}^{0} ai = 0 .     (14.12)

Exercise 14.3.20. Let (G, +) be an abelian group. Let I be a finite index set and ai ∈ G for i ∈ I. Let I = A ∪̇ B, where the dot indicates that A and B are disjoint: A ∩ B = ∅. Then

Σ_{i∈I} ai = Σ_{i∈A} ai + Σ_{i∈B} ai .     (14.13)

Note that the validity of this identity depends on the Convention about empty sums, and our
convention is the only possible way to maintain this identity.
Convention 14.3.21 (Empty product). Let (S, ×) be a commutative monoid. The empty product in (S, ×) is defined to be equal to 1, that is,

∏_{a∈∅} a = ∏_{i=1}^{0} ai = 1 .     (14.14)

In particular, in Z we have 0^0 = 1 and 0! = 1.



Exercise 14.3.22. State and prove the multiplicative analogue of Ex. 14.3.20. Note that it is
sufficient to talk about monoids, the structure does not need to be a group.
Definition 14.3.23 (Sumset). We define shift and sumsets in an abelian group exactly as we did in
Z (Definitions 14.1.2 and 14.1.4).
Exercise 14.3.24. (i) Prove that the inequalities listed in Ex. 14.1.8 hold in any additive abelian
group.

(ii) Find an infinite abelian group in which inequality (c) in Ex. 14.1.8 is not tight.
Definition 14.3.25 (Subgroup). Let G be a group. If H ⊆ G is a group under the same operation
as G, we say that H is a subgroup of G, denoted H ≤ G.
Proposition 14.3.26. The relation ≤ is transitive, that is, if K ≤ H and H ≤ G, then K ≤ G.
Proposition 14.3.27. The intersection of any collection of subgroups of a group G is itself a
subgroup of G.
Proposition 14.3.28. Let G be a group and let H, K ≤ G. Then H ∪ K ≤ G if and only if H ⊆ K or K ⊆ H.
Proposition 14.3.29. Let G be a group and let H ≤ G. Then
(a) The identity of H is the same as the identity of G.
(b) Let a ∈ H. The inverse of a in H is the same as the inverse of a in G.
Notation 14.3.30. Let G be an additive abelian group, and let H, K ⊆ G. We write −H for the set
−H = {−h | h ∈ H}, and H − K for the set
H − K = H + (−K) = {h − k | h ∈ H, k ∈ K} (14.15)
Proposition 14.3.31. Let G be an additive abelian group and H ⊆ G. Then H ≤ G if and only if
(a) 0 ∈ H
(b) −H ⊆ H (closed under inverses)
(c) H + H ⊆ H (closed under addition)
Proposition 14.3.32. Let G be an additive abelian group and let H ⊆ G. Then H ≤ G if and only if H ≠ ∅ and H − H ⊆ H (that is, H is closed under subtraction).

14.4 Fields
Definition 14.4.1. A field is a set F with two binary operations, addition and multiplication, such
that
(a) (F, +) is an abelian group with zero element 0F (also simply denoted as 0 if the field F is clear
from the context);

(b) (F× , ×) is an abelian group, where F× = F \ {0};

(c) For all a, b, c ∈ F we have a(b + c) = ab + ac (distributivity)


Examples of fields include Q, R, C, Fp := Zp (the modulo p residue classes, where p is a prime
number), the function field F(t) whenever F is a field.
We denote the multiplicative identity of the field F by 1F or simply by 1 if the field in question
is clear from the context.
The order of a field is its cardinality (number of elements).
Exercise 14.4.2. In a field, 0 ≠ 1. In particular, every field has order ≥ 2.
Exercise 14.4.3. In a field, ab = 0 if and only if a = 0 or b = 0. In particular, if Zm is a field then
m is a prime number.
The field Fp has prime order. These are not the only finite fields.
Fields, and abstract algebra, were invented by Évariste Galois (1811-1832).
Theorem 14.4.4 (Galois). A finite field of order m exists if and only if m is a prime power.
Moreover, if q is a prime power then there is exactly one field of order q, up to isomorphism. We
denote this field by Fq . Another common notation for this field is GF(q), for “Galois field of order
q.” ♦
Exercise 14.4.5. If q is a proper prime power (not a prime number) then Zq is not isomorphic to
Fq .
Definition 14.4.6. Let (G, +) be an additive abelian group. Let k ∈ Z and a ∈ G. For k ≥ 0 we
write k · a to denote the sum a + a + · · · + a where the sum has k terms. (This includes 0 · a = 0G .)
For k < 0 we write k · a = (−k) · (−a).
Exercise 14.4.7. Let (G, +) be an additive abelian group. Let k, ` ∈ Z and a, b ∈ G. Then

(a) (k + `) · a = k · a + ` · a

(b) k · (a + b) = k · a + k · b

Definition 14.4.8. The characteristic of a field F is the smallest positive integer k such that k·1F = 0F .
If no such k exists, we say that F has characteristic 0. We denote the characteristic of F by char(F).
We say that F has finite characteristic if char(F) ≠ 0.
Remark 14.4.9. Note that “infinite” and “zero” are treated as synonyms in this context, and in
many other contexts in number theory, for a good reason. Indeed, 0 is on the top of the divisibility
hierarchy of integers (0 is divisible by all integers). This fact explains the term “finite characteristic,”
see Ex. 14.4.12.

Exercise 14.4.10. If the characteristic of F is k then k · a = 0 for every a ∈ F.

Exercise 14.4.11. If the characteristic of a field is not zero then it is a prime number.

Exercise 14.4.12. Let F be a field. Let KF = {k ∈ Z | k · 1F = 0F }. Then char(F) = gcd(KF ).

The fields Q, R, C have characteristic zero.

Theorem 14.4.13 (Galois). All finite fields have finite characteristic. If the characteristic of a
finite field F is the prime number p then the order of F is a power of p. ♦

This result has a simple linear algebra reason, as we shall see later. *** This will prove the easy
part of Galois’s Theorem 14.4.4: all finite fields have prime power order. It will not prove either
the existence or the uniqueness of those fields.
Definition 14.4.14. Consider the formal quotients of formal univariate polynomials (polynomials in
one variable) over the field F, where the denominator is not the zero polynomial. (Note that even
over finite fields, there are infinitely many formal polynomials, see Sec. 14.5.) Let us say that two
such quotients, f1 /g1 and f2 /g2 are equivalent if f1 g2 = f2 g1 . The equivalence classes under the
natural operations form a field called the function field over F and is denoted by F(t), where t is
the name of the variable.
This construction is analogous to the way we build Q from Z.
Definition 14.4.15 (Subfield). Let F, G be fields. We say that F is a subfield of G if F ⊆ G and the operations in F are the same as those in G, restricted to F. In this case we also say that G is an extension of F.

Exercise 14.4.16. Let G be a field and F a subset of G. Then F is a subfield if and only if 1G ∈ F
and F is closed under subtraction and division by nonzero elements, i. e., F − F ⊆ F and for a ∈ F
and b ∈ F× we have a/b ∈ F.
Exercise 14.4.17. If F is a subfield of G then they have the same characteristic.
Note in particular that Fp is not a subfield of R.
Exercise 14.4.18. The field F is a subfield of the function field F(t). In particular, they have the
same characteristic.
This shows that function fields are examples of infinite fields of every characteristic.
Exercise 14.4.19. If F has characteristic p then F has a subfield isomorphic to Fp . If F has
characteristic zero then F has a subfield isomorphic to Q.
For this reason, the fields Fp and Q are called prime fields.

14.5 Polynomials
As mentioned in Section 8.3, “polynomials” in this book refer to univariate polynomials (polyno-
mials in one variable), unless we expressly refer to multivariate polynomials.
In Section 8.3, we developed a basic theory of polynomials. In that section, however, we viewed
polynomials as functions. We now develop a more formal theory of polynomials, viewing a poly-
nomial f as a formal expression whose coefficients are taken from a field F of scalars, rather than
as a function f : F → F. This makes no real difference if F is infinite but the difference is signifi-
cant when the field is finite. For starters, there are only finitely many functions Fq → Fq , namely,
there are q^q of them, but there are infinitely many formal polynomials over Fq ; for instance, the polynomials x^n (n ∈ N) are all formally different.
Definition 14.5.1 (Polynomial). A polynomial over the field F is an expression (strictly speaking, an equivalence class of such expressions, as explained below) of the form

f = α0 + α1 t + α2 t^2 + · · · + αn t^n     (14.16)

where the coefficients αi are scalars (elements of F), and t is a symbol. The set of all polynomials over F is denoted F[t]. Two expressions, (14.16) and

g = β0 + β1 t + β2 t^2 + · · · + βm t^m     (14.17)

define the same polynomial if they only differ in leading zero coefficients, i. e., there is some k for
which α0 = β0 , . . . , αk = βk , and all coefficients αj , βj are zero for j > k. We may omit any terms
with zero coefficient, e. g.,
3 + 0t + 2t2 + 0t3 = 3 + 2t2 (14.18)
Definition 14.5.2 (Zero polynomial). The polynomial in which all coefficients are zero is called the
zero polynomial and is denoted by 0.
Definition 14.5.3 (Leading term). The leading term of a polynomial f = α0 + α1 t + · · · + αn tn is
the term corresponding to the highest power of t with a nonzero coefficient, that is, the term αk tk
where αk 6= 0 and αj = 0 for all j > k. The zero polynomial does not have a leading term.
Definition 14.5.4 (Leading coefficient). The leading coefficient of a polynomial f is the coefficient
of the leading term of f .
For example, the leading term of the polynomial 3 + 2t2 + 5t7 is 5t7 and the leading coefficient
is 5.
Definition 14.5.5 (Monic polynomial). A polynomial is monic if its leading coefficient is 1.
Definition 14.5.6 (Degree of a polynomial). The degree of a polynomial f = α0 + α1 t + · · · + αn tn ,
denoted deg f , is the exponent of its leading term.
For example, deg (3 + 2t2 + 5t7 ) = 7.
Convention 14.5.7. The zero polynomial has degree −∞.

Exercise 14.5.8. Which polynomials have degree 0?

Notation 14.5.9. We denote the set of polynomials of degree at most n over F by Pn [F].
Definition 14.5.10 (Sum and difference of polynomials). Let f = α0 + α1 t + · · · + αn tn and g =
β0 + β1 t + · · · + βn tn be polynomials. Then the sum of f and g is defined as

f + g = (α0 + β0 ) + (α1 + β1 )t + · · · + (αn + βn )tn (14.19)

and the difference f − g is defined as

f − g = (α0 − β0 ) + (α1 − β1 )t + · · · + (αn − βn )tn (14.20)

Note that f and g need not be of the same degree; we can add on leading zeros if necessary.

Numerical exercise 14.5.11. Let

f = 2t + t2
g = 3 + t + 2t2 + 3t3
h = 5 + t3 + t4

Compute the polynomials

(a) e1 = f − g

(b) e2 = g − h

(c) e3 = h − f

Self-check : verify that e1 + e2 + e3 = 0.

Proposition 14.5.12. Addition of polynomials is (a) commutative and (b) associative, that is, if
f, g, h ∈ F[t] then

(a) f + g = g + f

(b) f + (g + h) = (f + g) + h

Definition 14.5.13 (Multiplication of polynomials). Let f = α0 + α1 t + · · · + αn t^n and g = β0 + β1 t + · · · + βm t^m be polynomials. Then the product of f and g is defined as

f · g = Σ_{i=0}^{n+m} ( Σ_{j+k=i} αj βk ) t^i .     (14.21)
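Aside (not part of the text). Formula (14.21) is just the familiar convolution of coefficient sequences. A minimal Python sketch, with a hypothetical helper name poly_mul introduced only for this illustration:

# Illustration only: polynomial product as convolution of coefficient lists, following (14.21).
def poly_mul(a, b):
    """a[i] is the coefficient of t^i; returns the coefficient list of the product."""
    c = [0] * (len(a) + len(b) - 1)
    for j, aj in enumerate(a):
        for k, bk in enumerate(b):
            c[j + k] += aj * bk
    return c

print(poly_mul([1, 2], [3, 0, 1]))   # (1 + 2t)(3 + t^2) -> [3, 6, 1, 2], i.e., 3 + 6t + t^2 + 2t^3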

Numerical exercise 14.5.14. Let f , g, and h be as in Example 14.5.11. Compute

(a) e4 = f · g

(b) e5 = f · h

(c) e6 = f · (g + h)

Self-check : verify that e4 + e5 = e6 .



Proposition 14.5.15. Let f, g ∈ F[t]. Then


(a) deg(f + g) ≤ max{deg f, deg g} ,

(b) deg(f g) = deg f + deg g .


Note that both of these statements hold even if one of the polynomials is the zero polynomial.
Notation 14.5.16 (Set of functions). Let A and B be sets. The set of functions f : B → A is denoted
by AB . Here B is the domain of the functions f in question and A is the target set (into which f
maps B).
The following exercise explains the reason for this notation.
Exercise 14.5.17. Let A and B be finite sets. Show that |A^B | = |A|^|B| .
Definition 14.5.18 (Substitution). For ζ ∈ F and f = α0 + α1 t + · · · + αn tn ∈ F[t], we set

f (ζ) = α0 + α1 ζ + · · · + αn ζ^n ∈ F     (14.22)

The substitution t ↦ ζ defines a mapping F[t] → F which assigns the value f (ζ) to f . We denote the F → F function ζ ↦ f (ζ) by f̄ and call f̄ a polynomial function. So f̄ is a function while f is a formal expression. If A is an n × n matrix, then we define

f (A) = α0 I + α1 A + · · · + αn A^n ∈ Mn (F) .     (14.23)

Definition 14.5.19 (Divisibility of polynomials). Let f, g ∈ F[t]. We say that g divides f , or f is


divisible by g, written g | f , if there exists a polynomial h ∈ F[t] such that f = gh. In this case we
say that g is a divisor of f and f is a multiple of g.
Notation 14.5.20 (Divisors of a polynomial). Let f ∈ F[t]. We denote by Div(f ) the set of all
divisors of f . We denote by Div(f, g) the set of polynomials which divide both f and g, that is,
Div(f, g) = Div(f ) ∩ Div(g).
Theorem 14.5.21 (Division Theorem). Let f, g ∈ F[t] where g is not the zero polynomial. Then
there exist polynomials q and r such that

f = qg + r (14.24)

and deg r < deg g.


Moreover, the polynomials q and r are uniquely determined by f and g. ♦

Exercise 14.5.22. Let f ∈ F[t] and ζ ∈ F.

(a) Show that there exist q ∈ F[t] and ξ ∈ F such that

f = (t − ζ)q + ξ

(b) Prove that ξ = f (ζ)

Exercise 14.5.23. Let f ∈ F[t].

(a) Show that if F is an infinite field, then f̄ = 0 if and only if f = 0.

(b) Let F be a finite field of order q.

(b1) Prove that there exists f ≠ 0 such that f̄ = 0.

(b2) Show that if deg f < q, then f̄ = 0 if and only if f = 0. Do not use (b3) to solve this; your solution should be just one line, based on a very simple formula defining f .

(b3) (∗) (Fermat’s Little Theorem, generalized) Show that if |F| = q and f = t^q − t, then f̄ = 0.
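Aside (not part of the text). Item (b3) can be checked numerically for a small prime; this of course does not prove the general statement.

# Illustration only: over F_p the nonzero polynomial t^p - t defines the zero function (cf. (b3)).
p = 5
print(all((x ** p - x) % p == 0 for x in range(p)))   # True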

Definition 14.5.24 (Ideal). Let I ⊆ F[t]. Then I is an ideal if the following three conditions hold.

(a) 0 ∈ I

(b) I is closed under addition

(c) If f ∈ I and g ∈ F[t] then f g ∈ I

Notation 14.5.25. Let f ∈ F[t]. We denote by (f ) the set of all multiples of f , i. e.,

(f ) = {f g | g ∈ F[t]} .

Proposition 14.5.26. Let f, g ∈ F[t]. Then (f ) ⊆ (g) if and only if g | f .

Definition 14.5.27 (Principal ideal). Let f ∈ F[t]. The set (f ) is called the principal ideal generated
by f , and f is said to be a generator of this ideal.

Exercise 14.5.28. For f ∈ F[t], verify that (f ) is indeed an ideal of F[t].



Theorem 14.5.29 (Every ideal is principal). Every ideal of F[t] is principal. ♦


Proposition 14.5.30. Let f, g ∈ F[t]. Then (f ) = (g) if and only if there exists a nonzero ζ ∈ F such that g = ζf .
Theorem 14.5.29 will be our tool to prove the existence of greatest common divisors of polyno-
mials.
Definition 14.5.31 (Greatest common divisor). Let f1 , . . . , fk , g ∈ F[t]. We say that g is a greatest
common divisor (gcd) of f1 , . . . , fk if
(a) g is a common divisor of the fi , i. e., g | fi for all i,

(b) g is a common multiple of all common divisors of the fi , i. e., for all e ∈ F[t], if e | fi for all i,
then e | g.
Proposition 14.5.32. Let f1 , . . . , fk , d, d1 , d2 ∈ F[t].
(a) Let ζ be a nonzero scalar. If d is a gcd of f1 , . . . , fk , then ζd is also a gcd of f1 , . . . , fk .

(b) If d1 and d2 are both gcds of f1 , . . . , fk , then there exists ζ ∈ F× = F \ {0} such that d2 = ζd1 .
Exercise 14.5.33. Let f1 , . . . , fk ∈ F[t]. Show that gcd(f1 , . . . , fk ) = 0 if and only if f1 = · · · =
fk = 0.
Proposition 14.5.34. Let f1 , . . . , fk ∈ F[t] and suppose not all of the fi are 0. Then among all of
the greatest common divisors of f1 , . . . , fk , there is a unique monic polynomial.
For the sake of uniqueness of the gcd notation, we write d = gcd(f1 , . . . , fk ) if, in addition to
(a) and (b),
(c) d is monic or d = 0.
Theorem 14.5.35 (Existence of gcd). Let f1 , . . . , fk ∈ F[t]. Then gcd(f1 , . . . , fk ) exists and, moreover, there exist polynomials g1 , . . . , gk such that

gcd(f1 , . . . , fk ) = f1 g1 + · · · + fk gk     (14.25)



The second statement is called “Bézout’s Lemma.”

Lemma 14.5.36 (Euclid’s Lemma). Let f, g, h ∈ F[t]. Then Div(f, g) = Div(f − gh, g). ♦

Theorem 14.5.37. Let f1 , . . . , fk , g ∈ F[t]. Then g is a gcd of f1 , . . . , fk if and only if Div(f1 , . . . , fk ) =


Div(g). ♦

Exercise 14.5.38. Let f ∈ F[t]. Show that Div(f, 0) = Div(f ).

By applying the above theorem and Euclid’s Lemma, we arrive at Euclid’s Algorithm for deter-
mining the gcd of polynomials.
Euclid’s algorithm
Input: polynomials f0 , g0
Output: gcd(f0 , g0 )

f ← f0
g ← g0
while g 6= 0
Find q and r such that
f = gq + r and deg(r) < deg(g)
(Division Theorem)
f ←g
g←r
end(while)
return f

Theorem 14.5.39. Euclid’s Algorithm returns a greatest common divisor of the two input polyno-
mials in at most deg(g0 ) rounds of the while loop. ♦

The following example provides a demonstration of Euclid’s algorithm.



Example 14.5.40. Let f = t^5 + 2t^4 − 3t^3 + t^2 − 5t + 4 and let g = t^2 − 1. Then

Div(f, g) = Div( f − (t^3 + 2t^2 − 2t + 3) g, g )
          = Div( t^2 − 1, −7t + 7 )
          = Div( t^2 − 1 + (t/7)(−7t + 7), −7t + 7 )
          = Div( t − 1, −7t + 7 )
          = Div( t − 1 + (1/7)(−7t + 7), −7t + 7 )
          = Div( 0, −7t + 7 )
          = Div( −7t + 7 )

Thus −7t + 7 is a gcd of f and g, and we can multiply by −1/7 to get a monic polynomial. In particular, t − 1 is the gcd of f and g, and we may write gcd(f, g) = t − 1.
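Aside (not part of the text). The computation in Example 14.5.40 can be confirmed by machine; the sketch below assumes the SymPy library, whose gcd routine returns a normalized greatest common divisor.

# Illustration only (assumes SymPy): confirm gcd(f, g) = t - 1 from Example 14.5.40.
from sympy import symbols, gcd

t = symbols('t')
f = t**5 + 2*t**4 - 3*t**3 + t**2 - 5*t + 4
g = t**2 - 1
print(gcd(f, g))   # t - 1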
Exercise 14.5.41. Let

f1 = t2 + t − 2 (14.26)
f2 = t2 + 3t + 2 (14.27)
f3 = t3 − 1 (14.28)
f4 = t4 − t2 − 2t − 1 (14.29)

Determine the following greatest common divisors.


(a) gcd(f1 , f2 )

(b) gcd(f1 , f3 )

(c) gcd(f1 , f3 , f4 )

(d) gcd(f1 , f2 , f3 , f4 )
Proposition 14.5.42. Let f, g, h ∈ F[t]. Then gcd(f g, f h) = f d, where d = gcd(g, h).
Proposition 14.5.43. If f | gh and gcd(f, g) = 1, then f | h.
Exercise 14.5.44. Determine gcd(f, f ′ ) where f = t^n + t + 1 (over R).

Exercise 14.5.45. Determine gcd(t^n − 1, t^2 + t + 1).


Let f ∈ F[t] and let F be a subfield of the field G. Then we can also view f as a polynomial
over G. However, this changes the notion of linear combinations and therefore in principle it could
affect the gcd of two polynomials. We shall see that this is not the case, but in order to be able
to reason about this question, we temporarily use the notation gcd_F (f, g) and gcd_G (f, g) to denote the gcd taken with respect to the corresponding fields.

Exercise 14.5.46 (Insensitivity of gcd to field extensions). Let F be a subfield of G, and let f, g ∈ F[t]. Then

gcd_F (f, g) = gcd_G (f, g) .

Definition 14.5.47 (Irreducible polynomial). A polynomial f ∈ F[t] is irreducible over F if deg f ≥ 1


and for all g, h ∈ F[t], if f = gh then either deg g = 0 or deg h = 0.
We shall give examples of irreducible polynomials over various fields in Examples ??
Proposition 14.5.48. If f ∈ F[t] is irreducible and ζ is a nonzero scalar, then ζf is irreducible.
Proposition 14.5.49. Let f ∈ F[t] with deg f = 1. Then f is irreducible.
Proposition 14.5.50. Let f be an irreducible polynomial, and let f | gh. Then either f | g or
f | h.
Proposition 14.5.51. Every nonzero polynomial is a product of irreducible polynomials.
Theorem 14.5.52 (Unique Factorization). Every polynomial in F[t] can be uniquely written as the
product of irreducible polynomials over F. ♦
Uniqueness holds up to the order of the factors and scalar multiples, i. e., if f1 · · · fk = g1 · · · g`
where the fi and gj are irreducible, then k = ` and there exists a permutation (bijection) σ :
{1, . . . , k} → {1, . . . , k} and nonzero scalars αi ∈ F such that fi = αi gσ(i) .
Definition 14.5.53 (Root of a polynomial). Let f ∈ F[t]. We say that ζ ∈ F is a root of f if f (ζ) = 0.
Proposition 14.5.54. Let ζ ∈ F and let f ∈ F[t]. Then

t − ζ | f − f (ζ) .

Corollary 14.5.55. Let ζ ∈ F and f ∈ F[t]. Then ζ is a root of f if and only if (t − ζ) | f .



Definition 14.5.56 (Multiplicity). The multiplicity of a root ζ of a polynomial f ∈ F[t] is the largest
k for which (t − ζ)k | f .
Exercise 14.5.57. Let f ∈ R[t]. Show that f (√−1) = 0 if and only if (t^2 + 1) | f .

Proposition 14.5.58. Let f be a polynomial of degree n. Then f has at most n roots (counting
multiplicity).

Theorem 14.5.59 (Fundamental Theorem of Algebra). Let f ∈ C[t]. If deg f ≥ 1, then f has a
complex root, i. e., there exists ζ ∈ C such that f (ζ) = 0. ♦

Proposition 14.5.60. If f ∈ C[t], then f is irreducible if and only if deg f = 1.

Proposition 14.5.61. If f ∈ C[t] and deg f = k ≥ 1, then f can be written as

f = αk (t − ζ1 ) · · · (t − ζk )     (14.30)

where αk is the leading coefficient of f and the ζi are complex numbers.

Proposition 14.5.62. Let f ∈ P2 [R] be given by f = at^2 + bt + c with a ≠ 0. Then f is irreducible over R if and only if b^2 − 4ac < 0.

Exercise 14.5.63. Let f ∈ F[t].

(a) If f has a root in F and deg f ≥ 2, then f is reducible.

(b) Find a reducible polynomial over R that has no real root.

Proposition 14.5.64. Let f ∈ R[t] be of odd degree. Then f has a real root.

Proposition 14.5.65. Let f ∈ R[t] and ζ ∈ C. Then f (ζ̄) is the complex conjugate of f (ζ). Conclude that if ζ is a complex root of f , then so is ζ̄.

Exercise 14.5.66. Let f ∈ R[t]. Show that if f is irreducible over R, then deg f ≤ 2.
Exercise 14.5.67. Let f ∈ R[t] and f ≠ 0. Show that f can be written as a product f = g1 · · · gm where each gi has degree 1 or 2.

Definition 14.5.68 (Formal derivative of a polynomial). Let f = Σ_{i=0}^{n} αi t^i . Then the formal derivative of f is defined to be

f ′ = Σ_{k=1}^{n} k αk t^(k−1)     (14.31)

That is,

f ′ = α1 + 2α2 t + · · · + nαn t^(n−1) .     (14.32)

Note that this definition works even over finite fields. We write f^(k) to mean the k-th derivative of f , defined inductively as f^(0) = f and f^(k+1) = (f^(k) )′ .
Proposition 14.5.69 (Linearity of differentiation). Let f and g be polynomials and let ζ be a scalar. Then

(a) (f + g)′ = f ′ + g ′ ,

(b) (ζf )′ = ζf ′ .

Proposition 14.5.70 (Product rule). Let f and g be polynomials. Then

(f g)′ = f ′ g + f g ′     (14.33)
Definition 14.5.71 (Composition of polynomials). Let f and g be polynomials. Then the composition
of f with g, denoted f ◦ g is the polynomial obtained by replacing all occurrences of the symbol t
in the expression for f with g, i. e., f ◦ g = f (g) (we “substitute g” into f ).
Proposition 14.5.72. Let f and g be polynomials and let ζ be a scalar. Then
(f ◦ g)(ζ) = f (g(ζ)) . (14.34)
Proposition 14.5.73 (Chain Rule). Let f, g ∈ F[t] and let h = f ◦ g. Then
h′ = (f ′ ◦ g) · g ′ .     (14.35)
Proposition 14.5.74.

(a) Let F be Q, R, or C. (This statement in fact holds for all subfields of C and, more generally, for all fields of characteristic 0.) Let f ∈ F[t] and let ζ ∈ F. Then (t − ζ)^k | f if and only if

f (ζ) = f ′ (ζ) = · · · = f^(k−1) (ζ) = 0

(b) This is false if F = Fp .

Proposition 14.5.75. Let f ∈ C[t]. Then f has no multiple roots if and only if gcd(f, f ′ ) = 1.

Exercise 14.5.76. Let n ≥ 1. Prove that the polynomial f = t^n + t + 1 has no multiple roots in C.
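Aside (not part of the text). Proposition 14.5.75 gives a purely algebraic test for multiple roots. A small check on examples, assuming the SymPy library; it does not, of course, prove Exercise 14.5.76.

# Illustration only (assumes SymPy): detecting multiple roots via gcd(f, f').
from sympy import symbols, gcd, diff, expand

t = symbols('t')
f = expand((t - 1)**2 * (t + 2))     # double root at 1
g = t**3 + t + 1                     # the n = 3 case of Exercise 14.5.76
print(gcd(f, diff(f, t)))            # t - 1  (reveals the repeated root)
print(gcd(g, diff(g, t)))            # 1      (no multiple roots)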
Chapter 15

Vector Spaces: Basic Concepts

15.1 Vector spaces


Many sets with which we are familiar have the desirable property that they are closed under “linear
combinations” of their elements. Consider, for example, the space RR of functions f : R → R. If
f, g ∈ RR and α ∈ R, then f + αg ∈ RR . The same is true for the space C[0, 1] of continuous
functions f : [0, 1] → R as well as many other sets, like the space RN of sequences of real numbers
and the 2- and 3-dimensional geometric spaces G2 and G3 . We formalize this with the notion of a
vector space.
Let F be a (finite or infinite) field. The reader not familiar with fields may think of F as being
R or C.
Definition 15.1.1 (Vector space). A vector space V over a field F of scalars is:

(a) An abelian group (V, +)

(b) Endowed with a scaling function V × F → V such that

(b1) For all α ∈ F and v ∈ V , there exists a unique vector αv ∈ V


(b2) For all α, β ∈ F and v ∈ V , (αβ)v = α(βv)
(b3) For all α, β ∈ F and v ∈ V , (α + β)v = αv + βv
(b4) For every α ∈ F and u, v ∈ V , α(u + v) = αu + αv
(b5) For every v ∈ V , 1 · v = v (normalization)


The zero vector in a vector space V is written as 0V , but we often just write 0 when the context is
clear.
Property (b2) is referred to as “pseudo-associativity,” because it is a form of associativity in
which we are dealing with different operations (multiplication in F and scaling of vectors). Prop-
erties (b3) and (b4) are two types of distributivity: scaling of a vector distributes both over the
addition of scalars and the addition of vectors.

Proposition 15.1.2. Let V be a vector space over the field F. For all v ∈ V , α ∈ F:

(a) 0 · v = 0.

(b) α0 = 0.

(c) αv = 0 if and only if either α = 0 or v = 0.

Exercise 15.1.3. Let V be a vector space and let x ∈ V . Show that x + x = x if and only if x = 0.

Example 15.1.4 (Euclidean geometry). The most natural examples of vector spaces are the geo-
metric spaces G2 and G3 . We write G2 for the plane and G3 for the “space” familiar from Euclidean
geometry. We think of G2 and G3 as having a special point called the origin. We view the points
of G2 and G3 as “vectors” (line segments from the origin to the point). Addition is defined by the
parallelogram rule and scalar multiplication (over R) by scaling. Observe that G2 and G3 are vector
spaces over R. These classical geometries form the foundation of our intuition about vector spaces.

Note that G2 is not the same as R2 . Vectors in G2 are directed segments (geometric objects),
while the vectors in R2 are pairs of numbers. The connection between these two is one of the great
discoveries of the mathematics of the modern era (Descartes).

Examples 15.1.5. Show that the following are vector spaces over R.

(a) Rn

(b) Mn (R)

(c) Rk×n

(d) C[0, 1], the space of continuous real-valued functions f : [0, 1] → R

(e) The space RN of infinite sequences of real numbers



(f) The space RR of real functions f : R → R


(g) The space R[t] of polynomials in one variable with real coefficients.
(h) For all k ≥ 0, the space Pk [R] of polynomials of degree at most k with coefficients in R

(i) The space RΩ of functions f : Ω → R where Ω is an arbitrary set.

In Section 11.1, we defined the notion of a linear form over Fn (Def. 11.1.1). This generalizes immediately to vector spaces over F.
Definition 15.1.6 (Linear form). Let V be a vector space over F. A linear form is a function
f : V → F with the following properties.
(a) f (x + y) = f (x) + f (y) for all x, y ∈ Fn ;
(b) f (λx) = λf (x) for all x ∈ Fn and λ ∈ F.
Definition 15.1.7 (Dual space). Let V be a vector space over F. The set of linear forms f : V → F
is called the dual space of V and is denoted V ∗ .

Exercise 15.1.8. Let V be a vector space over F. Show that V ∗ is also a vector space over F.

In Section 1.1, we defined linear combinations of column vectors (Def. 1.1.13). This is easily generalized to linear combinations of vectors in any vector space.
Definition 15.1.9 (Linear combination). Let V be a vector space over F, and let v1 , . . . , vk ∈ V , α1 , . . . , αk ∈ F. Then α1 v1 + · · · + αk vk is called a linear combination of the vectors v1 , . . . , vk . The linear combination for which all coefficients are zero is the trivial linear combination.

Exercise 15.1.10 (Empty linear combination). What is the linear combination of the empty set?
(Convention 14.3.21 explains our convention for the empty sum.)
Exercise 15.1.11.
(a) Express the polynomial t − 1 as a linear combination of the polynomials t2 − 1, (t − 1)2 ,
t2 − 3t + 2.
(b) Give an elegant proof that the polynomial t2 + 1 cannot be expressed as a linear combination
of the polynomials t2 − 1, (t − 1)2 , t2 − 3t + 2.
Exercise 15.1.12. For α ∈ R, express cos(t + α) as a linear combination of cos t and sin t.

15.2 Subspaces and span


In Section 1.2, we studied subspaces and span in the context of Fk . We now generalize this to
arbitrary vector spaces.
For this section, we will take V to be a vector space over a field F and we will let W, S ⊆ V .
Definition 15.2.1 (Subspace). W ⊆ V is a subspace (notation: W ≤ V ) if W is a vector space under
the same operations: if a, b ∈ W then a + b has the same meaning in W as in V , and if α ∈ F and
a ∈ W then αa has the same meaning in W as in V . Note in particular that W and V operate with
the same field of scalars.
Exercise 15.2.2. If W ≤ V then 0V ∈ W and 0W = 0V .
Exercise 15.2.3. A subset W ⊆ V is a subspace if and only if
(a) W ≠ ∅

(b) W is closed under addition: if u, v ∈ W then u + v ∈ W

(c) W is closed under scaling: if u ∈ W and α ∈ F then αu ∈ W .


Exercise 15.2.4. A subset W ⊆ V is a subspace if and only if W is closed under linear combinations
in V : if w ∈ V is a linear combination of vectors in W then w ∈ W .
Note that this includes the condition that 0 ∈ W (the empty linear combination).
Definition 15.2.5 (Disjoint subspaces). Two subspaces U1 , U2 ≤ V are disjoint if U1 ∩ U2 = {0}.
Exercise 15.2.6. The intersection of any (finite or infinite) set of subspaces is a subspace.
Exercise 15.2.7. Let W1 , W2 ≤ V . Then W1 ∪ W2 ≤ V if and only if W1 ⊆ W2 or W2 ⊆ W1 .
Exercise 15.2.8. Describe all subspaces of the geometric spaces G2 and G3 .
Exercise 15.2.9. Determine which of the following are subspaces of R[t], the space of polynomials
in one variable over R.
(a) {f ∈ R[t] | deg(f ) = 5}

(b) {f ∈ R[t] | deg(f ) ≤ 5}

(c) {f ∈ R[t] | f (1) = 1}



(d) {f ∈ R[t] | f (1) = 0}


(e) {f ∈ R[t] | f (√2) = 0}

(f) {f ∈ R[t] | f (√−1) = 0}

(g) {f ∈ R[t] | f (1) = f (2)}

(h) {f ∈ R[t] | f (1) = (f (2))2 }

(i) {f ∈ R[t] | f (1)f (2) = 0}

(j) {f ∈ R[t] | f (1) = 3f (2) + 4f (3)}

(k) {f ∈ R[t] | f (1) = 3f (2) + 4f (3) + 1}

(l) {f ∈ R[t] | f (1) ≤ f (2)}


Definition 15.2.10 (Span). Let V be a vector space and let S ⊆ V . Then the span of S, denoted
span(S), is the smallest subspace of V containing S, i. e.,
(a) span(S) ⊇ S;

(b) span(S) ≤ V ;

(c) for every subspace W ≤ V , if S ⊆ W then span(S) ≤ W .


We repeat some of the exercises from Section 1.2 in our more general context.
Exercise 15.2.11. ***************

15.3 Linear independence and bases

Let V be a vector space. In Section 1.3, we defined the notion of linear independence of matrices
(Def. 1.3.5). We now generalize this to linear independence of a list (Def. 1.3.1) of vectors
in a general vector space.

Definition 15.3.1 (Linear independence). The list (v1 , . . . , vk ) of vectors in V is said to be linearly
independent over F if the only linear combination equal to 0 is the trivial linear combination. The
list (v1 , . . . , vk ) is linearly dependent if it is not linearly independent.

Definition 15.3.2. If a list (v1 , . . . , vk ) of vectors is linearly independent (dependent), we say that
the vectors v1 , . . . , vk are linearly independent (dependent).
Definition 15.3.3. We say that a set of vectors is linearly independent if a list formed by its elements
(in any order and without repetitions) is linearly independent.

Exercise 15.3.4. Show that the following sets are linearly independent over Q.
(a) {1, √2, √3}

(b) {√x | x is square-free} (an integer n is square-free if there is no perfect square k ≠ 1 such
that k | n).

Exercise 15.3.5. Let U1 , U2 ≤ V with U1 ∩ U2 = {0}. Let v1 , . . . , vk ∈ U1 and w1 , . . . , wℓ ∈ U2 . If
the lists (v1 , . . . , vk ) and (w1 , . . . , wℓ ) are linearly independent, then so is the list (v1 , . . . , vk , w1 , . . . , wℓ ).

Definition 15.3.6 (Rank and dimension). The rank of a set of vectors is the maximum number of
linearly independent vectors among them. For a vector space V , the dimension of V is its rank,
that is, dim V = rk V .

Exercise 15.3.7.

(a) When are two vectors in G2 linearly independent?

(b) When are two vectors in G3 linearly independent?

(c) When are three vectors in G3 linearly independent?

You should phrase your answers in geometric terms.

Definition 15.3.8. We say that the vector w depends on the list (v1 , . . . , vk ) of vectors if w ∈
span(v1 , . . . , vk ), i. e., if w can be expressed as a linear combination of the vi .
Definition 15.3.9 (Linear independence of an infinite list). We say that an infinite list (vi | i ∈ I)
(where I is an index set) is linearly independent if every finite sublist (vi | i ∈ J) (where J ⊆ I and
|J| < ∞) is linearly independent.

Exercise 15.3.10. Verify that Exercises 1.3.11-1.3.26 hold in general vector spaces (replace Fn by
V where necessary).

Example 15.3.11. For k = 0, 1, 2, . . . , let fk be a polynomial of degree k. Show that the infinite
list (f0 , f1 , f2 , . . . ) is linearly independent.
Exercise 15.3.12. Find three nonzero vectors in G3 that are linearly dependent but no two are
parallel.
Exercise 15.3.13. Prove that for all α, β ∈ R, the functions sin(t), sin(t + α), sin(t + β) are linearly
dependent (as members of the function space R^R).
Exercise 15.3.14. Let α1 < α2 < · · · < αn ∈ R. Consider the vectors

v_i = \begin{pmatrix} α_1^i \\ α_2^i \\ \vdots \\ α_n^i \end{pmatrix}    (15.1)

for i ≥ 0 (recall the convention that α^0 = 1 even if α = 0). Show that (v0 , . . . , vn−1 ) is linearly
independent.
Exercise 15.3.15 (Moment curve). Find a continuous curve in Rn , i. e., a continuous injective
function f : R → Rn , such that every set of n points on the curve is linearly independent. The
simplest example is called the “moment curve,” and we bet you will find it.
Exercise 15.3.16. Let α1 < α2 < · · · < αn ∈ R, and define the degree-n polynomial

f = ∏_{j=1}^{n} (t − α_j) .

For each 1 ≤ i ≤ n, define the polynomial gi of degree n − 1 by

g_i = f / (t − α_i) = ∏_{j=1, j≠i}^{n} (t − α_j) .

Prove that the polynomials g1 , . . . , gn are linearly independent.


Definition 15.3.17. We say that a set S ⊆ V generates V if span(S) = V . If S generates V , then S
is said to be a set of generators of V .

Definition 15.3.18 (Finite-dimensional vector space). We say that the vector space V is finite di-
mensional if V has a finite set of generators. A vector space which is not finite dimensional is
infinite dimensional.

Exercise 15.3.19. Show that F[t] is infinite dimensional.

Definition 15.3.20 (Basis). A list e = (e1 , . . . , ek ) of vectors is a basis of V if e is linearly independent
and generates V .

In Section 1.3, we defined the standard basis of the space Fn (Def. 1.3.34). However,
general vector spaces do not have the notion of a standard basis.

Note that a list of vectors is not the same as a set of vectors, but a list of vectors which is
linearly independent necessarily has no repeated elements. Note further that lists carry with them
an inherent ordering; that is, bases are ordered.

Examples 15.3.21.

(a) Show that the polynomials 1, t, t2 , . . . , tk form a basis for Pk [F].

(b) Show that the polynomials t2 + t + 1, t2 − 2t + 2, t2 − t − 1 form a basis of P2 [F].

(c) Express the polynomial f = 1 as a linear combination of these basis vectors.

Examples 15.3.22. For each of the following sets S, describe the vectors in span(S) and give a
basis for span(S).

(a) S = {t, t2 } ⊆ F[t]

(b) S = {sin(t), cos(t), cos(2t), e^{it} } ⊆ C^R


     
 1 0 4 
(c) S =  1 , 1 , −7 ⊆ F3
   
1 0 4
 
   
1 0 3 0
(d) S = , ,
  0 1 0 −5

0 2 2 3
, ⊆ M2 (F)
0 0 0 −7
   
 1 0 0 0 0 0
(e) S = 0 −3 0 , 0 6 0 ,
0 0 0 00 2


1 0 0 2 0 0 
0 0 0 , 0 3 0 ⊆ M3 (F)
0 0 1 0 0 3

Example 15.3.23. Show that the polynomials t2 , (t + 1)2 , and (t + 2)2 form a basis for P2 [F].
Express the polynomial t in terms of this basis, and write its coordinate vector.
Exercise 15.3.24. For α ∈ R, write the coordinate vector of cos(t + α) in the basis (cos t, sin t).

Exercise 15.3.25. Find a basis of the 0-weight subspace of Fk (the 0-weight subspace is defined
in Ex. 1.2.7).
Exercise 15.3.26.
(a) Find a basis of Mn (F).

(b) Find a basis of Mn (F) consisting of nonsingular matrices.


Warning. Part (b) is easier for fields of characteristic 0 than for fields of finite characteristic.
Proposition 15.3.27. Let b = (b1 , . . . , bn ) be a list of vectors in V . Then b is a basis of V if
and only if every vector can be uniquely expressed as a linear combination of the bi , i. e., for every
v ∈ V , there exists a unique list of coefficients (α1 , . . . , αn ) such that v = α1 b1 + · · · + αn bn .

Definition 15.3.28 (Maximal linearly independent set). A linearly independent set S ⊆ V is maximal
if, for all v ∈ V \ S, S ∪ {v} is not linearly independent.
Proposition 15.3.29. Let e be a list of vectors in a vector space V . Then e is a basis of V if and
only if it is a maximal linearly independent set.
Proposition 15.3.30. Let V be a vector space. Then V has a basis (Zorn’s lemma is needed for
the infinite-dimensional case).
Proposition 15.3.31. Let e be a list of vectors in V . Then it is possible to extend e to a basis of
V , that is, there exists a basis of V which has e as a sublist.

Proposition 15.3.32. Let V be a vector space and let S ⊆ V be a set of generators of V . Then
there exists a list e of vectors in S such that e is a basis of V .
Definition 15.3.33 (Coordinates). The coefficients α1 , . . . , αn of Prop. 15.3.27 are called the coordi-
nates of v with respect to the basis b.
Definition 15.3.34 (Coordinate vector). Let b = (b1 , . . . , bk ) be a basis of the vector space V ,
and let v ∈ V . Then the column vector representation of v with respect to the basis b, or the
coordinatization of v with respect to b, denoted by [v]b , is obtained by arranging the coordinates of
v with respect to b in a column, i. e.,

[v]_b = \begin{pmatrix} α_1 \\ α_2 \\ \vdots \\ α_k \end{pmatrix}    (15.2)

where v = α1 b1 + · · · + αk bk .

15.4 The First Miracle of Linear Algebra

In Section 1.3, we proved the First Miracle of Linear Algebra for Fn (Theorem 1.3.40). This
generalizes immediately to abstract vector spaces.
Theorem 15.4.1 (First Miracle of Linear Algebra). Let v1 , . . . , vk be linearly independent with
vi ∈ span(w1 , . . . , wm ) for all i. Then k ≤ m.
The proof of this theorem requires the following lemma.
Lemma 15.4.2 (Steinitz exchange lemma). Let (v1 , . . . , vk ) be a linearly independent list such that
vi ∈ span(w1 , . . . , wm ) for all i. Then there exists j (1 ≤ j ≤ m) such that the list (wj , v2 , . . . , vk )
is linearly independent. ♦
Corollary 15.4.3. Let V be a vector space. All bases of V have the same cardinality.
This is an immediate corollary to the First Miracle.
The following theorem is essentially a restatement of the First Miracle of Linear Algebra.
Theorem 15.4.4.

(a) Use the First Miracle to derive the fact that rk(v1 , . . . , vk ) = dim (span(v1 , . . . , vk )).

(b) Derive the First Miracle from the statement that rk(v1 , . . . , vk ) = dim (span(v1 , . . . , vk )).

Exercise 15.4.5. Let V be a vector space of dimension n, and let v1 , . . . , vn ∈ V . The following
are equivalent:

(a) (v1 , . . . , vn ) is a basis of V

(b) v1 , . . . , vn are linearly independent

(c) V = span(v1 , . . . , vn )

Exercise 15.4.6. Show that dim (Fn ) = n.



Exercise 15.4.7. Show that dim Fk×n = kn.

Exercise 15.4.8. Show that dim(Pk ) = k + 1, where Pk is the space of polynomials of degree at
most k.

Exercise 15.4.9. What is the dimension of the subspace of R[t] consisting of polynomials f of
degree at most n such that f (√−1) = 0?

Exercise 15.4.10. Show that, if f is a polynomial of degree n, then (f (t), f (t + 1), . . . , f (t + n))
is a basis of Pn [F].

Exercise 15.4.11. Show that any list of polynomials, one of each degree 0, . . . , n, forms a basis of
Pn [F].

Proposition 15.4.12. Let V be an n-dimensional vector space with subspaces U1 , U2 such that
U1 ∩ U2 = {0}. Then
dim U1 + dim U2 ≤ n . (15.3)

Proposition 15.4.13 (Modular equation). Let V be a vector space, and let U1 , U2 ≤ V . Then

dim(U1 + U2 ) + dim(U1 ∩ U2 ) = dim U1 + dim U2 . (15.4)
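For example, if U1 and U2 are two distinct planes through the origin in G3 , then U1 + U2 = G3 and U1 ∩ U2 is a line through the origin, in agreement with Equation (15.4): 2 + 2 = 3 + 1.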



Exercise 15.4.14. Let A = (αij ) ∈ Rn×n , and assume the columns of A are linearly independent.
Prove that it is always possible to change the value of an entry in the first row so that the columns
of A become linearly dependent.
Exercise 15.4.15. Call a sequence (a0 , a1 , a2 , . . . ) “Fibonacci-like” if for all n, an+2 = an+1 + an .
(a) Prove that Fibonacci-like sequences form a 2-dimensional vector space.
(b) Find a basis for the space of Fibonacci-like sequences.
(c) Express the Fibonacci sequence (0, 1, 1, 2, 3, 5, 8, . . . ) as a linear combination of these basis vec-
tors.
♥ Exercise 15.4.16. Let f be a polynomial. Prove that f has a multiple g = f · h ≠ 0 in which
every exponent is prime, i. e.,

g = ∑_{p prime} α_p x^p    (15.5)

for some coefficients αp .


Definition 15.4.17 (Elementary operation). For a list of vectors (v1 , . . . , vk ), the elementary opera-
tion denoted by (i, j, λ) is the operation which replaces vi by vi − λvj (i ≠ j).
Proposition 15.4.18. Performing elementary operations does not change the rank of a set of
vectors.

15.5 Direct sums


Definition 15.5.1 (Direct sum). Let V be a vector space and let U1 , U2 ≤ V . Define W = U1 + U2 .
Then W is the direct sum of U1 and U2 (W = U1 ⊕ U2 ) if U1 ∩ U2 = {0}. More generally, if
W = U1 + · · · + Uk , then W is the direct sum of U1 , . . . , Uk (W = U1 ⊕ · · · ⊕ Uk ) if for all i,
U_i ∩ ( ∑_{j≠i} U_j ) = {0} .    (15.6)
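For example, F3 = span(e1 ) ⊕ span(e2 ) ⊕ span(e3 ) for the standard basis vectors ei . Note that for k ≥ 3 it is not enough to require that the Ui be pairwise disjoint: three distinct lines through the origin in G2 are pairwise disjoint subspaces, yet their sum G2 is not their direct sum.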

Proposition 15.5.2. Let V be a vector space and let U1 , . . . , Uk ≤ V . Then

dim(U1 ⊕ · · · ⊕ Uk ) = ∑_{i=1}^{k} dim U_i .    (15.7)

The following exercise shows that this could actually be used as the definition of the direct sum
in finite-dimensional spaces, but that statement in fact holds in infinite-dimensional spaces as well.

Proposition 15.5.3. Let V be a finite-dimensional vector space, and let U1 , . . . , Uk ≤ V , with

dim ( ∑_{i=1}^{k} U_i ) = ∑_{i=1}^{k} dim U_i .

Then

∑_{i=1}^{k} U_i = ⊕_{i=1}^{k} U_i .

Proposition 15.5.4. Let V be a vector space and let U1 , . . . , Uk ≤ V . Then W = ∑_{i=1}^{k} U_i is a direct
sum if and only if for every choice of k vectors ui (i = 1, . . . , k) where ui ∈ Ui \ {0}, the vectors
u1 , . . . , uk are linearly independent.

We note that the notion of direct sum extends the notion of linear independence to subspaces.

Proposition 15.5.5. The vectors v1 , . . . , vk are linearly independent if and only if

∑_{i=1}^{k} span(v_i) = ⊕_{i=1}^{k} span(v_i) .    (15.8)
Chapter 16

Linear Maps

16.1 Linear map basics


Definition 16.1.1 (Linear map). Let V and W be vector spaces over the same field F. A function
ϕ : V → W is called a linear map or homomorphism if for all v, w ∈ V and α ∈ F:
(a) ϕ(αv) = αϕ(v)
(b) ϕ(v + w) = ϕ(v) + ϕ(w)
Exercise 16.1.2. Show that if ϕ : V → W is a linear map, then ϕ(0V ) = 0W
Proposition 16.1.3. Linear maps preserve linear combinations, i. e.,

ϕ ( ∑_{i=1}^{k} α_i v_i ) = ∑_{i=1}^{k} α_i ϕ(v_i)    (16.1)

Exercise 16.1.4. Let ϕ : V → W be a linear map. True or false?


(a) If the vectors v1 , . . . , vk are linearly independent, then ϕ(v1 ), . . . , ϕ(vk ) are linearly indepen-
dent.
(b) If the vectors v1 , . . . , vk are linearly dependent, then ϕ(v1 ), . . . , ϕ(vk ) are linearly dependent.
Example 16.1.5. Our prime examples of linear maps are those defined by matrix multiplication.
Every matrix A ∈ Fk×n defines a linear map ϕA : Fn → Fk by x 7→ Ax.


Example 16.1.6. The projection of R3 onto R2 defined by

\begin{pmatrix} α_1 \\ α_2 \\ α_3 \end{pmatrix} ↦ \begin{pmatrix} α_1 \\ α_2 \end{pmatrix}    (16.2)

is a linear map (verify!).


Example 16.1.7. The map f ↦ ∫_0^1 f (t) dt is a linear map from C[0, 1] to R (verify!).

Example 16.1.8. Differentiation d/dt is a linear map d/dt : Pn (F) → Pn−1 (F) (verify!).

Example 16.1.9. The map ϕ : Pn (R) → Rk defined by

f ↦ \begin{pmatrix} f (α_1) \\ f (α_2) \\ \vdots \\ f (α_k) \end{pmatrix}    (16.3)

where α1 < α2 < · · · < αk is a linear map (verify!).

Example 16.1.10. Interpolation is the map f 7→ L(f ) ∈ Pn (R) where L(f ) = p is the unique
polynomial with the property that p(αi ) = f (αi ) (i = 0, . . . , n) for some fixed α0 , . . . , αn . This is a
linear map from C[0, 1] → Pn (R) (verify!).

Example 16.1.11. The map ϕ : Pn (R) → Tn = span{1, cos t, . . . , cos nt} defined by

ϕ (α0 + α1 t + · · · + αn tn ) = α0 + α1 cos t + · · · + αn cosn t    (16.4)

is a linear map (verify!).

Notation 16.1.12. The set of linear maps ϕ : V → W is denoted by Hom(V, W ).

Fact 16.1.13. Hom(V, W ) is a subspace of the function space W V .



Definition 16.1.14 (Composition of linear maps). Let U , V , and W be vector spaces, and let ϕ :
U → V and ψ : V → W be linear maps. Then the composition of ψ with ϕ, denoted by ψ ◦ ϕ or
ψϕ, is the map η : U → W defined by

η(v) := ψ(ϕ(v)) (16.5)


FIGURE: U —ϕ→ V —ψ→ W

Proposition 16.1.15. Let U , V , and W be vector spaces and let ϕ : U → V and ψ : V → W be
linear maps. Then ψ ◦ ϕ is a linear map.

Linear maps are uniquely determined by their action on a basis, and we are free to choose this
action arbitrarily. This is more formally expressed in the following theorem.

Theorem 16.1.16 (Degree of freedom of linear maps). Let V and W be vector spaces with e =
(e1 , . . . , ek ) a basis of V , and w1 , . . . , wk arbitrary vectors in W . Then there exists a unique linear
map ϕ : V → W such that ϕ(ei ) = wi for 1 ≤ i ≤ k. ♦

Exercise 16.1.17. Show that dim(Hom(V, W )) = dim(V ) · dim(W ).

In Section 15.3 we represented vectors by the column vectors of their coordinates with respect
to a given basis. As the next step in translating geometric objects to tables of numbers, we assign
matrices to linear maps relative to given bases in the domain and the target space. Our key tool
for this endeavor is Theorem 16.1.16.

16.2 Isomorphisms
Let V and W be vector spaces over the same field.
Definition 16.2.1 (Isomorphism). A linear map ϕ ∈ Hom(V, W ) is said to be an isomorphism if it
is a bijection. If there exists an isomorphism between V and W , then V and W are said to be
isomorphic. “V is isomorphic to W ” is denoted V ≅ W .

Fact 16.2.2. The inverse of an isomorphism is an isomorphism.

Fact 16.2.3. Isomorphisms preserve linear independence and, moreover, map bases to bases.

Fact 16.2.4. Let V and W be vector spaces, let ϕ : V → W be an isomorphism, and let v1 , . . . , vk ∈
V . Then
rk(v1 , . . . , vk ) = rk(ϕ(v1 ), . . . , ϕ(vk ))

Exercise 16.2.5. Show that ≅ is an equivalence relation, that is, for vector spaces U , V , and W :

(a) V ≅ V (reflexive)

(b) If V ≅ W then W ≅ V (symmetric)

(c) If U ≅ V and V ≅ W then U ≅ W (transitive)

Exercise 16.2.6. Let V be an n-dimensional vector space over F. Show that V ≅ Fn .

Proposition 16.2.7. Two vector spaces over the same field are isomorphic if and only if they have
the same dimension.

16.3 The Rank-Nullity Theorem


Definition 16.3.1 (Image). The image of a linear map ϕ, denoted im(ϕ), is

im(ϕ) := {ϕ(v) | v ∈ V } (16.6)

Definition 16.3.2 (Kernel). The kernel of a linear map ϕ, denoted ker(ϕ), is

ker(ϕ) := {v ∈ V | ϕ(v) = 0W } = ϕ−1 (0W ) (16.7)

Proposition 16.3.3. Let ϕ : V → W be a linear map. Then im(ϕ) ≤ W and ker(ϕ) ≤ V .

Definition 16.3.4 (Rank of a linear map). The rank of a linear map ϕ is defined as

rk(ϕ) := dim(im(ϕ)) .

Definition 16.3.5 (Nullity). The nullity of a linear map ϕ is defined as

nullity(ϕ) := dim(ker(ϕ)) .

Exercise 16.3.6. Let ϕ : V → W be a linear map.



(a) Show that dim(im ϕ) ≤ dim V .

(b) Use this to reprove rk(AB) ≤ min{rk(A), rk(B)}.


Theorem 16.3.7 (Rank-Nullity Theorem). Let ϕ : V → W be a linear map. Then

rk(ϕ) + nullity(ϕ) = dim(V ) . (16.8)
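For example, let A = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} and let ϕA : R2 → R2 be the map x ↦ Ax. Then im(ϕA ) = span( (1, 1)^T ) and ker(ϕA ) = span( (1, −1)^T ), so rk(ϕA ) + nullity(ϕA ) = 1 + 1 = 2 = dim(R2 ).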


Exercise 16.3.8. Let n = k + ℓ. Find a linear map ϕ : Rn → Rn which has rank k, and therefore
has nullity ℓ.
Examples 16.3.9. Find the rank and nullity of each of the linear maps in Examples 16.1.6-16.1.11.

16.4 Linear transformations


Let V be a vector space over the field F.
In Section 16.1, we defined the notion of a linear map. We now restrict ourselves to those maps
which map a vector space into itself.
Definition 16.4.1 (Linear transformation). If ϕ : V → V is a linear map, then ϕ is said to be a
linear transformation or a linear operator.
Definition 16.4.2 (Identity transformation). For any vector space V , the identity transformation is the
transformation id : V → V defined by id(v) = v for all v ∈ V .
Definition 16.4.3 (Scalar transformation). For any vector space V over F, the scalar transformations
of V are the transformations of the form α id for α ∈ F.
Examples 16.4.4. The following are linear transformations of the 2-dimensional geometric space
G2 (verify!).
(a) Rotation about the origin by an angle θ

(b) Reflection about any line passing through the origin

(c) Shearing parallel to a line (tilting the deck)

(d) Scaling by α in one direction



FIGURES

Examples 16.4.5. The following are linear transformations of the 3-dimensional geometric space
G3 (verify!).

(a) Rotation by θ about any line

(b) Projection into any plane

(c) Reflection through any plane passing through the origin

(d) Central reflection

Example 16.4.6. Let Sα : R^R → R^R be the left shift by α operator, defined by Sα (f ) = g where

g(t) := f (t + α) .    (16.9)

Sα is a linear transformation (verify!).

Examples 16.4.7. The following are linear transformations of Pn [F] (verify!).


(a) Differentiation d/dt

(b) Left shift by α, f ↦ Sα (f )

(c) Multiplication by t, followed by differentiation, i. e., the map f ↦ (tf )′

Examples 16.4.8. The following are linear transformations of R^N (verify!).

(a) Left shift, (α0 , α1 , α2 , . . . ) ↦ (α1 , α2 , α3 , . . . )

(b) Difference, (α0 , α1 , α2 , . . . ) ↦ (α1 − α0 , α2 − α1 , α3 − α2 , . . . )

Example 16.4.9. Let V = span(sin t, cos t). Differentiation is a linear transformation of V (verify!).

Exercise 16.4.10. The difference operator ∆ : Pk [F] → Pk [F], defined (for a fixed α ≠ 0) by

∆f := Sα (f ) − f

is linear, and deg(∆f ) = deg(f ) − 1 if deg(f ) ≥ 2.

In Chapter 8, we discussed the notion of eigenvectors and eigenvalues of square matrices. This
is easily generalized to eigenvectors and eigenvalues of linear transformations.
Definition 16.4.11 (Eigenvector). Let ϕ : V → V be a linear transformation. Then v ∈ V is an
eigenvector of ϕ if v 6= 0 and there exists λ ∈ F such that ϕ(v) = λv.
Definition 16.4.12 (Eigenvalue). Let ϕ : V → V be a linear transformation. Then λ ∈ F is an
eigenvalue of ϕ if there exists a nonzero vector v ∈ V such that ϕ(v) = λv.

It is easy to see that eigenvectors and eigenvalues as we defined them for square matrices are
just a special case of these definitions; in particular, if A is a square matrix, then its eigenvectors
and eigenvalues are the same as those of ϕA (Example 16.1.5), the map which takes the column
vector x to Ax.

Proposition 16.4.13. Let ϕ : V → V be a linear transformation and let v1 , . . . , vk be eigenvectors


to distinct eigenvalues. Then the vi are linearly independent.

Proposition 16.4.14. Let ϕ : V → V be a linear transformation and let v1 and v2 be eigenvectors


to distinct eigenvalues. Then v1 + v2 is not an eigenvector.

Proposition 16.4.15. Let ϕ : V → V be a linear transformation to which every nonzero vector is


an eigenvector. Then ϕ is a scalar transformation.

Definition 16.4.16 (Eigenspace). Let V be a vector space and let ϕ : V → V be a linear transfor-
mation. We denote by Uλ the set

Uλ := {v ∈ V | ϕ(v) = λv} . (16.10)

This set is called the eigenspace corresponding to the eigenvalue λ.

Exercise 16.4.17. Let V be a vector space over F and let ϕ : V → V be a linear transformation.
Show that, for all λ ∈ F, Uλ is a subspace of V .

Proposition 16.4.18. Let ϕ : V → V be a linear transformation. Then

∑_λ U_λ = ⊕_λ U_λ    (16.11)

where ⊕ represents the direct sum (Def. 15.5.1).

Examples 16.4.19. Determine the rank, nullity, eigenvalues (and their geometric multiplicities),
and eigenvectors of each of the transformations in Examples 16.4.4-16.4.10.

Definition 16.4.20 (Eigenbasis). Let ϕ : V → V be a linear transformation. An eigenbasis of ϕ is a


basis of V consisting of eigenvectors of ϕ.

16.4.1 Invariant subspaces


Definition 16.4.21 (Invariant subspace). Let ϕ : V → V and let W ≤ V . Then W is a ϕ-invariant
subspace of V if, for all w ∈ W , we have ϕ(w) ∈ W . For any linear transformation ϕ, the trivial
invariant subspace is the subspace {0}.

Exercise 16.4.22. Let ϕ : G3 → G3 be a rotation about the vertical axis through the origin. What
are the ϕ-invariant subspaces?

Exercise 16.4.23. Let π : G3 → G3 be the projection onto the horizontal plane. What are the
π-invariant subspaces?

Exercise 16.4.24.

(a) What are the invariant subspaces of id?

(b) What are the invariant subspaces of the 0 transformation?

Exercise 16.4.25. Let ϕ : V → V be a linear transformation and let λ be an eigenvalue of ϕ.


What are the invariant subspaces of ϕ + λ id ?

Exercise 16.4.26. Over every field F, find an infinite-dimensional vector space V and linear trans-
formation ϕ : V → V that has no finite-dimensional invariant subspaces other than {0}.

Proposition 16.4.27. Let ϕ : V → V be a linear transformation. Then ker ϕ and im ϕ are


invariant subspaces.

Proposition 16.4.28. Let ϕ : V → V be a linear transformation and let W1 , W2 ≤ V be ϕ-


invariant subspaces. Then
(a) W1 + W2 is a ϕ-invariant subspace;

(b) W1 ∩ W2 is a ϕ-invariant subspace.


Definition 16.4.29 (Restriction to a subspace). Let ϕ : V → V be a linear map and let W ≤ V be a
ϕ-invariant subspace of V . The restriction of ϕ to W , denoted ϕW , is the linear map ϕW : W → W
defined by ϕW (w) = ϕ(w) for all w ∈ W .
Proposition 16.4.30. Let ϕ : V → V be a linear transformation and let W ≤ V be a ϕ-invariant
subspace. Let f ∈ F[t]. Then
f (ϕW ) = f (ϕ)W . (16.12)
Proposition 16.4.31. Let ϕ : V → V be a linear transformation. The following are equivalent.
(a) ϕ is a scalar transformation;

(b) all subspaces of V are ϕ-invariant;

(c) all 1-dimensional subspaces of V are ϕ-invariant;

(d) all hyperplanes (Def. 5.2.1) are ϕ-invariant.

Exercise 16.4.32. Let S be the shift operator (Example 16.4.8 (a)) on the space R^N of
sequences of real numbers, defined by

S(α0 , α1 , α2 , . . . ) := (α1 , α2 , α3 , . . . ) . (16.13)

(a) In Ex. 15.4.15, we defined the space of Fibonacci-like sequences. Show that this is an S-
invariant subspace of RN .

(b) Find an eigenbasis of S in this subspace.

(c) Use the result of part (b) to find an explicit formula for the n-th Fibonacci number.
Definition 16.4.33 (Minimal invariant subspace). Let ϕ : V → V be a linear transformation. Then
U ≤ V is a minimal invariant subspace of ϕ if the only invariant subspace properly contained in U
is {0}.
 
Exercise 16.4.34. Let ρ = \begin{pmatrix} 0 & 0 & \cdots & 0 & 1 \\ 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & 1 & 0 \end{pmatrix} be an n × n matrix.
(a) Count the invariant subspaces of ρ over C

(b) Find a 2-dimensional minimal invariant subspace of ρ over R

(c) Count the invariant subspaces of ρ over Q if n is prime


Definition 16.4.35. Let ϕ : V → V be a linear transformation. Then the n-th power of ϕ is defined
as
ϕn := ϕ ◦ · · · ◦ ϕ (n times) .    (16.14)

Definition 16.4.36. Let ϕ : V → V be a linear transformation and let f ∈ F[t], say

f = α0 + α1 t + · · · + αn tn .

Then we define f (ϕ) by


f (ϕ) := α0 id +α1 ϕ + · · · + αn ϕn . (16.15)
In particular,
f (ϕ)(v) = α0 v + α1 ϕ(v) + · · · + αn ϕn (v) .    (16.16)
Exercise 16.4.37. Let ϕ : V → V be a linear transformation and let f ∈ F[t]. Verify that f (ϕ) is
a linear transformation of V .
Definition 16.4.38 (Chain of invariant subspaces). Let ϕ : V → V be a linear transformation, and
let U1 , . . . , Uk ≤ V be ϕ-invariant subspaces. We say that the Ui form a chain if whenever i 6= j,
either Ui ≤ Uj or Uj ≤ Ui .
Definition 16.4.39 (Maximal chain). Let ϕ : V → V be a linear transformation and let U1 , . . . , Uk ≤
V be a chain of ϕ-invariant subspaces. We say that this chain is maximal if, for all W ≤ V (W ≠ Ui
for all i), the Ui together with W do not form a chain.

Exercise 16.4.40. Let d/dt : Pn (R) → Pn (R) be the derivative linear transformation (Def.
14.5.68) of the space of real polynomials of degree at most n.

(a) Prove that the number of d/dt-invariant subspaces is n + 2;

(b) give a very simple description of each;

(c) prove that they form a maximal chain.

♥ Exercise 16.4.41. Let V = Fp^{Fp} (the space of functions Fp → Fp ), and let ϕ be the shift-by-1 operator, i. e., ϕ(f )(t) = f (t + 1).
What are the invariant subspaces of ϕ? Prove that they form a maximal chain.

Proposition 16.4.42. Let ϕ : V → V be a linear transformation, and let f ∈ F[t]. Then ker f (ϕ)
and im f (ϕ) are invariant subspaces.

The next exercise shows that invariant subspaces generalize the notion of eigenvectors.

Exercise 16.4.43. Let v ∈ V be a nonzero vector and let ϕ : V → V be a linear transformation.


Then v is an eigenvector of ϕ if and only if span(v) is ϕ-invariant.

Proposition 16.4.44. Let V be a vector space over R and let ϕ : V → V be a linear transformation.
Then ϕ has an invariant subspace of dimension at most 2.

Proposition 16.4.45. Let V be an n-dimensional vector space and let ϕ : V → V be a linear


transformation. The following are equivalent.

(a) There exists a maximal chain of subspaces, all of which are invariant.

(b) There is a basis b of V such that [ϕ]b is triangular.

Proposition 16.4.46. Let V be a finite-dimensional vector space with basis b and let ϕ : V → V
be a linear transformation.

(a) Let b be a basis of V . Then [ϕ]b is triangular if and only if every initial segment of b spans a
ϕ-invariant subspace of V .

(b) If such a basis exists, then such an orthonormal basis exists.

Proposition 16.4.47. Let V be a finite-dimensional vector space with basis b and let ϕ : V → V
be a linear transformation. Then [ϕ]b is diagonal if and only if b is an eigenbasis of ϕ.

Exercise 16.4.48. Infer Schur’s Theorem (Theorem 12.4.9) and the real version of Schur’s Theorem
(Theorem 12.4.18) from the preceding exercise.

16.5 Coordinatization

In Section 15.3, we defined coordinatization of a vector with respect to some basis (Def.
15.3.34). We now extend this to the notion of coordinatization of linear maps.
Definition 16.5.1 (Coordinatization). Let V be an n-dimensional vector space with basis e =
(e1 , . . . , en ), let W be an m-dimensional vector space with basis f = (f 1 , . . . , f m ), and let ϕ : V → W
be a linear map. Let αij (1 ≤ i ≤ m, 1 ≤ j ≤ n) be the coefficients such that ϕ(ej ) = ∑_{i=1}^{m} α_{ij} f_i . Then the matrix representation
or coordinatization of ϕ with respect to the bases e and f is the m × n matrix

[ϕ]_{e,f} := \begin{pmatrix} α_{11} & \cdots & α_{1n} \\ \vdots & \ddots & \vdots \\ α_{m1} & \cdots & α_{mn} \end{pmatrix}    (16.17)

If ϕ : V → V is a linear transformation, then we write [ϕ]e instead of [ϕ]e,e .


So the j-th column of [ϕ]e,f is [ϕ(ej )]f , the coordinate vector of the image of the j-th basis
vector of V in the basis f of W .
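For example, for the projection π : R3 → R2 of Example 16.1.6 and the standard bases e of R3 and f of R2 , we have π(e1 ) = f 1 , π(e2 ) = f 2 , and π(e3 ) = 0, so [π]_{e,f} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix} .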
Example 16.5.2. Write the matrix representation of each of the linear transformations in Ex.
16.4.4 in a basis consisting of perpendicular unit vectors.
Example 16.5.3.

(a) Compute the matrix of the “rotation by θ” transformation with respect to the basis of two
unit vectors at an angle θ.

(b) Compare the trace and determinant (Def. 6.3.1) of this matrix with those of the matrix corresponding to the same
linear transformation in the basis of the preceding exercise (two perpendicular unit vectors).
Example 16.5.4. Write the matrix representation of each of the linear transformations in Ex.
16.4.5 in the basis consisting of three mutually perpendicular unit vectors.
Example 16.5.5. Write the matrix representation of each of the linear transformations in Ex.
16.4.7 in the basis (1, t, t2 , . . . , tn ) of the polynomial space Pn (F).
The next exercise demonstrates that under our rules of coordinatization, the action of a linear
map corresponds to multiplying a column vector by a matrix.

Proposition 16.5.6. For any v ∈ V , if ϕ : V → W is a linear map, then

[ϕ(v)]f = [ϕ]e,f [v]e

Moreover, coordinatization treats eigenvectors the way we would expect.


Proposition 16.5.7. Let V be a vector space with basis b and let ϕ : V → V be a linear transfor-
mation. Write A = [ϕ]b . Then
(a) A and ϕ have the same eigenvalues

(b) v ∈ V is an eigenvector of ϕ with eigenvalue λ if and only if [v]b is an eigenvector of A with

R
eigenvalue λ
The next exercise shows that under our coordinatization, composition of linear maps ( Def.
16.1.14) corresponds to matrix multiplication. This gives a natural explanation of why we multiply
matrices the way we do, and why this operation is associative.
Proposition 16.5.8. Let U , V , and W be vector spaces with bases e, f , and g, respectively, and
let ϕ : U → V and ψ : V → W be linear maps. Then

[ψϕ]e,g = [ψ]f ,g [ϕ]e,f (16.18)

Exercise 16.5.9. Explain the comment before the preceding exercise.


Exercise 16.5.10. Let V and W be vector spaces, with dim V = n and dim W = k. Infer from
Prop. 16.5.8 that coordinatization is an isomorphism between Hom(V, W ) and Fk×n .
Exercise 16.5.11. Let ρθ be the “rotation by θ” linear map in G2 .
(a) Show that, for α, β ∈ R, ρα+β = ρα ◦ ρβ .

(b) Use matrix multiplication to derive the addition formulas for sin and cos.
Example 16.5.12. Let V be an n-dimensional vector space with basis b, and let A ∈ Fk×n . Define
ϕ : V → Fk by x 7→ A[x]b .
(a) Show that ϕ is a linear map.

(b) Show that rk ϕ = rk A



Definition 16.5.13. Let V be a finite-dimensional vector space over R and let b be a basis of V . Let
ϕ : V → V be a nonsingular linear transformation. We say that ϕ is sense-preserving if det[ϕ]b > 0
and ϕ is sense-reversing if det[ϕ]b < 0.

Proposition 16.5.14.

(a) The sense-preserving linear transformations of the plane are rotations.

(b) The sense-reversing transformations of the plane are reflections about a line through the origin.

Exercise 16.5.15. What linear transformations of G2 fix the origin?

Definition 16.5.16 (Rotational reflection). A rotational reflection of G3 is a rotation followed by a


central reflection.

Proposition 16.5.17.

(a) The sense-preserving linear transformations of 3-dimensional space are rotations about an
axis.

(b) The sense-reversing linear transformations of 3-dimensional space are rotational reflections.

16.6 Change of basis


We now have the equipment necessary to discuss change of basis transformations and the result of
change of basis on the matrix representation of linear maps.
Definition 16.6.1 (Change of basis transformation). Let V be a vector space with bases e =
(e1 , . . . , en ) and e′ = (e′1 , . . . , e′n ). Then the change of basis transformation (from e to e′ ) is
the linear transformation σ : V → V given by σ(ei ) = e′i , for 1 ≤ i ≤ n.

Proposition 16.6.2. Let V be a vector space with bases e and e′ , and let σ : V → V be the change
of basis transformation from e to e′ . Then [σ]e = [σ]e′ .

For this reason, we often denote the matrix representation of the change of basis transformation
σ by [σ] rather than by, e. g., [σ]e .

Fact 16.6.3. Let σ : V → V be a change of basis transformation. Then [σ] is invertible.



Notation 16.6.4. Let V be a vector space with bases e and e′ . When changing basis from e to e′ , we
sometimes refer to e as the “old” basis and e′ as the “new” basis. So if v ∈ V is a vector, we often
write [v]old in place of [v]e and [v]new in place of [v]e′ . Likewise, if W is a vector space with bases f
and f ′ and we change bases from f to f ′ , we consider f the “old” basis and f ′ the “new” basis. So
if ϕ : V → W is a linear map, we write [ϕ]old in place of [ϕ]e,f and [ϕ]new in place of [ϕ]e′,f ′ .
Proposition 16.6.5. Let v ∈ V and let e and e′ be bases of V . Let σ be the change of basis
transformation from e to e′ . Then
[v]new = [σ]−1 [v]old .    (16.19)
Numerical exercise 16.6.6. For each of the following vector spaces V , compute the change of
basis matrix from e to e′ . Self-check : pick some v ∈ V , determine [v]e and [v]e′ , and verify that
Equation (16.19) holds.

(a) V = G2 , e = (e1 , e2 ) is two perpendicular unit vectors, and e′ = (e1 , e′2 ), where e′2 is e1 rotated
by θ

(b) V = P2 [F], e = (1, t, t2 ), e′ = (t2 , (t + 1)2 , (t + 2)2 )

(c) V = F3 , e = ( \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} , \begin{pmatrix} 2 \\ 0 \\ −1 \end{pmatrix} , \begin{pmatrix} 0 \\ 1 \\ 2 \end{pmatrix} ), e′ = ( \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix} , \begin{pmatrix} −1 \\ 1 \\ 1 \end{pmatrix} , \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix} )
Just as coordinates change with respect to different bases, so do the matrix representations of
linear maps.
Proposition 16.6.7. Let V and W be finite dimensional vector spaces, let e and e′ be bases of V ,
let f and f ′ be bases of W , and let ϕ : V → W be a linear map. Then

[ϕ]new = T −1 [ϕ]old S    (16.20)

where S is the change of basis matrix from e to e′ and T is the change of basis matrix from f to
f ′.
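The following is a small self-check sketch (Python with NumPy; not part of the text) illustrating Equations (16.19) and (16.20). It assumes F = R, identifies vectors with their coordinate columns in a fixed reference basis, and uses random bases; the columns of S and T are the old-basis coordinates of the new basis vectors.

    import numpy as np

    # Sketch over F = R: a basis of R^n is stored as the columns of an invertible
    # matrix; coords(B, x) returns the coordinates of x with respect to basis B.
    def coords(B, x):
        return np.linalg.solve(B, x)

    rng = np.random.default_rng(1)
    E_old = rng.random((3, 3)) + 3 * np.eye(3)   # basis e of V = R^3
    E_new = rng.random((3, 3)) + 3 * np.eye(3)   # basis e'
    F_old = rng.random((2, 2)) + 3 * np.eye(2)   # basis f of W = R^2
    F_new = rng.random((2, 2)) + 3 * np.eye(2)   # basis f'

    S = coords(E_old, E_new)     # columns: old coordinates of the new basis of V
    T = coords(F_old, F_new)     # columns: old coordinates of the new basis of W

    v = rng.random(3)
    # Equation (16.19): [v]_new = S^{-1} [v]_old
    assert np.allclose(coords(E_new, v), np.linalg.solve(S, coords(E_old, v)))

    A = rng.random((2, 3))                 # a linear map phi(x) = A x
    phi_old = coords(F_old, A @ E_old)     # [phi] with respect to e, f
    phi_new = coords(F_new, A @ E_new)     # [phi] with respect to e', f'
    # Equation (16.20): [phi]_new = T^{-1} [phi]_old S
    assert np.allclose(phi_new, np.linalg.inv(T) @ phi_old @ S)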
Proposition 16.6.8. Let N be a nilpotent matrix. Then the linear transformation defined by
x 7→ N x has a chain of invariant subspaces.
Proposition 16.6.9. Every matrix A ∈ Mn (C) is similar to a triangular matrix.
Chapter 17

(F) Block Matrices (optional)

17.1 Block matrix basics


Definition 17.1.1 (Block matrix). We sometimes write matrices as block matrices, where each entry
actually represents a matrix.

 
Example 17.1.2. The matrix A = \begin{pmatrix} 1 & 2 & −3 & 4 \\ 2 & 1 & 6 & −1 \\ 0 & −3 & −1 & 2 \end{pmatrix} may be symbolically written as the
block matrix A = \begin{pmatrix} A11 & A12 \\ A21 & A22 \end{pmatrix} where A11 = \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix} , A12 = \begin{pmatrix} −3 & 4 \\ 6 & −1 \end{pmatrix} , A21 = (0, −3), and
A22 = (−1, 2).

Definition 17.1.3 (Block-diagonal matrix). A square matrix A is said to be a block-diagonal matrix
if it can be written in the form

A = \begin{pmatrix} A_1 & & & 0 \\ & A_2 & & \\ & & \ddots & \\ 0 & & & A_k \end{pmatrix}

where the diagonal blocks Ai are square matrices. In this case, we say that A is the diagonal sum of the matrices A1 , . . . , Ak .


Example 17.1.4. The matrix

A = \begin{pmatrix} 1 & 2 & 0 & 0 & 0 & 0 \\ −3 & 7 & 0 & 0 & 0 & 0 \\ 0 & 0 & 6 & 0 & 0 & 0 \\ 0 & 0 & 0 & 4 & 6 & 0 \\ 0 & 0 & 0 & 2 & −1 & 3 \\ 0 & 0 & 0 & 3 & 2 & 5 \end{pmatrix}

is a block-diagonal matrix. We may write A = \begin{pmatrix} A1 & 0 & 0 \\ 0 & A2 & 0 \\ 0 & 0 & A3 \end{pmatrix} where A1 = \begin{pmatrix} 1 & 2 \\ −3 & 7 \end{pmatrix} , A2 = (6),
and A3 = \begin{pmatrix} 4 & 6 & 0 \\ 2 & −1 & 3 \\ 3 & 2 & 5 \end{pmatrix} .

Note that the blocks need not be of the same size.

Definition 17.1.5 (Block-triangular matrix). A square matrix A is said to be a block-triangular
matrix if it can be written in the form

A = \begin{pmatrix} A_{11} & A_{12} & \cdots & A_{1k} \\ & A_{22} & \cdots & A_{2k} \\ & & \ddots & \vdots \\ 0 & & & A_{kk} \end{pmatrix}

for blocks Aij , where the diagonal blocks Aii are square matrices.

Exercise 17.1.6. Show that block matrices multiply in the same way as matrices. More precisely,
suppose that A = (Aij ) and B = (Bjk ) are block matrices where Aij ∈ Fri ×sj and Bjk ∈ Fsj ×tk . Let
C = AB (why is this product defined?). Show that C = (Cik ) where Cik ∈ Fri ×tk and

C_{ik} = ∑_j A_{ij} B_{jk} .    (17.1)
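A quick numerical illustration of this exercise (Python with NumPy; not part of the text, and the block sizes below are arbitrary choices): blockwise multiplication by Equation (17.1) agrees with ordinary multiplication of the assembled matrices.

    import numpy as np

    # Random blocks A_ij (size r_i x s_j) and B_jk (size s_j x t_k).
    rng = np.random.default_rng(0)
    r, s, t = (2, 3), (1, 4), (2, 2)
    A = [[rng.random((ri, sj)) for sj in s] for ri in r]
    B = [[rng.random((sj, tk)) for tk in t] for sj in s]

    # C_ik = sum_j A_ij B_jk, as in Equation (17.1)
    C = [[sum(A[i][j] @ B[j][k] for j in range(len(s)))
          for k in range(len(t))] for i in range(len(r))]

    # Blockwise product agrees with the ordinary product of the full matrices.
    assert np.allclose(np.block(C), np.block(A) @ np.block(B))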

17.2 Arithmetic of block-diagonal and block-triangular matrices
Proposition 17.2.1. Let A = diag(A1 , . . . , An ) and B = diag(B1 , . . . , Bn ) be block-diagonal ma-
trices with blocks of the same size and let λ ∈ F. Then

A + B = diag(A1 + B1 , . . . , An + Bn ) (17.2)
λA = diag(λA1 , . . . , λAn ) (17.3)
AB = diag(A1 B1 , . . . , An Bn ) . (17.4)

Proposition 17.2.2. Let A = diag(A1 , . . . , An ) be a block-diagonal matrix. Then A^k = diag(A_1^k , . . . , A_n^k )
for all k.

Proposition 17.2.3. Let f ∈ F[t] be a polynomial and let A = diag(A1 , . . . , An ) be a block-diagonal


matrix. Then
f (A) = diag(f (A1 ), . . . , f (An )) . (17.5)

In our discussion of the arithmetic of block-triangular matrices, we are interested only in the
diagonal blocks.

Proposition 17.2.4. Let

A = \begin{pmatrix} A_1 & & ∗ \\ & \ddots & \\ 0 & & A_n \end{pmatrix}  and  B = \begin{pmatrix} B_1 & & ∗ \\ & \ddots & \\ 0 & & B_n \end{pmatrix}


be block-upper triangular matrices with blocks of the same size and let λ ∈ F. Then

A + B = \begin{pmatrix} A_1 + B_1 & & ∗ \\ & \ddots & \\ 0 & & A_n + B_n \end{pmatrix}    (17.6)

λA = \begin{pmatrix} λA_1 & & ∗ \\ & \ddots & \\ 0 & & λA_n \end{pmatrix}    (17.7)

AB = \begin{pmatrix} A_1 B_1 & & ∗ \\ & \ddots & \\ 0 & & A_n B_n \end{pmatrix} .    (17.8)

Proposition 17.2.5. Let A be as in Prop. 17.2.4. Then

A^k = \begin{pmatrix} A_1^k & & ∗ \\ & \ddots & \\ 0 & & A_n^k \end{pmatrix}    (17.9)

for all k.

Proposition 17.2.6. Let f ∈ F[t] be a polynomial and let A be as in Prop. 17.2.4. Then

f (A) = \begin{pmatrix} f (A_1) & & ∗ \\ & \ddots & \\ 0 & & f (A_n) \end{pmatrix} .    (17.10)

Chapter 18

(F) Minimal Polynomials of Matrices and Linear Transformations (optional)

18.1 The minimal polynomial


All matrices in this section are square.
In Section 8.5, we defined what it means to plug a matrix into a polynomial (Def. 2.3.3).
Definition 18.1.1 (Annihilation). Let f ∈ F[t]. We say that f annihilates the matrix A ∈ Mn (F) if
f (A) = 0.
Exercise 18.1.2. Let A be a matrix. Prove, without using the Cayley-Hamilton Theorem, that
for every matrix there is a nonzero polynomial that annihilates it, i. e., for every matrix A there is
a nonzero polynomial f such that f (A) = 0. Show that such a polynomial of degree at most n^2 exists.
Definition 18.1.3 (Minimal polynomial). Let A ∈ Mn (F). The polynomial f ∈ F[t] is a minimal
polynomial of A if f is a nonzero polynomial of lowest degree that annihilates A.
Example 18.1.4. The polynomial t − 1 is a minimal polynomial of the n × n identity matrix.
Exercise 18.1.5. Let A be the diagonal matrix diag(3, 3, 7, 7, 7), i. e.,

A = \begin{pmatrix} 3 & & & & \\ & 3 & & & \\ & & 7 & & \\ & & & 7 & \\ & & & & 7 \end{pmatrix} .


Find a minimal polynomial for A. Compare your answer to the characteristic polynomial fA . Recall
(Prop. 2.3.4) that for all f ∈ F[t], we have
f (diag(λ1 , . . . , λn )) = diag(f (λ1 ), . . . , f (λn )) .
Proposition 18.1.6. Let A ∈ Mn (F) and let m be a minimal polynomial of A. Then for all
g ∈ F[t], we have g(A) = 0 if and only if m | g.
Corollary 18.1.7. Let A ∈ Mn (F). Then the minimal polynomial of A is unique up to nonzero
scalar factors.
Convention 18.1.8. When discussing “the” minimal polynomial of a matrix, we refer to the unique
monic minimal polynomial, denoted mA .
Corollary 18.1.9 (Cayley-Hamilton restated). The minimal polynomial of a matrix divides its
characteristic polynomial.
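As an illustration only (Python with NumPy; not part of the text), one can check numerically that the characteristic polynomial of a random matrix annihilates it, which is the Cayley–Hamilton Theorem behind this corollary; the minimal polynomial is then one of its monic divisors.

    import numpy as np

    rng = np.random.default_rng(2)
    A = rng.random((4, 4))
    coeffs = np.poly(A)    # characteristic polynomial of A, leading coefficient first

    # Evaluate f_A(A) by Horner's rule with matrix powers; it should be (numerically) 0.
    fA = np.zeros_like(A)
    for c in coeffs:
        fA = fA @ A + c * np.eye(4)
    assert np.allclose(fA, np.zeros((4, 4)), atol=1e-8)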
Corollary 18.1.10. Let A ∈ Mn (F). Then deg mA ≤ n.
Example 18.1.11. mI = t − 1 .
 
Exercise 18.1.12. Let A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} . Prove mA = (t − 1)2 .
Exercise 18.1.13. Find two 2 × 2 matrices with the same characteristic polynomial but different
minimal polynomials.
Exercise 18.1.14. Let
A = diag(λ1 , . . . , λ1 , λ2 , . . . , λ2 , . . . , λk , . . . , λk )
where the λi are distinct. Prove that

m_A = ∏_{i=1}^{k} (t − λ_i) .    (18.1)
 

Exercise 18.1.15. Let A be a block-diagonal matrix, say A = \begin{pmatrix} A_1 & & 0 \\ & \ddots & \\ 0 & & A_k \end{pmatrix} . Give a simple
expression of mA in terms of the mAi .   (Answer: m_A = lcm_i m_{A_i} .)

Proposition 18.1.16. Let A ∈ Mn (F). Then λ is an eigenvalue of A if and only if mA (λ) = 0.

Exercise 18.1.17. Prove: similar matrices have the same minimal polynomial, i. e., if A ∼ B then
mA = mB .

Proposition 18.1.18. Let A ∈ Mn (C). If mA does not have multiple roots then A is diagonalizable.

In fact, this condition is necessary and sufficient (Theorem 18.2.20).

18.2 Minimal polynomials of linear transformations

In this section, V is an n-dimensional vector space over F.
In Section 16.4.1 we defined what it means to plug a linear transformation into a polynomial
(Def. 16.4.36).

Exercise 18.2.1. Define what it means for the polynomial f ∈ F[t] to annihilate the linear trans-
formation ϕ : V → V .

Exercise 18.2.2. Let ϕ : V → V be a linear transformation. Prove that there is a nonzero


polynomial that annihilates ϕ.

Exercise 18.2.3. Define a minimal polynomial of a linear transformation ϕ : V → V .

Example 18.2.4. The polynomial t − 1 is a minimal polynomial of the identity transformation.

Proposition 18.2.5. Let ϕ : V → V be a linear transformation and let m be a minimal polynomial


of ϕ. Then for all g ∈ F[t], we have g(ϕ) = 0 if and only if m | g.

Corollary 18.2.6. Let ϕ : V → V be a linear transformation. Then the minimal polynomial of ϕ


is unique up to nonzero scalar factors.

Convention 18.2.7. When discussing “the” minimal polynomial of a linear transformation, we shall
refer to the unique monic minimal polynomial, denoted mϕ .

Proposition 18.2.8. Let ϕ : V → V be a linear transformation. Then deg mϕ ≤ n.

Example 18.2.9. mid = t − 1.



Exercise 18.2.10. Let ϕ : V → V be a linear transformation with an eigenbasis. Prove: all


roots of mϕ belong to F and mϕ has no multiple roots. Let b1 , . . . , bn be an eigenbasis, and let
ϕ(bi ) = λi bi for each i. Find mϕ in terms of the λi .

Proposition 18.2.11. Let ϕ : V → V be a linear transformation. Then λ is an eigenvalue of ϕ if


and only if mϕ (λ) = 0.

Exercise 18.2.12 (Consistency of translation).

(a) Let b be a basis of V and let ϕ : V → V be a linear transformation. Let A = [ϕ]b . Show that
mϕ = mA .

(b) Use this to give a second proof of Ex. 18.1.17 (similar matrices have the same minimal

R
polynomial).

Recall the definition of an invariant subspace of the linear transformation ϕ ( Def. 16.4.21).
Definition 18.2.13 (Minimal invariant subspace). Let ϕ : V → V and let W ≤ V . We say that W is
a minimal ϕ-invariant subspace if W is a ϕ-invariant subspace, W ≠ {0}, and the only ϕ-invariant
subspaces of W are W and {0}.

Lemma 18.2.14. Let ϕ : V → V be a linear transformation and let W ≤ V be a ϕ-invariant


subspace. Let ϕW denote the restriction of ϕ to W . Then mϕW | mϕ . ♦

Proposition 18.2.15. Let V = W1 ⊕ · · · ⊕ Wk where the Wi are ϕ-invariant subspaces and ⊕
denotes their direct sum (Def. 15.5.1). If mi denotes the minimal polynomial of ϕWi then

mϕ = lcm_i m_i .    (18.2)

(a) Prove this without using Ex. 18.1.15, i. e., without translating the problem to matrices.

(b) Prove this using Ex. 18.1.15.

Theorem 18.2.16. Let V be a finite-dimensional vector space and let ϕ : V → V be a linear


transformation. If mϕ has an irreducible factor f of degree d, then there is a d-dimensional minimal
invariant subspace W such that
mϕW = fϕW = f . (18.3)


Corollary 18.2.17. Let V be a finite-dimensional vector space over R and let ϕ : V → V be a


linear transformation. Then V has a ϕ-invariant subspace of dimension 1 or 2.

Proposition 18.2.18. Let ϕ : V → V be a linear transformation. Let f | mϕ and let W = ker f (ϕ).
Then

(a) mϕW | f ;

(b) if gcd(f, mϕ /f ) = 1, then mϕW = f .

Hints: (a) f (ϕW ) = f (ϕ)W = 0.  (b) mϕ = lcm(mϕW , mϕW ′ ) where W ′ = ker (mϕ /f )(ϕ), i. e., V = W ⊕ W ′ .

Proposition 18.2.19. Let ϕ : V → V be a linear transformation and let f, g ∈ F[t] with mϕ = f · g
and gcd(f, g) = 1. Then
V = (ker f (ϕ)) ⊕ (ker g(ϕ)) .    (18.4)
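For example, if ϕ is idempotent (ϕ2 = ϕ) and ϕ is neither 0 nor id, then mϕ = t(t − 1), and with f = t, g = t − 1 the proposition gives V = ker ϕ ⊕ ker(ϕ − id), i. e., V is the direct sum of the kernel of ϕ and its fixed-point subspace.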

Theorem 18.2.20. Let A ∈ Mn (C). Then A is diagonalizable if and only if mA does not have
multiple roots over C. ♦

Theorem 18.2.21. Let F = C and let ϕ : V → V be a linear transformation. Then ϕ has an


eigenbasis if and only if mϕ does not have multiple roots. ♦

The next theorem represents a step toward canonical forms of matrices.

Theorem 18.2.22. Let A ∈ Mn (F). Then there is a matrix B ∈ Mn (F) such that A ∼ B and B is
the diagonal sum of matrices whose minimal polynomials are powers of irreducible polynomials. ♦
Chapter 19

Euclidean Spaces

19.1 Inner products


Let V be a vector space over R.
In Section 1.4, we introduced the standard dot product of Rn (Def. 1.4.1). We now
generalize this to the notion of an inner product over a real vector space V .

Definition 19.1.1 (Euclidean space). A Euclidean space V is a vector space over R endowed with
an inner product ⟨·, ·⟩ : V × V → R which is positive definite, symmetric, and bilinear. That is, for
all u, v, w ∈ V and α ∈ R, we have

(a) ⟨v, v⟩ ≥ 0, with equality holding if and only if v = 0 (positive definite)

(b) ⟨v, w⟩ = ⟨w, v⟩ (symmetric)

(c) ⟨v, w + αu⟩ = ⟨v, w⟩ + α⟨v, u⟩ and ⟨v + αu, w⟩ = ⟨v, w⟩ + α⟨u, w⟩ (bilinear)

Observe that the standard dot product of Rn that was introduced in Section 1.4 has all of these
properties and, in particular, the vector space Rn endowed with this inner product is a Euclidean
space.

Exercise 19.1.2. Let V be a Euclidean space with inner product ⟨·, ·⟩. Show that for all v ∈ V ,
we have
⟨v, 0⟩ = ⟨0, v⟩ = 0 .    (19.1)


Examples 19.1.3. The following vector spaces together with the specified inner products are
Euclidean spaces (verify this).

(a) V = R[t], ⟨f, g⟩ = ∫_{−∞}^{∞} ρ(t) f (t) g(t) dt, where ρ(t) is a nonnegative continuous function which is
not identically 0 and has the property that ∫_{−∞}^{∞} ρ(t) t^{2n} dt < ∞ for all nonnegative integers n
(such a function ρ is called a weight function)

(b) V = C[0, 1] (the space of continuous functions f : [0, 1] → R) with ⟨f, g⟩ = ∫_0^1 f (t) g(t) dt

(c) V = Rk×n , ⟨A, B⟩ = Tr (AB^T)

(d) V = Rn , and ⟨x, y⟩ = x^T Ay where A ∈ Mn (R) is a symmetric positive definite (Def.
10.2.3) n × n real matrix
Notice that the same vector space can be endowed with different inner products (for example,
different weight functions for the inner product on R[t]), so that there are many Euclidean spaces
with the same underlying vector space.
Because they have inner products, Euclidean spaces carry with them the notion of distance
(“norm”) and the notion of two vectors being perpendicular (“orthogonality”). Just as inner prod-
ucts generalize the standard dot product in Rn , these concepts generalize the definitions of norm
and orthogonality presented for Rn (with respect to the standard dot product) in Section 1.4.
Definition 19.1.4 (Norm). Let V be a Euclidean space, and let v ∈ V . Then the norm of v, denoted
‖v‖, is
‖v‖ := √⟨v, v⟩ .    (19.2)

The notion of a norm allows us to easily define the distance between two vectors.

Definition 19.1.5. Let V be a Euclidean space, and let v, w ∈ V . Then the distance between the
vectors v and w, denoted d(v, w), is
d(v, w) := ‖v − w‖ .    (19.3)
The following two theorems show that distance in Euclidean spaces behaves the way we are used
to it behaving in Rn .
Theorem 19.1.6 (Cauchy-Schwarz inequality). Let V be a Euclidean space, and let v, w ∈ V .
Then
|⟨v, w⟩| ≤ ‖v‖ · ‖w‖ .    (19.4)


Theorem 19.1.7 (Triangle inequality). Let V be a Euclidean space, and let v, w ∈ V . Then

‖v + w‖ ≤ ‖v‖ + ‖w‖ .    (19.5)

Exercise 19.1.8. Show that the triangle inequality is equivalent to the Cauchy-Schwarz inequality.

Definition 19.1.9 (Angle between vectors). Let V be a Euclidean space, and let v, w ∈ V . The
angle θ between v and w is defined by

θ := arccos ( ⟨v, w⟩ / (‖v‖ ‖w‖) ) .    (19.6)

Observe that from the Cauchy-Schwarz inequality, we have (for v, w ≠ 0) that

−1 ≤ ⟨v, w⟩ / (‖v‖ ‖w‖) ≤ 1    (19.7)

so this definition is valid, as this term is always in the domain of arccos.


Definition 19.1.10 (Orthogonality). Let V be a Euclidean space, and let v, w ∈ V . Then v and
w are orthogonal (denoted v ⊥ w) if ⟨v, w⟩ = 0. A set of vectors S = {v1 , . . . , vn } is said to be
orthogonal if vi ⊥ vj whenever i ≠ j.
Observe that two vectors are orthogonal when the angle between them is π/2. This agrees with
the notion of orthogonality being a generalization of perpendicularity.

Exercise 19.1.11. Let V be a Euclidean space. What vectors are orthogonal to every vector?

Exercise 19.1.12. Let V = C[0, 2π] be the space of continuous functions f : [0, 2π] → R, endowed
with the inner product
⟨f, g⟩ = ∫_0^{2π} f (t) g(t) dt .    (19.8)

Show that the set {1, cos t, sin t, cos(2t), sin(2t), cos(3t), . . . } is an orthogonal set in this Euclidean
space.

Definition 19.1.13 (Orthogonal system). An orthogonal system in a Euclidean space V is a list of


(pairwise) orthogonal nonzero vectors in V .

Proposition 19.1.14. Every orthogonal system in a Euclidean space is linearly independent.

Definition 19.1.15 (Gram matrix). Let V be a Euclidean space, and let v1 , . . . , vk ∈ V . The Gram
matrix of v1 , . . . , vk is the k × k matrix whose (i, j) entry is ⟨vi , vj ⟩, that is,

G = G(v1 , . . . , vk ) := (⟨v_i , v_j ⟩)_{i,j=1}^{k} .    (19.9)
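For example, for v1 = \begin{pmatrix} 1 \\ 0 \end{pmatrix} and v2 = \begin{pmatrix} 1 \\ 1 \end{pmatrix} in R2 with the standard dot product, G(v1 , v2 ) = \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix} , with determinant 1.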

Exercise 19.1.16. Let V be a Euclidean space. Show that the vectors v1 , . . . , vk ∈ V are linearly
independent if and only if det G(v1 , . . . , vk ) ≠ 0.

Exercise 19.1.17. Let V be a Euclidean space and let v1 , . . . , vk ∈ V . Show

rk(v1 , . . . , vk ) = rk(G(v1 , . . . , vk )) . (19.10)

Definition 19.1.18 (Orthonormal system). An orthonormal system in a Euclidean space V is a list
of (pairwise) orthogonal vectors in V , all of which have unit norm. So (v1 , v2 , . . . ) is an orthonormal
system if ⟨vi , vj ⟩ = δij for all i, j.
In the case of finite-dimensional Euclidean spaces, we are particularly interested in orthonormal
bases.
Definition 19.1.19 (Orthonormal basis). Let V be a Euclidean space. A list (v1 , . . . , vn ) is an
orthonormal basis (ONB) of V if it is a basis and {v1 , . . . , vn } is an orthonormal set.
In the next section, not only will we show that every finite-dimensional Euclidean space has an
orthonormal basis, but we will, moreover, demonstrate an algorithm for transforming any basis into
an orthonormal basis.

Proposition 19.1.20. Let V be a Euclidean space with orthonormal basis b. Then for all v, w ∈ V ,

⟨v, w⟩ = [v]_b^T [w]_b .    (19.11)

Proposition 19.1.21. Let V be a Euclidean space. Every linear form f : V → R (Def. 15.1.6)
can be written as
f (x) = ⟨a, x⟩    (19.12)
for a unique a ∈ V .

19.2 Gram-Schmidt orthogonalization


The main result of this section is the following theorem.

Theorem 19.2.1. Every finite-dimensional Euclidean space has an orthonormal basis. In fact,
every orthonormal system extends to an orthonormal basis. ♦

Before we prove this theorem, we will first develop Gram-Schmidt orthogonalization, an online
procedure that takes a list of vectors as input and produces a list of orthogonal vectors satisfying
certain conditions. We formalize this below.

Theorem 19.2.2 (Gram-Schmidt orthogonalization). Let V be a Euclidean space and let v1 , . . . , vk ∈


V . For i = 1, . . . , k, let Ui = span(v1 , . . . , vi ). Then there exist vectors e1 , . . . , ek such that

(a) For all j ≥ 1, we have vj − ej ∈ Uj−1

(b) The ej are pairwise orthogonal

Moreover, the ej are uniquely determined. ♦

Note that U0 = span(∅) = {0}.


Let GS(k) denote the statement of Theorem 19.2.2 for a particular value of k. We prove
Theorem 19.2.2 by induction on k. The inductive step is based on the following lemma.

Lemma 19.2.3. Assume GS(k) holds. Then, for all j ≤ k, we have span(e1 , . . . , ej ) = Uj . ♦

Exercise 19.2.4. Prove GS(1).

Exercise 19.2.5. Let k ≥ 2 and assume GS(k − 1) is true. Look for ek in the form

e_k = v_k − ∑_{i=1}^{k−1} α_i e_i    (19.13)

Prove that the only possible vector ek satisfying the conditions of GS(k) is the ek for which

α_i = ⟨v_k , e_i ⟩ / ‖e_i ‖^2    (19.14)

except in the case where ei = 0 (in that case, αi can be chosen arbitrarily).

Exercise 19.2.6. Prove that ek as constructed in the previous exercise satisfies GS(k).
This completes the proof of Theorem 19.2.2.
Proposition 19.2.7. ei = 0 if and only if vi ∈ span(v1 , . . . , vi−1 ).
Proposition 19.2.8. The Gram-Schmidt procedure preserves linear independence.
Proposition 19.2.9. If (v1 , . . . , vk ) is a basis of V , then so is (e1 , . . . , ek ).
Exercise 19.2.10. Let e = (e1 , . . . , en ) be an orthogonal basis of V . From e, construct an or-
thonormal basis e′ = (e′1 , . . . , e′n ) of V .
Exercise 19.2.11. Conclude that every finite-dimensional Euclidean space has an orthonormal
basis.
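Here is a minimal numpy sketch of the procedure of Exercises 19.2.5, 19.2.6, and 19.2.10 for column vectors in Rn with the standard dot product (the function name and the tolerance are illustrative choices):

    import numpy as np

    def gram_schmidt(vectors, tol=1e-12):
        # Gram-Schmidt orthogonalization followed by normalization (a sketch for R^n
        # with the standard dot product).  Returns the orthogonal system (e_1, ..., e_k)
        # and an orthonormal basis of span(v_1, ..., v_k).
        es = []
        for v in vectors:
            e = np.array(v, dtype=float)
            for prev in es:
                norm2 = prev @ prev
                if norm2 > tol:                          # if e_i = 0, alpha_i may be chosen arbitrarily (take 0)
                    e = e - ((e @ prev) / norm2) * prev  # coefficient <v_k, e_i>/||e_i||^2 as in Eq. (19.14)
            es.append(e)
        onb = [e / np.linalg.norm(e) for e in es if np.linalg.norm(e) > tol]
        return es, onb

    b = [[1, 0, 1], [1, 1, 0], [-1, 1, 2]]
    es, onb = gram_schmidt(b)
    Q = np.column_stack(onb)
    print(np.allclose(Q.T @ Q, np.eye(len(onb))))        # self-check: the result is an orthonormal system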
Numerical exercise 19.2.12. Apply the Gram-Schmidt procedure to find an orthonormal basis
for each of the following Euclidean spaces V from the basis b. Self-check : once you have applied
Gram-Schmidt, verify that you have obtained an orthonormal set of vectors.
     
(a) V = R3 with the standard dot product, b = ((1, 0, 1)T , (1, 1, 0)T , (−1, 1, 2)T )

(b) V = P2 [R], with ⟨f, g⟩ = ∫_{−∞}^{∞} e^{−t²/2} f (t)g(t) dt, and b = (1, t, t²) (Hermite polynomials)

(c) V = P2 [R], with ⟨f, g⟩ = ∫_{−1}^{1} f (t)g(t)/√(1 − t²) dt, and b = (1, t, t²) (Chebyshev polynomials of
the first kind)

(d) V = P2 [R], with ⟨f, g⟩ = ∫_{−1}^{1} √(1 − t²) f (t)g(t) dt, and b = (1, t, t²) (Chebyshev polynomials of
the second kind)

19.3 Isometries and orthogonality


Definition 19.3.1 (Isometry). Let V and W be Euclidean spaces. Then an isometry f : V → W is
a bijection that preserves inner product, i. e., for all v1 , v2 ∈ V , we have
hv1 , v2 iV = hf (v1 ), f (v2 )iW (19.15)
The Euclidean spaces V and W are isometric if there is an isometry between them.

Theorem 19.3.2. If V and W are finite dimensional Euclidean spaces, then they are isometric if
and only if dim V = dim W . ♦
Proposition 19.3.3. Let V and W be Euclidean spaces. Then ϕ : V → W is an isometry if and
only if it maps an orthonormal basis of V to an orthonormal basis of W .
Proposition 19.3.4. Let ϕ : V → W be an isomorphism that preserves orthogonality (so v ⊥ w
if and only if ϕ(v) ⊥ ϕ(w)). Show that there is an isometry ψ and a nonzero scalar λ such that
ϕ = λψ.
The geometric notion of congruence is captured by the concept of orthogonal transformations.
Definition 19.3.5 (Orthogonal transformation). Let V be a Euclidean space. A linear transformation
ϕ : V → V is called an orthogonal transformation if it is an isometry. The set of orthogonal
transformations of V is denoted by O(V ), and is called the orthogonal group of V .
Proposition 19.3.6. The set O(V ) is a group (Def. 14.3.2) under composition.
Exercise 19.3.7. The linear transformation ϕ : V → V is orthogonal if and only if ϕ preserves the
norm, i. e., for all v ∈ V , we have ‖ϕv‖ = ‖v‖.
Theorem 19.3.8. Let ϕ ∈ O(V ). Then all eigenvalues of ϕ are ±1. ♦
Proposition 19.3.9. Let V be a Euclidean space and let e = (e1 , . . . , en ) be an orthonormal basis
of V . Then ϕ : V → V is an orthogonal transformation if and only if (ϕ(e1 ), . . . , ϕ(en )) is an
orthonormal basis.

Proposition 19.3.10 (Consistency of translation). Let V be a Euclidean space with orthonormal
basis b, and let ϕ : V → V be a linear transformation. Then ϕ is orthogonal if and only if [ϕ]b is
an orthogonal matrix (Def. 9.1.1).
Definition 19.3.11. Let V be a Euclidean space and let S, T ⊆ V . For v ∈ V , we say that v is
orthogonal to S (notation: v ⊥ S) if for all s ∈ S, we have v ⊥ s. Moreover, we say that S is
orthogonal to T (notation: S ⊥ T ) if s ⊥ t for all s ∈ S and t ∈ T .
Definition 19.3.12. Let V be a Euclidean space and let S ⊆ V . Then S ⊥ (“S perp”) is the set of
vectors orthogonal to S, i. e.,
S ⊥ := {v ∈ V | v ⊥ S}. (19.16)
Proposition 19.3.13. For all subsets S ⊆ V , we have S ⊥ ≤ V .
Proposition 19.3.14. Let S ⊆ V . Then S ⊆ (S ⊥ )⊥ .
Exercise 19.3.15. Verify
(a) {0}⊥ = V
(b) ∅⊥ = V

(c) V ⊥ = {0}

The next theorem says that the direct sum (Def. 15.5.1) of a subspace and its perp is the
entire space.
Theorem 19.3.16. If dim V < ∞ and W ≤ V , then V = W ⊕ W ⊥ . ♦
The proof of this theorem requires the following lemma.
Lemma 19.3.17. Let V be a vector space with dim V = n, and let W ≤ V . Then

dim W ⊥ = n − dim W . (19.17)
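In coordinates: if W ≤ Rn (standard dot product) is spanned by the rows of a matrix A, then W ⊥ is the null space of A. The following scipy sketch (with an arbitrarily chosen W ) illustrates the dimension formula (19.17):

    import numpy as np
    from scipy.linalg import null_space

    rng = np.random.default_rng(1)
    A = rng.standard_normal((2, 5))      # rows span a 2-dimensional subspace W of R^5
    W_perp = null_space(A)               # columns: an orthonormal basis of W-perp

    print(W_perp.shape[1])               # 3 = 5 - 2, as in Eq. (19.17)
    print(np.allclose(A @ W_perp, 0))    # every basis vector of W-perp is orthogonal to W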


Proposition 19.3.18. Let V be a finite-dimensional Euclidean space and let S ⊆ V . Then
S ⊥ = (span(S))⊥ .    (19.18)

Proposition 19.3.19. Let U1 , . . . , Uk be pairwise orthogonal subspaces (i. e., Ui ⊥ Uj whenever


i ≠ j). Then

Σ_{i=1}^{k} Ui = ⊕_{i=1}^{k} Ui .

We now study the linear map analogue of the transpose of a matrix, known as the adjoint of
the linear map.
Theorem 19.3.20. Let V and W be Euclidean spaces, and let ϕ : V → W be a linear map. Then
there exists a unique linear map ψ : W → V such that for all v ∈ V and w ∈ W , we have

hϕv, wi = hv, ψwi . (19.19)



Note that the inner product above refers to inner products in two different spaces. To be more
specific, we should have written
hϕv, wiW = hv, ψwiV . (19.20)
Definition 19.3.21. The linear map ψ whose existence is guaranteed by Theorem 19.3.20 is called
the adjoint of ϕ and is denoted ϕ∗ . So for all v ∈ V and w ∈ W , we have

hϕv, wi = hv, ϕ∗ wi . (19.21)

The next exercise shows the relationship between the coordinatization of ϕ and of ϕ∗ . The
reason we denote the adjoint of the linear map ϕ by ϕ∗ rather than by ϕT will become clear in
Section 20.4.
Proposition 19.3.22. Let V , W , and ϕ be as in the statement of Theorem 19.3.20. Let b1 be an
orthonormal basis of V and let b2 be an orthonormal basis of W . Then

[ϕ∗ ]b2 ,b1 = [ϕ]Tb1 ,b2 . (19.22)

19.4 First proof of the Spectral Theorem


We first stated the Spectral Theorem for real symmetric matrices in Chapter 10. We now restate
the theorem in the context of Euclidean spaces and break its proof into a series of exercises.
Let V be a Euclidean space.
Definition 19.4.1 (Symmetric linear transformation). Let ϕ : V → V be a linear transformation.
Then ϕ is symmetric if, for all v, w ∈ V , we have

hv, ϕ(w)i = hϕ(v), wi . (19.23)

Proposition 19.4.2. Let ϕ : V → V be a symmetric transformation, and let v1 , . . . , vk be eigen-


vectors of ϕ with distinct eigenvalues. Then vi ⊥ vj whenever i ≠ j.
Proposition 19.4.3. Let b be an orthonormal basis of the finite-dimensional Euclidean space V ,
and let ϕ : V → V be a linear transformation. Then ϕ is symmetric if and only if the matrix [ϕ]b
is symmetric.
In particular, this proposition establishes the equivalence of Theorem 10.1.1 and Theorem 19.4.4.
The main theorem of this section is the Spectral Theorem.

Theorem 19.4.4 (Spectral Theorem). Let V be a finite-dimensional Euclidean space and let ϕ :
V → V be a symmetric linear transformation. Then ϕ has an orthonormal eigenbasis. ♦

Proposition 19.4.5. Let ϕ be a symmetric linear transformation of the Euclidean space V , and
let W ≤ V . If W is ϕ-invariant (Def. 16.4.21), then W ⊥ is ϕ-invariant.

Lemma 19.4.6. Let ϕ : V → V be a symmetric linear transformation and let W be a ϕ-invariant


subspace of V . Then the restriction of ϕ to W is also symmetric. ♦

The heart of the proof of the Spectral Theorem is the following lemma.

Main Lemma 19.4.7. Let ϕ be a symmetric linear transformation of a Euclidean space of dimension
≥ 1. Then ϕ has an eigenvector.

Exercise 19.4.8. Assuming Lemma 19.4.7, prove the Spectral Theorem by induction on dim V .

Before proving the Main Lemma, we digress briefly into analysis.


Definition 19.4.9 (Convergence). Let V be a Euclidean space and let v = (v1 , v2 , v3 , . . . ) be a
sequence of vectors in V . We say that the sequence v converges to the vector v if lim_{i→∞} ‖vi − v‖ = 0,
that is, if for all ε > 0 there exists N ∈ Z such that kvi − vk < ε for all i > N .
Definition 19.4.10 (Open set). Let S ⊆ Rn . We say that S is open if, for every v ∈ S, there exists
ε > 0 such that the set {w ∈ Rn | kv − wk < ε} is a subset of S.
Definition 19.4.11 (Closed set). A set S ⊆ V is closed if, whenever there is a sequence of vectors
v1 , v2 , v3 , · · · ∈ S that converges to a vector v ∈ V , we have v ∈ S.

Definition 19.4.12 (Bounded set). A set S ⊆ V is bounded if there is some v ∈ V and r ∈ R such
that for all w ∈ S, we have d(v, w) < r (Def. 19.1.5: d(v, w) = ‖v − w‖).
Definition 19.4.13 (Compactness). Let V be a finite-dimensional Euclidean space, and let S ⊆ V .
Then S is compact if it is closed and bounded.1
Definition 19.4.14. Let S be a set and let f : S → R be a function. We say that f attains its
maximum if there is some s0 ∈ S such that for all s ∈ S, we have f (s0 ) ≥ f (s). The notion of a
function attaining its minimum is defined analogously.
1
The reader familiar with topology may recognize this not as the standard definition of compactness in a general
metric space, but rather as an equivalent definition for finite-dimensional Euclidean spaces that arises as a consequence
of the Heine–Borel Theorem.

Theorem 19.4.15. Let V be a Euclidean space and let S ⊆ V be a compact and nonempty subset.
If f : S → R is continuous, then f attains its maximum and its minimum. ♦

Definition 19.4.16 (Rayleigh quotient for symmetric linear transformations). Let ϕ : V → V be a


symmetric linear transformation. Then the Rayleigh quotient of ϕ is the function Rϕ : V \ {0} → R
defined by
Rϕ (v) := ⟨v, ϕ(v)⟩ / ‖v‖² .    (19.24)
Exercise 19.4.17. Let ϕ : V → V be a symmetric linear transformation, and let v ∈ V \ {0} and
λ ∈ R× . Verify that Rϕ (λv) = Rϕ (v).

Proposition 19.4.18. Let V be a finite-dimensional Euclidean space. For all linear transformations
ϕ : V → V , the Rayleigh quotient Rϕ attains its maximum and its minimum.

Exercise 19.4.19. Consider the real function

f (t) = (α + βt + γt²)/(δ + εt²)

where α, β, γ, δ, ε ∈ R and δ² + ε² ≠ 0. Suppose that for all t ∈ R, we have f (0) ≥ f (t). Show that
β = 0.

Definition 19.4.20 (arg max). Let S be a set and let f : S → R be a function which attains its
maximum. Then arg max f := {s0 ∈ S | f (s0 ) ≥ f (s) for all s ∈ S}.
Convention 19.4.21. Let S be a set and let f : S → R be a function which attains its maximum.
We often write arg max f to refer to any (arbitrarily chosen) element of the set arg max f , rather
than the set itself.

Proposition 19.4.22. Let ϕ be a symmetric linear transformation of the Euclidean space V , and
let v0 = arg max Rϕ (v). Then v0 is an eigenvector of ϕ.

Prop. 19.4.22 completes the proof of the Main Lemma and thereby the proof of the Spectral
Theorem.
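Here is a small numpy illustration of the argument (eigh is numpy's eigensolver for symmetric matrices; the example matrix is arbitrary): the maximum of the Rayleigh quotient is the largest eigenvalue, attained at a corresponding eigenvector.

    import numpy as np

    rng = np.random.default_rng(2)
    B = rng.standard_normal((4, 4))
    A = (B + B.T) / 2                                 # a symmetric matrix, i.e., [phi]_b in an ONB

    eigenvalues, eigenvectors = np.linalg.eigh(A)     # ascending eigenvalues, orthonormal eigenvectors
    v0 = eigenvectors[:, -1]                          # eigenvector for the largest eigenvalue

    rayleigh = lambda v: (v @ A @ v) / (v @ v)
    print(np.isclose(rayleigh(v0), eigenvalues[-1]))  # R_phi(v0) equals the top eigenvalue
    samples = rng.standard_normal((1000, 4))
    print(max(rayleigh(v) for v in samples) <= eigenvalues[-1] + 1e-9)  # no sample exceeds it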

Proposition 19.4.23. If two symmetric matrices are similar then they are orthogonally similar.
Chapter 20

Hermitian Spaces

In Chapter 19, we discussed Euclidean spaces, whose underlying vector spaces were real. We now
generalize this to the notion of Hermitian spaces, whose underlying vector spaces are complex.

20.1 Hermitian spaces


Definition 20.1.1 (Sesquilinear forms). Let V be a vector space over C. The function f : V ×V → C
is a sesquilinear form if the following four conditions are met.
(a) f (v, w1 + w2 ) = f (v, w1 ) + f (v, w2 ) for all v, w1 , w2 ∈ V ,
(b) f (v1 + v2 , w) = f (v1 , w) + f (v2 , w) for all v1 , v2 , w ∈ V ,
(c) f (v, λw) = λf (v, w) for all v, w ∈ V and λ ∈ C,
(d) f (λv, w) = λ̄ f (v, w) for all v, w ∈ V and λ ∈ C, where λ̄ is the complex conjugate of λ.
Examples 20.1.2. The function f (v, w) = v∗ w is a sesquilinear form over Cn . More generally,
for any A ∈ Mn (C),
f (v, w) = v∗ Aw (20.1)
is sesquilinear.
Exercise 20.1.3. Let V be a vector space over C and let f : V × V → C be a sesquilinear form.
Show that for all v ∈ V , we have
f (v, 0) = f (0, v) = 0 . (20.2)


Definition 20.1.4 (Hermitian form). Let V be a complex vector space. The function f : V × V → C
is Hermitian if for all v, w ∈ V ,
f (v, w) = \overline{f (w, v)} .    (20.3)
A sesquilinear form that is Hermitian is called a Hermitian form.

Exercise 20.1.5. For what matrices A ∈ Mn (C) is the sesquilinear form f (v, w) = v∗ Aw Hermi-
tian?

Exercise 20.1.6. Show that for Hermitian forms, (b) follows from (a) and (d) follows from (c) in
Def. 20.1.1.

Fact 20.1.7. Let V be a vector space over C and let f : V × V → C be a Hermitian form. Then
f (v, v) ∈ R for all v ∈ V .

Definition 20.1.8. Let V be a vector space over C, and let f : V × V → C be a Hermitian form.

(a) If f (v, v) > 0 for all v ≠ 0, then f is positive definite.

(b) If f (v, v) ≥ 0 for all v ∈ V , then f is positive semidefinite.

(c) If f (v, v) < 0 for all v ≠ 0, then f is negative definite.

(d) If f (v, v) ≤ 0 for all v ∈ V , then f is negative semidefinite.

(e) If there exist v, w ∈ V such that f (v, v) > 0 and f (w, w) < 0, then f is indefinite.

Exercise 20.1.9. For what A ∈ Mn (C) is v∗ Aw positive definite, positive semidefinite, etc.?

Exercise 20.1.10. Let V be the space of continuous functions f : [0, 1] → C, and let ρ : [0, 1] → C
be a continuous “weight function.” Define
F (f, g) := ∫_0^1 \overline{f (t)} g(t) ρ(t) dt .    (20.4)

(a) Show that F is a sesquilinear form.

(b) Under what conditions on ρ is F Hermitian?

(c) Under what conditions on ρ is F Hermitian and positive definite?



Exercise 20.1.11. Let r = (ρ0 , ρ1 , . . . ) be an infinite sequence of complex numbers. Consider the
space V of infinite sequences (α0 , α1 , . . . ) of complex numbers such that

Σ_{i=0}^{∞} |αi |² |ρi | < ∞.    (20.5)

For a = (α0 , α1 , . . . ) and b = (β0 , β1 , . . . ), define

F (a, b) := Σ_{i=0}^{∞} \overline{αi } βi ρi .    (20.6)

(a) Show that F is a sesquilinear form.


(b) Under what conditions on r is F Hermitian?
(c) Under what conditions on r is F Hermitian and positive definite?
Definition 20.1.12 (Hermitian space). A Hermitian space is a vector space V over C endowed with
a positive definite, Hermitian, sesquilinear inner product h·, ·i : V × V → C.

Example 20.1.13. The standard example of a (complex) Hermitian space is Cn endowed with the
standard Hermitian dot product (Def. 12.2.5), that is, ⟨v, w⟩ := v∗ w.
Note that the standard Hermitian dot product is sesquilinear and positive definite.
Example 20.1.14. Let V be the space of continuous functions f : [0, 1] → C, and define the inner
product
⟨f, g⟩ = ∫_0^1 \overline{f (t)} g(t) dt .
Then V endowed with this inner product is a Hermitian space.
Example 20.1.15. The space `2 (C) of sequences (α0 , α1 , . . . ) of complex numbers such that

Σ_{i=0}^{∞} |αi |² < ∞    (20.7)

endowed with the inner product

⟨a, b⟩ = Σ_{i=0}^{∞} \overline{αi } βi    (20.8)

where a = (α0 , α1 , . . . ) and b = (β0 , β1 , . . . ) is a Hermitian space. This is one of the standard
representations of the complex “separable Hilbert space.”
Euclidean spaces generalize the geometric concepts of distance and perpendicularity via the
notions of norm and orthogonality, respectively; these are easily extended to complex Hermitian
spaces.
Definition 20.1.16 (Norm). Let V be a Hermitian space, and let v ∈ V . Then the norm of v,
denoted ‖v‖, is defined to be
‖v‖ := √⟨v, v⟩ .    (20.9)
Just as in Euclidean spaces, the notion of a norm allows us to define the distance between two
vectors in a Hermitian space.
Definition 20.1.17. Let V be a Hermitian space, and let v, w ∈ V . Then the distance between the
vectors v and w, denoted d(v, w), is

d(v, w) := kv − wk . (20.10)

Distance in Hermitian spaces obeys the same properties that we are used to in Euclidean spaces.
Theorem 20.1.18 (Cauchy-Schwarz inequality). Let V be a Hermitian space, and let v, w ∈ V .
Then
|hv, wi| ≤ kvk · kwk . (20.11)

Theorem 20.1.19 (Triangle inequality). Let V be a Hermitian space, and let v, w ∈ V . Then

kv + wk ≤ kvk + kwk . (20.12)


Again, as in Euclidean spaces, the inner product carries with it a notion of angle; however, because
⟨v, w⟩ is not necessarily real, the definition of angle is not identical to the definition of angle
presented in Section 19.1.
Definition 20.1.20 (Orthogonality). Let V be a Hermitian space. Then we say that v, w ∈ V are
orthogonal (notation: v ⊥ w) if hv, wi = 0.
Exercise 20.1.21. Let V be a Hermitian space. What vectors are orthogonal to every vector?

Definition 20.1.22 (Orthogonal system). An orthogonal system in a Hermitian space V is a list of


(pairwise) orthogonal nonzero vectors in V .
Proposition 20.1.23. Every orthogonal system in a Hermitian space is linearly independent.
Definition 20.1.24 (Gram matrix). Let V be a Hermitian space, and let v1 , . . . , vk ∈ V . The Gram
matrix of v1 , . . . , vk is the k × k matrix whose (i, j) entry is hvi , vj i, that is,
G = G(v1 , . . . , vk ) := (hvi , vj i)ki,j=1 . (20.13)
Exercise 20.1.25. Let V be a Hermitian space. Show that the vectors v1 , . . . , vk ∈ V are linearly
independent if and only if det G(v1 , . . . , vk ) ≠ 0.
Exercise 20.1.26. Let V be a Hermitian space and let v1 , . . . , vk ∈ V . Show
rk(v1 , . . . , vk ) = rk(G(v1 , . . . , vk )) . (20.14)
Definition 20.1.27 (Orthonormal system). An orthonormal system in a Hermitian space V is a list
of (pairwise) orthogonal vectors in V , all of which have unit norm. So (v1 , v2 , . . . ) is an orthonormal
system if hvi , vj i = δij for all i, j.
In the case of finite-dimensional Hermitian spaces, we are particularly interested in orthonormal
bases, just as we were interested in orthonormal bases for finite-dimensional Euclidean spaces.
Definition 20.1.28 (Orthonormal basis). Let V be a Hermitian space. An orthonormal basis of V

is an orthonormal system that is a basis of V .
Exercise 20.1.29. Generalize the Gram-Schmidt orthogonalization procedure (Section 19.2)
to complex Hermitian spaces. Theorem 19.2.2 will hold verbatim, replacing the word “Euclidean”
by “Hermitian.”
Proposition 20.1.30. Every finite-dimensional Hermitian space has an orthonormal basis. In fact,
every orthonormal list of vectors can be extended to an orthonormal basis.
Proposition 20.1.31. Let V be a Hermitian space with orthonormal basis b. Then for all v, w ∈ V ,

⟨v, w⟩ = [v]∗_b [w]_b .    (20.15)
Proposition 20.1.32. Let V be a Hermitian space. Every linear form f : V → C (Def. 15.1.6)
can be written as
f (x) = ha, xi (20.16)
for a unique a ∈ V .

20.2 Hermitian transformations


Definition 20.2.1 (Hermitian linear transformation). Let V be a Hermitian space. Then the linear
transformation ϕ : V → V is Hermitian if for all v, w ∈ V , we have
hϕv, wi = hv, ϕwi . (20.17)
Theorem 20.2.2. All eigenvalues of a Hermitian linear transformation are real. ♦
Proposition 20.2.3. Let ϕ : V → V be a Hermitian linear transformation, and let v1 , . . . , vk be
eigenvectors of ϕ with distinct eigenvalues. Then vi ⊥ vj whenever i ≠ j.
Proposition 20.2.4. Let b be an orthonormal basis of the Hermitian space V , and let ϕ : V → V
be a linear transformation. Then the transformation ϕ is Hermitian if and only if the matrix [ϕ]b
is Hermitian.

Observe that the definition of a Hermitian linear transformation is analogous to that of a sym-
metric linear transformation of a Euclidean space (Def. 19.4.1), but they are not fully anal-
ogous. In particular, while a linear transformation of a Euclidean space that has an orthonormal
eigenbasis is necessarily symmetric, the analogous statement in complex spaces involves “normal
transformations” (Def. 20.5.1), as opposed to Hermitian transformations.
Exercise 20.2.5. Find a linear transformation of Cn that has an orthonormal eigenbasis but is not
Hermitian.
However, the Spectral Theorem does extend to Hermitian linear transformations.
Theorem 20.2.6 (Spectral Theorem for Hermitian transformations). Let V be a Hermitian space
and let ϕ : V → V be a Hermitian linear transformation. Then
(a) ϕ has an orthonormal eigenbasis;
(b) all eigenvalues of ϕ are real.

Exercise 20.2.7. Prove the converse of the Spectral Theorem for Hermitian transformations: If
ϕ : V → V satisfies (a) and (b), then ϕ is Hermitian.
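Numerically, np.linalg.eigh returns exactly the data promised by Theorem 20.2.6 for a Hermitian matrix: real eigenvalues and a unitary matrix whose columns form an orthonormal eigenbasis. A minimal sketch (with an arbitrary Hermitian matrix):

    import numpy as np

    rng = np.random.default_rng(3)
    B = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
    A = (B + B.conj().T) / 2                             # a Hermitian matrix: A equals its conjugate-transpose

    print(np.allclose(np.linalg.eigvals(A).imag, 0))     # (b) all eigenvalues are real

    eigenvalues, U = np.linalg.eigh(A)                   # eigh is tailored to Hermitian matrices
    print(np.allclose(U.conj().T @ U, np.eye(3)))        # (a) the columns of U form an orthonormal basis
    print(np.allclose(A @ U, U @ np.diag(eigenvalues)))  # and they are eigenvectors of A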

In Section 20.6, we shall see a more general form of the Spectral Theorem which extends part (a)
to normal transformations (Theorem 20.6.1).

20.3 Unitary transformations

In Section 19.3, we introduced orthogonal transformations (Def. 19.3.5), which captured the
geometric notion of congruence in Euclidean spaces. The complex analogues of real orthogonal
transformations are called unitary transformations.
Definition 20.3.1 (Unitary transformation). Let V be a Hermitian space. Then the transformation
ϕ : V → V is unitary if it preserves the inner product, i. e.,

hϕv, ϕwi = hv, wi (20.18)

for all v, w ∈ V . The set of unitary transformations ϕ : V → V is denoted by U (V ).

Proposition 20.3.2. The set U (V ) is a group (Def. 14.3.2) under composition.

Exercise 20.3.3. The linear transformation ϕ : V → V is unitary if and only if ϕ preserves the

norm, i. e., for all v ∈ V , we have ‖ϕv‖ = ‖v‖.

Warning. The proof of this is trickier than in the real case (Ex. 19.3.7).

Theorem 20.3.4. Let ϕ ∈ U (V ). Then all eigenvalues of ϕ have absolute value 1. ♦

Proposition 20.3.5 (Consistency of translation). Let b be an orthonormal basis of the Hermitian


space V , and let ϕ : V → V be a linear transformation. Then the transformation ϕ is unitary if
and only if the matrix [ϕ]b is unitary.

Theorem 20.3.6 (Spectral Theorem for unitary transformations). Let V be a Hermitian space and
let ϕ : V → V be a unitary transformation. Then

(a) ϕ has an orthonormal eigenbasis;

(b) all eigenvalues of ϕ have unit absolute value.

Exercise 20.3.7. Prove the converse of the Spectral Theorem for unitary transformations: If
ϕ : V → V satisfies (a) and (b), then ϕ is unitary.

20.4 Adjoint transformations in Hermitian spaces


We now study the linear map analogue of the conjugate-transpose of a matrix, known as the
Hermitian adjoint.

Theorem 20.4.1. Let V and W be Hermitian spaces, and let ϕ : V → W be a linear map. Then
there exists a unique linear map ψ : W → V such that for all v ∈ V and w ∈ W , we have

hϕv, wi = hv, ψwi . (20.19)

Note that the inner product above refers to inner products in two different spaces. To be more
specific, we should have written
hϕv, wiW = hv, ψwiV . (20.20)

Definition 20.4.2. The linear map ψ whose existence is guaranteed by Theorem 20.4.1 is called the
adjoint of ϕ and is denoted ϕ∗ . So for all v ∈ V and w ∈ W , we have

hϕv, wi = hv, ϕ∗ wi . (20.21)

Proposition 20.4.3 (Consistency of translation). Let V , W , and ϕ be as in the statement of


Theorem 20.4.1. Let b1 be an orthonormal basis of V and let b2 be an orthonormal basis of W .
Then
[ϕ∗ ]b2 ,b1 = [ϕ]∗b1 ,b2 . (20.22)

TO BE WRITTEN.

20.5 Normal transformations


Definition 20.5.1 (Normal transformation). Let V be a Hermitian space. The transformation ϕ :
V → V is normal if it commutes with its adjoint, i. e., ϕ∗ ϕ = ϕϕ∗ .
TO BE WRITTEN.

20.6 The Complex Spectral Theorem for normal transformations
The main result of this section is the generalization of the Spectral Theorem to normal transforma-
tions.

Theorem 20.6.1 (Complex Spectral Theorem). Let V be a finite-dimensional Hermitian space and
let ϕ : V → V be a linear transformation. Then ϕ has an orthonormal eigenbasis if and only if ϕ
is normal. ♦
Chapter 21

(R, C) The Singular Value Decomposition

In this chapter, we discuss matrices over C, but every statement of this chapter holds over R as
well, if we replace matrix adjoints by transposes and the word “unitary” by “orthogonal.”

21.1 The Singular Value Decomposition


In this chapter we study the “Singular Value Decomposition,” an important tool in many areas of
math and computer science.
Notation 21.1.1. For r ≤ min{k, n}, the matrix diagk×n (σ1 , . . . , σr ) is the k × n matrix with σi
(1 ≤ i ≤ r) in the (i, i) entry and 0 everywhere else, i. e., the matrix

⎛ σ1                        ⎞
⎜      σ2                   ⎟
⎜           . . .        0  ⎟
⎜                 σr         ⎟
⎜                            ⎟
⎝  0                      0  ⎠
Note that such a matrix is a “diagonal” matrix which is not necessarily square.


Theorem 21.1.2 (Singular Value Decomposition). Let A ∈ Ck×n . Then there exist unitary matrices
S ∈ U (k) and T ∈ U (n) such that

S ∗ AT = diagk×n (σ1 , . . . , σr ) (21.1)

where each “singular value” σi is real, σ1 ≥ σ2 ≥ · · · ≥ σr > 0, and r = rk A. If A ∈ Rk×n , then we


can let S and T be real, i. e., S ∈ O(k) and T ∈ O(n).
Theorem 21.1.3. The singular values of A are the square roots of the nonzero eigenvalues of
A∗ A. ♦
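For illustration, np.linalg.svd computes the decomposition of Theorem 21.1.2 in the form A = U diag(σ) V ∗ , so S = U and T = V in the notation above; the last two lines of the following sketch check Theorem 21.1.3 (the example matrix is arbitrary):

    import numpy as np

    rng = np.random.default_rng(4)
    A = rng.standard_normal((5, 3)) + 1j * rng.standard_normal((5, 3))

    U, sigma, Vh = np.linalg.svd(A)               # A = U @ D @ Vh with D = diag_{5x3}(sigma)
    S, T = U, Vh.conj().T                         # S and T are unitary

    D = np.zeros((5, 3))
    D[:3, :3] = np.diag(sigma)
    print(np.allclose(S.conj().T @ A @ T, D))     # Eq. (21.1)

    eig = np.sort(np.linalg.eigvalsh(A.conj().T @ A))[::-1]
    print(np.allclose(sigma, np.sqrt(eig)))       # Theorem 21.1.3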
The remainder of this section is dedicated to proving the following restatement of Theorem
21.1.2.
Theorem 21.1.4 (Singular Value Decomposition, restated). Let V and W be Hermitian spaces
with dim V = n and dim W = k. Let ϕ : V → W be a linear map of rank r. Then there exist
orthonormal bases e and f of V and W , respectively, and real numbers σ1 ≥ · · · ≥ σr > 0 such that
ϕei = σi f i and ϕ∗ f i = σi ei for i = 1, . . . , r, and ϕej = 0 = ϕ∗ f j for j > r.
Exercise 21.1.5. Show that Theorem 21.1.4 is equivalent to Theorem 21.1.2.
Exercise 21.1.6. Let V and W be Hermitian spaces and let ϕ : V → W be a linear map of rank r.
Let e = (e1 , . . . , er , . . . , en ) be an orthonormal eigenbasis of ϕ∗ ϕ (why does such a basis exist?) with
corresponding eigenvalues λ1 ≥ · · · ≥ λr > 0 = λr+1 = · · · = λn . Let σi = √λi for i = 1, . . . , r. Let
f i = (1/σi ) ϕ(ei ) for i = 1, . . . , r. Then for 1 ≤ i ≤ r,
(a) ϕei = σi f i ;
(b) ϕ∗ f i = σi ei .
Exercise 21.1.7. The f i of the preceding exercise are orthonormal.
Exercise 21.1.8. Complete the proof of Theorem 21.1.4, and hence of Theorem 21.1.2.

21.2 Low-rank approximation

In this section, we use the Singular Value Decomposition to find low-rank approximations to matri-
ces, that is, the matrix of a given rank which is “closest” to a specified matrix under the operator
norm (Def. 13.1.1).

Definition 21.2.1 (Truncated matrix). Let D = diagk×n (σ1 , . . . , σr ) be a rank-r matrix (so σ1 , . . . , σr ≠
0). The rank-ℓ truncation (ℓ ≤ r) of D, denoted Dℓ , is the k × n matrix Dℓ = diagk×n (σ1 , . . . , σℓ ).
The next theorem explains how the Singular Value Decomposition helps us find low-rank ap-
proximations to matrices.

Theorem 21.2.2 (Nearest low-rank matrix). Let A ∈ Ck×n be a matrix of rank r with singular
values σ1 ≥ · · · ≥ σr > 0. Define S, T , and D as guaranteed by the Singular Value Decomposition
Theorem so that
S ∗ AT = diagk×n (σ1 , . . . , σr ) = D . (21.2)
Given ` ≤ r, let
D` = diagk×n (σ1 , . . . , σ` ) (21.3)
and define B` = SD` T ∗ . Then B` is the matrix of rank at most ` which is nearest to A under the
operator norm, i. e., rk B` = ` and for all B ∈ Ck×n , if rk B ≤ `, then kA − B` k ≤ kA − Bk.

Exercise 21.2.3. Let A, B` , D, and D` be as in the statement of Theorem 21.2.2. Show

(a) kA − B` k = kD − D` k ;

(b) kD − D` k = σ`+1 .
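Here is a minimal numpy illustration of Theorem 21.2.2 and Exercise 21.2.3 (the choice of ℓ and the random rank-ℓ competitors are for illustration only):

    import numpy as np

    rng = np.random.default_rng(5)
    A = rng.standard_normal((6, 4))
    U, sigma, Vh = np.linalg.svd(A)

    ell = 2
    B_ell = U[:, :ell] @ np.diag(sigma[:ell]) @ Vh[:ell, :]        # B_ell = S D_ell T*

    print(np.linalg.matrix_rank(B_ell))                            # ell
    print(np.isclose(np.linalg.norm(A - B_ell, 2), sigma[ell]))    # ||A - B_ell|| = sigma_{ell+1}

    competitors = (rng.standard_normal((6, ell)) @ rng.standard_normal((ell, 4)) for _ in range(200))
    print(all(np.linalg.norm(A - B, 2) >= sigma[ell] - 1e-9 for B in competitors))   # none does better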

As with the proof of the Singular Value Decomposition Theorem, Theorem 21.2.2 is easier to
prove in terms of linear maps. We restate the theorem as follows.

Theorem 21.2.4. Let V and W be Hermitian spaces with orthonormal bases e and f , respectively,
and let ϕ : V → W be a linear map such that ϕei = σi f i for i = 1, . . . , r. Define the truncated map
ϕℓ : V → W by
ϕℓ ei = σi f i for 1 ≤ i ≤ ℓ, and ϕℓ ei = 0 otherwise.    (21.4)
Then whenever ψ : V → W is a linear map of rank ≤ `, we have kϕ − ϕ` k ≤ kϕ − ψk.

Exercise 21.2.5. Show that Theorem 21.2.4 is equivalent to Theorem 21.2.2.

Exercise 21.2.6. Let ϕ and ϕ` be as in the statement of the preceding theorem. Show kϕ − ϕ` k =
σ`+1 .

It follows that in order to prove Theorem 21.2.4, it suffices to show that for all linear maps
ψ : V → W of rank ≤ `, we have kϕ − ψk ≥ σ`+1 .

Exercise 21.2.7. Let ψ : V → W be a linear map of rank ≤ `. Show that there exists v ∈ ker ψ
such that
k(ϕ − ψ)vk
≥ σ`+1 . (21.5)
kvk
Exercise 21.2.8. Complete the proof of Theorem 21.2.4, hence of Theorem 21.2.2.

Exercise 21.2.9. Show that the matrix Bℓ whose existence is guaranteed by Theorem 21.2.2 is
unique, i. e., if B ∈ Ck×n is a matrix of rank ℓ such that ‖A − B‖ ≤ ‖A − B′ ‖ for all rank-ℓ
matrices B′ ∈ Ck×n , then B = Bℓ .

In fact, the rank-ℓ matrix guaranteed by Theorem 21.2.2 to be nearest to A under the operator
norm is also the rank-ℓ matrix nearest to A under the Frobenius norm (Def. 13.2.1).

Theorem 21.2.10. The statement of Theorem 21.2.2 holds for the same matrix B` when the
operator norm is replaced by the Frobenius norm. That is, we also have kA − B` kF ≤ kA − BkF for
all rank-` matrices B ∈ Ck×n . ♦
Chapter 22

(R) Finite Markov Chains

22.1 Stochastic matrices


Definition 22.1.1 (Nonnegative matrix). We say that A is a nonnegative matrix if all entries of A
are nonnegative.
Definition 22.1.2 (Stochastic matrix). A square matrix A = (αij ) ∈ Mn (R) is stochastic if it is
nonnegative and the rows of A sum to 1, i. e., Σ_{j=1}^{n} αij = 1 for all i.

Examples 22.1.3. The following matrices are stochastic.


(a) (1/n) Jn

(b) Permutation matrices

(c) ⎛ 0.8  0.2 ⎞
    ⎝ 0.3  0.7 ⎠
 
(d) ⎛ 1              ⎞
    ⎜    1        0  ⎟
    ⎜       ⋱        ⎟
    ⎝ 0           1  ⎠


Exercise 22.1.4. Let A ∈ Rk×n and B ∈ Rn×m . Prove that if A and B are stochastic matrices
then AB is a stochastic matrix.
Exercise 22.1.5.
(a) Let A be a stochastic matrix. Prove that A1 = 1.
(b) Show that the converse is false.
(c) Show that A is stochastic if and only if A is nonnegative and A1 = 1.
Definition 22.1.6 (Probability distribution). A probability distribution is a list of nonnegative num-
bers which add to 1.
Fact 22.1.7. Every row of a stochastic matrix is a probability distribution.

22.2 Finite Markov Chains


A finite Markov Chain is a stochastic process defined by a finite number of states and constant
transition probabilities. We consider the trajectory of a particle that can move between states at
discrete time steps with the given transition probabilities. In this section, we denote by Ω = [n] the
finite set of states.

FIGURE HERE

This is described more formally below.


Definition 22.2.1 (Finite Markov Chain). Let Ω = [n] be a set of states. We denote by Xt ∈ Ω the
position of the particle at time t. The transition probability pij is the probability that the particle
moves to state j at time t + 1, given that it is in state i at time t, i. e.,

pij = P (Xt+1 = j | Xt = i) . (22.1)


In particular, each pij ≥ 0 and Σ_{j=1}^{n} pij = 1 for every i. The infinite sequence (X0 , X1 , X2 , . . . ) is a
stochastic process.
Definition 22.2.2 (Transition matrix). The transition matrix corresponding to our finite Markov
Chain is the matrix T = (pij ).

Fact 22.2.3. The transition matrix T of a finite Markov Chain is a stochastic matrix, and every
stochastic matrix is the transition matrix of a finite Markov Chain.
Notation 22.2.4 (r-step transition probability). The r-step transition probability from state i to
state j, denoted pij^{(r)} , is defined by
pij^{(r)} = P (Xt+r = j | Xt = i) .    (22.2)
Exercise 22.2.5 (Evolution of Markov Chains, I). Let T = (pij ) be the transition matrix corre-
sponding to a finite Markov Chain. Show that T^r = (pij^{(r)} ), where pij^{(r)} is the r-step transition
probability (Notation 22.2.4).
Definition 22.2.6. Let qt,i be the probability that the particle is in state i at time t. We define
qt = (qt,1 , . . . , qt,n ) to be the distribution of the particle at time t.
Fact 22.2.7. For every t ≥ 0, Σ_{i=1}^{n} qt,i = 1.

Exercise 22.2.8 (Evolution of Markov Chains, II). Let T be the transition matrix of a finite
Markov Chain. Show that qt+1 = qt T and conclude that qt = q0 T t .

Definition 22.2.9 (Stationary distribution). The probability distribution q is a stationary distribu-
tion if q = qT , i. e., if q is a left eigenvector (Def. 8.1.17) of T with eigenvalue 1.
Proposition 22.2.10. Let A ∈ Mn (R) be a stochastic matrix. Show that A has a left eigenvector
with eigenvalue 1.
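A small numpy illustration of Exercise 22.2.8 and Definition 22.2.9, using the stochastic matrix of Examples 22.1.3 (c):

    import numpy as np

    T = np.array([[0.8, 0.2],
                  [0.3, 0.7]])

    q0 = np.array([1.0, 0.0])                    # start in state 1 with probability 1
    q10 = q0 @ np.linalg.matrix_power(T, 10)     # q_t = q_0 T^t (Exercise 22.2.8)
    print(q10, q10.sum())                        # still a probability distribution

    # Stationary distribution: a left eigenvector of T with eigenvalue 1.
    w, V = np.linalg.eig(T.T)                    # right eigenvectors of T^T are left eigenvectors of T
    q = np.real(V[:, np.argmin(abs(w - 1))])
    q = q / q.sum()
    print(q, np.allclose(q @ T, q))              # q = (0.6, 0.4) and qT = q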

The preceding proposition in conjunction with the next theorem shows that every stochastic
matrix has a stationary distribution.
Recall (Def. 22.1.1) that a nonnegative matrix is a matrix whose entries are all nonnegative;
a nonnegative vector is defined similarly.
Do not prove the following theorem.
Theorem 22.2.11 (Perron-Frobenius). Every nonnegative square matrix has a nonnegative eigen-
vector.
Corollary 22.2.12. Every finite Markov Chain has a stationary distribution.
Proposition 22.2.13. If T is the transition matrix of a finite Markov Chain and lim_{r→∞} T^r = L
exists, then every row of L is a stationary distribution.
In order to determine which Markov Chains have transition matrices that converge, we study
the directed graphs associated with finite Markov Chains.

22.3 Digraphs
TO BE WRITTEN.

22.4 Digraphs and Markov Chains


Every finite Markov Chain has an associated digraph, where i → j if and only if pij ≠ 0.

FIGURE HERE

Definition 22.4.1 (Irreducible Markov Chain). A finite Markov Chain is irreducible if its associated
digraph is strongly connected.

Proposition 22.4.2. If T is the transition matrix of an irreducible finite Markov Chain and
lim_{r→∞} T^r = L exists, then all rows of L are the same, so rk L = 1.

Definition 22.4.3 (Ergodic Markov Chain). A finite Markov Chain is ergodic if its associated digraph
is strongly connected and aperiodic.
The following theorem establishes a sufficient condition under which the probability distribution
of a finite Markov Chain converges to the stationary distribution. For irreducible Markov Chains,
this is necessary and sufficient.

Theorem 22.4.4. If T is the transition matrix of an ergodic finite Markov Chain, then lim_{r→∞} T^r
exists.

Proposition 22.4.5. For irreducible Markov Chains, the stationary distribution is unique.
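Numerically, the powers of an ergodic transition matrix converge to a rank-1 matrix whose identical rows give the stationary distribution; a minimal numpy sketch continuing the earlier example:

    import numpy as np

    T = np.array([[0.8, 0.2],
                  [0.3, 0.7]])                   # its digraph is strongly connected and aperiodic

    L = np.linalg.matrix_power(T, 50)            # T^r for a large r
    print(L)                                     # both rows are close to (0.6, 0.4)
    print(np.allclose(L[0], L[1]))               # the rows agree, so rk L = 1 (Prop. 22.4.2)
    print(np.allclose(L[0] @ T, L[0]))           # each row is a stationary distribution (Prop. 22.2.13)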

22.5 Finite Markov Chains and undirected graphs


TO BE WRITTEN.

22.6 Additional exercises


Definition 22.6.1 (Doubly stochastic matrix). A matrix A = (αij ) is doubly stochastic if both A and
AT are stochastic, i. e., 0 ≤ αij ≤ 1 for all i, j and every row and column sum is equal to 1.

Examples 22.6.2. The following matrices are doubly stochastic.


(a) (1/n) Jn

(b) Permutation matrices

(c) ⎛ 0.1  0.3  0.6 ⎞
    ⎜ 0.2  0.5  0.3 ⎟
    ⎝ 0.7  0.2  0.1 ⎠

Theorem∗ 22.6.3 (Birkhoff’s Theorem). The set of doubly stochastic matrices is the convex hull
(Def. 5.3.6) of the set of permutation matrices (Def. 2.4.3). In other words, every
doubly stochastic matrix can be expressed as a convex combination (Def. 5.3.1) of permutation
matrices. ♦
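Birkhoff’s Theorem can be made algorithmic. The sketch below is one possible greedy implementation built on scipy’s assignment solver (the function name and tolerance are illustrative choices, and this particular algorithm is not developed in the text); it decomposes the doubly stochastic matrix of Examples 22.6.2 (c).

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def birkhoff_decompose(A, tol=1e-9):
        # Greedily write a doubly stochastic matrix as a convex combination of permutation matrices.
        R = np.array(A, dtype=float)               # remaining "mass"
        terms = []
        while R.max() > tol:
            cost = np.where(R > tol, 0.0, 1.0)     # penalize (approximately) zero entries
            rows, cols = linear_sum_assignment(cost)
            if cost[rows, cols].sum() > 0:         # no all-positive permutation: input was not doubly stochastic
                break
            theta = R[rows, cols].min()            # the largest coefficient we can peel off
            P = np.zeros_like(R)
            P[rows, cols] = 1.0
            terms.append((theta, P))
            R -= theta * P
        return terms

    A = np.array([[0.1, 0.3, 0.6],
                  [0.2, 0.5, 0.3],
                  [0.7, 0.2, 0.1]])
    terms = birkhoff_decompose(A)
    print(sum(w for w, _ in terms))                        # the weights sum to 1
    print(np.allclose(sum(w * P for w, P in terms), A))    # and the combination reconstructs A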
Chapter 23

More Chapters

TO BE WRITTEN.

Chapter 24

Hints

24.1 Column Vectors

1.1.17 R Exercise R Solution

R R
(b) Consider the sums of the entries in each column vector.

1.2.7 Exercise Solution


Show that the sum of two vectors of zero weight has zero weight and that scaling a zero-weight

R R
vector produces a zero-weight vector.

1.2.8 Exercise Solution

R R
What are the subspaces of Rn which are spanned by one vector?

1.2.9 Exercise Solution


(c) The “if” direction is trivial. For the “only if” direction, begin with vectors w1 ∈ W1 \ W2 and

R R
w2 ∈ W2 \ W1 . Where is w1 + w2 ?

1.2.12 Exercise Solution

R R
Prove (a) and (b) together. Prove that the subspace defined by (b) satisfies the definition.

1.2.15 Exercise Solution


R
R R
Preceding exercise.

1.2.16 Exercise Solution

R R
Show that span(T ) ≤ span(S).

1.2.18 Exercise Solution


It is clear that U1 + U2 ⊆ span(U1 ∪ U2 ). For the reverse inclusion you need to show that a linear

R R
combination from U1 ∪ U2 can always be written as a sum u1 + u2 for some u1 ∈ U1 and u2 ∈ U2 .

1.3.9 Exercise Solution


You need to find α1 , α2 , α3 such that
       
α1 (1, −3, 2)T + α2 (4, 2, 1)T + α3 (−2, −8, 3)T = (0, 0, 0)T .

R R
This gives you a system of linear equations. Find a nonzero solution.

1.3.11 Exercise Solution

R R
P
Assume αi vi = 0. What condition on αj allows you to express vj in terms of the other vectors?

1.3.13 Exercise Solution

R R
What does the empty sum evaluate to?

1.3.15 Exercise Solution

R R
Find a nontrivial linear combination of v and v that evaluates to 0.

1.3.16 Exercise Solution

R R
The only linear combinations of the list (v) are of the form αv for α ∈ F.

R
1.3.17 Exercise Solution

R R
Ex. 1.3.13.

1.3.21 Exercise Solution

R R
P
If αi vi = 0 and not all the αi are 0, then it must be the case that αk+1 6= 0 (why?).

1.3.22 Exercise Solution



R
R R
Prop. 1.3.11

1.3.23 Exercise Solution

R R
What is the simplest nontrivial linear combination of such a list which evaluates to zero?

1.3.24 Exercise Solution

R R
Combine Prop. 1.3.15 and Fact 1.3.20.

1.3.27 Exercise Solution

R R
Begin with a basis. How can you add one more vector that will satisfy the condition?

1.3.41 Exercise Solution


Suppose not. Use Lemma 1.3.21 to show that span(w1 , . . . , wm ) ≤ span(v2 , . . . , vk ) and arrive at

R R
a contradiction.

1.3.42 Exercise Solution


Use the Steinitz exchange lemma to replace v1 by some wi1 , then v2 by some wi2 , etc. So in the

R R
end, wi1 , . . . , wik are linearly independent.

1.3.45 Exercise Solution

R R
k
Use the standard basis of F and the preceding exercise.

1.3.46 Exercise Solution


It is clear that rk(v1 , . . . , vk ) ≤ dim(span(v1 , . . . , vk ) (why?). For the other direction, first assume
that the vi are linearly independent, and show that span(v1 , . . . , vk ) cannot contain a list of more

R R
than k linearly independent vectors.

1.3.51 Exercise Solution

R R
Show that if this list were not linearly independent, then U1 ∩ U2 would contain a nonzero vector.

R
1.3.52 Exercise Solution

R R
Ex. 1.3.51.

1.3.53 Exercise Solution



R R
Start with a basis of U1 ∩ U2 .

1.4.9 Exercise Solution

R R
Only the zero vector. Why?

1.4.10 Exercise Solution

R R
Express 1 · x in terms of the entries of x.

1.5.4 Exercise Solution


Begin with a linear combination W of the vectors in S which evaluates to zero. Conclude that all

R R
coefficients must be 0 by taking the dot product of W with each member of S.

1.6.4 Exercise Solution


Incidence vectors.

24.2 Matrices

2.2.3 R Exercise R Solution


R
R R
Prove only one of these, then infer the other using the transpose ( Ex. 2.2.2).

2.2.17 Exercise Solution

R R
(d3) The size is the sum of the squares of the entries. This is the Frobenius norm of the matrix.

2.2.19 Exercise Solution

R R
Show, by induction on k, that the (i, j) entry of Ak is zero whenever j ≤ i + k − 1.

2.2.27 Exercise Solution

R R
Write B as a sum of matrices of the form [0 | · · · | 0 | bi | 0 | · · · | 0] and then use distributivity.

2.2.29 Exercise Solution

R R
Apply the preceding exercise to B = I.

2.2.30 Exercise Solution



R R
Consider (A − B)x.

R
2.2.31 Exercise Solution

R R
Prop. 2.2.28.

R
2.2.32 Exercise Solution

R R
Prop. 2.2.29

R
2.2.33 Exercise Solution

R R
Prop. 2.2.29

2.2.38 Exercise Solution

R R
What is the (i, i) entry of AB?

R
2.2.39 Exercise Solution

R R
Preceding exercise.

2.2.41 Exercise Solution

R R
If Tr(AB) = 0 for all A, then B = 0.

2.3.2 Exercise Solution

R R
Induction on k. Use the preceding exercise.

2.3.4 Exercise Solution

R R
Induction on the degree of f .

2.3.6 Exercise Solution

R R
Observe that the off-diagonal entries do not affect the diagonal entries.

2.3.7 Exercise Solution

R R
Induction on k. Use the preceding exercise.

R
2.5.1 Exercise Solution

R R
Ex. 2.2.2

2.5.5 Exercise Solution



R
R R
Ex. 2.2.14.

2.5.6 Exercise Solution

R R
Multiply by the matrix with a 1 in the (i, j) position and 0 everywhere else.

2.5.7 Exercise Solution

(a) Trace.

2.5.8 R R RExercise Solution

R R
Incidence vectors ( Def. 1.6.2).

2.5.13 Exercise Solution


(b) That particular circulant matrix is the one corresponding to the cyclic permutation of all
elements, i. e., the matrix C = C(0, 1, 0, 0, . . . , 0).

24.3 Matrix Rank

3.1.5 R Exercise R Solution


For the upper bound on rkcol (A + B), consider 1 × 2 block matrix [A | B] (the k × 2n matrix
obtained by concatenating the columns of B onto A). Show that rkcol (A + B) ≤ rkcol [A | B] ≤

R R
rkcol (A) + rkcol (B). Use the upper bound to derive the lower bound.

3.3.12 Exercise Solution

R R
Consider the k × r submatrix formed by taking r linearly independent columns.

3.5.9 Exercise Solution

R R
Extend a basis of U to a basis of Fn , and consider the action of the matrix A on this basis.

R
3.6.7 Exercise
Ex. 2.2.19.
Solution

24.4 Qualitative Theory of Systems of Linear Equations

4.1.4 R Exercise
Immediate from Ex. 2.2.26.
R Solution

24.5 Affine and Convex Combinations

5.1.14 R Exercise R
Solution
(b) Let v0 ∈ S and let U = {s − v0 | s ∈ S}. Show that

(i) U ≤ Fn ;

(ii) S is a translate of U .

For uniqueness, you need to show that if U + u = V + v = S, then U = V .

24.6 The Determinant

R
6.4.12 Exercise R Solution
The columns of A are linearly dependent ( R Def. 3.3.10) Use this fact to find elementary column

R R
operations that create an all-zero column.

6.4.15

R
Exercise Solution
Necessity was established in the preceding exercise ( R Cor. 6.4.14). Sufficiency follows from

R R
Prop. 3.2.7.

6.7.7 Exercise Solution


Let fen denote the number of fixed-point-free even permutations of the set {1, . . . , n}, and let fon
denote the number of fixed-point-free odd permutations of the set [n]. Write fon − fen as the

R R
determinant of a familiar matrix.

6.8 Exercise Solution



R
Sufficiency follows from the determinantal expression of A−1 . Necessity follows from the multiplica-

R R
tivity of the determinant ( Prop. 6.4.13).

R
6.8.5 Exercise Solution

R R
Ex. 6.10.5

6.10.4 Exercise Solution

R R
Identity matrix.

6.10.5 Exercise Solution


Consider the n! term expansion of the determinant (Eq. (6.20)). What happens with the contribu-
tions of σ and σ −1 ? What if σ = σ −1 ? What if σ has a fixed point?

24.7 Eigenvectors and Eigenvalues

8.4.9 R Exercise
Induction on n.
R Solution

24.8 Orthogonal Matrices


24.9 The Spectral Theorem
24.10 Bilinear and Quadratic Forms

11.1.5 R Exercise R Solution


R
Let e1 , . . . , en be the standard basis of Fn ( Def. 1.3.34). Let αi = f (ei ). Show that a =

R R
(α1 , . . . , αn )T works.

11.1.10 Exercise Solution

R R
Generalize the hint to Theorem 11.1.5.

11.1.13 Exercise Solution



R
R R
(c) Skew symmetric matrices. Ex. 6.10.3
11.3.3 Exercise Solution

R R
Give a very simple expression for B in terms of A.
11.3.9 Exercise Solution

R R
Use the fact that λ is real.
11.3.15 Exercise Solution

R R
Interlacing.
11.3.23 Exercise Solution
(b) Triangular matrices.

24.11 Complex Matrices

12.2.7 R Exercise R Solution

(a)

R R
(b) Prove that your example must have rank 1.

12.3.16 Exercise Solution


Pick λ0 , . . . , λn−1 to be complex numbers with unit absolute value. Let w = (λ0 , . . . , λn−1 )T . Prove
that the entries of √1n F w generate a unitary circulant matrix, and that all unitary circulant matrices

R R
are of this form. The λi are the eigenvalues of the resulting circulant matrix.
12.4.9 Exercise Solution

R R
Induction on n.

R
12.4.15 Exercise Solution

R R
Prop. 12.4.6.

R
12.4.19 Exercise
Theorem 12.4.18
Solution

24.12 Matrix Norms

13.1.2 R Exercise R Solution

Show that max_{x ∈ R^n, ‖x‖=1} ‖Ax‖/‖x‖ exists. Why does this suffice?

13.1.13 Exercise Solution


Use the fact that fAB = fBA · tn−k ( Prop. 8.6.1).

24.13 Basic Algebra

24.14 Vector Spaces

R
15.3.14 Exercise R Solution

R R
Use the fact that a polynomial of degree n has at most n roots.

R
15.4.9 Exercise Solution

R R
Ex. 14.5.57.

R
15.4.10 Exercise Solution

R R
Ex. 16.4.10.

15.4.13 Exercise Solution

R R
Start with a basis of U1 ∩ U2 .

15.4.14 Exercise Solution

R R
It is an immediate result of Cor. 15.4.3 that every list of k + 1 vectors in Rk is linearly dependent.

15.4.18 Exercise Solution


Their span is not changed.

24.15 Linear Maps

16.2.6 R Exercise R Solution

R R
Coordinate vector.

16.3.7 Exercise Solution


Let e1 , . . . , ek be a basis of ker(ϕ). Extend this to a basis e1 , . . . , en of V , and show that ϕ(ek+1 ), . . . , ϕ(en )

R R
is a basis for im(ϕ).

16.4.31 Exercise Solution

(a)

R
(b)

(c) Prop. 16.4.15.

(d)

16.6.7 R Exercise R
Solution
Write A = [ϕ]old and A0 = [ϕ]new . Show that for all x ∈ Fn (n = dim V ), we have A0 x = T −1 ASx.

24.16 Minimal Polynomials of Matrices and Linear Trans-


formations
24.17 Euclidean Spaces

19.1.7 R Exercise R Solution

Derive this from the Cauchy-Schwarz inequality.

19.1.21 R Exercise R Solution

Theorem 11.1.5.

19.3.20 R Exercise R Solution

Prop. 19.1.21.

19.4.19 R Exercise R Solution

f ′(0) = 0.

19.4.22 R Exercise R Solution

Define U = {v0 }⊥ , and for each u ∈ U \ {0}, consider the function fu : R → R defined by
fu (t) = Rϕ (v0 + tu). Apply the preceding exercise to this function.

24.18 Hermitian Spaces

20.1.19 R Exercise R Solution

Derive this from the Cauchy-Schwarz inequality.

20.1.32 R Exercise R Solution

Theorem 11.1.5.

20.2.2 R Exercise R Solution

For an eigenvalue λ, you need to show that λ = λ̄. The proof is just one line. Consider the
expression x∗ Ax.

20.2.5 R Exercise R Solution

Which diagonal matrices are Hermitian?

20.4.1 R Exercise R Solution

Prop. 20.1.32.

24.19 The Singular Value Decomposition

21.2.7 R Exercise R Solution


It suffices to show that there exists v ∈ ker ψ such that kϕvk ≥ σ`+1 kvk (why?). Pick v ∈
ker ψ ∩ span(e1 , . . . , e`+1 ) and show that this works.

24.20 Finite Markov Chains

22.1.5 R Exercise R Solution

For a general matrix B ∈ Mn (R), what is the i-th entry of B1?

22.2.10 R Exercise R Solution

Prop. ??.

22.6.3 Exercise Solution


Marriage Theorem.
Chapter 25

Solutions

25.1 Column Vectors

1.1.16 R Exercise R Hint

     
(−5, −1, 11)T = 2 · (−2, 1, 7)T − (1/2) · (2, 6, 6)T


25.2 Matrices

25.3 Matrix Rank

25.4 Theory of Systems of Linear Equations I: Qualitative


Theory

25.5 Affine and Convex Combinations

25.6 The Determinant

25.7 Theory of Systems of Linear Equations II: Cramer’s


Rule

25.8 Eigenvectors and Eigenvalues

8.1.3 R Exercise R Hint

Every vector in Fn is an eigenvector of In with eigenvalue 1.

8.4.2 R Exercise R Hint

fA (t) = t² − (α + δ)t + (αδ − βγ) .

25.9 Orthogonal Matrices

25.10 The Spectral Theorem

10.2.2 R Exercise R Hint


vT Av = (Σ_{i=1}^{n} αi biT ) A (Σ_{i=1}^{n} αi bi )
      = (Σ_{i=1}^{n} αi biT ) (Σ_{i=1}^{n} αi Abi )
      = (Σ_{i=1}^{n} αi biT ) (Σ_{i=1}^{n} αi λi bi )
      = Σ_{i=1}^{n} Σ_{j=1}^{n} λi αi αj bjT bi
      = Σ_{i=1}^{n} Σ_{j=1}^{n} λi αi αj (bj · bi )
      = Σ_{i=1}^{n} Σ_{j=1}^{n} λi αi αj δij
      = Σ_{i=1}^{n} λi αi²

10.2.4 R Exercise R Hint


First suppose A is positive definite. Let v be an eigenvector of A. Then, because v ≠ 0, we have
0 < vT Av = vT λv = λ vT v = λ‖v‖² .
Hence λ > 0.
Now suppose A is symmetric and all of its eigenvalues are positive. By the Spectral Theorem,
A has an orthonormal eigenbasis, say b = (b1 , . . . , bn ), with Abi = λi bi for all i. Let v ∈ Fn be a
nonzero vector. Then we can write v = Σ αi bi for some scalars αi . Then

vT Av = Σ_{i=1}^{n} λi αi² > 0

so A is positive definite.

25.11 Bilinear and Quadratic Forms

11.3.3 R Exercise R Hint

B = (A + AT )/2. Verify that this and only this matrix works.

25.12 Complex Matrices


25.13 Matrix Norms
25.14 Basic Algebra
25.15 Vector Spaces
25.16 Linear Maps
25.17 Minimal Polynomials of Matrices and Linear Trans-
formations

18.2.10 R Exercise R Hint

mA = Π′_{i=1}^{n} (t − λi ), where Π′ means that we take each factor only once (so eigenvalues λi with
multiplicity greater than 1 only contribute one factor of t − λi ).

25.18 Euclidean Spaces


25.19 Hermitian Spaces
25.20 Finite Markov Chains
Index

0, 4, 13, 17, 18 Cauchy-Binet formula, 76


1, 5, 17, 28 Cayley-Hamilton Theorem, 87, 87–88, 112
for diagonal matrices, 87
Affine basis, 56, 56
for diagonalizable matrices, 88
Affine combination, 53, 53, 58
Characteristic polynomial, 84, 84–89
Affine hull, 54, 54–55, 58
of nilpotent matrices, 87
Affine independence, 56, 56
Affine subspace, 51 of similar matrices, 88
of Fk , 54, 54–57 Codimension, 45
codimension of, 57 of an affine subspace, 57
dimension of, 55, 55–56 Cofactor, 71
intersection of, 54, 55 Cofactor expansion, 71, 71–73
All-ones matrix, see J matrix skew, 73
All-ones vector, see 1 Column space, 39, 41
Column vector
Basis orthogonality of, 109
of a subspace of Fk , 14, 14–15 Column vector, 3, 3–19
of a subspace of F k, 15 addition of, 5, 5
of Fk , 15, 52
dot product of, 16, 16–18, 28, 91, 98, 99,
Bilinear form
105
nonsingular, 105
isotropic, 105, 105–106
over Fn , 98, 98–99
linear combination of, 6, 6–8, 16, 29
nonsingular, 98, 99
linear dependence of, 11, 11–69
representation theorem for, 98
Linear independence of, 15
Cn linear independence of, 11, 11–13, 18, 37,
orthonormal basis of, 109, 110 42, 46, 51, 79


multiplication by a scalar, 6, 6 Diagonalizable matrix, 81, 81


norm of, 109 Dimension, 13, 15
orthogonality, 17, 17–18 of a convex set, 59
orthogonality of, 28 of an affine subspace, 55, 55–56
with respect to a bilinear form, 104 Discrete Fourier transform, 111, 111
parallel, 13, 13 Dot product, 16, 16–18, 28, 91, 98, 99, 105
Column-rank, 39, 39–41 Dual space
full, 39, 43 of Fn , 97
Column-space, 39
Companion matrix, 89, 89 Eigenbasis
Complex conjugate, 107 of Fn , 79, 79
Complex number, 107, 107–108 orthonormal, 92, 94, 95, 112
absolute value of, 107 Eigenspace, 80
imaginary part of, 107 Eigenvalue
magnitude of, 107 interlacing of, 96
norm of, 107 of a matrix, 78, 78–80, 85, 86, 102, 114,
real part of, 107 115
unit norm, 107, 107–108 algebraic multiplicity of, 85, 85–86
Convex combination, 57–58 geometric multiplicity of, 80, 80, 85, 86,
Convex hull, 58, 58–59 89
Convex set, 58, 58–59 left, 79, 79–80
dimension of, 59 of a real symmetric matrix, 94
intersection of, 58 Eigenvector
Coordinates, 95 of a matrix
Corank left, 80
of a matrix, 46, 46 of a matrix, 78, 78–80
of a set, 45 left, 79
Courant-Fischer Theorem, 95 Elementary column operation, 40, 40–41, 46
Cramer’s Rule, 74, 74 and the determinant, 68
Elementary matrix, 40
Determinant, 52, 62, 65, 65–76, 86, 103 Elementary row operation, 40, 40–42, 46
corner, 103, 103–104 Empty list, 7
multilinearity of, 67 Empty sum, 4
of a matrix product, 68 Euclidean norm, 18, 75, 91, 110
of inverse matrix, 68 complex, 109

Fk , 4, 21 Hyperplane, 57, 57
basis of, 15, 52 intersection of, 57
dimension of, 15 linear, see Linear hyperplane
standard basis of, 14, 26, 28
subspace of, 8, 8–10, 57 I, see Identity matrix
codimension of, 45, 45 Identity matrix, 26, 26, 43, 78
dimension of, 13, 13–15 Incidence vector, 18, 18–19
disjoint, 15
J matrix, 21, 28
intersection of, 9
Jordan block, 37
totally isotropic, 105, 105–106
trivial, 9 k-cycle, 64, 65
union of, 9 Kronecker delta, 26
Fk×n , 21
F[t], 82 `2 norm, 109
Field, 4 `2 norm, 18
of characteristic 2, 75 Linear combination
field affine, see Affine combination
algebraically closed, 84 of column vectors, 6, 6–8, 10, 16, 29
First Miracle of Linear Algebra, 15, 15 trivial, 11
Frobenius norm, 114, 114–115 Linear dependence
submultiplicativity of, 115 of column vectors, 11, 11–69
Fundamental Theorem of Algebra, 84 Linear form
over Fn , 97, 97, 101
Gaussian elimination, 40 Linear hyperplane, 57
Generalized Fisher Inequality, 19 intersection of, 57
Group Linear independence
orthogonal, see Orthogonal group maximal, 14, 14
symmetric, see Symmetric group of column vectors, 11, 11–13, 15, 18, 37,
unitary, see Unitary group 42, 46, 51, 79
List, 11
Hadamard matrix, 92, 92 concatenation of, 11
Hadamard’s Inequality, 75, 76 empty, 12
Half-space, 59, 59 List of generators, 14, 14
Helly’s Theorem, 59, 59
Hermitian dot product, 108, 108–110 Mn (F), 21

Matrix, 20, 20–37 multiplication by a scalar, 24, 24


addition of, 23, 23 negative definite, 102
adjugate of, 73 negative semidefinite, 102
augmented, 51 nilpotent, 28, 28, 79–81
block, 40, 41, 178, 178 characteristic polynomial of, 87
multiplication of, 179 nonnegative, 211
block-diagonal, 178, 179–180 nonsingular, 42, 42–43, 51, 68, 70, 73, 74,
block-triangular, 179, 180, 181 79, 98
characteristic polynomial of, 84, 84–89 normal, 111, 111–112, 115
circulant, 37, 37, 110, 111 null space of, 45, 49, 49
commutator of, 36, 36 nullity of, 49
conjugate-transpose of, 108, 108 orthogonal, 91, 91–92, 113–115
corank of, 46, 46 orthogonally similar, 92, 92–95
corner, 103, 103 permutation, see Permutation matrix
definiteness of, 102, 102–103 positive definite, 95, 95, 102, 114
diagonal, 21, 21, 27, 31, 36, 78, 102, 110 positive semidefinite, 102
determinant of, 67 rank, 42, 42–47, 71, 109
diagonalizable, 81, 81, 112 full, 42
doubly stochastic, 215, 215 right inverse of, 43, 43–44, 52
exponentiation, 27, 27, 31, 33, 180, 181 rotation, see Rotation matrix
Hermitian, 110, 110–111 scalar, 26, 36, 79
Hermitian adjoint of, 108, 108 self-adjoint, 110, 110–111
identity, see Identity matrix similarity of, 81, 81, 87
indefinite, 102 skew-symmetric, 75
integral, 21, 36 square, 21, 23, 42
inverse, 43, 43–44, 73 stochastic, 114, 211, 211–212
inverse of, 52 strictly lower triangular, 22, 22
invertibility of, 43, 43–44, 52, 68 strictly upper triangular, 22, 22, 28
kernel of, 45, 49 symmetric, 23, 23, 36, 92, 94, 95, 114, 115
left inverse of, 43, 43–44, 52 trace of, 29, 29–30
lower triangular, 22, 22 transpose, 22, 23, 24, 36
multiplication, 24, 24–30 determinant of, 66
and the determinant, 68 unitary, 110, 110–111
associativity of, 24 unitary similarity of, 111, 111–112
distributivity of, 24 upper triangular, 22, 22, 32–33

determinant of, 67 Orthonormal system, 18, 91, 109, 110


Vandermonde, 111
zero, see Zero matrix Pn (F), 82
Modular equation, 15 Parallelepiped, 76, 76
for codimension, 45 Parallelogram, 76, 76
Monomial Permutation, 34, 34–35, 62, 62–65
multivariate, 99, 99 composition of, 34, 34, 63, 64
degree of, 99 cycle, 64, 65
monic, 100 decomposition, 65
monic part of, 100 disjoint, 65, 65
Multilinearity, 67 even, 63, 63
of the determinant, 67 fixed point of, 72, 73
identity, 35
Norm inverse of, 35, 35
of column vectors, 18, 75, 91, 109, 110 inversion of, 62, 62–65
Null space odd, 63
of a matrix, 45, 49, 49 parity, 63, 63–65
Nullity product of, 63, 64
of a matrix, 49 sign, 63, 64
Permutation matrix, 33
Operator norm Polynomial, 31, 82, 82–84, 86
of a matrix, 113, 113–115 degree of, 82, 82
submultiplicativity of, 113 divisibility of, 82
Orthogonal group, 91 leading coefficient of, 82, 82
Orthogonal matrix leading term of, 82, 82
eigenvalues of, 91 monic, 82
Orthogonal system, 18, 109 multivariate, 99, 99–101
Orthogonality degree of, 100
of column vectors, 17, 17–18, 28, 80, 109 homogeneous, 100, 100–101
with respect to a bilinear form, 104 standard form, 100
Orthogonally similar matrices, 92, 92–95, 104 of a matrix, 37
Orthonormal basis of matrices, 31, 31, 33, 86, 86–88, 180,
of a subspace of Rn , 18 181
of Cn , 109, 110 root of, 83, 83–84
of Rn , 91, 92 multiplicity, 83

zero, 82, 82 affine closed, 56


affine-closed, 53, 53
Quadratic form, 101, 101–104 intersection of, 54
definiteness of, 102, 102–104 closed, 59
indefinite, 102 convex, 58, 58–59
negative definite, 102 dimension of, 59
negative semidefinite, 102 intersection of, 58
positive definite, 102 corank of, 45
positive semidefinite, 102 open, 59, 196
Rn sum of, 10, 10, 15
orthonormal basis of, 91, 92 translate of, 51, 51, 54
subspace of, 9 Similar matrices, 81, 81, 87, 103, 104
Radon’s Lemma, 59 Skew cofactor expansion, 73
Rank Span
of a matrix, 42, 42–47, 71, 109 of column vectors, 9, 9–10, 13, 52
full, 42 transitivity of, 10
of column vectors, 13, 13–15 Spectral Theorem
Rank-Nullity Theorem for normal matrices, 112
for matrices, 50 for real symmetric matrices, 94, 94–96,
Rayleigh quotient, 95, 101 110, 112
Rayleigh’s Principle, 95 applications of, 95–96
Reflection matrix, 115 Standard basis of Fk , 14, 26, 28
Rook arrangement, 33, 33 Standard unit vector, 14, 26
Rotation matrix, 26, 26 Steinitz exchange lemma, 15
Row space, 39 Straight-line segment, 58, 58
Row vector, 21, 21 Sublist, 12, 12
Row-rank, 39 Submatrix, 36, 42, 42, 69
full, 39, 43 Subset
dense, 112, 112
Sn , see Symmetric group Subspace
Scalar, 4 affine, see Affine subspace
Schur’s Theorem, 112 of Fk , 8, 8–10, 57
real version, 112 basis of, 14, 14–15
Second Miracle of Linear Algebra, 41, 41 codimension of, 45, 45
Set dimension of, 13, 13–15

disjoint, 15 Transposition, 63, 63–64


intersection of, 9 neighbor, 64, 64
totally isotropic, 105, 105–106 Triangle inequality, 113, 115
union of, 9
zero-weight, 9 Unitarily similar matrices, 111, 111–112
Symmetric group, 62, 63, 64 Unitary group, 110
System of linear equations, 7, 48–52, 61 Unitary matrix
augmented, 51 eigenvalues of, 110
homogeneous, 49, 55, 69
solution space of, 49, 49–51 Vandermonde determinant, 72
trivial solution to, 49 Vandermonde matrix, 36, 37, 72
set of solutions to, 51, 55 Vector
solvable, 49, 51–52, 74 isotropic, 17, 17

Third Miracle of Linear Algebra, 91, 110 Zero matrix, 21, 29, 42
Trace Zero polynomial, 82, 82
of a matrix, 29, 29–30, 86 Zero vector, see 0
