
Lecture Notes on Linear Algebra

Christian Sämann

Version September 15, 2014


Contents
Preface
Course summary
I Euclidean space
I.1 The vector space R2
I.2 The vector spaces R3 and Rn
I.3 Matrices
II Systems of linear equations
II.1 Motivation
II.2 Basic notions
II.3 Gaußian elimination
II.4 Homogeneous and inhomogeneous SLEs
II.5 Inverting matrices
III Vector spaces
III.1 Definition and examples
III.2 Vector subspaces
III.3 Linear combination and span
III.4 Linear dependence
III.5 Basis and dimensions
IV Inner product spaces
IV.1 Inner products
IV.2 Orthogonality of vectors
IV.3 Gram-Schmidt algorithm
V Linear transformations
V.1 Examples and general properties
V.2 Row and column spaces of matrices
V.3 Range and kernel, rank and nullity
V.4 Orthogonal linear transformations
VI Eigenvalues and eigenvectors
VI.1 Characteristic polynomial
VI.2 Diagonalising matrices
VI.3 Applications
VII Advanced topics
VII.1 Complex linear algebra
VII.2 A little group theory
VII.3 Special relativity
Appendix
A Linear algebra with SAGE
Index

Preface
These are the lecture notes of the course Linear Algebra F18CF taught at Heriot-Watt
University to second year students. While the material covered in this course is itself very
important for later courses, this course is mainly the first one to teach you how to prove
theorems. Linear Algebra is in fact the ideal course to learn this, as the proofs are rather
short, simple, and less technical than those in Analysis. Traditionally, switching from
algorithmic work to problem solving and proving theorems is very difficult for students,
and this course tries to ease the transition.
To understand theorems and proofs it is necessary to try to construct examples and/or
counterexamples to the statements. Playing and experimenting with definitions and the-
orems is one of the key activities for understanding mathematics. I therefore included an
appendix giving an introduction to doing Linear Algebra with the computer algebra pro-
gramme SAGE. SAGE can be a valuable help in playing and experimenting, as it does most
of the tedious calculations for you.
Please note that these notes may still contain typos and other mistakes. If you should
find something that requires corrections (or if you have a good suggestion for improving
these notes), please send an email to [email protected]. The lecture notes were created
relying on material from many different sources and are certainly not meant to be original.
Finally, I’d like to thank all students who spotted typos and took the time to let me
know, in particular Simone Rea.

Christian Sämann

Remarks on Nomenclature and Notation


You should know what a Definition is. A Theorem is a mathematical (hopefully) pro-
found and proved statement, and this is what we are most interested in. A Proposition
is a less profound statement, also coming with a proof. A Lemma is a proved auxiliary
statement1 , usually used in the larger proof of a theorem. A Corollary is an interesting
consequence that follows almost immediately from a theorem.
A F marks additional reading material or questions to think about further. The end of
a proof is marked by □. We use the convention that a := b and b =: a mean a is defined
as b. (In other sources, this is often written as a ≡ b.) Vectors are almost always labelled
by underlined characters x, v, u, ... While our vectors in Rn are always column vectors, we
also use row notation followed by a T for transpose to simplify typesetting. For example,
x = (1, 2, 3)T ∈ R3 . N∗ , R∗ etc. denote the sets N\{0}, R\{0} etc. We label the set of
functions f : R → R that are smooth, i.e. that can be differentiated infinitely many times,
by C ∞ (R).

1 Although there are many very deep statements that are usually called Lemma, see Wikipedia’s list of Lemmata. Some people claim that a good lemma is worth a thousand theorems.

Course summary
Outline

• Euclidean Space. Vector space Rn, Matrices, Basic matrix operations, Determinants

• Systems of Linear Equations. Gaußian elimination, Results on Homogeneous and
Inhomogeneous Linear Systems, Matrix Inversion

• Vector Spaces. Definition and examples of Vector Spaces, Subspaces, Span, Linear
Independence, Bases and Dimension

• Inner Product Spaces. Inner Products, Cauchy-Schwarz Inequality, Orthogonality,
Orthogonal Projection, Orthonormal Bases, Gram-Schmidt Process, Vector Products

• Linear transformations. Row and Column Rank of a Matrix, Applications to Sys-
tems of Linear Equations, Range, Kernel, Rank and Nullity, Invertibility of Linear
Transformations, Linear Transformations and Matrices

• Eigenvalues and Eigenvectors. Calculation of Eigenvalues and Eigenvectors, Sym-
metric Matrices, Diagonalisation of a Matrix, Cayley-Hamilton Theorem

Assessment and feedback


The final mark will be composed of the following: 70% mark in the final exam, 10% mark
in the midterm, 2% for each of the 9 online multiple-choice tests. You will get the
remaining 2% if you show me that you worked on the tutorial problems at home. You can
use electronic calculators in exams; however, the usual restrictions apply. The multiple-
choice tests will give you a good sense of how well you are progressing.

How to get the most out of this course


It is expected that you participate in both lectures and tutorials. For each hour spent in a
lecture, you should spend roughly another hour at home to go through the material once
more. Do not expect to understand all the material immediately when it is presented in
class. Mathematics is difficult and requires considerable effort to learn. For this, it is also
necessary to try to solve the exercises on the tutorial sheets before coming to the tutorials.
The most effective and fun way to do this is to form small groups to work on the tutorials.
Make sure that you are able to explain the solutions to each other. Try to work continuously
throughout the term; this will make the preparation for the exam much easier for you. It
is very important that you keep a tidy set of lecture notes for yourself. Understanding the
material in class without lecture notes will be very difficult. As a backup, the lecture notes
required for the current tutorial will be made available on VISION when the tutorial is
handed out in class.

Recommended textbooks
This course is fairly self-contained and the material covered in the published lecture notes
is certainly sufficient to pass the exam. To deepen your knowledge in Linear Algebra, you
could use one of the following textbooks:

• Linear Algebra - Concepts and Methods by Martin Anthony and Michele Harvey.
• Linear Algebra Done Right by Sheldon Axler, Springer.
• Introduction to Linear Algebra by Gilbert Strang, Cambridge University Press.

Note that there is a wealth of lecture notes and other material freely available on the
internet. The book Linear Algebra in Schaum’s outline series is available on VISION.

By the end of the course, you should be able to...

B Use the Gaußian elimination procedure to determine whether a given system of simul-
taneous linear equations is consistent and if so, to find the general solution. Invert a
matrix by the Gaußian elimination method.
B Understand the concepts of vector space and subspace, and apply the subspace test to
determine whether a given subset of a vector space is a subspace.
B Understand the concepts of linear combination of vectors, linear (in)dependence, span-
ning set, and basis. Determine if a set of vectors is linearly independent and spans a
given vector space.
B Find a basis for a subspace, defined either as the span of a given set of vectors, or as
the solution space of a system of homogeneous equations.
B Understand the concept of inner product in general, and calculate the inner products
of two given vectors.
B Understand the concept of orthogonal projection and how to explicitly calculate the
projection of one vector onto another one. Use the Gram-Schmidt method to convert a
given basis for a vector space to an orthonormal basis. Understand (geometrically) the
concept of the vector product and calculate the vector product of two given vectors.
B Find the coordinates of a given vector in terms of a given basis - especially in the case
of an orthogonal or orthonormal basis.
B Calculate the rank of a given matrix and, from that, the dimension of the solution space
of the corresponding system of homogeneous linear equations. Calculate the determinant
of 2 × 2 and 3 × 3 matrices.
B Understand the concepts of linear transformation, range and kernel (nullspace). Un-
derstand the concept of invertibility of a linear transformation including injectivity and
surjectivity.
B Know and apply the Rank-Nullity Theorem.
B Compute the characteristic polynomial of a square matrix and (in simple cases) factorise
to find the eigenvalues.
B Determine whether a given square matrix is diagonalisable, and if so find a diagonalising
matrix.
B Apply the Cayley-Hamilton Theorem to compute powers of a given square matrix.

I Euclidean space
We start by recalling the definition of the Euclidean spaces R2 and R3 , together with some
related elementary facts. These spaces serve as intuitive pictures for many of the definitions
introduced in this course. We then review briefly matrices and their action on vectors of
Rn .

I.1 The vector space R2


§1 The real line. Consider the real line R. Each number r corresponds to a point on the
line, which we can equally well regard as an arrow starting at 0 and ending at the number:

[figure: the number r drawn as an arrow from 0 on the real line]

Adding two numbers r1 and r2 corresponds to taking one of the corresponding arrows and
aligning its tail on the tip of the other arrow. The former arrow’s tip is then the tip of the
arrow corresponding to r1 + r2 .

[figure: the arrows for r1, r2 and r1 + r2]

We can also stretch an arrow by multiplying the corresponding number by another number:

[figure: the arrows for −1.5r, r and 2r]

§2 Euclidean plane. On the real line, a point and the corresponding arrow encodes merely
a length. On the plane R2 , an arrow corresponding to a point encodes both a length and a
direction. We call such an arrow a vector. Again, we can add and stretch vectors, as before:

[figure: the vectors v1, v2, v1 + v2 and −2v2 in the plane]

Note that we denoted the origin of the Euclidean plane by 0. Each point and thus each
vector v ∈ R2 can be denoted by a pair of numbers v = (v1 , v2 )T (thus the notation R2 ),


giving the horizontal and vertical distance of the tip of the vector from the origin:

[figure: the vector v with horizontal coordinate v1 and vertical coordinate v2]

It is a very useful convention to write vectors as columns. For typesetting reasons, we


wrote above v = (v1 , v2 )T . The T , short for transpose, indicates that we actually mean the
corresponding column vector:

v = (v1 , v2 )^T = \begin{pmatrix} v1 \\ v2 \end{pmatrix} . (I.1)

We have the following rules for adding two vectors v = (v1 , v2 )T and w = (w1 , w2 )T and
stretching by λ ∈ R:

v + w = \begin{pmatrix} v1 \\ v2 \end{pmatrix} + \begin{pmatrix} w1 \\ w2 \end{pmatrix} = \begin{pmatrix} v1 + w1 \\ v2 + w2 \end{pmatrix}  and  λv = \begin{pmatrix} λv1 \\ λv2 \end{pmatrix} . (I.2)

The origin or null vector 0 has coordinates 0 = (0, 0)T . Note that if we stretch a vector by
the factor 0, we obtain the null vector: 0v = 0 for all v ∈ R2 .
§3 Linear combination. We can combine both vector addition and stretching into ex-
pressions like this:
u = λv + κw , (I.3)

where u, v, w ∈ R2 and λ, κ ∈ R. More generally, we can have expressions like

u = λ1 v 1 + λ2 v 2 + λ3 v 3 + · · · + λn v n , (I.4)

where u, v i ∈ R2 , λi ∈ R and n ∈ N. Such expressions are called linear combinations.


Denote by e1 and e2 the horizontal and vertical vectors of length one. The coordinates
v1 , v2 of a vector v then give the coefficients of a linear combination of e1 and e2 , which
equals v:
v = v1 e1 + v2 e2 . (I.5)

For example, (5, 3)T ∈ R2 can be written as a linear combination of the vectors (1, 0)T and
(0, 1)T : (5, 3)T = 5(1, 0)T + 3(0, 1)T . It cannot be written as a linear combination of (2, 1)T
and (4, 2)T .
§4 Parallel vectors and linear dependence. Two vectors v, w ∈ R2 are called parallel,
if one is obtained by stretching of the other. That is, there is a λ ∈ R such that

v = λw or w = λv . (I.6)


Note that if w = 0 and v ≠ 0, there is no λ with v = λw; however, with λ = 0, we have
w = λv. A better definition for parallel vectors is therefore the following: Two vectors v, w
are called parallel, if there is a non-trivial linear combination of them that equals the null
vector. That is, there are constants λ, κ not both equal to 0 such that

λv + κw = 0 . (I.7)

It should be clear that the above two definitions of parallel vectors are equivalent. One
also says that the vectors v and w are linearly dependent if they are parallel and linearly
independent otherwise.
F Usually, if v 1 is parallel to v 2 , and v 2 is parallel to v 3 , then v 1 is parallel to v 3 . When is
this not true?
§5 Lemma. Every vector u ∈ R2 can be obtained from a linear combination of two vectors
v, w ∈ R2 that are not parallel. (For example, the vector u = (u1 , u2 )T can be written as a
linear combination of the two vectors v = (1, 0)T and w = (1, 1)T : u = (u1 − u2 )v + u2 w.)
Proof: Consider the linear combination

u = \begin{pmatrix} u1 \\ u2 \end{pmatrix} = λv + κw = \begin{pmatrix} λv1 \\ λv2 \end{pmatrix} + \begin{pmatrix} κw1 \\ κw2 \end{pmatrix} = \begin{pmatrix} λv1 + κw1 \\ λv2 + κw2 \end{pmatrix} . (I.8)

We calculate the solution to this:

u1 = λv1 + κw1 ,   u1 w2 = λv1 w2 + κw1 w2 ,   u1 v2 = λv1 v2 + κv2 w1 ,
u2 = λv2 + κw2 ,   u2 w1 = λv2 w1 + κw1 w2 ,   u2 v1 = λv1 v2 + κv1 w2 . (I.9)

Subtracting within the second and the third pair of equations yields

λ = \frac{u1 w2 − u2 w1}{v1 w2 − v2 w1}  and  κ = \frac{u2 v1 − u1 v2}{v1 w2 − v2 w1} , (I.10)

which are well-defined, if v1 w2 − v2 w1 ≠ 0. The latter is equivalent to v not being parallel
to w.
§6 Span and basis. Every vector in R2 can be obtained as a linear combination of the
vectors e1 and e2 . That is, these two vectors span R2 . In general, the span of a set of vectors
{v 1 , . . . , v n } is the set of their linear combinations.
Consider now two arbitrary vectors v, w. If v = w = 0, the span of v and w is just the set
{0}. If they are parallel and v or w are non-vanishing, then the span of v and w is a straight
line in R2 through the origin 0. Otherwise, by lemma §5, any vector can be obtained as a
linear combination of v and w and therefore their span is R2 .
A basis of R2 is a minimal set of vectors that spans R2 . The vectors e1 and e2 introduced
in §3 form a basis of R2 : their span is R2 and we need both vectors to span R2 . On the
other hand, (e1 , e2 , v) is not a basis of R2 . We could write any vector u as the linear
combination
u = u1 e1 + u2 e2 + 0v (I.11)
for any vector v ∈ R2 and therefore the third vector is superfluous.


§7 Scalar product and norm. We can introduce the map ⟨·, ·⟩ : R2 × R2 → R defined
as
⟨u, v⟩ := u1 v1 + u2 v2 . (I.12)
This map is called the dot product, the scalar product or, most appropriately, the inner
product. If we take the inner product of a vector v with itself, ⟨v, v⟩, we obtain the square
of the length2 of this vector. We call the length of a vector its norm and define

||v|| := \sqrt{⟨v, v⟩} . (I.13)
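Inner products and norms can also be computed in SAGE (cf. appendix A); a minimal sketch with sample vectors:

    u = vector(QQ, [1, 2]); v = vector(QQ, [3, -1])
    print(u.dot_product(v))   # the inner product 1*3 + 2*(-1) = 1
    print(v.norm())           # the norm sqrt(10)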

§8 Lemma. Two vectors are perpendicular, if their inner product vanishes.

Proof: If two vectors v, w ∈ R2 are perpendicular, then because of Pythagoras, we have
||v||^2 + ||w||^2 = ||v + w||^2 or

v1^2 + v2^2 + w1^2 + w2^2 = v1^2 + v2^2 + 2v1 w1 + 2v2 w2 + w1^2 + w2^2 = v1^2 + v2^2 + w1^2 + w2^2 + 2⟨v, w⟩ . (I.14)

It follows that ⟨v, w⟩ = 0.


§9 Angles. For parallel vectors v, w ∈ R2 , we have the formula ⟨v, w⟩ = ||v|| · ||w|| and the
above lemma states that for perpendicular vectors v, w, it is ⟨v, w⟩ = 0. These are extreme
cases of the general formula

⟨v, w⟩ = ||v|| · ||w|| cos(v, w) , (I.15)

where cos(v, w) denotes the cosine of the angle3 between v and w. The above relation
follows rather directly from the law of cosines and equation (I.14): Given a triangle with
sides a, b, c and angle γ opposing side c, we have

c2 = a2 + b2 − 2ab cos(γ) . (I.16)

F Complete the proof.


I.2 The vector spaces R3 and Rn
Most of the notions introduced on R2 straightforwardly extend to R3 . The big difference
is that the meaning of linear dependence becomes slightly more involved.
§1 Linear combinations in R3 . A point p in three dimensions encodes a vector v ∈ R3 ,
i.e. an arrow from the origin 0 to the point p. We describe the vector v by a triple of
numbers v = (v1 , v2 , v3 )T . We can add and stretch vectors as before:

v + w = \begin{pmatrix} v1 \\ v2 \\ v3 \end{pmatrix} + \begin{pmatrix} w1 \\ w2 \\ w3 \end{pmatrix} = \begin{pmatrix} v1 + w1 \\ v2 + w2 \\ v3 + w3 \end{pmatrix}  and  λv = \begin{pmatrix} λv1 \\ λv2 \\ λv3 \end{pmatrix} . (I.17)
2 To see this, draw a picture, insert the coordinates of the vector v and use Pythagoras’ theorem.
3 The cosine ignores the orientation of the angle.


Linear combinations of vectors in R3 are again expressions of the form

λu + κv + µw or λ1 u1 + λ2 u2 + λ3 u3 + · · · + λn un , (I.18)

where λ, κ, µ, λi ∈ R, v, w, ui ∈ R3 and n ∈ N. For example, (5, 3, 2)T ∈ R3 is not a linear


combination of (1, 0, 0)T and (0, 1, 0)T . It is, however, a linear combination of (1, 0, 0)T ,
(0, 1, 0)T and (0, 0, 1)T :

\begin{pmatrix} 5 \\ 3 \\ 2 \end{pmatrix} = 5 \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} + 3 \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} + 2 \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} . (I.19)

§2 Linear dependence. If two vectors u, v ∈ R3 are parallel to each other, i.e. one is a
multiple of the other, the set of their linear combinations or their span is just {0} if u = v = 0,
and a line through the origin otherwise. If they are not parallel, their linear combinations
form a plane containing the null-vector 0. A third vector w is an element of this plane, if
it can be written as a linear combination of the other two:

w = λu + κv . (I.20)

Alternatively, we can extend our notion of linear dependence: We call three vectors u, v, w
linearly dependent, if there is a non-trivial linear combination (i.e. at least one of the
constants λ, κ, µ is not zero) such that

λu + κv + µw = 0 . (I.21)

Otherwise, we call them linearly independent.


§3 Examples. a ) The vectors v 1 = (1, 0, 0)T , v 2 = (0, 1, 0)T and v 3 = (1, 1, 0)T are linearly
dependent, since 1 · v 1 + 1 · v 2 − 1 · v 3 = 0 (and also 2 · v 1 + 2 · v 2 − 2 · v 3 = 0 etc.).
b ) Similarly, the vectors v 1 = (3, 2, 1)T and v 2 = (0, 0, 0)T are linearly dependent, because
0 · v 1 + 1 · v 2 = 0 (and also 0 · v 1 + 2 · v 2 = 0).
c ) On the other hand, the vectors e1 = (1, 0, 0)T , e2 = (0, 1, 0)T and e3 = (0, 0, 1)T are
linearly independent, because a1 · e1 + a2 · e2 + a3 · e3 = 0 implies a1 = a2 = a3 = 0.
§4 Subspaces of R3 . Consider a set of three vectors u, v, w. Then we can have the
following four cases:

i) {u, v, w} are linearly independent: Their span is R3 .


ii) {u, v, w} are not linearly independent, but two of them are: Their span is a plane
containing 0 in R3 .
iii) All vectors in {u, v, w} are parallel, but not all of them are null-vectors: Their span
is a line containing 0 ∈ R3 .
iv) u = v = w = 0: Their span is the set {0}.


§5 Examples. a ) The vectors (1, 2, 3)T , (1, 1, 1)T , (2, 3, 4)T span a plane in R3 , as the
vectors are not linearly independent: (1, 2, 3)T + (1, 1, 1)T − (2, 3, 4)T = 0.
b ) The vectors (1, 1, 1)T , (0, 0, 0)T span a line in R3 .
§6 Basis. Any vector in R3 can be written as a linear combination of three linearly inde-
pendent vectors in R3 . We therefore call such a set a basis of R3 .
§7 Inner product and angles. Inner products and angles are defined in the obvious way.
That is, the inner product between two vectors v, w ∈ R3 is given by

⟨v, w⟩ := v1 w1 + v2 w2 + v3 w3 . (I.22)

The inner product of a vector with itself is again the square of its length due to Pythagoras,
and the norm of a vector v ∈ R3 is therefore defined as ||v|| := \sqrt{⟨v, v⟩}.

Note that two vectors in R3 are either parallel or define a plane. Within this plane,
we can still define an angle between the vectors. (If the vectors are parallel, we define the
angle between the vectors to be 0.) We can compute this angle via the formula

⟨v, w⟩ = ||v|| · ||w|| cos(v, w) , (I.23)

which follows again from the law of cosines, however in a slightly more involved fashion than
in R2 . Two vectors are perpendicular iff (i.e. if and only if) their scalar product vanishes.
§8 Cross product. In R3 (and only in R3 ) there is a further product mapping two vectors
into another vector. Given two vectors v, w ∈ R3 , the product vanishes if the vectors are
parallel. Otherwise, it equals the unique vector that is perpendicular to the plane spanned
by v and w (the orientation is here important) and whose norm equals the area of the
parallelogram with sides v and w. Explicitly, one has the following formula:

v × w = \begin{pmatrix} v1 \\ v2 \\ v3 \end{pmatrix} × \begin{pmatrix} w1 \\ w2 \\ w3 \end{pmatrix} := \begin{pmatrix} v2 w3 − v3 w2 \\ v3 w1 − v1 w3 \\ v1 w2 − v2 w1 \end{pmatrix} . (I.24)

F Derive this formula from the description.


This formula is very useful to analyse vectors in R3 . For example, to verify if three vec-
tors u, v, w are linearly independent, one computes ⟨u × v, w⟩. The vectors are linearly
dependent iff this expression vanishes. F Why is this the case?
§9 Example. Consider the vectors (1, 1, 1)T , (1, 2, 3)T and (1, −2, 0)T . The first two vec-
tors are linearly independent. Their cross product is (1, 1, 1)T × (1, 2, 3)T = (1, −2, 1)T . We
have

⟨(1, −2, 1)^T , (1, −2, 0)^T⟩ = 5 ≠ 0 , (I.25)
and the vectors are linearly independent.
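The check of §9 can be reproduced in SAGE (cf. appendix A); a minimal sketch with the vectors from the example:

    u = vector(QQ, [1, 1, 1]); v = vector(QQ, [1, 2, 3]); w = vector(QQ, [1, -2, 0])
    n = u.cross_product(v)    # (1, -2, 1)
    print(n.dot_product(w))   # 5, non-zero: u, v, w are linearly independent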


§10 Rn . The generalisation to Rn should now be obvious. We write:

Rn = \left\{ \begin{pmatrix} x1 \\ \vdots \\ xn \end{pmatrix} \;\middle|\; xi ∈ R , i = 1, . . . , n \right\} .

Note that we can add two elements together and multiply them by a real constant:

\begin{pmatrix} x1 \\ \vdots \\ xn \end{pmatrix} + \begin{pmatrix} y1 \\ \vdots \\ yn \end{pmatrix} = \begin{pmatrix} x1 + y1 \\ \vdots \\ xn + yn \end{pmatrix}  and  λ \begin{pmatrix} x1 \\ \vdots \\ xn \end{pmatrix} = \begin{pmatrix} λx1 \\ \vdots \\ λxn \end{pmatrix}

for λ ∈ R and x, y ∈ V . All the other definitions like linear combination, span, basis etc.
generalise in the obvious way, and we will come back to the details in section III.
§11 Remark. F Although it is impossible to imagine a four- or higher-dimensional space,
some intuition can be obtained from reading Edwin A. Abbott’s novel “Flatland: A Romance
of Many Dimensions” from 1884. The author describes life in a two-dimensional world. As
a two-dimensional creature, he also visits a one-dimensional world and encounters a three-
dimensional sphere. What would a four-dimensional sphere passing through our three-
dimensional world look like?

I.3 Matrices
§1 Definition. A map f : Rm → Rn is called linear, if

(i) f (x1 + x2 ) = f (x1 ) + f (x2 ) for all x1 , x2 ∈ Rm ,


(ii) f (λx) = λf (x) for all λ ∈ R and x ∈ Rm .

§2 Remark. The two conditions in the definition of a linear map can be combined into
the condition f (λx1 + x2 ) = λf (x1 ) + f (x2 ) for all λ ∈ R and x1 , x2 ∈ Rm .
§3 Examples. a ) A linear map f : R → R is of the form f (x) = ax, a ∈ R. This follows
from f (x) = f (1x) = xf (1) =: xa.
b ) A linear map f : R2 → R is of the form f [(x1 , x2 )T ] = f [(x1 , 0)T ] + f [(0, x2 )T ] =
f [(1x1 , 0)T ] + f [(0, 1x2 )T ] = f [(1, 0)T ]x1 + f [(0, 1)T ]x2 =: a1 x1 + a2 x2 , a1 , a2 ∈ R.
More generally, we have the following:
§4 Matrices. A linear map f = A : Rm → Rn is of the form:

\begin{pmatrix} a11 x1 + a12 x2 + . . . + a1m xm \\ a21 x1 + a22 x2 + . . . + a2m xm \\ \vdots \\ an1 x1 + an2 x2 + . . . + anm xm \end{pmatrix} = \underbrace{\begin{pmatrix} a11 & a12 & \cdots & a1m \\ a21 & a22 & \cdots & a2m \\ \vdots & \vdots & & \vdots \\ an1 & an2 & \cdots & anm \end{pmatrix}}_{A} \underbrace{\begin{pmatrix} x1 \\ x2 \\ \vdots \\ xm \end{pmatrix}}_{x} .

The array of numbers A is called a matrix, more specifically, an n × m-matrix4 . The

product between the matrix A and the vector x is given as above: the i-th entry of Ax is
obtained by multiplying the entries of the i-th row of A with the corresponding components
of x and summing up. We will call this product (and its later generalisations) a matrix
product. Matrices with m columns can only multiply vectors in Rm .
Note that there is a matrix A with all components 0 such that Av = 0 for any v ∈ Rm .
We denote the set of all square matrices with n rows and columns by Matn . We call a
matrix in Matn of the form

A = \begin{pmatrix} a11 & 0 & 0 \\ 0 & \ddots & 0 \\ 0 & 0 & ann \end{pmatrix} (I.26)

a diagonal matrix. The set of diagonal matrices in Matn is denoted by Diagn . The unit
matrix 1n ∈ Matn acts on any vector v ∈ Rn according to 1n v = v. It is a diagonal matrix
with entries a11 = a22 = . . . = ann = 1.
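A minimal SAGE sketch (cf. appendix A) of a matrix acting on a vector, with sample entries:

    A = matrix(QQ, [[1, 2, 3], [4, 5, 6]])    # a 2 x 3 matrix, i.e. a linear map R^3 -> R^2
    x = vector(QQ, [1, 0, -1])
    print(A * x)                              # (-2, -2)
    print(identity_matrix(QQ, 3) * x == x)    # the unit matrix acts trivially: True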

§5 Basic operations. One easily checks that the matrix product satisfies Ax1 + Ax2 =
A(x1 + x2 ) for an n × m-matrix A and vectors x1 , x2 ∈ Rm . To have Ax + Bx = (A + B)x
for n × m-matrices A, B and x ∈ Rm , we define the following sum of matrices:

\begin{pmatrix} a11 & a12 & \cdots & a1m \\ a21 & a22 & \cdots & a2m \\ \vdots & \vdots & & \vdots \\ an1 & an2 & \cdots & anm \end{pmatrix} + \begin{pmatrix} b11 & b12 & \cdots & b1m \\ b21 & b22 & \cdots & b2m \\ \vdots & \vdots & & \vdots \\ bn1 & bn2 & \cdots & bnm \end{pmatrix} = \begin{pmatrix} a11 + b11 & a12 + b12 & \cdots & a1m + b1m \\ a21 + b21 & a22 + b22 & \cdots & a2m + b2m \\ \vdots & \vdots & & \vdots \\ an1 + bn1 & an2 + bn2 & \cdots & anm + bnm \end{pmatrix} .

To have the relation A(λx) = (λA)x, where λ ∈ R, we define

λ \begin{pmatrix} a11 & a12 & \cdots & a1m \\ a21 & a22 & \cdots & a2m \\ \vdots & \vdots & & \vdots \\ an1 & an2 & \cdots & anm \end{pmatrix} = \begin{pmatrix} λa11 & λa12 & \cdots & λa1m \\ λa21 & λa22 & \cdots & λa2m \\ \vdots & \vdots & & \vdots \\ λan1 & λan2 & \cdots & λanm \end{pmatrix} .

We also want to have associativity of the matrix product: A(Bx) = (AB)x. This relation
holds, if we define the matrix product AB as the matrix product of A with each column

4 Note that matrices are always labelled row-column.


vector of B:

\begin{pmatrix} a11 & a12 & \cdots & a1m \\ a21 & a22 & \cdots & a2m \\ \vdots & \vdots & & \vdots \\ an1 & an2 & \cdots & anm \end{pmatrix} \begin{pmatrix} b11 & b12 & \cdots & b1p \\ b21 & b22 & \cdots & b2p \\ \vdots & \vdots & & \vdots \\ bm1 & bm2 & \cdots & bmp \end{pmatrix} = \begin{pmatrix} a11 b11 + . . . + a1m bm1 & \cdots & a11 b1p + . . . + a1m bmp \\ \vdots & & \vdots \\ an1 b11 + . . . + anm bm1 & \cdots & an1 b1p + . . . + anm bmp \end{pmatrix} .

An important consequence of this definition is that the matrix product is not commutative.
That is, in general AB ≠ BA. F Find examples for matrices A, B such that AB = BA
and such that AB ≠ BA.
Finally, the transpose of a matrix is the matrix with rows and columns interchanged (in
the case of square matrices, the entries are mirrored at the diagonal):

\begin{pmatrix} a11 & \cdots & a1m \\ \vdots & & \vdots \\ an1 & \cdots & anm \end{pmatrix}^T = \begin{pmatrix} a11 & \cdots & an1 \\ \vdots & & \vdots \\ a1m & \cdots & anm \end{pmatrix} .

This is the operation that we use to map a row vector (a 1 × n matrix) into a column vector
(an n × 1 matrix). The trace of a square matrix is the sum of its diagonal entries. That is,

tr \begin{pmatrix} a11 & \cdots & a1n \\ \vdots & & \vdots \\ an1 & \cdots & ann \end{pmatrix} := a11 + a22 + . . . + ann . (I.27)
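These operations are also available in SAGE (cf. appendix A); a minimal sketch with sample matrices:

    A = matrix(QQ, [[1, 2], [3, 4]]); B = matrix(QQ, [[0, 1], [1, 0]])
    print(A + B); print(2*A)
    print(A*B == B*A)                                          # False: no commutativity
    print((A*B).transpose() == B.transpose()*A.transpose())    # True
    print(A.trace())                                           # 1 + 4 = 5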

§6 Properties of the matrix operations. Let us sum up useful properties of the matrix
product. In the following A, B, C are matrices of a size compatible with the given products
and λ, κ are real constants. We have:

A + B = B + A ,   A + (B + C) = (A + B) + C ,   1A = A ,   A1 = A ,
A(BC) = (AB)C ,   A(B + C) = AB + AC ,   (B + C)A = BA + CA ,
λ(A + B) = λA + λB ,   (λ + κ)A = λA + κA ,                              (I.28)
(λκ)A = λ(κA) ,   λ(AB) = (λA)B = A(λB) ,
(AB)^T = B^T A^T ,   ⟨x, y⟩ = x^T y .

F Prove these results. Note that if for two matrices A, B, Ax = Bx for any vector x, then
A = B.
§7 Inverse matrices. Given a matrix A ∈ Matn , another matrix B is called the inverse
of A, if AB = 1n . We then write A−1 for B. In the tutorials, we prove that if an inverse
exists it is unique and we also have A−1 A = 1n .
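In SAGE (cf. appendix A), inverses can be checked directly; a minimal sketch with a sample matrix:

    A = matrix(QQ, [[1, 2], [3, 4]])
    B = A.inverse()
    print(A*B == identity_matrix(QQ, 2) and B*A == identity_matrix(QQ, 2))   # True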


§8 Types of square matrices. On the vector space Rn , an n × n matrix can have two
kinds of actions, that can be combined arbitrarily

(1) Dilations: v ∈ V ↦ λv, λ ∈ R. More generally:

\begin{pmatrix} v1 \\ \vdots \\ vn \end{pmatrix} → \begin{pmatrix} λ1 v1 \\ \vdots \\ λn vn \end{pmatrix} ,   A = \begin{pmatrix} λ1 & 0 & 0 \\ 0 & \ddots & 0 \\ 0 & 0 & λn \end{pmatrix} .

(2) Rotations: Example: V = R2 . Given a vector v = (v1 , v2 ), we want to rotate this
vector by an angle θ.

[figure: the vector v at angle α to the x1-axis, rotated by the angle θ]

Assume that the angle between v and e1 = (1, 0) is α and write R = ||v||. We have:

v1 = R cos α ,   ṽ1 = R cos(α + θ) = R(cos α cos θ − sin α sin θ) = cos θ v1 − sin θ v2 ,
v2 = R sin α ,   ṽ2 = R sin(α + θ) = R(cos α sin θ + sin α cos θ) = sin θ v1 + cos θ v2 .

The corresponding linear map A is given by the matrix

A = \begin{pmatrix} cos θ & − sin θ \\ sin θ & cos θ \end{pmatrix} .
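A minimal SAGE sketch (cf. appendix A) of such a rotation, with a sample angle:

    theta = pi/2
    A = matrix([[cos(theta), -sin(theta)], [sin(theta), cos(theta)]])
    print(A * vector([1, 0]))   # (0, 1): e1 rotated by 90 degrees
    print(A.det())              # 1, as expected for a rotation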

§9 Example. Consider the following consecutive operations: a rotation by 45° = π/4,
stretching of the second component of the vector to 0 and an inverse rotation by 45° = π/4.
Written in inverse order, we have:

\begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ −\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} \frac{1}{\sqrt{2}} & −\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{pmatrix} = \begin{pmatrix} \frac{1}{2} & −\frac{1}{2} \\ −\frac{1}{2} & \frac{1}{2} \end{pmatrix} =: P . (I.29)

This matrix satisfies P^2 = P and therefore P (1_2 − P ) = 0. It is an example of an orthogonal
projection. In general, given a vector x its projection P_v x onto a vector v is the component
of x parallel to v:

[figure: the projection P_v x of x onto the vector v]

The above example projects onto the vector (1, −1)T : Writing the vector x as a sum

x = x1 \begin{pmatrix} 1 \\ −1 \end{pmatrix} + x2 \begin{pmatrix} 1 \\ 1 \end{pmatrix} , (I.30)

we have

P x = x1 P \begin{pmatrix} 1 \\ −1 \end{pmatrix} + x2 P \begin{pmatrix} 1 \\ 1 \end{pmatrix} = x1 \begin{pmatrix} 1 \\ −1 \end{pmatrix} . (I.31)
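The properties of P can be verified in SAGE (cf. appendix A); a minimal sketch:

    P = matrix(QQ, [[1/2, -1/2], [-1/2, 1/2]])
    print(P^2 == P)                                   # True: P is a projection
    print(P * vector([1, -1]), P * vector([1, 1]))    # (1, -1) and (0, 0)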

§10 Remark. Consider two vectors x = (x1 , x2 )T , y = (y1 , y2 )T ∈ R2 . Together, they
span a parallelogram with corners 0, x, y and x + y. The area, i.e. the two-dimensional
volume of this parallelogram is

vol = ||x|| · ||y|| · | sin(x, y)| = ||x|| · ||y|| · \sqrt{1 − cos^2(x, y)} . (I.32)

We can express this in terms of components of x and y and find

vol = ||x|| · ||y|| \sqrt{\frac{||x||^2 · ||y||^2 − ⟨x, y⟩^2}{||x||^2 · ||y||^2}} = \sqrt{(x1^2 + x2^2)(y1^2 + y2^2) − (x1 y1 + x2 y2)^2}
    = |x1 y2 − x2 y1| =: | det(A)|   with   A := \begin{pmatrix} x1 & y1 \\ x2 & y2 \end{pmatrix} . (I.33)

The function det is called the determinant. We can generalise the notion of a determinant
to arbitrary n vectors in Rn (and correspondingly, to n × n matrices A), as seen in the
definition below.
§11 ε-symbol. We define5

ε_{123...n} = +1 ,   ε_{i_1...jj...i_n} = 0 ,   ε_{i_1...jk...i_n} = −ε_{i_1...kj...i_n} .

Examples are: ε_{132} = −ε_{123} = −1, ε_{112} = 0 and ε_{1432} = −ε_{1423} = ε_{1243} = −ε_{1234} = −1.
§12 Definition. The determinant of a matrix A is a function det : Matn → R defined as

det(A) = det \begin{pmatrix} a11 & a12 & \cdots & a1n \\ a21 & a22 & \cdots & a2n \\ \vdots & \vdots & & \vdots \\ an1 & an2 & \cdots & ann \end{pmatrix} = \sum_{i_1,...,i_n=1}^{n} ε_{i_1...i_n} a_{1 i_1} . . . a_{n i_n} .

Here, the symbol \sum_{i_1,...,i_n=1}^{n} means a summation over all indices from 1 to n:
\sum_{i_1,...,i_n=1}^{n} = \sum_{i_1=1}^{n} \sum_{i_2=1}^{n} · · · \sum_{i_n=1}^{n} .

5 The ε-symbol is related to permutations, i.e. the various possible orderings, of sets. The ε-symbol is the sign of the permutation given by the order of the indices.


§13 Remark. The notion of a determinant is not very intuitive, and it takes some expe-
rience and practice to get a handle on it. It is essentially a number assigned to a matrix
that contains useful information about the matrix. Consider for example the determinant
for Mat2 introduced in §10. If the vectors x and y are parallel, the enclosed volume and
thus the determinant is zero, which also implies that the corresponding matrix A = (x y)
is not invertible. This generalises to Matn . Furthermore, the determinant of a rotation is
1 and that of a dilation is the product of all the scaling factors. Projections P ≠ 1 have
determinant 0 (and clearly correspond to non-invertible matrices).
§14 Special cases. a) Mat2 (“cross rule”: multiply diagonally, subtract):

det \begin{pmatrix} a11 & a12 \\ a21 & a22 \end{pmatrix} = \sum_{i_1,i_2=1}^{2} ε_{i_1 i_2} a_{1 i_1} a_{2 i_2} = ε_{11} a11 a21 + ε_{12} a11 a22 + ε_{21} a12 a21 + ε_{22} a12 a22
    = a11 a22 − a12 a21 .


b) Mat3 (“rule of Sarrus”: multiply diagonally, add and subtract):

det \begin{pmatrix} a11 & a12 & a13 \\ a21 & a22 & a23 \\ a31 & a32 & a33 \end{pmatrix} = a11 a22 a33 + a12 a23 a31 + a13 a21 a32 − a11 a23 a32 − a13 a22 a31 − a12 a21 a33 .
c) Matrix in upper triangular form:

det \begin{pmatrix} a11 & × & × & × \\ 0 & a22 & × & × \\ \vdots & & \ddots & \vdots \\ 0 & \cdots & 0 & ann \end{pmatrix} = a11 a22 . . . ann . (I.34)
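A minimal SAGE sketch (cf. appendix A) of these special cases, with sample matrices:

    print(matrix(QQ, [[1, 2], [3, 4]]).det())                     # 1*4 - 2*3 = -2
    print(matrix(QQ, [[2, 0, 1], [1, 3, 0], [0, 1, 1]]).det())    # rule of Sarrus gives 7
    print(matrix(QQ, [[2, 5, 7], [0, 3, 1], [0, 0, 4]]).det())    # upper triangular: 2*3*4 = 24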
§15 Rules. We have the following rules for computing with determinants:

det(AB) = det(A) det(B) ,   det(A^{-1}) = \frac{1}{det(A)} ,   det(A^T) = det(A) . (I.35)
We prove the first rule only for upper triangular matrices, i.e. matrices A and B of the form
appearing in (I.34).6 Note that the diagonal entries in the matrix AB are the products
a11 b11 , . . . , ann bnn . With formula (I.34), we can compute
det(A) det(B) = a11 a22 · · · ann b11 b22 . . . bnn = a11 b11 · · · ann bnn = det(AB) . (I.36)
The second rule follows from the first. The third can be obtained by calculating:

det(A) = \sum_{i_1,...,i_n=1}^{n} ε_{i_1...i_n} a_{1 i_1} . . . a_{n i_n} = \frac{1}{n!} \sum_{i_1,...,i_n,j_1,...,j_n=1}^{n} ε_{i_1...i_n} ε_{j_1...j_n} a_{j_1 i_1} . . . a_{j_n i_n}
    = \sum_{j_1,...,j_n=1}^{n} ε_{j_1...j_n} a_{j_1 1} . . . a_{j_n n} = det(A^T) . (I.37)
6 A complete proof uses the fact that certain operations that can be used to transform the matrices to upper triangular ones leave the determinant invariant. We will encounter these operations later.

Note that if det(A) = 0, then det(A−1 ) is not defined. As we will see later, this is due to
A−1 not being defined in this case.
F A very nice formula relating the trace, i.e. the sum of the diagonal elements, of a
matrix to its determinant is the following: log det A = tr log A. Here, the logarithm of a
matrix is defined via its Taylor series: log A = (A − 1) − \frac{1}{2}(A − 1)^2 + \frac{1}{3}(A − 1)^3 + · · · .
Verify this formula for 2 × 2 matrices. Develop sketches of proofs of the rules (I.35) using
this formula.

II Systems of linear equations


II.1 Motivation
Below are two examples to motivate the study of systems of linear equations. Understanding
the physics/chemistry behind them is not important.
§1 Example. Consider three masses ma , mb and mc with mc = 2kg. We put them on
a beam balance and find equilibrium positions when the distances from the middle of the
beam are da = −5, db = −3 and dc = 5 or da = 2, db = −2 and dc = 1.

[figure: the two equilibrium positions of the masses a, b, c on the beam balance]

Physics (i.e. the lever rule) tells us that this means that

5ma + 3mb = 5 · 2kg


2mb = 2ma + 1 · 2kg .

We would like to deduce the values of ma and mb from these two equations.
§2 Example. (Making TNT) We are interested in the following chemical reaction of
toluene and nitric acid to TNT and water, where the variables x, y, z and w denote the
numbers of the various molecules:

x C7 H8 + y HNO3 → z C7 H5 O6 N3 + w H2 O .

Considering the numbers of the various atoms in this reaction, we obtain the equations

7x = 7z
8x + 1y = 5z + 2w
1y = 3z
3y = 6z + 1w .

We would like to deduce the right ratios of toluene and nitric acid we need in order to have
none of these left over after the reaction took place.


§3 Remark. We see that systems of equations as above appear in many different problems.
Other subjects which make heavy use of such equations and linear algebra in general are
for example Special Relativity and Quantum Mechanics.

II.2 Basic notions

§1 Definition. A linear equation in n unknowns x1 , . . . , xn is an equation which can be


expressed in the form
a1 x1 + a2 x2 + . . . + an xn = b

with a1 , . . . , an , b ∈ R.

§2 Remarks. In principle, we could also consider complex constants. In this lecture,


however, we restrict ourselves to the case of real linear equations. Examples:

5x1 + 3x2 = 10 is a linear equation ,
x1^2 + x2^2 = 1 is not a linear equation ,
sin(x1) + cos(x2) = x5 is certainly not a linear equation .

§3 Systems of linear equations. (SLEs) Consider the system of m linear equations in


n unknowns x1 , . . . , xn :

a11 x1 + a12 x2 + . . . + a1n xn = b1
a21 x1 + a22 x2 + . . . + a2n xn = b2
        ⋮
am1 x1 + am2 x2 + . . . + amn xn = bm

\begin{pmatrix} a11 & a12 & \cdots & a1n \\ a21 & a22 & \cdots & a2n \\ \vdots & \vdots & & \vdots \\ am1 & am2 & \cdots & amn \end{pmatrix} \begin{pmatrix} x1 \\ x2 \\ \vdots \\ xn \end{pmatrix} = \begin{pmatrix} b1 \\ b2 \\ \vdots \\ bm \end{pmatrix} .

We see that systems of linear equations can be written in matrix form Ax = b. Example:

5x1 + 3x2 = 10
−2x1 + 2x2 = 2        \begin{pmatrix} 5 & 3 \\ −2 & 2 \end{pmatrix} \begin{pmatrix} x1 \\ x2 \end{pmatrix} = \begin{pmatrix} 10 \\ 2 \end{pmatrix} .
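In SAGE (cf. appendix A), such a system can be solved directly; a minimal sketch for this example:

    A = matrix(QQ, [[5, 3], [-2, 2]])
    b = vector(QQ, [10, 2])
    print(A.solve_right(b))   # (7/8, 15/8), cf. the Gaussian elimination in II.3, §7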

§4 Definition. The SLE Ax = b is said to be

(i) consistent, if it has one (i.e. at least one) solution and


(ii) inconsistent, if it has no solution.


§5 Examples.

x1 + x2 = 2
x1 − x2 = 0          one solution: x1 = x2 = 1 → consistent
(Solve second equation: x1 = x2 , first then implies x1 = x2 = 1.)

x1 + x2 = 2
x1 − x2 = 0          no solution → inconsistent
3x1 + 2x2 = 4
(First two eqns. imply x1 = x2 = 1, which does not solve the third eqn.)

x1 + x2 = 2
2x1 + 2x2 = 4        infinitely many solutions: x2 = 2 − x1 → consistent
(Second equation is a multiple of the first → same solutions.)

x1 + x2 = 2
3x1 + 3x2 = 7        no solution → inconsistent
(Left- and right-hand sides of the second equation are different multiples of the first equation.)
§6 Geometric interpretation. Consider an SLE consisting of two equations in two un-
knowns. Each equation determines a line in R2 . Points (x1 , x2 ) on this line correspond to
solutions of this equation. Intersections of two lines correspond to common solutions of the
corresponding equations. The plots for the first, the third and the last examples of §5 are
given below in figure 1.


Figure 1: The three possible situations for a system of two linear equations in two unknowns:
one solution, no solution and infinitely many solutions. In the last case, the two lines are
on top of each other.

F What about higher m, n, e.g. three equations in three unknowns?

II.3 Gaußian elimination


§1 Definition. Two SLEs in n variables are called equivalent, if they have the same set of
solutions.


§2 Elementary row operations. Consider the SLE

a11 x1 + a12 x2 + . . . + a1n xn = b1
a21 x1 + a22 x2 + . . . + a2n xn = b2
        ⋮
am1 x1 + am2 x2 + . . . + amn xn = bm

written compactly as

\left(\begin{array}{cccc|c} a11 & a12 & \cdots & a1n & b1 \\ a21 & a22 & \cdots & a2n & b2 \\ \vdots & \vdots & & \vdots & \vdots \\ am1 & am2 & \cdots & amn & bm \end{array}\right) . (II.1)

The latter form of the SLE is called an augmented matrix. We do not change the solutions
of (II.1), if we perform one of the following operations:

(a) interchange two equations/rows


(b) multiply an equation/row by a non-zero number
(c) add an equation/row to another equation/row.

That is, these operations lead to an SLE, which is equivalent to the SLE (II.1).
F Verify this statement.
F What is the geometric interpretation of these operations?

§3 Example. Find the solutions of

x1 + 2x2 + x3 = 1
x1 + 3x2 + 4x3 = 3        (∗) .
2x1 + 5x2 + 6x3 = 5

Rewrite as the augmented matrix

\left(\begin{array}{ccc|c} 1 & 2 & 1 & 1 \\ 1 & 3 & 4 & 3 \\ 2 & 5 & 6 & 5 \end{array}\right) .

Now, apply elementary row operations:

R2 → R2 − R1, R3 → R3 − 2R1:  \left(\begin{array}{ccc|c} 1 & 2 & 1 & 1 \\ 0 & 1 & 3 & 2 \\ 0 & 1 & 4 & 3 \end{array}\right) ,   then R3 → R3 − R2:  \left(\begin{array}{ccc|c} 1 & 2 & 1 & 1 \\ 0 & 1 & 3 & 2 \\ 0 & 0 & 1 & 1 \end{array}\right) . (II.2)

This corresponds to the SLE

x1 + 2x2 + x3 = 1
        x2 + 3x3 = 2        (∗′)
                x3 = 1

with solutions x3 = 1, x2 = 2 − 3x3 = −1, x1 = 1 − 2x2 − x3 = 2.

The SLEs (∗) and (∗′) have the same solutions and are thus equivalent.

§4 Definition. The second matrix in (II.2) is said to be of row echelon form:

(i) In each row, the first non-vanishing entry is a 1.


(ii) The first non-vanishing entry in a row has to appear to the right of leading non-
vanishing entries of previous rows.


§5 Examples and counterexamples.

Examples:  \begin{pmatrix} 1 & 5 & 3 & −1 \\ 0 & 0 & 1 & 3 \\ 0 & 0 & 0 & 1 \end{pmatrix} ,  \begin{pmatrix} 1 & 8 & 4 & 2 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix} ,  \begin{pmatrix} 1 & 0 & 3 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} .

Counterexamples:  \begin{pmatrix} 1 & 5 & 3 & −1 \\ 1 & 0 & 1 & 3 \\ 0 & 0 & 0 & 1 \end{pmatrix} ,  \begin{pmatrix} 1 & 8 & 4 & 2 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix} ,  \begin{pmatrix} 3 & 0 & 3 \\ 0 & 4 & 0 \\ 0 & 0 & 8 \end{pmatrix} .
§6 Gaußian elimination. GE is the following algorithm to solve an SLE by bringing it
first to row echelon form:

(1) Go to the first column i, which is not entirely zero.


(2) Choose a row which has a non-vanishing entry in this column i and move it to the
top. If possible, simplify the row by multiplying by a constant.
(3) Add multiples of this row to the other rows, such that their entry in the column i
vanishes.
(4) Forget the first row. If the matrix is not yet in row echelon form, start over at (1).
(5) Divide each row with a non-zero entry by a constant such that the leading entry is 1.

Be lazy! Always reorder rows to simplify the calculations. Avoid fractions in actual com-
putations. Exercise!
§7 Example. Perform GE on the augmented matrix corresponding to the SLE for the
example in II.1, §1. We have

5x1 + 3x2 = 10
−2x1 + 2x2 = 2        \left(\begin{array}{cc|c} 5 & 3 & 10 \\ −2 & 2 & 2 \end{array}\right) .

Using elementary row operations, we bring this augmented matrix into row echelon form:

\left(\begin{array}{cc|c} 5 & 3 & 10 \\ −2 & 2 & 2 \end{array}\right)
  R2 ↔ R1, R1 → −(1/2)R1:  \left(\begin{array}{cc|c} 1 & −1 & −1 \\ 5 & 3 & 10 \end{array}\right)
  R2 → R2 − 5R1:  \left(\begin{array}{cc|c} 1 & −1 & −1 \\ 0 & 8 & 15 \end{array}\right)
  R2 → (1/8)R2:  \left(\begin{array}{cc|c} 1 & −1 & −1 \\ 0 & 1 & 15/8 \end{array}\right) ,

so x1 − x2 = −1 → x1 = 7/8 and x2 = 15/8 .
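The same computation in SAGE (cf. appendix A); a minimal sketch (rref gives the reduced row echelon form):

    M = matrix(QQ, [[5, 3, 10], [-2, 2, 2]])
    print(M.rref())   # rows [1 0 7/8] and [0 1 15/8]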

§8 Remark. It seems a little overkill to apply a complicated algorithm to the above


system of equations, that can also be solved by simple techniques learnt in school as e.g.
substitution. SLEs in real life, however, often consist of hundreds of equations in hundreds
of variables, which can only be handled reasonably with GE. Most importantly, however,
we are more interested in the concept of GE than in the actual solutions it generates.


§9 Consistency. From the row echelon form, it is easy to see7 whether an SLE is consistent. Examples:

unique solution:  \left(\begin{array}{ccc|c} 1 & × & × & × \\ 0 & 1 & × & × \\ 0 & 0 & 1 & × \end{array}\right)   (Substitution yields the unique solution.)

infinitely many solutions:  \left(\begin{array}{ccc|c} 1 & × & × & × \\ 0 & 1 & × & × \\ 0 & 0 & 0 & 0 \end{array}\right)   (x3 can be chosen arbitrarily.)

no solution:  \left(\begin{array}{ccc|c} 1 & × & × & × \\ 0 & 1 & × & × \\ 0 & 0 & 0 & 1 \end{array}\right)   (Last row: 0x1 + 0x2 + 0x3 = 1 yields a contradiction.)
§10 General solution. In an augmented matrix in row echelon form, the variables cor-
responding to a leading 1 in a row are called pivot variables. The values of these variables
are determined by substitution. The other, undetermined variables can be put to arbitrary
constants α, β, γ, . . .. The result is called the general solution. Example (we start at the
bottom):

\left(\begin{array}{cccc|c} 1 & 5 & 3 & 2 & 3 \\ 0 & 0 & 1 & 4 & 8 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{array}\right)
    → x2 , x4 undetermined, so put x2 = β , x4 = α ,
    → x3 = 8 − 4α ,
    → x1 = 3 − 5β − 3(8 − 4α) − 2α .
In this SLE, x1 and x3 are pivot variables. The general solution is given by the set {(−21 +
10α − 5β, β, 8 − 4α, α)T |α, β ∈ R}.
§11 Example. Recall the example in II.1, §2 (TNT). With the variable ordering (x1 , x2 , x3 , x4 ) = (w, x, y, z), Gaußian elimination works here as follows:

7x2 − 7x4 = 0
−2x1 + 8x2 + x3 − 5x4 = 0
x3 − 3x4 = 0
−x1 + 3x3 − 6x4 = 0
        \left(\begin{array}{cccc|c} 0 & 7 & 0 & −7 & 0 \\ −2 & 8 & 1 & −5 & 0 \\ 0 & 0 & 1 & −3 & 0 \\ −1 & 0 & 3 & −6 & 0 \end{array}\right)

Using elementary row operations, we bring this augmented matrix into row echelon form:

−R4 → R1:  \left(\begin{array}{cccc|c} 1 & 0 & −3 & 6 & 0 \\ 0 & 7 & 0 & −7 & 0 \\ −2 & 8 & 1 & −5 & 0 \\ 0 & 0 & 1 & −3 & 0 \end{array}\right)
R3 → R3 + 2R1, R2 → (1/7)R2:  \left(\begin{array}{cccc|c} 1 & 0 & −3 & 6 & 0 \\ 0 & 1 & 0 & −1 & 0 \\ 0 & 8 & −5 & 7 & 0 \\ 0 & 0 & 1 & −3 & 0 \end{array}\right)
R3 → R3 − 8R2:  \left(\begin{array}{cccc|c} 1 & 0 & −3 & 6 & 0 \\ 0 & 1 & 0 & −1 & 0 \\ 0 & 0 & −5 & 15 & 0 \\ 0 & 0 & 1 & −3 & 0 \end{array}\right)
R3 → −(1/5)R3, R4 → R4 − R3:  \left(\begin{array}{cccc|c} 1 & 0 & −3 & 6 & 0 \\ 0 & 1 & 0 & −1 & 0 \\ 0 & 0 & 1 & −3 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{array}\right)

We read off: x4 = α, x3 = 3α, x2 = α and x1 = 3α. As expected, we have the freedom
to determine what amount of chemicals we want to obtain in the reaction: The choice of α
corresponds to the number of TNT molecules which should come out of the reaction.
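A minimal SAGE sketch (cf. appendix A) of the same system; the kernel of the coefficient matrix encodes all solutions:

    A = matrix(QQ, [[0, 7, 0, -7], [-2, 8, 1, -5], [0, 0, 1, -3], [-1, 0, 3, -6]])
    print(A.right_kernel().basis())   # one basis vector, proportional to (3, 1, 3, 1)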
7 If you don’t see this, rewrite the augmented matrices as SLEs.


II.4 Homogeneous and inhomogeneous SLEs


§1 Definition. An SLE Ax = b is said to be homogeneous iff (i.e. if and only if) b = 0.
Otherwise, it is inhomogeneous. The above example in II.3, §7 is inhomogeneous, while the
example in II.3, §11 is homogeneous.
§2 Remark. A homogeneous SLE Ax = 0 is always consistent because it has at least the
solution x = 0.
§3 Theorem. A homogeneous SLE with more unknowns than equations has infinitely
many solutions.
Proof: Suppose a homogeneous SLE with m equations in n unknowns, m < n. Us-
ing Gaußian elimination on the corresponding augmented matrix with m rows and n + 1
columns, we obtain a matrix in row echelon form with k ≤ m < n rows with non-zero
entries. Hence there are k pivot variables (corresponding to rows with leading 1s) and n − k
free variables. We are thus left with n − k > 0 free variables leading to infinitely many
solutions.
§4 Remark. The solutions to an inhomogeneous SLE Ax = b are given by a particular
solution xp plus all solutions {xh } to the homogeneous SLE Ax = 0: Such a sum is indeed
a solution, as A(xp + xh ) = Axp + Axh = Axp + 0 = b. Moreover, the difference between
two solutions x1 and x2 of Ax = b is a solution to the corresponding homogeneous system:
A(x1 − x2 ) = b − b = 0.
§5 Example. We consider the inhomogeneous SLE corresponding to the augmented matrix

\left(\begin{array}{ccc|c} 1 & 1 & 0 & 2 \\ 0 & 1 & 1 & 2 \\ 0 & 0 & 0 & 0 \end{array}\right) .

We immediately read off the special solution x1 = x2 = x3 = 1. The general solution is
found by adding the solutions to the homogeneous SLE

\left(\begin{array}{ccc|c} 1 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 0 \end{array}\right) ,

which are given by x3 = α, x2 = −x3 = −α, x1 = −x2 = α. The general solution of the
inhomogeneous SLE is thus x1 = 1 + α, x2 = 1 − α and x3 = 1 + α.
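A minimal SAGE sketch (cf. appendix A) of the split into a particular plus homogeneous solutions, for this example:

    A = matrix(QQ, [[1, 1, 0], [0, 1, 1], [0, 0, 0]])
    b = vector(QQ, [2, 2, 0])
    print(A.solve_right(b))           # one particular solution
    print(A.right_kernel().basis())   # homogeneous solutions, spanned by a multiple of (1, -1, 1)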

II.5 Inverting matrices


§1 Remark. It is easy to solve the SLE Ax = b, where A is a square matrix, if we can
construct the inverse matrix to A, A−1 , with A−1 A = 1 (where 1 denotes the identity
matrix). Then Ax = b ⇔ A−1 (Ax) = A−1 b ⇔ x = A−1 b.


§2 Algorithm. The following algorithm is based on Gaußian elimination. Given an n × n
matrix A, we put

A^{-1} = \begin{pmatrix} x11 & x12 & \cdots & x1n \\ x21 & x22 & \cdots & x2n \\ \vdots & \vdots & & \vdots \\ xn1 & xn2 & \cdots & xnn \end{pmatrix} = (x1 , x2 , . . . , xn ) .

It follows that A(x1 , x2 , . . . , xn ) = 1 = (e1 , e2 , . . . , en ), and we have to solve the SLEs
Ax1 = e1 , Ax2 = e2 , . . ., Axn = en . Each SLE can be solved separately by Gaußian
elimination. However, we can speed things up by solving all the SLEs at once: Consider
the augmented matrix

\left( A \;\middle|\; e1 \cdots en \right) = \left( A \;\middle|\; 1n \right) .

Use now elementary row operations to transfer this system into the form

\left( e1 \cdots en \;\middle|\; B \right) = \left( 1n \;\middle|\; B \right) ,   B = (b1 , . . . , bn ) .

It follows that x1 = b1 , . . . , xn = bn and thus A^{-1} = B.


§3 Example. Invert the following matrix:

A = \begin{pmatrix} 1 & 2 & 4 \\ −1 & −1 & 5 \\ 2 & 1 & −3 \end{pmatrix} ,        \left(\begin{array}{ccc|ccc} 1 & 2 & 4 & 1 & 0 & 0 \\ −1 & −1 & 5 & 0 & 1 & 0 \\ 2 & 1 & −3 & 0 & 0 & 1 \end{array}\right)

R2 → R2 + R1, R3 → R3 − 2R1:
        \left(\begin{array}{ccc|ccc} 1 & 2 & 4 & 1 & 0 & 0 \\ 0 & 1 & 9 & 1 & 1 & 0 \\ 0 & −3 & −11 & −2 & 0 & 1 \end{array}\right)

R1 → R1 − 2R2, R3 → R3 + 3R2:
        \left(\begin{array}{ccc|ccc} 1 & 0 & −14 & −1 & −2 & 0 \\ 0 & 1 & 9 & 1 & 1 & 0 \\ 0 & 0 & 16 & 1 & 3 & 1 \end{array}\right)

R1 → R1 + (14/16)R3, R2 → R2 − (9/16)R3, R3 → (1/16)R3:
        \left(\begin{array}{ccc|ccc} 1 & 0 & 0 & −1/8 & 5/8 & 7/8 \\ 0 & 1 & 0 & 7/16 & −11/16 & −9/16 \\ 0 & 0 & 1 & 1/16 & 3/16 & 1/16 \end{array}\right)

⇒ A^{-1} = \begin{pmatrix} −1/8 & 5/8 & 7/8 \\ 7/16 & −11/16 & −9/16 \\ 1/16 & 3/16 & 1/16 \end{pmatrix}
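The result of §3 can be checked in SAGE (cf. appendix A); a minimal sketch:

    A = matrix(QQ, [[1, 2, 4], [-1, -1, 5], [2, 1, -3]])
    print(A.inverse())                                 # matches the matrix found above
    print(A * A.inverse() == identity_matrix(QQ, 3))   # True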
§4 Remark. In some cases, the left side of the augmented matrix cannot be turned into
the unit matrix 1 using elementary row operations as one is left with rows of zeros, e.g.

\left(\begin{array}{ccc|ccc} 1 & 0 & 0 & × & × & × \\ 0 & 0 & 1 & × & × & × \\ 0 & 0 & 0 & × & × & × \end{array}\right) . (II.3)
In these cases, the inverse of A does not exist, and the SLE Ax = b has either no or infinitely
many solutions.
§5 Lemma. Elementary row operations affect the determinant of a matrix as follows:
i) Exchanging two rows in a matrix changes the sign of the determinant.
ii) Multiplying a row of a matrix by a number λ ≠ 0 changes the determinant by the
factor λ.
iii) Adding a row of a matrix to another does not change the determinant.
Proof: i) Assume that we interchange rows j and k in the matrix A and obtain a new
matrix B. According to our definition of the determinant (I.3, §12), we have

det(A) = \sum_{i_1,...,i_n=1}^{n} ε_{i_1...i_j...i_k...i_n} a_{1 i_1} · · · a_{j i_j} · · · a_{k i_k} · · · a_{n i_n}
       = \sum_{i_1,...,i_n=1}^{n} ε_{i_1...i_k...i_j...i_n} a_{1 i_1} · · · a_{k i_j} · · · a_{j i_k} · · · a_{n i_n}        (II.4)
       = − \sum_{i_1,...,i_n=1}^{n} ε_{i_1...i_j...i_k...i_n} a_{1 i_1} · · · a_{k i_j} · · · a_{j i_k} · · · a_{n i_n} = − det(B) ,

where we first interchanged the variables i_j ↔ i_k as well as the order of a_{k i_j} and a_{j i_k} . This
does not change the equation. We then used ε_{i_1...i_j...i_k...i_n} = −ε_{i_1...i_k...i_j...i_n} . ii) and iii)
follow analogously from the definition of the determinant. For iii), one needs that, because
of the antisymmetry of the ε-symbol, \sum_{i_j, i_k=1}^{n} ε_{i_1...i_n} a_{k i_j} a_{k i_k} = 0.

§6 Theorem. The determinant of a square matrix vanishes iff it does not have an inverse.
Proof: If the matrix A does not have an inverse, we can use elementary row operations to
turn it into a diagonal matrix D with at least one zero entry along the diagonal. Because
of lemma §5, we have λ det(A) = det(D) for some λ 6= 0. Because det(D) = 0, it follows
that det(A) = 0. Conversely, if det(A) ≠ 0, then we can turn it into a diagonal matrix D
with det(D) ≠ 0, and therefore into the unit matrix. The matrix A is therefore invertible.

III Vector spaces


III.1 Definition and examples
§1 Remark. We have already introduced the Euclidean vector space Rn . Recall that
the key operations for vectors in Rn are adding two vectors and stretching a vector by


multiplying it by a real number. We now make the concept of a vector space more abstract,
thereby generalising it.
§2 Definition. A vector space over R is a set V endowed with two operations + : V × V →
V and · : R × V → V satisfying the following vector space axioms:

(1) There is an element 0 = 0V ∈ V , the zero or null vector, such that v + 0 = v for all
v ∈V.
(2) For all v ∈ V , there is an element −v ∈ V such that v + (−v) = 0.
(3) The operation + is commutative: v + w = w + v for all v, w ∈ V ,
(4) and associative: (v + w) + u = v + (w + u) for all v, w, u ∈ V .
(5) The scalar multiplication is distributive: a(v + w) = av + aw and (a + b)v = av + bv,
a, b ∈ R, v, w ∈ V ,
(6) and associative: a(bv) = (ab)v, a, b ∈ R, v ∈ V ,
(7) and compatible with 1: 1v = v.

§3 Remark. To check whether a set V is a vector space, you should first test if the
operations + and · are well defined, that is they do not take you out of the set. You can
then check the other axioms.
§4 Examples. a ) V = (Rn , +, ·) clearly satisfies the axioms and forms a vector space.
b ) Matrices form a vector space: + adds matrices entrywise, and · multiplies all entries by the real number.
c ) If a0 , a1 , . . . , an ∈ R, an ≠ 0, then p(y) = a0 + a1 y + . . . + an y^n is called a polynomial of
degree n. Polynomials can be added and multiplied by a real number λ ∈ R:

(a0 + a1 y + . . . + an y n ) + (b0 + b1 y + . . . + bn y n ) =
(a0 + b0 ) + (a1 + b1 )y + . . . + (an + bn )y n ,
λ(a0 + a1 y + . . . + an y n ) = (λa0 ) + (λa1 )y + . . . + (λan )y n .

It is easy to check that these operations satisfy the vector space axioms. We conclude that
polynomials of maximal degree n form a vector space, which we denote by Pn . Note that
the null vector 0 is identified with the constant polynomial p(y) = 0.
d ) Analogously, real valued smooth functions8 f ∈ C ∞ (R), f : R → R form a vector space
V with the following rules:

(f + g)(x) := f (x) + g(x) and (λf )(x) := λf (x) ,

for all f, g ∈ V and λ ∈ R. Here, the null vector 0 is the function f (x) = 0.
§5 Theorem. Let V be a vector space over R. Then
(i) for all u, v, w ∈ V , u + v = u + w implies v = w and
(ii) for all α ∈ R, we have α.0 = 0.
(iii) 0.v = 0 for all v ∈ V .
8 Most of our discussion can be extended to functions that are piecewise continuous.


Proof: Because of vector space axiom (2), there is a vector −u ∈ V such that u+(−u) = 0.
With (3), we have (−u)+u = 0. From u+v = u+w, we have (−u)+(u+v) = (−u)+(u+w).
Associativity of +, i.e. axiom (4), leads to ((−u) + u) + v = ((−u) + u) + w and therefore
to 0 + v = 0 + w and with (1) finally to v = w.
Consider α.0 + α.0 = α.(0 + 0) according to axiom (5). Because of (1), we have α.0 + α.0 =
α.0 = α.0 + 0. Call α.0 = u and together with (i) of this theorem, we have α.0 = 0. The
proof of (iii) is done in Tutorial 3.5.
F Construct an invertible linear map m that maps every polynomial in P2 to a vector
in R3 . Extend this map to one between Pn and Rn+1 . The existence of this map shows
that it does not really matter, if we work with Pn or Rn+1 .
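One possible choice for such a map m (an illustration of ours, not a worked solution of the exercise) sends a polynomial to its coefficient vector; a minimal SAGE sketch:

    R.<y> = PolynomialRing(QQ)
    def m(p):
        c = p.list()                              # coefficients a0, a1, a2 (possibly fewer)
        return vector(QQ, c + [0]*(3 - len(c)))   # pad with zeros so that m : P2 -> R^3
    print(m(1 + 2*y - y^2))   # (1, 2, -1)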

III.2 Vector subspaces


§1 Definition. If (V, +, ·) is a vector space and W ⊆ V such that (W, +, ·) is a vector
space, then W is a (vector) subspace.
§2 Examples. a ) Let (V, +, ·) be a vector space. In particular, V itself and {0} are
subspaces.
b ) Let V = R2 , W = {(x, y)T : x = 3y} is a vector subspace:
! ! ! ! !
3y1 3y2 3(y1 + y2 ) 3y 3λy
+ = and λ = , λ∈R.
y1 y2 y1 + y2 y λy

The axioms are clearly satisfied. Note that W 0 = {(x, y)T : x + y = 1} is not a vector
subspace, as 0 = (0, 0)T is not an element of W 0 . Another reason is that the sum of two
elements s1 , s2 ∈ W 0 is not in W 0 . These observations lead us to the vector subspace test:
§3 Theorem. (Vector subspace test) Let V be a vector space and let W be a non-empty
subset of V . Then W is a vector subspace of V iff (if and only if)

(i) for all w1 , w2 ∈ W , we have w1 + w2 ∈ W and


(ii) for all λ ∈ R and w ∈ W , we have λw ∈ W .

Proof: First, note that due to (i) and (ii), the restrictions of the operations + and · from
V to W are indeed well defined on W and do not take us out of this set. It remains to check
the vector space axioms. In the tutorials, we will show that (−1)v = −v for any v ∈ V . We
have already shown in III.1, §5 that 0w = 0. Take an arbitrary vector v ∈ W . Because of
(ii), both 0w = 0 and (−1)v = −v are also in W . Axiom (1) and (2) are therefore satisfied:
There is a null vector 0 in W , and every vector v has an inverse −v in W . The validity of
the remaining axioms (3)-(7) is then inherited from V .
§4 Examples. a ) W = {(x, y)T ∈ R2 : x + y = 0} is a subspace of R2 . Vector subspace
test:

\begin{pmatrix} x1 \\ −x1 \end{pmatrix} + \begin{pmatrix} x2 \\ −x2 \end{pmatrix} = \begin{pmatrix} x1 + x2 \\ −(x1 + x2) \end{pmatrix}  and  λ \begin{pmatrix} x1 \\ −x1 \end{pmatrix} = \begin{pmatrix} λx1 \\ −λx1 \end{pmatrix} .


In general, lines through the origin of R2 form vector subspaces of R2 .


b ) Lines in R2 that do not pass through the origin do not contain 0 and thus are not vector
subspaces of R2 , cf. second example in §2.
c ) Diagonal matrices form a subspace of the vector space of square matrices:

Mat2 := \left\{ \begin{pmatrix} a & b \\ c & d \end{pmatrix} : a, b, c, d ∈ R \right\} ,   Diag2 := \left\{ \begin{pmatrix} a & 0 \\ 0 & d \end{pmatrix} : a, d ∈ R \right\} .

One easily performs the vector subspace test.


d ) Let V be the vector space of smooth functions C ∞ (R) and W = {f ∈ V : f (1) = 0}.
Test:

f1 , f2 ∈ W , (f1 + f2 )(1) = f1 (1) + f2 (1) = 0 + 0 = 0 ⇒ f1 + f2 ∈ W ,


λ ∈ R, f ∈ W , (λf )(1) = λf (1) = λ0 = 0 ⇒ λf ∈ W .

Thus, W is a vector subspace of V .


§5 Corollary. The set of solutions to a homogeneous SLE forms a vector subspace: Let
A be an m × n matrix and W all solutions to Ax = 0: W = {x ∈ Rn : Ax = 0}. Then W
is a vector subspace of Rn .
Proof: Let x1 , x2 ∈ W and λ ∈ R. The vector subspace test shows:

A(x1 + x2 ) = Ax1 + Ax2 = 0 + 0 = 0 ⇒ x1 + x2 ∈ W


A(λx1 ) = λAx1 = λ0 = 0 ⇒ λx1 ∈ W .

We conclude that W is a vector subspace of Rn .


§6 Remark. Note that {x ∈ Rn : Ax = b} with b 6= 0 is not a vector subspace. This is
analogous to lines in R2 not running through the origin.
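As an illustration of corollary §5 (a sketch that is not part of the original notes), the solution
space W = {x ∈ Rn : Ax = 0} can be computed numerically in Python; scipy.linalg.null_space
is assumed to be available, and the two conditions of the vector subspace test are checked on
arbitrary elements of W:

    import numpy as np
    from scipy.linalg import null_space

    A = np.array([[1., -1., 2., 1.],          # any m x n matrix will do; this one is
                  [2.,  1., -1., 1.]])        # chosen only for illustration

    N = null_space(A)                         # columns form a basis of W = {x : Ax = 0}
    x1 = N @ np.random.rand(N.shape[1])       # two arbitrary elements of W ...
    x2 = N @ np.random.rand(N.shape[1])

    # ... their sum and scalar multiples stay in W, i.e. A maps them to 0:
    print(np.allclose(A @ (x1 + x2), 0), np.allclose(A @ (3.7 * x1), 0))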

III.3 Linear combination and span


§1 Definition. A vector v ∈ V is called a linear combination of the vectors u1 , . . . , uk ∈ V ,
if it can be written as

v = a1 u1 + a2 u2 + . . . + ak uk , a1 , . . . , ak ∈ R .

This is an extension of our definition of linear combination of vectors in Rn .


§2 Example. Consider the vector space of polynomials of maximal degree 2, P2 . Linear
combinations of the two vectors p(x) = 1 and q(x) = x read as a1 p(x) + a2 q(x) = a1 + a2 x.
§3 Definition. The span of a set of vectors {u1 , . . . , uk }, usually denoted by span{u1 ,
. . . , uk }, is the set of all possible linear combinations of u1 , . . . , uk :

span{u1 , . . . , uk } = {a1 u1 + . . . + ak uk |a1 , . . . , ak ∈ R} = W .

We also say that W is spanned by u1 , . . . , uk or that these vectors span W .


§4 Remarks. a ) Given a vector space V and vectors u1 , . . . , uk ∈ V , it is straightforward


to show that span{u1 , . . . , uk } is a vector subspace.
b ) Note that a vector subspace can be spanned by many different sets of vectors.
§5 Examples. a ) Both the pairs of vectors ((1, 0)T , (0, 1)T ) and ((1, 0)T , (1, 1)T ) span R2 :
Given a vector (x, y)T ∈ R2 , we have

    (x, y)T = x (1, 0)T + y (0, 1)T   and   (x, y)T = (x − y) (1, 0)T + y (1, 1)T .

b ) The vectors (1, 0, 0)T , (0, 1, 0)T , (0, 0, 1)T span R3 and one needs all three of them. Two
of them span only a subspace.
c ) The vector space of polynomials up to degree n, Pn , is spanned by the polynomials
1, x, x2 , . . . , xn .
d ) The vector space of 2 × 2-dimensional matrices Mat2 is spanned by

    ( 1 0 )   ( 0 1 )   ( 0 0 )   ( 0 0 )
    ( 0 0 ) , ( 0 0 ) , ( 1 0 ) , ( 0 1 ) .

§6 Theorem. Suppose v 1 , . . . v k span V and v 1 is a linear combination of v 2 , . . . , v k . Then


v 2 , . . . , v k span V .
Proof: Let v ∈ V . Since span{v 1 , . . . , v k } = V , there are constants c1 , . . . , ck ∈ R such
that v = c1 v 1 + c2 v 2 + . . . + ck v k . Since v 1 is a linear combination of v 2 , . . . , v k , there are
constants d2 , . . . , dk ∈ R such that v 1 = d2 v 2 + . . . + dk v k . Hence:

v = c1 (d2 v 2 + . . . + dk v k ) + c2 v 2 + . . . + ck v k = (c1 d2 + c2 )v 2 + . . . + (c1 dk + ck )v k .

Thus, every vector v ∈ V can be expressed as a linear combination of v 2 , . . . , v k and so


v 2 , . . . , v k span V .

III.4 Linear dependence


§1 Definition. We say that the vectors v 1 , . . . , v k are linearly dependent if there exist
c1 , . . . , ck ∈ R not all equal to zero such that c1 v 1 +. . .+ck v k = 0. If the only solution to the
system of linear equations c1 v 1 +. . .+ck v k = 0 is the trivial solution c1 = . . . = ck = 0, then
the vectors are linearly independent. We say that a set of vectors is linearly independent,
if the vectors in the set are linearly independent. This is an extension of our definition of
linear dependence for Rn .
§2 Examples. a ) Recall that {(1, 0)T , (0, 1)T } is linearly independent in R2 , as

    c1 (1, 0)T + c2 (0, 1)T = 0   ⇒   (c1 , c2 )T = (0, 0)T

implies c1 = c2 = 0. Analogously, {(1, 0, 0)T , (0, 1, 0)T , (0, 0, 1)T } is linearly independent in
R3 and the generalisation to Rn is clear.

b ) {(1, 2)T , (2, 4)T } is linearly dependent, as 2(1, 2)T − (2, 4)T = 0.
c ) Let u1 = (1, −1, 2)T , u2 = (3, 2, 1)T , u3 = (4, 1, 3)T . Then {u1 , u2 , u3 } is linearly
dependent as u1 + u2 − u3 = 0.
§3 Algorithm. To determine if a set {u1 , . . . , uk } is linearly independent or not, find (most
likely using Gaußian elimination) all solutions to

c1 u1 + c2 u2 + . . . + ck uk = 0 .

If all variables c1 , . . . , ck are pivot variables, this homogeneous SLE has only the trivial
solution c1 = c2 = . . . = ck = 0. In this case, the set {u1 , . . . uk } is linearly independent.
Otherwise, nontrivial solutions exist and the vectors are linearly dependent.
§4 Exercise. Determine whether (1, 2, 3, −1)T , (2, 1, 3, 1)T and (4, 5, 9, −1)T are linearly
dependent.
Solution: We must find all solutions (c1 , c2 , c3 ) of the homogeneous SLE

    c1 (1, 2, 3, −1)T + c2 (2, 1, 3, 1)T + c3 (4, 5, 9, −1)T = (0, 0, 0, 0)T .

That is, we have to analyse the SLE

    c1 + 2c2 + 4c3 = 0 ,
    2c1 + c2 + 5c3 = 0 ,
    3c1 + 3c2 + 9c3 = 0 ,
    −c1 + c2 − c3 = 0 ,

with augmented matrix

    [  1  2  4 | 0 ]   R2→R2−2R1   [ 1  2  4 | 0 ]   R3→R3−R2      [ 1  2  4 | 0 ]
    [  2  1  5 | 0 ]   R3→R3−3R1   [ 0 −3 −3 | 0 ]   R4→R4+R3      [ 0  1  1 | 0 ]
    [  3  3  9 | 0 ]   R4→R4+R1    [ 0 −3 −3 | 0 ]   R2→−(1/3)R2   [ 0  0  0 | 0 ]
    [ −1  1 −1 | 0 ]                [ 0  3  3 | 0 ]                  [ 0  0  0 | 0 ] .
Not all the variables are pivot variables and therefore we expect the vectors to be linearly
dependent. Rewritten as an SLE, we have

    c1 + 2c2 + 4c3 = 0   →   c1 = −2c2 − 4c3 = −2α ,
    c2 + c3 = 0          →   c3 = α ⇒ c2 = −α .

Putting α = 1, for example, we obtain

    −2 (1, 2, 3, −1)T − (2, 1, 3, 1)T + (4, 5, 9, −1)T = (0, 0, 0, 0)T ,

and we conclude that the vectors are indeed linearly dependent.
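A quick numerical cross-check of this exercise (a sketch, not part of the notes): if the matrix whose
columns are the given vectors has rank smaller than the number of vectors, the vectors are linearly
dependent.

    import numpy as np

    u1 = np.array([1., 2., 3., -1.])
    u2 = np.array([2., 1., 3., 1.])
    u3 = np.array([4., 5., 9., -1.])

    M = np.column_stack([u1, u2, u3])     # 4 x 3 matrix with the vectors as columns
    print(np.linalg.matrix_rank(M))       # 2 < 3, so the vectors are linearly dependent
    print(-2 * u1 - u2 + u3)              # the combination found above gives the zero vector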

§5 Remarks. a ) If S = {u} and u 6= 0, then S is linearly independent.


b ) If S = {u1 , u2 }, then S is linearly dependent iff u2 = 0 or u1 is a multiple of u2 . The first
case is clear, as c1 u1 + c2 0 = 0 has the nontrivial solutions c1 = 0, c2 ∈ R∗ . In the second case,
assume that u1 = λu2 ; then c1 u1 + c2 u2 = 0 has the nontrivial solutions c1 = α, c2 = −λα,
α ∈ R∗ .
c ) Let {v 1 , . . . , v k } be a set of vectors in Rn with k > n. Then c1 v 1 + . . . + ck v k = 0 gives
rise to a homogeneous SLE with n equations in k unknowns. As k > n, this SLE must have
a non-zero solution (Theorem II.4, §3) and therefore {v 1 , . . . , v k } is linearly dependent.
§6 Lemma. A set of vectors {u1 , . . . , uk } is linearly dependent, iff one (i.e. at least one)
of the ui s can be expressed as a linear combination of the others.
Proof: If the set is linearly dependent, then there are constants c1 , . . . , ck not all vanishing
such that c1 u1 + . . . + ck uk = 0. Assume that ci 6= 0. We can then solve the equation for
ui :
    ui = −(1/ci ) (c1 u1 + . . . + ci−1 ui−1 + ci+1 ui+1 + . . . + ck uk ) .        (III.1)
Inversely, if ui can be expressed as a linear combination of the others, we can transform
this equation into that demonstrating linear dependence.
§7 Theorem. Suppose {v 1 , . . . , v k } is linearly independent and v 0 ∉ span{v 1 , . . . , v k }.
Then {v 0 , v 1 , . . . , v k } is linearly independent.
Proof: Suppose c0 v 0 + c1 v 1 + . . . + ck v k = 0. If c0 ≠ 0, then v 0 = −(1/c0 )(c1 v 1 + . . . + ck v k ) and
so v 0 ∈ span{v 1 , . . . , v k }, which we know is not the case. Thus, c0 = 0. But we know that
{v 1 , . . . , v k } are linearly independent, and so the only solution to the SLE c0 v 0 + c1 v 1 +
. . . + ck v k = c1 v 1 + . . . + ck v k = 0 is c0 = c1 = . . . = ck = 0. Therefore, {v 0 , v 1 , . . . , v k } is
linearly independent.

III.5 Basis and dimensions


§1 Definition. An ordered set of vectors9 u1 , . . . , un is called a basis for V , if {u1 , . . . ,


un } is a linearly independent set of vectors which span V .


§2 Examples. a ) ((1, 0)T , (0, 1)T ) is a basis for R2 .
b ) (1, 0)T , (1, 1)T is a basis for R2 : The vectors are not multiples of each other and so


they are linearly independent, cf. III.4, remark §5, b).


c ) (1, 0)T , (0, 1)T , (1, 1)T is not a basis for R2 , as the vectors are linearly dependent.


d ) (1, 0, 0)T , (0, 1, 0)T is not a basis for R3 as these vectors do not span this space.


e ) (1, 0, 0)T , (0, 1, 0)T , (0, 0, 1)T is a basis for R3 .




f ) Mat2 has the basis

    ( 1 0 )   ( 0 1 )   ( 0 0 )   ( 0 0 )
    ( 0 0 ) , ( 0 0 ) , ( 1 0 ) , ( 0 1 ) .
9 The round brackets (. . .) denote a tuple, which is an ordered list of elements. A set, denoted by curly brackets {. . .}, is a list of elements without any order. In particular {a, b} = {b, a}, while (a, b) ≠ (b, a) if b ≠ a.

g ) The vector space of polynomials of degree up to n, Pn , has basis (1, x, x2 , . . . , xn ).


§3 Remark. Just as a non-trivial vector space can be spanned by infinitely many different sets
of vectors, it has infinitely many different bases.
F Find infinitely many bases for R2 .
§4 Lemma. If S = (v 1 , v 2 , . . . , v n ) is a basis of a vector space V , then every subset of V
containing more than n vectors is linearly dependent.
Proof: Let S 0 = {w1 , w2 , . . . , wm }, where m > n. We shall show that the vectors in S 0 are
linearly dependent, i.e. we show that there exist c1 , c2 , . . . , cm not all zero such that

c1 w1 + c2 w2 + . . . + cm wm = 0 . (III.2)

Since S spans V , each wi can be expressed as a linear combination of the vs:

w1 = a11 v 1 + a12 v 2 + . . . + a1n v n ,


w2 = a21 v 1 + a22 v 2 + . . . + a2n v n ,
.. ..
. .
wm = am1 v 1 + am2 v 2 + . . . + amn v n .

Plugging this into (III.2), we have

c1 (a11 v 1 + . . . + a1n v n ) + . . . + cm (am1 v 1 + . . . + amn v n ) = 0 .

To have all the coefficients of the v 1 , . . . , v n vanish, note that it is sufficient to find c1 , . . . , cm
such that
a11 c1 + a21 c2 + . . . + am1 cm = 0 ,
a12 c1 + a22 c2 + . . . + am2 cm = 0 ,
.. ..
. .
a1n c1 + a2n c2 + . . . + amn cm = 0 .
This is a homogeneous SLE with more unknowns (m) than equations (n) and thus there is
a solution with not all of the ci being zero (Theorem II.4, §3). It follows that (III.2) has a
solution besides the trivial solution and so S 0 is linearly dependent.
§5 Theorem. Any two bases for a vector space V contain the same number of vectors.
Proof: Let B = (v 1 , v 2 , . . . , v n ) and B 0 = (v 01 , v 02 , . . . , v 0m ) be bases for V . From the above
lemma, we conclude that since B is a basis and B 0 is linearly independent, m ≤ n. Equally,
since B 0 is a basis and B is linearly independent, n ≤ m. Altogether, we have m = n.
§6 Remark. From lemma §4 and theorem §5, it follows that a basis for V contains a
minimal number of vectors that span V , but a maximal number of vectors that are still
linearly independent.
§7 Definition. Let V be a vector space. We define the dimension of V to be the number
of vectors in any basis of V .

34
III.5 Basis and dimensions

§8 Examples. a ) Recall the examples of a basis in §2. We thus have


    Space:       R2    R3    Rn    Pn     Mat2
    Dimension:   2     3     n     n+1    4
b ) Consider the straight line L = {(x, y, z)T : x/2 = y/3 = z/5}. If (x, y, z)T ∈ L, then
x/2 = y/3 = z/5 = α for some α ∈ R and therefore (x, y, z)T = α(2, 3, 5)T and L = {α(2, 3, 5)T : α ∈
R} = span{(2, 3, 5)T }. Since a single vector which is not the null vector 0 is always linearly
independent, (2, 3, 5)T is a basis for L and we have dim L = 1.
c ) Consider the plane P = {(x, y, z)T ∈ R3 : x + y − z = 0}. Then P is a subset of R3 .
Consider the tuple B = ((1, 0, 1)T , (0, 1, 1)T ) of vectors in P . Since (1, 0, 1)T and (0, 1, 1)T
are not multiples of each other, B is linearly independent. Furthermore, they span P :
(x, y, z)T ∈ P ⇒ z = x + y, (x, y, x + y)T = x(1, 0, 1)T + y(0, 1, 1)T . We conclude that B is
a basis for P and P has dimension 2.
d ) Consider the SLE Ax = 0 given by
x1 − x2 + 2x3 + x4 = 0 ,
2x1 + x2 − x3 + x4 = 0 ,
(III.3)
4x1 − x2 + 3x3 + 3x4 = 0 ,
x1 + 2x2 − 3x3 = 0 .
Let W = {x ∈ R4 : Ax = 0}. Then W is a subspace of R4 according to corollary III.2,
§5, and we would like to determine a basis. The SLE (III.3) has the same solutions as the
SLEs corresponding to the following augmented matrices:

    [ 1 −1  2  1 | 0 ]   R2→R2−2R1   [ 1 −1  2  1 | 0 ]   R3→R3−R2   [ 1 −1  2  1 | 0 ]
    [ 2  1 −1  1 | 0 ]   R3→R3−4R1   [ 0  3 −5 −1 | 0 ]   R4→R4−R2   [ 0  3 −5 −1 | 0 ]
    [ 4 −1  3  3 | 0 ]   R4→R4−R1    [ 0  3 −5 −1 | 0 ]              [ 0  0  0  0 | 0 ]
    [ 1  2 −3  0 | 0 ]                [ 0  3 −5 −1 | 0 ]              [ 0  0  0  0 | 0 ]

The last matrix corresponds to the SLE

    x1 − x2 + 2x3 + x4 = 0 ,
    3x2 − 5x3 − x4 = 0 .
The general solution is thus

    x3 = α ,   x4 = β ,   x2 = (1/3)(5α + β) ,   x1 = x2 − 2x3 − x4 = −(1/3)α − (2/3)β .

Any solution can be written as

    x = (x1 , x2 , x3 , x4 )T = (1/3)(−α − 2β, 5α + β, 3α, 3β)T = (α/3)(−1, 5, 3, 0)T + (β/3)(−2, 1, 0, 3)T ,
and we find B = (−1, 5, 3, 0)T , (−2, 1, 0, 3)T spans the solution space to Ax = 0. As the


two vectors in B are not multiples of each other, they are linearly independent (cf. III.4,
remark §5, b)) and B forms a basis of the solution space.
§9 Lemma. We have

(i) span{v 1 , . . . , v j , . . . , v k , . . . , v m } = span{v 1 , . . . , v k , . . . , v j , . . . , v m } ,


(ii) span{v 1 , . . . , v j , . . . , v m } = span{v 1 , . . . , λv j , . . . , v m } , λ ∈ R∗
(iii) span{v 1 , . . . , v j , . . . , v k , . . . , v m } = span{v 1 , . . . , v j + v k , . . . , v k , . . . , v m } .

Proof: In each case, we need to show that any vector of the set on the left-hand side is also
contained in the set on the right-hand side and vice versa. (i) is trivial. (ii): v = c1 v 1 + . . . +
cj v j + . . . + cm v m = d1 v 1 + . . . + dj λv j + . . . + dm v m , where dj = cj /λ and di = ci else. (iii):
v = c1 v 1 +. . .+cj v j +. . .+ck v k +. . .+cm v m = d1 v 1 +. . .+dj (v j +v k )+. . .+dk v k +. . .+dm v m ,
where dk = ck − cj and di = ci else.
§10 Finding a basis. The above lemma §9 implies the following: We can find a basis for
span{v 1 , . . . , v k } as follows:

(1.) Write down a matrix whose ith row11 is v i .


(2.) Perform elementary row operations to bring the matrix into row echelon form. (The
above lemma shows that elementary row operations do not change the span of the
rows).
(3.) The rows in the row echelon form with at least one non-zero entry correspond to a
basis for span{v 1 , . . . , v k }.

§11 Exercise. Find a basis for the set S = span{(2, −1, 2, 1)T , (1, 2, −1, 3)T , (4, 3, 0, 7)T ,
(0, 0, 1, 2)T }.
Solution: S equals the span of the rows of each of the following matrices:

    [ 1  2 −1  3 ]   R2→R2−2R1   [ 1  2 −1  3 ]   R3→R3−R2   [ 1  2 −1  3 ]
    [ 2 −1  2  1 ]   R3→R3−4R1   [ 0 −5  4 −5 ]   R3↔R4      [ 0 −5  4 −5 ]
    [ 4  3  0  7 ]               [ 0 −5  4 −5 ]              [ 0  0  1  2 ]
    [ 0  0  1  2 ]               [ 0  0  1  2 ]              [ 0  0  0  0 ] .

Thus, S = span{(1, 2, −1, 3)T , (0, −5, 4, −5)T , (0, 0, 1, 2)T }. Because of the row echelon
form, it is easy to see that these vectors are linearly independent and therefore they form
a basis for S.
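The same computation can be sketched in Python. The snippet below assumes sympy is installed;
rref works with exact rational arithmetic and returns the reduced row echelon form, whose non-zero
rows give a (different, but equally valid) basis of S.

    import sympy as sp

    M = sp.Matrix([[2, -1, 2, 1],
                   [1,  2, -1, 3],
                   [4,  3,  0, 7],
                   [0,  0,  1, 2]])

    R, pivots = M.rref()                              # reduced row echelon form, pivot columns
    basis = [R.row(i) for i in range(len(pivots))]    # the non-zero rows span S
    print(basis)                                      # three rows, so dim S = 3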
§12 Remark. The original vectors in the above example are linearly dependent. It follows
from the matrix after the first set of elementary row operations that

R2 − 2R1 = R3 − 4R1 or (2, −1, 2, 1)T − 2(1, 2, −1, 3)T = (4, 3, 0, 7)T − 4(1, 2, −1, 3)T .
11 It is a very common source of mistakes to confuse rows and columns here. In algorithms like this one, always make sure you know how to arrange vectors!

§13 Theorem. Let V be an n-dimensional vector space and let {v 1 , . . . , v k } be a linearly
independent set of vectors which do not span V . Then {v 1 , . . . , v k } can be extended to a
basis of V .
Proof: (by construction) Since span{v 1 , . . . , v k } ≠ V , there is a v k+1 ∈ V such that v k+1 ∉
span{v 1 , . . . , v k }. By theorem III.4, §7, the set {v 1 , . . . , v k , v k+1 } is linearly independent.
If span{v 1 , . . . , v k , v k+1 } = V , then we found a basis. Otherwise, we repeat this procedure.
In each step, the dimension of the span of the vectors increases by one, and after n − k
steps, we arrive at the desired basis.
§14 Theorem. Suppose S = {v 1 , . . . , v k } spans a (finite-dimensional) vector space V .
Then there exists a subset of S which is a basis for V .
Proof: (by construction) If S is linearly independent, then S is a basis. Otherwise, one
of the v i can be written as a linear combination of the others as follows from lemma III.4,
§6. Therefore S\{v i } still spans V by theorem III.3, §6. We continue with this reduced set
from the top as often as possible. The minimal set, which still spans V is the basis of V .

§15 Corollary. Let V be a vector space of dimension n. Let S = {v 1 , . . . , v n }, v i ∈ V .


(i) If S is linearly independent, then S is a basis for V .
(ii) If S spans V , then S is a basis for V .
Proof: (i) (by contradiction) Assume that S is not a basis for V . Then it can be extended
to a basis by theorem §13 with more than n elements. This contradicts the assumption
that dim V = n.
(ii) Theorem §14 tells us that a subset of S spans V . Because dim V = n, this subset has
to have n elements. So the subset of S is all of S.
§16 Remark. Note that we worked above with finite dimensional vector spaces.
F For the complications in the case of infinite dimensional vector spaces and some strange
features appearing in set theory, have a look at the Wikipedia entry for Zorn’s Lemma.

IV Inner product spaces


IV.1 Inner products
§1 Definition. An inner product or scalar product on a vector space V over R is a sym-
metric positive definite bilinear map. That is, an inner product associates to each pair
of vectors u, v ∈ V a real number hu, vi such that the following rules are satisfied for all
vectors u, v, w ∈ V :
(i) hu, vi = hv, ui (symmetry),
(ii) hv, vi ≥ 0 and hv, vi = 0 if and only if v = 0 (positive definiteness),
(iii) hu + v, wi = hu, wi + hv, wi (linearity 1),
(iv) hku, vi = khu, vi (linearity 2).
A vector space endowed with an inner product is called an inner product space.

§2 Examples. a ) R2 with the Euclidean scalar product


    hx, yi = x1 y1 + x2 y2   where   x = (x1 , x2 )T ,   y = (y1 , y2 )T .

b ) Rn with the Euclidean inner product

    hx, yi = x1 y1 + x2 y2 + . . . + xn yn   where   x = (x1 , . . . , xn )T ,   y = (y1 , . . . , yn )T .

Note that hx, yi = x.y = xT y, where T denotes the transpose.


c ) Consider the vector space Pn of polynomials of maximal degree n. For two polynomials
p(y) = p0 + p1 y + · · · + pn y n and q(y) = q0 + q1 y + · · · + qn y n , an inner product is given by
hp, qi = p0 q0 + p1 q1 + · · · + pn qn .
d ) Let V be the set of smooth functions defined on the interval [0, 1], V = C ∞ ([0, 1]). We
can endow V with the following inner product:
    hf, gi = ∫_0^1 f (x)g(x) dx .

§3 Remark. Note that on a vector space, one can define many different inner products.
For example, besides the usual Euclidean inner product on R2 , we could use any expression
hx, yi = αx1 y1 + βx2 y2 for positive numbers α, β ∈ R.
F Try to formulate conditions on a matrix A to define an inner product via hx, yi := xT Ay.
We will return to this question later.
§4 Lemma. We have h0, ui = 0 for all u ∈ V .
Proof: It is 0 = 0hu, ui = h0u, ui = h0, ui.
§5 Definition. The norm of a vector (or its length or magnitude) is given by ||u|| = √(hu, ui).
§6 Examples. a ) The Euclidean norm of x = (x1 , . . . , xn )T ∈ Rn is ||x|| = √(x1² + . . . + xn²).
b ) Recall that if θ denotes the angle between x and y, then hx, yi = ||x|| ||y|| cos θ.
c ) In the case of the vector space V of continuous functions defined on the interval [0, 1],
we have ||f || = ( ∫_0^1 f (x)² dx )^(1/2) for all f ∈ V .
§7 Theorem. (Cauchy-Schwarz inequality) Let V be an inner product space and let u, v ∈
V . Then |hu, vi| ≤ ||u|| ||v||.
Proof: The proof is trivial if v = 0 and thus ||v|| = 0 by §4. Let now v 6= 0. We have for
any number δ ∈ R:

0 ≤ ||u − δv||2 = hu − δv, u − δvi = hu, ui − δhu, vi − δhv, ui + δ 2 hv, vi .

We choose now δ = hu, vihv, vi−1 = hu, vi ||v||−2 , which is indeed well-defined, as v 6= 0.
We continue:
0 ≤ hu, ui − |hu, vi|2 hv, vi−1
⇔ |hu, vi|2 ≤ ||u||2 ||v||2
⇔ |hu, vi| ≤ ||u|| ||v|| .

§8 Example. Using the Euclidean inner product on Rn , we obtain

    |hu, vi| = |u1 v1 + . . . + un vn | ≤ √(u1² + . . . + un²) √(v1² + . . . + vn²) .
§9 Theorem. (Triangle inequality) Let V be an inner product space and let u, v ∈ V .
Then ||u + v|| ≤ ||u|| + ||v||.
(Figure: the vectors u, v and u + v form a triangle, illustrating the inequality.)
Proof: We have

    ||u + v||2 = hu + v, u + vi = hu, ui + 2hu, vi + hv, vi ≤ ||u||2 + 2|hu, vi| + ||v||2
               ≤ ||u||2 + 2||u|| ||v|| + ||v||2 = (||u|| + ||v||)2 ,

where the second inequality is the Cauchy-Schwarz inequality (§7). Thus, ||u + v|| ≤ ||u|| + ||v||.

IV.2 Orthogonality of vectors


§1 Definition. Let V be an inner product space and let u, v ∈ V .
(i) The vectors u, v are said to be orthogonal, if hu, vi = 0.
(ii) If the vector u is orthogonal to each vector x in a set S ⊂ V , then we say that u is
orthogonal to S.
§2 Examples. a ) Consider V = Rn with Euclidean inner product. Two vectors u, v ∈ Rn
span a plane. Denote the angle between u and v by θ. Then hu, vi = ||u|| ||v|| cos θ. Thus,
hu, vi = 0 ⇔ cos θ = 0 ⇔ θ = ±π/2. This implies that u and v are perpendicular.
b ) In R4 , u = (1, 2, 2, −1)T and v = (2, −1, 1, 2)T are orthogonal with respect to the Eu-
clidean inner product.
c ) Consider the vector space V of smooth functions on [0, 1], C ∞ ([0, 1]), with inner product
hf, gi = ∫_0^1 f (x)g(x) dx for f, g ∈ V . Then sin(k1 πx) and sin(k2 πx), k1 , k2 ∈ Z, are
orthogonal if k1 ≠ k2 :

    ∫_0^1 sin(k1 πx) sin(k2 πx) dx = . . . = 0 .

§3 Theorem. (Pythagoras) If u and v are orthogonal vectors, then ||u + v||2 = ||u||2 +
||v||2 .
Proof: We have

||u + v||2 = hu + v, u + vi = hu, ui + 2hu, vi + hv, vi = ||u||2 + ||v||2 .

§4 Definitions. A set of vectors is called an orthogonal set, if all pairs of distinct vectors
in the set are orthogonal. An orthogonal set in which each vector has norm 1 is called
orthonormal.
§5 Examples. In R3 with Euclidean inner product, {(1, −2, 1)T , (1, 1, 1)T , (1, 0, −1)T } is
an orthogonal set and {(1, 0, 0)T , (0, 1, 0)T , (0, 0, 1)T } is an orthonormal set.
§6 Theorem. An orthogonal set that does not contain 0 is linearly independent.
Proof: Let the orthogonal set be given by S = {u1 , . . . , un } with huk , u` i = 0 for k 6= `.
Consider the equation c1 u1 + c2 u2 + . . . + cn un = 0. We need to show that c1 = c2 = . . . =
cn = 0 is the only solution. Taking the inner product of both sides of this equation with an
arbitrary vector uk ∈ S, we find

hc1 u1 + c2 u2 + . . . + cn un , uk i = h0, uk i
c1 hu1 , uk i + c2 hu2 , uk i + . . . + ck huk , uk i + . . . + cn hun , uk i = 0
ck huk , uk i = 0 ⇒ ck = 0 .

Since k was arbitrary, we conclude that c1 = c2 = . . . = cn = 0.


§7 Remark. In the vector space of continuous functions on [0, 1], S = {sin πx, sin 2πx,
. . . , sin nπx} is an orthogonal set not containing the null vector f (x) = 0 and therefore
linearly independent. The dimension of span S is therefore n. Since n can be arbitrarily
large, the vector space of continuous functions is infinite dimensional.
§8 Orthogonal bases. If (u1 , . . . , un ) is a basis for Rn and we want to write a vector
v ∈ Rn as a linear combination v = c1 u1 + . . . + cn un of the basis vectors, we have to solve a
system of n equations in n unknowns in general. If (u1 , . . . , un ) is an orthogonal basis, i.e.
an orthogonal family of vectors which span Rn , then this problem becomes much simpler:
Consider

    v = c1 u1 + . . . + cn un   ⇒   hv, u1 i = hc1 u1 + . . . + cn un , u1 i
                                             = c1 hu1 , u1 i + c2 hu2 , u1 i + . . . + cn hun , u1 i
                                             = c1 hu1 , u1 i ,
    ⇒   c1 = hv, u1 i / hu1 , u1 i ,   and generally:   ci = hv, ui i / hui , ui i ,

where we used that hui , uj i = 0 for j 6= i.


If (u1 , . . . , un ) is even an orthonormal basis, i.e. hui , ui i = 1 for all i = 1, . . . , n, then the
formula for the coefficients simplifies further: ci = hv, ui i.

§9 Example. Consider the orthogonal basis ((1, −1, 1)T , (1, 0, −1)T , (1, 2, 1)T ) of R3 . To
write the vector (3, 4, 5)T as a linear combination of the basis vectors, we compute

    c1 = h(3, 4, 5)T , (1, −1, 1)T i / h(1, −1, 1)T , (1, −1, 1)T i = 4/3 ,
    c2 = h(3, 4, 5)T , (1, 0, −1)T i / h(1, 0, −1)T , (1, 0, −1)T i = −1 ,
    c3 = h(3, 4, 5)T , (1, 2, 1)T i / h(1, 2, 1)T , (1, 2, 1)T i = 8/3 ,

and therefore

    (3, 4, 5)T = (4/3) (1, −1, 1)T − (1, 0, −1)T + (8/3) (1, 2, 1)T .
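The coefficient formula ci = hv, ui i / hui , ui i from §8 is easy to check numerically; the following
sketch (not part of the notes) redoes the computation above with numpy.

    import numpy as np

    basis = [np.array([1., -1., 1.]),
             np.array([1.,  0., -1.]),
             np.array([1.,  2.,  1.])]                # the orthogonal basis from this example
    v = np.array([3., 4., 5.])

    coeffs = [np.dot(v, u) / np.dot(u, u) for u in basis]
    print(coeffs)                                     # [4/3, -1, 8/3]
    print(sum(c * u for c, u in zip(coeffs, basis)))  # reproduces (3, 4, 5)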

IV.3 Gram-Schmidt algorithm


§1 Remark. Above, we saw that orthogonal bases are very helpful. In the following, we
develop an algorithm for constructing such bases.
§2 Orthogonal projection. The component of a vector u along a vector w is the orthogonal
projection of u onto w. Explicitly, it is given by Pw u := (hu, wi / hw, wi) w, cf. IV.2, §8.
(The notation is not unique; sometimes Pu w is used instead of Pw u, which can be a source of
much confusion, so pay attention to the context.)
(Figure: the orthogonal projection Pw u of u onto the line spanned by w.)
If we subtract the orthogonal projection of u on w from u, then the result is perpendicular
to w:

    hw, u − Pw ui = hw, u − (hu, wi/hw, wi) wi = hw, ui − (hu, wi/hw, wi) hw, wi = 0 .

This is made use of in the Gram-Schmidt algorithm.
§3 Gram-Schmidt procedure. Let V be an inner product space with a basis
(u1 , . . . , un ). The Gram-Schmidt algorithm turns this basis into an orthogonal basis (w1 ,
. . . , wn ):

(1) Let w1 = u1 .

(2) Take the next vector and make it perpendicular to all the previously obtained ones:

    wi = ui − Pw1 ui − Pw2 ui − . . . − Pwi−1 ui .

Explicitly, we have for example

    w2 = u2 − (hu2 , w1 i/hw1 , w1 i) w1 ,
    w3 = u3 − (hu3 , w1 i/hw1 , w1 i) w1 − (hu3 , w2 i/hw2 , w2 i) w2 .
The resulting vectors are all orthogonal and still span V due to lemma III.5, §9. The set
is also linearly independent due to theorem IV.2, §6. Therefore, they form an orthogonal
basis of V .
§4 Remark. If (w1 , . . . , wn ) is an orthogonal basis, then (w1 /||w1 ||, . . . , wn /||wn ||) is an
orthonormal basis.
§5 Exercise. Find an orthonormal basis for span{(1, 1, 1, 1)T , (0, 1, 0, 1)T , (−1, 1, 3, 2)T },
a subspace of R4 .
Solution: We start with w1 = (1, 1, 1, 1)T and use the above formulæ:

    w2 = (0, 1, 0, 1)T − [h(0, 1, 0, 1)T , (1, 1, 1, 1)T i / h(1, 1, 1, 1)T , (1, 1, 1, 1)T i] (1, 1, 1, 1)T
       = (0, 1, 0, 1)T − (2/4) (1, 1, 1, 1)T = (1/2) (−1, 1, −1, 1)T ,

    w3 = (−1, 1, 3, 2)T − [h(−1, 1, 3, 2)T , (1, 1, 1, 1)T i / h(1, 1, 1, 1)T , (1, 1, 1, 1)T i] (1, 1, 1, 1)T
         − [h(−1, 1, 3, 2)T , (1/2)(−1, 1, −1, 1)T i / h(1/2)(−1, 1, −1, 1)T , (1/2)(−1, 1, −1, 1)T i] (1/2)(−1, 1, −1, 1)T
       = . . . = (1/2) (−4, −1, 4, 1)T .

An orthogonal basis is ((1, 1, 1, 1)T , (1/2)(−1, 1, −1, 1)T , (−4, −1, 4, 1)T ) and therefore an
orthonormal basis is ((1/2)(1, 1, 1, 1)T , (1/2)(−1, 1, −1, 1)T , (1/√34)(−4, −1, 4, 1)T ).
§6 Remarks. a ) In the above procedure, wi can always be replaced by λwi with λ ∈ R∗
to remove fractions. Therefore, we could take w2 = (−1, 1, −1, 1)T instead of w2 =
(1/2)(−1, 1, −1, 1)T .
b ) Normalising the vectors after each step produces an orthonormal basis and can be faster.
c ) One can use the Gram-Schmidt procedure to find an orthogonal basis containing a given
vector. This is just the vector one identifies with w1 in the first step of the Gram-Schmidt
algorithm.
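A minimal sketch of the Gram-Schmidt procedure in Python (illustrative only; it assumes the
input vectors are linearly independent and uses the Euclidean inner product):

    import numpy as np

    def gram_schmidt(vectors):
        # turn a list of linearly independent vectors into an orthogonal list
        ws = []
        for u in vectors:
            w = u.astype(float)
            for prev in ws:                  # subtract the projections onto all
                w = w - np.dot(u, prev) / np.dot(prev, prev) * prev   # previous w's
            ws.append(w)
        return ws

    us = [np.array([1, 1, 1, 1]), np.array([0, 1, 0, 1]), np.array([-1, 1, 3, 2])]
    ws = gram_schmidt(us)
    print(ws)                                     # matches w1, w2, w3 from the exercise above
    print([w / np.linalg.norm(w) for w in ws])    # orthonormal version, cf. remark §4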

The following two paragraphs contain extra material which will not be required for the
exam. Also, note that we are slightly sloppy in our treatment of the infinite-dimensional
vector space of real-valued functions on [0, 1] below13 .
13 A proper treatment would work with the vector space consisting of finite linear combinations (i.e. only finitely many coefficients in the linear combinations are non-zero) of the infinite-dimensional basis and showing that this vector space is dense in the space of functions on [0, 1]. Also, there is an issue with linearly combining infinitely many vectors.

§7 Functions on [0, 1]. We saw in section IV.2, §2 that the functions sin(kπx) form an
orthogonal set of functions on the interval [0, 1]. Let us study this example in more detail.
Recall that the scalar product on this space was hf, gi = ∫_0^1 f (x)g(x) dx. We wish to turn
the set of functions S = {sin(πx), sin(2πx), . . .} into an orthonormal set. We already know
that sin(k1 πx) and sin(k2 πx) are orthogonal if k1 , k2 ∈ Z, k1 ≠ k2 . It remains to normalise
these functions. We evaluate:

    ∫_0^1 sin(kπx) sin(kπx) dx = 1/2 − sin(2kπ)/(4kπ) ,    k ∈ N∗ .

For k ∈ Z, we have sin(2kπ) = 0 and we conclude that hsin(kπx), sin(kπx)i = 1/2. Thus,
(√2 sin(πx), √2 sin(2πx), . . .) forms an orthonormal set. Similarly, (1, √2 cos(πx),
√2 cos(2πx), . . .) forms an orthonormal set.
§8 Fourier transform. Recall that given an orthonormal basis (u1 , . . . , uk ) of a vector
space V , we can write any vector v ∈ V as the linear combination

    v = hv, u1 iu1 + . . . + hv, uk iuk ,

where the hv, ui i are the coordinates of the vector v with respect to the given basis. (Check:
Take the scalar product of both sides with an arbitrary element ui of the basis.) It can
now be shown that (1, √2 sin(2πx), √2 cos(2πx), √2 sin(4πx), √2 cos(4πx), . . .) forms an
orthonormal basis for continuous functions on [0, 1]. We can therefore write an arbitrary
continuous function f : [0, 1] → R as

    f (x) = b0 · 1 + a1 √2 sin(2πx) + b1 √2 cos(2πx) + a2 √2 sin(4πx) + b2 √2 cos(4πx) + . . .
          = b0 + Σ_{k≥1} ak √2 sin(2kπx) + Σ_{k≥1} bk √2 cos(2kπx) ,

where

    b0 = hf, 1i ,   ak = hf, √2 sin(2kπx)i ,   bk = hf, √2 cos(2kπx)i ,   k ∈ N∗ ,
    i.e.   ak = √2 ∫_0^1 f (x) sin(2kπx) dx ,   etc.

Note that the coefficients a1 , a2 , . . . and b0 , b1 , . . . contain all information necessary to re-
construct f . The transition from f to these coefficients is called a Fourier transform. This
transform is used e.g. to solve differential equations, in cryptography, jpeg compression and
by your stereo or mp3-player to display a frequency analysis of music clips.
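As an illustration (a sketch, not part of the notes), the coefficients ak and bk can be approximated
numerically for a concrete choice of f — here f (x) = x, which is only an example — and the
truncated series then reconstructs f ; scipy.integrate.quad is assumed to be available.

    import numpy as np
    from scipy.integrate import quad

    f = lambda x: x                          # example function on [0, 1] (an arbitrary choice)

    def a(k):                                # ak = <f, sqrt(2) sin(2 k pi x)>
        return quad(lambda x: f(x) * np.sqrt(2) * np.sin(2 * k * np.pi * x), 0, 1)[0]

    def b(k):                                # bk = <f, sqrt(2) cos(2 k pi x)>, b0 = <f, 1>
        if k == 0:
            return quad(f, 0, 1)[0]
        return quad(lambda x: f(x) * np.sqrt(2) * np.cos(2 * k * np.pi * x), 0, 1)[0]

    def partial_sum(x, K=50):                # truncated Fourier series of f at the point x
        s = b(0)
        for k in range(1, K + 1):
            s += a(k) * np.sqrt(2) * np.sin(2 * k * np.pi * x)
            s += b(k) * np.sqrt(2) * np.cos(2 * k * np.pi * x)
        return s

    print(f(0.3), partial_sum(0.3))          # the truncated series approximates f(0.3)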
V Linear transformations
We already encountered linear maps between Rn and Rm in section I.3. Here, we define
linear maps between arbitrary vector spaces.

V.1 Examples and general properties


§1 Definition. Let U and V be two vector spaces. A linear transformation (or a linear
map) between U and V is a map T : U → V preserving vector space operations. That is:

(1) T (u1 + u2 ) = T (u1 ) + T (u2 ) for all u1 , u2 ∈ U .


(2) T (λu) = λT (u) for all λ ∈ R and u ∈ U .

§2 Examples. a ) Let U = Rn and V = Rm , then the linear maps between U and V are
m × n-dimensional matrices, cf. section I.3.
b ) Let M be an m × m matrix and N an n × n matrix. The map A 7→ M AN , where A is
an m × n matrix, is a linear map.
c ) Let C ∞ (R) be the vector space of smooth functions on R. Then the transformations
D : f 7→ f ′ and I : f 7→ ∫_0^x f (s) ds are linear transformations from the vector space C ∞ (R)
to itself.
d ) Since in general sin(x1 + x2 ) 6= sin(x1 ) + sin(x2 ), the map T (x, y) := (y, sin x) is not a
linear transformation.
e ) The map T : R3 → R3 with T (x, y, z) = (x+1, y+1, z+1) is not a linear transformation.
§3 Lemma. Let T : U → V be a linear transformation. We then have:

(1) If u1 , . . . , un ∈ U , c1 , . . . , cn ∈ R, then T (c1 u1 + c2 u2 + . . . + cn un ) = c1 T (u1 ) +


c2 T (u2 ) + . . . + cn T (un ).
(2) T (0U ) = 0V .

Proof: (1) We iterate: T (c1 u1 + c2 u2 + . . . + cn un ) = T (c1 u1 ) + T (c2 u2 + . . . + cn un ) =


c1 T (u1 ) + T (c2 u2 + . . . + cn un ) etc.
(2) T (0U ) = T (0 · 0U ) = 0 · T (0U ) = 0V due to III.1, §5.
§4 Theorem. Let U be a vector space with basis (e1 , . . . , en ). A linear transformation
T : U → V is completely determined by the images of the basis vectors T (e1 ), . . . , T (en ).
Proof: Given a vector v ∈ U , we can write it as a linear combination v = c1 e1 + . . . + cn en .
The linear transformation T (v) can be evaluated as T (v) = c1 T (e1 ) + . . . + cn T (en ), and
all we need are the images T (e1 ), . . . , T (en ).
§5 Theorem. Let U, V, W be vector spaces and T1 : U → V , T2 : U → V and S :
V → W be linear transformations and λ ∈ R. Then the following maps are also linear
transformations:

(1) (T1 + T2 ), where (T1 + T2 )(u) := T1 (u) + T2 (u) for all u ∈ U .


(2) λT1 , where (λT1 )(u) := λT1 (u) for all u ∈ U .
(3) S ◦ T1 : U → W , where (S ◦ T1 )(u) := S(T1 (u)) for all u ∈ U .

Proof: (1) and (2): Exercise.


(3) Let u1 , u2 ∈ U and λ ∈ R. We have (S ◦T1 )(λu1 +u2 ) = S(T1 (λu1 +u2 )) = S(λT1 (u1 )+
T1 (u2 )) = λS(T1 (u1 )) + S(T1 (u2 )) = λ(S ◦ T1 )(u1 ) + (S ◦ T1 )(u2 ).

§6 Remark. If S and T are linear transformations, T ◦ S is usually written as T S and


T ◦ T as T 2 . Warning: in some books the order might be inverted, i.e. T S = S ◦ T .
§7 Example. Let T : R2 → R3 with T ((1, 1)T ) = (4, 3, 5)T and T ((0, 1)T ) = (1, 2, 0)T .
Problem: Compute T ((5, 3)T ). Note that (1, 1)T and (0, 1)T form a basis of R2 . Because
(5, 3)T = 5(1, 1)T − 2(0, 1)T , we have by §4: T ((5, 3)T ) = 5T ((1, 1)T ) − 2T ((0, 1)T ) =
(18, 11, 25)T .
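The same computation can be sketched in numpy (illustrative only): solve for the coordinates of
(5, 3)T in the basis ((1, 1)T , (0, 1)T ) and combine the given images of the basis vectors accordingly.

    import numpy as np

    B = np.column_stack([[1, 1], [0, 1]])              # basis vectors as columns
    images = np.column_stack([[4, 3, 5], [1, 2, 0]])   # T((1,1)) and T((0,1)) as columns

    coords = np.linalg.solve(B, np.array([5, 3]))      # (5,3) = 5 (1,1) - 2 (0,1)
    print(coords)                                      # [ 5. -2.]
    print(images @ coords)                             # T((5,3)) = [18. 11. 25.]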
§8 Matrices and linear transformations. Consider the vector space Rn with standard
basis e1 , . . . , en , i.e. the only non-vanishing entry of the vector ei is a 1 at position i. For an
m × n matrix A and the linear map T (x) := Ax, it is easy to see that the jth column of A is
precisely T (ej ). By theorem §4, a matrix therefore determines a linear transformation. Inversely,
given a linear transformation via all the T (ei ) for ei the standard basis, we can read off the
columns of the corresponding matrix A.
F How does one derive A from the T (ei ), if the ei are not the standard basis?

V.2 Row and column spaces of matrices


§1 Definition. Given a matrix

    A = [ a11  a12  . . .  a1n ]
        [ a21  a22  . . .  a2n ]
        [  ..   ..          .. ]
        [ am1  am2  . . .  amn ] ,

the vectors r1 = (a11 , a12 , . . . , a1n ), . . ., rm = (am1 , am2 , . . . , amn ) are called the row vectors
of A, while the vectors

    c1 = (a11 , a21 , . . . , am1 )T ,   . . . ,   cn = (a1n , a2n , . . . , amn )T ,

are called the column vectors of A. The subspace span{r1 , . . . , rm } ⊆ Rn is called the row
space and the subspace span{c1 , . . . , cn } ⊆ Rm is called the column space. We denote their
dimensions by rkr (A) and rkc (A) and call them the row and column rank of the matrix A.
§2 Example. For

    A = [ 1 2 3 ]
        [ 4 5 6 ] ,

the row space is span{(1, 2, 3), (4, 5, 6)} with 2 = rkr (A) ≤ 2, and the column space is
span{(1, 4)T , (2, 5)T , (3, 6)T } with 2 = rkc (A) ≤ 3. Note that 2 (2, 5)T = (1, 4)T + (3, 6)T .
§3 Elementary row operations. The elementary row operations of II.3, §2 do not change
the span of the rows of a matrix (cf. III.5, §9). Therefore, they do not change the row rank
of a matrix.

§4 Determining row spaces. We can determine the row space by bringing the matrix
to row echelon form. Example (cf. III.5, §11):

    A = [ 1  2 −1  3 ]        [ 1  2 −1  3 ]
        [ 2 −1  2  1 ]   →    [ 0 −5  4 −5 ]
        [ 4  3  0  7 ]        [ 0  0  1  2 ]
        [ 0  0  1  2 ]        [ 0  0  0  0 ] ,

and we have span{(1, 2, −1, 3), (2, −1, 2, 1), (4, 3, 0, 7), (0, 0, 1, 2)} = span{(1, 2, −1, 3),
(0, −5, 4, −5), (0, 0, 1, 2), (0, 0, 0, 0)}. As the first three vectors are linearly independent,
we have rkr (A) = 3. In general, the number of non-zero rows in the row echelon form of a

matrix gives the row rank of the matrix. We can use (1, 2, −1, 3), (0, −5, 4, −5), (0, 0, 1, 2)
as a basis for the row space of A.

§5 Determining column spaces. To determine the column space of a matrix A, we
determine the row space of its transpose, AT :

    A = [ 1  2 −1  3 ]          [  1  2  4  0 ]        [ 1  2  4  0 ]
        [ 2 −1  2  1 ] ,   AT = [  2 −1  3  0 ]   →    [ 0  1  1  0 ]
        [ 4  3  0  7 ]          [ −1  2  0  1 ]        [ 0  0  0  1 ]
        [ 0  0  1  2 ]          [  3  1  7  2 ]        [ 0  0  0  0 ] .

Thus, ((1, 2, 4, 0)T , (0, 1, 1, 0)T , (0, 0, 0, 1)T ) is a basis for the column space of A and the
rank is rkc (A) = 3.
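Numerically, the row and column ranks can be cross-checked with numpy (a sketch, not part of
the notes); matrix_rank applied to A and to AT gives the same number, in line with §6 below.

    import numpy as np

    A = np.array([[1,  2, -1, 3],
                  [2, -1,  2, 1],
                  [4,  3,  0, 7],
                  [0,  0,  1, 2]])

    print(np.linalg.matrix_rank(A))      # 3, the row rank
    print(np.linalg.matrix_rank(A.T))    # 3, the column rank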

§6 Rank of a matrix. Above, we saw in the example that rkr (A) = rkc (A) = 3. This is
a consequence of the following general Theorem: rkc (A) = rkr (A) =: rk(A), the rank of
the matrix A.
Proof: Let (e1 , . . . , ek ) be a basis of the row space: span{r1 , . . . , rm } = span{e1 , . . . , ek }.
We then have:

    A = [ r1 ]   [ a11 e1 + . . . + a1k ek ]
        [ r2 ] = [ a21 e1 + . . . + a2k ek ]
        [ .. ]   [           ..            ]
        [ rm ]   [ am1 e1 + . . . + amk ek ]

      = [ a11 e11 + . . . + a1k ek1    a11 e12 + . . . + a1k ek2    . . .    a11 e1n + . . . + a1k ekn ]
        [            ..                           ..                                   ..             ]
        [ am1 e11 + . . . + amk ek1    am1 e12 + . . . + amk ek2    . . .    am1 e1n + . . . + amk ekn ]

The column space is thus spanned by {(a11 , . . . , am1 )T , . . . , (a1k , . . . , amk )T }. It follows
that rkr (A) ≥ rkc (A). Interchanging rows and columns in this argument leads to rkc (A) ≥
rkr (A), and altogether, we have rkc (A) = rkr (A).

§7 Examples.

    [ 0 0 0 ]                  [ 1  1 1 ]
    [ 0 0 0 ]  has rank 0 ,    [ 1 −1 1 ]  has rank 2 ,
    [ 0 0 0 ]                  [ 0  0 0 ]

    [ 1 1 1 ]                  [ 1 2 0 ]
    [ 2 2 2 ]  has rank 1 ,    [ 0 1 0 ]  has rank 3 .
    [ 3 3 3 ]                  [ 0 0 1 ]

§8 Invertibility of a matrix. Let A ∈ Matn . A can be transformed via elementary row
operations to 1 iff A is invertible. Because of §3, this is equivalent to the statement that
A is invertible iff its rank is maximal: rk(A) = n.
§9 Theorem. Let A be an m × n matrix. The general solution of the homogeneous SLE
Ax = 0 contains n − rk(A) arbitrary constants.
Proof: Recall that elementary row operations, and therefore Gaußian elimination, do not change
the rank of a matrix, cf. §3. Performing Gaußian elimination, we obtain a matrix with rk(A) non-
zero rows. Leading 1s occur in rk(A) rows, that is, rk(A) of the n variables x1 , . . . , xn are
pivot variables and therefore fixed. It thus remains to specify n − rk(A) constants.
§10 Corollary. dim{x ∈ Rn : Ax = 0} = n − rk(A).
§11 Remarks. In the following, A is an m × n-matrix.
a ) rk(A) < n ⇒ Ax = 0 has infinitely many solutions.
b ) rk(A) = n ⇒ Ax = 0 has only one solution x = 0.
c ) rk(A) > n is not possible, as the row space of A is a subspace of Rn .
§12 Theorem. The inhomogeneous SLE Ax = b is consistent, iff b is in the column space
of A.
Proof:

    {Ax : x ∈ Rn } = { (a11 x1 + . . . + a1n xn , . . . , am1 x1 + . . . + amn xn )T : x1 , . . . , xn ∈ R }
                   = { x1 (a11 , . . . , am1 )T + . . . + xn (a1n , . . . , amn )T : x1 , . . . , xn ∈ R }
                   = span{ (a11 , . . . , am1 )T , . . . , (a1n , . . . , amn )T } .

The last expression is the column space of A and thus consistency of the SLE Ax = b
requires that b is in the column space of A.

§13 Corollary. Let A be an m × n matrix. Then Ax = b is consistent for all b ∈ Rm iff


rk(A) = m.
Proof: The statement that Ax = b is consistent for all b ∈ Rm is equivalent to the fact that
the column space of A is Rm and thus that the (column) rank of A is maximal: rk(A) = m.

V.3 Range and kernel, rank and nullity


§1 Definition. Let T : U → V be a linear transformation. The kernel (or null space) of
T , denoted by N (T ) or ker(T ) is defined as N (T ) = {u ∈ U : T (u) = 0V }. The range
space of T , denoted by R(T ) is defined as the image of U under T : R(T ) = {v ∈ V : v =
T (u) for some u ∈ U } = {T (u) : u ∈ U }.

Figure 2: The kernel and the range of a linear map T : U → V , depicted as sets.

§2 Examples. a ) Consider U = C ∞ (R), the vector space of smooth functions on R. Then


D : f 7→ f ′ is a linear transformation D : C ∞ (R) → C ∞ (R). We have N (D) = {x 7→ a : a ∈
R} ≅ R, the constant functions, and R(D) = C ∞ (R). F What happens to the D-operator
under Fourier transformation?
b ) Consider the linear transformation T : R2 → R3 with T (x) = (x1 , 0, 0)T . Then R(T ) =
{(a, 0, 0)T : a ∈ R} ≅ R and N (T ) = {(0, a)T : a ∈ R}. Note that dim N (T ) + dim R(T ) =
dim(U ) = 2.
§3 Remark. Since T (0U ) = 0V due to V.1, §3, we have 0U ∈ N (T ) and 0V ∈ R(T ). This
is a first requirement for the following theorem to hold:
§4 Theorem. The kernel of a linear transformation T : U → V , N (T ), is a vector subspace
of U and the range R(T ) is a vector subspace of V .
Proof: We perform in both cases the vector subspace test: Let u1 , u2 ∈ N (T ). That
is, T (u1 ) = T (u2 ) = 0V . Therefore, u1 + u2 ∈ N (T ), as T (u1 + u2 ) = T (u1 ) + T (u2 ) =
0V + 0V = 0V . Also, λu1 ∈ N (T ) for any λ ∈ R, as T (λu1 ) = λT (u1 ) = λ0V = 0V . Thus,
N (T ) is a subspace of U .
Let now v 1 , v 2 ∈ R(T ). That is, there are vectors u1 , u2 ∈ U such that T (u1 ) = v 1 and
T (u2 ) = v 2 . It is T (u1 + u2 ) = T (u1 ) + T (u2 ) = v 1 + v 2 and T (αu1 ) = αT (u1 ) = αv 1 which

48
V.3 Range and kernel, rank and nullity

implies that both v 1 + v 2 and αv 1 are in R(T ). Therefore, R(T ) is a vector subspace of V .

§5 Theorem. Let A be an m×n matrix and let T : Rn → Rm be the linear transformation


T (x) := Ax. Then the kernel of T is the solution space to the homogeneous SLE Ax = 0
and the range of T is the column space of A.
Proof: The first statement is obvious. Note that the range of T is given by all linear
combinations of the column vectors of A:

    R(T ) = R(A) = {T (x) : x ∈ Rn } = {Ax : x ∈ Rn }
          = { (c1 c2 . . . cn )(x1 , . . . , xn )T : x1 , . . . , xn ∈ R }
          = { x1 c1 + . . . + xn cn : x1 , . . . , xn ∈ R } .

The set of linear combinations of the column vectors of A is the column space.
§6 Example. Consider the linear transformation T given by the matrix A:

    A = [ 1 1  0 0 ]
        [ 0 1 −1 0 ]
        [ 1 0  1 0 ] .

We immediately read off the range: span{(1, 0, 1)T , (1, 1, 0)T , (0, −1, 1)T } = span{(1, 0, 1)T ,
(1, 1, 0)T }. Also, the kernel is given by span{(0, 0, 0, 1)T , (−1, 1, 1, 0)T }. Studying several
examples, one observes that dim U = dim N (T ) + dim R(T ). We will develop this as a
theorem below.
§7 Definition. If T : U → V is a linear transformation, then the dimension of the range
of T is called the rank of T : rk(T ) := dim R(T ). The dimension of the kernel of T is called
the nullity of T .
§8 Remark. If a linear transformation T is given by an m × n matrix A as T (x) := Ax,
then the rank of T equals the dimension of the column space of A, which is the rank of the
matrix A.
§9 Exercise. Let T : R3 → R3 be given by

    T ((x, y, z)T ) = (x + y + z, 2x − y − 2z, 4x + y)T .

Determine range, null-space, rank and nullity of the linear transformation T .


Solution: We have

    T ((x, y, z)T ) = A (x, y, z)T    with    A = [ 1  1  1 ]
                                                  [ 2 −1 −2 ]
                                                  [ 4  1  0 ] .

The row spaces of the following matrices are the same as the column space of A:

    AT = [ 1  2  4 ]   R2→R2−R1   [ 1  2  4 ]        [ 1  2  4 ]
         [ 1 −1  1 ]   R3→R3−R1   [ 0 −3 −3 ]   →    [ 0  1  1 ]
         [ 1 −2  0 ]              [ 0 −4 −4 ]        [ 0  0  0 ] .

We see that (1, 2, 4)T , (0, 1, 1)T is a basis for the column space of A, which is the range of


T . The rank of T is therefore 2. The kernel of T is equal to the solutions to the homogeneous
SLEs corresponding to the following augmented matrices:

    [ 1  1  1 | 0 ]   R2→R2−2R1   [ 1  1  1 | 0 ]
    [ 2 −1 −2 | 0 ]   R3→R3−4R1   [ 0 −3 −4 | 0 ]
    [ 4  1  0 | 0 ]                [ 0 −3 −4 | 0 ] .

That is, Ax = 0 has the solutions (x, y, z)T with z = α, y = −(4/3)α, x = −y − z = (1/3)α. A basis
for the solution space is given by ((1/3, −4/3, 1)T ). The solution space to the homogeneous
SLE corresponds to the kernel of T and so the nullity of T is 1. Note that we observed
again that for T : U → V , we have dim U = dim N (T ) + dim R(T ).
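A numerical cross-check of this exercise (sketch only, assuming scipy is available): the rank of A
and the dimension of its null space add up to 3, in line with the rank-nullity theorem below.

    import numpy as np
    from scipy.linalg import null_space

    A = np.array([[1.,  1.,  1.],
                  [2., -1., -2.],
                  [4.,  1.,  0.]])

    rank = np.linalg.matrix_rank(A)
    kernel = null_space(A)                 # its columns span N(T)
    print(rank, kernel.shape[1])           # 2 and 1: rank + nullity = dim R^3 = 3
    print(kernel[:, 0] / kernel[2, 0])     # rescaled: proportional to (1/3, -4/3, 1)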

§10 Theorem. (Rank-Nullity Theorem) If T : U → V is a linear transformation, then the


dimension of U equals the rank of T plus the nullity of T : dim U = dim N (T ) + dim R(T ).
Proof: Suppose that dim U = n and dim N (T ) = k and let E1 = (e1 , . . . , ek ) be a basis for
N (T ). By theorem III.5, §13, we can extend E1 to a basis E = (e1 , . . . , ek , ek+1 , . . . , en )
for U . We now show that (T (ek+1 ), . . . , T (en )) is a basis for R(T ).
For any vector v ∈ R(T ), there is a vector u ∈ U such that T (u) = v. Using the basis E,
we write u = c1 e1 + . . . + cn en . The linear transformation T turns this into v = T (u) =
T (c1 e1 + . . . + ck ek + ck+1 ek+1 + . . . + cn en ) = c1 T (e1 ) + . . . + ck T (ek ) + ck+1 T (ek+1 ) + . . . +
cn T (en ) = ck+1 T (ek+1 ) + . . . + cn T (en ). Thus, v ∈ span{T (ek+1 ), . . . , T (en )}, and this set
equals R(T ).
This set is also linearly independent. Suppose ck+1 T (ek+1 ) + . . . + cn T (en ) = 0. Then
T (ck+1 ek+1 + . . . + cn en ) = 0 and ck+1 ek+1 + . . . + cn en is an element of N (T ). That is,
ck+1 ek+1 + . . . + cn en = d1 e1 + . . . + dk ek , which we reformulate as d1 e1 + . . . + dk ek −
ck+1 ek+1 − . . . − cn en = 0. Since (e1 , . . . , en ) are linearly independent, it follows that
ck+1 = . . . = cn = 0. The set {T (ek+1 ), . . . , T (en )} is thus independent and forms a basis
for R(T ).
Alternative Proof: Suppose now that the basis (u1 , . . . , un ) is orthonormal and that V has
an orthonormal basis (v 1 , . . . , v m ). Every linear transformation T : U → V can be written
as an m × n matrix A with entries hv i , T (uj )i at position i, j. We know that the rank
of T coincides with the rank of the matrix A. We also know that the dimension of the
solution space to Ax = 0 is n − rk(A), cf. corollary V.2, §10. It follows that dim(U ) = n =
n − rk(A) + rk(A) = dim N (T ) + dim R(T ).

V.4 Orthogonal linear transformations


§1 Remark. Let us now restrict ourselves to linear maps of the form T : Rn → Rn given by
an n × n matrix A: T (x) = Ax. A particularly important class of linear transformations of
this type is formed by those which preserve the Euclidean norm of a vector, ||x|| := √(xT .x):
||x|| = ||T (x)|| = ||Ax|| for all x ∈ Rn . This equation amounts to √(xT .x) = √((Ax)T .(Ax)) =
√(xT .AT Ax). We therefore define:


§2 Definition. A square matrix A is called orthogonal, if it satisfies the equation AT A =
1.
§3 Example. The rotations in R2 preserve the Euclidean norm of vectors:

    A = [ cos θ  −sin θ ]         AT A = [ cos2 θ + sin2 θ                  −cos θ sin θ + cos θ sin θ ]
        [ sin θ   cos θ ] ,              [ −cos θ sin θ + cos θ sin θ        cos2 θ + sin2 θ           ] ,

and we have AT A = 12 .
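A quick check in Python (illustrative only): for a rotation matrix, AT A is the identity up to
floating point error, and the norm of any vector is preserved.

    import numpy as np

    theta = 0.7                                       # an arbitrary angle
    A = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])

    print(np.allclose(A.T @ A, np.eye(2)))            # True: A is orthogonal
    x = np.array([3., -2.])
    print(np.linalg.norm(x), np.linalg.norm(A @ x))   # equal norms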
§4 Remark. Orthogonal matrices preserve the norm of vectors. It follows that A is in-
vertible, that A has maximal rank and that AT = A−1 .
§5 Theorem. A square matrix A of size n × n is orthogonal iff its columns form an
orthonormal basis of Rn with respect to the Euclidean inner product.
Proof: Let ci be the ith column of A. We consider the entry of AT A at row i and column j.
This entry is given by the inner product of the i-th row of AT , which is the i-th column of
A, with the j-th column of A: cTi .cj = hci , cj i. The identity matrix 1 has 1s on the diagonal
and 0s everywhere else. The condition AT A = 1, i.e. that the matrix is orthogonal, is
therefore equivalent to the condition that hci , ci i = 1 and hci , cj i = 0 for i ≠ j.
§6 Theorem. The determinant of an orthogonal matrix is ±1.
Proof: Let A be the orthogonal matrix. Using the definition of the determinant and
formulæ in I.3, §15, we have 1 = det(1) = det(AT A) = det(AT ) det(A) = det(A)2 . It
follows that det(A) = 1 or det(A) = −1.
§7 Lemma. Orthogonal transformations leave the Euclidean inner product on Rn invari-
ant: hAx, Ayi = hx, yi for all x, y ∈ Rn . This is sometimes used as an alternative definition
of orthogonal transformations.
Proof: We have hx, yi := xT y = xT AT Ay = (Ax)T Ay = hAx, Ayi.
§8 Corollary. Orthogonal transformations map orthogonal bases to orthogonal bases.
§9 Remarks. a ) The rotation matrix above has orthogonal columns, it has determinant 1
and it leaves inner products on R2 invariant, as it leaves angles between vectors invariant.
b ) Consider the matrix

    A = [ 1/√3    1/√2    1/√6  ]
        [ 1/√3   −1/√2    1/√6  ]
        [ 1/√3     0     −2/√6  ] .        (V.1)

This matrix has orthonormal columns, and is therefore orthogonal. One easily verifies
AT A = 1 and det(A) = 1.

c ) Because every orthogonal n × n matrix A is invertible, its kernel is trivial: N (A) = {0}
and its rank is maximal: rk(A) = n.

VI Eigenvalues and eigenvectors


VI.1 Characteristic polynomial
§1 Definition. Let A be an n × n matrix. Then λ ∈ R is an eigenvalue of A, if there is a
vector x ∈ Rn , x 6= 0, such that Ax = λx. The vector x is called an eigenvector of A for
the eigenvalue λ.
§2 Examples. a ) For A = 1, every vector x 6= 0 is an eigenvector with eigenvalue 1.
b ) The diagonal matrix

    D = [ d1  0   . . .  0  ]
        [ 0   d2  . . .  0  ]
        [ ..             .. ]
        [ 0   0   . . .  dn ] ,        d1 , . . . , dn ∈ R ,

has eigenvalues d1 , . . . , dn with eigenvectors (c1 , 0, . . . , 0)T , . . . , (0, . . . , 0, cn )T , where
c1 , . . . , cn ∈ R∗ .
c ) A rotation matrix A in two dimensions, A 6= ±12 , does not have any eigenvalues.
§3 Lemma. Given eigenvectors v 1 , . . . , v k of a matrix A with the same eigenvalue λ, all
non-vanishing linear combinations of v 1 , . . . , v k are again eigenvectors with eigenvalue λ.
Proof: We have A(c1 v 1 + . . . + ck v k ) = c1 (Av 1 ) + . . . + ck (Av k ) = c1 (λv 1 ) + . . . + ck (λv k ) =
λ(c1 v 1 + . . . + ck v k ).
§4 Remark. Pairs of eigenvalues and eigenvectors (λ, x) of a matrix A are solutions to
the equation Ax = λx or (A − λ1)x = 0. This is a homogeneous system of n equations in
n unknowns. To have a nontrivial solution, the rank of (A − λ1) must not be maximal.
Therefore, (A − λ1) is not invertible and the determinant of (A − λ1) has to vanish. This
motivates the following definition:
§5 Definition. Given an n × n-matrix A, the expression

    det(A − λ1) =:  | a11 − λ    a12       . . .   a1n     |
                    | a21        a22 − λ   . . .   a2n     |
                    |   ..                          ..     |
                    | an1        an2       . . .   ann − λ |

is a polynomial p(λ) of degree n in λ, the characteristic polynomial of the matrix A. The


condition for λ being an eigenvalue, i.e. the equation p(λ) = 0, is called the characteristic
equation of A.

§6 Exercise. Find the eigenvalues and eigenvectors of the matrix

    A = [ 3 2 4 ]
        [ 2 0 2 ]
        [ 4 2 3 ] .

Solution: The characteristic polynomial of A reads as (rule of Sarrus)

    | 3 − λ    2      4     |
    |   2     −λ      2     |  =  −(3 − λ)² λ + 16 + 16 + 16λ − 4(3 − λ) − 4(3 − λ) ,
    |   4      2    3 − λ   |
or p(λ) = −λ3 + 6λ2 + 15λ + 8. We guess λ = −1 as a zero of this polynomial and factorise
it as −(λ+1)(λ+1)(λ−8). That is, the eigenvalues of A are −1 and 8. We now wish to find
the corresponding eigenvectors v = (x, y, z)T . For the eigenvalue λ = −1, the eigenvalue
equation yields
3x + 2y + 4z = −x , 4x + 2y + 4z = 0 ,
2x + 2z = −y , or 2x + y + 2z = 0 ,
4x + 2y + 3z = −z , 4x + 2y + 4z = 0 .

An eigenvector (x, y, z)T of A with eigenvalue −1 has to satisfy 2x + y + 2z = 0. The
solutions to this equation form a two-dimensional subspace of R3 , {(−(1/2)(2α + β), β, α)T },
spanned, e.g., by {(−1, 0, 1)T , (−1/2, 1, 0)T }. Note that these two vectors are eigenvectors
with eigenvalue −1 as is every linear combination of them.
For the eigenvalue 8, we obtain

3x + 2y + 4z = 8x , −5x + 2y + 4z = 0 ,
2x + 2z = 8y , or 2x − 8y + 2z = 0 ,
4x + 2y + 3z = 8z , 4x + 2y − 5z = 0 .

We use Gaußian elimination:


   
1 −4 1 0 R2 →R2 +5R1 1 −4 1 0
R3 →R3 −4R1
 −5 2 4 0   0 −18 9 0  .
   
4 2 −5 0 0 18 −9 0

The solutions of this homogeneous SLE are x = (α, (1/2)α, α)T and an eigenvector is given by
(1, 1/2, 1)T . Eigenvectors with eigenvalue 8 are therefore multiples of (1, 1/2, 1)T .
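The eigenvalues and eigenvectors found above can be cross-checked with numpy.linalg.eig (a
sketch; note that numpy returns normalised eigenvectors, so they agree with ours only up to
scaling).

    import numpy as np

    A = np.array([[3., 2., 4.],
                  [2., 0., 2.],
                  [4., 2., 3.]])

    eigenvalues, eigenvectors = np.linalg.eig(A)
    print(np.round(eigenvalues, 6))        # 8 and -1 (the latter with multiplicity 2)
    v = np.array([1., 0.5, 1.])            # our eigenvector for the eigenvalue 8
    print(A @ v, 8 * v)                    # both give (8, 4, 8)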
§7 Remarks. a) By the fundamental theorem of algebra, the characteristic polynomial of
an n × n matrix A can be factorised into n linear factors:

p(λ) = ±(λ − λ1 )(λ − λ2 ) . . . (λ − λn ) ,

where λi are (not necessarily distinct) complex numbers. Thus, A has at most n (real)
eigenvalues.
b) A matrix may have no real eigenvalues, e.g.

    A = [  0  1 ]      with    | −λ   1 |  =  λ2 + 1 = (λ + i)(λ − i) = p(λ) ,
        [ −1  0 ]              | −1  −λ |

has a characteristic polynomial p(λ) without real roots.
has a characteristic polynomial p(λ) without real roots.

§8 Theorem. Let u1 , . . . , uk be eigenvectors of a matrix A corresponding to distinct eigen-


values λ1 , . . . , λk . Then the set {u1 , . . . , uk } is linearly independent.
Proof: Assume that we have constants c1 , . . . ck such that c1 u1 +c2 u2 +. . .+ck uk = 0. Note
that (A−λi 1)ui = 0. By acting onto both sides of the above equation with (A−λ1 1) . . . (A−
λk−1 1), we obtain ck (λk − λ1 )(λk − λ2 ) . . . (λk − λk−1 )uk = 0. Since the eigenvalues are
all distinct, it follows that ck = 0. We are left with c1 u1 + c2 u2 + . . . + ck−1 uk−1 = 0. By
applying (A − λ1 1) . . . (A − λk−2 1) to this equation, we conclude that ck−1 = 0. We can
continue this to show that all the ci vanish, and thus that the set {u1 , . . . , un } is linearly
independent.
§9 Corollary. Let A be an n × n matrix with n distinct real eigenvalues λ1 , . . . , λn . Then
the corresponding eigenvectors {u1 , . . . , un } form a basis for Rn .
§10 Definition. We can reduce the characteristic polynomial to distinct eigenvalues:
p(λ) = ±(λ−λ1 )µ1 . . . (λ−λk )µk . The number µi is called algebraic multiplicity of the eigen-
value λi . Note that the eigenvectors to the eigenvalue λi span a corresponding eigenspace,
and its dimension is called the geometric multiplicity of λi .
§11 Remark. Consider an n × n matrix A. As a polynomial of degree n need not factorise
into n linear factors over R, the sum of the algebraic multiplicities of the real eigenvalues of A
is less or equal n. The geometric multiplicity is maximally equal to the algebraic multiplicity14 . It
can be less, as the following example shows. Take

    A = [ 0  2 −1 ]                           [ −λ     2      −1   ]
        [ 2 −1  1 ] ,    with    A − λ1  =    [  2   −1 − λ    1   ] .        (VI.1)
        [ 2 −1  3 ]                           [  2    −1     3 − λ ]

Its characteristic polynomial is p(λ) = det(A − λ1) = −(λ − 2)(λ − 2)(λ + 2) = −(λ −
2)2 (λ + 2). The eigenvectors u1 , u2 of A corresponding to the eigenvalues 2 and −2 are given
by

    u1 = (−1, 0, 2)T ,     u2 = (−3, 4, 2)T .        (VI.2)

Thus, although the algebraic multiplicity of the eigenvalue 2 is 2, its geometric multiplicity
is 1.
14 A strict proof of this statement is possible with similarity transforms introduced in the next section. F Try to find the proof once you have studied the material in the next section.

§12 Lemma. For every root of the characteristic polynomial of a matrix A, there is at
least one eigenvector of A. That is, the geometric multiplicity is always larger or equal 1.
Proof: Consider a matrix A and a root λ of its characteristic polynomial p(λ). Since
det(A − 1λ) = 0, the matrix A − 1λ is not invertible and has a nontrivial kernel, i.e. the kernel
contains a vector u 6= 0. This is an eigenvector u, as it solves (A − 1λ)u = 0 or Au = λu.

§13 Theorem. Real symmetric matrices have real eigenvalues.


Proof: Let A be a real symmetric matrix. If λ ∈ C is an eigenvalue of A, then so is15 λ∗ :
Ax = λx ⇒ Ax∗ = λ∗ x∗ . As A is symmetric, we also have (x∗ )T Ax = ((x∗ )T Ax)T =
xT AT x∗ = xT Ax∗ , and thus (x∗ )T x λ = xT x∗ λ∗ . Due to aT b = bT a, we have
(x∗ )T x λ = (x∗ )T x λ∗ and therefore λ = λ∗ . We conclude that every eigenvalue of A is
real.
§14 Corollary. The sum of the algebraic multiplicities of the eigenvalues of a real sym-
metric n × n matrix is n.
§15 Theorem. For every real symmetric n × n matrix A, the algebraic multiplicity of an
eigenvalue equals its geometric multiplicity. That is, there is a basis of Rn consisting of
orthogonal eigenvectors of A.
Proof: Consider one eigenvalue λ of A with eigenvector x. Reduce Rn to the subspace
V := {v ∈ Rn |hv, xi = 0}, with an inner product induced from16 Rn . This subspace is an
invariant subspace of A, i.e. Av ∈ V for all v ∈ V :

    0 = λhv, xi = hv, Axi = v T Ax = v T AT x = (Av)T x = hAv, xi .        (VI.3)

Moreover, as hAv, Axi = λhAv, xi = 0, A : Rn → Rn splits into two maps: A1 : R → R


and A2 : V ≅ Rn−1 → V . On V , A is thus given by an (n − 1) × (n − 1)-dimensional matrix
A′ . This matrix is moreover symmetric, as hv 1 , A′ v 2 i = hv 1 , Av 2 i = hAv 1 , v 2 i = hA′ v 1 , v 2 i,
and we can start this procedure from the top. After n steps, we find n eigenvectors for the
various eigenvalues of A.
§16 Example. Consider the symmetric matrix

    A = [  1 −1  1 ]
        [ −1  2 −1 ]         (VI.4)
        [  1 −1  1 ] .

This matrix has eigenvalues (0, 2 − √2, 2 + √2) with corresponding eigenvectors (−1, 0, 1)T ,
(1, √2, 1)T , (1, −√2, 1)T .


§17 Definition. A real symmetric matrix with exclusively positive eigenvalues is called a
positive definite matrix.
15 λ∗ is the complex conjugate of λ, and x∗ is the vector obtained by complex conjugating the components of x.
16 That is, to compute the inner product of two vectors in V , take their inner product in Rn .

§18 Theorem. Given a positive definite n × n matrix A, the expression

hx, yi := xT Ay , x, y ∈ Rn (VI.5)

defines an inner product on Rn .


Proof: Linearity in each slot of the above expression is obvious. Symmetry follows anal-
ogously: hx, yi = xT Ay = xT AT y = (Ax)T y = y T Ax = hy, xi. As A is real symmetric,
there is a basis of eigenvectors of A, (e1 , . . . , en ) with corresponding eigenvalues λ1 , . . . , λn ,
which is orthogonal with respect to the usual Euclidean inner product. Consider a vector
v ∈ Rn . Then v = c1 e1 + . . . + cn en , and hv, vi = hc1 e1 + . . . + cn en , c1 e1 + . . . + cn en i =
(c1 e1 + . . . + cn en )T A(c1 e1 + . . . + cn en ) = (c1 e1 + . . . + cn en )T (c1 λ1 e1 + . . . + cn λn en ) =
c21 λ1 eT1 e1 + . . . + c2n λn eTn en ≥ 0. Furthermore, this expression vanishes iff c1 = . . . = cn = 0
and therefore v = 0. Thus, the expression (VI.5) is positive definite and therefore an inner
product.
§19 Corollary. Eigenvectors of real symmetric matrices with distinct eigenvalues are or-
thogonal (as opposed to merely linearly independent, cf. §8).
Proof: Consider a real symmetric matrix A together with eigenvectors x, y belonging to
eigenvalues λ1 ≠ λ2. We then have

λ2 xT y = xT Ay = yT Ax = λ1 yT x = λ1 xT y . (VI.6)

It follows that (λ2 − λ1) xT y = 0 which, together with λ1 ≠ λ2, implies xT y = 0. □
VI.2 Diagonalising matrices


§1 Definition. Given a basis E = (e1 , . . . , en ) of Rn , we can write any vector v ∈ Rn as
a linear combination
 
v = e1 v1 + . . . + en vn = (e1 . . . en) (v1 , . . . , vn)T .

If E is an orthonormal basis, then the coefficients vi are given by the inner products
vi = hei , vi. The vector (v1 , . . . , vn )T is called the coordinate vector of the vector v with
respect to the basis E.
§2 Change of basis. Above, we saw that certain bases, e.g. orthonormal ones, are more
convenient than others. Note that the vectors in a new basis (b1 , . . . , bn ) can be written as
linear combinations of the old ones:

b1 = p11 e1 + p21 e2 + . . . + pn1 en ,
⋮
bn = p1n e1 + p2n e2 + . . . + pnn en ,

or, in matrix form, (b1 b2 . . . bn) = (e1 e2 . . . en) P .
The matrix P is called the transformation matrix of the change of basis. Let us now look
at the coordinate vector (v1 , . . . , vn )T of v:
     
v = (e1 . . . en)(v1 , . . . , vn)T = (b1 . . . bn)(w1 , . . . , wn)T = (e1 . . . en) P (w1 , . . . , wn)T .

It follows that (v1 . . . vn)T = P (w1 . . . wn)T, or (w1 . . . wn)T = P −1 (v1 . . . vn)T, where
(w1 , . . . , wn)T is the coordinate vector of v with respect to the new basis (b1 , . . . , bn). That
is, coordinate vectors transform with the inverse of the transformation matrix. How do
matrices encoding linear transformations change?
§3 Similarity transformations. Let (e1 , . . . , en ) be a basis for Rn , and a vector v
written in terms of this basis as a coordinate vector (v1 . . . vn )T . Consider now a lin-
ear transformation T : Rn → Rn such that the coordinate vector of T (v) is given by
(x1 . . . xn )T = A(v1 . . . vn )T . The change of basis corresponding to the transformation
matrix P is given by
        
P −1 (x1 , . . . , xn)T = P −1 A (v1 , . . . , vn)T = P −1 AP P −1 (v1 , . . . , vn)T = P −1 AP (w1 , . . . , wn)T .

We see that if A describes a linear transformation with respect to the basis (e1 , . . . , en ),
then P −1 AP describes this linear transformation with respect to the basis (b1 , . . . , bn ) =
(e1 , . . . , en )P . The map
A 7→ P −1 AP =: Ã
is called a similarity transformation. We say that A and à are similar matrices.
§4 Lemma. Similar matrices have the same determinant.
Proof: Let A and à be similar n × n matrices and P a transformation matrix on Rn . We
have det(A) = det(P −1 ÃP ) = det(P −1 ) det(Ã) det(P ) = det(Ã).
§5 Lemma. Similar matrices have the same rank, the same trace, the same eigenvalues
(but not necessarily the same eigenvectors) and the same characteristic polynomial.
Proof: Exercise.
§6 Remark. If we study a linear transformation T : Rn → Rn , a particularly useful basis
is a basis consisting of eigenvectors of the matrix A corresponding to T .
§7 Theorem. (Diagonalising matrices) Let A be an n × n matrix with n linearly inde-
pendent eigenvectors {u1 , . . . , un } corresponding to the eigenvalues λ1 , . . . , λn . Then we
have
 
P −1 AP = D ,  where  D = diag(λ1 , λ2 , . . . , λn)  and  P = (u1 . . . un) .
We call such a matrix diagonalisable.


Proof: We compute:
AP = A(u1 . . . un) = (Au1 . . . Aun) = (λ1 u1 . . . λn un) = (u1 . . . un) diag(λ1 , . . . , λn) = P D .

Because the columns of P are linearly independent, rk(P ) = n and therefore P is invertible.
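The recipe of §7 is easy to carry out numerically; the following NumPy sketch (an added illustration, with the matrix of example §10 below used as sample input) builds P from the eigenvectors and checks that P⁻¹AP is diagonal:

import numpy as np

A = np.array([[2., -1.],
              [-1., 2.]])
evals, evecs = np.linalg.eig(A)     # the columns of evecs are the eigenvectors u_i
P = evecs
D = np.diag(evals)
print(np.allclose(np.linalg.inv(P) @ A @ P, D))   # True: P^{-1} A P = D
print(np.allclose(A, P @ D @ np.linalg.inv(P)))   # True: A = P D P^{-1}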

§8 Corollary. Real symmetric matrices are diagonalisable, cf. VI.1, Theorem §15.
§9 Remark. Examples for matrices which are not diagonalisable:
A = ( 0  1 ; −1  0 ) ,  B = ( 0  1 ; 0  0 ) .

The matrix A has complex eigenvalues and therefore cannot be diagonalised as a real
matrix. The matrix B has eigenvalue 0 with algebraic multiplicity 2 (its characteristic
polynomial λ2 has a double root) but geometric multiplicity 1 (the kernel of B is only
one-dimensional, spanned by (1, 0)T), so there is no basis of eigenvectors.
§10 Example. Consider the linear transformation T : R2 → R2 , T (x) = Ax with
A = ( 2  −1 ; −1  2 ) .

We compute the eigenvalues to be λ = 3 and λ = 1, and the corresponding eigenvectors are (1, −1)T
and (1, 1)T. We gain some intuition about the properties of T if we look at the image of all
vectors in R2 of length 1:

[Figure: the unit circle in the (x1, x2)-plane and its image under T, an ellipse whose principal axes point along the eigenvectors.]
Vectors with a tip on the circle are mapped to vectors with tip on the ellipse, where the
eigenvectors are the main axes. Diagonalising A means to go to a coordinate system, where
the main axes correspond to the coordinate axes.
F What is the corresponding picture in higher dimensions, e.g. for R3 ?

Figure 3: The three types of conic sections we study here: circle, ellipse and hyperbola.

§11 Consistency check. Consider a matrix A and its diagonalised form D = P −1 AP .


Due to lemma §5, we have tr(A) = tr(D). Because tr(D) is the sum of the eigenvalues, we
can always check our calculations when diagonalising a matrix: the sum of the eigenvalues
has to equal tr(A). In the above example, this is true as tr(A) = 2 + 2 = λ1 + λ2 = 3 + 1.
F What is the corresponding statement for determinants?
Also, for a real symmetric matrix, the eigenvectors for different eigenvalues have to be
orthogonal with respect to the Euclidean inner product, cf. VI.1, §19 All the inner products
between eigenvectors to different eigenvalues therefore have to vanish. In the example above,
we have h(1, −1)T , (1, 1)i = 1 − 1 = 0, as expected.
Moreover, since any matrix A that is real symmetric, i.e. AT = A, can be diagonalised,
we know that its trace is the sum of its eigenvalues.

VI.3 Applications
§1 Conic Sections. Circle and ellipse are special cases of conic sections. These are curves
obtained by intersecting a double cone by a plane, see figure 3. Leaving out the parabolas
y = ax2 , we have the following equations for the various conic sections:

Circle:  x2/a2 + y2/a2 = 1 ,
Ellipse:  x2/a2 + y2/b2 = 1 ,
Hyperbola, main axis the x-axis:  x2/a2 − y2/b2 = 1 ,
Hyperbola, main axis the y-axis:  y2/a2 − x2/b2 = 1 .

Note that we can encode a conic section in a symmetric 2 × 2 matrix A:


(x ; y)T A (x ; y) = (x ; y)T ( a  b ; b  c ) (x ; y) = ax2 + 2bxy + cy2 .

If we diagonalise this matrix, we can read off the type of conic.


§2 Example. Consider the conic given by the equation x2 + 4xy + y 2 = 1:
(x ; y)T A (x ; y) = (x ; y)T ( 1  2 ; 2  1 ) (x ; y) = 1 .

The characteristic polynomial of A is (1 − λ)2 − 4 and the eigenvalues are λ = −1 and


λ = 3. The corresponding eigenvectors are (1, −1)T and (1, 1)T . Note that the lengths in
our construction are important, we therefore choose P to be orthogonal. The transformation
matrix is thus

P = (1/√2) ( 1  1 ; −1  1 )  ⇒  P −1 = (1/√2) ( 1  −1 ; 1  1 ) .

The new coordinate vector is (d1 , d2)T = P −1 (x, y)T = (1/√2)(x − y, x + y)T and we have

1 = (1/√2)(x − y, x + y) P −1AP (1/√2)(x − y, x + y)T = (1/2)(x − y, x + y) ( −1  0 ; 0  3 ) (x − y, x + y)T
  = −(1/2)(x − y)2 + (3/2)(x + y)2 = x2 + 4xy + y2 .
The conic is thus a hyperbola with main axis y = x.
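The classification can also be read off numerically: diagonalise the symmetric coefficient matrix and look at the signs of its eigenvalues (a short added NumPy sketch; the printed labels are illustrative):

import numpy as np

# symmetric matrix of the quadratic form x^2 + 4xy + y^2
A = np.array([[1., 2.],
              [2., 1.]])
evals, evecs = np.linalg.eigh(A)
print(evals)                      # approximately [-1, 3]
# in the eigenbasis the conic reads lambda_1 d1^2 + lambda_2 d2^2 = 1
if np.all(evals > 0):
    print("ellipse")
elif evals[0] * evals[1] < 0:
    print("hyperbola, main axes along", evecs.T)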
§3 Powers of a matrix. Diagonalising a matrix helps to compute its powers. Assume
that we have diagonalised an n × n matrix A: A = P DP −1 , where D is a diagonal matrix
with entries d1 , . . . , dn . We then have

Ak = (P DP −1 )k = P DP −1 P DP −1 . . . P DP −1 = P Dk P −1 .

Here, Dk is simply the diagonal matrix with entries dk1 , . . . , dkn . Note that this also works
for rational k.
§4 Exercise. Let A = ( 2  2 ; 1  3 ). Find A20. Find a matrix B such that B2 = A.
Solution: Using the characteristic polynomial, we find that A has eigenvalues λ = 1 and λ =
4. The corresponding eigenvectors are (2, −1)T and (1, 1)T . We have the diagonalisation
A = ( 2  2 ; 1  3 ) = P D P −1 = ( 2  1 ; −1  1 ) ( 1  0 ; 0  4 ) · (1/3) ( 1  −1 ; 1  2 ) .

We thus have

A20 = P D20 P −1 = (1/3) ( 2  1 ; −1  1 ) ( 1  0 ; 0  4^20 ) ( 1  −1 ; 1  2 ) = (1/3) ( 2 + 4^20  −2 + 2·4^20 ; −1 + 4^20  1 + 2·4^20 ) .
Note that the diagonal matrix D1 with entries 1, 2 satisfies D12 = D. Thus, B = P D1 P −1
satisfies B 2 = P D1 P −1 P D1 P −1 = P D12 P −1 = P DP −1 = A. We have
B = (1/3) ( 2  1 ; −1  1 ) ( 1  0 ; 0  2 ) ( 1  −1 ; 1  2 ) = . . . = (1/3) ( 4  2 ; 1  5 ) .
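Both results are easy to verify with NumPy (an added check, not part of the original exercise; only round-off limits the comparison):

import numpy as np

A = np.array([[2., 2.],
              [1., 3.]])
P = np.array([[2., 1.],
              [-1., 1.]])         # columns: eigenvectors for lambda = 1 and 4
Pinv = np.linalg.inv(P)
A20 = P @ np.diag([1., 4.**20]) @ Pinv
print(np.allclose(A20, np.linalg.matrix_power(A, 20)))   # True
B = P @ np.diag([1., 2.]) @ Pinv                          # a square root of A
print(np.allclose(B @ B, A))                              # True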

§5 Theorem. (Cayley-Hamilton Theorem) Let A be an n × n matrix with characteristic


polynomial p(λ) = cn λn + . . . + c1 λ + c0 . Then cn An + . . . + c1 A + c0 1 = 0. That is, a
matrix satisfies its own characteristic equation p(A) = 0.
Proof: The theorem is clear, if A is diagonalisable and there is a basis of Rn in terms
of eigenvectors {u1 , . . . un }: We have p(A)ui = cn An ui + . . . + c1 Aui + c0 ui = cn λni ui +
. . . + c1 λi ui + c0 ui = p(λi )ui = 0. Note that p(A) is a linear transformation Rn → Rn and
according to V.1, §4 this implies that p(A)v = 0 for all v ∈ V . It follows p(A) = 0.
The theorem holds also for non-diagonalisable matrices, but in this case, it is considerably
more difficult to prove17 .
§6 Exercise. Consider a 3 × 3 matrix A with eigenvalues λ = 1, λ = 2 and λ = 3. Use the
Cayley-Hamilton Theorem to express A4 and A−1 in terms of 1, A and A2 .
Solution. The characteristic polynomial is p(λ) = (1−λ)(2−λ)(3−λ) = −λ3 +6λ2 −11λ+6.
By the C-H Theorem, we have A3 −6A2 +11A−6 1 = 0 or A3 = 6A2 −11A+6 1, from which
we compute: A4 = 6A3 − 11A2 + 6A = 6(6A2 − 11A + 6 1) − 11A2 + 6A = 25A2 − 60A + 36 1.
(Similarly, one can reduce any power An with n > 2.) We also have A(A2 − 6A + 11·1) = 6·1
and therefore A−1 = (1/6)(A2 − 6A + 11·1).
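A quick numerical illustration (an added aside; the particular matrix below is a made-up example, obtained by conjugating diag(1, 2, 3), so that its eigenvalues are 1, 2 and 3 as in the exercise):

import numpy as np

S = np.array([[1., 1., 0.],
              [0., 1., 1.],
              [1., 0., 1.]])            # any invertible matrix will do
A = S @ np.diag([1., 2., 3.]) @ np.linalg.inv(S)
I = np.eye(3)
# Cayley-Hamilton: -A^3 + 6A^2 - 11A + 6*1 = 0
print(np.allclose(-np.linalg.matrix_power(A, 3) + 6*A@A - 11*A + 6*I, 0))
# the consequences derived above
print(np.allclose(np.linalg.matrix_power(A, 4), 25*A@A - 60*A + 36*I))
print(np.allclose(np.linalg.inv(A), (A@A - 6*A + 11*I)/6))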
§7 Analytic functions of matrices. Consider a function f : R → R which is analytic
at 0. That is, we can expand f as the power series f(x) = Σi=0∞ ai xi with ai = f(i)(0)/i! .
Consider now an n × n matrix A with characteristic polynomial p(λ) and real eigenvalues
λ1 , . . . , λn . We can write f (x) = Q(x)p(x) + R(x), where Q(x) is an analytic function and
R(x) is a polynomial of degree d < n. Since p(A) = 0, we have f (A) = R(A). Note that
we therefore have n equations f (λi ) = R(λi ), which, if they are all distinct18 , allow us to
determine the polynomial R(x).
§8 Example. Assume that A is a 2 × 2 matrix with characteristic polynomial p(λ) =
(1 − λ)(2 − λ) and that we need to compute sin(A). We have sin(x) = x − x3/3! + x5/5! − x7/7! + . . . =
Q(x)p(x) + R(x). Here, R(x) = α0 + α1 x is a polynomial of degree 1. As the eigenvalues
are λ = 1 and λ = 2, we have as additional conditions

sin(1) = α0 + α1  and  sin(2) = α0 + 2α1 ,

and therefore α1 = sin(2) − sin(1) ,  α0 = 2 sin(1) − sin(2) .

Altogether, we computed
sin(A) = R(A) = α0 1 + α1 A .
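For a concrete check (an added sketch; the sample matrix below is an assumption chosen only because its eigenvalues are 1 and 2), compare R(A) with sin(A) computed through the eigendecomposition:

import numpy as np

A = np.array([[1., 1.],
              [0., 2.]])               # upper triangular, eigenvalues 1 and 2
a1 = np.sin(2) - np.sin(1)
a0 = 2*np.sin(1) - np.sin(2)
sinA_R = a0*np.eye(2) + a1*A           # sin(A) = R(A) from the text
evals, P = np.linalg.eig(A)
sinA_ref = P @ np.diag(np.sin(evals)) @ np.linalg.inv(P)   # apply sin to eigenvalues
print(np.allclose(sinA_R, sinA_ref))   # True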
17
One can e.g. show that there is a diagonalisable matrix “arbitrarily close” to any non-diagonalisable
matrix. Then one argues that the theorem holds by continuity of the characteristic polynomial.
18
If they are not, one can consider the derivatives of these equations with respect to the λi to get a full
set of equations.

§9 Coupled harmonic oscillators. Consider two masses m1 = m2 = m suspended


between three springs with identical spring constants κ:

[Figure: wall — spring κ — mass m — spring κ — mass m — spring κ — wall.]

Let x1(t) and x2(t) describe the positions of the two masses at time t. Physics tells us that
this system is governed by the following equations of motion:

m ẍ1(t) := m d2x1(t)/dt2 = −κ x1(t) + κ(x2(t) − x1(t)) ,
m ẍ2(t) := m d2x2(t)/dt2 = −κ x2(t) + κ(x1(t) − x2(t)) ,

or, in matrix form,

( ẍ1(t) ; ẍ2(t) ) + (1/m) ( 2κ  −κ ; −κ  2κ ) ( x1(t) ; x2(t) ) = ( ẍ1(t) ; ẍ2(t) ) + A ( x1(t) ; x2(t) ) = ( 0 ; 0 ) .

We recall/look up/happen to know that the solution to ẍ(t) + a x(t) = 0 for a > 0 is
x(t) = φ cos(√a t) + ψ sin(√a t) with φ = x(0) and ψ = ẋ(0)/√a; indeed,

ẍ(t) = φ(−a cos(√a t)) + ψ(−a sin(√a t)) = −a x(t) .

This leads us to trying the solution

( x1(t) ; x2(t) ) = cos(√A t) ( φ1 ; φ2 ) + sin(√A t) ( ψ1 ; ψ2 ) .

We thus have to compute the matrix B = √A and then sin(Bt) and cos(Bt). The characteristic
polynomial of A reads as

(2κ/m − λ)2 − κ2/m2 = λ2 − 4κλ/m + 3κ2/m2 ,

and therefore the eigenvalues are λ = κ/m and λ = 3κ/m. The eigenvectors can be found by
solving the homogeneous SLEs (A − (κ/m)1)x = 0 and (A − (3κ/m)1)x = 0, but we can also guess
them easily in this case: they are (1, 1)T and (1, −1)T. We thus have
B = √A = P √D P −1 = ( 1  1 ; 1  −1 ) ( √(κ/m)  0 ; 0  √(3κ/m) ) · (1/2) ( 1  1 ; 1  −1 )
  = (1/2) √(κ/m) ( 1 + √3  1 − √3 ; 1 − √3  1 + √3 ) .

Let us now compute sin(Bt) and cos(Bt). The eigenvalues of Bt are λ1 = √(κ/m) t and
λ2 = √(3κ/m) t, and the characteristic polynomial reads as p(λ) = (λ − λ1)(λ − λ2). We
decompose the functions sin(x) and cos(x) into Q(x)p(x) + R(x) with R(x) = α0 + α1 x and find

sin(√(κ/m) t) = α0sin + α1sin √(κ/m) t  and  sin(√(3κ/m) t) = α0sin + α1sin √(3κ/m) t ,

or

α0sin = ( √3 sin(√(κ/m) t) − sin(√(3κ/m) t) ) / (√3 − 1)  and  α1sin = ( sin(√(3κ/m) t) − sin(√(κ/m) t) ) / ( (√3 − 1) √(κ/m) t ) .

For the cosine, we obtain the same, just replace sin everywhere by cos. Let us put the result
together.
The vector (φ1 , φ2 )T clearly specifies the initial position of the two oscillators. For
simplicity, let us put them to zero: (φ1 , φ2 )T = (0, 0)T . The vector (ψ1 , ψ2 )T specifies the
initial direction of the two oscillators. To see this, assume that (ψ1 , ψ2 )T = (1, 1)T . Since
(1, 1)T is an eigenvector of A, it is an eigenvector of B and of Bt and thus multiplication by
sin(Bt) just changes the prefactor. The oscillators keep moving in parallel. On the other
hand, starting from the eigenvector (ψ1 , ψ2 )T = (1, −1)T , the oscillators always move in
opposite directions. Starting from a general configuration leads to a linear combination of
these two normal modes of the coupled oscillators, which is given by
( x1(t) ; x2(t) ) = ( α0cos 1 + α1cos Bt ) ( φ1 ; φ2 ) + ( α0sin 1 + α1sin Bt ) ( ψ1 ; ψ2 ) .
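The two normal-mode frequencies √(κ/m) and √(3κ/m) can be obtained directly from A; the sketch below is an added NumPy illustration, and the values of m and κ are arbitrary sample choices:

import numpy as np

m, kappa = 1.0, 2.0                   # illustrative values
A = (1/m) * np.array([[2*kappa, -kappa],
                      [-kappa, 2*kappa]])
evals, evecs = np.linalg.eigh(A)
print(evals)                          # kappa/m and 3*kappa/m
print(np.sqrt(evals))                 # the normal-mode frequencies
print(evecs.T)                        # proportional to (1, 1) and (1, -1)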

§10 Damped harmonic oscillator. Another, very similar example is the damped har-
monic oscillator. We will describe the position of the oscillator by the coordinate function
x(t), its mass by m, the spring constant by κ and the dampening constant by γ. The
equation of motion reads as

mẍ(t) = −κx(t) − 2mγ ẋ(t) .

By introducing a function p(t) = mẋ(t) (the momentum of the oscillator), we can rewrite
this as the following matrix equation:
A ( x(t) ; p(t) ) = ( 0  1/m ; −κ  −2γ ) ( x(t) ; p(t) ) = ( ẋ(t) ; ṗ(t) ) .

We will use the ansatz x(t) = Re(x0 exp(λ± t)) and p(t) = Re(p0 exp(λ± t)), where x0 , p0
are real, but we allow λ± ∈ C. Because ẋ(t) = λ± x(t) and ṗ = λ± p, the λ± are the two
eigenvalues of A. We obtain:
λ+ = ( −mγ + √(m2γ2 − κm) ) / m  and  λ− = ( −mγ − √(m2γ2 − κm) ) / m .

If γ is large, i.e. if the damping is strong, then m2γ2 > κm, and the eigenvalues λ±
are real and negative. The oscillator just dies off without oscillating. If the damping is

smaller, then the eigenvalues are complex. Recall the formula exp(iα) = cos(α) + i sin(α).
Splitting λ± into real and imaginary parts, λ± = λ±R + iλ±I, we have

x(t) = Re( x0 exp(λ± t) ) = Re( x0 exp(λ±R t + iλ±I t) ) = Re( x0 exp(λ±R t) exp(iλ±I t) ) = x0 exp(λ±R t) cos(λ±I t) .

Here, x0 is the initial elongation of the oscillator, the exponential describes the dampening
and the cosine is responsible for the oscillations.

1.5 x0 1.5 x0
1.0 1.0
0.5 0.5
t t
0.5 1.0 1.5 2.0 0.5 1.0 1.5 2.0
- 0.5 - 0.5
- 1.0 - 1.0

Figure 4: Left: the case of pure damping with λ+ = −3 ∈ R. Right: The case of complex
λ+ , where we chose λR I
+ = −3 and λ+ = 20.

§11 Linear recursive sequences. Consider the linear recursive sequence an+1 = an +
an−1 . With the initial values a0 = 0 and a1 = 1, the ai are known as the Fibonacci numbers:
0, 1, 1, 2, 3, 5, 8, 13, . . . We can encode this recurrence relation into the matrix equation
( an+1 ; an ) = A ( an ; an−1 ) = ( 1  1 ; 1  0 ) ( an ; an−1 ) .

To compute an+1 from given a1 and a0 , we just have to apply An to the initial vector.
Using the diagonal form of A, we know how to do this efficiently: if A = P D P −1, then
An = P Dn P −1. The eigenvalues of the matrix A are λ1 = (1/2)(1 + √5) and λ2 = (1/2)(1 − √5)
with the corresponding eigenvectors x1 = ((1/2)(1 + √5), 1)T and x2 = ((1/2)(1 − √5), 1)T. One
can show that in terms of eigenvectors, our initial conditions read as

( 1 ; 1 ) = (1/10)(5 + √5) x1 + (1/10)(5 − √5) x2 .

If we apply An to our initial vector, we obtain

An ( 1 ; 1 ) = (1/10)(5 + √5) λ1n x1 + (1/10)(5 − √5) λ2n x2 .

Note that −1 < λ2 = −0.618 . . . < 0, and therefore λn2 tends to zero as n goes to infinity.
Thus, for large n, the contribution of x2 becomes negligible. We then make a curious

observation: the ratio an+1/an approaches the ratio of the components of x1, which is
the golden ratio ϕ := (1/2)(1 + √5) = 1.61803 . . . Recall that a line is divided into two segments
of length a and b according to the golden ratio if (a + b)/a = a/b. It follows that a/b = ϕ:

[Figure: a line segment divided into two pieces of length a and b, with the total length a + b indicated.]
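A short NumPy illustration of this (an added aside; the number of iterations is an arbitrary choice): iterating A on the initial vector produces the Fibonacci numbers, and the ratio of successive entries approaches ϕ.

import numpy as np

A = np.array([[1, 1],
              [1, 0]])
v = np.array([1, 0])          # (a_1, a_0)
for n in range(10):
    v = A @ v                 # v becomes (a_{n+2}, a_{n+1})
print(v)                      # (89, 55)
print(v[0] / v[1])            # close to the golden ratio 1.618...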

§12 Population growth model. Originally, the Fibonacci sequence was used to model
the growth of a rabbit population. Let us give a similar example: Consider a population
of animals of a0 adults and j0 juveniles. From one year to the next, the adults will each
produces γ juveniles. Adults will survive from one year to the next with probability α,
while juveniles will survive into the next year and become adults with the probability β.
Altogether, we have
an+1 = α an + β jn ,  jn+1 = γ an ,  or  ( an+1 ; jn+1 ) = A ( an ; jn ) = ( α  β ; γ  0 ) ( an ; jn ) .

Assume that α = 1/2 and β = 3/10. What is the minimum value of γ to have a stable population?
The eigenvalues of A are λ1 = (1/20)(5 − √5 √(5 + 24γ)) and λ2 = (1/20)(5 + √5 √(5 + 24γ)). To
have a stable population, we need an eigenvector of A with eigenvalue 1. As λ2 > λ1, we
demand that λ2 = 1 and find γ = 5/3. The corresponding eigenvector reads as (3/5, 1)T. Thus,
starting e.g. from a population of 3000 adults and 5000 juveniles, this population would be
(statistically) stable for γ = 5/3.
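This is easy to check numerically (an added NumPy aside; the normalisation of the eigenvector is an illustrative choice):

import numpy as np

alpha, beta, gamma = 0.5, 0.3, 5/3
A = np.array([[alpha, beta],
              [gamma, 0.0]])
evals, evecs = np.linalg.eig(A)
print(evals)                          # one eigenvalue equals 1
k = np.argmax(evals)
print(evecs[:, k] / evecs[1, k])      # approximately (0.6, 1) = (3/5, 1)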

VII Advanced topics


This section contains additional material that will not be relevant for the exam. However,
you will find that the topics discussed in the first two subsections will be very helpful in
your future life as a student and practitioner of mathematics. Also, they contain many
applications of the results of the previous sections which can be useful as examples. The
subsection on special relativity is included for students interested in theoretical physics. It
can be easily understood with some background in linear algebra.

VII.1 Complex linear algebra


Complex linear algebra deals with vector spaces over the complex numbers instead of vec-
tor spaces over R. Most definitions carry over straightforwardly, and one merely has to
adjust the notion of transposition, inner product and orthogonality in most definitions and
theorems.
§1 Complex numbers. Many polynomial equations do not have solutions in the real
numbers. Consider for example x2 + 1 = 0. The solution to this equation is a number that

squares to −1. Let us introduce a fictitious such number and denote it by i = √−1. We

can use this number to generate a new kind of complex number, consisting of a real part
and a multiple of i, the imaginary part: z = x + iy, x, y ∈ R. We use the symbol C for
such numbers. One can now consistently define the sum and the product of two complex
numbers and the computational rules correspond to those of real numbers: Consider two
complex numbers z1 = x1 + iy1 and z2 = x2 + iy2 . We define:

z1 + z2 = (x1 + iy1 ) + (x2 + iy2 ) = x1 + x2 + i(y1 + y2 ) ,


z1 z2 = (x1 + iy1 )(x2 + iy2 ) = x1 x2 + x1 iy2 + iy1 x2 + i2 y1 y2 (VII.1)
= (x1 x2 − y1 y2 ) + i(x1 y2 + x2 y1 ) .

These operations satisfy the usual associative, commutative and distributive laws of real
numbers. Using Maclaurin series, we can also define other functions of complex numbers,
for example:
exp(z) = 1 + z + z2/2! + z3/3! + z4/4! + · · · . (VII.2)
We define one special map, which is called complex conjugation and denote it by an asterisk.
This map inverts the sign of the imaginary part: the complex conjugate of z = x + iy is
z ∗ = x − iy.
We can regard complex numbers as elements of the 2-dimensional vector space R2 with
basis (1, i) and coordinates (x, y)T . F Other vector spaces which can be treated as if their
elements were numbers are R4, yielding the quaternions (one loses commutativity), and R8,
yielding the octonions (one loses associativity). Have a look at their multiplication rules.
As points in the plane R2, we can also describe a complex number z = x + iy by a radial
coordinate r = √(x2 + y2) and an angle θ with tan(θ) = y/x. We then have z = r cos θ + i r sin θ =
r(cos θ + i sin θ). Consider the following Maclaurin series:

cos θ = 1 − θ2/2! + θ4/4! − θ6/6! + θ8/8! − · · · ,
sin θ = θ − θ3/3! + θ5/5! − θ7/7! + θ9/9! − · · · , (VII.3)
eθ = 1 + θ + θ2/2! + θ3/3! + θ4/4! + θ5/5! + · · · .

From this, we can glean Euler’s formula: cos θ + i sin θ = eiθ . Pictorially, Euler’s formula
states that eiθ is the point on the unit circle at angle θ. Note that our above expression for
z reduces to z = reiθ . This is what is called the polar form of a complex number. Under
complex conjugation, the sign of the angle is inverted: z ∗ = re−iθ .
F Euler’s formula also gives rise to one of the most beautiful equations in mathematics,
in which all central symbols appear: eiπ + 1 = 0.
§2 Complex vector spaces. A complex vector space is a vector space over the complex
numbers. It satisfies the axioms of III.1, §2, where all appearances of R have to be replaced
by C. The most important example of a complex vector space is Cn = {(c1 , . . . , cn )T |ci ∈

C}. Other examples are the space of complex polynomials or smooth complex valued
functions C ∞ (I, C) on an interval I.
Similarly, one has to extend the definition of linear combinations: A (complex) linear
combination of a set of vectors {v 1 , . . . , v n } is an expression of the form c1 v 1 + · · · + cn v n
with c1 , . . . , cn ∈ C. This leads to the notion of a complex span and complex linear
(in)dependence. It is easy to verify that all our theorems concerning bases and spans in
real vector spaces also hold in the complex setting.
The only new operation is the complex conjugation of a vector. This operation can be
defined as a linear map ∗ : V → V such that the map satisfies (v ∗ )∗ = v and is antilinear,
i.e. (v + w)∗ = v ∗ + w∗ and (λv)∗ = λ∗ v ∗ . In the case of Cn , it is most convenient to define
v ∗ = (c∗1 , . . . , c∗n ) for a vector v = (c1 , . . . , cn ) ∈ Cn . Note, however, that there are other
possibilities. On C2 , e.g., one could also define
( c1 ; c2 )∗ := ( 0  1 ; 1  0 ) ( c1∗ ; c2∗ ) . (VII.4)

Let us now go through our results on real vector spaces and generalise them step by
step to the complex setting.
§3 Complex vector spaces as real vectors spaces. Consider a complex vector space V
with dimension n and basis (b1 , . . . bn ). Any vector v ∈ V is a complex linear combination
of the form
v = c1 b1 + · · · + cn bn = (d1 + ie1 )b1 + · · · + (dn + ien )bn , (VII.5)
where ci = di + iei , ci ∈ C, di , ei ∈ R. We can therefore write the vector v as a real linear
combination of the vectors (b1 , ib1 , b2 , ib2 , . . . , bn , ibn ):

v = d1 b1 + e1 (ib1 ) + · · · + dn bn + en (ibn ) . (VII.6)

The vectors (b1 , ib1 , b2 , ib2 , . . . , bn , ibn ) thus form a basis of V , when regarded as a real
vector space. Altogether, we observe that a complex vector space V of dimension n can be
regarded as a real vector space of dimension 2n.
§4 Complex inner product spaces. Inner products for real vectors were defined as
symmetric positive definite bilinear forms. Positive definiteness allowed us to introduce the
notion of the norm of a vector, as it guaranteed that the expression ||v|| = √⟨v, v⟩ is well-
defined. We therefore need to preserve the positive definiteness by all means. Unfortunately,
the standard definition of an inner product on Rn as hx, yi = xT y does not have this
property on Cn . For example, on C2 , we have
( x1 ; x2 )T ( x1 ; x2 ) = x12 + x22 , (VII.7)

which is negative, e.g. for x1 = x2 = i. Instead, one has to complex conjugate one of the
vectors in an inner product. This leads to the notion of a positive definite Hermitian or
sesquilinear form. These are maps h·, ·i : V × V → C such that

(i) ⟨u, w⟩ = ⟨w, u⟩∗ ,
(ii) ⟨u + v, w⟩ = ⟨u, w⟩ + ⟨v, w⟩ ,
(iii) ⟨u, λv⟩ = λ⟨u, v⟩ and
(iv) ⟨v, v⟩ ≥ 0 and ⟨v, v⟩ = 0 if and only if v = 0.

It immediately follows that ⟨u, v + w⟩ = ⟨u, v⟩ + ⟨u, w⟩ and ⟨λu, v⟩ = λ∗⟨u, v⟩.
On Cn, we thus define the analogue of the Euclidean inner product

⟨u, v⟩ := (u∗)T v = (u1∗ , . . . , un∗) (v1 , . . . , vn)T . (VII.8)

One readily checks that this defines a positive definite hermitian form. As the operation
of simultaneous complex conjugation and transposition is so important in complex linear
algebra (it essentially fully replaces transposition), it is denoted by a ‘dagger’: u† = (u∗ )T
and called the adjoint, Hermitian conjugate or simply dagger. We thus write hu, vi := u† v.
We define again the norm of a vector as ||v|| := √⟨v, v⟩ and we call two vectors u, v
orthogonal, if hu, vi = 0. We can still apply the Gram-Schmidt procedure to find an
orthogonal basis.
§5 Complex matrices. Complex matrices are simply rectangular tables of numbers with
complex entries. Analogous to real matrices, it is trivial to check that an m × n complex
matrix is a linear map from Cn to Cm . Essentially all definitions related to real matrices
generalise to complex matrices: We define sums, products, inverse, rank, determinant, range
and null space completely analogously to the real case.
§6 Hermitian matrices. In the definition of matrix operations, only transposition gets
modified. As we saw above when discussing complex inner product spaces, one should
replace transposition by Hermitian conjugation. This leads to the following definition:
Analogously to real symmetric matrices satisfying AT = A, we define hermitian matrices
as those which satisfy A† = A.
Recall that real symmetric matrices have real eigenvalues (theorem VI.1, §13). This
theorem also applies to Hermitian matrices, as the proof remains valid once one replaces
transposition with Hermitian conjugation. (As real symmetric matrices are a special case
of Hermitian matrices, we can regard theorem VI.1, §13 as a corollary to that for Hermitian
matrices.)
§7 Unitary matrices. What about the orthogonal matrices that we introduced in the
real case? Their characteristic property was that they preserved the Euclidean norm of a
vector in Rn : hv, vi = hAv, Avi. This lead to the relation AT = A−1 . Here, we again have
to replace transposition by hermitian conjugation:

⟨v, v⟩ = v†v must equal ⟨Av, Av⟩ = (Av)†Av = v†A†Av , (VII.9)

and we conclude that A† = A−1 . Matrices with this property are called unitary. Just as
orthogonal matrices, unitary matrices always have orthonormal columns. An example of a
unitary matrix is
A = ( 1/√2  i/√2 ; i/√2  1/√2 ) . (VII.10)

Analogously to orthogonal matrices, whose determinant is ±1, the determinant of a unitary matrix is a complex number of modulus 1.
§8 Diagonalisation. The notion of diagonalisation is fully analogous to the real case.
Given an n × n-matrix which has n eigenvalues λi with corresponding eigenvectors ui that
form a basis of Cn , we have
 
P −1AP = D ,  where  D = diag(λ1 , λ2 , . . . , λn)  and  P = (u1 . . . un) . (VII.11)

An important difference is that we can apply diagonalisation now also to matrices that
cannot be diagonalised using real matrices. Even when calculating exclusively with real
numbers, it still can be helpful to temporarily introduce complex matrices and then to use
tricks applicable only to diagonalised matrices. For example, compute the 500th power of
the following matrix:
A = ( 1  2 ; −2  1 ) . (VII.12)

This matrix does not have any real eigenvalues and therefore cannot be diagonalised using
real matrices. Using complex matrices, however, we compute
! ! !−1
−i i 1 + 2i 0 −i i
A= (VII.13)
1 1 0 1 − 2i 1 1

and therefore
! !500 !†
500 1 −i i 1 + 2i 0 −i i
A = . (VII.14)
2 1 1 0 1 − 2i 1 1

(Here, we used the trick that the inverse of a unitary matrix, i.e. a matrix with orthonormal
columns, is its Hermitian conjugate.)
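The same manoeuvre is easy to reproduce with NumPy (an added sketch; a smaller exponent is used so the numbers stay printable, and the comparison with matrix_power is only a sanity check):

import numpy as np

A = np.array([[1., 2.],
              [-2., 1.]])
evals, P = np.linalg.eig(A)            # complex eigenvalues 1 +/- 2i
n = 10
An = P @ np.diag(evals**n) @ np.linalg.inv(P)
print(np.allclose(An.real, np.linalg.matrix_power(A, n)))   # True
print(np.allclose(An.imag, 0))                               # the imaginary parts cancel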

VII.2 A little group theory


Group theory provides powerful tools for studying many phenomena, in particular symme-
tries. It is very important in many areas of natural sciences as well as applied mathematics.

§1 Groups. A group is a set G endowed with a map ◦ : G × G → G such that the following
axioms hold:

(i) ◦ is associative: (x ◦ y) ◦ z = x ◦ (y ◦ z) for all x, y, z ∈ G.


(ii) There is a right-neutral element e: x ◦ e = x for all x ∈ G.
(iii) There is a right-inverse element x−1 for each x ∈ G such that x ◦ x−1 = e.

If the map ◦ satisfies x ◦ y = y ◦ x for all x, y ∈ G, then the group G is called Abelian.
§2 Lemma. The above definition is as minimalist as possible. In fact, right-neutral ele-
ments are automatically also left-neutral and unique, right-inverse elements are also left-
inverse elements and unique. We can thus speak of the neutral element and the inverse
element to a group element.
Proof: The proofs of the above statements are straightforward once one has the right
starting points. Let G be a group. Let x ◦ x−1 = e, x, x−1 , e ∈ G. We first show that this
implies x−1 ◦ x = e. For this, put y = x−1 ◦ x. We compute:

y ◦ y = (x−1 ◦ x) ◦ (x−1 ◦ x) = x−1 ◦ (x ◦ x−1) ◦ x = x−1 ◦ x = y , (VII.15)

using associativity (i) and the right-inverse property (iii).

Because of (iii) we have a y −1 such that y ◦ y −1 = e and therefore

y = y ◦ e = y ◦ (y ◦ y−1) = (y ◦ y) ◦ y−1 = y ◦ y−1 = e , (VII.16)

again using (i) and (iii).

It is now a trivial computation that x ◦ e = x implies e ◦ x = x:

x = x ◦ e = x ◦ (x−1 ◦ x) = (x ◦ x−1 ) ◦ x = e ◦ x . (VII.17)

Let us now show that the neutral elements are unique: Assume that there are two such
neutral elements e and e0 . Then

e = e ◦ e0 and e0 = e ◦ e0 ⇒ e = e0 . (VII.18)

Last, assume that both y and z are inverses of x: y ◦ x = z ◦ x = x ◦ y = x ◦ z = e. Then

z = e ◦ z = (y ◦ x) ◦ z = y ◦ (x ◦ z) = y ◦ e = y . (VII.19)

§3 Lemma. We have the implications x ◦ y = x ◦ z ⇒ y = z and y ◦ x = z ◦ x ⇒ y = z.


These follow directly by applying x−1 on both sides of the equation from the left and the
right, respectively.
§4 Subgroup. A subgroup is a subset of a group G that is simultaneously a group. A
subset S of G is a subgroup if for all x, y ∈ S, x ◦ y ∈ S and x−1 ∈ S. This statement and
its proof are analogous to the vector subspace test.

§5 Examples. a ) The real numbers R together with addition. The neutral element is
e = 0 and the inverse of a real number x ∈ R is −x ∈ R.
b ) The integers form a subgroup of the above group.
c ) The strictly positive reals R>0 together with multiplication. The neutral element is e = 1
and the inverse of a positive real r ∈ R>0 is 1/r ∈ R>0.
d ) A vector space with vector addition. The neutral element is e = 0 and the group inverse
of a vector is the corresponding inverse vector.
e ) The set of invertible n × n matrices with matrix multiplication. The neutral element is
e = 1 and the group inverse of an invertible matrix A is its inverse A−1 .
f ) The set K2 = {0, 1} with addition defined as 0 + 0 = 0, 0 + 1 = 1 + 0 = 1 and 1 + 1 = 0.
The neutral element is 0, the inverse of 0 is 0 and the inverse of 1 is 1.
§6 Permutation group. The last of the above examples is a group with finitely many
elements. Another example of such a group is the permutation group of an ordered set
with finitely many elements S = (s1 , . . . sn ). The elements of the permutation group are
reorderings of these elements. For n = 3, we have the 3! = 6 group elements that map
S = (s1 , s2 , s3 ) to
g1 : S → (s1 , s2 , s3) ,  g2 : S → (s1 , s3 , s2) ,  g3 : S → (s2 , s3 , s1) ,
g4 : S → (s2 , s1 , s3) ,  g5 : S → (s3 , s1 , s2) ,  g6 : S → (s3 , s2 , s1) . (VII.20)

The neutral element is g1 , its inverse is g1 . The inverse of g2 is g2 , the inverse of g3 is g5 ,


the inverse of g4 is g4 and the inverse of g6 is g6 .
If we write the ordered set as a vector S = (s1 , s2 , s3 )T , we can represent the group
elements by matrices with entries 0 and 1:
     
g1 = ( 1 0 0 ; 0 1 0 ; 0 0 1 ) ,  g2 = ( 1 0 0 ; 0 0 1 ; 0 1 0 ) ,  g3 = ( 0 1 0 ; 0 0 1 ; 1 0 0 ) ,
g4 = ( 0 1 0 ; 1 0 0 ; 0 0 1 ) ,  g5 = ( 0 0 1 ; 1 0 0 ; 0 1 0 ) ,  g6 = ( 0 0 1 ; 0 1 0 ; 1 0 0 ) . (VII.21)

Here, the group structure becomes obvious. Because groups with finitely many elements
can always be represented in terms of linear transformations, group theory is closely linked
to and uses many results of linear algebra.
Counting how often a permutation interchanges pairs of elements, we can associate a
sign σ to a permutation. For example, swapping in turn the entries at positions (1, 2), (2, 3) and (1, 2):

(s3 , s2 , s1) ↔ (s2 , s3 , s1) ↔ (s2 , s1 , s3) ↔ (s1 , s2 , s3) , (VII.22)

and σ(g6 ) = (−1)3 = −1. This sign is given by the ε-symbol used in the definition of the
determinant. For example σ(g6 ) = ε321 = −1, σ(g3 ) = ε231 = 1 etc.
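Since each gi is represented by a permutation matrix, the sign σ(gi) is exactly the determinant of that matrix; a short added NumPy check (rounding is only used to suppress floating-point noise):

import numpy as np

g3 = np.array([[0, 1, 0],
               [0, 0, 1],
               [1, 0, 0]])
g6 = np.array([[0, 0, 1],
               [0, 1, 0],
               [1, 0, 0]])
print(round(np.linalg.det(g3)))   # +1, matching sigma(g3) = eps_231
print(round(np.linalg.det(g6)))   # -1, matching sigma(g6) = eps_321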

The permutation group of three elements is the symmetry group of an equilateral triangle
with centre at the origin: we have three rotations (including the identity) and three reflections
in the symmetry axes.
§7 Symmetries of Crystals. A crystal is an arrangement of atoms or molecules on points
of a lattice. A lattice is a set of points that is described by all integer multiples of two and
three vectors in two and three dimensions, respectively. Let us restrict for simplicity to two
dimensions. Examples for lattices are

square lattice:  Λ1 = { a (1, 0)T + b (0, 1)T | a, b ∈ Z } ,
rectangular lattice:  Λ2 = { a (2, 0)T + b (0, 1)T | a, b ∈ Z } , (VII.23)
triangular lattice:  Λ3 = { a (1, 0)T + b (cos π/3, sin π/3)T | a, b ∈ Z } .
Consider now a crystal with particles sitting in regular patterns on a lattice. Its rotational
symmetries can be described by certain 2 × 2-matrices. Which rotational symmetries can a
crystal in two dimensions have? The answer is given by the crystallographic restriction theorem,
which states that the only possible rotational symmetries of a two-dimensional crystal are
rotations by π, 2π/3, π/2 and π/3. The proof is rather easy using our knowledge of linear algebra:
Proof: Consider a 2 × 2-matrix A describing the rotational symmetries of the crystal in
the lattice basis. Its entries have to be integers, as otherwise, lattice points are mapped
to non-lattice points. It follows that the trace of A (the sum of its diagonal elements) has
to be an integer, too. If we now change the basis from the lattice basis to the standard
Euclidean basis, we have the rotation matrix B = P −1 AP , where P is the transformation
matrix of the coordinate change. The trace is invariant under coordinate changes: tr(B) =
tr(P −1 AP ) = tr(P P −1 A) = tr(A), and therefore also the trace of B has to be an integer. A
matrix B that describes a rotation in two dimensions has trace 2 cos θ, and this is an integer
only for θ ∈ {0, ±π/3, ±π/2, ±2π/3, π}. This restricts the symmetries to the ones we want. One
can now easily construct examples with these symmetries to demonstrate their existence.

§8 The orthogonal and unitary groups. Let us now come to groups with infinitely
many elements. We introduced groups that preserve the Euclidean norm in both real
and complex vector spaces. The resulting sets of matrices form groups which are called
orthogonal and unitary groups, respectively. On Rn and Cn , we write O(n) and U(n) for
these, respectively. Recall that the determinants of orthogonal and unitary matrices can be
±1. The matrices with determinant +1 describe rotations, while those with determinant
−1 describe rotations together with a flip of one of the coordinate axes. This flip reverses
the orientation of vectors. Consider e.g. the matrix
P = ( 1  0 ; 0  −1 ) . (VII.24)

This matrix maps the basis B = (e1 , e2 ) to a basis B 0 = BP = (e1 , −e2 ). The latter basis
cannot be turned into B by rotation and therefore has a different orientation than B.
The group of orthogonal and unitary matrices that preserve orientation and therefore
have determinant +1 form the subgroups of special orthogonal and special unitary trans-
formations, SO(n) and SU(n), respectively.
§9 Generators of groups. Let us briefly look at the group SO(2) in more detail. Its
elements are the rotation matrices in R2 , i.e.
P(θ) = ( cos θ  −sin θ ; sin θ  cos θ ) . (VII.25)

Although the group SO(2) has infinitely many elements – one for each θ ∈ [0, 2π) – these
elements depend on only one parameter θ. How can we make this statement more precise?
Let us consider the derivative of P at θ = 0:
τ := (d/dθ) P(θ) |θ=0 = ( −sin θ  −cos θ ; cos θ  −sin θ ) |θ=0 = ( 0  −1 ; 1  0 ) . (VII.26)

Consider now the following expression, which we can evaluate by our methods for computing
analytic functions of matrices from VI.3, §7:

A(τ ) = exp(θτ ) . (VII.27)

The eigenvalues of τ are i and −i and the corresponding eigenvectors are (i, 1)T and (−i, 1)T .
We write
A(τ ) = R(τ ) = α0 12 + α1 τ . (VII.28)
Comparing with the eigenvalues, we have

exp(θi) = R(i) = α0 + α1 i and exp(−θi) = R(−i) = α0 − α1 i . (VII.29)

We conclude that

α0 = (1/2)(eiθ + e−iθ) = cos θ  and  α1 = (1/(2i))(eiθ − e−iθ) = sin θ . (VII.30)

Thus,
A(τ ) = cos θ12 + sin θτ = P (θ) . (VII.31)
We say that τ is a generator for the group SO(2).
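This is easy to confirm numerically; the added sketch below uses SciPy's matrix exponential (an implementation choice for this aside), and the angle is an arbitrary sample value:

import numpy as np
from scipy.linalg import expm

tau = np.array([[0., -1.],
                [1., 0.]])
theta = 0.7
P_theta = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta), np.cos(theta)]])
# exponentiating the generator reproduces the rotation matrix P(theta)
print(np.allclose(expm(theta * tau), P_theta))   # True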
§10 The generators of SU(2). Let us now describe the generators of SU(2). A group
element has to consist of orthonormal complex vectors and has to have determinant +1.
Explicitly, we have
SU(2) = { ( α  −β∗ ; β  α∗ ) | α, β ∈ C , αα∗ + ββ∗ = 1 } . (VII.32)

Note that SU(2) is parameterised by a vector (α, β)T ∈ C2 , which we can rewrite as a
vector x = (x1 , x2 , x3 , x4 )T ∈ R4 . The condition αα∗ + ββ ∗ = 1 translates to ||x|| = 1, and
this condition describes the three-sphere S 3 in R4 . We thus expect that SU(2) is described
by three parameters, just like S 3 is (for example, by three angles).
Recall the formula log det(A) = tr log(A) for real matrices. Because of problems with
complex valued logarithms, we rewrite this formula as det(A) = exp(tr(log(A))). We want
to write a matrix g in SU(2) as g = exp(τ ). Because det(g) = 1, the determinant formula
implies exp(trτ ) = 1 and thus tr(τ ) = 0. Moreover, we want that gg † = 1. To analyse
this, let us first make two observations: exp(τ )† = exp(τ † ) and exp(τ ) exp(−τ ) = 1, which
are both clear from the Maclaurin series. (F Complete the proofs of these statements.)
Altogether, τ has to be traceless and satisfy τ † = −τ . A basis for the vector space of such
matrices is given by
(τ1 , τ2 , τ3) := ( ( i  0 ; 0  −i ) , ( 0  −1 ; 1  0 ) , ( 0  i ; i  0 ) ) . (VII.33)

To preserve the property τ † = −τ , we can form real linear combinations of τ1 , τ2 , τ3 : τ =


a1 τ1 + a2 τ2 + a3 τ3 , a1 , a2 , a3 ∈ R. If we compute the exponential A = exp(τ ) = exp(a1 τ1 +
a2 τ2 + a3 τ3 ), we obtain
g = ( α  −β∗ ; β  α∗ )  with  α = cos√(a12 + a22 + a32) + i a1 sin√(a12 + a22 + a32) / √(a12 + a22 + a32) ,
and  β = (a2 + i a3) sin√(a12 + a22 + a32) / √(a12 + a22 + a32) . (VII.34)

Note that any element of SU(2) can be written in this way. Furthermore, it is

(d/da1) g |a1=a2=a3=0 = τ1 ,  (d/da2) g |a1=a2=a3=0 = τ2 ,  (d/da3) g |a1=a2=a3=0 = τ3 . (VII.35)

Altogether, we saw that the matrices τ1 , τ2 , τ3 are the generators of SU(2) and we write
su(2) := spanR {τ1 , τ2 , τ3 }. Any element τ ∈ su(2) yields an element g = exp(τ ) ∈ SU(2).
The matrices τi are up to factors of i the so-called Pauli matrices, that play a major role
in describing particles carrying spin, as e.g. electrons, in quantum mechanics.
An interesting relation is the following:

[τi , τj] := τi τj − τj τi = −2 Σk=13 εijk τk . (VII.36)

Vector spaces of matrices equipped with this antisymmetric bracket [A, B] := AB − BA,
which is also called the commutator, are special cases of Lie algebras.

VII.3 Special relativity

Special relativity studies the effects of relative motion on physics. Its main object of study
is therefore the group of Lorentz transformations, which is given by coordinate changes
between inertial systems that move relatively to each other.

§1 Basics of mechanics. To describe the motion of an object fully, we need to describe


its position x ∈ R3 at every moment t ∈ R in time, cf. e.g. the coupled harmonic oscillators
(section VI.3, §9). To give a position, we have to choose a coordinate system to parametrise
the real world. We will always choose a coordinate system such that Newton’s law19 holds:
F = mẍ. That is, we work with inertial systems. Roughly speaking, an inertial system is
one in which the stars are at rest. A non-inertial system would be a system in which the
origin is accelerated, e.g. by steady rotations.

§2 Galilei transformations. We saw in section VI.2 that we can use coordinate changes
to simplify the study of linear transformations in a vector space. In mechanics, we can
do the same by changing inertial systems. For Newton’s law to hold, we can apply the
following changes:

t̃ = t + t0 : Shift of the time parameter


x̃(t̃) = x(t̃) + x0 : Shift of the origin of space
(VII.37)
x̃(t̃) = Ax(t̃) : Constant relative rotation of space
x̃(t̃) = x(t̃) + v t̃ : Constant relative motion in space

Here, A is a 3 × 3 matrix with det(A) = 1, x0 , v ∈ R3 and t0 ∈ R. Altogether, a Galilei


transformation from a system Σ with coordinates x to a system Σ̃ with coordinates x̃ is
given by the data T = (t0 , x0 , A, v):

t̃ = t + t0 , x̃(t̃) = Ax(t + t0 ) + x0 + v · (t + t0 ) . (VII.38)

Note that T = (0, 0, 13 , 0) is the trivial Galilei transformation with Σ̃ = Σ. Moreover, we


can perform two Galilei transformations T = (t0 , x0 , A, v) and T̃ = (t̃0 , x̃0 , Ã, ṽ) after one
another:
Σ −T→ Σ̃ −T̃→ Σ̃˜ . (VII.39)

And finally, there is an inverse transformation given by T −1 = (−t0 , −A−1 x0 , A−1 , −A−1 v):

T T −1
x(t) 7−→ Ax(t + t0 ) + x0 + v · (t + t0 ) 7−→ A−1 (Ax(t) + x0 + vt) − A−1 x0 − A−1 vt = x(t) .

Altogether, we conclude that Galilei transformations form a group.


19
This is a naı̈ve form of Newton’s law, the more appropriate one is F = ṗ, where p is momentum.

§3 Michelson-Morley experiment. Assume that light was moving in a medium (the


“ether”) relative to which it always has a constant velocity. The ether should be an inertial
system, Σ, and so is (approximately) the earth, moving with constant velocity relative to Σ.
The Galilei transformations then imply that light would have different apparent velocities
in different directions. Parallel to the motion of the earth, it would experience an “ether
wind”, similarly to a plane flying against the wind. However, the experiments of Michelson
(1881) and Morley (1887) showed that the speed of light is the same in all directions. Today
(i.e. in 2009), this result has been verified to an accuracy of 1 : 1017 .
§4 Comparison of light spheres. Consider two inertial systems Σ and Σ̃ with t = t̃ and
x(0) = x̃(0). That is, Σ and Σ̃ can differ only by a constant relative velocity. Let us assume
that a light signal is emitted at t = 0 from x(0) = x̃(0) = 0. By the time t, we have a light
sphere described by ⟨x(t), x(t)⟩ = c2t2 in Σ and by ⟨x̃(t̃), x̃(t̃)⟩ = c2t̃2 in Σ̃. Here, c is the
usual symbol for the speed of light. The change of inertial systems between Σ and Σ̃ must
therefore satisfy ⟨x(t), x(t)⟩ − c2t2 = ⟨x̃(t̃), x̃(t̃)⟩ − c2t̃2 .
§5 Minkowski space. The comparison of light spheres makes it necessary that we combine
position in space x and time t to a vector x̂ = (x, ct)T in space-time R4 . Here, we used
ct instead of t such that every component of the vector has units of length. Moreover,
we should replace the standard norm-squared on R4, which is given by ||x̂||2 := x̂T x̂, by
||x̂||2 := x̂T η x̂, where

η := diag(1, 1, 1, −1) . (VII.40)
This also induces a new “scalar product” hx̂, ŷi := x̂T ηŷ. Note that this is not an inner
product in the sense that we defined in section IV, as it is not positive definite. It is,
however, symmetric and non-degenerate: If hx̂, ŷi = 0 for all y ∈ R4 , then x = 0. The
vector space R4 with this generalised inner product h·, ·i is called Minkowski space.
§6 Lorentz transformations. Let us now derive the change of inertial systems allowed
between Σ and Σ̃. If space-time is homogeneous, then we expect this change to be governed
by a linear map Λ : x̂ ↦ x̂˜ = Λx̂, where Λ is an invertible 4 × 4 matrix. This map has to
satisfy:

⟨x̂˜, x̂˜⟩ = x̂˜T η x̂˜ = (Λx̂)T η (Λx̂) = x̂T ΛT ηΛ x̂ , which must equal x̂T η x̂ = ⟨x̂, x̂⟩ , (VII.41)

and therefore ΛT ηΛ = η. Computing the determinants of both sides of this equation, we


see that det(Λ) = ±1. Moreover, if Λ is a rotation or reflection in space, this equation is
automatically satisfied. Let us therefore focus on
 
Λ(14) := ( a  0  0  b ; 0  1  0  0 ; 0  0  1  0 ; c  0  0  d ) . (VII.42)

One can easily check that (Λ(14) )T η(Λ(14) ) = η implies a2 − c2 = 1, d2 − b2 = 1 and


ab = cd. As its most general solution, this has a = d = cosh ϕ and b = c = − sinh ϕ.
The trigonometric functions appearing in ordinary rotations have been replaced by the
corresponding hyperbolic functions20 . To understand the angle ϕ, let us consider what
happens to the origin 0 ∈ R3 of space as time passes. Its position and time in Σ̃ are related to
those in Σ by

x̂˜1 = (Λ(14)(0, ct)T)1 = −ct sinh ϕ  and  x̂˜4 = (Λ(14)(0, ct)T)4 = ct cosh ϕ . (VII.43)

We can compute its velocity v = c dx̂˜1/dx̂˜4 = −c sinh ϕ/cosh ϕ and therefore tanh ϕ = −v/c. Here, v is the
velocity of Σ compared to Σ̃. Altogether, the Lorentz transformation reads as
x̃1 = (x1 − vt)/√(1 − v2/c2) ,  ct̃ = (ct − x1 v/c)/√(1 − v2/c2) ,  x̃2 = x2 ,  x̃3 = x3 . (VII.44)
Note that there is an inverse to this transformation, which is obtained by reversing the sign
of v:

Λ(14) (Λ(14))−1 = ( 1/√(1−v2/c2)  0  0  −(v/c)/√(1−v2/c2) ; 0  1  0  0 ; 0  0  1  0 ; −(v/c)/√(1−v2/c2)  0  0  1/√(1−v2/c2) )
                 ( 1/√(1−v2/c2)  0  0  (v/c)/√(1−v2/c2) ; 0  1  0  0 ; 0  0  1  0 ; (v/c)/√(1−v2/c2)  0  0  1/√(1−v2/c2) ) = 1_4 .

A general Lorentz transformation is obtained from Λ(14) for arbitrary ϕ by concatenating


with rotations and reflections in space. The set of all Lorentz transformations forms the
group21 SO(1, 3): There is the trivial transformation for ϕ = v = 0, we can concatenate
two Lorentz transformations and the inverse of a Lorentz transformation is also a Lorentz
transformation. If one also allows for constant shifts between the two coordinate systems,
one obtains Poincaré transformations and the Poincaré group ISO(1, 3).
§7 Remark. Recall that Galilei transformations left Newton’s law invariant. They are the
symmetry group of Newton’s law. Lorentz transformations are the right transformations
for Maxwell’s equations of electromagnetism, as they form their symmetry group. The
Maxwell equations are not invariant under Galilei transformations.
§8 Index notation. In most of the physics literature, one uses the so-called index notation.
In this notation, all vectors and matrices are decorated with indices, and indices which
appear twice are summed over (Einstein sum convention):
x̂^µ := (x̂)^µ ,  η_µν := (η)_µν ,  x̂_µ := η_µν x̂^ν = (ηx̂)_µ ,
Λ^µ_ν := (Λ)^µ_ν ,  Λ^µ_ν x̂^ν := (Λx̂)^µ ,  Λ_µ^ν := (ΛT)_µ^ν , (VII.45)
x̂_µ ŷ^µ = ⟨x̂, ŷ⟩ ,  Λ^µ_κ η_µν Λ^ν_λ = η_κλ , i.e. ΛT ηΛ = η .
Moreover, time is not considered as the fourth component of x̂, but as the zeroth component:
x̂0 = t. In the following, we will not use index notation.
20
This can be interpreted as rotations with imaginary angles.
21
In general, the group SO(p, q) is the group of matrices satisfying ΛT ηΛ = η, where η is the diagonal
matrix with p entries +1 and q entries −1.

§9 Classical limit. First, let us try to verify that for low velocities v  c, Lorentz transfor-
mations go over into Galilei transformations. For this, consider the Lorentz transformation
from an inertial system at rest Σ to one in the fastest plane22 with v = 3000 × 3.6 m/s.
Note that this is still slow compared to the speed of light: 299792458 m/s. We have:

x̃1 = x1 −vt−7.29×10−6 t m/s+6.66×10−10 x1 , t̃ = t+6.66×10−10 t−1.22×10−13 x1 s/m ,

which is a Galilei transformation to a very good approximation. More generally, we can


consider the Taylor series:

x̃1 = (x1 − vt)/√(1 − v2/c2) ≈ x1 − vt + O(v2/c2) ,  t̃ = (t − x1 v/c2)/√(1 − v2/c2) ≈ t + O(v/c2) . (VII.46)

In general, for everyday velocities v, Lorentz transformations look like Galilei transforma-
tions.
§10 Approaching the speed of light. One could think that one can achieve speeds
faster than light in the following way: First, transform from one system Σ at rest to
another system Σ̃ at a speed of 0.8c relative to Σ along the x1 -direction. Then transform
˜ which moves at 0.8c relative to Σ̃. We have:
to a third system Σ̃

Λ(14) Λ(14) ˜
Σ −→ Σ̃ −→ Σ̃ , (VII.47)

where
 
√ 1
0 0 √ −v/c2  5

 1−v 2 /c2 1−v /c2  3 0 0 − 43
(14)
 0 1 0 0   0 1 0 0 
Λ = =  . (VII.48)
   
 0 0 1 0   0 0 1 0 
√ −v/c2 − 43 0 35
 1

0 0 √ 0
1−v /c2 1−v 2 /c2

For the total transformation, we have to use (Λ(14) )2 , which is


 41

9 0 0 − 40
9
 0 1 0 0 
(Λ(14) )2 =   . (VII.49)
 
 0 0 1 0 
− 40
9 0 0 41
9

This corresponds to a single Lorentz transformation with v = (40/41)c ≈ 0.976c. Therefore, Σ̃˜
is moving with about 0.976c relative to Σ. Transforming three times with Λ(14), i.e. with
(Λ(14))3, would bring us to v = (364/365)c ≈ 0.997c. No matter how often we transform, we will
never reach the speed of light.
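The composition is easy to reproduce numerically; the added sketch below works in the 2 × 2 block of the (x1, ct)-plane and sets c = 1 for convenience (both are assumptions of this aside):

import numpy as np

def boost(v, c=1.0):
    # Lorentz boost along x1, restricted to the (x1, ct) plane
    g = 1.0 / np.sqrt(1 - (v/c)**2)
    return np.array([[g, -g*v/c],
                     [-g*v/c, g]])

L2 = boost(0.8) @ boost(0.8)        # two successive boosts of 0.8c
v_combined = -L2[0, 1] / L2[0, 0]   # read off v/c from the matrix entries
print(v_combined, 40/41)            # both approximately 0.9756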
22
i.e. the Lockheed SR-71 Blackbird

§11 Simultaneous events. We know that space-time intervals remain unchanged. What
about lengths and durations? First, note that two events at x1 and x2 happening at time
t1 = t2 = t in the inertial system Σ, do not happen at the same time in Σ̃:

t̃1 = (t − x1 v/c2)/√(1 − v2/c2)  while  t̃2 = (t − x2 v/c2)/√(1 − v2/c2) . (VII.50)

Thus, there is no notion of simultaneity for events at different places.


§12 Minkowski diagrams. To visualise events in different coordinate system, we can use
Minkowski diagrams:
[Figure: three Minkowski diagrams. Top: the (x1, ct)-plane with the light rays through the origin separating the future of 0 from the past of 0. Bottom left: the axes (x̃1, ct̃) of Σ̃ drawn in the (x1, ct)-diagram of Σ. Bottom right: the axes (x1, ct) of Σ drawn in the (x̃1, ct̃)-diagram of Σ̃; an event A is marked in both diagrams.]
Depicted in the above two diagrams are the space- and time-axis of Σ̃ in Σ on the left, as
well as these two axes of Σ in Σ̃ on the right. We see that each observer assigns different
times and locations to an event A. The x1 -axes label simultaneous events in each coordinate
system.
§13 Time dilation. Consider a clock at rest at x = 0 in Σ and flashing in regular intervals
of 1 s. That is, we have a flash at t0 = 0 s, t1 = 1 s, t2 = 2 s, etc. In a system Σ̃ moving
with velocity v, these flashes appear to be emitted at the times
t̃0 = t0/√(1 − v2/c2) = 0 s ,  t̃1 = t1/√(1 − v2/c2) ,  t̃2 = t2/√(1 − v2/c2) , etc. (VII.51)

Because 0 < √(1 − v2/c2) < 1, the clock appears therefore to run more slowly to an observer
in Σ̃. If the clock was in Σ̃ and was flashing at the intervals t̃0 = 0 s, t̃1 = 1 s, t̃2 = 2 s,

etc., an observer in Σ would see flashes emitted at

t0 = t̃0/√(1 − v2/c2) = 0 s ,  t1 = t̃1/√(1 − v2/c2) ,  t2 = t̃2/√(1 − v2/c2) , etc. (VII.52)

The situation is therefore perfectly symmetrical. Clocks that move relative to an observer
appear to run more slowly.

§14 Example for Paradoxes: Twin paradox. Special relativity is frequently attacked
by non-physicists, as it contradicts common sense. Usually this is done by constructing
apparent paradoxes. In particular, the fact that time dilation appears symmetrically is
often overlooked and the source for the famous twin paradox: An astronaut leaves earth
on a spaceship flying at a speed close to the speed of light and then reverses and returns
to earth. As the clocks in his spaceship run slower to an observer on earth, he will have
aged less than his twin brother, that was left behind on earth. On the other hand, the
clocks on earth seem to run slower for the astronaut in the spaceship, so his twin brother
on earth should have aged less than him. The problem here is that the situation is not
perfectly symmetrical, as the spaceship does not form an inertial system: The reversal of
direction requires acceleration and thus force, which in a proper treatment explains that
indeed the twin on earth has aged faster, just as special relativity predicts. Given below is
the Minkowski diagram for the twin paradox:

[Figure: Minkowski diagram in the (x1, ct)-plane with a light ray, the worldline of the travelling twin indicated by arrows, and the lines of simultaneity in Σ̃ along the journey.]

The earth is depicted at rest and the astronaut follows the arrows. The age difference
between the astronaut and his twin on earth can be read off this diagram: It is the difference
between the intersection of the last line of simultaneity on the way away from earth and
the ct-axis and the first one of the way back and the ct-axis.

§15 Length contraction. The analogue of time dilation is length contraction: Consider
a rod of length L in Σ extending from xA = (0, 0, 0) to xB = (L, 0, 0). In the system Σ̃,
moving relative to Σ with velocity v, we measure again the length of the rod, but this
measurement has to be made at the same time in Σ̃. For two space-time events (xA , tA )

and (xB , tB ), we have the relations

t̃A = tA/√(1 − v2/c2) ,  t̃B = (tB − Lv/c2)/√(1 − v2/c2) ,
x̃A = ( −vtA/√(1 − v2/c2), 0, 0 ) ,  x̃B = ( (L − vtB)/√(1 − v2/c2), 0, 0 ) . (VII.53)

We can choose t̃A = t̃B = 0, which implies tA = 0 and tB = Lv/c2. This implies that
L̃ := ||x̃B − x̃A|| = L√(1 − v2/c2). Thus, objects that move relative to an observer appear
shorter.
§16 The ladder paradox. Assume that we have a garage with a front and a rear door
and a ladder that does not quite fit into the garage. If we throw the ladder fast enough
into the garage, the ladder shrinks from the perspective of the garage, and we can close
both doors simultaneously, with the ladder inside. However, from the perspective of the
ladder, the garage shrinks, which leads to an apparent paradox. The solution is, that from
the perspective of the ladder, both doors do not close simultaneously, and thus, there is no
paradox.
§17 Example: Muons. Muons are created by cosmic radiation hitting the outer atmo-
sphere in a height of 9-12 km. They move with 99.94% of the speed of light towards earth.
The half-life period of a resting muon is 1.52 × 10−6 s, corresponding to an average life time
of 2.2 × 10−6 s. In this time, they could travel 659.15 m, which is not enough to reach the
earth’s surface. Nevertheless, we measure many of these muons on the ground. The reason
for this is time dilation of the muon: Its flight from 12 km height lasts 4 × 10−5 s from our
perspective. In the reference frame of the muon, however, only
t = 4 × 10−5 s × √(1 − 0.99942) = 1.39 × 10−6 s (VII.54)

have passed, which is still under the half-life period. For the muon, on the other hand, the
distance of 12 km is contracted to 415.63 m, which is below the distance it can travel in its
average lifetime. This has been experimentally verified by Rossi and Hall in 1940.

Appendix
A Linear algebra with SAGE
In this section, we give a concise introduction to the computer algebra system SAGE. SAGE
is published under the GNU General Public Licence and available for Linux and Mac OSX
at sagemath.org. Furthermore, SAGE can be used from any internet browser. For deeper
introductions to SAGE and how to use it in general, please look at the wealth of tutorials
available online.

§1 Setup. To try SAGE, load the webpage www.sagenb.org and create a free account.
Once logged in, you can create a new worksheet and you are ready to go. Enter for
example 2+2 and hit shift+return simultaneously. SAGE returns 4. Enter f=sin(5*x) and
hit shift+return. Enter then f.plot() (always followed by shift+return) to see the plot of
the function, f.derivative() will return the derivative. Let us now focus on computations
in Linear Algebra that you can do with SAGE.
Warning: Remember that SAGE is a nice way of checking your computations and it is
useful for general experimenting with vectors and matrices. However, do not forget that
practicing your manual calculating skills is vital for your future life as a mathematician, in
particular for the exam.
§2 Vectors. To define the vector v = (1, 2, 3), enter v=vector(QQ,[1,2,3]). The QQ tells
SAGE, that it is working over the field of rational numbers, which on a computer substitute
the real numbers. We can now compute 5*v or v+v etc. We can define further vectors and
linearly combine them.
§3 Matrices. To define a matrix A, enter A=matrix(QQ,[[1,2,3],[3,4,3],[3,2,1]]),
where the innermost brackets define the rows of the matrix. We can solve the system Ax = v
by entering A.solve right(v). Note that SAGE returns at most one solution. If there are
further solutions, one has to find them separately, see below. We have the following com-
mands with obvious meaning available: A.transpose(), A.inverse(), A.determinant(),
A.eigenvalues(), A.eigenvectors right(), A.characteristic polynomial(). Note
that a polynomial can be solved by the factor() command. To compute eigenvalues
manually, you can therefore type A.characteristic polynomial().factor().
§4 SLEs. SAGE can perform elementary row operations on a matrix. Assume that
we have defined a matrix, e.g. by entering A=matrix(QQ,[[1,2,3],[3,4,3],[3,2,1]]).
We have the commands A.rescale_row(1,2) and A.add_multiple_of_row(2,0,3) as well
as A.with_rescaled_row(1,2) and A.with_added_multiple_of_row(2,0,3). The first set of
commands manipulates the matrix A itself, while the second set merely returns the result
without changing A. Note that SAGE counts rows and columns (as usual in computer
science) starting at 0. The command A.add_multiple_of_row(2,0,3) therefore adds three
times the first row to the third row.
To bring an SLE Ax = v into row echelon form (from which we can read off the solution
space), we can do the following. First, we augment the matrix A by entering aug=A.augment(v).
The method aug.rref() then produces a matrix in reduced row echelon form, i.e. the part of
the augmented matrix corresponding to A is reduced as far as possible towards a diagonal
matrix containing only 1s and 0s.
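For instance, with A and v as in the previous paragraphs (the with_... lines merely display results and leave A unchanged):

A=matrix(QQ,[[1,2,3],[3,4,3],[3,2,1]])
v=vector(QQ,[1,2,3])
A.with_rescaled_row(1,2)                 # the result of doubling the second row of A
A.with_added_multiple_of_row(2,0,3)      # the result of adding 3 times the first row to the third row
aug=A.augment(v)                         # the augmented matrix (A|v)
aug.rref()                               # its reduced row echelon form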
§5 Span. We start by telling SAGE that we want to work in R^4, or rather Q^4: V=QQ^4.
Let us now consider two vectors, defined in SAGE as v1=vector(QQ,[1,1,3,4]) and
v2=vector(QQ,[1,3,4,5]). The span of these vectors is obtained as follows: define the
subspace W as the span W=V.span([v1,v2]). If we enter W, we obtain a nice basis for this
vector space. Moreover, we can test whether vectors lie in W: x=2*v1-5*v2 and x in W returns
True. On the other hand, y=vector(QQ,[1,1,1,1]) and y in W returns False. To check
linear dependence, you can play with the method linear_dependence(), applied to a list
of vectors.
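Putting this together (the last line shows one possible way of calling linear_dependence(), namely as a method of the ambient space V; it should return a non-empty list precisely when the vectors in the list are linearly dependent):

V=QQ^4
v1=vector(QQ,[1,1,3,4])
v2=vector(QQ,[1,3,4,5])
W=V.span([v1,v2])
x=2*v1-5*v2
x in W                                   # True
y=vector(QQ,[1,1,1,1])
y in W                                   # False
V.linear_dependence([v1,v2,x])           # non-empty, since x is a combination of v1 and v2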
§6 And beyond... There are many more things one can do with SAGE in the context
of Linear Algebra. For example, the pivots() method will identify the columns of an augmented
matrix corresponding to pivot elements, the method right_kernel() will compute the
kernel of a matrix, etc. For more details, see the introduction Sage for Linear Algebra by
Robert A. Beezer, which is available for free download online. Also helpful is his quick
reference sheet, which can be found online as well.
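Two quick illustrations, using the matrices A and aug from the example in §4:

aug.pivots()                             # column indices of the pivot elements of (A|v)
A.right_kernel()                         # the kernel (null space) of A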
§7 SAGE code for Fourier transform. Recall the discussion in section IV, §8. We
define a function that we want to Fourier transform, f(x) = -x(x - 1/2)^2 (x - 1):

f(x)=-x*(x-1/2)^2*(x-1)
f.plot(x,0,1)

We now define functions in the orthogonal basis and plot an example:

s(k,x)=2^(1/2)*sin(2*pi*k*x)
c(k,x)=2^(1/2)*cos(2*pi*k*x)
s(3,x).plot(x,0,1)

We then compute the first coefficients in the Fourier expansion:

b0=integrate(f(x),x,0,1)
a1=integrate(f(x)*s(1,x),x,0,1)
b1=integrate(f(x)*c(1,x),x,0,1)
a2=integrate(f(x)*s(2,x),x,0,1)
b2=integrate(f(x)*c(2,x),x,0,1)
a3=integrate(f(x)*s(3,x),x,0,1)
b3=integrate(f(x)*c(3,x),x,0,1)

We can now plot the original function together with various steps in the expansion:

plot([f(x),b0*1],x,0,1)
plot([f(x),b0*1+a1*s(1,x)+b1*c(1,x)],x,0,1)
plot([f(x),b0*1+a1*s(1,x)+b1*c(1,x)+a2*s(2,x)+b2*c(2,x)],x,0,1)
plot([f(x),b0*1+a1*s(1,x)+b1*c(1,x)+a2*s(2,x)+b2*c(2,x)+a3*s(3,x)+
b3*c(3,x)],x,0,1)

Index

A
adjoint, 68

B
basis, 33
  orthogonal, 41
  orthonormal, 42

C
Cauchy-Schwarz inequality, 38
Cayley-Hamilton theorem, 61
change of basis, 56
characteristic equation, 52
characteristic polynomial, 52
column rank, 45
column space, 45
column vectors, 45
complex conjugation, 66
complex numbers, 65
conic sections, 59
coordinate vector, 56
coupled harmonic oscillators, 62

D
damped harmonic oscillator, 63
determinant, 17
diagonalisable matrix, 57
diagonalising matrices, 57
dimension, 34

E
eigenvalue, 52
eigenvector, 52
elementary row operations, 22
Euclidean inner product, 38
Euler's formula, 66

F
Fibonacci numbers, 64
Fourier transformation, 43

G
Gaußian elimination, 23
golden ratio, 65
Gram-Schmidt algorithm, 41

H
Hermitian conjugate, 68
Hermitian form, 67
Hermitian matrix, 68

I
inner product, 37
  complex vector space, 67
inner product space, 37

K
kernel, 48

L
linear combination, 30
linear equation, 20
linear map, 13
linear recursive sequence, 64
linear transformation, 44
  rank of, 49
linearly dependent, 31
linearly independent, 31

M
matrix
  analytic function of, 61
  augmented, 22
  complex, 68
  inverting, 26
  positive definite, 55
  powers of, 60
  rank of, 46
  real symmetric, 55
multiplicity
  algebraic, 54
  geometric, 54

N
norm, 38
  complex vector space, 68
null space, 48
nullity, 49

O
orthogonal matrix, 51
orthogonal projection, 41
orthogonal set of vectors, 40
orthogonal vectors, 39
orthonormal set of vectors, 40

P
pivot variable, 24
polar form, 66
polynomial of degree n, 28
population growth model, 65
Pythagoras' theorem, 40

R
range space, 48
rank, 46, 49
rank-nullity theorem, 50
row echelon form, 22
row rank, 45
row space, 45
row vectors, 45

S
scalar product, 37
sesquilinear form, 67
similarity transformation, 57
span, 30
system of linear equations, 20
  consistent, 20
  equivalent, 21
  general solution, 24
  homogeneous, 25
  inconsistent, 20
  inhomogeneous, 25

T
trace, 15
transformation matrix, 56
transpose, 15
triangle inequality, 39

U
unitary matrix, 68

V
vector space, 28
  complex vector space, 66
vector space axioms, 28
vector subspace, 29
vector subspace test, 29
