Lecture Notes on Linear Algebra
Christian Sämann
Preface
These are the lecture notes of the course Linear Algebra F18CF taught at Heriot-Watt
University to second year students. While the material covered in this course is itself very
important for later courses, this course is mainly the first one to teach you how to prove
theorems. Linear Algebra is in fact the ideal course to learn this, as the proofs are rather
short, simple, and less technical than those in Analysis. Traditionally, switching from
algorithmic work to problem solving and proving theorems is very difficult for students,
and this course tries to ease the transition.
To understand theorems and proofs it is necessary to try to construct examples and/or
counterexamples to the statements. Playing and experimenting with definitions and the-
orems is one of the key activities for understanding mathematics. I therefore included an
appendix giving an introduction to doing Linear Algebra with the computer algebra pro-
gramme SAGE. SAGE can be a valuable help in playing and experimenting, as it does most
of the tedious calculations for you.
Please note that these notes may still contain typos and other mistakes. If you should
find something that requires corrections (or if you have a good suggestion for improving
these notes), please send an email to [email protected]. The lecture notes were created
relying on material from many different sources and are certainly not meant to be original.
Finally , I’d like to thank all students who spotted typos and took the time to let me
know, in particular Simone Rea.
Christian Sämann
¹ Although there are many very deep statements that are usually called Lemma, see Wikipedia's list of Lemmata. Some people claim that a good lemma is worth a thousand theorems.
Course summary
Outline
Recommended textbooks
This course is fairly self-contained and the material covered in the published lecture notes
is certainly sufficient to pass the exam. To deepen your knowledge in Linear Algebra, you
could use one of the following textbooks:
Linear Algebra - Concepts and Methods by Martin Anthony and Michele Harvey.
Linear Algebra Done Right by Sheldon Axler, Springer.
Introduction to Linear Algebra by Gilbert Strang, Cambridge University Press.
Note that there is a wealth of lecture notes and other material freely available on the
internet. The book Linear Algebra in Schaum’s outline series is available on VISION.
B Use the Gaußian elimination procedure to determine whether a given system of simul-
taneous linear equations is consistent and if so, to find the general solution. Invert a
matrix by the Gaußian elimination method.
B Understand the concepts of vector space and subspace, and apply the subspace test to
determine whether a given subset of a vector space is a subspace.
B Understand the concepts of linear combination of vectors, linear (in)dependence, span-
ning set, and basis. Determine if a set of vectors is linearly independent and spans a
given vector space.
B Find a basis for a subspace, defined either as the span of a given set of vectors, or as
the solution space of a system of homogeneous equations.
B Understand the concept of inner product in general, and calculate the inner products
of two given vectors.
B Understand the concept of orthogonal projection and how to explicitly calculate the
projection of one vector onto another one. Use the Gram-Schmidt method to convert a
given basis for a vector space to an orthonormal basis. Understand (geometrically) the
concept of the vector product and calculate the vector product of two given vectors.
B Find the coordinates of a given vector in terms of a given basis - especially in the case
of an orthogonal or orthonormal basis.
B Calculate the rank of a given matrix and, from that, the dimension of the solution space
of the corresponding system of homogeneous linear equations. Calculate the determinant
of 2 × 2 and 3 × 3 matrices.
B Understand the concepts of linear transformation, range and kernel (nullspace). Un-
derstand the concept of invertibility of a linear transformation including injectivity and
surjectivity.
B Know and apply the Rank-Nullity Theorem.
B Compute the characteristic polynomial of a square matrix and (in simple cases) factorise
to find the eigenvalues.
B Determine whether a given square matrix is diagonalisable, and if so find a diagonalising
matrix.
B Apply the Cayley-Hamilton Theorem to compute powers of a given square matrix.
I Euclidean space
We start by recalling the definition of the euclidean spaces R2 and R3 , together with some
related elementary facts. These spaces serve as intuitive pictures for many of the definitions
introduced in this course. We then review briefly matrices and their action on vectors of
Rn .
(Figure: the real line with the origin 0 and an arrow from 0 to a point r.)
Adding two numbers r1 and r2 corresponds to taking one of the corresponding arrows and
aligning its tail on the tip of the other arrow. The former arrow’s tip is then the tip of the
arrow corresponding to r1 + r2 .
(Figure: arrows from 0 to r1 and to r2 on the real line, together with the arrow to r1 + r2 obtained by placing them tail to tip.)
We can also stretch an arrow by multiplying the corresponding number by another number:
(Figure: the arrows corresponding to −1.5r, r and 2r on the real line.)
§2 Euclidean plane. On the real line, a point and the corresponding arrow encodes merely
a length. On the plane R2 , an arrow corresponding to a point encodes both a length and a
direction. We call such an arrow a vector. Again, we can add and stretch vectors, as before:
(Figure: vectors v1 and v2 in the plane drawn from the origin 0, together with their sum v1 + v2 and the stretched vector −2v2.)
Note that we denoted the origin of the Euclidean plane by 0. Each point and thus each
vector v ∈ R2 can be denoted by a pair of numbers v = (v1 , v2 )T (thus the notation R2 ),
giving the horizontal and vertical distance of the tip of the vector from the origin:
(Figure: a vector v in the plane, with horizontal distance v1 and vertical distance v2 of its tip from the origin 0.)
We have the following rules for adding two vectors v = (v1 , v2 )T and w = (w1 , w2 )T and
stretching by λ ∈ R:
v + w = (v1 , v2 )T + (w1 , w2 )T = (v1 + w1 , v2 + w2 )T and λv = (λv1 , λv2 )T . (I.2)
The origin or null vector 0 has coordinates 0 = (0, 0)T . Note that if we stretch a vector by
the factor 0, we obtain the null vector: 0v = 0 for all v ∈ R2 .
§3 Linear combination. We can combine vector addition and stretching into expressions like
u = λv + κw , (I.3)
which we call linear combinations. More generally, a linear combination of vectors v 1 , . . . , v n with coefficients λ1 , . . . , λn ∈ R is an expression of the form
u = λ1 v 1 + λ2 v 2 + λ3 v 3 + · · · + λn v n . (I.4)
For example, (5, 3)T ∈ R2 can be written as a linear combination of the vectors (1, 0)T and
(0, 1)T : (5, 3)T = 5(1, 0)T + 3(0, 1)T . It cannot be written as a linear combination of (2, 1)T
and (4, 2)T .
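The following short computation is not part of the original notes; it is a quick numerical check of this example using NumPy (the SAGE system mentioned in the preface is built on Python and would work similarly). The matrices are just the vectors of the example arranged as columns.

    import numpy as np

    # Coefficients lambda, kappa with lambda*(1,0) + kappa*(0,1) = (5,3);
    # the columns of the matrix are the vectors we combine.
    A = np.array([[1.0, 0.0],
                  [0.0, 1.0]])
    b = np.array([5.0, 3.0])
    print(np.linalg.solve(A, b))        # -> [5. 3.]

    # For the parallel vectors (2,1) and (4,2) the matrix of columns is singular,
    # and (5,3) is not a multiple of (2,1), so no linear combination gives (5,3).
    B = np.array([[2.0, 4.0],
                  [1.0, 2.0]])
    print(np.linalg.matrix_rank(B))     # -> 1, i.e. the columns are parallel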
§4 Parallel vectors and linear dependence. Two vectors v, w ∈ R2 are called parallel,
if one is obtained by stretching of the other. That is, there is a λ ∈ R such that
v = λw or w = λv . (I.6)
Alternatively, the vectors v and w are parallel if there are λ, κ ∈ R, not both zero, such that
λv + κw = 0 . (I.7)
It should be clear that the above two definitions of parallel vectors are equivalent. One
also says that the vectors v and w are linearly dependent if they are parallel and linearly
independent otherwise.
F Usually, if v 1 is parallel to v 2 , and v 2 is parallel to v 3 , then v 1 is parallel to v 3 . When is
this not true?
§5 Lemma. Every vector u ∈ R2 can be obtained from a linear combination of two vectors
v, w ∈ R2 that are not parallel. (For example, the vector u = (u1 , u2 )T can be written as a
linear combination of the two vectors v = (1, 0)T and w = (1, 1)T : u = (u1 − u2 )v + u2 w.)
Proof: Consider the linear combination
u = (u1 , u2 )T = λv + κw = (λv1 , λv2 )T + (κw1 , κw2 )T = (λv1 + κw1 , λv2 + κw2 )T . (I.8)
§7 Scalar product and norm. We can introduce the map h·, ·i : R2 × R2 → R defined
as
hu, vi := u1 v1 + u2 v2 . (I.12)
This map is called the dot product, the scalar product or, most appropriately, the inner product. If we take the inner product of a vector v with itself, ⟨v, v⟩, we obtain the square of the length of this vector. We call the length of a vector its norm and define
||v|| := √⟨v, v⟩ . (I.13)
For the norm of the sum of two vectors v, w ∈ R2 , we compute
||v + w||² = (v1 + w1 )² + (v2 + w2 )² = v1² + v2² + 2v1 w1 + 2v2 w2 + w1² + w2² = ||v||² + ||w||² + 2⟨v, w⟩ . (I.14)
The inner product is related to the angle between two vectors via ⟨v, w⟩ = ||v|| ||w|| cos(v, w), where cos(v, w) denotes the cosine of the angle between v and w. This relation follows rather directly from the law of cosines and equation (I.14): Given a triangle with sides a, b, c and angle γ opposing side c, the law of cosines states c² = a² + b² − 2ab cos γ.
I.2 The vector spaces R3 and Rn
λu + κv + µw or λ1 u1 + λ2 u2 + λ3 u3 + · · · + λn un , (I.18)
§2 Linear dependence. If two vectors u, v ∈ R3 are parallel to each other, i.e. one is a
multiple of the other, the set of their linear combinations or their span is just 0 if u = v = 0,
and a line through the origin otherwise. If they are not parallel, their linear combinations
form a plane containing the null-vector 0. A third vector w is an element of this plane, if
it can be written as a linear combination of the other two:
w = λu + κv . (I.20)
Alternatively, we can extend our notion of linear dependence: We call three vectors u, v, w
linearly dependent, if there is a non-trivial linear combination (i.e. at least one of the
constants λ, κ, µ is not zero) such that
λu + κv + µw = 0 . (I.21)
§5 Examples. a ) The vectors (1, 2, 3)T , (1, 1, 1)T , (2, 3, 4)T span a plane in R3 , as the
vectors are not linearly independent: (1, 2, 3)T + (1, 1, 1)T − (2, 3, 4)T = 0.
b ) The vectors (1, 1, 1)T , (0, 0, 0)T span a line in R3 .
§6 Basis. Any vector in R3 can be written as a linear combination of three linearly inde-
pendent vectors in R3 . We therefore call such a set a basis of R3 .
§7 Inner product and angles. Inner products and angles are defined in the obvious way.
That is, the inner product between two vectors v, w ∈ R3 is given by
hv, wi := v1 w1 + v2 w2 + v3 w3 . (I.22)
The inner product of a vector with itself is again the square of its length due to Pythagoras,
and the norm of a vector v ∈ R3 is therefore defined as ||v|| := √⟨v, v⟩.
Note that two vectors in R3 are either parallel or define a plane. Within this plane,
we can still define an angle between the vectors. (If the vectors are parallel, we define the
angle between the vectors to be 0.) We can compute this angle via the formula ⟨v, w⟩ = ||v|| ||w|| cos(v, w),
which follows again from the law of cosines, however in a slightly more involved fashion than
in R2 . Two vectors are perpendicular iff (i.e. if and only if) their scalar product vanishes.
§8 Cross product. In R3 (and only in R3 ) there is a further product mapping two vectors
into another vector. Given two vectors v, w ∈ R3 , the product vanishes if the vectors are
parallel. Otherwise, it equals the unique vector that is perpendicular to the plane spanned
by v and w (the orientation is here important) and whose norm equals the area of the
parallelogram with sides v and w. Explicitly, one has the following formula:
v × w = (v1 , v2 , v3 )T × (w1 , w2 , w3 )T := (v2 w3 − v3 w2 , v3 w1 − v1 w3 , v1 w2 − v2 w1 )T . (I.24)
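As a small illustration not contained in the notes, formula (I.24) can be checked numerically with NumPy; the two vectors below are arbitrary choices.

    import numpy as np

    v = np.array([1.0, 2.0, 3.0])
    w = np.array([0.0, 1.0, 4.0])

    x = np.cross(v, w)                  # (v2*w3 - v3*w2, v3*w1 - v1*w3, v1*w2 - v2*w1)
    print(x)                            # -> [ 5. -4.  1.]
    print(np.dot(x, v), np.dot(x, w))   # -> 0.0 0.0, so x is perpendicular to v and w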
Note that we can add two elements together and multiply them by a real constant:
x + y = (x1 , . . . , xn )T + (y1 , . . . , yn )T = (x1 + y1 , . . . , xn + yn )T and λx = (λx1 , . . . , λxn )T
for λ ∈ R and x, y ∈ V . All the other definitions like linear combination, span, basis etc.
generalise in the obvious way, and we will come back to the details in section III.
§11 Remark. F Although it is impossible to imagine a four- or higher-dimensional space, some intuition can be obtained from reading Edwin A. Abbott's novel “Flatland: A Romance
of Many Dimensions” from 1884. The author describes life in a two-dimensional world. As
a two-dimensional creature, he also visits a one-dimensional world and encounters a three-
dimensional sphere. What would a four-dimensional sphere passing through our three-
dimensional world look like?
I.3 Matrices
§1 Definition. A map f : Rm → Rn is called linear, if f (x1 + x2 ) = f (x1 ) + f (x2 ) and f (λx1 ) = λf (x1 ) for all λ ∈ R and x1 , x2 ∈ Rm .
§2 Remark. The two conditions in the definition of a linear map can be combined into
the condition f (λx1 + x2 ) = λf (x1 ) + f (x2 ) for all λ ∈ R and x1 , x2 ∈ Rm .
§3 Examples. a ) A linear map f : R → R is of the form f (x) = ax, a ∈ R. This follows
from f (x) = f (1x) = xf (1) =: xa.
b ) A linear map f : R2 → R is of the form f [(x1 , x2 )T ] = f [(x1 , 0)T ] + f [(0, x2 )T ] =
f [(1x1 , 0)T ] + f [(0, 1x2 )T ] = f [(1, 0)T ]x1 + f [(0, 1)T ]x2 =: a1 x1 + a2 x2 , a1 , a2 ∈ R.
More generally, we have the following:
§4 Matrices. A linear map f = A : Rm → Rn is of the form
\[ \begin{pmatrix} a_{11}x_1 + a_{12}x_2 + \ldots + a_{1m}x_m \\ a_{21}x_1 + a_{22}x_2 + \ldots + a_{2m}x_m \\ \vdots \\ a_{n1}x_1 + a_{n2}x_2 + \ldots + a_{nm}x_m \end{pmatrix} = \underbrace{\begin{pmatrix} a_{11} & a_{12} & \ldots & a_{1m} \\ a_{21} & a_{22} & \ldots & a_{2m} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \ldots & a_{nm} \end{pmatrix}}_{A} \underbrace{\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_m \end{pmatrix}}_{x} . \]
The array of numbers A is called a matrix, more specifically, an n × m-matrix⁴. The product between the matrix A and the vector x is given as above: the i-th component of Ax is obtained by multiplying the entries of the i-th row of A with the corresponding components of x and summing up, (Ax)i = ai1 x1 + . . . + aim xm .
We will call this product (and its later generalisations) a matrix product. Matrices with m
columns can only multiply vectors in Rm .
Note that there is a matrix A with all components 0 such that Av = 0 for any v ∈ Rm .
We denote the set of all square matrices with n rows and columns by Matn . We call a
matrix in Matn of the form
\[ A = \begin{pmatrix} a_{11} & & 0 \\ & \ddots & \\ 0 & & a_{nn} \end{pmatrix} \] (I.26)
a diagonal matrix. The set of diagonal matrices in Matn is denoted by Diagn . The unit
matrix 1n ∈ Matn acts on any vector v ∈ Rn according to 1n v = v. It is a diagonal matrix
with entries a11 = a22 = . . . = ann = 1.
§5 Basic operations. One easily checks that the matrix product satisfies Ax1 + Ax2 =
A(x1 + x2 ) for an n × m-matrix A and vectors x1 , x2 ∈ Rm . To have Ax + Bx = (A + B)x
for n × m-matrices A, B and x ∈ Rm , we define the following sum of matrices:
\[ \begin{pmatrix} a_{11} & a_{12} & \ldots & a_{1m} \\ a_{21} & a_{22} & \ldots & a_{2m} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \ldots & a_{nm} \end{pmatrix} + \begin{pmatrix} b_{11} & b_{12} & \ldots & b_{1m} \\ b_{21} & b_{22} & \ldots & b_{2m} \\ \vdots & \vdots & & \vdots \\ b_{n1} & b_{n2} & \ldots & b_{nm} \end{pmatrix} = \begin{pmatrix} a_{11}+b_{11} & a_{12}+b_{12} & \ldots & a_{1m}+b_{1m} \\ a_{21}+b_{21} & a_{22}+b_{22} & \ldots & a_{2m}+b_{2m} \\ \vdots & \vdots & & \vdots \\ a_{n1}+b_{n1} & a_{n2}+b_{n2} & \ldots & a_{nm}+b_{nm} \end{pmatrix} . \]
We also want to have associativity of the matrix product: A(Bx) = (AB)x. This relation
holds, if we define the matrix product AB as the matrix product of A with each column
⁴ Note that matrices are always labelled row-column.
vector of B:
a11 a12 . . . a1m b11 b12 ... b1p
a21 a22 . . . a2m b21 b22 ... b2p
. .. .. .. .. ..
.
. . . . . .
an1 an2 . . . anm bm1 bm2 . . . bmp
a11 b11 + . . . + a1m bm1 . . . a11 b1p + . . . + a1m bmp
= .. ..
.
. .
an1 b11 + . . . + anm bm1 . . . an1 b1p + . . . + anm bmp
An important consequence of this definition is that the matrix product is not commutative.
That is, in general AB ≠ BA. F Find examples for matrices A, B such that AB = BA
and such that AB ≠ BA.
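A small NumPy sketch (my own illustration, not part of the notes) of the non-commutativity of the matrix product; the matrices A and B are arbitrary examples.

    import numpy as np

    A = np.array([[1, 2],
                  [0, 1]])
    B = np.array([[1, 0],
                  [3, 1]])

    print(A @ B)    # -> [[7 2], [3 1]]
    print(B @ A)    # -> [[1 2], [3 7]], so AB != BA here

    # Diagonal matrices, for instance, do commute with each other:
    D1 = np.diag([2, 3])
    D2 = np.diag([5, 7])
    print(np.array_equal(D1 @ D2, D2 @ D1))   # -> True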
Finally, the transpose of a matrix is the matrix with rows and columns interchanged (in
the case of square matrices, the entries are mirrored at the diagonal):
\[ \begin{pmatrix} a_{11} & \ldots & a_{1m} \\ \vdots & & \vdots \\ a_{n1} & \ldots & a_{nm} \end{pmatrix}^{T} = \begin{pmatrix} a_{11} & \ldots & a_{n1} \\ \vdots & & \vdots \\ a_{1m} & \ldots & a_{nm} \end{pmatrix} . \]
This is the operation that we use to map a row vector (a 1 × n matrix) into a column vector
(a n × 1 matrix). The trace of a square matrix is the sum of its diagonal entries. That is,
\[ \operatorname{tr}\begin{pmatrix} a_{11} & \ldots & a_{1n} \\ \vdots & & \vdots \\ a_{n1} & \ldots & a_{nn} \end{pmatrix} := a_{11} + a_{22} + \ldots + a_{nn} . \] (I.27)
§6 Properties of the matrix operations. Let us sum up useful properties of the matrix
product. In the following A, B, C are matrices of a size compatible with the given products
and λ, κ are real constants. We have:
A+B =B+A , A + (B + C) = (A + B) + C , 1A = A , A1 = A ,
A(BC) = (AB)C , A(B + C) = AB + AC , (B + C)A = BA + CA ,
λ(A + B) = λA + λB , (λ + κ)A = λA + κA , (I.28)
(λκ)A = λ(κA) , λ(AB) = (λA)B = A(λB) ,
(AB)T = B T AT , ⟨x, y⟩ = xT y .
F Prove these results. Note that if for two matrices A, B, Ax = Bx for any vector x, then
A = B.
§7 Inverse matrices. Given a matrix A ∈ Matn , another matrix B is called the inverse
of A, if AB = 1n . We then write A−1 for B. In the tutorials, we prove that if an inverse
exists it is unique and we also have A−1 A = 1n .
§8 Types of square matrices. On the vector space Rn , an n × n matrix can have two kinds of actions, which can be combined arbitrarily.
(Figure: a vector v at angle α to the x1-axis and the vector ṽ obtained by rotating v by the angle θ.)
Writing v1 = R cos α and v2 = R sin α for a vector v of length R, the rotated vector ṽ has components
ṽ1 = R cos(α + θ) = R(cos α cos θ − sin α sin θ) = cos θ v1 − sin θ v2 ,
ṽ2 = R sin(α + θ) = R(cos α sin θ + sin α cos θ) = sin θ v1 + cos θ v2 .
The above example projects onto the vector (1, −1)T : Writing the vector x as a sum
x = x1 (1, −1)T + x2 (1, 1)T , (I.30)
we have
P x = x1 P (1, −1)T + x2 P (1, 1)T = x1 (1, −1)T . (I.31)
The function det is called the determinant. We can generalise the notion of a determinant
to arbitrary n vectors in Rn (and correspondingly, to n × n matrices A), as seen in the
definition below.
§11 ε-symbol. We define⁵ the ε-symbol εi1 ...in to be +1 if (i1 , . . . , in ) is an even permutation of (1, . . . , n), −1 if it is an odd permutation, and 0 if any index appears more than once.
Examples are: ε132 = −ε123 = −1, ε112 = 0 and ε1432 = −ε1423 = ε1243 = −ε1234 = −1.
§12 Definition. The determinant of a matrix A is a function det : Matn → R defined as
\[ \det(A) = \det\begin{pmatrix} a_{11} & a_{12} & \ldots & a_{1n} \\ a_{21} & a_{22} & \ldots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \ldots & a_{nn} \end{pmatrix} = \sum_{i_1,\ldots,i_n=1}^{n} \varepsilon_{i_1\ldots i_n}\, a_{1i_1}\cdots a_{ni_n} . \]
Here, the symbol \(\sum_{i_1,\ldots,i_n=1}^{n}\) means a summation over all indices from 1 to n: \(\sum_{i_1,\ldots,i_n=1}^{n} = \sum_{i_1=1}^{n}\sum_{i_2=1}^{n}\cdots\sum_{i_n=1}^{n}\).
⁵ The ε-symbol is related to permutations, i.e. the various possible orderings, of sets. The ε-symbol is the sign of the permutation given by the order of the indices.
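The definition of the determinant via the ε-symbol can be turned directly into (inefficient but instructive) code. The following Python sketch is my own addition and assumes NumPy is available; only permutations of the indices contribute, since the ε-symbol vanishes whenever an index repeats.

    import itertools
    import numpy as np

    def sign(perm):
        """Sign of a permutation, i.e. the epsilon-symbol for these indices."""
        s = 1
        for i in range(len(perm)):
            for j in range(i + 1, len(perm)):
                if perm[i] > perm[j]:
                    s = -s
        return s

    def det_epsilon(A):
        """det(A) = sum over permutations of eps(i1..in) * a_{1 i1} ... a_{n in}."""
        n = A.shape[0]
        total = 0.0
        for perm in itertools.permutations(range(n)):
            prod = 1.0
            for row, col in enumerate(perm):
                prod *= A[row, col]
            total += sign(perm) * prod
        return total

    A = np.array([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0],
                  [7.0, 8.0, 10.0]])
    print(det_epsilon(A), np.linalg.det(A))   # both give -3 (up to rounding)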
§13 Remark. The notion of a determinant is not very intuitive, and it takes some expe-
rience and practice to get a handle on it. It is essentially a number assigned to a matrix
that contains useful information about the matrix. Consider for example the determinant
for Mat2 introduced in §10. If the vectors x and y are parallel, the enclosed volume and
thus the determinant is zero, which also implies that the corresponding matrix A = (x y)
is not invertible. This generalises to Matn . Furthermore, the determinant of a rotation is
1 and that of a dilation is the product of all the scaling factors. Projections P 6= 1 have
determinant 0 (and clearly correspond to non-invertible matrices).
§14 Special cases. a) Mat2 (“cross rule”: multiply diagonally, subtract):
\[ \det\begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} = \sum_{i_1,i_2=1}^{2} \varepsilon_{i_1 i_2}\, a_{1i_1} a_{2i_2} = \varepsilon_{11} a_{11} a_{21} + \varepsilon_{12} a_{11} a_{22} + \varepsilon_{21} a_{12} a_{21} + \varepsilon_{22} a_{12} a_{22} = a_{11} a_{22} - a_{12} a_{21} . \]
Note that if det(A) = 0, then det(A−1 ) is not defined. As we will see later, this is due to
A−1 not being defined in this case.
F A very nice formula relating the trace, i.e. the sum of the diagonal elements, of a matrix to its determinant is the following: log det A = tr log A. Here, the logarithm of a matrix is defined via its Taylor series: log A = (A − 1) − (1/2)(A − 1)² + (1/3)(A − 1)³ + · · · . Verify this formula for 2 × 2 matrices. Develop sketches of proofs of the rules (I.35) using this formula.
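A possible numerical check of log det A = tr log A (my own sketch, not part of the notes): for a matrix close to the unit matrix the Taylor series above converges, so we can simply truncate it. The matrix A below is an arbitrary choice.

    import numpy as np

    # A 2x2 matrix close to the identity, so the Taylor series for log A converges.
    A = np.array([[1.1, 0.05],
                  [0.02, 0.9]])

    X = A - np.eye(2)
    logA = np.zeros((2, 2))
    for k in range(1, 60):              # truncated series: log A = sum (-1)^(k+1) X^k / k
        logA += (-1) ** (k + 1) * np.linalg.matrix_power(X, k) / k

    print(np.trace(logA))               # ~ -0.0111
    print(np.log(np.linalg.det(A)))     # ~ -0.0111, i.e. tr log A = log det A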
(Figure: a lever carrying the two masses ma and mb.)
Physics (i.e. the lever rule) tells us that this means that
We would like to deduce the values of ma and mb from these two equations.
§2 Example. (Making TNT) We are interested in the following chemical reaction of
toluene and nitric acid to TNT and water, where the variables x, y, z and w denote the
numbers of the various molecules:
x C7 H8 + y HNO3 → z C7 H5 O6 N3 + w H2 O .
Considering the numbers of the various atoms in this reaction, we obtain the equations
7x = 7z
8x + 1y = 5z + 2w
1y = 3z
3y = 6z + 1w .
We would like to deduce the right ratios of toluene and nitric acid we need in order to have
none of these left over after the reaction took place.
§3 Remark. We see that systems of equations as above appear in many different problems.
Other subjects which make heavy use of such equations and linear algebra in general are
for example Special Relativity and Quantum Mechanics.
with a1 , . . . , an , b ∈ R.
We see that systems of linear equations can be written in matrix form Ax = b. Example:
\[ \begin{aligned} 5x_1 + 3x_2 &= 10 \\ -2x_1 + 2x_2 &= 2 \end{aligned} \qquad\longleftrightarrow\qquad \begin{pmatrix} 5 & 3 \\ -2 & 2 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 10 \\ 2 \end{pmatrix} . \]
§5 Examples.
x1 + x2 = 2 , x1 − x2 = 0 : one solution, x1 = x2 = 1 → consistent. (Solve the second equation: x1 = x2 ; the first then implies x1 = x2 = 1.)
x1 + x2 = 2 , x1 − x2 = 0 , 3x1 + 2x2 = 4 : no solution → inconsistent. (The first two equations imply x1 = x2 = 1, which does not solve the third equation.)
x1 + x2 = 2 , 2x1 + 2x2 = 4 : infinitely many solutions, x2 = 2 − x1 → consistent. (The second equation is a multiple of the first → same solutions.)
x1 + x2 = 2 , 3x1 + 3x2 = 7 : no solution → inconsistent. (Left- and right-hand sides of the second equation are different multiples of the first equation.)
§6 Geometric interpretation. Consider an SLE consisting of two equations in two un-
knowns. Each equation determines a line in R2 . Points (x1 , x2 ) on this line correspond to
solutions of this equation. Intersections of two lines correspond to common solutions of the
corresponding equations. The plots for the first, the third and the last examples of §4 are
given below in figure 1.
Figure 1: The three possible situations for a system of two linear equations in two unknowns:
one solution, no solution and infinitely many solutions. In the last case, the two lines are
on top of each other.
The right form of the SLE is called an augmented matrix. We do not change the solutions
of (II.1), if we perform one of the following operations:
That is, these operations lead to an SLE, which is equivalent to the SLE (II.1).
F Verify this statement.
F What is the geometric interpretation of these operations?
x1 + 2x2 + x3 = 1
x2 + 3x3 = 2        (∗′)
x3 = 1
with solutions x3 = 1, x2 = 2 − 3x3 = −1 and x1 = 1 − 2x2 − x3 = 2.
The SLEs (∗) and (∗′) have the same solutions and are thus equivalent.
II.3 Gaußian elimination
Be lazy! Always reorder rows to simplify the calculations. Avoid fractions in actual com-
putations. Exercise!
§7 Example. Perform GE on the augmented matrix corresponding to the SLE for the
example in II.1, §1. We have
\[ \begin{aligned} 5x_1 + 3x_2 &= 10 \\ -2x_1 + 2x_2 &= 2 \end{aligned} \qquad\longleftrightarrow\qquad \left(\begin{array}{cc|c} 5 & 3 & 10 \\ -2 & 2 & 2 \end{array}\right) . \]
Using elementary row operations, we bring this augmented matrix into row echelon form:
\[ \left(\begin{array}{cc|c} 5 & 3 & 10 \\ -2 & 2 & 2 \end{array}\right) \xrightarrow{R_2 \leftrightarrow R_1,\ R_1 \to -\tfrac12 R_1} \left(\begin{array}{cc|c} 1 & -1 & -1 \\ 5 & 3 & 10 \end{array}\right) \xrightarrow{R_2 \to R_2 - 5R_1} \left(\begin{array}{cc|c} 1 & -1 & -1 \\ 0 & 8 & 15 \end{array}\right) \xrightarrow{R_2 \to \tfrac18 R_2} \left(\begin{array}{cc|c} 1 & -1 & -1 \\ 0 & 1 & \tfrac{15}{8} \end{array}\right) . \]
From x1 − x2 = −1 and x2 = 15/8 we read off x1 = 7/8 and x2 = 15/8.
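For comparison, SymPy's rref (a sketch of my own, not part of the notes) computes the reduced row echelon form of the same augmented matrix; it goes one step further than the row echelon form above but yields the same solution.

    import sympy as sp

    M = sp.Matrix([[5, 3, 10],
                   [-2, 2, 2]])     # augmented matrix of the SLE
    print(M.rref())                 # -> (Matrix([[1, 0, 7/8], [0, 1, 15/8]]), (0, 1)),
                                    #    i.e. x1 = 7/8 and x2 = 15/8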
§9 Consistency. From the row echelon form, it is easy to see⁷ whether an SLE is consistent. Examples:
\[ \underbrace{\left(\begin{array}{ccc|c} 1 & \times & \times & \times \\ 0 & 1 & \times & \times \\ 0 & 0 & 1 & \times \end{array}\right)}_{\text{unique solution}} \qquad \underbrace{\left(\begin{array}{ccc|c} 1 & \times & \times & \times \\ 0 & 1 & \times & \times \\ 0 & 0 & 0 & 0 \end{array}\right)}_{\text{infinitely many solutions}} \qquad \underbrace{\left(\begin{array}{ccc|c} 1 & \times & \times & \times \\ 0 & 1 & \times & \times \\ 0 & 0 & 0 & 1 \end{array}\right)}_{\text{no solution}} \]
In the first case, substitution yields the unique solution; in the second, x3 can be chosen arbitrarily; in the third, the last row reads 0x1 + 0x2 + 0x3 = 1, which is a contradiction.
§10 General solution. In an augmented matrix in row echelon form, the variables cor-
responding to a leading 1 in a row are called pivot variables. The values of these variables
are determined by substitution. The other, undetermined variables can be put to arbitrary
constants α, β, γ, . . .. The result is called the general solution. Example (we start at the
bottom):
\[ \left(\begin{array}{cccc|c} 1 & 5 & 3 & 2 & 3 \\ 0 & 0 & 1 & 4 & 8 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{array}\right) \qquad \begin{aligned} &\rightarrow\ x_1 = 3 - 5\beta - 3(8 - 4\alpha) - 2\alpha \\ &\rightarrow\ x_3 = 8 - 4\alpha \\ &\rightarrow\ x_2 , x_4 \ \text{undetermined, so put}\ x_2 = \beta ,\ x_4 = \alpha \end{aligned} \]
In this SLE, x1 and x3 are pivot variables. The general solution is given by the set {(−21 +
10α − 5β, β, 8 − 4α, α)T |α, β ∈ R}.
§11 Example. Recall the example in II.1, §2 (TNT). Gaußian elimination works here as
follows:
In the unknowns x1 , . . . , x4 , the homogeneous SLE and its augmented matrix read
\[ \begin{aligned} 7x_2 - 7x_4 &= 0 \\ -2x_1 + 8x_2 + x_3 - 5x_4 &= 0 \\ x_3 - 3x_4 &= 0 \\ -x_1 + 3x_3 - 6x_4 &= 0 \end{aligned} \qquad\longleftrightarrow\qquad \left(\begin{array}{cccc|c} 0 & 7 & 0 & -7 & 0 \\ -2 & 8 & 1 & -5 & 0 \\ 0 & 0 & 1 & -3 & 0 \\ -1 & 0 & 3 & -6 & 0 \end{array}\right) . \]
Reordering rows, rescaling rows and subtracting suitable multiples of rows yields the row echelon form
\[ \left(\begin{array}{cccc|c} 1 & 0 & -3 & 6 & 0 \\ 0 & 1 & 0 & -1 & 0 \\ 0 & 0 & 1 & -3 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{array}\right) . \]
We read off: x4 = α, x3 = 3α, x2 = α and x1 = 3α. As expected, we have the freedom to determine what amount of chemicals we want to obtain in the reaction: The choice α corresponds to the amount of water molecules which should come out of the reaction.
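The balancing can also be checked directly from the equations in §2 (a SymPy sketch of my own; the variable ordering (x, y, z, w) follows §2 and therefore differs from the x1 , . . . , x4 used above).

    import sympy as sp

    # Coefficient matrix of the homogeneous system in the unknowns (x, y, z, w):
    #   7x = 7z,  8x + y = 5z + 2w,  y = 3z,  3y = 6z + w
    A = sp.Matrix([[7, 0, -7,  0],
                   [8, 1, -5, -2],
                   [0, 1, -3,  0],
                   [0, 3, -6, -1]])

    print(A.nullspace())   # one basis vector, proportional to (1, 3, 1, 3)^T:
                           # one toluene and three nitric acid give one TNT and three water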
⁷ If you don't see this, rewrite the augmented matrices as SLEs.
II.4 Homogeneous and inhomogeneous SLEs
which are given by x3 = α, x2 = −x3 = −α, x1 = −x2 = α. The general solution of the
inhomogeneous SLE is thus x1 = 1 + α, x2 = 1 − α and x3 = 1 + α.
Use now elementary row operations to transfer this system into the form
\[ \big(\, e_1 \ \cdots\ e_n \mid B \,\big), \qquad B = (b_1 , \ldots , b_n ), \]
i.e. until the left-hand block is the unit matrix 1n with columns e1 , . . . , en .
Further elementary row operations turn the left-hand block into the unit matrix 13 , and we arrive at
\[ \left(\begin{array}{ccc|ccc} 1 & 0 & 0 & -\tfrac18 & \tfrac58 & \tfrac78 \\ 0 & 1 & 0 & \tfrac{7}{16} & -\tfrac{11}{16} & -\tfrac{9}{16} \\ 0 & 0 & 1 & \tfrac{1}{16} & \tfrac{3}{16} & \tfrac{1}{16} \end{array}\right) \qquad\Rightarrow\qquad A^{-1} = \begin{pmatrix} -\tfrac18 & \tfrac58 & \tfrac78 \\ \tfrac{7}{16} & -\tfrac{11}{16} & -\tfrac{9}{16} \\ \tfrac{1}{16} & \tfrac{3}{16} & \tfrac{1}{16} \end{pmatrix} . \]
§4 Remark. In some cases, the left side of the augmented matrix cannot be turned into
the unit matrix 1 using elementary row operations as one is left with rows of zeros, e.g.
\[ \left(\begin{array}{ccc|ccc} 1 & 0 & 0 & \times & \times & \times \\ 0 & 0 & 1 & \times & \times & \times \\ 0 & 0 & 0 & \times & \times & \times \end{array}\right) . \] (II.3)
In these cases, the inverse of A does not exist, and the SLE Ax = b has either no or infinitely
many solutions.
§5 Lemma. Elementary row operations affect the determinant of a matrix as follows:
i) Exchanging two rows in a matrix changes the sign of the determinant.
ii) Multiplying a row of a matrix by a number λ 6= 0 changes the determinant by the
factor λ.
iii) Adding a row of a matrix to another does not change the determinant.
Proof: i) Assume that we interchange rows j and k in the matrix A and obtain a new
matrix B. According to our definition of the determinant (I.3, §12), we have
\[ \begin{aligned} \det(A) &= \sum_{i_1,\ldots,i_n=1}^{n} \varepsilon_{i_1\ldots i_j\ldots i_k\ldots i_n}\, a_{1i_1}\cdots a_{ji_j}\cdots a_{ki_k}\cdots a_{ni_n} \\ &= \sum_{i_1,\ldots,i_n=1}^{n} \varepsilon_{i_1\ldots i_k\ldots i_j\ldots i_n}\, a_{1i_1}\cdots a_{ki_j}\cdots a_{ji_k}\cdots a_{ni_n} \\ &= -\sum_{i_1,\ldots,i_n=1}^{n} \varepsilon_{i_1\ldots i_j\ldots i_k\ldots i_n}\, a_{1i_1}\cdots a_{ki_j}\cdots a_{ji_k}\cdots a_{ni_n} = -\det(B) , \end{aligned} \] (II.4)
where we first interchanged the variables ij ↔ ik as well as the order of akij and ajik . This
does not change the equation. We then used εi1 ...ij ...ik ...in = −εi1 ...ik ...ij ...in . ii) and iii)
follow analogously from the definition of the determinant. For iii), one needs that, because of the antisymmetry of the ε-symbol, \(\sum_{i_j,i_k=1}^{n} \varepsilon_{i_1\ldots i_n}\, a_{k i_j} a_{k i_k} = 0\).
§6 Theorem. The determinant of a square matrix vanishes iff it does not have an inverse.
Proof: If the matrix A does not have an inverse, we can use elementary row operations to
turn it into a diagonal matrix D with at least one zero entry along the diagonal. Because
of lemma §5, we have λ det(A) = det(D) for some λ 6= 0. Because det(D) = 0, it follows
that det(A) = 0. Inversely, if det(A) 6= 0, then we can turn it into a diagonal matrix D
with det(D) 6= 0, and therefore into the unit matrix. The matrix A is therefore invertible.
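A small numerical illustration of this theorem (my own addition, assuming NumPy): a matrix with parallel rows has determinant 0 and no inverse, while a matrix with non-vanishing determinant is invertible.

    import numpy as np

    A = np.array([[1.0, 2.0],
                  [2.0, 4.0]])     # rows are parallel
    B = np.array([[1.0, 2.0],
                  [0.0, 3.0]])

    print(np.linalg.det(A))         # -> 0.0, so A has no inverse
    print(np.linalg.det(B))         # -> 3.0
    print(np.linalg.inv(B))         # exists; np.linalg.inv(A) would raise LinAlgError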
III Vector spaces
multiplying it by a real number. We now make the concept of a vector space more abstract,
thereby generalising it.
§2 Definition. A vector space over R is a set V endowed with two operations + : V × V →
V and · : R × V → V satisfying the following vector space axioms:
(1) There is an element 0 = 0V ∈ V , the zero or null vector, such that v + 0 = v for all
v ∈V.
(2) For all v ∈ V , there is an element −v ∈ V such that v + (−v) = 0.
(3) The operation + is commutative: v + w = w + v for all v, w ∈ V ,
(4) and associative: (v + w) + u = v + (w + u) for all v, w, u ∈ V .
(5) The scalar multiplication is distributive: a(v + w) = av + aw and (a + b)v = av + bv,
a, b ∈ R, v, w ∈ V ,
(6) and associative: a(bv) = (ab)v, a, b ∈ R, v ∈ V ,
(7) and compatible with 1: 1v = v.
§3 Remark. To check whether a set V is a vector space, you should first test if the
operations + and · are well defined, that is they do not take you out of the set. You can
then check the other axioms.
§4 Examples. a ) V = (Rn , +, ·) clearly satisfies the axioms and forms a vector space.
b ) Matrices form a vector space: +: add matrices, ·: multiply all entries by the number
c ) If a0 , a1 , . . . , an ∈ R, an 6= 0, then p(y) = a0 + a1 y + . . . + an y n is called a polynomial of
degree n. Polynomials can be added and multiplied by a real number λ ∈ R:
(a0 + a1 y + . . . + an y n ) + (b0 + b1 y + . . . + bn y n ) =
(a0 + b0 ) + (a1 + b1 )y + . . . + (an + bn )y n ,
λ(a0 + a1 y + . . . + an y n ) = (λa0 ) + (λa1 )y + . . . + (λan )y n .
It is easy to check that these operations satisfy the vector space axioms. We conclude that
polynomials of maximal degree n form a vector space, which we denote by Pn . Note that
the null vector 0 is identified with the constant polynomial p(y) = 0.
d ) Analogously, real valued smooth functions⁸ f ∈ C ∞ (R), f : R → R form a vector space V with the following rules:
(f + g)(x) := f (x) + g(x) and (λ · f )(x) := λf (x)
for all f, g ∈ V and λ ∈ R. Here, the null vector 0 is the function f (x) = 0.
§5 Theorem. Let V be a vector space over R. Then
(i) for all u, v, w ∈ V , u + v = u + w implies v = w and
(ii) for all α ∈ R, we have α.0 = 0.
(iii) 0.v = 0 for all v ∈ V .
⁸ Most of our discussion can be extended to functions that are sectionwise continuous.
Proof: Because of vector space axiom (2), there is a vector −u ∈ V such that u+(−u) = 0.
With (3), we have (−u)+u = 0. From u+v = u+w, we have (−u)+(u+v) = (−u)+(u+w).
Associativity of +, i.e. axiom (4), leads to ((−u) + u) + v = ((−u) + u) + w and therefore
to 0 + v = 0 + w and with (1) finally to v = w.
Consider α.0 + α.0 = α.(0 + 0) according to axiom (5). Because of (1), we have α.0 + α.0 =
α.0 = α.0 + 0. Call α.0 = u and together with (i) of this theorem, we have α.0 = 0. The
proof of (iii) is done in Tutorial 3.5.
F Construct an invertible linear map m that maps every polynomial in P2 to a vector
in R3 . Extend this map to one between Pn and Rn+1 . The existence of this map shows
that it does not really matter, if we work with Pn or Rn+1 .
The axioms are clearly satisfied. Note that W ′ = {(x, y)T : x + y = 1} is not a vector subspace, as 0 = (0, 0)T is not an element of W ′. Another reason is that the sum of two elements s1 , s2 ∈ W ′ is not in W ′. These observations lead us to the vector subspace test:
§3 Theorem. (Vector subspace test) Let V be a vector space and let W be a non-empty subset of V . Then W is a vector subspace of V iff (if and only if)
(i) u + w ∈ W for all u, w ∈ W and
(ii) λw ∈ W for all λ ∈ R and w ∈ W .
Proof: First, note that due to (i) and (ii), the restrictions of the operations + and · from
V to W are indeed well defined on W and do not take us out of this set. It remains to check
the vector space axioms. In the tutorials, we will show that (−1)v = −v for any v ∈ V . We
have already shown in III.1, §5 that 0w = 0. Take an arbitrary vector v ∈ W . Because of
(ii), both 0w = 0 and (−1)v = −v are also in W . Axiom (1) and (2) are therefore satisfied:
There is a null vector 0 in W , and every vector v has an inverse −v in W . The validity of
the remaining axioms (3)-(7) is then inherited from V .
§4 Examples. a ) W = {(x, y)T ∈ R2 : x + y = 0} is a subspace of R2 . Vector subspace
test:
(x1 , −x1 )T + (x2 , −x2 )T = (x1 + x2 , −(x1 + x2 ))T and λ (x1 , −x1 )T = (λx1 , −λx1 )T .
v = a1 u1 + a2 u2 + . . . + ak uk , a1 , . . . , ak ∈ R .
b ) The vectors (1, 0, 0)T , (0, 1, 0)T , (0, 0, 1)T span R3 and one needs all three of them. Two
of them span only a subspace.
c ) The vector space of polynomials up to degree n, Pn , is spanned by the polynomials
1, x, x2 , . . . , xn .
d ) The vector space of 2 × 2-dimensional matrices Mat2 is spanned by
\[ \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \quad \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \quad \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}, \quad \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix} . \]
implies c1 = c2 = 0. Analogously, {(1, 0, 0)T , (0, 1, 0)T , (0, 0, 1)T } is linearly independent in
R3 and the generalisation to Rn is clear.
b ) {(1, 2)T , (2, 4)T } is linearly dependent, as 2(1, 2)T − (2, 4)T = 0.
c ) Let u1 = (1, −1, 2)T , u2 = (3, 2, 1)T , u3 = (4, 1, 3)T . Then {u1 , u2 , u3 } is linearly
dependent as u1 + u2 − u3 = 0.
§3 Algorithm. To determine if a set {u1 , . . . , uk } is linearly independent or not, find (most
likely using Gaußian elimination) all solutions to
c1 u1 + c2 u2 + . . . + ck uk = 0 .
If all variables c1 , . . . , ck are pivot variables, this homogeneous SLE has only the trivial
solution c1 = c2 = . . . = ck = 0. In this case, the set {u1 , . . . uk } is linearly independent.
Otherwise, nontrivial solutions exist and the vectors are linearly dependent.
§4 Exercise. Determine whether (1, 2, 3, −1)T , (2, 1, 3, 1)T and (4, 5, 9, −1)T are linearly
dependent.
Solution: We must find all solutions (c1 , c2 , c3 ) of the homogeneous SLE
c1 (1, 2, 3, −1)T + c2 (2, 1, 3, 1)T + c3 (4, 5, 9, −1)T = (0, 0, 0, 0)T .
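The same check can be done by computer (a SymPy sketch of my own, not part of the notes): the vectors are placed into the columns of a matrix, and rank and nullspace reveal the dependence.

    import sympy as sp

    # Put the vectors into the columns of a matrix and compare its rank
    # with the number of vectors.
    A = sp.Matrix([[ 1, 2,  4],
                   [ 2, 1,  5],
                   [ 3, 3,  9],
                   [-1, 1, -1]])

    print(A.rank())        # -> 2 < 3, so the three vectors are linearly dependent
    print(A.nullspace())   # -> a vector proportional to (2, 1, -1)^T,
                           #    i.e. 2*u1 + u2 - u3 = 0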
III.5 Basis and dimensions
d ) (1, 0, 0)T , (0, 1, 0)T is not a basis for R3 as these vectors do not span this space.
c1 w1 + c2 w2 + . . . + cm wm = 0 . (III.2)
To have all the coefficients of the v 1 , . . . , v n vanish, note that it is sufficient to find c1 , . . . , cm
such that
a11 c1 + a21 c2 + . . . + am1 cm = 0 ,
a12 c1 + a22 c2 + . . . + am2 cm = 0 ,
.. ..
. .
a1n c1 + a2n c2 + . . . + amn cm = 0 .
This is a homogeneous SLE with more unknowns (m) than equations (n) and thus there is
a solution with not all of the ci being zero (Theorem II.4, §3). It follows that (III.2) has a
solution besides the trivial solution and so S 0 is linearly dependent.
§5 Theorem. Any two bases for a vector space V contain the same number of vectors.
Proof: Let B = (v 1 , v 2 , . . . , v n ) and B ′ = (v ′1 , v ′2 , . . . , v ′m ) be bases for V . From the above lemma, we conclude that since B is a basis and B ′ is linearly independent, m ≤ n. Equally, since B ′ is a basis and B is linearly independent, n ≤ m. Altogether, we have m = n.
§6 Remark. From lemma §4 and theorem §5, it follows that a basis for V contains a
minimal number of vectors that span V , but a maximal number of vectors that are still
linearly independent.
§7 Definition. Let V be a vector space. We define the dimension of V to be the number
of vectors in any basis of V .
R} = span{(2, 3, 4)T }. Since a single vector10 is always linearly independent, (2, 3, 5)T is
a basis for L and we have dim L = 1.
c ) Consider the plane P = {(x, y, z)T ∈ R3 : x + y − z = 0}. Then P is a subset of R3 .
Consider the tuple B = (1, 0, 1)T , (0, 1, 1)T of vectors in P . Since (1, 0, 1)T and (0, 1, 1)T
are not multiples of each other, B is linearly independent. Furthermore, they span P :
(x, y, z)T ∈ P ⇒ z = x + y, (x, y, x + y)T = x(1, 0, 1)T + y(0, 1, 1)T . We conclude that B is
a basis for P and P has dimension 2.
d ) Consider the SLE Ax = 0 given by
x1 − x2 + 2x3 + x4 = 0 ,
2x1 + x2 − x3 + x4 = 0 ,
(III.3)
4x1 − x2 + 3x3 + 3x4 = 0 ,
x1 + 2x2 − 3x3 = 0 .
Let W = {x ∈ R4 : Ax = 0}. Then W is a subspace of R4 according to corollary III.2,
§5, and we would like to determine a basis. The SLE (III.3) has the same solutions as the
SLEs corresponding to the following augmented matrices:
1 −1 2 1 0 R2 →R2 −2R1 1 −1 2 1 0
2 1 −1 1 0 R3 →R3 −4R1 0 3 −5 −1 0
R4 →R4 −R1
4 −1 3 3 0 0 3 −5 −1 0
1 2 −3 0 0 0 3 −5 −1 0
1 −1 2 1 0
R3 →R3 −2R2 0 3 −5 −1 0
R4 →R4 −R2 x1 − x2 + 2x3 + x4 = 0 ,
0 0 0 0 0 3x2 − 5x3 − x4 = 0 .
0 0 0 0 0
The general solution is thus
x3 = α , x4 = β , x2 = (5α + β)/3 , x1 = x2 − 2x3 − x4 = −α/3 − 2β/3 .
Any solution can be written as
\[ x = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} = \frac{1}{3}\begin{pmatrix} -\alpha - 2\beta \\ 5\alpha + \beta \\ 3\alpha \\ 3\beta \end{pmatrix} = \frac{\alpha}{3}\begin{pmatrix} -1 \\ 5 \\ 3 \\ 0 \end{pmatrix} + \frac{\beta}{3}\begin{pmatrix} -2 \\ 1 \\ 0 \\ 3 \end{pmatrix} , \]
¹⁰ which is not the null vector 0
and we find B = (−1, 5, 3, 0)T , (−2, 1, 0, 3)T spans the solution space to Ax = 0. As the
two vectors in B are not multiples of each other, they are linearly independent (cf. III.4,
remark §5, b)) and B forms a basis of the solution space.
§9 Lemma. We have
(i) span{v 1 , . . . , v j , . . . , v k , . . . , v m } = span{v 1 , . . . , v k , . . . , v j , . . . , v m } (reordering),
(ii) span{v 1 , . . . , v j , . . . , v m } = span{v 1 , . . . , λv j , . . . , v m } for λ ≠ 0 (rescaling a vector),
(iii) span{v 1 , . . . , v j , . . . , v k , . . . , v m } = span{v 1 , . . . , v j + v k , . . . , v k , . . . , v m } (adding one vector to another).
Proof: In each case, we need to show that any vector of the set on the left-hand side is also contained in the set on the right-hand side and vice versa. (i) is trivial. (ii): v = c1 v 1 + . . . + cj v j + . . . + cm v m = d1 v 1 + . . . + dj (λv j ) + . . . + dm v m , where dj = cj /λ and di = ci else. (iii): v = c1 v 1 + . . . + cj v j + . . . + ck v k + . . . + cm v m = d1 v 1 + . . . + dj (v j + v k ) + . . . + dk v k + . . . + dm v m , where dj = cj , dk = ck − cj and di = ci else.
§10 Finding a basis. The above lemma §9 implies the following: We can find a basis for span{v 1 , . . . , v k } as follows: write the vectors v 1 , . . . , v k as the rows of a matrix, bring this matrix to row echelon form using elementary row operations, and read off the non-zero rows. These rows form a basis for span{v 1 , . . . , v k }.
§11 Exercise. Find a basis for the set S = span{(2, −1, 2, 1)T , (1, 2, −1, 3)T , (4, 3, 0, 7)T ,
(0, 0, 1, 2)T }.
Solution: S equals the span of the rows of each of the following matrices:
1 2 −1 3 1 2 −1 3 1 2 −1 3
R2 →R2 −2R1 R3 →R3 −R2
2 −1 2 1 R3 →R3 −4R1 0 −5 4 −5 R3 ↔R4 0 −5 4 −5
.
4 3 0 7 0 −5 4 −5 0 0 1 2
0 0 1 2 0 0 1 2 0 0 0 0
Thus, S = span{(1, 2, −1, 3)T , (0, −5, 4, −5)T , (0, 0, 1, 2)T }. Because of the row echelon
form, it is easy to see that these vectors are linearly independent and therefore they form
a basis for S.
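SymPy can carry out this row reduction for us (my own sketch, not part of the notes); rowspace() returns a basis of the row space obtained from an echelon form, which may differ from the basis above by invertible row operations.

    import sympy as sp

    A = sp.Matrix([[2, -1, 2, 1],
                   [1, 2, -1, 3],
                   [4, 3, 0, 7],
                   [0, 0, 1, 2]])

    for r in A.rowspace():    # a basis of the row space
        print(r)
    # three rows are returned, so dim S = 3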
§12 Remark. The original vectors in the above example are linearly dependent. It follows
from the matrix after the first set of elementary row operations that
R2 − 2R1 = R3 − 4R1 or (2, −1, 2, 1)T − 2(1, 2, −1, 3)T = (4, 3, 0, 7)T − 4(1, 2, −1, 3)T .
¹¹ It is a very common source of mistakes to confuse rows and columns here. In algorithms like this one, always make sure you know how to arrange vectors!
§13 Theorem. Let V be an n-dimensional vector space and let {v 1 , . . . , v k } be a linearly
independent set of vectors which do not span V . Then {v 1 , . . . , v k } can be extended to a
basis of V .
Proof: (by construction) Since span{v 1 , . . . , v k } ≠ V , there is a v k+1 ∈ V such that v k+1 ∉ span{v 1 , . . . , v k }. By theorem III.4, §7, the set {v 1 , . . . , v k , v k+1 } is linearly independent.
If span{v 1 , . . . , v k , v k+1 } = V , then we found a basis. Otherwise, we repeat this procedure.
In each step, the dimension of the span of the vectors increases by one, and after n − k
steps, we arrive at the desired basis.
§14 Theorem. Suppose S = {v 1 , . . . , v k } spans a (finite-dimensional) vector space V .
Then there exists a subset of S which is a basis for V .
Proof: (by construction) If S is linearly independent, then S is a basis. Otherwise, one
of the v i can be written as a linear combination of the others as follows from lemma III.4,
§6. Therefore S\{v i } still spans V by theorem III.3, §6. We continue with this reduced set
from the top as often as possible. The minimal set, which still spans V is the basis of V .
IV Inner product spaces
The Euclidean inner product on Rn is ⟨x, y⟩ := x1 y1 + . . . + xn yn .
§3 Remark. Note that on a vector space, one can define many different inner products.
For example, besides the usual Euclidean inner product on R2 , we could use any expression
hx, yi = αx1 y1 + βx2 y2 for positive numbers α, β ∈ R.
F Try to formulate conditions on a matrix A to define an inner product via ⟨x, y⟩ := xT Ay. We will return to this question later.
§4 Lemma. We have h0, ui = 0 for all u ∈ V .
Proof: It is 0 = 0hu, ui = h0u, ui = h0, ui.
§5 Definition. The norm of a vector (or its length or magnitude) is given by ||u|| = √⟨u, u⟩.
§6 Examples. a ) The Euclidean norm of x = (x1 , . . . , xn )T ∈ Rn is ||x|| = √(x1² + . . . + xn²).
b ) Recall that if θ denotes the angle between x and y, then hx, yi = ||x|| ||y|| cos θ.
c ) In the case of the vector space V of continuous functions defined on the interval [0, 1], we have \(\|f\| = \big(\int_0^1 f(x)^2\,\mathrm{d}x\big)^{1/2}\) for all f ∈ V .
§7 Theorem. (Cauchy-Schwarz inequality) Let V be an inner product space and let u, v ∈
V . Then |hu, vi| ≤ ||u|| ||v||.
Proof: The proof is trivial if v = 0 and thus ||v|| = 0 by §4. Let now v ≠ 0. We have for any number δ ∈ R:
0 ≤ ⟨u − δv, u − δv⟩ = ⟨u, u⟩ − 2δ⟨u, v⟩ + δ²⟨v, v⟩ .
We choose now δ = hu, vihv, vi−1 = hu, vi ||v||−2 , which is indeed well-defined, as v 6= 0.
We continue:
0 ≤ hu, ui − |hu, vi|2 hv, vi−1
⇔ |hu, vi|2 ≤ ||u||2 ||v||2
⇔ |hu, vi| ≤ ||u|| ||v|| .
(Figure: the vectors u, v and u + v forming a triangle; the triangle inequality states ||u + v|| ≤ ||u|| + ||v||.)
Proof: We have
||u + v||² = ⟨u + v, u + v⟩ = ⟨u, u⟩ + 2⟨u, v⟩ + ⟨v, v⟩ ≤ ||u||² + 2|⟨u, v⟩| + ||v||² ≤ ||u||² + 2||u|| ||v|| + ||v||² = (||u|| + ||v||)² ,
where the second inequality uses the Cauchy-Schwarz inequality §7. Thus, ||u + v|| ≤ ||u|| + ||v||.
orthogonal if k1 ≠ k2 :
\[ \int_0^1 \sin(k_1\pi x)\sin(k_2\pi x)\,\mathrm{d}x = \ldots = 0 . \]
§3 Theorem. (Pythagoras) If u and v are orthogonal vectors, then ||u + v||2 = ||u||2 +
||v||2 .
Proof: We have ||u + v||² = ⟨u + v, u + v⟩ = ⟨u, u⟩ + 2⟨u, v⟩ + ⟨v, v⟩ = ||u||² + ||v||² , since ⟨u, v⟩ = 0 for orthogonal u and v.
§4 Definitions. A set of vectors is called an orthogonal set, if all pairs of distinct vectors
in the set are orthogonal. An orthogonal set in which each vector has norm 1 is called
orthonormal.
§5 Examples. In R3 with Euclidean inner product, {(1, −2, 1)T , (1, 1, 1)T , (1, 0, −1)T } is
an orthogonal set and {(1, 0, 0)T , (0, 1, 0)T , (0, 0, 1)T } is an orthonormal set.
§6 Theorem. An orthogonal set that does not contain 0 is linearly independent.
Proof: Let the orthogonal set be given by S = {u1 , . . . , un } with huk , u` i = 0 for k 6= `.
Consider the equation c1 u1 + c2 u2 + . . . + cn un = 0. We need to show that c1 = c2 = . . . =
cn = 0 is the only solution. Taking the inner product of both sides of this SLE with an
arbitrary vector uk ∈ spanS, we find
hc1 u1 + c2 u2 + . . . + cn un , uk i = h0, uk i
c1 hu1 , uk i + c2 hu2 , uk i + . . . + ck huk , uk i + . . . + cn hun , uk i = 0
ck huk , uk i = 0 ⇒ ck = 0 .
v = c1 u1 + . . . + cn un ⇒ ⟨v, u1 ⟩ = ⟨c1 u1 + . . . + cn un , u1 ⟩ = c1 ⟨u1 , u1 ⟩ + c2 ⟨u2 , u1 ⟩ + . . . + cn ⟨un , u1 ⟩ = c1 ⟨u1 , u1 ⟩ ,
⇒ c1 = ⟨v, u1 ⟩ / ⟨u1 , u1 ⟩ , and generally: ci = ⟨v, ui ⟩ / ⟨ui , ui ⟩ .
IV.3 Gram-Schmidt algorithm
§9 Example. Consider the orthogonal basis (1, −1, 1)T , (1, 0, −1)T , (1, 2, 1)T of R3 . To write the vector (3, 4, 5)T as a linear combination of the basis vectors, we compute
c1 = ⟨(3, 4, 5)T , (1, −1, 1)T ⟩ / ⟨(1, −1, 1)T , (1, −1, 1)T ⟩ = 4/3 ,
c2 = ⟨(3, 4, 5)T , (1, 0, −1)T ⟩ / ⟨(1, 0, −1)T , (1, 0, −1)T ⟩ = −1 ,
c3 = ⟨(3, 4, 5)T , (1, 2, 1)T ⟩ / ⟨(1, 2, 1)T , (1, 2, 1)T ⟩ = 8/3 ,
⇒ (3, 4, 5)T = 4/3 (1, −1, 1)T − (1, 0, −1)T + 8/3 (1, 2, 1)T .
(1) Let w1 = u1 .
¹² The notation here is not unique, sometimes Pu w is used instead of Pw u. This can be a source of much confusion, so you should pay attention to the context.
(2) Take the next vector and make it perpendicular to all the previously obtained ones:
\[ w_i = u_i - \sum_{j=1}^{i-1} P_{w_j} u_i \]
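A minimal Python implementation of the Gram-Schmidt procedure (my own sketch, assuming NumPy and the Euclidean inner product); it normalises each wi as soon as it is produced, which is equivalent to orthogonalising first and normalising afterwards. The starting basis below is an arbitrary choice.

    import numpy as np

    def gram_schmidt(vectors):
        """Orthonormalise a list of linearly independent vectors (Euclidean inner product)."""
        ortho = []
        for u in vectors:
            w = u.astype(float)
            for v in ortho:                      # subtract the projections onto the previous vectors
                w -= np.dot(w, v) * v
            ortho.append(w / np.linalg.norm(w))  # normalise the new vector
        return ortho

    basis = [np.array([1.0, 1.0, 0.0]),
             np.array([1.0, 0.0, 1.0]),
             np.array([0.0, 1.0, 1.0])]
    for q in gram_schmidt(basis):
        print(q)                                 # pairwise orthogonal unit vectors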
The following two paragraphs contain extra material which will not be required for the
exam. Also, note that we are slightly sloppy in our treatment of the infinite-dimensional
vector space of real-valued functions on [0, 1] below13 .
¹³ A proper treatment would work with the vector space consisting of finite linear combinations (i.e. only finitely many coefficients in the linear combinations are non-zero) of the infinite-dimensional basis and showing that this vector space is dense in the space of functions on [0, 1]. Also, there is an issue with linearly combining infinitely many vectors.
§7 Functions on [0, 1]. We saw in section IV.2, §2 that the functions sin(kπx) form an
orthogonal set of functions on the interval [0, 1]. Let us study this example in more detail.
Recall that the scalar product on this space was \(\langle f, g\rangle = \int_0^1 f(x)g(x)\,\mathrm{d}x\). We wish to turn
the set of functions S = {sin(πx), sin(2πx), . . .} into an orthonormal set. We already know
that sin(k1 πx) and sin(k2 πx) are orthogonal if k1 , k2 ∈ Z, k1 6= k2 . It remains to normalise
these functions. We evaluate:
\[ \int_0^1 \sin(k\pi x)\sin(k\pi x)\,\mathrm{d}x = \frac{1}{2} - \frac{\sin(2k\pi)}{4k\pi} , \qquad k \in \mathbb{N}^* . \]
For k ∈ Z, we have sin(2kπ) = 0 and we conclude that ⟨sin(kπx), sin(kπx)⟩ = 1/2. Thus, √2 sin(πx), √2 sin(2πx), . . . forms an orthonormal set. Similarly, 1, √2 cos(πx), √2 cos(2πx), . . . forms an orthonormal set.
§8 Fourier transform. Recall that given an orthonormal basis (u1 , . . . , uk ) of a vector
space V , we can write any vector v ∈ V as the linear combination v = ⟨v, u1 ⟩u1 + . . . + ⟨v, uk ⟩uk ,
where the ⟨v, ui ⟩ are the coordinates of the vector v with respect to the given basis. (Check:
Take the scalar product of both sides with an arbitrary element ui of the basis.) It can
now be shown that 1, √2 sin(2πx), √2 cos(2πx), √2 sin(4πx), √2 cos(4πx), . . . forms an orthonormal basis for continuous functions on [0, 1]. We can therefore write an arbitrary continuous function f : [0, 1] → R as
\[ f(x) = b_0 \cdot 1 + a_1\sqrt{2}\sin(2\pi x) + b_1\sqrt{2}\cos(2\pi x) + a_2\sqrt{2}\sin(4\pi x) + b_2\sqrt{2}\cos(4\pi x) + \ldots = b_0 + \sum_{k=1}^{\infty} a_k\sqrt{2}\sin(2k\pi x) + \sum_{k=1}^{\infty} b_k\sqrt{2}\cos(2k\pi x) , \]
where
\[ b_0 = \langle f, 1\rangle , \quad a_k = \langle f, \sqrt{2}\sin(2k\pi x)\rangle , \quad b_k = \langle f, \sqrt{2}\cos(2k\pi x)\rangle , \quad k \in \mathbb{N}^* , \quad\text{i.e.}\quad a_k = \sqrt{2}\int_0^1 f(x)\sin(2k\pi x)\,\mathrm{d}x , \ \text{etc.} \]
Note that the coefficients a1 , a2 , . . . and b0 , b1 , . . . contain all information necessary to re-
construct f . The transition from f to these coefficients is called a Fourier transform. This
transform is used e.g. to solve differential equations, in cryptography, jpeg compression and
by your stereo or mp3-player to display a frequency analysis of music clips.
V Linear transformations
We already encountered linear maps between Rn and Rm in section I.3. Here, we define
linear maps between arbitrary vector spaces.
§2 Examples. a ) Let U = Rn and V = Rm , then the linear maps between U and V are
m × n-dimensional matrices, cf. section I.3.
b ) Let M be an m × m matrix and N an n × n matrix. The map A ↦ M AN , where A is an m × n matrix, is a linear map.
c ) Let C ∞ (R) be the vector space of smooth functions on R. Then the transformations D : f ↦ f ′ and I : f ↦ \(\int_0^x f(s)\,\mathrm{d}s\) are linear transformations from the vector space C ∞ (R) to itself.
d ) Since in general sin(x1 + x2 ) 6= sin(x1 ) + sin(x2 ), the map T (x, y) := (y, sin x) is not a
linear transformation.
e ) The map T : R3 → R3 with T (x, y, z) = (x+1, y+1, z+1) is not a linear transformation.
§3 Lemma. Let T : U → V be a linear transformation. We then have:
V.2 Row and column spaces of matrices
the vectors r1 = (a11 , a12 , . . . , a1n ), . . ., rm = (am1 , am2 , . . . , amn ) are called the row vectors
of A, while the vectors
c1 = (a11 , a21 , . . . , am1 )T , . . . , cn = (a1n , a2n , . . . , amn )T ,
are called the column vectors of A. The subspace span{r1 , . . . , rm } ⊆ Rn is called the row
space and the subspace span{c1 , . . . , cn } ⊆ Rm is called the column space. We denote their
dimensions by rkr (A) and rkc (A) and call them the row and column rank of the matrix A.
§2 Example. For
\[ A = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix} , \]
the row space is span{(1, 2, 3), (4, 5, 6)} with 2 = rkr (A) ≤ 2, and the column space is span{(1, 4)T , (2, 5)T , (3, 6)T } with 2 = rkc (A) ≤ 3. Note that 2 (2, 5)T = (1, 4)T + (3, 6)T .
§3 Elementary row operations. The elementary row operations of II.3, §2 do not change
the span of the rows of a matrix (cf. III.5, §9). Therefore, they do not change the row rank
of a matrix.
§4 Determining row spaces. We can determine the row space by bringing the matrix
to row echelon form. Example:
\[ A = \begin{pmatrix} 1 & 2 & -1 & 3 \\ 2 & -1 & 2 & 1 \\ 4 & 3 & 0 & 7 \\ 0 & 0 & 1 & 2 \end{pmatrix} \longrightarrow \begin{pmatrix} 1 & 2 & -1 & 3 \\ 0 & -5 & 4 & -5 \\ 0 & 0 & 1 & 2 \\ 0 & 0 & 0 & 0 \end{pmatrix} \] (cf. III.5, §11),
and we have span{(1, 2, −1, 3), (2, −1, 2, 1), (4, 3, 0, 7), (0, 0, 1, 2)} = span{(1, 2, −1, 3),
(0, −5, 4, −5), (0, 0, 1, 2), (0, 0, 0, 0)}. As the first three vectors are linearly independent,
we have rkr (A) = 3. In general, the number of non-zero rows in the row echelon form of a
matrix gives the row rank of the matrix. We can use (1, 2, −1, 3), (0, −5, 4, −5), (0, 0, 1, 2)
as a basis for the row space of A.
§6 Rank of a matrix. Above, we saw in the example that rkr (A) = rkc (A) = 3. This is
a consequence of the following general Theorem: rkc (A) = rkr (A) =: rk(A), the rank of
the matrix A.
Proof: Let (e1 , . . . , ek ) be a basis of the row space: span{r1 , . . . , rm } = span{e1 , . . . , ek }.
We then have:
\[ A = \begin{pmatrix} r_1 \\ r_2 \\ \vdots \\ r_m \end{pmatrix} = \begin{pmatrix} a_{11}e_1 + \ldots + a_{1k}e_k \\ a_{21}e_1 + \ldots + a_{2k}e_k \\ \vdots \\ a_{m1}e_1 + \ldots + a_{mk}e_k \end{pmatrix} = \begin{pmatrix} a_{11}e_{11} + \ldots + a_{1k}e_{k1} & a_{11}e_{12} + \ldots + a_{1k}e_{k2} & \ldots & a_{11}e_{1n} + \ldots + a_{1k}e_{kn} \\ \vdots & \vdots & & \vdots \\ a_{m1}e_{11} + \ldots + a_{mk}e_{k1} & a_{m1}e_{12} + \ldots + a_{mk}e_{k2} & \ldots & a_{m1}e_{1n} + \ldots + a_{mk}e_{kn} \end{pmatrix} . \]
The column space is thus spanned by {(a11 , . . . , am1 )T , . . . , (a1k , . . . , amk )T }. It follows
that rkr (A) ≥ rkc (A). Interchanging rows and columns in this argument leads to rkc (A) ≥
rkr (A), and altogether, we have rkc (A) = rkr (A).
§7 Examples. The matrices
\[ \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} , \quad \begin{pmatrix} 1 & 1 & 1 \\ 2 & 2 & 2 \\ 3 & 3 & 3 \end{pmatrix} , \quad \begin{pmatrix} 1 & 1 & 1 \\ 1 & -1 & 1 \\ 0 & 0 & 0 \end{pmatrix} , \quad \begin{pmatrix} 1 & 2 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \]
have rank 0, 1, 2 and 3, respectively.
The last expression is the column space of A and thus consistency of the SLE Ax = b
requires that b is in the column space of A.
Figure 2: The kernel and the range of a linear map T : U → V , depicted as sets.
implies that both v 1 + v 2 and αv 1 are in R(T ). Therefore, R(T ) is a vector subspace of V .
The set of linear combinations of the column vectors of A is the column space.
§6 Example. Consider the linear transformation T given by the matrix A:
\[ A = \begin{pmatrix} 1 & 1 & 0 & 0 \\ 0 & 1 & -1 & 0 \\ 1 & 0 & 1 & 0 \end{pmatrix} . \]
We immediately read off the range: span{(1, 0, 1)T , (1, 1, 0)T , (0, −1, 1)T } = span{(1, 0, 1)T ,
(1, 1, 0)T }. Also, the kernel is given by span{(0, 0, 0, 1)T , (−1, 1, 1, 0)T }. Studying several
examples, one observes that dim U = dim N (T ) + dim R(T ). We will develop this as a
theorem below.
§7 Definition. If T : U → V is a linear transformation, then the dimension of the range
of T is called the rank of T : rk(T ) := dim R(T ). The dimension of the kernel of T is called
the nullity of T .
§8 Remark. If a linear transformation T is given by an m × n matrix A as T (x) := Ax,
then the rank of T equals the dimension of the column space of A, which is the rank of the
matrix A.
§9 Exercise. Let T : R3 → R3 be given by
T (x, y, z)T = (x + y + z, 2x − y − 2z, 4x + y)T .
The row spaces of the following matrices are the same as the column space of A:
\[ A^T = \begin{pmatrix} 1 & 2 & 4 \\ 1 & -1 & 1 \\ 1 & -2 & 0 \end{pmatrix} \xrightarrow{\substack{R_2 \to R_2 - R_1 \\ R_3 \to R_3 - R_1}} \begin{pmatrix} 1 & 2 & 4 \\ 0 & -3 & -3 \\ 0 & -4 & -4 \end{pmatrix} \longrightarrow \begin{pmatrix} 1 & 2 & 4 \\ 0 & 1 & 1 \\ 0 & 0 & 0 \end{pmatrix} . \]
We see that (1, 2, 4)T , (0, 1, 1)T is a basis for the column space of A, which is the range of
T . The rank of T is therefore 2. The kernel of T is equal to the solutions to the homogeneous
SLEs corresponding to the following augmented matrices:
1 1 1 0 R2 →R2 −2R1 1 1 1 0
R3 →R3 −4R1
2 −1 −2 0 0 −3 −4 0 .
4 1 0 0 0 −3 −4 0
The solution space of this SLE corresponds to the kernel of T and so the nullity of T is 1. Note that we observed again that for T : U → V , we have dim U = dim N (T ) + dim R(T ).
V.4 Orthogonal linear transformations
and we have AT A = 1₂ .
§4 Remark. Orthogonal matrices preserve the norm of vectors. It follows that A is in-
vertible, that A has maximal rank and that AT = A−1 .
§5 Theorem. A square matrix A of size n × n is orthogonal iff its columns form an
orthonormal basis of Rn with respect to the Euclidean inner product.
Proof: Let ci be the ith column of A. We consider the entry of AT A at row i and column j. This entry is given by the inner product of the i-th row of AT , which is the i-th column of A, with the j-th column of A: cTi cj = ⟨ci , cj ⟩. The condition that AT A = 1, i.e. that this matrix has 1s on the diagonal and 0s everywhere else, is therefore equivalent to the condition that ⟨ci , ci ⟩ = 1 and ⟨ci , cj ⟩ = 0 for i ≠ j, i.e. to the columns forming an orthonormal basis of Rn .
§6 Theorem. The determinant of an orthogonal matrix is ±1.
Proof: Let A be the orthogonal matrix. Using the definition of the determinant and
formulæ in I.3, §15, we have 1 = det(1) = det(AT A) = det(AT ) det(A) = det(A)2 . It
follows that det(A) = 1 or det(A) = −1.
§7 Lemma. Orthogonal transformations leave the Euclidean inner product on Rn invari-
ant: hAx, Ayi = hx, yi for all x, y ∈ Rn . This is sometimes used as an alternative definition
of orthogonal transformations.
Proof: We have hx, yi := xT y = xT AT Ay = (Ax)T Ay = hAx, Ayi.
§8 Corollary. Orthogonal transformations map orthogonal bases to orthogonal bases.
§9 Remarks. a ) The rotation matrix above has orthogonal columns, it has determinant 1
and it leaves inner products on R2 invariant, as it leaves angles between vectors invariant.
b ) Consider the matrix
\[ A = \begin{pmatrix} \tfrac{1}{\sqrt{3}} & \tfrac{1}{\sqrt{2}} & \tfrac{1}{\sqrt{6}} \\ \tfrac{1}{\sqrt{3}} & -\tfrac{1}{\sqrt{2}} & \tfrac{1}{\sqrt{6}} \\ \tfrac{1}{\sqrt{3}} & 0 & -\tfrac{2}{\sqrt{6}} \end{pmatrix} . \] (V.1)
This matrix has orthonormal columns, and is therefore orthogonal. One easily verifies
AT A = 1 and det(A) = 1.
c ) Because every orthogonal n × n matrix A is invertible, its kernel is trivial: N (A) = {0}
and its rank is maximal: rk(A) = n.
VI.1 Characteristic polynomial
or p(λ) = −λ3 + 6λ2 + 15λ + 8. We guess λ = −1 as a zero of this polynomial and factorise
it as −(λ+1)(λ+1)(λ−8). That is, the eigenvalues of A are −1 and 8. We now wish to find
the corresponding eigenvectors v = (x, y, z)T . For the eigenvalue λ = −1, the eigenvalue
equation yields
3x + 2y + 4z = −x , 4x + 2y + 4z = 0 ,
2x + 2z = −y , or 2x + y + 2z = 0 ,
4x + 2y + 3z = −z , 4x + 2y + 4z = 0 .
3x + 2y + 4z = 8x , −5x + 2y + 4z = 0 ,
2x + 2z = 8y , or 2x − 8y + 2z = 0 ,
4x + 2y + 3z = 8z , 4x + 2y − 5z = 0 .
The solutions of this homogeneous SLE are x = (α, α/2, α)T and an eigenvector is given by (1, 1/2, 1)T . Eigenvectors with eigenvalue 8 are therefore multiples of (1, 1/2, 1)T .
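The eigenvalues and eigenvectors of this example can be confirmed numerically (my own sketch, assuming NumPy; eigh is used since the matrix is symmetric).

    import numpy as np

    A = np.array([[3.0, 2.0, 4.0],
                  [2.0, 0.0, 2.0],
                  [4.0, 2.0, 3.0]])

    evals, evecs = np.linalg.eigh(A)   # for symmetric matrices, eigenvalues in ascending order
    print(np.round(evals, 6))          # -> [-1. -1.  8.]

    v = evecs[:, -1]                   # eigenvector for the eigenvalue 8
    print(v / v[0])                    # -> [1.  0.5 1.], i.e. proportional to (1, 1/2, 1)^T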
§7 Remarks. a) By the fundamental theorem of algebra, the characteristic polynomial of
an n × n matrix A can be factorised into n linear factors:
where λi are (not necessarily distinct) complex numbers. Thus, A has at most n (real)
eigenvalues.
b) A matrix may have no real eigenvalues, e.g.
\[ A = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} , \qquad p(\lambda) = \det\begin{pmatrix} -\lambda & 1 \\ -1 & -\lambda \end{pmatrix} = \lambda^2 + 1 = (\lambda + i)(\lambda - i) . \]
Its characteristic polynomial is p(λ) = det(A − λ1) = −(λ − 2)(λ − 2)(λ + 2) = −(λ − 2)² (λ + 2). The eigenvectors u1 , u2 of A corresponding to the eigenvalues 2 and −2 are given by
u1 = (−1, 0, 2)T , u2 = (−3, 4, 2)T . (VI.2)
Thus, although the algebraic multiplicity of the eigenvalue 2 is 2, its geometric multiplicity
is 1.
¹⁴ A strict proof of this statement is possible with similarity transforms introduced in the next section. F Try to find the proof once you studied the material in the next section.
§12 Lemma. For every root of the characteristic polynomial of a matrix A, there is at
least one eigenvector of A. That is, the geometric multiplicity is always larger or equal 1.
Proof: Consider a matrix A and a root λ of its characteristic polynomial p(A). Since
det(A − 1λ) = 0, this matrix is not invertible and has a nontrivial kernel, i.e. the kernel
contains a vector u 6= 0. This is an eigenvector u, as it solves (A − 1λ)u = 0 or Au = λu.
and we can start this procedure from the top. After n steps, we find n eigenvectors for the
various eigenvalues of A.
§16 Example. Consider the symmetric matrix
\[ A = \begin{pmatrix} 1 & -1 & 1 \\ -1 & 2 & -1 \\ 1 & -1 & 1 \end{pmatrix} . \] (VI.4)
This matrix has eigenvalues (0, 2 − √2, 2 + √2) with corresponding eigenvectors (−1, 0, 1)T , (1, √2, 1)T , (1, −√2, 1)T .
§17 Definition. A real symmetric matrix with exclusively positive eigenvalues is called a
positive definite matrix.
¹⁵ λ∗ is the complex conjugate of λ, and x∗ is the vector obtained by complex conjugating the components of x.
¹⁶ That is, to compute the inner product of two vectors in V , take their inner product in Rn .
hx, yi := xT Ay , x, y ∈ Rn (VI.5)
λ2 xT y = xT Ay = y T Ax = λ1 y T x = λ1 xT y . (VI.6)
If E is an orthonormal basis, then the coefficients vi are given by the inner products
vi = hei , vi. The vector (v1 , . . . , vn )T is called the coordinate vector of the vector v with
respect to the basis E.
§2 Change of basis. Above, we saw that certain bases, e.g. orthonormal ones, are more
convenient than others. Note that the vectors in a new basis (b1 , . . . , bn ) can be written as
linear combinations of the old ones:
The matrix P is called the transformation matrix of the change of basis. Let us now look
at the coordinate vector (v1 , . . . , vn )T of v:
v = (e1 . . . en ) (v1 , . . . , vn )T = (b1 . . . bn ) (w1 , . . . , wn )T = (e1 . . . en ) P (w1 , . . . , wn )T ,
and therefore the coordinates with respect to the two bases are related by (v1 , . . . , vn )T = P (w1 , . . . , wn )T .
We see that if A describes a linear transformation with respect to the basis (e1 , . . . , en ),
then P −1 AP describes this linear transformation with respect to the basis (b1 , . . . , bn ) =
(e1 , . . . , en )P . The map
A 7→ P −1 AP =: Ã
is called a similarity transformation. We say that A and à are similar matrices.
§4 Lemma. Similar matrices have the same determinant.
Proof: Let A and à be similar n × n matrices and P a transformation matrix on Rn . We
have det(A) = det(P −1 ÃP ) = det(P −1 ) det(Ã) det(P ) = det(Ã).
§5 Lemma. Similar matrices have the same rank, the same trace, the same eigenvalues
(but not necessarily the same eigenvectors) and the same characteristic polynomial.
Proof: Exercise.
§6 Remark. If we study a linear transformation T : Rn → Rn , a particularly useful basis
is a basis consisting of eigenvectors of the matrix A corresponding to T .
§7 Theorem. (Diagonalising matrices) Let A be an n × n matrix with n linearly inde-
pendent eigenvectors {u1 , . . . , un } corresponding to the eigenvalues λ1 , . . . , λn . Then we
have
\[ P^{-1} A P = D , \qquad\text{where}\qquad D = \begin{pmatrix} \lambda_1 & 0 & \ldots & 0 \\ 0 & \lambda_2 & \ldots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \ldots & \lambda_n \end{pmatrix} \quad\text{and}\quad P = (u_1 \ \ldots \ u_n ) . \]
Because the columns of P are linearly independent, rk(P ) = n and therefore P is invertible.
§8 Corollary. Real symmetric matrices are diagonalisable, cf. VI.1, Theorem §15.
§9 Remark. Examples for matrices which are not diagonalisable:
\[ A = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} , \qquad B = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} . \]
The matrix A has complex eigenvalues and therefore cannot be diagonalised as a real
matrix. The matrix B has eigenvalue 0 with algebraic multiplicity 2 (i.e. the characteristic polynomial has a double root at 0) and geometric multiplicity 1 (i.e. there is only a one-dimensional space of eigenvectors with eigenvalue 0).
§10 Example. Consider the linear transformation T : R2 → R2 , T (x) = Ax with
A = \begin{pmatrix} 2 & -1 \\ -1 & 2 \end{pmatrix} .

We compute the eigenvalues to be λ = 3 and λ = 1 and the eigenvectors are (1, −1)^T and
(1, 1)^T. We gain some intuition about the properties of T if we look at the image of all
vectors in R2 of length 1:

[Figure: the unit circle in the (x1, x2)-plane and its image ellipse under T.]

Vectors with a tip on the circle are mapped to vectors with tip on the ellipse, where the
eigenvectors are the main axes. Diagonalising A means to go to a coordinate system where
the main axes correspond to the coordinate axes.
F What is the corresponding picture in higher dimensions, e.g. for R3 ?
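The numbers in Example §10 are quickly reproduced in SAGE (a sketch with our own variable names; P is built by hand from the eigenvectors):

A = matrix(QQ, [[2,-1],[-1,2]])
print(A.eigenvalues())                 # the eigenvalues 3 and 1
P = matrix(QQ, [[1,1],[-1,1]])         # columns: the eigenvectors (1,-1)^T and (1,1)^T
print(P.inverse()*A*P)                 # the diagonal matrix with entries 3 and 1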
Figure 3: The three types of conic sections we study here: circle, ellipse and hyperbola.
VI.3 Applications
§1 Conic Sections. Circle and ellipse are special cases of conic sections. These are curves
obtained by intersecting a double cone by a plane, see figure 3. Leaving out the parabolas
y = ax², we have the following equations for the various conic sections:

Circle:  x²/a² + y²/a² = 1 ,
Ellipse:  x²/a² + y²/b² = 1 ,
Hyperbola, main axis the x-axis:  x²/a² − y²/b² = 1 ,
Hyperbola, main axis the y-axis:  y²/a² − x²/b² = 1 .
Powers of a diagonalisable matrix A = P D P^{-1} are also easily computed:

A^k = (P D P^{-1})^k = P D P^{-1} P D P^{-1} · · · P D P^{-1} = P D^k P^{-1} .

Here, D^k is simply the diagonal matrix with entries d1^k, . . . , dn^k. Note that this also works
for rational k, provided the corresponding powers of the diagonal entries are defined.
§4 Exercise. Let

A = \begin{pmatrix} 2 & 2 \\ 1 & 3 \end{pmatrix} .

Find A^20. Find a matrix B such that B² = A.
Solution: Using the characteristic polynomial, we find that A has eigenvalues λ = 1 and λ =
4. The corresponding eigenvectors are (2, −1)^T and (1, 1)^T. We have the diagonalisation

A = \begin{pmatrix} 2 & 2 \\ 1 & 3 \end{pmatrix} = P D P^{-1} = \begin{pmatrix} 2 & 1 \\ -1 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & 4 \end{pmatrix} \frac{1}{3} \begin{pmatrix} 1 & -1 \\ 1 & 2 \end{pmatrix} .

We thus have

A^{20} = P D^{20} P^{-1} = \frac{1}{3} \begin{pmatrix} 2 & 1 \\ -1 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & 4^{20} \end{pmatrix} \begin{pmatrix} 1 & -1 \\ 1 & 2 \end{pmatrix} = \frac{1}{3} \begin{pmatrix} 2 + 4^{20} & -2 + 2 \cdot 4^{20} \\ -1 + 4^{20} & 1 + 2 \cdot 4^{20} \end{pmatrix} .
Note that the diagonal matrix D1 with entries 1, 2 satisfies D1² = D. Thus, B = P D1 P^{-1}
satisfies B² = P D1 P^{-1} P D1 P^{-1} = P D1² P^{-1} = P D P^{-1} = A. We have

B = \frac{1}{3} \begin{pmatrix} 2 & 1 \\ -1 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix} \begin{pmatrix} 1 & -1 \\ 1 & 2 \end{pmatrix} = \frac{1}{3} \begin{pmatrix} 4 & 2 \\ 1 & 5 \end{pmatrix} .
For a matrix A with eigenvalues 1 and 2, evaluating the remainder R(x) = α0 + α1 x at the eigenvalues gives

sin(1) = α0 + α1 ,  sin(2) = α0 + 2 α1 ,

and therefore α1 = sin(2) − sin(1) and α0 = 2 sin(1) − sin(2). Altogether, we computed

sin(A) = R(A) = α0 1 + α1 A .
17 One can e.g. show that there is a diagonalisable matrix “arbitrarily close” to any non-diagonalisable matrix. Then one argues that the theorem holds by continuity of the characteristic polynomial.
18 If they are not, one can consider the derivatives of these equations with respect to the λi to get a full set of equations.
§9 Coupled harmonic oscillators. Consider two masses m, coupled to each other and to two walls by three springs with spring constant κ:

[Figure: two masses m connected by three springs with constant κ.]

Let x1(t) and x2(t) describe the positions of the two masses at time t. Physics tells us that
this system is governed by the following equations of motion:

m ẍ1(t) := m d²x1(t)/dt² = −κ x1(t) + κ (x2(t) − x1(t)) ,
m ẍ2(t) := m d²x2(t)/dt² = −κ x2(t) + κ (x1(t) − x2(t)) ,

or

\begin{pmatrix} ẍ1(t) \\ ẍ2(t) \end{pmatrix} + \frac{1}{m} \begin{pmatrix} 2κ & -κ \\ -κ & 2κ \end{pmatrix} \begin{pmatrix} x1(t) \\ x2(t) \end{pmatrix} = \begin{pmatrix} ẍ1(t) \\ ẍ2(t) \end{pmatrix} + A \begin{pmatrix} x1(t) \\ x2(t) \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix} .
We recall/look up/happen to know that the solution to ẍ(t) + a x(t) = 0 for a > 0 is

x(t) = φ cos(√a t) + ψ sin(√a t)/√a  with  φ = x(0) and ψ = ẋ(0) :

ẍ(t) = φ (−a cos(√a t)) + (ψ/√a) (−a sin(√a t)) = −a x(t) .
To evaluate the matrix functions sin(Bt) and cos(Bt), we again decompose the functions sin(x) and cos(x) into Q(x)p(x) + R(x) with R(x) = α0 + α1 x and
find

sin(√(κ/m) t) = α0^sin + α1^sin √(κ/m) t  and  sin(√(3κ/m) t) = α0^sin + α1^sin √(3κ/m) t

or

α0^sin = ( √3 sin(√(κ/m) t) − sin(√(3κ/m) t) ) / ( √3 − 1 )  and  α1^sin = ( sin(√(3κ/m) t) − sin(√(κ/m) t) ) / ( (√3 − 1) √(κ/m) t ) .
For the cosine, we obtain the same, just replace sin everywhere by cos. Let us put the result
together.
The vector (φ1 , φ2 )T clearly specifies the initial position of the two oscillators. For
simplicity, let us put them to zero: (φ1 , φ2 )T = (0, 0)T . The vector (ψ1 , ψ2 )T specifies the
initial direction of the two oscillators. To see this, assume that (ψ1 , ψ2 )T = (1, 1)T . Since
(1, 1)T is an eigenvector of A, it is an eigenvector of B and of Bt and thus multiplication by
sin(Bt) just changes the prefactor. The oscillators keep moving in parallel. On the other
hand, starting from the eigenvector (ψ1 , ψ2 )T = (1, −1)T , the oscillators always move in
opposite directions. Starting from a general configuration leads to a linear combination of
these two normal modes of the coupled oscillators, which is given by
\begin{pmatrix} x1(t) \\ x2(t) \end{pmatrix} = ( α0^cos 1 + α1^cos B t ) \begin{pmatrix} φ1 \\ φ2 \end{pmatrix} + ( α0^sin 1 + α1^sin B t ) \begin{pmatrix} ψ1 \\ ψ2 \end{pmatrix} .
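The eigenvalues κ/m and 3κ/m of A = (1/m)((2κ, −κ), (−κ, 2κ)), whose square roots appear in the formulas above, can be confirmed symbolically in SAGE (a sketch with our own variable names):

var('kappa m', domain='positive')
A = matrix(SR, [[2*kappa/m, -kappa/m], [-kappa/m, 2*kappa/m]])
print(A.eigenvalues())          # kappa/m and 3*kappa/m (in some order)
print(A.eigenvectors_right())   # eigenvectors proportional to (1,1)^T and (1,-1)^T: the two normal modes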
§10 Damped harmonic oscillator. Another, very similar example is the damped har-
monic oscillator. We will describe the position of the oscillator by the coordinate function
x(t), its mass by m, the spring constant by κ and the dampening constant by γ. The
equation of motion reads as m ẍ(t) = −κ x(t) − 2γ m ẋ(t).
By introducing a function p(t) = m ẋ(t) (the momentum of the oscillator), we can rewrite
this as the following matrix equation:

A \begin{pmatrix} x(t) \\ p(t) \end{pmatrix} = \begin{pmatrix} 0 & 1/m \\ -κ & -2γ \end{pmatrix} \begin{pmatrix} x(t) \\ p(t) \end{pmatrix} = \begin{pmatrix} ẋ(t) \\ ṗ(t) \end{pmatrix} .

We will use the ansatz x(t) = Re(x0 exp(λ± t)) and p(t) = Re(p0 exp(λ± t)), where x0, p0
are real, but we allow λ± ∈ C. Because ẋ(t) = λ± x(t) and ṗ(t) = λ± p(t), the λ± are the two
eigenvalues of A. We obtain:

λ+ = ( −mγ + √(−κm + m²γ²) ) / m  and  λ− = ( −mγ − √(−κm + m²γ²) ) / m .

If γ is large, i.e. if the damping is strong, then m²γ² > κm, and the eigenvalues λ±
are real and negative. The oscillator just dies off without oscillating. If the damping is
smaller, then the eigenvalues are complex. Recall the formula exp(iα) = cos(α) + i sin(α).
Splitting λ± into real and imaginary parts, λ± = λ±^R + i λ±^I, we have

x(t) = Re( x0 exp(λ± t) ) = x0 exp(λ±^R t) cos(λ±^I t) .

Here, x0 is the initial elongation of the oscillator, the exponential describes the dampening
and the cosine is responsible for the oscillations.

Figure 4: Left: the case of pure damping with λ+ = −3 ∈ R. Right: the case of complex λ+, where we chose λ+^R = −3 and λ+^I = 20.
§11 Linear recursive sequences. Consider the linear recursive sequence an+1 = an +
an−1 . With the initial values a0 = 0 and a1 = 1, the ai are known as the Fibonacci numbers:
0, 1, 1, 2, 3, 5, 8, 13, . . . We can encode this recurrence relation into the matrix equation
\begin{pmatrix} a_{n+1} \\ a_n \end{pmatrix} = A \begin{pmatrix} a_n \\ a_{n-1} \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} a_n \\ a_{n-1} \end{pmatrix} .

To compute a_{n+1} from given a1 and a0, we just have to apply A^n to the initial vector.
Using the diagonal form of A, we know how to do this efficiently: if A = P D P^{-1}, then
A^n = P D^n P^{-1}. The eigenvalues of the matrix A are λ1 = (1 + √5)/2 and λ2 = (1 − √5)/2
with the corresponding eigenvectors x1 = ((1 + √5)/2, 1)^T and x2 = ((1 − √5)/2, 1)^T. One
can show that in terms of eigenvectors, our initial conditions read as

(1, 1)^T = (1/10)(5 + √5) x1 + (1/10)(5 − √5) x2 .
Note that −1 < λ2 = −0.618 . . . < 0, and therefore λ2^n tends to zero as n goes to infinity.
Thus, for large n, the contribution of x2 becomes negligible. We then make a curious
observation: the ratio a_{n+1}/a_n approaches the ratio of the components of x1, which is
the golden ratio ϕ := (1 + √5)/2 = 1.61803 . . . Recall that a line is divided into two segments
of length a and b according to the golden ratio if (a + b)/a = a/b; it follows that a/b = ϕ.
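A few lines of SAGE (our own sketch) reproduce the Fibonacci numbers via matrix powers and illustrate the convergence of successive ratios to the golden ratio:

A = matrix(ZZ, [[1,1],[1,0]])
v = vector(ZZ, [1,0])                          # the initial vector (a1, a0)
print([(A^n*v)[1] for n in range(1,11)])       # 1, 1, 2, 3, 5, 8, 13, 21, 34, 55
print(((A^30*v)[0]/(A^30*v)[1]).n())           # ratio a_31/a_30, approximately 1.618
print(((1+sqrt(5))/2).n())                     # the golden ratio 1.61803...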
§12 Population growth model. Originally, the Fibonacci sequence was used to model
the growth of a rabbit population. Let us give a similar example: Consider a population
of animals with a0 adults and j0 juveniles. From one year to the next, each adult produces γ juveniles. Adults will survive from one year to the next with probability α,
while juveniles will survive into the next year and become adults with probability β.
Altogether, we have

a_{n+1} = α a_n + β j_n ,  j_{n+1} = γ a_n ,  or  \begin{pmatrix} a_{n+1} \\ j_{n+1} \end{pmatrix} = A \begin{pmatrix} a_n \\ j_n \end{pmatrix} = \begin{pmatrix} α & β \\ γ & 0 \end{pmatrix} \begin{pmatrix} a_n \\ j_n \end{pmatrix} .

Assume that α = 1/2, β = 3/10. What is the minimum value of γ to have a stable population?
The eigenvalues of A are λ1 = (1/20)(5 − √(5(5 + 24γ))) and λ2 = (1/20)(5 + √(5(5 + 24γ))). To
have a stable population, we need an eigenvector of A with eigenvalue 1. As λ2 > λ1, we
demand that λ2 = 1 and find γ = 5/3. The corresponding eigenvector reads as (3/5, 1)^T. Thus,
starting e.g. from a population of 3000 adults and 5000 juveniles, this population would be
(statistically) stable for γ = 5/3.
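A short SAGE check of these numbers (our sketch; the variable names are ours):

al, be, ga = 1/2, 3/10, 5/3
A = matrix(QQ, [[al, be], [ga, 0]])
print(A.eigenvalues())                           # 1 and -1/2
print((5 + sqrt(5*(5+24*ga)))/20)                # 1, i.e. lambda_2 = 1 for gamma = 5/3
print(A*vector(QQ, [3,5]) == vector(QQ, [3,5]))  # True: 3000 adults and 5000 juveniles stay fixed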
VII Advanced topics

VII.1 Complex linear algebra

§1 Complex numbers. The equation x² = −1 has no solution over the real numbers, so we introduce a new number i with i² = −1. We
can use this number to generate a new kind of complex number, consisting of a real part
and a multiple of i, the imaginary part: z = x + iy, x, y ∈ R. We use the symbol C for
such numbers. One can now consistently define the sum and the product of two complex
numbers, and the computational rules correspond to those of real numbers: Consider two
complex numbers z1 = x1 + iy1 and z2 = x2 + iy2. We define:

z1 + z2 := (x1 + x2) + i (y1 + y2)  and  z1 · z2 := (x1 x2 − y1 y2) + i (x1 y2 + x2 y1) .   (VII.1)
These operations satisfy the usual associative, commutative and distributive laws of real
numbers. Using Maclaurin series, we can also define other functions of complex numbers,
for example:
exp(z) = 1 + z + z²/2! + z³/3! + z⁴/4! + · · · .   (VII.2)
We define one special map, which is called complex conjugation and denote it by an asterisk.
This map inverts the sign of the imaginary part: the complex conjugate of z = x + iy is
z ∗ = x − iy.
We can regard complex numbers as elements of the 2-dimensional vector space R2 with
basis (1, i) and coordinates (x, y)^T. F Other vector spaces which can be treated as if their
elements were numbers are R4, yielding the quaternions (one loses commutativity), and R8,
yielding the octonions (one loses associativity). Have a look at their multiplication rules.
As points on the plane R2, we can also describe a complex number z = x + iy by a radial
coordinate r = √(x² + y²) and an angle θ with tan(θ) = y/x. We then have z = r cos θ + i r sin θ =
r (cos θ + i sin θ). Consider the following Maclaurin series:

cos θ = 1 − θ²/2! + θ⁴/4! − θ⁶/6! + θ⁸/8! − · · · ,
sin θ = θ − θ³/3! + θ⁵/5! − θ⁷/7! + θ⁹/9! − · · · ,   (VII.3)
e^θ = 1 + θ + θ²/2! + θ³/3! + θ⁴/4! + θ⁵/5! + · · · .
From this, we can glean Euler’s formula: cos θ + i sin θ = e^{iθ}. Pictorially, Euler’s formula
states that e^{iθ} is the point on the unit circle at angle θ. Note that our above expression for
z reduces to z = r e^{iθ}. This is what is called the polar form of a complex number. Under
complex conjugation, the sign of the angle is inverted: z∗ = r e^{−iθ}.
F Euler’s formula also gives rise to one of the most beautiful equations in mathematics,
in which all central symbols appear: e^{iπ} + 1 = 0.
§2 Complex vector spaces. A complex vector space is a vector space over the complex
numbers. It satisfies the axioms of III.1, §2, where all appearances of R have to be replaced
by C. The most important example of a complex vector space is Cn = {(c1 , . . . , cn )T |ci ∈
C}. Other examples are the space of complex polynomials or smooth complex valued
functions C ∞ (I, C) on an interval I.
Similarly, one has to extend the definition of linear combinations: A (complex) linear
combination of a set of vectors {v 1 , . . . , v n } is an expression of the form c1 v 1 + · · · + cn v n
with c1 , . . . , cn ∈ C. This leads to the notion of a complex span and complex linear
(in)dependence. It is easy to verify that all our theorems concerning bases and spans in
real vector spaces also hold in the complex setting.
The only new operation is the complex conjugation of a vector. This operation can be
defined as a map ∗ : V → V which satisfies (v∗)∗ = v and is antilinear,
i.e. (v + w)∗ = v∗ + w∗ and (λv)∗ = λ∗ v∗. In the case of Cn, it is most convenient to define
v∗ = (c1∗, . . . , cn∗) for a vector v = (c1, . . . , cn) ∈ Cn. Note, however, that there are other
possibilities. On C2, e.g., one could also define

\begin{pmatrix} c1 \\ c2 \end{pmatrix}^* := \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} c1^* \\ c2^* \end{pmatrix} .   (VII.4)
Let us now go through our results on real vector spaces and generalise them step by
step to the complex setting.
§3 Complex vector spaces as real vector spaces. Consider a complex vector space V
with dimension n and basis (b1 , . . . bn ). Any vector v ∈ V is a complex linear combination
of the form
v = c1 b1 + · · · + cn bn = (d1 + ie1 )b1 + · · · + (dn + ien )bn , (VII.5)
where ci = di + i ei, ci ∈ C, di, ei ∈ R. We can therefore write the vector v as a real linear
combination of the vectors (b1, i b1, b2, i b2, . . . , bn, i bn):

v = d1 b1 + e1 (i b1) + d2 b2 + e2 (i b2) + · · · + dn bn + en (i bn) .   (VII.6)

The vectors (b1, i b1, b2, i b2, . . . , bn, i bn) thus form a basis of V , when regarded as a real
vector space. Altogether, we observe that a complex vector space V of dimension n can be
regarded as a real vector space of dimension 2n.
§4 Complex inner product spaces. Inner products for real vectors were defined as
symmetric positive definite bilinear forms. Positive definiteness allowed us to introduce the
notion of the norm of a vector, as it guaranteed that the expression ||v|| = √⟨v, v⟩ is well-defined. We therefore need to preserve the positive definiteness by all means. Unfortunately,
the standard definition of an inner product on Rn as ⟨x, y⟩ = x^T y does not have this
property on Cn. For example, on C2, we have

\begin{pmatrix} x1 \\ x2 \end{pmatrix}^T \begin{pmatrix} x1 \\ x2 \end{pmatrix} = x1² + x2² ,   (VII.7)

which is negative, e.g. for x1 = x2 = i. Instead, one has to complex conjugate one of the
vectors in an inner product. This leads to the notion of a positive definite Hermitian or
sesquilinear form. These are maps ⟨·, ·⟩ : V × V → C such that
(i) ⟨u, w⟩ = ⟨w, u⟩∗ ,
(ii) ⟨u + v, w⟩ = ⟨u, w⟩ + ⟨v, w⟩ ,
(iii) ⟨u, λv⟩ = λ ⟨u, v⟩ and
(iv) ⟨v, v⟩ ≥ 0 and ⟨v, v⟩ = 0 if and only if v = 0.
It immediately follows that ⟨u, v + w⟩ = ⟨u, v⟩ + ⟨u, w⟩ and ⟨λu, v⟩ = λ∗ ⟨u, v⟩.
On Cn, we thus define the analogue of the Euclidean inner product

⟨u, w⟩ := (u∗)^T w = \begin{pmatrix} u1^* \\ \vdots \\ un^* \end{pmatrix}^T \begin{pmatrix} w1 \\ \vdots \\ wn \end{pmatrix} = u1^* w1 + · · · + un^* wn .   (VII.8)
One readily checks that this defines a positive definite hermitian form. As the operation
of simultaneous complex conjugation and transposition is so important in complex linear
algebra (it essentially fully replaces transposition), it is denoted by a ‘dagger’: u† = (u∗ )T
and called the adjoint, Hermitian conjugate or simply dagger. We thus write ⟨u, v⟩ := u†v.
We define again the norm of a vector as ||v|| := √⟨v, v⟩ and we call two vectors u, v
orthogonal if ⟨u, v⟩ = 0. We can still apply the Gram-Schmidt procedure to find an
orthogonal basis.
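A two-line sketch (ours, not from the notes) of why the conjugation is essential: with the naive x^T y, the nonzero vector (i, i)^T ∈ C2 would have a negative “norm squared”, while u†u is positive:

u = vector(CC, [I, I])
print(u*u)                                          # -2.0...: the unconjugated product fails
print(sum(a.conjugate()*b for a, b in zip(u, u)))   # 2.0...: the Hermitian inner product <u,u>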
§5 Complex matrices. Complex matrices are simply rectangular tables of numbers with
complex entries. Analogously to real matrices, it is trivial to check that an m × n complex
matrix defines a linear map from Cn to Cm. Essentially all definitions related to real matrices
generalise to complex matrices: We define sums, products, inverse, rank, determinant, range
and null space completely analogously to the real case.
§6 Hermitian matrices. In the definition of matrix operations, only transposition gets
modified. As we saw above when discussing complex inner product spaces, one should
replace transposition by Hermitian conjugation. This leads to the following definition:
Analogously to real symmetric matrices satisfying AT = A, we define hermitian matrices
as those which satisfy A† = A.
Recall that real symmetric matrices have real eigenvalues (theorem VI.1, §13). This
theorem also applies to Hermitian matrices, as the proof remains valid once one replaces
transposition with Hermitian conjugation. (As real symmetric matrices are a special case
of Hermitian matrices, we can regard theorem VI.1, §13 as a corollary to that for Hermitian
matrices.)
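For instance (our own example, not from the notes), SAGE confirms that the Hermitian matrix with rows (2, 1 + i) and (1 − i, 3) has the real eigenvalues 1 and 4:

A = matrix(SR, [[2, 1+I], [1-I, 3]])
print((A - A.conjugate_transpose()).is_zero())   # True: A is Hermitian
print(A.eigenvalues())                           # 1 and 4 (in some order): real, as claimed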
§7 Unitary matrices. What about the orthogonal matrices that we introduced in the
real case? Their characteristic property was that they preserved the Euclidean norm of a
vector in Rn: ⟨v, v⟩ = ⟨Av, Av⟩. This led to the relation A^T = A^{-1}. Here, we again have
to replace transposition by Hermitian conjugation:

⟨v, v⟩ = v†v  =!  ⟨Av, Av⟩ = (Av)† (Av) = v† A† A v ,   (VII.9)
and we conclude that A† = A−1 . Matrices with this property are called unitary. Just as
orthogonal matrices, unitary matrices always have orthonormal columns. An example of a
unitary matrix is
A = \begin{pmatrix} 1/√2 & i/√2 \\ i/√2 & 1/√2 \end{pmatrix} .   (VII.10)
Just as the determinant of an orthogonal matrix is ±1, the determinant of a unitary matrix has modulus 1; for the example above it equals 1.
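One can let SAGE verify that the matrix in (VII.10) is unitary (a sketch, ours):

A = matrix(SR, [[1/sqrt(2), I/sqrt(2)], [I/sqrt(2), 1/sqrt(2)]])
print((A.conjugate_transpose()*A).simplify_full())   # the 2x2 identity matrix, so A^dagger = A^(-1)
print(A.det().simplify_full())                       # 1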
§8 Diagonalisation. The notion of diagonalisation is fully analogous to the real case.
Given an n × n-matrix which has n eigenvalues λi with corresponding eigenvectors ui that
form a basis of Cn , we have
P^{-1} A P = D ,  where  D = \begin{pmatrix} λ1 & 0 & \cdots & 0 \\ 0 & λ2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & λn \end{pmatrix}  and  P = ( u1 . . . un ) .   (VII.11)
An important difference is that we can apply diagonalisation now also to matrices that
cannot be diagonalised using real matrices. Even when calculating exclusively with real
numbers, it still can be helpful to temporarily introduce complex matrices and then to use
tricks applicable only to diagonalised matrices. For example, compute the 500th power of
the following matrix:
A = \begin{pmatrix} 1 & 2 \\ -2 & 1 \end{pmatrix} .   (VII.12)
This matrix does not have any real eigenvalues and therefore cannot be diagonalised using
real matrices. Using complex matrices, however, we compute
A = \begin{pmatrix} -i & i \\ 1 & 1 \end{pmatrix} \begin{pmatrix} 1 + 2i & 0 \\ 0 & 1 - 2i \end{pmatrix} \begin{pmatrix} -i & i \\ 1 & 1 \end{pmatrix}^{-1}   (VII.13)
and therefore
A^{500} = \frac{1}{2} \begin{pmatrix} -i & i \\ 1 & 1 \end{pmatrix} \begin{pmatrix} 1 + 2i & 0 \\ 0 & 1 - 2i \end{pmatrix}^{500} \begin{pmatrix} -i & i \\ 1 & 1 \end{pmatrix}^{†} .   (VII.14)
(Here, we used that the columns of the transformation matrix are orthogonal and of norm √2, so that its inverse is 1/2 times its Hermitian conjugate; for a unitary matrix, i.e. one with orthonormal columns, the inverse is simply the Hermitian conjugate.)
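The decomposition (VII.13) can be checked in SAGE, and the same matrix power can be computed exactly over the integers (a sketch with our own names; the entries of A^500 are huge, so we only print their size):

A = matrix(ZZ, [[1,2],[-2,1]])
P = matrix(SR, [[-I, I], [1, 1]])
D = diagonal_matrix(SR, [1+2*I, 1-2*I])
print((P*D*P.inverse() - A).simplify_full().is_zero())   # True: this is exactly (VII.13)
print(len(str((A^500)[0,0])))                            # roughly 175: the number of digits of one entry of A^500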
VII.2 A little group theory
§1 Groups. A group is a set G endowed with a map ◦ : G × G → G such that the following
axioms hold:

(i) (x ◦ y) ◦ z = x ◦ (y ◦ z) for all x, y, z ∈ G (associativity),
(ii) there is a (right-)neutral element e ∈ G with x ◦ e = x for all x ∈ G,
(iii) for each x ∈ G there is a (right-)inverse element x^{-1} ∈ G with x ◦ x^{-1} = e.

If the map ◦ satisfies x ◦ y = y ◦ x for all x, y ∈ G, then the group G is called Abelian.
§2 Lemma. The above definition is as minimalist as possible. In fact, right-neutral ele-
ments are automatically also left-neutral and unique, right-inverse elements are also left-
inverse elements and unique. We can thus speak of the neutral element and the inverse
element to a group element.
Proof: The proofs of the above statements are straightforward once one has the right
starting points. Let G be a group. Let x ◦ x−1 = e, x, x−1 , e ∈ G. We first show that this
implies x−1 ◦ x = e. For this, put y = x−1 ◦ x. We compute:
y ◦ y = (x^{-1} ◦ x) ◦ (x^{-1} ◦ x) = x^{-1} ◦ (x ◦ x^{-1}) ◦ x = x^{-1} ◦ x = y ,   (VII.15)
y = y ◦ e = y ◦ (y ◦ y^{-1}) = (y ◦ y) ◦ y^{-1} = y ◦ y^{-1} = e .   (VII.16)

Thus x^{-1} is also left-inverse to x. It follows that e is also left-neutral:

e ◦ x = (x ◦ x^{-1}) ◦ x = x ◦ (x^{-1} ◦ x) = x ◦ e = x .   (VII.17)

Let us now show that the neutral elements are unique: Assume that there are two such
neutral elements e and e′. Then

e = e ◦ e′ and e′ = e ◦ e′  ⇒  e = e′ .   (VII.18)

Finally, assume that y and z are both inverse to x, i.e. y ◦ x = e and x ◦ z = e. Then

z = e ◦ z = (y ◦ x) ◦ z = y ◦ (x ◦ z) = y ◦ e = y .   (VII.19)
§5 Examples. a ) The real numbers R together with addition. The neutral element is
e = 0 and the inverse of a real number x ∈ R is −x ∈ R.
b ) The integers form a subgroup of the above group.
c ) Strictly positive reals R>0 together with multiplication. The neutral element is e = 1
and the inverse to a positive real r ∈ R>0 is 1r ∈ R>0 .
d ) A vector space with vector addition. The neutral element is e = 0 and the group inverse
of a vector is the corresponding inverse vector.
e ) The set of invertible n × n matrices with matrix multiplication. The neutral element is
e = 1 and the group inverse of an invertible matrix A is its inverse A−1 .
f ) The set K2 = {0, 1} with addition defined as 0 + 0 = 0, 0 + 1 = 1 + 0 = 1 and 1 + 1 = 0.
The neutral element is 0, the inverse of 0 is 0 and the inverse of 1 is 1.
§6 Permutation group. The last of the above examples is a group with finitely many
elements. Another example of such a group is the permutation group of an ordered set
with finitely many elements S = (s1 , . . . sn ). The elements of the permutation group are
reorderings of these elements. For n = 3, we have the 3! = 6 group elements that map
S = (s1 , s2 , s3 ) to
g1 : S ↦ (s1, s2, s3) ,  g2 : S ↦ (s1, s3, s2) ,  g3 : S ↦ (s2, s3, s1) ,
g4 : S ↦ (s2, s1, s3) ,  g5 : S ↦ (s3, s1, s2) ,  g6 : S ↦ (s3, s2, s1) .   (VII.20)
Here, the group structure becomes obvious: the group operation is the composition of two such reorderings. Because groups with finitely many elements
can always be represented in terms of linear transformations, group theory is closely linked
to and uses many results of linear algebra.
Counting how often a permutation interchanges pairs of elements, we can associate a
sign σ to a permutation. For example,

(s3, s2, s1)  ↔(swap 1,2)  (s2, s3, s1)  ↔(swap 2,3)  (s2, s1, s3)  ↔(swap 1,2)  (s1, s2, s3) ,   (VII.22)

and σ(g6) = (−1)³ = −1. This sign is given by the ε-symbol used in the definition of the
determinant. For example σ(g6) = ε321 = −1, σ(g3) = ε231 = 1, etc.
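The sign of a permutation can also be computed by counting inversions, i.e. pairs appearing in the wrong order; a small sketch (ours, in plain Python as it runs inside SAGE):

def sign_of(perm):
    # perm is a tuple such as (3,2,1) for g6 or (2,3,1) for g3
    inversions = sum(1 for i in range(len(perm))
                       for j in range(i+1, len(perm)) if perm[i] > perm[j])
    return (-1)**inversions

print(sign_of((3,2,1)))   # -1, i.e. sigma(g6) = epsilon_321
print(sign_of((2,3,1)))   # +1, i.e. sigma(g3) = epsilon_231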
The permutation group of three elements is the symmetry group of an equilateral triangle
with centre at the origin: we have three rotations and three reflections (each of which can be seen as a rotation combined with a reflection about one axis).
§7 Symmetries of Crystals. A crystal is an arrangement of atoms or molecules on points
of a lattice. A lattice is the set of points given by all integer linear combinations of two (respectively
three) linearly independent vectors in two (respectively three) dimensions. Let us restrict for simplicity to two
dimensions. Examples for lattices are
square lattice:  Λ1 = { a (1, 0)^T + b (0, 1)^T | a, b ∈ Z } ,
rectangular lattice:  Λ2 = { a (2, 0)^T + b (0, 1)^T | a, b ∈ Z } ,   (VII.23)
triangular lattice:  Λ3 = { a (1, 0)^T + b (cos π/3, sin π/3)^T | a, b ∈ Z } .
Consider now a crystal with particles sitting in regular patterns on a lattice. Its rotational
symmetries can be described by certain 2 × 2-matrices. Which rotational symmetries can a
crystal in two dimensions have? The answer gives the crystallographic restriction theorem,
which states that the only possible rotational symmetries of a two-dimensional crystal are
rotations by π, 2π/3, π/2 and π/3. The proof is rather easy using our knowledge of linear algebra:
Proof: Consider a 2 × 2-matrix A describing the rotational symmetries of the crystal in
the lattice basis. Its entries have to be integers, as otherwise, lattice points are mapped
to non-lattice points. It follows that the trace of A (the sum of its diagonal elements) has
to be an integer, too. If we now change the basis from the lattice basis to the standard
Euclidean basis, we have the rotation matrix B = P −1 AP , where P is the transformation
matrix of the coordinate change. The trace is invariant under coordinate changes: tr(B) =
tr(P −1 AP ) = tr(P P −1 A) = tr(A), and therefore also the trace of B has to be an integer. A
matrix B that describes a rotation in two dimensions has trace 2 cos θ, and this is an integer
only for θ ∈ {0, ±π/3, ±π/2, ±2π/3, π}. This restricts the symmetries to the ones we want. One
can now easily construct examples with these symmetries to demonstrate their existence.
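As an illustration (our own sketch), one can check with SAGE that a rotation by π/3 has integer entries, and in particular integer trace 2 cos(π/3) = 1, when written in the basis of the triangular lattice Λ3:

P = matrix(SR, [[1, cos(pi/3)], [0, sin(pi/3)]])                      # columns: the two lattice vectors
R = matrix(SR, [[cos(pi/3), -sin(pi/3)], [sin(pi/3), cos(pi/3)]])     # rotation by pi/3 in the standard basis
A = (P.inverse()*R*P).simplify_full()                                 # the same rotation in the lattice basis
print(A)          # [[0, -1], [1, 1]]: all entries are integers
print(A.trace())  # 1 = 2*cos(pi/3)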
§8 The orthogonal and unitary groups. Let us now come to groups with infinitely
many elements. We introduced groups that preserve the Euclidean norm in both real
and complex vector spaces. The resulting sets of matrices form groups which are called
orthogonal and unitary groups, respectively. On Rn and Cn , we write O(n) and U(n) for
these, respectively. Recall that the determinant of an orthogonal matrix is ±1 (and that of a
unitary matrix has modulus 1). The orthogonal matrices with determinant +1 describe rotations, while those with determinant
−1 describe rotations together with a flip of one of the coordinate axes. This flip reverses
the orientation of vectors. Consider e.g. the matrix

P = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} .   (VII.24)
This matrix maps the basis B = (e1, e2) to a basis B′ = BP = (e1, −e2). The latter basis
cannot be turned into B by rotation and therefore has a different orientation than B.
The orthogonal and unitary matrices with determinant +1 (for orthogonal matrices, those preserving orientation) form the subgroups of special orthogonal and special unitary transformations, SO(n) and SU(n), respectively.
§9 Generators of groups. Let us briefly look at the group SO(2) in more detail. Its
elements are the rotation matrices in R2 , i.e.
P(θ) = \begin{pmatrix} cos θ & -sin θ \\ sin θ & cos θ \end{pmatrix} .   (VII.25)
Although the group SO(2) has infinitely many elements – one for each θ ∈ [0, 2π) – these
elements depend on only one parameter θ. How can we make this statement more precise?
Let us consider the derivative of P at θ = 0:
τ := \frac{d}{dθ} P(θ) \Big|_{θ=0} = \begin{pmatrix} -sin θ & -cos θ \\ cos θ & -sin θ \end{pmatrix} \Big|_{θ=0} = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} .   (VII.26)
Consider now the following expression, which we can evaluate by our methods for computing
analytic functions of matrices from VI.3, §7:

A(τ) := exp(θ τ) .   (VII.27)

The eigenvalues of τ are i and −i and the corresponding eigenvectors are (i, 1)^T and (−i, 1)^T.
We write

A(τ) = R(τ) = α0 1_2 + α1 τ .   (VII.28)

Comparing with the eigenvalues, we have

e^{iθ} = α0 + α1 i  and  e^{-iθ} = α0 − α1 i .   (VII.29)

We conclude that

α0 = (e^{iθ} + e^{-iθ})/2 = cos θ  and  α1 = (e^{iθ} − e^{-iθ})/(2i) = sin θ .   (VII.30)

Thus,

A(τ) = cos θ 1_2 + sin θ τ = P(θ) .   (VII.31)
We say that τ is a generator for the group SO(2).
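A quick check (ours): since τ² = −1_2, the Maclaurin series of exp(θτ) collapses to cos θ·1_2 + sin θ·τ, which is exactly P(θ):

theta = var('theta')
tau = matrix(SR, [[0,-1],[1,0]])
print(tau^2)                                                   # minus the 2x2 identity matrix
Ptheta = matrix(SR, [[cos(theta), -sin(theta)], [sin(theta), cos(theta)]])
print((cos(theta)*identity_matrix(SR,2) + sin(theta)*tau - Ptheta).is_zero())   # True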
§10 The generators of SU(2). Let us now describe the generators of SU(2). A group
element has to consist of orthonormal complex vectors and has to have determinant +1.
Explicitly, we have
SU(2) = { \begin{pmatrix} α & -β^* \\ β & α^* \end{pmatrix} | α, β ∈ C , α α^* + β β^* = 1 } .   (VII.32)
Note that SU(2) is parameterised by a vector (α, β)T ∈ C2 , which we can rewrite as a
vector x = (x1 , x2 , x3 , x4 )T ∈ R4 . The condition αα∗ + ββ ∗ = 1 translates to ||x|| = 1, and
this condition describes the three-sphere S 3 in R4 . We thus expect that SU(2) is described
by three parameters, just like S 3 is (for example, by three angles).
Recall the formula log det(A) = tr log(A) for real matrices. Because of problems with
complex valued logarithms, we rewrite this formula as det(A) = exp(tr(log(A))). We want
to write a matrix g in SU(2) as g = exp(τ ). Because det(g) = 1, the determinant formula
implies exp(trτ ) = 1 and thus tr(τ ) = 0. Moreover, we want that gg † = 1. To analyse
this, let us first make two observations: exp(τ )† = exp(τ † ) and exp(τ ) exp(−τ ) = 1, which
are both clear from the Maclaurin series. (F Complete the proofs of these statements.)
Altogether, τ has to be traceless and satisfy τ † = −τ . A basis for the vector space of such
matrices is given by
(τ1, τ2, τ3) := \left( \begin{pmatrix} i & 0 \\ 0 & -i \end{pmatrix} , \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} , \begin{pmatrix} 0 & i \\ i & 0 \end{pmatrix} \right) .   (VII.33)
Writing

g = exp( a1 τ1 + a2 τ2 + a3 τ3 ) ,  a1, a2, a3 ∈ R ,   (VII.34)

note that any element of SU(2) can be written in this way. Furthermore, it is

\frac{d}{da_1} g \Big|_{a_1=a_2=a_3=0} = τ1 ,  \frac{d}{da_2} g \Big|_{a_1=a_2=a_3=0} = τ2 ,  \frac{d}{da_3} g \Big|_{a_1=a_2=a_3=0} = τ3 .   (VII.35)
Altogether, we saw that the matrices τ1 , τ2 , τ3 are the generators of SU(2) and we write
su(2) := spanR {τ1 , τ2 , τ3 }. Any element τ ∈ su(2) yields an element g = exp(τ ) ∈ SU(2).
The matrices τi are up to factors of i the so-called Pauli matrices, that play a major role
in describing particles carrying spin, as e.g. electrons, in quantum mechanics.
An interesting relation is the following:

[τi, τj] := τi τj − τj τi = −2 \sum_{k=1}^{3} ε_{ijk} τk .   (VII.36)
Vector spaces of matrices equipped with this antisymmetric bracket [A, B] := AB − BA,
which is also called the commutator, are special cases of Lie algebras.
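The commutation relations (VII.36) are easily verified in SAGE (a sketch, ours):

tau1 = matrix(SR, [[I, 0], [0, -I]])
tau2 = matrix(SR, [[0, -1], [1, 0]])
tau3 = matrix(SR, [[0, I], [I, 0]])
print((tau1*tau2 - tau2*tau1 + 2*tau3).is_zero())   # True: [tau1, tau2] = -2*tau3
print((tau2*tau3 - tau3*tau2 + 2*tau1).is_zero())   # True: [tau2, tau3] = -2*tau1
print((tau3*tau1 - tau1*tau3 + 2*tau2).is_zero())   # True: [tau3, tau1] = -2*tau2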
VII.3 Special relativity
Special relativity studies the effects of relative motion on physics. Its main object of study
is therefore the group of Lorentz transformations, which is given by coordinate changes
between inertial systems that move relatively to each other.
§2 Galilei transformations. We saw in section VI.2 that we can use coordinate changes
to simplify the study of linear transformations in a vector space. In mechanics, we can
do the same by changing inertial systems. For Newton’s law to hold, we can apply the
following changes: translations in time, t ↦ t + t0; rotations of space, x ↦ Ax with A an orthogonal matrix; translations in space, x ↦ x + x0; and changes to a system moving at constant relative velocity, x ↦ x + vt. Together, such a Galilei transformation T = (t0, x0, A, v) acts as x(t) ↦ Ax(t + t0) + x0 + v·(t + t0). And finally, there is an inverse transformation given by T^{-1} = (−t0, −A^{-1}x0, A^{-1}, −A^{-1}v):

x(t)  ↦(T)  A x(t + t0) + x0 + v·(t + t0)  ↦(T^{-1})  A^{-1}( A x(t) + x0 + v t ) − A^{-1} x0 − A^{-1} v t = x(t) .
Lorentz transformations Λ are required to preserve the Minkowski inner product ⟨x̂, x̂⟩ = x̂^T η x̂ of a space-time vector x̂ with itself:

⟨Λx̂, Λx̂⟩ = (Λx̂)^T η (Λx̂) = x̂^T Λ^T η Λ x̂  =!  x̂^T η x̂ = ⟨x̂, x̂⟩ ,   (VII.41)

and therefore Λ^T η Λ = η.
§9 Classical limit. First, let us try to verify that for low velocities v ≪ c, Lorentz transformations go over into Galilei transformations. For this, consider the Lorentz transformation
from an inertial system at rest Σ to one moving with the fastest plane, v ≈ 3000 km/h ≈ 833 m/s.
Note that this is still slow compared to the speed of light, 299792458 m/s. We have:

x̃1 = (x1 − v t)/√(1 − v²/c²) ≈ x1 − v t + O(v²/c²) ,  t̃ = (t − x1 v/c²)/√(1 − v²/c²) ≈ t + O(v/c²) .   (VII.46)

In general, for everyday velocities v, Lorentz transformations look like Galilei transformations.
§10 Approaching the speed of light. One could think that one can achieve speeds
faster than light in the following way: First, transform from one system Σ at rest to
another system Σ̃ at a speed of 0.8c relative to Σ along the x1 -direction. Then transform
to a third system Σ̃̃, which moves at 0.8c relative to Σ̃. We have:

Σ  →(Λ^{(14)})  Σ̃  →(Λ^{(14)})  Σ̃̃ ,   (VII.47)

where

Λ^{(14)} = \begin{pmatrix} 1/√(1−v²/c²) & 0 & 0 & −(v/c²)/√(1−v²/c²) \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ −(v/c²)/√(1−v²/c²) & 0 & 0 & 1/√(1−v²/c²) \end{pmatrix} = \begin{pmatrix} 5/3 & 0 & 0 & −4/3 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ −4/3 & 0 & 0 & 5/3 \end{pmatrix} .   (VII.48)

Multiplying the two boost matrices, however, one finds again a matrix of the same form, corresponding to a relative velocity of (0.8c + 0.8c)/(1 + 0.8²) = (40/41) c < c: velocities do not simply add, and the speed of light is not exceeded.
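Multiplying the two numerical boost matrices from (VII.48) in SAGE (our sketch) shows this explicitly:

L = matrix(QQ, [[5/3, 0, 0, -4/3], [0,1,0,0], [0,0,1,0], [-4/3, 0, 0, 5/3]])
L2 = L*L                        # the composition of the two boosts
print(L2)                       # same form, with diagonal 41/9 and off-diagonal -40/9
print(L2[0,3]/L2[0,0])          # -40/41: the combined boost corresponds to 40/41 of the speed of light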
§11 Simultaneous events. We know that space-time intervals remain unchanged. What
about lengths and durations? First, note that two events at x1 and x2 happening at time
t1 = t2 = t in the inertial system Σ, do not happen at the same time in Σ̃:
t̃1 = (t − x1 v/c²)/√(1 − v²/c²)  while  t̃2 = (t − x2 v/c²)/√(1 − v²/c²) .   (VII.50)
[Figure: two space-time diagrams, showing the x̃1- and time-axes of Σ̃ drawn in Σ (left) and the axes of Σ drawn in Σ̃ (right); an event A and the past of the origin 0 are marked.]
Depicted in the above two diagrams are the space- and time-axis of Σ̃ in Σ on the left, as
well as these two axes of Σ in Σ̃ on the right. We see that each observer assigns different
times and locations to an event A. The x1 -axes label simultaneous events in each coordinate
system.
§13 Time dilation. Consider a clock at rest at x = 0 in Σ and flashing in regular intervals
of 1 s. That is, we have a flash at t0 = 0 s, t1 = 1 s, t2 = 2 s, etc. In a system Σ̃ moving
with velocity v, these flashes appear to be emitted at the times
t̃0 = t0/√(1 − v²/c²) = 0 s ,  t̃1 = t1/√(1 − v²/c²) ,  t̃2 = t2/√(1 − v²/c²) ,  etc.   (VII.51)

Because 0 < √(1 − v²/c²) < 1, the clock appears therefore to run more slowly to an observer
in Σ̃. If the clock was in Σ̃ and was flashing at the intervals t̃0 = 0 s, t̃1 = 1 s, t̃2 = 2 s, then, conversely, an observer in Σ would measure the flashes at the times tk = t̃k/√(1 − v²/c²).
The situation is therefore perfectly symmetrical. Clocks that move relative to an observer
appear to run more slowly.
§14 Example for Paradoxes: Twin paradox. Special relativity is frequently attacked
by non-physicists, as it contradicts common sense. Usually this is done by constructing
apparent paradoxes. In particular, the fact that time dilation appears symmetrically is
often overlooked and is the source of the famous twin paradox: An astronaut leaves earth
on a spaceship flying at a speed close to the speed of light and then reverses and returns
to earth. As the clocks in his spaceship run slower to an observer on earth, he will have
aged less than his twin brother, who was left behind on earth. On the other hand, the
clocks on earth seem to run slower for the astronaut in the spaceship, so his twin brother
on earth should have aged less than him. The problem here is that the situation is not
perfectly symmetrical, as the spaceship does not form an inertial system: The reversal of
direction requires acceleration and thus force, which in a proper treatment explains that
indeed the twin on earth has aged more, just as special relativity predicts. Given below is
the Minkowski diagram for the twin paradox:
[Minkowski diagram for the twin paradox: the ct- and x1-axes, a light ray, the astronaut's world line (following the arrows), and the lines of simultaneity in Σ̃.]
The earth is depicted at rest and the astronaut follows the arrows. The age difference
between the astronaut and his twin on earth can be read off this diagram: It is the difference
between the intersection of the last line of simultaneity on the way away from earth and
the ct-axis and the first one of the way back and the ct-axis.
§15 Length contraction. The analogue of time dilation is length contraction: Consider
a rod of length L in Σ extending from xA = (0, 0, 0) to xB = (L, 0, 0). In the system Σ̃,
moving relative to Σ with velocity v, we measure again the length of the rod, but this
measurement has to be made at the same time in Σ̃. For two space-time events (xA, tA) and (xB, tB) with xA = (0, 0, 0) and xB = (L, 0, 0), we have

t̃A = tA/√(1 − v²/c²) ,  t̃B = (tB − L v/c²)/√(1 − v²/c²) ,
x̃A = ( −v tA/√(1 − v²/c²), 0, 0 ) ,  x̃B = ( (L − v tB)/√(1 − v²/c²), 0, 0 ) .   (VII.53)

We can choose t̃A = t̃B = 0, which implies tA = 0 and tB = L v/c². This implies that
L̃ := ||x̃B − x̃A|| = L √(1 − v²/c²). Thus, objects that move relative to an observer appear
shorter.
§16 The ladder paradox. Assume that we have a garage with a front and a rear door
and a ladder that does not quite fit into the garage. If we throw the ladder fast enough
into the garage, the ladder shrinks from the perspective of the garage, and we can close
both doors simultaneously, with the ladder inside. However, from the perspective of the
ladder, the garage shrinks, which leads to an apparent paradox. The solution is that, from
the perspective of the ladder, the two doors do not close simultaneously, and thus there is no
paradox.
§17 Example: Muons. Muons are created by cosmic radiation hitting the outer atmosphere at a height of 9–12 km. They move towards earth with 99.94% of the speed of light.
The half-life of a muon at rest is 1.52 × 10^{-6} s, corresponding to an average lifetime
of 2.2 × 10^{-6} s. In this time, it could travel 659.15 m, which is not enough to reach the
earth's surface. Nevertheless, we measure many of these muons on the ground. The reason
for this is time dilation: the muon's flight from 12 km height lasts 4 × 10^{-5} s from our
perspective. In the reference frame of the muon, however, only

t = 4 × 10^{-5} s × √(1 − 0.9994²) = 1.39 × 10^{-6} s   (VII.54)

have passed, which is still under the half-life. Equivalently, for the muon the
distance of 12 km is contracted to 415.63 m, which is below the distance it can travel in its
average lifetime. This has been experimentally verified by Rossi and Hall in 1940.
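The numbers of this example are easily reproduced (a small sketch, ours):

c = 299792458                         # speed of light in m/s
v = 0.9994*c
g = 1/sqrt(1 - v^2/c^2)               # the time-dilation factor, about 28.9
print(12000/v)                        # flight time from 12 km in our frame: about 4.0e-5 s
print(12000/v/g)                      # elapsed time in the muon's frame: about 1.39e-6 s
print(12000/g)                        # contracted height in the muon's frame: about 415.6 m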
Appendix
A Linear algebra with SAGE
In this section, we give a concise introduction to the computer algebra system SAGE. SAGE
is published under the GNU General Public Licence and available for Linux and Mac OSX
at sagemath.org. Furthermore, SAGE can be used from any internet browser. For deeper
introductions to SAGE and how to use it in general, please look at the wealth of tutorials
available online.
§1 Setup. To try SAGE, load the webpage www.sagenb.org and create a free account.
Once logged in, you can create a new worksheet and you are ready to go. Enter for
example 2+2 and hit shift+return simultaneously. SAGE returns 4. Enter f=sin(5*x) and
hit shift+return. Enter then f.plot() (always followed by shift+return) to see the plot of
the function, f.derivative() will return the derivative. Let us now focus on computations
in Linear Algebra that you can do with SAGE.
Warning: Remember that SAGE is a nice way of checking your computations and it is
useful for general experimenting with vectors and matrices. However, do not forget that
practicing your manual calculating skills is vital for your future life as a mathematician, in
particular for the exam.
§2 Vectors. To define the vector v = (1, 2, 3), enter v=vector(QQ,[1,2,3]). The QQ tells
SAGE that it is working over the field of rational numbers, which on a computer stand in
for the real numbers. We can now compute 5*v or v+v etc. We can define further vectors and
linearly combine them.
§3 Matrices. To define a matrix A, enter A=matrix(QQ,[[1,2,3],[3,4,3],[3,2,1]]),
where the innermost brackets define the rows of the matrix. We can solve the system Ax = v
by entering A.solve_right(v). Note that SAGE returns at most one solution. If there are
further solutions, one has to find them separately, see below. We have the following commands with obvious meaning available: A.transpose(), A.inverse(), A.determinant(),
A.eigenvalues(), A.eigenvectors_right(), A.characteristic_polynomial(). Note
that a polynomial can be factorised with the factor() command. To compute eigenvalues
manually, you can therefore type A.characteristic_polynomial().factor().
§4 SLEs. SAGE can perform elementary row operations on a matrix. Assume that
we have defined a matrix e.g. by entering A=matrix(QQ,[[1,2,3],[3,4,3],[3,2,1]]).
We have the commands A.rescale_row(1,2), A.add_multiple_of_row(2,0,3) as well
as A.with_rescaled_row(1,2), A.with_added_multiple_of_row(2,0,3). The first set of
commands manipulates the matrix A, the second set creates an output without changing
A. Note that SAGE counts rows and columns (as usual in computer science) starting at 0.
The command A.add_multiple_of_row(2,0,3) therefore adds three times row 1 to row 3.
To bring an SLE Ax = v into row echelon form (from which we can read off the solution
space), we can do the following. First, we augment the matrix A by aug=A.augment(v).
The method aug.rref() produces a matrix in reduced row echelon form, i.e. the part of
the augmented matrix corresponding to A is reduced as far as possible towards a diagonal matrix containing only
1s and 0s.
§5 Span. We start by telling SAGE that we want to work in R4 , or rather Q4 : V=QQ^4.
Let us now consider two vectors, defined in SAGE as v1=vector(QQ,[1,1,3,4]) and
v2=vector(QQ,[1,3,4,5]). The span of these vectors is obtained as follows: Define the
subspace W as the span W=V.span([v1,v2]). If we enter W, we obtain a nice basis for the
vector space. Moreover, we can test if vectors lie in W: x=2*v1-5*v2 and x in W returns
True. On the other hand, y=vector(QQ,[1,1,1,1]) and y in W returns False. To check
linear dependence, you can play with the method linear_dependence(), applied to a list
of vectors.
§6 And beyond... There are many more things one can do with SAGE in the context
of Linear Algebra. For example, the pivots() method will identify columns of augmented
matrices corresponding to pivot elements, the method right_kernel() will compute the
kernel of a matrix, etc. For more details, see the introduction Sage for Linear Algebra by
Robert A. Beezer, which is available for free download online. Also helpful is his quick
reference sheet, which is likewise available online.
§7 SAGE code for Fourier transform. Recall the discussion in section IV, §8. We
define a function that we want to Fourier transform, f(x) = −x(x − 1/2)²(x − 1):

f(x)=-x*(x-1/2)^2*(x-1)           # the function to be expanded
f.plot(x,0,1)
s(k,x)=2^(1/2)*sin(2*pi*k*x)      # the sine basis functions
c(k,x)=2^(1/2)*cos(2*pi*k*x)      # the cosine basis functions
s(3,x).plot(x,0,1)
b0=integrate(f(x),x,0,1)          # the constant Fourier coefficient
a1=integrate(f(x)*s(1,x),x,0,1)
b1=integrate(f(x)*c(1,x),x,0,1)
a2=integrate(f(x)*s(2,x),x,0,1)
b2=integrate(f(x)*c(2,x),x,0,1)
a3=integrate(f(x)*s(3,x),x,0,1)
b3=integrate(f(x)*c(3,x),x,0,1)
We can now plot the original function together with various steps in the expansion:
plot([f(x),b0*1],x,0,1)
plot([f(x),b0*1+a1*s(1,x)+b1*c(1,x)],x,0,1)
plot([f(x),b0*1+a1*s(1,x)+b1*c(1,x)+a2*s(2,x)+b2*c(2,x)],x,0,1)
plot([f(x),b0*1+a1*s(1,x)+b1*c(1,x)+a2*s(2,x)+b2*c(2,x)+a3*s(3,x)+
b3*c(3,x)],x,0,1)
Index

adjoint, 68
basis, 33
  orthogonal, 41
  orthonormal, 42
Cauchy-Schwarz inequality, 38
Cayley-Hamilton theorem, 61
change of basis, 56
characteristic equation, 52
characteristic polynomial, 52
column rank, 45
column space, 45
column vectors, 45
complex conjugation, 66
complex numbers, 65
conic sections, 59
coordinate vector, 56
coupled harmonic oscillators, 62
damped harmonic oscillator, 63
determinant, 17
diagonalisable matrix, 57
diagonalising matrices, 57
dimension, 34
eigenvalue, 52
eigenvector, 52
elementary row operations, 22
Euclidean inner product, 38
Euler's formula, 66
Fibonacci numbers, 64
Fourier transformation, 43
Gaußian elimination, 23
golden ratio, 65
Gram-Schmidt algorithm, 41
Hermitian conjugate, 68
Hermitian form, 67
Hermitian matrix, 68
inner product, 37
  complex vector space, 67
inner product space, 37
kernel, 48
linear combination, 30
linear equation, 20
linear map, 13
linear recursive sequence, 64
linear transformation, 44
  rank of, 49
linearly dependent, 31
linearly independent, 31
matrix
  analytic function of, 61
  augmented, 22
  complex, 68
  inverting, 26
  positive definite, 55
  powers of, 60
  rank of, 46
  real symmetric, 55
multiplicity
  algebraic, 54
  geometric, 54
norm, 38
  complex vector space, 68
null space, 48
nullity, 49
orthogonal matrix, 51
orthogonal projection, 41
orthogonal set of vectors, 40
orthogonal vectors, 39
orthonormal set of vectors, 40
pivot variable, 24
polar form, 66
polynomial of degree n, 28
population growth model, 65
Pythagoras' theorem, 40
range space, 48
rank, 46, 49
rank-nullity theorem, 50
row echelon form, 22
row rank, 45
row space, 45
row vectors, 45
scalar product, 37
sesquilinear form, 67
similarity transformation, 57
span, 30
system of linear equations, 20
  consistent, 20
  equivalent, 21
  general solution, 24
  homogeneous, 25
  inconsistent, 20
  inhomogeneous, 25
trace, 15
transformation matrix, 56
transpose, 15
triangle inequality, 39
unitary matrix, 68
vector space, 28
  complex vector space, 66
vector space axioms, 28
vector subspace, 29
vector subspace test, 29