
Vectors and Matrices

aka Linear Algebra

Prof Andre Lukas


Rudolf Peierls Centre for Theoretical Physics
University of Oxford

MT 2017
Contents
1 Vector spaces and vectors 6
1.1 Vectors in Rn . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.2 Vector spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.3 Linear combinations, linear independence . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
1.4 Basis and dimension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

2 Vectors in Rn , geometrical applications 21


2.1 Scalar product in Rn . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.2 Vector product in R3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.3 Some geometry, lines and planes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
2.3.1 Affine space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
2.3.2 Lines in R2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.3.3 Lines in R3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
2.3.4 Planes in R3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
2.3.5 Intersection of line and plane . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
2.3.6 Minimal distance of two lines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
2.3.7 Spheres in R3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40

3 Linear maps and matrices 42


3.1 Linear maps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
3.1.1 General maps between sets and their properties . . . . . . . . . . . . . . . . . . . . 42
3.1.2 Linear maps: Definition and basic properties . . . . . . . . . . . . . . . . . . . . . 45
3.2 Matrices and their properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
3.2.1 Basic matrix properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
3.2.2 Rank of a matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
3.2.3 Linear maps between column vectors . . . . . . . . . . . . . . . . . . . . . . . . . . 56
3.2.4 Matrix multiplication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
3.2.5 The inverse of a matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
3.3 Row/column operations, Gaussian elimination . . . . . . . . . . . . . . . . . . . . . . . . . 62
3.4 Relation between linear maps and matrices . . . . . . . . . . . . . . . . . . . . . . . . . . 68
3.5 Change of basis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71

4 Systems of linear equations 73


4.1 General structure of solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
4.2 Solution by ”explicit calculation” . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
4.3 Solution by row reduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77

5 Determinants 82
5.1 Definition of a determinant . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
5.2 Properties of the determinant and calculation . . . . . . . . . . . . . . . . . . . . . . . . . 86
5.3 Cramer’s Rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91

6 Scalar products 93
6.1 Real and hermitian scalar products . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
6.2 Orthonormal basis, Gram-Schmidt procedure . . . . . . . . . . . . . . . . . . . . . . . . . 95
6.3 Adjoint linear map . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
6.4 Orthogonal and unitary maps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102

6.5 Dual vector space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108

7 Eigenvectors and eigenvalues 113


7.1 Basic ideas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
7.2 Characteristic polynomial . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
7.3 Diagonalization of matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
7.4 Normal linear maps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
7.5 Simultaneous diagonalization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
7.6 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
7.6.1 Solving Newton-type differential equations with linear forces . . . . . . . . . . . . 126
7.6.2 Functions of matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
7.6.3 Quadratic forms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131

A Definition of groups and fields 134

B Some basics of permutations 134

C Tensors for the curious 136

Foreword: The subject of “Vectors and Matrices”, more politely called Linear Algebra, is one of the basic
disciplines of mathematics. It underlies many branches of more advanced mathematics, such as calculus
of functions in many variables and differential geometry, and it has applications in practically all parts
of physics. There are numerous textbooks on the subject ranging in style from low-level “how-to-do”
guides, mainly teaching the mechanics of manipulating low-dimensional vectors and matrices, to hugely
formalized treaties which barely ever write down a vector or a matrix explicitly. Naturally, a course for
beginning physics students should stay away from either extreme.
In the present text we will follow the inherent logic of the subject, in line with how it is taught in
research universities across the world. This will require some of the language of formal mathematics and
the occasional proof but we will keep this as light as possible. We attempt to illustrate the material
with many examples, both from physics and other areas, and teach the practical methods and algorithms
required in the day-to-day work of a physicist.
Hopefully, a student will finish the course with a good working knowledge of “Vectors and Matrices”
but also with an appreciation of the structure and beauty of the subject of Linear Algebra.

I would like to thank Kira Boehm, Daniel Karandikar and Doyeong Kim for substantial help with the
typesetting of these notes.

Andre Lukas
Oxford, 2013

Notation
R the real numbers
C the complex numbers
F a field, usually either the real or the complex numbers
V, W, U vector spaces
Rn the vector space of n-dimensional column vectors with real entries
Cn the vector space of n-dimensional column vectors with complex entries
v, w, · · · boldface lowercase letters are used for vectors
0 the zero vector
i, j, k, · · · indices to label vector components, usually in the range 1, . . . , n
vi , wi , · · · components of column vectors v, w, · · ·
ei the standard unit vectors in Rn
i, j, k another notation for the standard unit vectors e1 , e2 , e3 in R3
α, β, a, b, · · · lowercase letters are used for scalars
A, B, · · · uppercase letters are used for matrices
Aij entry (i, j) of a matrix A
Ai column vector i of a matrix A
Ai row vector i of a matrix A
AT , A† the transpose and hermitian conjugate of the matrix A
(v1 , . . . , vn ) a matrix with column vectors v1 , . . . , vn
1n the n × n identity matrix
Eij the standard matrices with (i, j) entry 1 and zero otherwise
diag(a1 , . . . , an ) an n × n diagonal matrix with diagonal entries a1 , . . . , an
Span(v1 , . . . , vk ) the span of the vectors v1 , . . . , vk
dimF (V ) the dimension of the vector space V over F
v·w the dot product between two n-dimensional column vectors
|v| the length of a vector
^(v, w) the angle between two vectors v and w
v×w the cross (vector) product of two three-dimensional column vectors
⟨v, w, u⟩ the triple product of three column vectors in three dimensions
δij the Kronecker delta symbol in n dimensions
ε_{ijk} the Levi-Civita tensor in three dimensions
ε_{i_1···i_n} the Levi-Civita tensor in n dimensions
f a linear map, unless stated otherwise
idV the identity map on V
Im(f ) the image of a linear map f
Ker(f ) the kernel of a linear map f
rk(f ) the rank of a linear map f
[A, B] the commutator of two matrices A, B
(A|b) the augmented matrix for a system of linear equations Ax = b
det(v1 , . . . , vn ) the determinant of the column vectors v1 , . . . , vn
det(A) the determinant of the matrix A
Sn the permutations of 1, . . . , n
sgn(σ) the sign of a permutation σ
⟨·, ·⟩ a real or hermitian scalar product (or a bi-linear form)
f† the adjoint linear map of f
EigA (λ) the eigenspace of A for λ

χA (λ) the characteristic polynomial of A as a function of λ
tr(A) the trace of the matrix A

1 Vector spaces and vectors
Linear algebra is foundational for mathematics and has applications in many parts of physics, including
Classical Mechanics, Electromagnetism, Quantum Mechanics, General Relativity etc.
We would like to develop the subject, explaining both its mathematical structure and some of its physics
applications. In this section, we introduce the “arena” for Linear Algebra: vector spaces. Vector spaces
come in many disguises, sometimes containing objects which do not at all look like ”vectors”. Surprisingly,
many of these “unexpected” vector spaces play a role in physics, particularly in quantum physics. After
a brief review of “traditional” vectors we will, therefore, introduce the main ideas in some generality.

1.1 Vectors in R^n

The set of real numbers is denoted by R and by R^n we mean the set of all column vectors

    v = \begin{pmatrix} v_1 \\ \vdots \\ v_n \end{pmatrix} ,        (1.1)

where v1 , . . . , vn ∈ R are the components of v. We will often use index notation to refer to a vector and
write the components collectively as vi , where the index i takes the values i = 1, . . . , n. There are two
basic operations for vectors, namely vector addition and scalar multiplication and for column vectors they
are defined in the obvious way, that is “component by component”. For the vector addition of two vectors
v and w this means
     
    v = \begin{pmatrix} v_1 \\ \vdots \\ v_n \end{pmatrix} , \quad w = \begin{pmatrix} w_1 \\ \vdots \\ w_n \end{pmatrix} , \quad v + w := \begin{pmatrix} v_1 + w_1 \\ \vdots \\ v_n + w_n \end{pmatrix} ,        (1.2)

where the vector sum v + w has the geometrical interpretation indicated in Fig. 1.


Figure 1: Vector addition

The scalar multiplication of a column vector v with a scalar α ∈ R is defined as
 
    αv := \begin{pmatrix} αv_1 \\ \vdots \\ αv_n \end{pmatrix} ,        (1.3)
and the geometrical interpretation is indicated in Fig. 2.


Figure 2: Scalar multiplication of vectors

Example 1.1: Vector addition and scalar multiplication in R^3

As an example in R^3 consider the two vectors

    v = \begin{pmatrix} 1 \\ −2 \\ 5 \end{pmatrix} , \quad w = \begin{pmatrix} −4 \\ 1 \\ −3 \end{pmatrix} .        (1.4)

Their vector sum is given by

    v + w = \begin{pmatrix} 1 \\ −2 \\ 5 \end{pmatrix} + \begin{pmatrix} −4 \\ 1 \\ −3 \end{pmatrix} = \begin{pmatrix} −3 \\ −1 \\ 2 \end{pmatrix} .        (1.5)

Further, scalar multiplication of v by α = 3 gives

    αv = 3 \begin{pmatrix} 1 \\ −2 \\ 5 \end{pmatrix} = \begin{pmatrix} 3 \\ −6 \\ 15 \end{pmatrix} .        (1.6)

These two so-defined operations satisfy a number of obvious rules. The vector addition is associative,
(u + v) + w = u + (v + w), it is commutative, u + v = v + u, there is a neutral element, the zero vector
 
    0 = \begin{pmatrix} 0 \\ \vdots \\ 0 \end{pmatrix} ,        (1.7)
which satisfies v + 0 = v and, finally, for each vector v there is an inverse −v, so that v + (−v) = 0.
The scalar multiplication satisfies three further rules, namely the distributive laws α(v + w) = αv + αw,
(α + β)v = αv + βv and the associativity law (αβ)v = α(βv). These rules can be easily verified from the
above definitions of vector addition and scalar multiplication and we will come back to this shortly. It is
useful to introduce the standard unit vectors
 
    e_i = \begin{pmatrix} 0 \\ \vdots \\ 0 \\ 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix} ← i-th position , \qquad \text{for } i = 1, . . . , n ,        (1.8)

in Rn which are the n vectors obtained by setting the ith component to one and all other components to
zero. In terms of the standard unit vectors a vector v with components vi can be written as
    v = v_1 e_1 + · · · + v_n e_n = \sum_{i=1}^n v_i e_i .        (1.9)

The results of vector additions and scalar multiplications can also be expressed in terms of the standard unit vectors. With two vectors v = \sum_{i=1}^n v_i e_i and w = \sum_{i=1}^n w_i e_i we have

    v + w = \sum_{i=1}^n v_i e_i + \sum_{i=1}^n w_i e_i = \sum_{i=1}^n (v_i + w_i) e_i , \qquad αv = α \sum_{i=1}^n v_i e_i = \sum_{i=1}^n (αv_i) e_i .        (1.10)

Here we have used some of the above general rules, notably the associativity and commutativity of vector
addition as well as the associativity and distributivity of scalar multiplication.
In a physics context, the case n = 3 is particularly important. In this case, sometimes the notation
     
    i = e_1 = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} , \quad j = e_2 = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} , \quad k = e_3 = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}        (1.11)

for the three standard unit vectors is used, so that a vector r with components x, y, z can be expressed as

    r = \begin{pmatrix} x \\ y \\ z \end{pmatrix} = x i + y j + z k .        (1.12)

Example 1.2: Vector addition and scalar multiplication with standard unit vectors
With the standard unit vectors i, j and k in R3 the vectors v and w from Eq. (1.4) can also be written
as
v = i − 2j + 5k , w = −4i + j − 3k . (1.13)
With this notation, the vector addition of v and w can be carried out as
v + w = (i − 2j + 5k) + (−4i + j − 3k) = −3i − j + 2k . (1.14)
For the scalar multiple of v by α = 3 we have
αv = 3(i − 2j + 5k) = 3i − 6j + 15k . (1.15)
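As a computational aside (not part of the original notes), the component-by-component rules (1.2) and (1.3) are exactly what NumPy arrays implement; the following minimal Python sketch, assuming NumPy is available, reproduces Examples 1.1 and 1.2.

    import numpy as np

    # The vectors from Eq. (1.4), represented as NumPy arrays.
    v = np.array([1, -2, 5])
    w = np.array([-4, 1, -3])

    # Component-by-component vector addition, Eq. (1.2), and scalar multiplication, Eq. (1.3).
    print(v + w)    # [-3 -1  2], matching Eq. (1.5)
    print(3 * v)    # [ 3 -6 15], matching Eq. (1.6)

    # The same vectors expressed in the standard basis i, j, k of Eq. (1.11).
    i, j, k = np.eye(3)
    print(1*i - 2*j + 5*k)    # equals v, cf. Eq. (1.13)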

While the case n = 3 is important for the description of physical space, other values are just as relevant
in physics. For example, a system of k mass points moving in three-dimensional space can be described
by a vector with n = 3k components. In Special Relativity, space and time are combined into a vector
with four components. And finally, in Quantum Mechanics, the quantum states of physical systems are
described by vectors which, depending on the system, can be basically any size. For this reason, we will
keep n general, whenever possible.
You are probably not yet sufficiently familiar with the above physics examples. Let me discuss an
example in a more familiar context which illustrates the need for vectors with an arbitrary number of
components and also indicates some of the problems linear algebra should address.

Application: Google’s search algorithm


Modern internet search engines order search results by assigning a page rank to each website. As we will
see, this task can be formulated as a problem in linear algebra.
Consider an internet with n web sites labeled by an index k = 1, . . . , n. Each site k has nk links to
some of the other sites and is linked to by the sites Lk ⊂ {1, . . . , n}. We would like to assign a page rank
xk to each site k. A first attempt might be to define the page rank of a page k as the number of pages
linking to it. However, it is desirable that a page linked to by high-ranked pages has itself a higher rank
than a page linked to by low-ranked pages, even if the number of links is the same in each case. So, an
improved version might be to define xk as the sum of all page ranks xj of the pages linking to k, so as
a sum over all xj , where j ∈ Lk . As a further refinement, a link to page k from a page j with a low
number of links nj might be considered worth more than a link from a page j with a high number of links.
Altogether, this leads to the following proposal for the page rank
    x_k = \sum_{j ∈ L_k} \frac{x_j}{n_j} .        (1.16)

Note that these are n equations (one for each page rank xk ) and that the sum on the RHS runs over
all pages j which link to page k. Eqs. (1.16) constitute a system of n linear equations for the variables
x1 , . . . , xn (while the number of links, nj , are given constants). Perhaps this is best explained by focusing
on a simple example.
Consider a very simple internet with four sites, so n = 4, and a structure of links as indicated in Fig. 3.
From the figure, it is clear that the number of links on each site (equal to the number of outgoing arrows

Figure 3: Example of a simple “internet” with four sites. An arrow from site j to site k indicates that site k is linked to by site j.

from each site) is given by n1 = 3, n2 = 2, n3 = 1, n4 = 2 while the links themselves are specified by
L1 = {3, 4}, L2 = {1}, L3 = {1, 2, 4}, L4 = {1, 2}. To be clear, L1 = {3, 4} means that site 1 is linked
to by sites 3 and 4. With this data, it is straightforward to specialize the general equations (1.16) to the
example and to obtain the following equations
    x_1 = \frac{x_3}{1} + \frac{x_4}{2} , \quad x_2 = \frac{x_1}{3} , \quad x_3 = \frac{x_1}{3} + \frac{x_2}{2} + \frac{x_4}{2} , \quad x_4 = \frac{x_1}{3} + \frac{x_2}{2}        (1.17)

for the ranks of the four pages. Clearly, this is a system of linear equations. Later, we will formulate such
systems using vector/matrix notation. In the present case, we can, for example, introduce
   
    x = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} , \qquad A = \begin{pmatrix} 0 & 0 & 1 & 1/2 \\ 1/3 & 0 & 0 & 0 \\ 1/3 & 1/2 & 0 & 1/2 \\ 1/3 & 1/2 & 0 & 0 \end{pmatrix}        (1.18)

and re-write the Eqs. (1.17) as Ax = x. This equation describes a so-called “eigenvalue problem”, a class of
problems we will discuss in detail towards the end of the course. Of course, we will also properly introduce
matrices shortly. At this stage, the main point to note is that linear systems of equations are relevant
in “everyday” problems and that we need to understand their structure and develop efficient solution
methods. The four equations (1.17) for our explicit example can, of course, be solved by elementary
methods (adding and subtracting equations and their multiples), resulting in
 
    x = α \begin{pmatrix} 2 \\ 2/3 \\ 3/2 \\ 1 \end{pmatrix} ,        (1.19)

where α is an arbitrary real number. Hence, site 1 is the highest-ranked one. In reality, the internet has
an enormous number, n, of sites. Real applications of the page rank algorithm therefore involve very large
systems of linear equations and vectors and matrices of corresponding size. Clearly, solving such systems
will require more refined methods and a better understanding of their structure. Much of the course will
be devoted to this task.
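To make this concrete, here is a small Python sketch (not from the original notes, and assuming NumPy) that sets up the matrix A of Eq. (1.18), checks that the page-rank vector (1.19) satisfies Ax = x, and recovers it as an eigenvector, anticipating the eigenvalue methods developed later in the course.

    import numpy as np

    # Link matrix of the four-site example, Eq. (1.18).
    A = np.array([
        [0,   0,   1, 1/2],
        [1/3, 0,   0, 0  ],
        [1/3, 1/2, 0, 1/2],
        [1/3, 1/2, 0, 0  ],
    ])

    # The solution (1.19), up to an overall scale alpha.
    x = np.array([2, 2/3, 3/2, 1])
    print(np.allclose(A @ x, x))        # True: x solves A x = x

    # Alternatively, extract x as an eigenvector of A with eigenvalue 1.
    evals, evecs = np.linalg.eig(A)
    pr = np.real(evecs[:, np.argmin(np.abs(evals - 1))])
    print(pr / pr[-1])                  # proportional to [2, 2/3, 3/2, 1]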

1.2 Vector spaces


The modern mathematical approach is to define vectors by the properties they satisfy rather than by
“what they are” and we would now like to follow this route. First, however, we need to introduce the
concept of a field.
Field: Prime examples of fields are the rational numbers, Q, the real numbers, R, and the complex
numbers, C. More generally, a field is an arena within which “regular calculations” involving addition and
multiplication of numbers can be carried out. We will write F for a field and will usually have the real
or complex numbers in mind, although much of what we will do also holds for other fields. For the more
formal-minded, a proper definition of fields can be found in Appendix A.

Example 1.3: Examples of finite fields

There exist “unusual” fields which satisfy all the requirements listed in Appendix A. These include fields
with a finite number of elements and here we introduce the simplest such examples, the finite fields
Fp = {0, 1, . . . , p − 1}, where p is a prime number. Addition and multiplication of two numbers a, b ∈ Fp
are defined by
a + b := (a + b) mod p , a · b = (ab) mod p . (1.20)
Here, the addition and multiplication on the right-hand sides of these definitions are just the usual ones
for integers. The modulus operation, a mod p, denotes the remainder of the division of a by p. In
other words, the definitions (1.20) are just the usual ones for addition and multiplication except that the
modulus operation brings the result back into the required range {0, 1, . . . , p − 1} whenever it exceeds
p − 1. Although these fields might seem abstract, they have important applications, for example in
numerical linear algebra. They allow calculations based on a finite set of integers which avoids numerical
uncertainties (as would arise for real numbers) and overflows (as may arise if all integers are used).
The smallest example of a field in this class is F2 = {0, 1}. Since every field must contain 0 (the
neutral element of addition) and 1 (the neutral element of multiplication), F2 is the smallest non-trivial
field. From the definitions (1.20) its addition and multiplication tables are

    +   0   1            ·   0   1
    0   0   1            0   0   0        (1.21)
    1   1   0            1   0   1

Note that, taking into account the mod 2 operation, in this field we have 1 + 1 = 0. Since the elements of
F2 can be viewed as the two states of a bit, this field has important applications in computer science and
in coding theory, to which we will return later.
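As an illustration (not part of the notes), the mod-p rules (1.20) are easily tabulated in a few lines of Python; the helper names f_add and f_mul are just for this sketch.

    # Addition and multiplication in F_p as in Eq. (1.20); p must be prime for F_p to be a field.
    def f_add(a, b, p):
        return (a + b) % p

    def f_mul(a, b, p):
        return (a * b) % p

    # Reproduce the F_2 tables of Eq. (1.21).
    p = 2
    print([[f_add(a, b, p) for b in range(p)] for a in range(p)])   # [[0, 1], [1, 0]]
    print([[f_mul(a, b, p) for b in range(p)] for a in range(p)])   # [[0, 0], [0, 1]]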

We are now ready to define vector spaces.

Definition 1.1. A vector space V over a field F (= R, C or any other field) is a set with two operations:
i) vector addition: (v, w) 7→ v + w ∈ V , where v, w ∈ V
ii) scalar multiplication: (α, v) 7→ αv ∈ V , where α ∈ F and v ∈ V .
For all u, v, w ∈ V and all α, β ∈ F , these operations have to satisfy the following rules:
(V1) (u + v) + w = u + (v + w) “associativity”
(V2) There exists a “zero vector”, 0 ∈ V so that 0 + v = v “neutral element”
(V3) There exists an inverse, −v with v + (−v) = 0 “inverse element”
(V4) v + w = w + v “commutativity”
(V5) α(v + w) = αv + αw
(V6) (α + β)v = αv + βv
(V7) (αβ)v = α(βv)
(V8) 1 · v = v
The elements v ∈ V are called “vectors”, the elements α ∈ F of the field are called “scalars”.

In short, a vector space defines an environment which allows for addition and scalar multiplication of
vectors, subject to a certain number of rules. Note that the above definition does not assume anything
about the nature of vectors. In particular, it is not assumed that they are made up from components.
Let us draw a few simple conclusions from these axioms to illustrate that indeed all the “usual” rules
for calculations with vectors can be deduced.

i) −(−v) = v
This follows from (V3) and (V4) which imply v + (−v) = 0 and −(−v) + (−v) = 0, respectively.
Combining these two equations gives v + (−v) = −(−v) + (−v) and then adding v to both sides,
together with (V1) and (V3), leads to v = −(−v).

ii) 0 · v = 0
Since 0v = (0 + 0)v \overset{(V6)}{=} 0v + 0v and 0v = 0v + 0, it follows that 0v = 0.

iii) (−1)v = −v
Since 0 \overset{(ii)}{=} 0v = (1 + (−1))v \overset{(V6),(V8)}{=} v + (−1)v and 0 = v + (−v), it follows that (−1)v = −v.
We now follow the standard path and define the “sub-structure” associated to vector spaces, the sub vector
spaces. These are basically vector spaces in their own right but they are contained in larger vector spaces.
The formal definition is as follows.
Definition 1.2. A sub vector space W ⊂ V is a non-empty subset of a vector space V satisfying:
(S1) w1 + w2 ∈ W for all w1 , w2 ∈ W
(S2) αw ∈ W for all α ∈ F and for all w ∈ W
In other words, a sub vector space is a non-empty subset of a vector space which is closed under vector
addition and scalar multiplication.
This definition implies immediately that a sub vector space is also a vector space over the same field as
V . Indeed, we already know that 0w = 0 and (−1)w = −w, so from property (S2) a sub vector space
W contains the zero vector and an inverse for each vector. Hence, the requirements (V2) and (V3) in
Definition 1.1 are satisfied for W . All other requirements in Definition 1.1 are trivially satisfied for W
simply by virtue of them being satisfied in V . Hence, W is indeed a vector space.
Every vector space V has two trivial sub vector spaces, the vector space {0} consisting of only the
zero vector and the whole space V . We will now illustrate the concepts of vector space and sub vector
space by some examples.

Example 1.4: Some examples of vector spaces


(a) The column vectors discussed in Section 1.1 of course form a vector space. Here, we can slightly
generalize these to V = F^n over F, where F = R or C (or indeed any other field), that is, to column vectors

    v = \begin{pmatrix} v_1 \\ \vdots \\ v_n \end{pmatrix} ,        (1.22)
where vi ∈ F . Vector addition and scalar multiplication are of course defined exactly as in Eqs. (1.2),
(1.3), that is, component by component. It is easy to check that all vector space axioms (V1)–(V8) are
indeed satisfied for these definitions. For example consider (V6):
       
    (α + β)v \overset{(1.3)}{=} \begin{pmatrix} (α + β)v_1 \\ \vdots \\ (α + β)v_n \end{pmatrix} = \begin{pmatrix} αv_1 + βv_1 \\ \vdots \\ αv_n + βv_n \end{pmatrix} \overset{(1.2)}{=} \begin{pmatrix} αv_1 \\ \vdots \\ αv_n \end{pmatrix} + \begin{pmatrix} βv_1 \\ \vdots \\ βv_n \end{pmatrix} \overset{(1.3)}{=} αv + βv        (1.23)

It is useful to write the definitions of the two vector space operations in index notation as

(v + w)i := vi + wi , (αv)i := αvi . (1.24)

The subscript i on the LHS means that component number i from the vector enclosed in brackets is
extracted. Using this notation, the vector space axioms can be verified in a much more concise way. For
example, we can demonstrate (V7) by ((αβ)v)i = (αβ)vi = α(βvi ) = α(βv)i = (α(βv))i .
Now, this example is not a big surprise and on its own would hardly justify the formal effort of our
general definition. So, let us move on to more adventurous examples.
(b) The set of all functions f : S → F from a set S into a field F , with vector addition and scalar
multiplication defined as

(f + g)(x) := f (x) + g(x) (1.25)


(af )(x) := af (x) , (1.26)

that is, by “pointwise” addition and multiplication, forms a vector space over F . The null “vector” is
the function which is zero everywhere and all axions (V 1) − (V 8) are clearly satisfied. There are many
interesting specializations and sub vector spaces of this.
(c) All continuous (or differentiable) functions f : [a, b] → F on an interval [a, b] ⊂ R form a vector
space. Indeed, since continuity (or differentiability) is preserved under addition and scalar multiplication,
as defined in (b), this is a (sub) vector space. For example, consider the real-valued functions f(x) = 2x^2 + 3x − 1 and g(x) = −2x + 4. Then the vector addition of these two functions and the scalar multiple of f by α = 4 are given by

    (f + g)(x) = (2x^2 + 3x − 1) + (−2x + 4) = 2x^2 + x + 3 , \qquad (αf)(x) = 4(2x^2 + 3x − 1) = 8x^2 + 12x − 4 .        (1.27)

(d) In physics, many problems involve solving 2nd order, linear, homogeneous differential equations of the
form
    p(x) \frac{d^2 f}{dx^2} + q(x) \frac{df}{dx} + r(x) f = 0        (1.28)
where p, q and r are fixed functions. The task is to find all functions f which satisfy this equation.
This equation is referred to as a “linear” differential equation since every term is linear in the unknown
function f (rather than, for example, quadratic). This property implies that if f and g are two solutions of the differential equation, then f + g and αf (for scalars α) are solutions as well. Hence, the space of solutions of
such an equation forms a vector space (indeed a sub vector space of the twice differentiable functions). A
simple example is the differential equation

    \frac{d^2 f}{dx^2} + f = 0        (1.29)
which is obviously solved by f (x) = cos(x) and f (x) = sin(x). Since the solution space forms a vector
space, we know that α cos(x) + β sin(x) for arbitrary α, β ∈ R solves the equation. This can also be easily
checked explicitly by inserting f (x) = α cos(x) + β sin(x) into the differential equation.
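For readers who like to verify such statements by machine, the following SymPy sketch (not part of the notes, and assuming SymPy is installed) checks that f(x) = α cos x + β sin x solves Eq. (1.29) for arbitrary α, β.

    import sympy as sp

    x, alpha, beta = sp.symbols('x alpha beta')
    f = alpha * sp.cos(x) + beta * sp.sin(x)

    # d^2 f / dx^2 + f should vanish identically, cf. Eq. (1.29).
    print(sp.simplify(sp.diff(f, x, 2) + f))    # 0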
(e) The matrices of size n × m consist of an array of numbers in F with n rows and m columns. A matrix
is usually denoted by

    A = \begin{pmatrix} a_{11} & \cdots & a_{1m} \\ \vdots & & \vdots \\ a_{n1} & \cdots & a_{nm} \end{pmatrix} ,        (1.30)
with entries aij ∈ F . As for vectors, we use index notation aij , where i = 1, . . . , n labels the rows and
j = 1, . . . , m labels the columns, to collectively refer to all the entries. Addition of two n × m matrices A

and B with components a_ij and b_ij and scalar multiplication can then be defined as

    A + B = \begin{pmatrix} a_{11} & \cdots & a_{1m} \\ \vdots & & \vdots \\ a_{n1} & \cdots & a_{nm} \end{pmatrix} + \begin{pmatrix} b_{11} & \cdots & b_{1m} \\ \vdots & & \vdots \\ b_{n1} & \cdots & b_{nm} \end{pmatrix}        (1.31)

        := \begin{pmatrix} a_{11} + b_{11} & \cdots & a_{1m} + b_{1m} \\ \vdots & & \vdots \\ a_{n1} + b_{n1} & \cdots & a_{nm} + b_{nm} \end{pmatrix}        (1.32)

    αA = α \begin{pmatrix} a_{11} & \cdots & a_{1m} \\ \vdots & & \vdots \\ a_{n1} & \cdots & a_{nm} \end{pmatrix} := \begin{pmatrix} αa_{11} & \cdots & αa_{1m} \\ \vdots & & \vdots \\ αa_{n1} & \cdots & αa_{nm} \end{pmatrix}        (1.33)
that is, as for vectors, component by component. Clearly, with these operations, the n × m matrices with
entries in F form a vector space over F . The zero “vector” is the matrix with all entries zero. Indeed, as
long as we define vector addition and scalar multiplication component by component it does not matter
whether the numbers are arranged in a column (as for vectors) or a rectangle (as for matrices). By slight
abuse of notation we sometimes denote the entries of a matrix A by Aij = aij . In index notation, the
above operations can then be written more concisely as (A + B)ij := Aij + Bij and (αA)ij := αAij , in
analogy with the definitions (1.24) for column vectors.
For a numerical example, consider the 2 × 2 matrices
   
    A = \begin{pmatrix} 1 & −2 \\ 3 & −4 \end{pmatrix} , \quad B = \begin{pmatrix} 0 & 5 \\ −1 & 8 \end{pmatrix} .        (1.34)

Their sum is given by

    A + B = \begin{pmatrix} 1 & −2 \\ 3 & −4 \end{pmatrix} + \begin{pmatrix} 0 & 5 \\ −1 & 8 \end{pmatrix} = \begin{pmatrix} 1 & 3 \\ 2 & 4 \end{pmatrix} ,        (1.35)

while the scalar multiplication of A with α = 3 gives

    αA = 3 \begin{pmatrix} 1 & −2 \\ 3 & −4 \end{pmatrix} = \begin{pmatrix} 3 & −6 \\ 9 & −12 \end{pmatrix} .        (1.36)

This list of examples hopefully illustrates the strength of the general approach. Perhaps surprisingly, even
many of the more “exotic” vector spaces do play a role in physics, particularly in quantum physics. Much
of what follows will only be based on the general Definition 1.1 of a vector space and, hence, will apply
to all of the above examples and many more.

1.3 Linear combinations, linear independence


Let us introduce the following useful pieces of terminology. For k vectors v1 , . . . , vk in a vector space V
over a field F the expression

    α_1 v_1 + · · · + α_k v_k = \sum_{i=1}^k α_i v_i ,        (1.37)

with scalars α_1, . . . , α_k ∈ F, is called a linear combination. The set of all linear combinations of v_1, . . . , v_k,

    Span(v_1, . . . , v_k) := \left\{ \sum_{i=1}^k α_i v_i \;\middle|\; α_i ∈ F \right\} ,        (1.38)
is called the span of v1 , . . . , vk . Although the second definition might seem slightly abstract at first, the
span has a rather straightforward geometrical interpretation. For example, the span of a single vector
consists of all scalar multiples of this vector, so it can be viewed as the line through 0 in the direction of
this vector. The span of two vectors (which are not scalar multiples of each other) can be viewed as the
plane through 0 containing these two vectors and so forth.
We note that Span(v_1, . . . , v_k) is a sub vector space of V. This is rather easy to see. Consider two vectors u, v ∈ Span(v_1, . . . , v_k) in the span, so that u = \sum_i α_i v_i and v = \sum_i β_i v_i. Then the sum u + v = \sum_i (α_i + β_i) v_i is clearly in the span as well, as is the scalar multiple αu = \sum_i (αα_i) v_i. Hence, from Def. 1.2, the span is a (sub) vector space.

Example 1.5: The span of vectors


(a) For a simple example in R3 consider the span of the first two standard unit vectors Span(e1 , e2 ) =
{xe1 + ye2 | x, y ∈ R} which, of course, corresponds to the x-y plane.
(b) For a more complicated example in R^3, define the two vectors

    u = \begin{pmatrix} 1 \\ 4 \\ 2 \end{pmatrix} , \quad v = \begin{pmatrix} 0 \\ −3 \\ 1 \end{pmatrix} .

Their span is given by

    Span(u, v) = {αu + βv | α, β ∈ R} = \left\{ \begin{pmatrix} α \\ 4α − 3β \\ 2α + β \end{pmatrix} \,\middle|\, α, β ∈ R \right\}

which describes a plane through 0.

The above interpretation of the span as lines, planes, etc. through 0 already points to a problem. Consider
the span of three vectors u, v, w but assume that u is a linear combination of the other two. In this
case, u can be omitted without changing the span, so Span(u, v, w) = Span(v, w). In this sense, the
original set of vectors u, v, w was not minimal. What we would like is a criterion for minimality of a set
of vectors, so that none of them can be removed without changing the span. This leads to the concept of
linear independence which is central to the subject. Formally, it is defined as follows.

Definition 1.3. Let V be a vector space over F and α1 , . . . , αk ∈ F scalars. A set of vectors v1 , . . . , vk ∈ V
is called linearly independent if

    \sum_{i=1}^k α_i v_i = 0  =⇒  all α_i = 0 .        (1.39)

Otherwise, the vectors are called linearly dependent. That is, they are linearly dependent if \sum_{i=1}^k α_i v_i = 0 has a solution with at least one α_i ≠ 0.

To relate this to our previous discussion the following statement should be helpful.

Claim 1.1. The vectors v1 , . . . , vk are linearly dependent ⇐⇒ One vector vi can be written as a linear
combination of the others.

Proof. The proof is rather simple but note that there are two directions to show.

“⇒”: Assume that the vectors v_1, . . . , v_k are linearly dependent, so that the equation \sum_{i=1}^k α_i v_i = 0 has a solution with at least one α_i ≠ 0. Say, α_1 ≠ 0, for simplicity. Then we can solve for v_1 to get

    v_1 = −\frac{1}{α_1} \sum_{i>1} α_i v_i ,        (1.40)

and, hence, we have expressed v_1 as a linear combination of the other vectors.

“⇐”: Now assume one vector, say v_1, can be written as a linear combination of the others, so that v_1 = \sum_{i>1} β_i v_i. Then it follows that \sum_{i=1}^k α_i v_i = 0 with α_1 = 1 ≠ 0 and α_i = −β_i for i > 1. Hence, the vectors are linearly dependent.

So for a linearly dependent set of vectors we can eliminate (at least) one vector without changing the
span. A linearly independent set is one which cannot be further reduced in this way, so is “minimal” in
this sense.

Example 1.6: Linear independence of vectors

(a) The standard unit vectors e_1, . . . , e_n in R^n over R or C^n over C form a linearly independent set. This is quite easy to see using Eq. (1.39). We have

    \sum_{i=1}^n α_i e_i = \begin{pmatrix} α_1 \\ \vdots \\ α_n \end{pmatrix} \overset{!}{=} 0 ,        (1.41)

and this clearly implies that all α_i = 0.
(b) As a less trivial example, consider the following vectors in R^3:

    v_1 = \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix} , \quad v_2 = \begin{pmatrix} 0 \\ 1 \\ 2 \end{pmatrix} , \quad v_3 = \begin{pmatrix} 1 \\ 1 \\ −1 \end{pmatrix}        (1.42)

Again, using Eq. (1.39), we have

    α_1 v_1 + α_2 v_2 + α_3 v_3 = \begin{pmatrix} α_3 \\ α_1 + α_2 + α_3 \\ α_1 + 2α_2 − α_3 \end{pmatrix} \overset{!}{=} 0 .        (1.43)

From the first component it follows that α_3 = 0 and combining the second and the third component leads to α_1 = α_2 = 0. Therefore the three vectors are linearly independent.
(c) Let us discuss linear dependence for systems of two and three vectors. First, assume that the two
(non-zero) vectors u, v are linearly dependent. From Claim 1.1 this means that one can be written as a
linear combination of the other, so, u = αv. So for two vectors, linear dependence means that they point
into the same, or opposite direction, that is they lie on the same line through 0.
Analogously, for three linearly dependent vectors u, v, w, one can be expressed as a linear combination
of the other two, so, for example, u = αv + βw. This means that u is in the plane through 0 spanned by
v and w.
(d) In example 1.4(d) we have explained that the solutions to homogeneous, linear second order differential
equations form a vector space. A simple example of such a differential equation is
    \frac{d^2 f}{dx^2} = −f        (1.44)
which is obviously solved by f (x) = sin x and f (x) = cos x. Are these two solutions linearly independent?
Using Eq. (1.39) we should start with α sin x + β cos x = 0 and, since the zero “vector” is the function
identical to zero, this equation has to be satisfied for all x. Setting x = 0 we learn that β = 0 and setting
x = π/2 it follows that α = 0. Hence, sin and cos are linearly independent.
(e) For an example which involves linear dependence consider the three vectors

    v_1 = \begin{pmatrix} −2 \\ 0 \\ 1 \end{pmatrix} , \quad v_2 = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} , \quad v_3 = \begin{pmatrix} 0 \\ 2 \\ 3 \end{pmatrix} .        (1.45)

Forming a general linear combination gives

    α_1 v_1 + α_2 v_2 + α_3 v_3 = \begin{pmatrix} −2α_1 + α_2 \\ α_2 + 2α_3 \\ α_1 + α_2 + 3α_3 \end{pmatrix} \overset{!}{=} 0 .        (1.46)

This set of equations clearly has non-trivial solutions, for example α_1 = 1, α_2 = 2, α_3 = −1, so that the vectors are linearly dependent. Alternatively, this could have been inferred by noting that v_3 = v_1 + 2v_2.
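Numerically, linear (in)dependence of column vectors can be tested by computing the rank of the matrix whose columns are the given vectors, a notion treated properly in Section 3.2.2. The sketch below (not from the notes, assuming NumPy) confirms Examples 1.6(b) and 1.6(e).

    import numpy as np

    # Vectors of Eq. (1.42): linearly independent, so the rank equals 3.
    V = np.column_stack(([0, 1, 1], [0, 1, 2], [1, 1, -1]))
    print(np.linalg.matrix_rank(V))    # 3

    # Vectors of Eq. (1.45): linearly dependent (v3 = v1 + 2 v2), so the rank is only 2.
    W = np.column_stack(([-2, 0, 1], [1, 1, 1], [0, 2, 3]))
    print(np.linalg.matrix_rank(W))    # 2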

1.4 Basis and dimension


For a given vector space V , it is useful to have a “minimal” set of vectors which still spans the whole
space. Such a set of vectors is called a basis and its formal definition is:
Definition 1.4. A set v1 , . . . , vn ∈ V of vectors is called a basis of V iff:
(B1) v1 , . . . , vn are linearly independent.
(B2) V = Span(v1 , . . . , vn )
It is easy to check that the vectors in Example 1.6 (a), (b), (d) above form a basis. The concept of basis is of central importance in the theory of vector spaces. Every vector can be written as a linear combination
of the basis vectors and, what is more, for a given vector this linear combination is unique:
Claim 1.2. If v_1, . . . , v_n is a basis of V, every vector v ∈ V can be written as a unique linear combination v = \sum_{i=1}^n α_i v_i, that is, given v, the α_i are uniquely determined. The coefficients α_i are called the coordinates of v with respect to v_1, . . . , v_n.
Proof. We need to show that there is indeed only one set of possible coefficients for a given vector v. Let
us write v as two linear combinations
    v = \sum_{i=1}^n α_i v_i = \sum_{i=1}^n β_i v_i        (1.47)

with coefficients α_i and β_i. Taking the difference of these two equations implies

    \sum_{i=1}^n (α_i − β_i) v_i = 0 ,        (1.48)

and, from linear independence of the basis, it follows that all αi − βi = 0, so that indeed αi = βi .

In summary, given a basis every vector can be represented by its coordinates relative to the basis. Let
us illustrate this with a few examples.

Example 1.7: Coordinates relative to a basis

(a) For the standard basis e_i of V = F^n over F we can write every vector w as

    w = \begin{pmatrix} w_1 \\ \vdots \\ w_n \end{pmatrix} = \sum_{i=1}^n w_i e_i        (1.49)

so the coordinates are identical to the components.


(b) For a more complicated example, start with the basis

    v_1 = \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix} , \quad v_2 = \begin{pmatrix} 0 \\ 1 \\ 2 \end{pmatrix} , \quad v_3 = \begin{pmatrix} 1 \\ 1 \\ −1 \end{pmatrix}        (1.50)

of R^3 and a general vector r with components x, y, z. To write r as a linear combination of the basis vectors we set

    r = \begin{pmatrix} x \\ y \\ z \end{pmatrix} = α_1 v_1 + α_2 v_2 + α_3 v_3 = \begin{pmatrix} α_3 \\ α_1 + α_2 + α_3 \\ α_1 + 2α_2 − α_3 \end{pmatrix}        (1.51)

which implies x = α_3, y = α_1 + α_2 + α_3, z = α_1 + 2α_2 − α_3. Solving for the α_i leads to α_1 = −3x + 2y − z, α_2 = 2x − y + z, α_3 = x, so these are the coordinates of r relative to the given basis.
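The coordinates in Example 1.7(b) can also be obtained by solving the linear system r = α_1 v_1 + α_2 v_2 + α_3 v_3 numerically; this sketch (not part of the notes, assuming NumPy) uses np.linalg.solve on the matrix whose columns are the basis vectors, for the illustrative choice (x, y, z) = (1, 2, 3).

    import numpy as np

    # Basis of Eq. (1.50) as the columns of a matrix.
    B = np.column_stack(([0, 1, 1], [0, 1, 2], [1, 1, -1]))

    # Coordinates of r = (x, y, z)^T solve B @ alpha = r.
    r = np.array([1.0, 2.0, 3.0])       # test vector with x = 1, y = 2, z = 3
    alpha = np.linalg.solve(B, r)
    print(alpha)                        # [-3x+2y-z, 2x-y+z, x] = [-2.,  3.,  1.]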

We would like to call the number of vectors in a basis the dimension of the vector space. However, there
are usually many different choices of basis for a given vector space. Do they necessarily all have the same
number of vectors? Intuitively, it seems this has to be the case but the formal proof is more difficult than
expected. It comes down to the following

Lemma 1.1. (Exchange Lemma) Let v1 , . . . , vn be a basis of V and w1 , . . . , wm ∈ V arbitrary vectors.


If m > n then w1 , . . . , wm are linearly dependent.

Proof. If w_1 = 0 then the vectors w_1, . . . , w_m are linearly dependent, so we can assume that w_1 ≠ 0. Since the vectors v_i form a basis we can write

    w_1 = \sum_{i=1}^n α_i v_i

with at least one α_i (say α_1) non-zero (or else w_1 would be zero). We can, therefore, solve this equation for v_1 so that

    v_1 = \frac{1}{α_1} \left( w_1 − \sum_{i=2}^n α_i v_i \right) .

This shows that we can “exchange” the vector v_1 for w_1 in the basis {v_i} such that V = Span(w_1, v_2, . . . , v_n). This exchange process can be repeated until all v_i are replaced by w_i and V = Span(w_1, . . . , w_n). Since m > n there is at least one vector, w_{n+1}, “left over” which can be written as a linear combination

    w_{n+1} = \sum_{i=1}^n β_i w_i .

This shows that the vectors w_1, . . . , w_m are linearly dependent.

Now consider two bases, v_1, . . . , v_n and w_1, . . . , w_m of V. Then the Lemma implies that both n > m
and n < m are impossible and n = m follows. Hence, while a vector space usually allows many choices of
basis the number of basis vectors is always the same. So we can define
Definition 1.5. For a basis v1 , . . . , vn of a vector space V over F we call dimF (V ) := n the dimension
of V over F .
From what we have just seen, it does not matter which basis we use to determine the dimension. Every
choice leads to the same result. Let us apply this to compute the dimension for some examples.

Example 1.8: Dimensions of some vector spaces


(a) We have already established that the standard unit vectors e_1, . . . , e_n form a basis of R^n and C^n (seen as vector spaces over the fields R and C, respectively), so dim_R(R^n) = dim_C(C^n) = n. However, note that C^n as a vector space over R has a basis e_1, . . . , e_n, ie_1, . . . , ie_n and, therefore, dim_R(C^n) = 2n.
(b) Following up from example 1.6 (d) we would like to determine the dimension of the solution space (of
real-valued functions) of
    \frac{d^2 f}{dx^2} = −f .        (1.52)
The general solution is given by f (x) = α sin x + β cos x with arbitrary real coefficients α and β. We have
already seen that sin and cos are linearly independent. Hence, the dimension of this space is 2.
(c) Real polynomials of degree d in the variable x are of the form a_d x^d + a_{d−1} x^{d−1} + · · · + a_1 x + a_0 with real coefficients a_i and they form a vector space over R. What is the dimension of this space? Clearly, it is spanned by the monomials 1, x, x^2, . . . , x^d. To show that the monomials are linearly independent start with

    \sum_{i=0}^d α_i x^i = 0 ,        (1.53)

take the k-th derivative with respect to x and then set x = 0. This immediately implies that α_k = 0 and, hence, that the monomials are linearly independent and form a basis. The dimension of the space is, therefore, d + 1.
(d) For the n × m matrices with entries in F (as a vector space over F ) define the matrices
 
    E_{(ij)} = \begin{pmatrix} 0 & \cdots & 0 & 0 & 0 & \cdots & 0 \\ \vdots & & \vdots & \vdots & \vdots & & \vdots \\ 0 & \cdots & 0 & 0 & 0 & \cdots & 0 \\ 0 & \cdots & 0 & 1 & 0 & \cdots & 0 \\ 0 & \cdots & 0 & 0 & 0 & \cdots & 0 \\ \vdots & & \vdots & \vdots & \vdots & & \vdots \\ 0 & \cdots & 0 & 0 & 0 & \cdots & 0 \end{pmatrix} ,        (1.54)

where i = 1, . . . , n and j = 1, . . . , m and the “1” appears in the ith row and j th column with all other
entries zero. Clearly, these matrices form a basis, in complete analogy with the standard unit vectors.
Therefore, the vector space of n × m matrices has dimension nm.

In the following Lemma we collect a few simple conclusions about vector spaces which are spanned by a
finite number of vectors.

Lemma 1.2. For a vector space V spanned by a finite number of vectors we have:
(i) V has a basis
(ii) Every linearly independent set v1 , . . . , vk ∈ V can be completed to a basis.
(iii) If n = dim(V ), any linearly independent set of vectors v1 , . . . , vn forms a basis.
(iv) If dimF (V ) = dimF (W ) and V ⊂ W for two vector spaces V , W then V = W .

Proof. (i) By assumption, V is spanned by a finite number of vectors, say V = Span(v_1, . . . , v_k). If


these vectors are linearly independent we have found a basis. If not, one of the vectors, say vk , can be
written as a linear combination of the others and can, hence, be dropped without changing the span,
so V = Span(v1 , . . . , vk−1 ). This process can be continued until the remaining set of vectors is linearly
independent.
(ii) If the linearly independent vectors v_1, . . . , v_k already span V we are finished. If not, there exists a vector v_{k+1} ∉ Span(v_1, . . . , v_k). Hence, the vectors v_1, . . . , v_k, v_{k+1} must be linearly independent. We can
continue adding vectors to the list until it spans the whole space. (The process must terminate after a
finite number of steps or else we would contradict the Exchange Lemma 1.1.)
(iii) If dim(V ) = n and the linearly independent set v1 , . . . , vn did not span V then, from (ii), it could
be completed to a basis with more than n elements. However, this is a contradiction since the number of
elements in a basis is the same for any choice of basis. Hence, the vectors v1 , . . . , vn must span the space
and they form a basis.
(iv) We can choose a basis v1 , . . . , vn of V . Since V ⊂ W , these basis vectors are linearly independent
in W and, since dimF (W ) = dimF (V ), they must also form a basis of W , using (iii). Hence V =
Span(v1 , . . . , vn ) = W .

Application: Magic Squares


An amusing application of vector spaces is to magic squares. Magic squares are 3 × 3 (say) square arrays of (rational) numbers – we can think of them as 3 × 3 matrices – such that all rows, all columns and both diagonals sum up to the same total. A simple example is

    M = \begin{pmatrix} 4 & 9 & 2 \\ 3 & 5 & 7 \\ 8 & 1 & 6 \end{pmatrix}        (1.55)

where every row, column and diagonal sums up to 15. Magic squares have long held a certain fascination
and an obvious problem is to find all magic squares.
In our context, the important observation is that magic squares form a vector space. Let us agree
that we add and scalar multiply magic squares in the same way as matrices (see Example 1.4 (e)), that is,
component by component. Then, clearly, the sum of two magic squares is again a magic square, as is the
scalar multiple of a magic square. Hence, from Def. 1.2, the 3 × 3 magic squares form a sub vector space
of the space of all 3 × 3 matrices. The problem of finding all magic squares can now be phrased in the
language of vector spaces. What is the dimension of the space of magic squares and can we write down a
basis for this space?
It is relatively easy to find the following three elementary examples of magic squares:

    M_1 = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix} , \quad M_2 = \begin{pmatrix} 0 & 1 & −1 \\ −1 & 0 & 1 \\ 1 & −1 & 0 \end{pmatrix} , \quad M_3 = \begin{pmatrix} −1 & 1 & 0 \\ 1 & 0 & −1 \\ 0 & −1 & 1 \end{pmatrix} .        (1.56)

It is also easy to show that these three matrices are linearly independent, using Eq. (1.39). Setting a general linear combination to zero,

    α_1 M_1 + α_2 M_2 + α_3 M_3 = \begin{pmatrix} α_1 − α_3 & α_1 + α_2 + α_3 & α_1 − α_2 \\ α_1 − α_2 + α_3 & α_1 & α_1 + α_2 − α_3 \\ α_1 + α_2 & α_1 − α_2 − α_3 & α_1 + α_3 \end{pmatrix} \overset{!}{=} 0 ,        (1.57)
immediately leads to α1 = α2 = α3 = 0. Hence, M1 , M2 , M3 are linearly independent and span a three-
dimensional vector space of magic squares. Therefore, the dimension of the magic square space is at least
three. Indeed, our example (1.55) is contained in this three-dimensional space since M = 5M1 +3M2 +M3 .
As we will see later, this is not an accident. We will show that, in fact, the dimension of the magic square
space equals three and, hence, that M1 , M2 , M3 form a basis.
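A short sketch (not part of the notes, assuming NumPy) verifying that M1, M2, M3 and M are magic squares and that M = 5M1 + 3M2 + M3 as claimed; the helper is_magic is defined just for this example.

    import numpy as np

    def is_magic(S):
        # Check that all rows, columns and both diagonals of S have the same sum.
        sums = list(S.sum(axis=0)) + list(S.sum(axis=1)) + [np.trace(S), np.trace(np.fliplr(S))]
        return len(set(sums)) == 1

    M1 = np.ones((3, 3), dtype=int)
    M2 = np.array([[0, 1, -1], [-1, 0, 1], [1, -1, 0]])
    M3 = np.array([[-1, 1, 0], [1, 0, -1], [0, -1, 1]])
    M  = np.array([[4, 9, 2], [3, 5, 7], [8, 1, 6]])

    print(all(is_magic(S) for S in (M1, M2, M3, M)))    # True
    print(np.array_equal(M, 5*M1 + 3*M2 + M3))          # True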

2 Vectors in R^n, geometrical applications


We would now like to pause the general story (before we resume in the next chapter) and focus on a
number of important topics for column vectors in Rn . In particular, we will introduce the scalar and
vector product for column vectors which are widely used in physics and discuss some related geometrical
applications.

2.1 Scalar product in R^n


The scalar (or dot) product for two n-dimensional column vectors

    a = \begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix} , \quad b = \begin{pmatrix} b_1 \\ \vdots \\ b_n \end{pmatrix}        (2.1)

is defined as

    a · b := \sum_{i=1}^n a_i b_i .        (2.2)
In physics it is customary to omit the sum symbol in this definition and simply write a · b = ai bi , adopting
the convention that an index which appears twice in a given term (such as the index i in the present case)
is summed over. This is also referred to as the Einstein summation convention.
The scalar product satisfies a number of obvious properties, namely
(a) a · b = b · a
(b) a · (b + c) = a · b + a · c
(c) a · (βb) = β a · b        (2.3)
(d) a · a > 0 for all a ≠ 0
Property (a) means that the dot product is symmetric. Properties (b), (c) can be expressed by saying
that the scalar product is linear in the second argument (vector addition and scalar multiplication can be
“pulled through”) and, by symmetry, it is therefore also linear in the first argument. It is easy to show
these properties using index notation.
(a) a · b = a_i b_i = b_i a_i = b · a
(b) a · (b + c) = a_i (b_i + c_i) = a_i b_i + a_i c_i = a · b + a · c
(c) a · (βb) = a_i (βb_i) = β a_i b_i = β a · b
(d) a · a = \sum_{i=1}^n a_i^2 > 0 for a ≠ 0

Scalar products can also be defined “axiomatically” by postulating the four properties (a) – (d) and we
will come back to this approach in Section 6.
The last property, (d), allows us to define the length of a vector as

    |a| := \sqrt{a · a} = \left( \sum_{i=1}^n a_i^2 \right)^{1/2} .        (2.4)

Example 2.1: Dot product and length of vectors

The three-dimensional vectors

    a = \begin{pmatrix} 2 \\ 4 \\ −2 \end{pmatrix} , \quad b = \begin{pmatrix} 2 \\ 1 \\ 1 \end{pmatrix}        (2.5)

have a dot product

    a · b = \begin{pmatrix} 2 \\ 4 \\ −2 \end{pmatrix} · \begin{pmatrix} 2 \\ 1 \\ 1 \end{pmatrix} = 2 · 2 + 4 · 1 + (−2) · 1 = 6 .        (2.6)

Their lengths are given by

    |a| = \sqrt{2^2 + 4^2 + (−2)^2} = 2\sqrt{6} , \quad |b| = \sqrt{2^2 + 1^2 + 1^2} = \sqrt{6} .        (2.7)
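As a quick check (not from the notes, assuming NumPy), np.dot implements the sum in Eq. (2.2) and np.linalg.norm the length (2.4), reproducing Example 2.1.

    import numpy as np

    a = np.array([2, 4, -2])
    b = np.array([2, 1, 1])

    print(np.dot(a, b))                            # 6, as in Eq. (2.6)
    print(np.linalg.norm(a), np.linalg.norm(b))    # 2*sqrt(6) ≈ 4.899 and sqrt(6) ≈ 2.449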

It follows easily that |αa| = |α||a| for any real number α. This relation means that every non-zero vector
a can be “normalised” to length one by defining

    n = |a|^{−1} a .        (2.8)

The length of n is indeed one, since |n| = ||a|^{−1} a| = |a|^{−1} |a| = 1.

The dot product satisfies an important inequality.

Lemma 2.1. (Cauchy-Schwarz inequality) For any two vectors a and b in Rn we have

|a · b| ≤ |a| |b| .

Proof. The proof is a bit tricky. We start with the simplifying assumption that |a| = |b| = 1. Then

    0 ≤ |a ± b|^2 = (a ± b) · (a ± b) = |a|^2 ± 2 a · b + |b|^2 = 2(1 ± a · b) ,

which shows that |a · b| ≤ 1. Now consider arbitrary vectors a and b. If one of these vectors is zero then the inequality is trivially satisfied so we assume that both of them are non-zero. Then the vectors

    u = \frac{a}{|a|} , \quad v = \frac{b}{|b|}

both have length one and, hence, |u · v| ≤ 1. Inserting the definitions of u and v into this inequality and multiplying by |a| and |b| gives the desired result.

A closely related inequality is the famous

Lemma 2.2. (Triangle inequality) For any two vectors a and b in Rn we have

|a + b| ≤ |a| + |b|

Proof.

    |a + b|^2 = |a|^2 + |b|^2 + 2 a · b ≤ |a|^2 + |b|^2 + 2|a| |b| = (|a| + |b|)^2 ,

where the Cauchy-Schwarz inequality has been used in the second step. Taking the square root on both sides gives the result.

Figure 4: Geometric meaning of the triangle inequality: the length |a + b| is always less than or equal to the sum |a| + |b| of the other two sides.

The triangle inequality has an obvious geometrical interpretation which is illustrated in Fig. 4. For
two non-zero vectors a and b, the Cauchy-Schwarz inequality implies that

    −1 ≤ \frac{a · b}{|a| |b|} ≤ 1        (2.9)

so that there is a unique angle θ ∈ [0, π] such that

    cos θ = \frac{a · b}{|a| |b|} .        (2.10)

Figure 5: Angle between two vectors

This angle θ is called the angle between the two vectors a and b, also denoted ^(a, b). With this definition
of the angle we can also write the scalar product as

a · b = |a| |b| cos(^(a, b)) . (2.11)

We call the two vectors, a and b orthogonal (or perpendicular), in symbols a ⊥ b, iff a · b = 0. For two
non-zero vectors a and b this means
    a ⊥ b ⇐⇒ ^(a, b) = \frac{π}{2} .        (2.12)

Example 2.2: Angle between vectors and orthogonality

(a) Recall that the two vectors a and b in Eq. (2.5) have a dot product a · b = 6 and lengths |a| = 2√6 and |b| = √6. Hence, the angle between them is given by

    cos(^(a, b)) = \frac{6}{2\sqrt{6}\,\sqrt{6}} = \frac{1}{2}  ⇒  ^(a, b) = \frac{π}{3} .        (2.13)

(b) The two vectors

    a = \begin{pmatrix} 3 \\ −2 \\ 1 \end{pmatrix} , \quad b = \begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix}        (2.14)

have a dot product a · b = 3 · 1 + (−2) · 2 + 1 · 1 = 0 and are, hence, orthogonal.
(c) Start with a general vector

    a = \begin{pmatrix} a_1 \\ a_2 \end{pmatrix}        (2.15)

in R^2. It is easy to write down a vector orthogonal to a by exchanging the two components and inverting the sign of one of them. Of course every scalar multiple of this new vector is also orthogonal to a, so the vectors

    b = α \begin{pmatrix} −a_2 \\ a_1 \end{pmatrix}        (2.16)

are orthogonal to a for arbitrary α ∈ R. Indeed, we have a · b = α(a_1(−a_2) + a_2 a_1) = 0. Conversely, it is easy to see that all vectors orthogonal to a must be of this form.
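The angle formula (2.10) translates directly into code; the sketch below (not part of the notes, assuming NumPy) recovers the angle π/3 of Example 2.2(a) and the orthogonality of Example 2.2(b). The helper angle is defined only for this illustration.

    import numpy as np

    def angle(a, b):
        # Angle between two non-zero vectors, Eq. (2.10).
        return np.arccos(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    a = np.array([2, 4, -2]); b = np.array([2, 1, 1])
    print(angle(a, b), np.pi / 3)    # both approximately 1.0472

    c = np.array([3, -2, 1]); d = np.array([1, 2, 1])
    print(np.dot(c, d))              # 0, so c and d are orthogonal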

2.2 Vector product in R^3

For two three-dimensional vectors a and b the vector product is defined as

    a × b = \begin{pmatrix} a_1 \\ a_2 \\ a_3 \end{pmatrix} × \begin{pmatrix} b_1 \\ b_2 \\ b_3 \end{pmatrix} := \begin{pmatrix} a_2 b_3 − a_3 b_2 \\ a_3 b_1 − a_1 b_3 \\ a_1 b_2 − a_2 b_1 \end{pmatrix} .        (2.17)

Note that this rule is relatively easy to remember: For the first entry of the cross product consider the
second and third components of the two vectors and multiply them “cross-wise” with a relative minus
sign between the two terms and similarly for the other two entries.

Example 2.3: Cross product


Using the above definition, the cross product can be carried out immediately, for example
       
    \begin{pmatrix} 2 \\ 4 \\ 3 \end{pmatrix} × \begin{pmatrix} −2 \\ 1 \\ 5 \end{pmatrix} = \begin{pmatrix} 4 · 5 − 3 · 1 \\ 3 · (−2) − 2 · 5 \\ 2 · 1 − 4 · (−2) \end{pmatrix} = \begin{pmatrix} 17 \\ −16 \\ 10 \end{pmatrix}        (2.18)
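NumPy's np.cross implements exactly the definition (2.17); this short check (not from the notes) reproduces Eq. (2.18).

    import numpy as np

    a = np.array([2, 4, 3])
    b = np.array([-2, 1, 5])
    print(np.cross(a, b))    # [ 17 -16  10], matching Eq. (2.18)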

However, cross product calculations which involve non-numerical, symbolic expressions can become ex-
tremely tedious if done by writing out all three components explicitly. It is therefore useful to introduce

a more economical notation and adopt the Einstein summation convention. To this end, we define the
following two objects:

Kronecker delta in Rn The Kronecker delta in Rn is defined by



1 if i = j
δij = , (2.19)
0 if i 6= j
where i, j, . . . = 1, . . . , n. For a vector aj we have δij aj = ai (recall, a sum over j is implied), so the
Kronecker delta acts as an “index replacer”, substituting the summed over index j by i in the previous
expression. Another useful property is δii = n (again, note the double appearance of i means it is summed
over). The dot product can also be expressed in terms of the Kronecker delta by writing a · b = ai bi =
δij ai bj .

Levi-Civita tensor in R^3  The Levi-Civita tensor ε_{ijk}, where i, j, k, . . . = 1, 2, 3, is defined as

    ε_{ijk} = \begin{cases} +1 & \text{if } (i, j, k) = (1, 2, 3), (2, 3, 1), (3, 1, 2) \quad \text{(“cyclic permutations”)} \\ −1 & \text{if } (i, j, k) = (2, 1, 3), (3, 2, 1), (1, 3, 2) \quad \text{(“anti-cyclic permutations”)} \\ 0 & \text{otherwise} \end{cases} .        (2.20)

It has a number of useful properties, namely

(a) it remains unchanged under cyclic index permutations, for example ε_{ijk} = ε_{jki}        (2.21)
(b) it changes sign under anti-cyclic index permutations, for example ε_{ijk} = −ε_{ikj}        (2.22)
(c) it vanishes if two indices are identical, for example ε_{ijj} = 0        (2.23)
(d) ε_{ijk} ε_{ilm} = δ_{jl} δ_{km} − δ_{jm} δ_{kl}        (2.24)
(e) ε_{ijk} ε_{ijm} = 2δ_{km}        (2.25)
(f) ε_{ijk} ε_{ijk} = 6        (2.26)
(g) ε_{ijk} a_j a_k = 0 .        (2.27)

The first three of these properties are obvious from the definition of the Levi-Civita tensor. Property (2.24) can be reasoned out as follows. If the index pair (j, k) is different from (l, m) (in any order) then clearly both sides of (2.24) are zero. On the other hand, if the two index pairs equal each other they can do so in the same or the opposite ordering and these two possibilities correspond precisely to the two terms on the RHS of (2.24). If we multiply (2.24) by δ_{jl}, using the index replacing property of the Kronecker delta, we obtain

    ε_{ijk} ε_{ijm} = (δ_{jl} δ_{km} − δ_{jm} δ_{kl}) δ_{jl} = 3δ_{km} − δ_{km} = 2δ_{km}

and this is property (2.25). Further, multiplying (2.25) with δ_{km} we have

    ε_{ijk} ε_{ijk} = 2δ_{km} δ_{km} = 2δ_{kk} = 6

and, hence, (2.26) follows. Finally, to show (2.27) we write ε_{ijk} a_j a_k = −ε_{ikj} a_k a_j = −ε_{ijk} a_j a_k, where the summation indices j and k have been swapped in the last step, and, hence, 2ε_{ijk} a_j a_k = 0.

We can think of δ_{ij} and ε_{ijk} as a convenient notation for the 0’s, 1’s and −1’s which appear in the definition of the dot and cross product. Indeed, the dot product can be written as

    a · b = a_i b_i = δ_{ij} a_i b_j ,        (2.28)

while the index version of the cross product takes the form

    (a × b)_i = ε_{ijk} a_j b_k .        (2.29)

To verify this last equation focus, for example, on the first component:

    ε_{1jk} a_j b_k = ε_{123} a_2 b_3 + ε_{132} a_3 b_2 = a_2 b_3 − a_3 b_2 ,        (2.30)

which indeed equals the first component of the vector product (2.17). Analogously, it can be verified that the other two components match. Note that the index expression (2.29) for the vector product is much more concise than the component version (2.17).
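The index formula (2.29) can also be checked numerically by storing ε_{ijk} as a 3 × 3 × 3 array and contracting with np.einsum; the sketch below (not part of the notes, assuming NumPy) compares the result with np.cross.

    import numpy as np

    # Build epsilon_{ijk}: +1 for cyclic, -1 for anti-cyclic permutations of (0, 1, 2), 0 otherwise.
    eps = np.zeros((3, 3, 3))
    for i, j, k in [(0, 1, 2), (1, 2, 0), (2, 0, 1)]:
        eps[i, j, k] = 1
        eps[i, k, j] = -1

    a = np.array([2.0, 4.0, 3.0])
    b = np.array([-2.0, 1.0, 5.0])

    # (a x b)_i = epsilon_{ijk} a_j b_k, Eq. (2.29), with the sums carried out by einsum.
    print(np.einsum('ijk,j,k->i', eps, a, b))    # [ 17. -16.  10.]
    print(np.cross(a, b))                        # the same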

Example 2.4: Vector products in physics


Vector products make a frequent appearance in physics. Here are a few basic examples:
(a) In mechanics the angular momentum of a mass m at position r and with velocity ṙ is given by
L = mr × ṙ.
(b) The force a magnetic field B exerts on a particle with charge q and velocity ṙ, the so-called Lorentz
force, is given by F = q ṙ × B.
(c) The velocity of a point with coordinate r in a rotating coordinate system with angular velocity ω is
given by v = ω × r.

The vector product satisfies the following useful properties.

(a) a × b = −b × a (2.31)
(b) a × (b + c) = a × b + a × c (2.32)
(c) a × (βb) = βa × b (2.33)
(d) e1 × e2 = e3 , e2 × e3 = e1 , e3 × e1 = e2 (2.34)
(e) a × a = 0        (2.35)
(f) a × (b × c) = (a · c)b − (a · b)c        (2.36)
(g) (a × b) · (c × d) = (a · c)(b · d) − (a · d)(b · c)        (2.37)
(h) |a × b|^2 = |a|^2 |b|^2 − (a · b)^2        (2.38)

Property (a) means that the vector product is anti-symmetric. Properties (b) and (c) imply linearity in
the second argument (vector addition and scalar multiplication can be “pulled through”) and, from anti-
symmetry, linearity also holds in the first argument. Property (g) is sometimes referred to as Lagrange
identity. The above relations can be verified by writing out all the vectors explicitly and using the
definitions (2.2) and (2.17) of the dot and cross products. However, for some of the identities this leads to
rather tedious calculations. It is much more economical to use index notation and express dot and cross
product via Eqs. (2.28) and (2.29). The proofs are then as follows:

(a) (a × b)i = εijk aj bk = −εikj bk aj = −(b × a)i

(b) (a × (b + c))i = εijk aj (bk + ck ) = εijk aj bk + εijk aj ck = (a × b + a × c)i

(c) (a × (βb))i = εijk aj βbk = βεijk aj bk = β(a × b)i

(d) By explicit computation using the definition (2.17) with the standard unit vectors.

(e) (a × a)i = εijk aj ak = 0 (from property (2.27) of εijk )

(f) (a × (b × c))i = εijk aj (b × c)k = εijk εkmn aj bm cn = εkij εkmn aj bm cn = (δim δjn − δin δjm )aj bm cn
    = aj cj bi − aj bj ci = (a · c)bi − (a · b)ci = ((a · c)b − (a · b)c)i , using (2.24)

(g) (a × b) · (c × d) = εijk εimn aj bk cm dn = (δjm δkn − δjn δkm )aj bk cm dn = (a · c)(b · d) − (a · d)(b · c) , using (2.24)

(h) Set c = a and d = b in property (g).

Note how expressions in vector notation are converted into index notation in these proofs by working from
the “outside in”. For example, for property (f) we have first written the outer cross product between a
and b × c in index notation and then, in the second step, we have converted b × c. Once fully in index
notation, the order of all objects can be exchanged at will - after all they are just numbers. In the proofs
of (f) and (g) the Kronecker delta acts as an index replacer, as explained below Eq. (2.19).
It is worth pointing out that the cross product can also be defined axiomatically by postulating a
product × : R3 × R3 → R3 with the properties (a) – (d) in (2.31)–(2.34). In other words, the cross product
can be defined as an anti-symmetric, bi-linear operation, mapping two three-dimensional vectors into a
three-dimensional vector, which acts in a simple, cyclic way (that is, as in property (d)) on the standard
unit vectors. It is easy to see that the vector product is indeed completely determined by these properties.
Write two vectors a, b ∈ R3 as linear combinations of the standard unit vectors, that is, a = Σi ai ei and
b = Σj bj ej , and work out their cross product using only the rules (a) – (d) (as well as the rule (e) which
follows directly from (a)). This leads to
a × b = (Σi ai ei ) × (Σj bj ej )
      = Σi,j ai bj ei × ej            [using (2.32),(2.33)]
      = Σi≠j ai bj ei × ej            [using (2.35)]                                          (2.39)
      = (a2 b3 − a3 b2 )e1 + (a3 b1 − a1 b3 )e2 + (a1 b2 − a2 b1 )e3 ,   [using (2.31),(2.34)]   (2.40)

which coincides with our original definition (2.17).

Application: Kinetic energy of a rotating rigid body


An example which illustrates some of the above identities and techniques in the context of classical
mechanics is the kinetic energy of a rigid rotating body. Consider a rigid body, as depicted in Fig. 6,
which we would like to think of as consisting of (a possibly large number of) mass points, labeled by an
index α, each with mass mα , position vector rα and velocity vα . The total kinetic energy of this body is of
course obtained by summing over the kinetic energy of all mass points, that is, Ekin = (1/2) Σα mα vα² . From
Example (2.4)(c) we know that the velocity of each mass point is related to its position by vα = ω × rα ,
where ω is the angular velocity of the body. The kinetic energy of the rotating body can then be written
as
Ekin = (1/2) Σα mα vα² = (1/2) Σα mα | ω × rα |² = (1/2) Σα mα ( | ω |² | rα |² − (ω · rα )² )
     = (1/2) Σα mα ( ωi ωj δij | rα |² − ωi rαi ωj rαj ) = (1/2) ωi [ Σα mα ( | rα |² δij − rαi rαj ) ] ωj ,

where (2.38) has been used in the second step.

Figure 6: A rotating rigid body

The object in the square bracket, denoted by Iij , is called the moment of inertia tensor of the rigid
body. It is obviously symmetric, Iij = Iji , so we can think of it as forming a symmetric matrix, and
it is a characteristic quantity of the rigid body. We can think of it as playing a role in rotational motion
analogous to that of regular mass in linear motion. Correspondingly, the total kinetic energy of the rigid
body can be written as
Ekin = (1/2) Σi,j Iij ωi ωj .   (2.41)
This relation is of fundamental importance for the mechanics of rigid bodies, in particular the motion of
tops, and we will return to it later.
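For readers who like to experiment, the construction of Iij can be reproduced numerically. The following sketch assumes NumPy and uses an arbitrary, made-up set of point masses and an arbitrary angular velocity; it builds the moment of inertia tensor as in the derivation above and checks Eq. (2.41) against the direct sum of kinetic energies.

import numpy as np

masses = np.array([1.0, 2.0, 0.5])                 # m_alpha (illustrative values)
positions = np.array([[1.0, 0.0, 0.0],             # r_alpha (illustrative values)
                      [0.0, 1.0, 1.0],
                      [2.0, -1.0, 0.5]])
omega = np.array([0.3, -0.2, 1.0])                 # angular velocity

# I_ij = sum_alpha m_alpha ( |r_alpha|^2 delta_ij - r_alpha,i r_alpha,j )
I = sum(m * (np.dot(r, r) * np.eye(3) - np.outer(r, r))
        for m, r in zip(masses, positions))

E_tensor = 0.5 * omega @ I @ omega                 # Eq. (2.41)
E_direct = 0.5 * sum(m * np.dot(np.cross(omega, r), np.cross(omega, r))
                     for m, r in zip(masses, positions))

print(np.isclose(E_tensor, E_direct))              # True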

Dot and cross product can be combined to a third product with three vector arguments, the triple product,
which is defined as
⟨a, b, c⟩ := a · (b × c) .   (2.42)
It has the following properties
(a) ⟨a, b, c⟩ = εijk ai bj ck = a1 b2 c3 + a2 b3 c1 + a3 b1 c2 − a1 b3 c2 − a2 b1 c3 − a3 b2 c1   (2.43)
(b) It is linear in each argument, e.g. ⟨αa + βb, c, d⟩ = α⟨a, c, d⟩ + β⟨b, c, d⟩   (2.44)
(c) It is unchanged under cyclic permutations, e.g. ⟨a, b, c⟩ = ⟨b, c, a⟩   (2.45)
(d) It changes sign for anti-cyclic permutations, e.g. ⟨a, b, c⟩ = −⟨a, c, b⟩   (2.46)
(e) It vanishes if any two arguments are the same, e.g. ⟨a, b, b⟩ = 0   (2.47)
(f) The triple product of the three standard unit vectors is one, that is ⟨e1 , e2 , e3 ⟩ = 1   (2.48)
Property (a) follows easily from the definitions of dot and cross products, Eqs. (2.28) and (2.29), in index
notation and the definition (2.20) of the Levi-Civita tensor. Properties (b)–(e) are a direct consequence of
(a) and the properties of the Levi-Civita tensor. Specifically, (c) and (d) follow from (2.21),(2.22) and (e)
follows from (2.27). Property (f) follows from direct calculation, using the cross product relations (2.34)
for the standard unit vectors.
Another notation for the triple product is
det(a, b, c) := ⟨a, b, c⟩ ,   (2.49)

where “det” is short for determinant. Later we will introduce the determinant in general and for arbitrary
dimensions and we will see that, in three dimensions, this general definition indeed coincides with the
triple product.
Note that the six terms which appear in the explicit expression (2.43) for the triple product correspond
to the six permutations of {1, 2, 3}, where the three terms for the cyclic permutations come with a positive
sign and the other, anti-cyclic ones with a negative sign. There is a simple way to memorise these six
terms. If we arrange the three vectors a, b and c into the columns of a 3 × 3 matrix for convenience, the
six terms in the triple product correspond to the products of terms along the diagonals.

        ( a1  b1  c1 )
det     ( a2  b2  c2 )  = a1 b2 c3 + a2 b3 c1 + a3 b1 c2 − a1 b3 c2 − a2 b1 c3 − a3 b2 c1    (2.50)
        ( a3  b3  c3 )
Here north-west to south-east lines connect the entries forming the three cyclic terms which appear with
a positive sign while north-east to south-west lines connect the entries forming the anti-cyclic terms which
appear with a negative sign. Corresponding lines on the left and right edge should be identified in order
to collect all the factors.

Example 2.5: Calculating the triple product


Let us calculate the triple product ha, b, ci = a · (b × c) for the three vectors
     
−1 −2 4
a =  2  , b =  5  , c =  −6  . (2.51)
−3 1 3

One way to proceed is to first work out the cross product between b and c, that is
     
−2 4 21
b × c =  5  ×  −6  =  10  . (2.52)
1 3 −8

and then dot the result with a, so


   
−1 21
a · (b × c) =  2  ·  10  = 23 . (2.53)
−3 −8

Alternatively and equivalently, we can use the rule (2.50) which gives
                        ( −1  −2   4 )
det(a, b, c) = det      (  2   5  −6 )                                                       (2.54)
                        ( −3   1   3 )
             = (−1) · 5 · 3 + (−2) · (−6) · (−3) + 4 · 2 · 1 − (−1) · (−6) · 1 − (−2) · 2 · 3 − 4 · 5 · (−3)
             = −15 − 36 + 8 − 6 + 12 + 60 = 23 .
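The same number can be obtained with a few lines of code. The sketch below (assuming NumPy, which is not part of these notes) evaluates the triple product of Example 2.5 once as a · (b × c) and once via the determinant of the matrix with columns a, b, c, as in (2.50).

import numpy as np

a = np.array([-1.0, 2.0, -3.0])
b = np.array([-2.0, 5.0, 1.0])
c = np.array([4.0, -6.0, 3.0])

triple = np.dot(a, np.cross(b, c))                  # <a, b, c> = a . (b x c)
det = np.linalg.det(np.column_stack([a, b, c]))     # columns a, b, c as in (2.50)

print(triple, det)                                  # both equal 23 (det up to rounding)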

Having introduced all the general definitions and properties we should now discuss the geometrical inter-
pretations of the cross and triple products.

Geometrical interpretation of cross product:
Property (2.47) of the triple product implies that the cross product a × b is perpendicular to both vectors
a and b. For the length of a cross product it follows
| a × b | = ( | a |² | b |² − (a · b)² )^(1/2) = | a || b | ( 1 − (a · b)²/(| a |² | b |²) )^(1/2) = | a | · | b | sin ∠(a, b) ,   (2.55)

where (2.38) has been used in the first step and 1 − (a · b)²/(| a |² | b |²) = 1 − cos²(∠(a, b)) = sin²(∠(a, b)).

Figure 7: Shear, leaves the area unchanged

From this result and Fig. 7 the length, |a × b|, of the cross product is equal to the area of the rectangle
indicated and, as this area is left invariant by a shear, it equals the area of the parallelogram defined
by the vectors a and b. In summary, we therefore see that the vector product a × b defines a vector
perpendicular to a and b whose length equals the area of the parallelogram defined by a and b.
This geometrical interpretation suggests a number of applications for the cross product. In particular,
it can be used to find a vector which is orthogonal to two given vectors and to calculate the area of the
parallelogram (and the triangle) defined by two vectors.

Example 2.6: Applications of the cross product


Consider the two vectors   
1 3
a =  −2  , b= 0  (2.56)
0 −1
with cross product      
1 3 2
c := a × b =  −2  ×  0  =  1  (2.57)
0 −1 6
It is easy to verify that c is indeed orthogonal to a and b, that is, c · a = c · b = 0. The area of the
parallelogram defined by a and b is given by

|a × b| = √41 ,   (2.58)

while the area of the triangle defined by a and b is half the area of the parallelogram, that is √41/2 for
the example.
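A short numerical counterpart of Example 2.6 (assuming NumPy) confirms the two uses of the cross product just mentioned: orthogonality to both factors and the area of the parallelogram and triangle.

import numpy as np

a = np.array([1.0, -2.0, 0.0])
b = np.array([3.0, 0.0, -1.0])

c = np.cross(a, b)
print(c)                              # [2. 1. 6.]
print(np.dot(c, a), np.dot(c, b))     # 0.0 0.0, so c is orthogonal to a and b

area_parallelogram = np.linalg.norm(c)        # sqrt(41)
area_triangle = area_parallelogram / 2
print(area_parallelogram, area_triangle)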

Geometrical interpretation of triple product


We first note that the triple product of three standard unit vectors is ⟨e1 , e2 , e3 ⟩ = e1 · (e2 × e3 ) =
e1 · e1 = 1 which equals the volume of the unit cube. For three arbitrary vectors αe1 , βe2 , γe3 in the
directions of the coordinate axes we find, from linearity (2.44) of the triple product, that ⟨αe1 , βe2 , γe3 ⟩ =
αβγ⟨e1 , e2 , e3 ⟩ = αβγ which equals the volume of the cuboid with lengths α, β, γ. Suppose this cuboid
is sheared to a parallelepiped. As an example let us consider a shear in the direction of e3 , leading to a
parallelepiped defined by the vectors a = αe1 + δe3 , b = βe2 and c = γe3 . Then, by linearity of the triple
product and (2.47) we have ⟨a, b, c⟩ = αβγ⟨e1 , e2 , e3 ⟩ + δβγ⟨e3 , e2 , e3 ⟩ = αβγ, so the triple product is the
same for the cuboid and the parallelepiped related by a shear. It is clear that this remains true for general
shears. Since shears are known to leave the volume unchanged we conclude that the (absolute) value of
the triple product, |⟨a, b, c⟩|, for three arbitrary vectors a, b, c equals the volume of the parallelepiped
defined by these vectors.
This geometrical interpretation suggests that the triple product of three linearly dependent vectors
should be zero. Indeed, three linearly dependent vectors all lie in one plane and form a degenerate
parallelepiped with volume zero. To be sure let us properly formulate and prove this assertion.
Claim 2.1. ⟨a, b, c⟩ ≠ 0 =⇒ a, b, c are linearly independent.
Proof. If a, b, c are linearly dependent then one of the vectors can be written as a linear combination of
the others, for example, a = βb + γc. From linearity of the triple product and (2.47) it then follows that

⟨a, b, c⟩ = ⟨βb + γc, b, c⟩ = β⟨b, b, c⟩ + γ⟨c, b, c⟩ = 0

We will later generalize this statement to arbitrary dimensions, using the determinant, and also show
that its converse holds. For the time being we note that we have obtained a useful practical way of
checking if three vectors in three dimensions are linearly independent and, hence, form a basis. In short,
if ⟨a, b, c⟩ ≠ 0 then a, b, c form a basis of R3 .

Example 2.7: Applications of the triple product


In Example 1.7 we have seen that the three vectors
     
0 0 1
v1 =  1  , v2 =  1  , v3 =  1  (2.59)
1 2 −1

form a basis of R3 . We can now check this independently by computing the triple product of these vectors
and using Claim 2.1. For the triple product we find
   
0 −3
det(v1 , v2 , v3 ) = v1 · (v2 × v3 ) =  1  ·  2  = 1 . (2.60)
1 −1

Since this result is non-zero we conclude from Claim 2.1 that the three vectors are indeed linearly inde-
pendent and, hence, form a basis of R3 . Moreover, we have learned that the volume of the parallelepiped
defined by v1 , v2 and v3 equals 1.

2.3 Some geometry, lines and planes


The methods developed in this section can be applied to a wide range of geometrical problems in three
dimensions, involving objects such as lines and planes. These applications are in no way central to linear

algebra and are indeed part of another, related mathematical discipline called affine geometry. However,
given their importance in physics we will briefly discuss some of these applications.

2.3.1 Affine space


The “arena” of affine geometry is affine space An defined by

An = Rn = { P = (p1 , . . . , pn )T | pi ∈ R } .
As a set this is the same as Rn , however, An is simply considered as a space of points without a vector
space structure. Vectors v ∈ Rn can act on points P in the affine space An by a translation defined as

P 7→ P + v = (p1 + v1 , . . . , pn + vn )T .
The unique vector¹ translating P = (p1 , . . . , pn )T ∈ An to Q = (q1 , . . . , qn )T ∈ An is denoted by
\overrightarrow{PQ} := (q1 − p1 , . . . , qn − pn )T ∈ Rn . It is easy to verify that

\overrightarrow{PQ} + \overrightarrow{QR} = \overrightarrow{PR} ,
a property which is also intuitively apparent from Fig. 8. The distance d(P, Q) between points P and Q

Figure 8: Addition of translation vectors


is defined as the length of the corresponding translation vector \overrightarrow{PQ}, so

d(P, Q) := |\overrightarrow{PQ}| .
We will frequently blur the distinction between the affine space An and the vector space Rn and identify
a point P ∈ An with the vector \overrightarrow{OP} ∈ Rn translating the origin O to P .

1
For ease of notation we will sometimes write a column v with components v1 , . . . , vn as v = (v1 , . . . , vn )T . The T super-
script, short for “transpose”, indicates conversion of a row into a column. We will discuss transposition more systematically
shortly.

2.3.2 Lines in R2
We begin by discussing lines in R2 . In Cartesian form they can be described as all points (x, y) ∈ R2
which satisfy the linear equation
y = ax + b (2.61)
where a, b are fixed real numbers. Alternatively, a line can be described in parametric vector form as all
vectors r(t) given by

r(t) = (x(t), y(t))T = p + tq ,   t ∈ R .   (2.62)
Here, p = (px , py )T and q = (qx , qy )T are fixed vectors and t is a parameter. The geometric interpretation
of these various objects is apparent from Fig. 9.

Figure 9: Lines in two dimensions

It is sometimes required to convert between those two descriptions of a line. To get from the Cartesian
to the vector form simply use x as the parameter so that x(t) = t and y(t) = at + b. Combining these two
equations into a vector equation gives
     
x(t) 0 1
r(t) = = +t (2.63)
y(t) b a
| {z } | {z }
p q

where the vectors p and q are identified as indicated. For the opposite direction, to get from the vector
to the Cartesian form, simply solve the two components of (2.62) for t so that
t = (x − px )/qx = (y − py )/qy   =⇒   y = (qy /qx ) x + py − (qy /qx ) px   (2.64)

and a and b are identified as a = qy /qx and b = py − (qy /qx ) px .

Example 2.8: Conversion between Cartesian and vector form for two-dimensional lines
(a) Start with a line y = 2x − 3 in Cartesian form. Setting x(t) = t and y(t) = 2t − 3 the vector form of
this line is given by        
x(t) t 0 1
r(t) = = = +t . (2.65)
y(t) 2t − 3 −3 2

(b) Conversely, the line in vector form given by
   
2 1
r(t) = +t (2.66)
−1 2

can be split up into the two components x = 2 + t and y = −1 + 2t. Hence, t = x − 2 and inserting this
into the equation for y gives the Cartesian form y = 2x − 5 of the line.

Finally, a common problem is to find the intersection of two lines given by r1 (t1 ) = p1 + t1 q1 and
r2 (t2 ) = p2 + t2 q2 . Setting r1 (t1 ) = r2 (t2 ) leads to

t1 q1 − t2 q2 = p2 − p1 . (2.67)

If q1 , q2 are linearly independent then they form a basis of R2 and, in this case, we know from Claim 1.2
that there is a unique solution t1 , t2 for this equation. The intersection point is obtained by computing
r1 (t1 ) or r2 (t2 ) for these values. If q1 , q2 are linearly dependent then the lines are parallel and either there
is no intersection or the two lines are identical.

Example 2.9: Intersection of two lines in two dimensions


We would like to determine the intersection point of the two lines
       
1 −1 3 1
r1 (t1 ) = + t1 , r2 (t2 ) = + t2 . (2.68)
2 1 0 2

Setting r1 (t1 ) = r2 (t2 ) leads to

t1 (−1, 1)T − t2 (1, 2)T = (2, −2)T   ⇐⇒   −t1 − t2 = 2 ,   t1 − 2t2 = −2 .   (2.69)

The unique solution is t1 = −2 and t2 = 0, which are the parameter values of the intersection point. To
obtain the intersection point risec itself insert these parameter values into the equations for the lines which
gives

risec = r1 (−2) = r2 (0) = (3, 0)T .   (2.70)
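Equation (2.67) is just a 2 × 2 linear system for (t1 , t2 ) and can be solved numerically. The sketch below (assuming NumPy) repeats Example 2.9; the columns of the coefficient matrix are q1 and −q2 .

import numpy as np

p1, q1 = np.array([1.0, 2.0]), np.array([-1.0, 1.0])
p2, q2 = np.array([3.0, 0.0]), np.array([1.0, 2.0])

M = np.column_stack([q1, -q2])        # 2x2 matrix acting on (t1, t2)
t1, t2 = np.linalg.solve(M, p2 - p1)  # solve Eq. (2.67)

print(t1, t2)                         # -2.0 0.0
print(p1 + t1 * q1)                   # [3. 0.], the intersection point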

2.3.3 Lines in R3
The vector form for 2-dimensional lines (2.62) can be easily generalized to three dimensions:
 
x(t)
r(t) =  y(t)  = p + tq . (2.71)
z(t)

Here p = (px , py , pz )T and q = (qx , qy , qz )T are fixed vectors. As before, we can get to the Cartesian form
by solving the components of Eq. (2.71) for t resulting in
t = (x − px )/qx = (y − py )/qy = (z − pz )/qz .   (2.72)

Note that this amounts to two equations between the three coordinates x, y, z as should be expected for
the definition of a one-dimensional object (the line) in three dimensions. The geometrical interpretation
of the various vectors is indicated in Fig. 10.

Example 2.10: Conversion between Cartesian and vector form for lines in three dimensions
(a) We would like to convert the line in vector form given by
     
x(t) 2 3
r(t) =  y(t)  =  −1  + t  −5  (2.73)
z(t) 4 2

into Cartesian form. Solving the three components for t immediately leads to the Cartesian form
t = (x − 2)/3 = −(y + 1)/5 = (z − 4)/2 .   (2.74)

(b) Conversely, given the Cartesian form


t = (x − 7)/2 = y + 1 = (z + 3)/8   (2.75)
we can solve for x, y and z in terms of t which leads to x = 7 + 2t, y = −1 + t, z = −3 + 8t. Combining
these three equations into one vector equation gives the vector form of the line
     
x(t) 7 2
r(t) =  y(t)  =  −1  + t  1  . (2.76)
z(t) −3 8

Figure 10: Lines in three dimensions

For the minimum distance of a line from a given point we have the following statement.

Claim 2.2. The minimum distance of a line r(t) = p + tq from a point p0 arises at tmin = −(d · q)/| q |² , where
d = p − p0 . The minimal distance is given by dmin = |d × q|/|q|.

Proof. We simply work out the distance squared d²(t) := |r(t) − p0 |² of an arbitrary point r(t) on the line
from the point p0 which leads to

d²(t) = | d + tq |² = | q |² t² + 2(d · q) t + | d |² = | q |² [ t + (d · q)/| q |² ]² + | d |² − (d · q)²/| q |² .   (2.77)

This is minimal when the expression inside the square bracket vanishes which happens for t = tmin = −(d · q)/| q |².
This proves the first part of the claim. For the second part we simply compute

d²min := d²(tmin ) = ( | d |² | q |² − (d · q)² ) / | q |² = | d × q |² / | q |² ,   (2.78)

where (2.38) has been used in the last step.

Example 2.11: Minimal distance of a line from a point


We would like to find the minimal distance of the line r(t) = p + tq from the point p0 , where
     
2 3 1
p=  −1  , q=  −5  , p0 =  1  . (2.79)
4 2 1

Using the notation from Claim 2.2 we have


       
1 1 3 11
d = p − p0 =  −2  , d × q =  −2  ×  −5  =  7  . (2.80)
3 3 2 1
Hence, |d × q| = 3√19 and |q| = √38 so that

dmin = |d × q| / |q| = 3/√2 .   (2.81)
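The formulae of Claim 2.2 are easily wrapped into a small helper function. The following sketch assumes NumPy and reuses the data of Example 2.11; the function name is of course an arbitrary choice.

import numpy as np

def min_distance_point_line(p, q, p0):
    """Return (t_min, d_min) for the line r(t) = p + t q and the point p0, as in Claim 2.2."""
    d = p - p0
    t_min = -np.dot(d, q) / np.dot(q, q)
    d_min = np.linalg.norm(np.cross(d, q)) / np.linalg.norm(q)
    return t_min, d_min

p = np.array([2.0, -1.0, 4.0])
q = np.array([3.0, -5.0, 2.0])
p0 = np.array([1.0, 1.0, 1.0])

t_min, d_min = min_distance_point_line(p, q, p0)
print(t_min, d_min, 3 / np.sqrt(2))   # d_min and 3/sqrt(2) agree (about 2.12)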

2.3.4 Planes in R3
To obtain the vector form of a plane in three dimensions we can generalize Eq. (2.71), the vector form of
a 3-dimensional line, by introducing two parameters, t1 and t2 and define the plane as all points r(t1 , t2 )
given by  
x(t1 , t2 )
r(t1 , t2 ) =  y(t1 , t2 )  = p + t1 q + t2 s , t1 , t2 ∈ R , (2.82)
z(t1 , t2 )
where p, q and s are fixed vectors in R3 . Of course, for this to really define a plane (rather than a line)
the vectors q and s must be linearly independent. A unit normal vector to this plane is given by
n = (q × s) / | q × s | .   (2.83)

Multiplying the vector form (2.82) by n = (nx , ny , nz )T (and remembering that n · q = n · s = 0) we get
to the Cartesian form of a three-dimensional plane

n·r=d or nx x + ny y + nz z = d , (2.84)

where d = n · p is a constant. From Eq. (2.11) we can re-write the Cartesian form as cos(θ)|r| = d,
where θ = ∠(n, r) is the angle between n and r. The distance |r| of the plane from the origin is minimal
if cos(θ) = ±1 which shows that the constant d (or rather its absolute value |d|) should be interpreted
as the minimal distance of the plane from the origin. The geometrical meaning of the various objects is
indicated in Fig. 11.

Figure 11: Cartesian and vector descriptions of a plane

Finally, to convert a plane in Cartesian form (2.84) into vector form we must first find a vector p with
p · n = d (a vector “to the plane”) and then two linearly independent vectors q, s satisfying
q · n = s · n = 0 (two vectors “in the plane”). These are then three suitable vectors to write
down the vector form (2.82).

Example 2.12: Conversion between vector form and Cartesian form for a plane in three dimensions
(a) Start with a plane r(t1 , t2 ) = p + t1 q + t2 s in vector form, where
     
3 −1 2
p= 2  , q= 0  , s= 1  . (2.85)
0 1 −3

To convert to Cartesian form, we first work out a normal vector, N, to the plane, given by
     
−1 2 1
N=q×s=  0  ×  1  =− 1 
 (2.86)
1 −3 1

Then, the Cartesian form reads N · r = N · p and with N · p = −5 this leads to x + y + z = 5.


(b) Conversely, start with the plane 2x − 3y + z = 4 in Cartesian form with normal vector N = (2, −3, 1)T .
We need to find two linearly independent vectors “in the plane”, that is, vectors q and s satisfying
N · q = N · s = 0. Obvious simple choices are q = (1, 1, 1)T and s = (1, 0, −2)T . Further, we need a vector
“to the plane”, that is, a vector p satisfying N · p = 4 and we can choose, for example, p = (2, 0, 0)T .
Combining these results the vector form of the plane reads
     
2 1 1
r(t1 , t2 ) = p + t1 q + t2 s =  0  + t1  1  + t2  0  . (2.87)
0 1 −2
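The conversion of Example 2.12(a) can also be reproduced in a few lines (assuming NumPy): the normal is N = q × s and the constant on the right-hand side of the Cartesian form is N · p.

import numpy as np

p = np.array([3.0, 2.0, 0.0])
q = np.array([-1.0, 0.0, 1.0])
s = np.array([2.0, 1.0, -3.0])

N = np.cross(q, s)
d = np.dot(N, p)
print(N, d)    # [-1. -1. -1.] -5.0, i.e. -x - y - z = -5, equivalent to x + y + z = 5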

2.3.5 Intersection of line and plane


To study the intersection of a line and a plane first write down vector equations for each, so

rL (t) = a + tb , rP (t1 , t2 ) = p + t1 q + t2 s . (2.88)

Equating these two, that is rL (t) = rP (t1 , t2 ), leads to

t1 q + t2 s − tb = a − p . (2.89)

Let us assume that the triple product ⟨b, q, s⟩ is non-zero so that, from Claim 2.1, the vectors b, q, s
form a basis. In this case, Eq. (2.89) has a unique solution for t1 , t2 , t which corresponds to the parameter
values of the intersection point. This solution can be found by splitting (2.89) up into its three component
equations and explicitly solving for t1 , t2 , t. Perhaps a more elegant way to proceed is to multiply (2.89)
by (q × s), so that the terms proportional to t1 and t2 drop out. The resulting equation can easily be
solved for t which leads to
tisec = ⟨p − a, q, s⟩ / ⟨b, q, s⟩ .   (2.90)
This is the value of the line parameter t at the intersection point and we obtain the intersection point
itself by evaluating rL (tisec ).

Example 2.13: Intersection of line and plane in three dimensions


Consider a line rL (t) = a + tb and a plane rP (t1 , t2 ) = p + t1 q + t2 s in vector form with
         
1 1 2 1 1
a =  0  , b =  −1  , p =  3  , q =  0  , s =  1  . (2.91)
1 −1 1 1 2

By straightforward calculation we have ⟨p − a, q, s⟩ = −4 and ⟨b, q, s⟩ = −1 so that, from Eq. (2.90), the
value of the line parameter t at the intersection point is tisec = 4. The intersection point, risec , is then
obtained by evaluating the equation for the line at t = tisec = 4, so
 
risec = rL (4) = (5, −4, −3)T .   (2.92)
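Eq. (2.90) can be evaluated mechanically; the sketch below (assuming NumPy) redoes Example 2.13, with the triple product written as a small helper function.

import numpy as np

def triple(u, v, w):
    return np.dot(u, np.cross(v, w))      # <u, v, w>

a = np.array([1.0, 0.0, 1.0])
b = np.array([1.0, -1.0, -1.0])
p = np.array([2.0, 3.0, 1.0])
q = np.array([1.0, 0.0, 1.0])
s = np.array([1.0, 1.0, 2.0])

t_isec = triple(p - a, q, s) / triple(b, q, s)   # Eq. (2.90)
print(t_isec)                                    # 4.0
print(a + t_isec * b)                            # [ 5. -4. -3.], the intersection point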

Figure 12: Sphere in three dimensions

2.3.6 Minimal distance of two lines


Two lines in three dimensions do not generically intersect but we can still ask about their minimal distance.
We begin with the two lines
ri (ti ) = pi + ti qi , where i = 1, 2 (2.93)
in vector form. One way to proceed would be in analogy with the proof of Claim 2.2, that is, by finding the
values of t1 , t2 for which the distance |r1 (t1 ) − r2 (t2 )| is minimal. However, this requires minimization of
a function of two variables, a technique you may not yet be familiar with. Alternatively we can introduce
the unit vector
n = (q1 × q2 ) / | q1 × q2 |   (2.94)
which is evidently perpendicular to both lines and start with the intuitive assertion that the vector of
minimal length between the two lines is in the direction of n. This leads to the relation
p1 + t1 q1 − p2 − t2 q2 = ±dmin n , (2.95)
where t1 , t2 are the parameter values of the points of minimal distance on the two lines and the sign on
the RHS should be chosen so that dmin ≥ 0 (since we would like a distance to be positive). By multiplying
the last equation with n it then follows easily that
dmin = |(p1 − p2 ) · n| (2.96)

Example 2.14: Minimal distance of two lines in three dimensions


We begin with two lines r1 (t1 ) = p1 + t1 q1 and r2 (t2 ) = p2 + t2 q2 in vector form, where
       
1 0 0 2
p1 =  0  , q 1 =  1  , p2 =  3  , q 2 =  0  . (2.97)
2 1 0 −1
The vector (2.94) normal to both lines is then given by

q1 × q2 = (−1, 2, −2)T ,   n = (q1 × q2 ) / |q1 × q2 | = (1/3) (−1, 2, −2)T .   (2.98)

From Eq. (2.96) this gives the minimal distance dmin = |(p1 − p2 ) · n| = 11/3 between the two lines.
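Again the computation is short enough to check by machine. The following sketch (assuming NumPy) evaluates Eq. (2.96) for the two lines of Example 2.14.

import numpy as np

p1, q1 = np.array([1.0, 0.0, 2.0]), np.array([0.0, 1.0, 1.0])
p2, q2 = np.array([0.0, 3.0, 0.0]), np.array([2.0, 0.0, -1.0])

n = np.cross(q1, q2)
n = n / np.linalg.norm(n)             # unit normal of Eq. (2.94)

d_min = abs(np.dot(p1 - p2, n))       # Eq. (2.96)
print(d_min, 11 / 3)                  # both approximately 3.6667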

2.3.7 Spheres in R3
A sphere in R3 with radius ρ and center p = (a, b, c)T consists of all points r = (x, y, z)T with |r − p| = ρ.
Written out in coordinates this reads explicitly

(x − a)2 + (y − b)2 + (z − c)2 = ρ2 (2.99)

In particular, setting a = b = c = 0, a sphere around the origin is described by x² + y² + z² = ρ² . The LHS
of this equation is a particular example of a quadratic form, a polynomial which consists of terms quadratic
in the variables. We will study quadratic forms in more detail later.

Application: The perceptron - a basic building block of artificial neural networks


Artificial neural networks constitute an important set of methods in modern computing which are moti-
vated by the structure of the human brain. Many of the operating principles of artificial neural networks
can be formulated and understood in terms of linear algebra. Here, we would like to introduce one of the
basic building blocks of artificial neural networks - the perceptron.
The structure of the perceptron is schematically illustrated in Fig. 13. It receives n real input values


Figure 13: Schematic representation of the perceptron

x1 , . . . , xn which can be combined into an n-dimensional input vector x = (x1 , . . . , xn )T . The internal state
of the perceptron is determined by three pieces of data: the real values w1 , . . . , wn , called the “weights”,
which can be arranged into the n-dimensional weight vector w = (w1 , . . . , wn )T , a real number θ, called
the “threshold” of the perceptron and a real function f , referred to as the “activation function”. In terms
of this data, the perceptron computes the output values y from the input values x as

z =w·x−θ , y = f (z) . (2.100)

The activation function can, for example, be chosen as the sign function

f (z) = { +1 for z ≥ 0 ;  −1 for z < 0 } .   (2.101)
Given this set-up, the functioning of the perceptron can be phrased in geometrical terms. Consider the
equation in Rn with coordinates x given by

w·x=θ . (2.102)

Note that this is simply the equation of a plane (or a hyper-plane in dimensions n > 3) in Cartesian form.
If a point x ∈ Rn is “above” (or on) this plane, so that w · x − θ ≥ 0, then, from Eqs. (2.100) and (2.101),
the output of the perceptron is +1. On the other hand, for a point x ∈ Rn below this plane, so that
w · x − θ < 0, the perceptron’s output is −1. In other words, the purpose of the perceptron is to “decide”
whether a certain point x is above or below a given plane.
So far this does not seem to hold much interest - all we have done is to re-formulate a sequence of
simple mathematical operations related to the Cartesian form a plane, in a different language. The point
is that the internal state of the perceptron, that is the choice of a plane specified by the weight vector w
and the threshold θ, is not inserted “by hand” but rather determined by a learning process. This works
as follows. Imagine a certain quantity, y, rapidly changes from −1 to +1 across a certain (hyper-) plane
in Rn whose location is not a priori known. Let us perform m measurements of y, giving measured values
y (1) , . . . , y (m) ∈ {−1, +1} at locations x(1) , . . . , x(m) ∈ Rn . These measurements can then be used to train
the perceptron. Starting from random values w(1) and θ(1) of the weight vector and the threshold we can
iteratively improve those values by carrying out the operations

w(a+1) = w(a) + λ(y (a) − y) x(a) , θ(a+1) = θ(a) − λ(y (a) − y) . (2.103)

Here, y is the output value produced by the perceptron given the input vector x(a) and λ is a real
value, typically chosen in the interval [0, 1], called the learning rate of the perceptron. Evidently, if the
value y produced by the perceptron differs from the true, measured value y (a) , the weight vector and the
threshold of the perceptron are adjusted according to Eqs. (2.103). This training process continues until
all measurements are used up and the final values w = w(m+1) and θ = θ(m+1) have been obtained. In
this state the perceptron can then be used to “predict” the value of y for new input vectors x. Essentially,
the perceptron has “learned” about the location of the plane via the training process and is now able to
decide whether a given point is located above or below.
In the context of artificial neural networks the perceptron corresponds to a single neuron. Proper arti-
ficial neural networks can be constructed by combining a number of perceptrons into a network, using the
output of certain perceptrons within the network as input for others. Such networks of perceptrons corre-
spond to collections of (hyper-) planes and are, for example, applied in the context of pattern recognition.
The details of this are beyond the present scope but are not too difficult to understand by generalising
the above discussion for a single perceptron.
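To make the training process (2.103) concrete, here is a minimal sketch of a perceptron in code. It assumes NumPy; the "true" plane, the synthetic data set, the learning rate and all variable names are illustrative choices and not part of the original discussion.

import numpy as np

rng = np.random.default_rng(0)

# A "true" plane w_true . x = theta_true, used only to generate labelled data.
w_true, theta_true = np.array([1.0, -2.0]), 0.5
X = rng.uniform(-1.0, 1.0, size=(200, 2))                 # measurement points x^(a)
Y = np.where(X @ w_true - theta_true >= 0, 1.0, -1.0)     # measured values y^(a)

def f(z):                                                  # activation function (2.101)
    return 1.0 if z >= 0 else -1.0

# Training: start from random w, theta and apply the update rule (2.103).
w = rng.uniform(-1.0, 1.0, size=2)
theta = rng.uniform(-1.0, 1.0)
lam = 0.1                                                  # learning rate

for x_a, y_a in zip(X, Y):
    y = f(np.dot(w, x_a) - theta)                          # perceptron output, Eq. (2.100)
    w = w + lam * (y_a - y) * x_a
    theta = theta - lam * (y_a - y)

# After training the perceptron should reproduce most of the measured labels.
predictions = np.array([f(np.dot(w, x_a) - theta) for x_a in X])
print(np.mean(predictions == Y))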

3 Linear maps and matrices
We now return to the general story and consider arbitrary vector spaces. The next logical step is to study
maps between vector spaces which are “consistent” with the vector space structure - they are called linear
maps. As we will see, linear maps are closely related to matrices.

3.1 Linear maps


Before we consider linear maps we need to collect a few basic facts for general maps between arbitrary
sets. All these are elementary and considered part of basic mathematical literacy. They will help us to
deal with linear maps - and distinguish general properties of maps from more specific ones of linear maps
- but they are also foundational for other parts of mathematics.

3.1.1 General maps between sets and their properties


You probably have an intuitive understanding of a map between two sets but to be clear let us start with
the following definition.

Definition 3.1. A map between two sets X and Y assigns to each x ∈ X a y ∈ Y which is written as
y = f (x) and referred to as the image of x under f . In symbols,

f : X → Y, x 7→ f (x) .

The set X is called the domain of the map f , Y is called the co-domain of f . The set

Im(f ) = {f (x)|x ∈ X} ⊆ Y

is called the image of f and consists of all elements of the co-domain which can be obtained as images
under f .

Figure 14: Visual representation of a map, with domain, co-domain, and image.

So a map assigns to each element of the domain an element of the co-domain. However, note that not all
elements of the co-domain necessarily need to be obtained in this way. This is precisely what is encoded
by the image, Im(f ), the set of elements in the co-domain which can be “reached” by f . If Im(f ) = Y
then all elements of the co-domain are obtained as images, otherwise some are not. Also note that, while
each element of the domain is assigned to a unique element of the co-domain, two different elements of
the domain may well have the same image. It is useful to formalize these observations by introducing the
following definitions.

Definition 3.2. Let f : X → Y be a map between two sets X and Y . The map f is called
(i) one-to-one (or injective) if every element of the co-domain is the image of at most one element of the
domain. Equivalently, f is one-to-one iff f (x) = f (x̃) =⇒ x = x̃ for all x, x̃ ∈ X.
(ii) onto (or surjective) if every element of the co-domain is the image of at least one element of the
domain. Equivalently, f is onto iff Im(f ) = Y .
(iii) bijective, if it is injective and surjective, that is, if every element of the co-domain is the image of
precisely one element of the domain.

Figure 15: The above map is not one-to-one (injective), since f (x) = f (x̃) but x ≠ x̃.

Figure 16: The above map is not onto (surjective), since Im(f ) ≠ Y .

Example 3.1: Some simple examples of maps


(a) The map f : R → R defined by f (x) = ax, for a non-zero real constant a, is bijective.
(b) The map f : R → R defined by f (x) = x2 is neither injective (since f (x) = f (−x)) nor surjective (since
f (x) ≥ 0 always). However, it can be made surjective if we choose the co-domain to be R≥0 instead of R.
If we restrict both domain and co-domain to the positive numbers, so consider it as a map f : R≥0 → R≥0 ,
it is bijective.
(c) The sin function g(x) = sin x seen as a map g : R → R is neither injective (since g(x) = g(x + 2π))
nor surjective (since |g(x)| ≤ 1). However, restricted to g : [−π/2, π/2] → [−1, 1] it is bijective.

Two maps can be applied one after the other provided the co-domain of the first map is the same as the
domain of the second. This process is called composition of maps and is formally defined as follows.

Definition 3.3. For two maps f : X → Y and g : Y → Z the composite map g ◦ f : X → Z is defined
by
(g ◦ f )(x) := g(f (x)) .

From this definition it is easy to show that map composition is associative, that is, h ◦ (g ◦ f ) = (h ◦ g) ◦ f ,
since
(h ◦ (g ◦ f ))(x) = h((g ◦ f )(x)) = h(g(f (x))) = (h ◦ g)(f (x)) = ((h ◦ g) ◦ f )(x) . (3.1)

Example 3.2: Map composition


For a simple example of map composition consider the maps in Example (3.1) (a), (c), that is f (x) = ax
and g(x) = sin(x). Their composition is g ◦ f (x) = sin(ax) or, in the opposite order, f ◦ g(x) = a sin(x).
In other words, if the maps are given as explicit functions then the composition is obtained by “inserting
one function into the other”. The example also shows that map composition is not commutative.

A trivial but useful map is the identity map idX : X → X which maps every element in X onto itself,
that is, idX (x) = x for all x ∈ X. It is required to define the important concept of inverse map.
Definition 3.4. Given a map f : X → Y , a map g : Y → X is called an inverse of f if

(g ◦ f ) = idX and (f ◦ g) = idY .

Figure 17: (a) The map g ◦ f : X → Z is the composition of maps f : X → Y and g : Y → Z,


(b) The identity map idX : X → X, (c) The map f −1 : Y → X is the inverse map of f : X → Y .

The inverse map “undoes” the effect of the original map and in order to construct such a map we need
to assign to each y ∈ Y in the co-domain an x ∈ X in the domain such that y = f (x). If the map is not
surjective this is impossible for some y since they are not in the image of f . On the other hand, if the
map is not injective some y ∈ Y are images of more than one element in the domain so that the required
assignment is not unique. Finally, if the map is bijective every y ∈ Y is the image of precisely one x ∈ X,
so we can attempt to define the inverse by setting g(y) = x for this unique x. Altogether this suggests the
following
Theorem 3.1. The map f : X → Y has an inverse if and only if f is bijective. If the inverse exists it is
unique and denoted by f −1 : Y → X.
Proof. “⇒” We assume that f : X → Y has an inverse g : Y → X with f ◦ g = idY and g ◦ f = idX .
To show that f is injective start with f (x) = f (x̃) and apply g from the left. It follows immediately that
x = x̃. To show surjectivity of f we need to find, for a given y ∈ Y , an x ∈ X such that f (x) = y. We can
choose x = g(y) since f (x) = f ◦ g(y) = idY (y) = y. In conclusion f is bijective.
“⇐” We assume that f is bijective. Hence, for every y ∈ Y there is precisely one x ∈ X with f (x) = y.

We define the prospective inverse map by g(y) = x. Then f ◦ g(y) = f (x) = y and g ◦ f (x) = g(y) = x.
To show uniqueness we consider two maps g : Y → X and g̃ : Y → X with g ◦ f (x) = x = g̃ ◦ f (x). Setting
y = f (x) it follows that g(y) = g̃(y) and, since f is surjective this holds for all y ∈ Y . Hence, g = g̃.

If the maps f and g are both bijective, then it is easy to see that the composite map f ◦ g is also
bijective and, hence, has an inverse. This inverse of the composite map can be computed from the formula

(f ◦ g)−1 = g −1 ◦ f −1 .   (3.2)

Note the change of order on the RHS of this equation. This relation follows from (f ◦ g)−1 ◦ (f ◦ g) = id
and (g −1 ◦ f −1 ) ◦ (f ◦ g) = id which implies that both (f ◦ g)−1 and (g −1 ◦ f −1 ) provide an inverse for
f ◦ g. Uniqueness of the inverse function then leads to Eq. (3.2). Further we have
(f −1 )−1 = f (3.3)
from the uniqueness of the inverse and the fact that both f and (f −1 )−1 provide an inverse for f −1 .

Example 3.3: Inverse function


(a) In Example 3.1 (a), we have seen that the function f : R → R defined by f (x) = ax, where a ∈ R is
bijective for a 6= 0. Hence, there is an inverse function, which is explicitly given by f −1 (x) = x/a.
(b) In Example 3.1 (c), we have considered the function g(x) = sin(x) which was neither surjective nor
injective as a function g : R → R. However, with domain and co-domain restricted as g : [−π/2, π/2] →
[−1, 1] it is bijective and the inverse function is g −1 (x) = arcsin(x).

3.1.2 Linear maps: Definition and basic properties


We are now ready to discuss linear maps which are maps between two vector spaces - rather than general
sets as above - which, in addition, are consistent with the two vector space operations, that is, vector
addition and scalar multiplication. More precisely what we mean is:
Definition 3.5. A map f : V → W between two vector spaces V and W over a field F is called linear if
(L1) f (v1 + v2 ) = f (v1 ) + f (v2 )
(L2) f (αv) = αf (v)
for all v, v1 , v2 ∈ V and for all α ∈ F . Further, the set Ker(f ) = {v ∈ V | f (v) = 0} ⊂ V is called the
kernel of f .
A few remarks are in order. First, note that the two conditions for linearity are the obvious ones for
consistency with the vector space structure. Condition (L1) says that vector addition can be “pulled
through” linear maps and condition (L2) is a similar statement for scalar multiplication, that is, scalars
can be “pulled out” of linear maps. In short, linear maps are the maps between vector spaces which
are consistent with vector addition and scalar multiplication. As for any map, we can define the image
Im(f ) = {f (v) | v ∈ V } ⊂ W of the linear map f : V → W , which is a subset of the co-domain vector
space W . Since vector spaces have a special element - the zero vector 0 - we can define a further set
associated to a linear map, namely the kernel, Ker(f ). It is the subset of the domain vector space V which
consists of all vectors v ∈ V mapped to the zero vector, so f (v) = 0.
Following the standard mathematical approach we have defined linear maps by their properties rather
than by “what they are”. As we proceed we will gain some insight into the structure of linear maps and
also discuss many examples. We begin by summarizing a number of simple but important properties of
linear maps which follow from the above definition.

Lemma 3.1. (Properties of linear maps) A linear map f : V → W between two vector spaces V and W
over F has the following properties:
(i) The zero vectors are mapped into each other, so f (0) = 0. Hence 0 ∈ Kerf .
(ii) The kernel of f is a sub vector space of V .
(iii) The image of f is a sub vector space of W .
(iv) f surjective ⇔ Im(f ) = W ⇔ dim Im(f ) = dim W
(v) f injective ⇔ Ker(f ) = {0} ⇔ dim Ker(f ) = 0
(vi) The scalar multiple αf , where α ∈ F , is linear.
(vii) For another linear map g : V → W , the sum f + g is linear.
(viii) For another linear map g : W → U , the composition g ◦ f is linear.
Proof. (i) f (0) = f (0 · 0) = 0 f (0) = 0, using (L2).
(ii) We need to check the two conditions in Def. 1.2. If v1 , v2 ∈ Ker(f ) then, by definition of the kernel,
f (v1 ) = f (v2 ) = 0. It follows that f (v1 + v2 ) = f (v1 ) + f (v2 ) = 0 so that v1 + v2 ∈ Ker(f ). Similarly,
if v ∈ Ker(f ), so that f (v) = 0, it follows that f (αv) = αf (v) = 0, hence, αv ∈ Ker(f ).
(iii) This is very similar to the proof in (ii) and we leave it as an exercise.
(iv) The first part, f surjective ⇔ Im(f ) = W , is clear by the definition of surjective and the image.
Clearly, if Im(f ) = W , then both spaces have the same dimension. Conversely, from Lemma 1.2, two
vector spaces with the same dimension and one contained in the other (here Im(f ) ⊂ W ) must be identical.
(v) Suppose f is injective and consider a vector v ∈ Kerf . Then f (v) = f (0) = 0, which implies that v = 0
and, hence, Ker(f ) = {0}. Conversely, assume that Ker(f ) = {0}. Then, from linearity, f (v1 ) = f (v2 )
implies that f (v1 − v2 ) = 0 so that v1 − v2 ∈ Ker(f ) = {0}. Hence, v1 − v2 = 0 and f is injective.
(vi) Simply check the linearity conditions for αf , for example (αf )(v1 + v2 ) = αf (v1 + v2 ) = α(f (v1 ) +
f (v2 )) = αf (v1 ) + αf (v2 ) = (αf )(v1 ) + (αf )(v2 ).
(vii) Check the linearity conditions for f + g, similar to what has been done in (vi).
(viii) Simply check the linearity conditions for g ◦f , given that they are satisfied for f and g. g ◦f (v+w) =
g(f (v + w)) = g(f (v) + f (w)) = g(f (v)) + g(f (w)) = g ◦ f (v) + g ◦ f (w) and g ◦ f (αv) = g(f (αv)) =
g(αf (v)) = αg(f (v)) = αg ◦ f (v).

The above Lemma contains a number of extremely useful statements. First of all, both the image and
the kernel of a linear map are sub vector spaces - as one would hope for maps designed to be consistent
with the vector space structure. This means we can assign dimensions to both of them. In fact, the
dimension of the image is of particular importance and is given a special name.

Definition 3.6. The dimension of the image of a linear map f is called the rank of f , in symbols
rk(f ) := dim Im(f ).

It might be difficult to check if a map is surjective or injective, using the original definitions of these prop-
erties. For linear maps, Lemma (3.1) gives simple criteria for both properties in terms of the dimensions
of image and kernel.
We have seen earlier that spaces of functions, if appropriately restricted, form vector spaces. The
same is in fact true for linear maps, a fact which will become important later when we discuss dual vector
spaces and which we formulate in the following

Lemma 3.2. The set of all linear maps f : V → W forms a vector space, also denoted Hom(V, W ),
(“homomorphisms from V to W ”). Vector addition and scalar multiplication are defined by (f + g)(v) =
f (v) + g(v) and (αf )(v) = αf (v).

Proof. From Lemma 3.1, (vi), (vii) the scalar multiple of a linear map and the sum of two linear maps is
again a linear map. From Def. 1.2 this shows that the set of linear maps does indeed form a (sub) vector
space.

To get a better intuitive feel for the action of linear maps we recall our interpretation of sub vector spaces
as lines, planes and their higher-dimensional analogues through 0. We should think of both the kernel
and the image of a linear map in this way, the former residing in the domain vector space, the latter in
the co-domain.
To be concrete, let us consider a linear map f : R3 → R2 and let us assume that dim Ker(f ) = 2, that
is, the kernel of f is a plane through the origin in R3 . This situation is schematically shown in Fig. 18.
Recall that all vectors in the kernel of f , that is all vectors in the corresponding plane (the blue plane in
Fig. 18) are mapped to the zero vector. What is more, consider two vectors v1 , v2 ∈ Ker(f ) + k which
both lie in a plane parallel to Ker(f ), shifted by a vector k (the pink plane in Fig. 18). Then we have
v1 − v2 ∈ Ker(f ) so that f (v1 − v2 ) = 0 and, hence, by linearity f (v1 ) = f (v2 ). Therefore, not only do
all vectors in the kernel get mapped to the zero vector, but all vectors in a plane parallel to the kernel get
mapped to the same (although non-zero) vector. Effectively, the action of the linear map “removes” the
two dimensions parallel to the kernel plane and only keeps the remaining dimension perpendicular to it.
Hence, the image of this linear map is one-dimensional, that is a line through the origin, as indicated in
Fig. 18. This structure is indeed general as expressed by the following theorem.

Figure 18: Geometric representation of a linear map f : R3 → R2 . If dim(V ) = 3 and dim Ker(f ) = 2 it
follows that dim Im(f ) = 1.

Theorem 3.2. For a linear map f : V → W we have

dim Ker(f ) + rk(f ) = dim(V ) (3.4)

Proof. For simplicity of notation, set k = dim Ker(f ) and n = dim(V ). Let v1 , · · · , vk be a basis of Ker(f )
which we complete to a basis v1 , . . . , vk , vk+1 , . . . , vn of V . We will show that f (vk+1 ), . . . , f (vn ) forms
a basis of Im(f ). To do this we need to check the two conditions in Definition 1.4.
(B1) First we need to show that Im(f ) is spanned by f (vk+1 ), . . . , f (vn ). We begin with an arbitrary

vector w ∈ Im(f ). This vector must be the image of a v ∈ V , so that w = f (v). We can expand v as a
linear combination

v = Σ_{i=1}^n αi vi
of the basis in V . Acting on this equation with f and using linearity we find
w = f (v) = f ( Σ_{i=1}^n αi vi ) = Σ_{i=1}^n αi f (vi ) = Σ_{i=k+1}^n αi f (vi ) ,

since f (v1 ) = · · · = f (vk ) = 0 for the basis vectors of the kernel.

Hence, we have written w as a linear combination of the vectors f (vk+1 ), . . . , f (vn ) which, therefore, span
the image of f .
(B2) For the second step, we have to show that the vectors f (vk+1 ), . . . , f (vn ) are linearly independent.
As usual, we start with the equation
Σ_{i=k+1}^n αi f (vi ) = 0   ⇒   f ( Σ_{i=k+1}^n αi vi ) = 0 .
The second of these equations means that the vector Σ_{i=k+1}^n αi vi is in the kernel of f and, given that
v1 , . . . , vk form a basis of the kernel, there are coefficients α1 , . . . , αk such that

Σ_{i=k+1}^n αi vi = − Σ_{i=1}^k αi vi   ⇒   Σ_{i=1}^n αi vi = 0 .

Since v1 , . . . , vn form a basis of V it follows that all αi = 0 and, hence, f (vk+1 ), . . . , f (vn ) are linearly
independent.
Altogether, it follows that f (vk+1 ), · · · , f (vn ) form a basis of Im(f ). Hence, by counting the number of
basis elements dim Im(f ) = n − k = dim(V ) − dim Ker(f ).

The dimensional formula (3.4) is a profound statement about linear maps and it will be immensely
helpful to understand the solution structure of systems of linear equations. For now we draw a few easy
conclusions:
Claim 3.1. For a linear map f : V → W we have:
(i) f bijective (has an inverse) implies that dim(V ) = dim(W ).
(ii) For dim(V ) = dim(W ) = n the following conditions are equivalent:
f is bijective (has an inverse) ⇐⇒ dim Ker(f ) = 0 ⇐⇒ rk(f ) = n
(iii) If f is invertible then the inverse f −1 : W → V is also a linear map.
Proof. (i) If f is bijective, it is injective and surjective, so from Lemma 3.1 (iv),(v) dim Ker(f ) = 0 and
dim Im(f ) = dim(W ). Then, from Eq. (3.4), dim(V ) = dim Ker(f ) + dim Im(f ) = 0 + dim(W ) = dim(W ).
(ii) This is an easy consequence of Theorem 3.2 and Lemma 3.1 (iv), (v) and we leave the proof as an
exercise.
(iii) We set w1 = f (v1 ), w2 = f (v2 ) and w = f (v) and check the linearity conditions in Def. 3.5 for f −1 .

f −1 (w1 + w2 ) = f −1 (f (v1 ) + f (v2 )) = f −1 (f (v1 + v2 )) = v1 + v2 = f −1 (w1 ) + f −1 (w2 )


f −1 (αw) = f −1 (αf (v)) = f −1 (f (αv)) = αv = αf −1 (w)

Part (i) of the above claim says we can have invertible linear maps only between vector spaces of the
same dimension. If the dimensions are indeed the same, part (ii) tells us the map is invertible precisely if
its rank is maximal, that is, if the rank is equal to the dimension of the space.

Example 3.4: Examples of linear maps


(a) Linear maps f : F n → F m from matrices
Consider an n-dimensional column vector v ∈ F n with components vi and an m × n matrix
 
        ( a11  · · ·  a1n )
A =     (  ..          ..  )                                                                  (3.5)
        ( am1  · · ·  amn )

with entries aij ∈ F . We denote the column vector which consists of the entries in the ith row of A by ai .
To map n-dimensional into m-dimensional column vectors, we need to provide m functions each of which
can, in general, depend on all n coordinates vi . To obtain a linear map it seems intuitive that we should
choose these functions linear in the coordinates vi and the most general such possibility can be written
down using the coefficients of the above matrix A. It is given by
   
        ( a11 v1 + · · · + a1n vn )     ( a1 · v )
f (v) = (           ..            )  =  (   ..   )  =: Av .                                   (3.6)
        ( am1 v1 + · · · + amn vn )     ( am · v )

By the last equality, we have defined the multiplication Av of an m × n matrix A with an n-dimensional
vector v. By definition, this multiplication is carried out by forming the dot product between the vector
and the rows of the matrix, as indicated above 2 . Evidently, this only makes sense if “sizes fit”, that is,
if the number of components of the vector equals the number of columns of the matrix. The outcome
is a column vector whose dimension equals the number of rows of the matrix. Using index notation,
multiplication of vectors by matrices and the above linear map can more concisely be written as
( f (v) )i = Σ_{j=1}^n aij vj = aij vj ,   (3.7)

where a sum over j is implied by the Einstein summation convention in the last expression. Using this
notation it is quite straightforward to check that f satisfies the conditions for a linear map in Definition 3.5.

f (v + w)i = aij (vj + wj ) = aij vj + aij wj = f (v)i + f (w)i (3.8)


f (αv)i = aij (αvj ) = α(aij vj ) = αf (v)i (3.9)

We conclude that Eq. (3.6) indeed defines a linear map and that, via the multiplication of matrices with
vectors, we can define such a map for each matrix A. In short, multiplication of n-dimensional column
vectors by a m × n matrix A corresponds to a linear map f : F n → F m . Conversely, we can ask if every
linear map between column vectors can be obtained from a matrix in this way. We will return to this
question shortly and see that the answer is “yes”.
Since this is the first time we encounter the multiplication of matrices and vectors an explicit numerical
² Strictly, we have called this expression dot product only for the real case, F = R. For the present purpose we mean by
dot product any expression a · v = Σi ai vi , where ai , vi ∈ F and F is an arbitrary field.

example might be helpful. Consider the 4 × 3 matrix and the three-dimensional vector

      (  1  0  −1 )
      (  2  1   3 )              (  1 )
A =   ( −2  1   1 )  ,   v =     ( −2 )  .                                                    (3.10)
      (  0  0   4 )              (  3 )

Their product is obtained by dotting v into the rows of A, so


   
       (  1  0  −1 )   (  1 )     ( −2 )
Av =   (  2  1   3 )   ( −2 )  =  (  9 )  =: w ,                                              (3.11)
       ( −2  1   1 )   (  3 )     ( −1 )
       (  0  0   4 )              ( 12 )

resulting in the four-dimensional vector w. Stated another way, we can view this matrix as a linear map
R3 → R4 and we have just explicitly computed the image w of the vector v under this linear map.
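For comparison, the same computation in code (assuming NumPy): the product A @ v forms the dot product of v with each row of A, exactly as in Eq. (3.6).

import numpy as np

A = np.array([[ 1, 0, -1],
              [ 2, 1,  3],
              [-2, 1,  1],
              [ 0, 0,  4]])
v = np.array([1, -2, 3])

print(A @ v)                           # [-2  9 -1 12]
print([np.dot(row, v) for row in A])   # the same numbers, row by row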
(b) Coordinate maps
We have seen earlier that a vector can be uniquely described by its coordinates relative to a basis, see
Claim 1.2. We can now use the notion of linear maps to make this more precise. Consider an n-dimensional
vector space V over F with basis v1 , . . . , vn . By α, β ∈ F n we denote n-dimensional column vectors with
components αi and βi , respectively, and we define the coordinate map ϕ : F n → V by
ϕ(α) = Σ_{i=1}^n αi vi .   (3.12)

This map assigns to a coordinate vector the corresponding vector, relative to the given basis. It is easy
to verify that it is linear.
ϕ(α + β) = Σ_{i=1}^n (αi + βi )vi = Σ_{i=1}^n αi vi + Σ_{i=1}^n βi vi = ϕ(α) + ϕ(β)   (3.13)
ϕ(aα) = Σ_{i=1}^n (aαi )vi = a Σ_{i=1}^n αi vi = aϕ(α) .   (3.14)

Since v1 , . . . , vn forms a basis it is clear that Im(ϕ) = V and, hence, from Claim 3.1 (ii), ϕ is bijective
and has an inverse ϕ−1 : V → F n . Clearly, the inverse map assigns to a vector v = Σ_{i=1}^n αi vi ∈ V its
coordinate vector α, so explicitly

ϕ−1 (v) = ϕ−1 ( Σ_{i=1}^n αi vi ) = α   (3.15)

A linear and bijective map between two vector spaces is also referred to as a (vector space) isomorphism
and two vector spaces related by such a map are called isomorphic. What the above discussion shows is
that every n-dimensional vector space V over F is isomorphic to F n by means of a coordinate map ϕ.
However, it should be noted that this isomorphism is not unique as it depends on the choice of basis.
For an explicit example of a coordinate map consider Example 1.7, where we have considered R3 with
basis      
0 0 1
v1 =  1  , v 2 =  1  , v3 =  1  . (3.16)
1 2 −1

The coordinate map for this basis is given by

ϕ(α) = α1 v1 + α2 v2 + α3 v3 = (α3 , α1 + α2 + α3 , α1 + 2α2 − α3 )T .   (3.17)
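Numerically, the coordinate map and its inverse amount to a matrix-vector product and the solution of a linear system. The sketch below (assuming NumPy) collects the basis vectors (3.16) as columns of a matrix B, so that ϕ(α) = Bα, and recovers the coordinates of a vector by solving Bα = v; the coordinate vector chosen here is purely illustrative.

import numpy as np

B = np.column_stack([[0, 1, 1], [0, 1, 2], [1, 1, -1]])   # v1, v2, v3 as columns

alpha = np.array([2.0, -1.0, 3.0])        # some coordinate vector
v = B @ alpha                             # phi(alpha) = alpha1 v1 + alpha2 v2 + alpha3 v3

alpha_back = np.linalg.solve(B, v)        # the inverse coordinate map phi^{-1}(v)
print(v, alpha_back)                      # alpha_back reproduces alpha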

(c) Linear differential operators


A linear differential operator of order n has the form
L = Σ_{k=0}^n pk (x) d^k/dx^k ,

where pk (x) are fixed real-valued functions of x ∈ R. If we denote by V the vector space of infinitely many
times differentiable functions g : R → R then we can view this differential operator as a map L : V → V .
Since single differentiation and multiplication with fixed functions are each linear operations and L is a
composition of such operations it is clear that L is a linear map. We can also verify this explicitly:
L(g1 + g2 ) = Σ_{k=0}^n pk (x) d^k/dx^k (g1 + g2 ) = Σ_{k=0}^n pk (x) d^k g1 /dx^k + Σ_{k=0}^n pk (x) d^k g2 /dx^k = L(g1 ) + L(g2 )   (3.18)
L(αg) = Σ_{k=0}^n pk (x) d^k/dx^k (αg) = α Σ_{k=0}^n pk (x) d^k g/dx^k = αL(g) .   (3.19)

The solutions to the homogeneous differential equation

L(g) = 0 (3.20)

are given by the kernel, Ker(L). For any linear map the kernel is a (sub) vector space and, for the present
example, this means that any linear combination of solutions of the differential equation is also a solution.
As an explicit example consider the second order linear differential operator

L = d²/dx² + 4 d/dx − 5 .   (3.21)
The associated homogeneous differential equation L(g) = 0 has the two solutions, g1 (x) = exp(x) and
g2 (x) = exp(−5x), but, from linearity, any linear combination g(x) = αg1 (x) + βg2 (x) = α exp(x) +
β exp(−5x) is also a solution.
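This can be verified symbolically; the short sketch below assumes the SymPy library (not part of these notes) and checks that every linear combination g(x) = α exp(x) + β exp(−5x) lies in the kernel of the operator (3.21).

import sympy as sp

x, alpha, beta = sp.symbols('x alpha beta')
g = alpha * sp.exp(x) + beta * sp.exp(-5 * x)

# Apply L = d^2/dx^2 + 4 d/dx - 5 to g and simplify.
Lg = sp.diff(g, x, 2) + 4 * sp.diff(g, x) - 5 * g
print(sp.simplify(Lg))    # 0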

3.2 Matrices and their properties


In the previous section we have introduced linear maps and we have seen that a prominent class of exam-
ples for such maps can be obtained from matrices. It is now time to be more serious about matrices and
develop their theory more systematically, both to obtain practical computational tools for matrices and
to gain more insight into the nature of linear maps. We begin low-key by reviewing some of the matrix
properties encountered so far and by adding a few further basic definitions.

3.2.1 Basic matrix properties
We consider matrices of arbitrary size n × m (that is, with n rows and m columns) given by
 
        ( a11  · · ·  a1m )
A =     (  ..          ..  )  .                                                               (3.22)
        ( an1  · · ·  anm )

Here aij ∈ F , with i = 1, . . . , n and j = 1, . . . , m, are the (real or complex) entries. The matrix A is called
quadratic if n = m, that is if it has as many rows as columns. It is often useful to be able to refer to the
entries of a matrix by the same symbol and, by slight abuse of notation, we will therefore also denote the
entries by Aij = aij . We already know that the matrices of a given size form a vector space with vector
addition and scalar multiplication defined component by component, that is, (A + B)ij = Aij + Bij and
(αA)ij = αAij . We will frequently need to refer to the row and column vectors of a matrix for which we
introduce the following notation:
   
Ai = (Ai1 , . . . , Aim )T ,   Aj = (A1j , . . . , Anj )T .   (3.23)

Hence, Ai is an m-dimensional column vector which contains the entries in the ith row of A and Aj is
an n-dimensional column vector which contains the entries in the j th column of A. We also recall that A
defines a linear map A : F m → F n via multiplication of matrices and vectors which can be written as
 
A : v 7→ Av = (A1 · v, . . . , An · v)T   or   (Av)i = Σ_{j=1}^m Aij vj .   (3.24)

A very specific matrix is the unit matrix 1n : F n → F n given by


 
        ( 1       0 )
1n =    (    . .    )                                                                         (3.25)
        ( 0       1 )

Its row and column vectors are given by the standard unit vectors, so Ai = ei and Aj = ej , and its
components (1n )ij = δij equal the Kronecker delta symbol introduced earlier. For its action on a vector
we have
(1v)i = δij vj = vi (3.26)
so, seen as a linear map, the unit matrix corresponds to the identity map.
More generally, a diagonal matrix is a matrix D with non-zero entries only along the diagonal, so
Dij = 0 for all i ≠ j. It can be written as
 
D = \begin{pmatrix} d_1 & & 0 \\ & \ddots & \\ 0 & & d_n \end{pmatrix} =: \mathrm{diag}(d_1, \ldots, d_n)   (3.27)

The complex conjugate A∗ : F m → F n of a matrix A : F m → F n is simply the matrix whose entries are
the complex conjugates of the entries in A, so in component form, (A∗ )ij = (Aij )∗ . Of course, for matrices
with only real entries complex conjugation is a trivial operation which leaves the matrix unchanged.

The transpose of an n × m matrix A : F m → F n is an m × n matrix AT : F n → F m obtained by exchanging
the rows and columns of A. In component form, this means (AT )ij := Aji . A quadratic matrix A is said
to be symmetric if A = AT or, in index notation, if all entries satisfy Aij = Aji . It is called anti-symmetric
if A = −AT or, Aij = −Aji for all entries. Note that all diagonal entries Aii of an anti-symmetric matrix
vanish. We have
(A + B)T = AT + B T , (αA)T = αAT (3.28)
for n×m matrices A, B and scalars α. In particular, the sum of two symmetric matrices is again symmetric
as is the scalar multiple of a symmetric matrix (and similarly for anti-symmetric matrices). This means
that symmetric and anti-symmetric n × n matrices each form a sub vector space within the vector space
of all n × n matrices.

Example 3.5: Transpose of a matrix, symmetry and anti-symmetry


(a) An explicit example for a matrix and its transpose is
 
A = \begin{pmatrix} 1 & 3 \\ 2 & -1 \\ 0 & 4 \end{pmatrix} , \qquad A^T = \begin{pmatrix} 1 & 2 & 0 \\ 3 & -1 & 4 \end{pmatrix}   (3.29)

Note that, for non-quadratic matrices, the transpose changes the “shape” of the matrix. While A above
is a 3 × 2 matrix defining a linear map A : R2 → R3 , its transpose is a 2 × 3 matrix which defines a linear
map AT : R3 → R2 .
(b) The general form of 2 × 2 symmetric and anti-symmetric matrices is
   
A_{\rm symm} = \begin{pmatrix} a & b \\ b & d \end{pmatrix} , \qquad A_{\rm anti-symm} = \begin{pmatrix} 0 & b \\ -b & 0 \end{pmatrix}   (3.30)

with arbitrary numbers a, b, d ∈ F . The dimension of the vector space of symmetric 2 × 2 matrices over
F is three (as they depend on three independent numbers) while the dimension of the vector space of
anti-symmetric 2 × 2 matrices over F is one (as they depend on one parameter). In particular, note that
the diagonal of an anti-symmetric matrix vanishes. It is easy to write down a basis for these vector spaces
and also to generalize these statements to matrices of arbitrary size.

Finally, a combination of the previous two operations is the hermitian conjugate of an n × m matrix
A : F m → F n which is defined as an m × n matrix A† : F n → F m obtained by taking the complex
conjugate of the transpose of A, that is, A† := (AT )∗ . For matrices with only real entries, hermitian
conjugation is of course the same as transposition. A quadratic matrix A is said to be hermitian if the
matrix is invariant under hermitian conjugation, that is, if A = A† , and anti-hermitian if A = −A† . In
analogy with the properties of transposition we have

(A + B)† = A† + B † , (αA)† = α∗ A† . (3.31)

The first property means that the sum of two hermitian (anti-hermitian) matrices is again hermitian (anti-
hermitian). More care is required for scalar multiples. The scalar multiple of a hermitian (anti-hermitian)
matrix with a real scalar remains hermitian (anti-hermitian). However, for a complex scalar this is no
longer generally the case due to the complex conjugation of the scalar in the second equation (3.31). This
means the n × n hermitian (anti-hermitian) matrices form a sub vector space of the vector space of all
n × n matrices with complex entries, but only if the underlying field is taken to be the real numbers.

Example 3.6: Hermitian conjugate
(a) An explicit example for a 3 × 3 matrix with complex entries and its hermitian conjugate is
   
A = \begin{pmatrix} i & 1 & 2-i \\ 2 & 3 & -3i \\ 1-i & 4 & 2+i \end{pmatrix} , \qquad A^\dagger = \begin{pmatrix} -i & 2 & 1+i \\ 1 & 3 & 4 \\ 2+i & 3i & 2-i \end{pmatrix} .   (3.32)
Note that, in addition to the transposition carried out by exchanging rows and columns, all entries are
complex conjugated.
(b) The hermitian conjugate of an arbitrary 2 × 2 matrix with complex entries is
A = \begin{pmatrix} a & b \\ c & d \end{pmatrix} , \qquad A^\dagger = \begin{pmatrix} a^* & c^* \\ b^* & d^* \end{pmatrix}   (3.33)
For A to be hermitian we need that a = a∗ , d = d∗ , so that the diagonal is real, and c = b∗ . Hence, the
most general hermitian 2 × 2 matrix has the form
 
A_{\rm herm} = \begin{pmatrix} a & b \\ b^* & d \end{pmatrix} , \qquad a, d \in \mathbb{R} , \; b \in \mathbb{C} .   (3.34)
These matrices form a four-dimensional vector space (over R) since they depend on four real parameters.
For an anti-hermitian matrix, the corresponding conditions are a = −a∗ , d = −d∗ , so that the diagonal
must be purely imaginary, and c = −b∗ . The most general such matrices are
 
A_{\rm anti-herm} = \begin{pmatrix} a & b \\ -b^* & d \end{pmatrix} , \qquad a, d \in i\mathbb{R} , \; b \in \mathbb{C} ,   (3.35)
and they form a four-dimensional vector space over R.
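As a small numerical illustration (added here, not part of the original notes), the transpose and hermitian conjugate of the explicit matrix in Eq. (3.32) can be computed with numpy; the check at the end simply confirms that this particular A is not hermitian.

import numpy as np

A = np.array([[1j,   1,  2-1j],
              [2,    3,  -3j ],
              [1-1j, 4,  2+1j]])

A_T = A.T                  # transpose: (A^T)_ij = A_ji
A_dagger = A.conj().T      # hermitian conjugate: complex conjugate of the transpose

print(A_dagger)                    # reproduces the second matrix in Eq. (3.32)
print(np.allclose(A, A_dagger))    # -> False, A is not hermitian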

3.2.2 Rank of a matrix


Previously, we have defined the rank of a linear map as the dimension of its image. Since every matrix
defines a linear map we can, therefore, talk about the rank of a matrix. Can we be more specific about
what the rank of a matrix is and how it can be determined?
Consider an n × m matrix A : F m → F n with columns A1 , · · · , Am , and a vector v ∈ F m with
components vi . The image of v under the action of A can then be written as
Av = \sum_{i=1}^{m} v_i A^i ,   (3.36)

and is hence given by a linear combination of the column vectors Aj with the coefficients equal to the
components of v. This observation tells us that
Im(A) = Span(A1 , · · · , Am ) , (3.37)
so the image of the matrix is spanned by its column vectors. For the rank of the matrix this implies that
rk(A) = dim Span(A1 , · · · , Am ) = maximal number of lin. indep. column vectors of A . (3.38)
For obvious reasons this is also sometimes called the column rank of the matrix A. This terminology
suggests we can also define the row rank of the matrix A as the maximal number of linearly independent
row vectors of A. Having two types of ranks available for a matrix seems awkward but fortunately we
have the following

Theorem 3.3. Row and column rank are equal for any matrix.

Proof. Suppose one row, say A1 , of a matrix A can be written as a linear combination of the others.
Then, by dropping A1 from A we arrive at a matrix with one less row, but its row rank unchanged from
that of A. The key observation is that the column rank also remains unchanged under this operation.
This can be seen as follows. Write
 
A_1 = \sum_{j=2}^{n} \alpha_j A_j , \qquad \alpha = \begin{pmatrix} \alpha_2 \\ \vdots \\ \alpha_n \end{pmatrix} ,

with some coefficients α2 , . . . , αn which we have arranged into the vector α. Further, let us write the
column vectors of A as
A^i = \begin{pmatrix} a_i \\ b_i \end{pmatrix} ,
that is, we split off the entries in the first row, denoted by a_i, from the entries in the remaining n − 1 rows which are contained in the vectors b_i. It follows that a_i = A_{1i} = (A_1)_i = \sum_{j=2}^{n} \alpha_j A_{ji} = \sum_{j=2}^{n} \alpha_j (A^i)_j = \alpha \cdot b_i, so that the column vectors can also be written as
A^i = \begin{pmatrix} \alpha \cdot b_i \\ b_i \end{pmatrix} ,

Hence, the entries in the first row are not relevant for the linear independence of the column vectors Ai -
merely using the vectors bi will lead to the same conclusions for linear independence. As a result we can
drop a linearly dependent row without changing the row and the column rank of the matrix. Clearly, an
argument similar to the above can be made if we drop a linearly dependent column vector - again both
the row and column rank remain unchanged.
In this way, we can continue dropping linearly dependent row and column vectors from A until we
arrive at a (generally smaller) matrix A0 which has linearly independent row and column vectors and the
same row and column ranks as A. On purely dimensional grounds, a matrix with all row vectors and all
column vectors linearly independent must be quadratic (For example, consider a 3 × 2 matrix. Its three
2-dimensional row vectors cannot be linearly independent.). Therefore, row and column rank are the same
for A0 and, hence, for A.

The above interpretation of the rank of a matrix as the maximal number of linearly independent row
or column vectors gives us a practical way to determine the rank of the matrix, using the methods to
check linear independence we have introduced in Section 1. Efficient, algorithmic methods for this will be
introduced in the next sub-section but for smaller matrices the rank can often be found “by inspection”,
as in the following example.

Example 3.7: Rank of a matrix by inspection, kernel and image of a matrix


(a) The 2 × 2 matrix  
A = \begin{pmatrix} 2 & -1 \\ 1 & 0 \end{pmatrix}   (3.39)
defines a map A : R2 → R2 . Clearly, A has rank two since its two columns are linearly independent.
This means that the image of A is two-dimensional and, hence, Im(A) = R2 . From the dimensional
formula (3.4) it also follows that dim Ker(A) = 0, so that the kernel is trivial, Ker(A) = {0}.

(b) Consider the 3 × 3 matrix  
A = \begin{pmatrix} -1 & 4 & 3 \\ 2 & -3 & -1 \\ 3 & 2 & 5 \end{pmatrix} .   (3.40)
which defines a map A : R3 → R3 . It is clear that the first two columns of this matrix are linearly
independent (they are not multiples of each other) and that the third column is the sum of the first two.
Hence, the rank of this matrix is two. This means that the dimension of its image is two while, from
Eq. (3.4), the dimension of its kernel is one. To find the kernel of A explicitly we have to solve Av = 0.
With v = (x, y, z)T this leads to

−x + 4y + 3z = 0 , 2x − 3y − z = 0 , 3x + 2y + 5z = 0 , (3.41)

and these equations are solved precisely if x = y = −z. The image of A is, in general, spanned by the
column vectors, but since A3 = A1 + A2 it is already spanned by the first two columns. In conclusion,
we have
Ker(A) = Span\!\left( \begin{pmatrix} 1 \\ 1 \\ -1 \end{pmatrix} \right) , \qquad Im(A) = Span\!\left( \begin{pmatrix} -1 \\ 2 \\ 3 \end{pmatrix} , \begin{pmatrix} 4 \\ -3 \\ 2 \end{pmatrix} \right) .   (3.42)
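For readers who want to check such "by inspection" arguments numerically, here is a short, purely illustrative Python snippet (not part of the original notes) that computes the rank, kernel and image of the matrix (3.40) with numpy and sympy.

import numpy as np
import sympy as sp

A = np.array([[-1,  4,  3],
              [ 2, -3, -1],
              [ 3,  2,  5]])

print(np.linalg.matrix_rank(A))     # -> 2, as argued by inspection

M = sp.Matrix(A.tolist())
print(M.nullspace())                # one basis vector proportional to (1, 1, -1)^T
print(M.columnspace())              # basis of Im(A) given by the first two columns of A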

3.2.3 Linear maps between column vectors


Now we return to the question posed in the previous section. We saw how an n × m matrix A corresponds
to a linear map f : F m → F n . Is it the case that every linear map f : F m → F n can be obtained from a
matrix in this way?
To answer this question, we begin with an arbitrary linear map f : F m → F n , the standard unit
vectors ei of F m and the standard unit vectors ẽi of F n . Let us consider the images, f (ei ), of these
standard unit vectors under our linear map. While we do not know what these images are explicitly it is
clear that they can be expanded in terms of the basis ẽi , that is, we can write
f(e_j) = \sum_{i=1}^{n} a_{ij} \tilde{e}_i ,   (3.43)

for some suitable set of coefficients a_{ij}. Now consider an arbitrary vector v ∈ F^m written as a linear combination v = \sum_{j=1}^{m} v_j e_j. For the image under f of this vector we find

f(v) = f\Big( \sum_{j=1}^{m} v_j e_j \Big) = \sum_{j=1}^{m} v_j f(e_j) = \sum_{j=1}^{m} \sum_{i=1}^{n} v_j a_{ij} \tilde{e}_i = \sum_{i=1}^{n} \Big( \sum_{j=1}^{m} a_{ij} v_j \Big) \tilde{e}_i .   (3.44)

Hence, for the ith component of this image we have

\big( f(v) \big)_i = \sum_{j=1}^{m} a_{ij} v_j = (Av)_i ,   (3.45)

where A is the matrix with entries aij , these being the coefficients which appear in the parametrization
of the images (3.43). We have therefore succeeded in expressing the action of our arbitrary linear map f
in terms of a matrix and we conclude that all linear maps between column vectors are given by matrices.
We summarize this in the following

Lemma 3.3. Every linear map f : F^m → F^n can be written in terms of an n × m matrix A, such that f(v) = Av for all v ∈ F^m. If f(e_j) = \sum_{i=1}^{n} a_{ij} \tilde{e}_i for the standard unit vectors e_j of F^m and \tilde{e}_i of F^n, then a_{ij} are the entries of A.

Example 3.8: Matrix describing a linear map R3 → R3


Consider a fixed vector n ∈ R3 and a map f : R3 → R3 defined by f (v) = n×v. From the properties (2.31)–
(2.33) of the vector product, it is easy to show that this map is linear:

f (v1 + v2 ) = n × (v1 + v2 ) = n × v1 + n × v2 = f (v1 ) + f (v2 ) (3.46)


f (αv) = n × (αv) = αn × v = αf (v) (3.47)

Hence, we know from Lemma 3.3 that f can be described by a 3 × 3 matrix A which can be worked out
by studying the action of f on the standard unit vectors ei . Writing n = (n1 , n2 , n3 )T we find by explicit
computation

f (e1 ) = n × e1 = n3 e2 − n2 e3
f (e2 ) = n × e2 = −n3 e1 + n1 e3 (3.48)
f (e3 ) = n × e3 = n2 e1 − n1 e2 .

Lemma 3.3 states that the coefficients in front of the standard unit vectors on the right-hand sides of
these equations are the entries of the desired matrix A. More precisely, the coefficients which appear in
the expression for f (ej ) form the j th column of the matrix A. Hence, the desired matrix is
 
A = \begin{pmatrix} 0 & -n_3 & n_2 \\ n_3 & 0 & -n_1 \\ -n_2 & n_1 & 0 \end{pmatrix} ,   (3.49)

and we have f (v) = n × v = Av for all vectors v ∈ R3 . The interesting conclusion is that vector products
with a fixed vector n can also be represented by multiplication with the anti-symmetric matrix (3.49).
Everything is much more elegant in index notation where

A_{ij} = [f(e_j)]_i = [n \times e_j]_i = \epsilon_{ikl}\, n_k\, [e_j]_l = \epsilon_{ikl}\, n_k\, \delta_{jl} = \epsilon_{ikj}\, n_k ,   (3.50)

so that A_{ij} = \epsilon_{ikj} n_k, in agreement with Eq. (3.49).
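A quick numerical sanity check (added for illustration only) that the anti-symmetric matrix (3.49) indeed reproduces the cross product n × v; the numerical vectors n and v below are arbitrary test values.

import numpy as np

n = np.array([1.0, 2.0, 3.0])
v = np.array([-2.0, 0.5, 4.0])

A = np.array([[    0.0, -n[2],  n[1]],
              [  n[2],   0.0, -n[0]],
              [ -n[1],  n[0],   0.0]])

print(A @ v)            # the matrix of Eq. (3.49) acting on v
print(np.cross(n, v))   # the cross product n x v -- the two results agree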

3.2.4 Matrix multiplication


We have seen earlier that the composition of linear maps is again linear and we have just shown that all
linear maps between column vectors are matrices. Hence, the composition of two matrices must again be
a matrix. To work this out more explicitly, we start with an n × m matrix A and an r × n matrix B which
A B
generate linear maps according to the chain F m −→ F n −→ F r . We would like to determine the matrix
C which describes the composite map B ◦ A : F m → F r . By a straightforward computation we find
 
(B(Av))_i = \sum_{j=1}^{n} B_{ij} (Av)_j = \sum_{k=1}^{m} \underbrace{\Big( \sum_{j=1}^{n} B_{ij} A_{jk} \Big)}_{C_{ik}} v_k = \sum_{k=1}^{m} C_{ik} v_k = (Cv)_i   (3.51)

so that the entries Cik of C are given by
C_{ik} = \sum_{j=1}^{n} B_{ij} A_{jk} = B_i \cdot A^k   (3.52)

This equation represents the component version of what we refer to as matrix multiplication. We obtain the entries of the new matrix C - which corresponds to the composition of B and A - by performing the summation over the entries of B and A as indicated in the middle of (3.52) or, equivalently, by dotting the columns of A into the rows of B, as indicated on the RHS of (3.52). In matrix notation this can also be written as

C = BA := \begin{pmatrix} B_1 \cdot A^1 & \cdots & B_1 \cdot A^m \\ \vdots & & \vdots \\ B_r \cdot A^1 & \cdots & B_r \cdot A^m \end{pmatrix} .   (3.53)
Note that the product of the r × n matrix B with the n × m matrix A results in the r × m matrix C = BA.
The important conclusion is that composition of matrices - in their role as linear maps - is accomplished
by matrix multiplication.
We should discuss some properties of matrix multiplication. First note that the matrix product BA
only makes sense if “sizes fit”, that is, if A has as many rows as B columns - otherwise the dot products
in (3.53) do not make sense. This consistency condition is of course a direct consequence of the role of
matrices as linear maps. The maps B and A can only be composed to BA if the co-domain of A has the
same dimension as the domain of B. Let us illustrate this with the following

Example 3.9: Matrix multiplication


Consider the two matrices
 
B = \begin{pmatrix} 1 & 0 & -1 \\ 2 & 3 & -2 \end{pmatrix} , \qquad A = \begin{pmatrix} 0 & 1 & 1 \\ 2 & 0 & 1 \\ 1 & -1 & 1 \end{pmatrix}   (3.54)

of sizes 2 × 3 and 3 × 3, respectively. Dotting the column vectors of A into the rows of B we can compute
their product
BA = \begin{pmatrix} 1 & 0 & -1 \\ 2 & 3 & -2 \end{pmatrix} \begin{pmatrix} 0 & 1 & 1 \\ 2 & 0 & 1 \\ 1 & -1 & 1 \end{pmatrix} = \begin{pmatrix} -1 & 2 & 0 \\ 4 & 4 & 3 \end{pmatrix} ,   (3.55)
a 2 × 3 matrix. Note, however, that the product AB is ill-defined since B has two rows but A has 3
columns.

Matrix multiplication is associative, so

A(BC) = (AB)C . (3.56)

This is a direct consequence of the associativity of map composition (see the discussion around Eq. (3.1))
but can also be verified directly from the definition of matrix multiplication. This is most easily done in
index notation (using Eq. (3.52)) which gives

(A(BC))ij = Aik (BC)kj = Aik Bkl Clj = (AB)il Clj = ((AB)C)ij . (3.57)

However, matrix multiplication is not commutative, that is, typically, AB ≠ BA. The “degree of non-
commutativity” of two matrices is often measured by the commutator defined as

[A, B] := AB − BA (3.58)

Evidently, the matrices A, B commute if and only if [A, B] = 0.

Example 3.10: Non-commutativity of matrix multiplication


(a) Consider the two matrices
   
A = \begin{pmatrix} 1 & 2 \\ -1 & 0 \end{pmatrix} , \qquad B = \begin{pmatrix} 3 & -1 \\ 0 & 2 \end{pmatrix}   (3.59)

By straightforward computation we have


         
AB = \begin{pmatrix} 1 & 2 \\ -1 & 0 \end{pmatrix} \begin{pmatrix} 3 & -1 \\ 0 & 2 \end{pmatrix} = \begin{pmatrix} 3 & 3 \\ -3 & 1 \end{pmatrix} , \qquad BA = \begin{pmatrix} 3 & -1 \\ 0 & 2 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ -1 & 0 \end{pmatrix} = \begin{pmatrix} 4 & 6 \\ -2 & 0 \end{pmatrix} ,   (3.60)

so that indeed AB ≠ BA.


(b) Note that matrices with a specific structure may still commute. For example, it is easy to check that
the matrices    
A = \begin{pmatrix} a & b \\ b & a \end{pmatrix} , \qquad B = \begin{pmatrix} c & d \\ d & c \end{pmatrix}   (3.61)
for arbitrary real numbers a, b, c, d satisfy [A, B] = 0.
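The two statements of Example 3.10 can be checked numerically with a few lines of Python/numpy; this snippet is an illustration added to the notes, with arbitrary numerical values chosen for a, b, c, d.

import numpy as np

A = np.array([[1, 2], [-1, 0]])
B = np.array([[3, -1], [0, 2]])
print(A @ B)              # not equal to B @ A
print(B @ A)
print(A @ B - B @ A)      # the commutator [A, B] of Eq. (3.58), non-zero here

a, b, c, d = 1.0, 2.0, 3.0, 4.0
C = np.array([[a, b], [b, a]])     # matrices of the special form (3.61)
D = np.array([[c, d], [d, c]])
print(C @ D - D @ C)      # -> zero matrix, so [C, D] = 0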

What is the relation between multiplication and transposition of matrices? The answer is

(AB)T = B T AT . (3.62)

Note the change of order on the RHS! A proof of this relation is most easily accomplished in index notation:

\big( (AB)^T \big)_{ij} = (AB)_{ji} = A_{jk} B_{ki} = B_{ki} A_{jk} = (B^T)_{ik} (A^T)_{kj} = (B^T A^T)_{ij} .   (3.63)
For the complex conjugation of a matrix product we have of course (AB)∗ = A∗ B ∗ , so together with
Eq. (3.62) this means for the hermitian conjugate that

(AB)† = B † A† . (3.64)

Finally, using matrix terminology, we can think of vectors in a slightly different way. A column vector v
with components v1 , . . . , vm can also be seen as an m × 1 matrix and the action Av of an n × m matrix A
on v as a matrix multiplication. The transpose, vT = (v1 , . . . , vm ) is an m dimensional row vector and,
hence, the dot product of two m-dimensional (column) vectors v and w can also be written as

v · w = vT w , (3.65)

that is, as a matrix product between the 1 × m matrix vT and the m × 1 matrix w.

3.2.5 The inverse of a matrix
Recall from Claim 3.1, that a linear map f : V → W can only have an inverse if dim(V ) = dim(W ).
Hence, for a matrix A : F m → F n to have an inverse it must be quadratic, so that n = m. Focusing
on quadratic n × n matrices A we further know from Claim 3.1 that we have an inverse precisely when
rk(A) = n, that is, when the rank of A is maximal. In this case, the inverse of A, denoted A−1 , is the
unique linear map (and, therefore, also a matrix) satisfying
AA−1 = A−1 A = 1n . (3.66)
Note that this is just the general Definition (3.4) of an inverse map applied to a matrix, using the fact
that matrices correspond to linear maps and map composition corresponds to matrix multiplication. We
summarize the properties of the matrix inverse in the following
Lemma 3.4. (Properties of matrix inverse) A quadratic n × n matrix A : F n → F n is invertible if and
only if its rank is maximal, that is, iff rk(A) = n. If A, B are two invertible n × n matrices we have
(a) The inverse matrix, denoted A−1 , is the unique matrix satisfying AA−1 = A−1 A = 1n .
(b) (AB)−1 = B −1 A−1
(c) A−1 is invertible and (A−1 )−1 = A
(d) AT is invertible and (AT )−1 = (A−1 )T
Proof. (a) This has already been shown above.
(b) (c) These are direct consequences of the corresponding properties (3.2), (3.3) for general maps.
(d) Recall from Claim 3.1 that a matrix is invertible iff its rank is maximal. Since, from Theorem 3.3,
rk(A) = rk(AT ), we conclude that AT is indeed invertible which proves the first part of the claim. For the
second part, we transpose A−1 A = AA−1 = 1, using Eq. (3.62), to arrive at AT (A−1 )T = (A−1 )T AT = 1.
On the other hand, from the definition of the inverse for AT , we have AT (AT )−1 = (AT )−1 AT = 1.
Comparing the two equations shows that both (A−1 )T and (AT )−1 provide an inverse for AT and, hence,
from the uniqueness of the inverse, they must be equal.

Application: Matrices in graph theory


Graphs are objects consisting of a certain number of vertices, V1 , . . . , Vn and links connecting these vertices.
A simple example with five vertices is shown in Fig. 19. Here we focus on undirected graphs as in Fig. 19


Figure 19: A simple (undirected) graph with five vertices.

for which the links have no direction, but our considerations can easily be generalized to directed graphs.
Graphs can be related to linear algebra via the adjacency matrix which is defined by

M_{ij} = \begin{cases} 1 & \text{if } V_i \text{ and } V_j \text{ are linked} \\ 0 & \text{otherwise} \end{cases}   (3.67)

For example, for the graph in Fig. 19 the adjacency matrix is given by
 
M = \begin{pmatrix} 0 & 1 & 0 & 1 & 0 \\ 1 & 0 & 1 & 1 & 0 \\ 0 & 1 & 0 & 0 & 1 \\ 1 & 1 & 0 & 0 & 1 \\ 0 & 0 & 1 & 1 & 0 \end{pmatrix} .   (3.68)
This matrix is symmetric due to the underlying graph being undirected. The following fact (which we
will not try to prove here) makes the adjacency matrix a useful object.
Fact: The number of possible walks from vertex Vi to vertex Vj over precisely n links in a graph is given
by (M n )ij , where M is the adjacency matrix of the graph.
To illustrate this, compute the low powers of the adjacency matrix M in Eq. (3.68):

M^2 = \begin{pmatrix} 2 & 1 & 1 & 1 & 1 \\ 1 & 3 & 0 & 1 & 2 \\ 1 & 0 & 2 & 2 & 0 \\ 1 & 1 & 2 & 3 & 0 \\ 1 & 2 & 0 & 0 & 2 \end{pmatrix} , \qquad M^3 = \begin{pmatrix} 2 & 4 & 2 & 4 & 2 \\ 4 & 2 & 5 & 6 & 1 \\ 2 & 5 & 0 & 1 & 4 \\ 4 & 6 & 1 & 2 & 5 \\ 2 & 1 & 4 & 5 & 0 \end{pmatrix} .   (3.69)
For example, the number of possible walks from V1 to V3 over three links is given by (M 3 )13 = 2.
By inspecting Fig. 19 it can be seen that these two walks correspond to V1 → V4 → V5 → V3 and
V1 → V4 → V2 → V3 .
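A short Python sketch (added for illustration, not part of the original notes) that reproduces the walk count above by taking powers of the adjacency matrix; remember that numpy uses 0-based indices, so the (1,3) entry of the text is M3[0, 2] below.

import numpy as np

M = np.array([[0, 1, 0, 1, 0],
              [1, 0, 1, 1, 0],
              [0, 1, 0, 0, 1],
              [1, 1, 0, 0, 1],
              [0, 0, 1, 1, 0]])

M2 = M @ M
M3 = M2 @ M
print(M2)
print(M3)
print(M3[0, 2])   # (M^3)_{13} in the 1-based notation of the text -> 2 walks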

Application: Matrices in cryptography


Matrices can be used for encryption. Here is a basic example for how this works. Suppose we would like to
encrypt the text: ”linear algebra ”. First, we translate this text into numerical form using the simple
code (space) → 0, a → 1, b → 2, · · · and then we split the resulting sequence of numbers into blocks of the same
size. Here we use blocks of size three for definiteness. Next, we arrange these numbers into a matrix, with
each block forming a column of the matrix. For our sample text this results in
 
T = \begin{pmatrix} 12 & 5 & 0 & 7 & 18 \\ 9 & 1 & 1 & 5 & 1 \\ 14 & 18 & 12 & 2 & 0 \end{pmatrix} \quad \text{for} \quad \begin{matrix} l & e & \_ & g & r \\ i & a & a & e & a \\ n & r & l & b & \_ \end{matrix} ,

where \_ stands for the space character.
So far, this is relatively easy to decode, even if we had decided to permute the assignment of letters to
numbers. As long as same letters are represented by same numbers, the code can be deciphered by a
frequency analysis, at least for a sufficiently long text. To do this, the relative frequency of each number
is determined and compared with the typical frequency with which letters appear in the English language.
Matching similar frequencies leads to the key.
For a more sophisticated encryption, define a quadratic “encoding” matrix whose size equals the length
of the blocks, so a 3 × 3 matrix for our case. Basically, the only other restriction on this matrix is that it
should be invertible. For our example, let us choose
 
A = \begin{pmatrix} -1 & -1 & 1 \\ 2 & 0 & -1 \\ -2 & 1 & 1 \end{pmatrix} .
To encode the text, carry out the matrix multiplication
    
T_{\rm enc} = AT = \begin{pmatrix} -1 & -1 & 1 \\ 2 & 0 & -1 \\ -2 & 1 & 1 \end{pmatrix} \begin{pmatrix} 12 & 5 & 0 & 7 & 18 \\ 9 & 1 & 1 & 5 & 1 \\ 14 & 18 & 12 & 2 & 0 \end{pmatrix} = \begin{pmatrix} -7 & 12 & 11 & -10 & -19 \\ 10 & -8 & -12 & 12 & 36 \\ -1 & 9 & 13 & -7 & -35 \end{pmatrix}

Note that in Tenc the same letters are now represented by different numbers. For example, the letter “a”, which appears three times and corresponds to the three 1’s in T, is represented by three different numbers in Tenc. Without knowledge of the encoding matrix A it is quite difficult to decipher Tenc, particularly for
large block sizes. The legitimate receiver of the text should be provided with the inverse A−1 of the
encoding matrix, for our example given by
 
A^{-1} = \begin{pmatrix} 1 & 2 & 1 \\ 0 & 1 & 1 \\ 2 & 3 & 2 \end{pmatrix} ,

and can then recover the message by the simple matrix multiplication

T = A−1 Tenc .
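A minimal Python sketch of this encryption scheme (an illustration added to the notes; the helper functions and variable names are my own, not part of the original text).

import numpy as np

def text_to_numbers(text):
    # space -> 0, a -> 1, b -> 2, ...
    return [0 if c == ' ' else ord(c) - ord('a') + 1 for c in text]

def numbers_to_text(numbers):
    return ''.join(' ' if int(n) == 0 else chr(int(n) + ord('a') - 1) for n in numbers)

message = "linear algebra "
nums = text_to_numbers(message)
T = np.array(nums).reshape(-1, 3).T          # blocks of three as columns, as in the text

A = np.array([[-1, -1,  1],
              [ 2,  0, -1],
              [-2,  1,  1]])
A_inv = np.array([[1, 2, 1],
                  [0, 1, 1],
                  [2, 3, 2]])

T_enc = A @ T                                 # encoded message T_enc = A T
T_dec = A_inv @ T_enc                         # decoding T = A^{-1} T_enc
print(numbers_to_text(T_dec.T.flatten()))     # -> 'linear algebra '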

3.3 Row/column operations, Gaussian elimination


We should now develop more systematic, algorithmic methods to compute properties of matrices. At the
heart of these methods are elementary row operations which are defined as follows.

Definition 3.7. The following manipulations of a matrix are called elementary row operations.
(R1) Exchange two rows.
(R2) Add a multiple of one row to another.
(R3) Multiply a row with a non-zero scalar.
Analogous definitions hold for elementary column operations.

For definiteness, we will focus on elementary row operations but most of our statements have analogues for
elementary column operations. As we will see, elementary row operations will allow us to devise methods
to compute the rank and the inverse of matrices and, later on, to formulate a general algorithm to solve
linear systems of equations.

A basic but important observation about elementary row operations (which is indeed the main motivation
for defining them) is that they do not change the span of the row vectors. Recall that the rank of a matrix
is given by the maximal number of linearly independent row (or column) vectors. Hence, the rank of a
matrix is also unchanged under elementary row operations. This suggests a possible strategy to compute
the rank of a matrix: By a succession of elementary row operations, we should bring the matrix into a
(simpler) form where the rank can easily be read off. Suppose a matrix has the form

A = \begin{pmatrix}
 \cdots & a_{1j_1} & \ast & & & \ast \\
 & & a_{2j_2} & \ast & & \vdots \\
 & & & \ddots & & \\
 & & & & a_{rj_r} & \cdots \\
 & & 0 & & &
\end{pmatrix}

where the entries a_{ij_i} are non-zero for all i = 1, . . . , r, all other entries above and to the right of the steps are arbitrary (indicated by the ∗) and all entries below the steps are zero. This form of a matrix is called (upper)

echelon form. Clearly, the first r row vectors in this matrix are linearly independent and, hence, the rank
of a matrix in upper echelon form can be easily read off and is given by

rk(A) = r = (number of steps in upper echelon form) . (3.70)

The important fact is that every matrix can be brought into upper echelon form by a sequence of elementary
row operations. This works as follows.

Algorithm to bring matrix into upper echelon form


We consider an n × m matrix. The algorithm proceeds row by row. Let us assume that we have already
dealt with the first i − 1 rows of the matrix. Then, for the ith row we should carry out three steps.

(1) Find the leftmost column j which has at least one non-zero entry in rows i, . . . , n.

(2) If the (i, j) entry is zero exchange row i with one of the rows i + 1, . . . , n (the one which contains
the non-zero entry identified in step 1) so that the new (i, j) entry is non-zero.

(3) Subtract suitable multiples of row i from all rows i + 1, . . . , n such that all entries (i + 1, j), . . . , (n, j)
in column j and below row i vanish.

Continue with the next row until no more non-zero entries can be found in step 1.

This procedure of bringing a matrix into its upper echelon form using elementary row operations is
called Gaussian elimination (sometimes also referred to as row reduction). In summary, our procedure to
compute the rank of a matrix involves, first, to bring the matrix into upper echelon form using Gaussian
elimination and then to read off the rank from the number of steps in the upper echelon form. This is
probably best explained with an example.

Example 3.11: Gaussian elimination and rank of a matrix


Consider the 3 × 3 matrix  
A = \begin{pmatrix} 0 & 1 & -1 \\ 2 & 3 & -2 \\ 2 & 1 & 0 \end{pmatrix} .
Then, Gaussian elimination amounts to
       
\begin{pmatrix} 0 & 1 & -1 \\ 2 & 3 & -2 \\ 2 & 1 & 0 \end{pmatrix} \xrightarrow{R_1 \leftrightarrow R_3} \begin{pmatrix} 2 & 1 & 0 \\ 2 & 3 & -2 \\ 0 & 1 & -1 \end{pmatrix} \xrightarrow{R_2 \to R_2 - R_1} \begin{pmatrix} 2 & 1 & 0 \\ 0 & 2 & -2 \\ 0 & 1 & -1 \end{pmatrix} \xrightarrow{R_3 \to R_3 - R_2/2} \begin{pmatrix} 2 & 1 & 0 \\ 0 & 2 & -2 \\ 0 & 0 & 0 \end{pmatrix}

We have indicated the row operation from one step to the next above the arrow, referring to the ith row
by Ri . The final matrix is in upper echelon form. There are two steps so that rk(A) = 2.
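The row-reduction algorithm above is easy to implement; the following short Python function is my own sketch of it (an addition to the notes) and reads off the rank as the number of steps, in agreement with numpy.

import numpy as np

def rank_by_row_reduction(A, tol=1e-12):
    A = np.array(A, dtype=float)
    n, m = A.shape
    row = 0
    for col in range(m):
        # steps (1)/(2): find a non-zero entry in this column at or below 'row' and swap it up
        pivots = [r for r in range(row, n) if abs(A[r, col]) > tol]
        if not pivots:
            continue
        A[[row, pivots[0]]] = A[[pivots[0], row]]
        # step (3): clear all entries below the pivot
        for r in range(row + 1, n):
            A[r] -= A[r, col] / A[row, col] * A[row]
        row += 1
    return row   # number of steps = rank

A = [[0, 1, -1],
     [2, 3, -2],
     [2, 1,  0]]
print(rank_by_row_reduction(A))               # -> 2, as in Example 3.11
print(np.linalg.matrix_rank(np.array(A)))     # numpy agrees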

A neat and very useful fact about elementary row operations is that they can be generated by multiplying
with certain, specific matrices from the left. In other words, to perform a row operation on a matrix A, we
can find a suitable matrix P such that the row operation is generated by A → P A. For example, consider
a simple 2 × 2 case where
A = \begin{pmatrix} a & b \\ c & d \end{pmatrix} , \qquad P = \begin{pmatrix} 1 & \lambda \\ 0 & 1 \end{pmatrix} .   (3.71)

Then
PA = \begin{pmatrix} 1 & \lambda \\ 0 & 1 \end{pmatrix} \begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{pmatrix} a + \lambda c & b + \lambda d \\ c & d \end{pmatrix} .   (3.72)
Evidently, multiplication with the matrix P from the left has generated the elementary row operation
R1 → R1 + λR2 on the arbitrary 2 × 2 matrix A. This works in general and the appropriate matrices,
generating the three types of elementary row operations in Def. 3.7, are given by
P^{(I)}_{R_i \leftrightarrow R_j} = \begin{pmatrix}
 1 & & & & & \\
 & \ddots & & & & \\
 & & 0 & \cdots & 1 & \\
 & & \vdots & \ddots & \vdots & \\
 & & 1 & \cdots & 0 & \\
 & & & & & \ddots
\end{pmatrix} \;\; (\text{1's in positions } (i,j) \text{ and } (j,i),\ \text{0's in positions } (i,i) \text{ and } (j,j)) ,

P^{(III)}_{R_i \to \lambda R_i} = \begin{pmatrix}
 1 & & & & \\
 & \ddots & & & \\
 & & \lambda & & \\
 & & & \ddots & \\
 & & & & 1
\end{pmatrix} \;\; (\lambda \text{ in the } i\text{th diagonal position}) ,

P^{(II)}_{R_i \to R_i + \lambda R_j} = \begin{pmatrix}
 1 & & & & \\
 & \ddots & & & \\
 & & 1 & \cdots & \lambda \\
 & & & \ddots & \vdots \\
 & & & & 1 \\
 & & & & & \ddots
\end{pmatrix} \;\; (\lambda \text{ in row } i,\ \text{column } j) .   (3.73)

This means we can bring a matrix A into upper echelon form by matrix multiplications P1 · · · Pk A where
the matrices P1 , . . . , Pk are suitably chosen from the above list. Note that all the above matrices are
invertible. This is clear, since we can always “undo” an elementary row operation by the inverse row
operation or, alternatively, it can be seen directly from the above matrices. The matrices P (II) and
P (III) are already in upper echelon form and clearly have maximal rank. The matrices P (I) can easily be
brought into upper echelon form by exchanging row i and j. Then they turn into the unit matrix which
has maximal rank.
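The following small Python sketch (added for illustration; it is not part of the notes and uses 0-based row indices) builds the three types of elementary matrices of Eq. (3.73) explicitly and reproduces the row operations of Example 3.11 by left multiplication.

import numpy as np

def P_swap(n, i, j):                 # type (I): exchange rows i and j
    P = np.eye(n); P[[i, j]] = P[[j, i]]; return P

def P_add(n, i, j, lam):             # type (II): R_i -> R_i + lam * R_j
    P = np.eye(n); P[i, j] = lam; return P

def P_scale(n, i, lam):              # type (III): R_i -> lam * R_i
    P = np.eye(n); P[i, i] = lam; return P

A = np.array([[0., 1., -1.],
              [2., 3., -2.],
              [2., 1.,  0.]])

A1 = P_swap(3, 0, 2) @ A             # R1 <-> R3
A2 = P_add(3, 1, 0, -1.0) @ A1       # R2 -> R2 - R1
A3 = P_add(3, 2, 1, -0.5) @ A2       # R3 -> R3 - R2/2
print(A3)                            # upper echelon form with two steps, so rk(A) = 2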

Application: Back to Magic Squares


We now return to our discussion of magic squares. We saw previously that all 3 × 3 magic squares form
a vector space, and we have shown that the three specific magic squares M1 , M2 , M3 in Eq. (1.56) are
linearly independent. It remains to be shown that these matrices form a basis of the magic square vector
space as asserted earlier. To do this it suffices to show that the dimension of the magic square vector
space is three.

We begin with an arbitrary 3 × 3 matrix


 
S = \begin{pmatrix} a & b & c \\ d & e & f \\ g & h & i \end{pmatrix} .

Recall that, for S to be a magic square, its rows, columns and both diagonals have to sum up to the same

total. These conditions can be cast into the seven linear equations

d + e + f = a + b + c          −a − b − c + d + e + f = 0
g + h + i = a + b + c          −a − b − c + g + h + i = 0
a + d + g = a + b + c          −b − c + d + g = 0
b + e + h = a + b + c    or    −a − c + e + h = 0
c + f + i = a + b + c          −a − b + f + i = 0
a + e + i = a + b + c          −b − c + e + i = 0
c + e + g = a + b + c          −a − b + e + g = 0

In matrix form, this system of equations can be written as follows:

\underbrace{\begin{pmatrix}
-1 & -1 & -1 & 1 & 1 & 1 & 0 & 0 & 0 \\
-1 & -1 & -1 & 0 & 0 & 0 & 1 & 1 & 1 \\
 0 & -1 & -1 & 1 & 0 & 0 & 1 & 0 & 0 \\
-1 &  0 & -1 & 0 & 1 & 0 & 0 & 1 & 0 \\
-1 & -1 &  0 & 0 & 0 & 1 & 0 & 0 & 1 \\
 0 & -1 & -1 & 0 & 1 & 0 & 0 & 0 & 1 \\
-1 & -1 &  0 & 0 & 1 & 0 & 1 & 0 & 0
\end{pmatrix}}_{A}
\underbrace{\begin{pmatrix} a \\ b \\ c \\ d \\ e \\ f \\ g \\ h \\ i \end{pmatrix}}_{x}
=
\underbrace{\begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}}_{0} ,

or, in short, Ax = 0. The magic squares are precisely the solutions to this equation which shows that
the magic square vector space is the kernel, Ker(A), of the matrix A. By Gaussian elimination and with
a bit of calculation, the matrix A can be brought into upper echelon form and the rank can be read off
as rk(A) = 6. Then, the dimension formula (3.4) leads to dim Ker(A) = 9 − rk(A) = 3 and, hence, the
dimension of the magic square vector space is indeed three. In summary, the three matrices M1 , M2 , M3
in Eq. (1.56) form a basis of the magic square vector space and every magic square is given as a (unique)
linear combination of these three matrices.
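The claimed values rk(A) = 6 and dim Ker(A) = 3 can be verified with a short sympy snippet; this is an illustrative addition to the notes, not part of the original text.

import sympy as sp

A = sp.Matrix([[-1, -1, -1, 1, 1, 1, 0, 0, 0],
               [-1, -1, -1, 0, 0, 0, 1, 1, 1],
               [ 0, -1, -1, 1, 0, 0, 1, 0, 0],
               [-1,  0, -1, 0, 1, 0, 0, 1, 0],
               [-1, -1,  0, 0, 0, 1, 0, 0, 1],
               [ 0, -1, -1, 0, 1, 0, 0, 0, 1],
               [-1, -1,  0, 0, 1, 0, 1, 0, 0]])

print(A.rank())               # -> 6
print(len(A.nullspace()))     # -> 3, the dimension of the magic-square vector space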

Application: Coding theory


Coding theory deals with the problem of errors in information such as they may arise when information
is transmitted in the presence of noise. Whenever information may be faulty, methods are required for
both error detection and error correction. A simple but potentially inefficient method is to transmit
the information repeatedly. Here, we would like to discuss a more sophisticated method, referred to as
Hamming code, which is based on some of the linear algebra methods we have explored.
Information is conveniently described in binary form, that is, as a sequence of bits, β1 , . . . , βn ∈ {0, 1}.
Mathematically, a bit can be seen as an element of the finite field F2 = {0, 1} which we have introduced
in Example 1.3 and information encoded by n bits can be seen as an element of the n-dimensional vectors
space V = Fn2 over the field F2 . In other words, we can think of the above bit sequence as a column vector
(β1 , . . . , βn )T ∈ Fn2 . Through this simple re-interpretation all the tools of linear algebra are now available
to deal with information.
To be specific we focus on the case of four bits, β = (β1 , . . . , β4 )T , but the method can be generalized
to arbitrary dimensions. We begin by writing down the matrix
 
H = (H_1, \ldots, H_7) = \begin{pmatrix} 0 & 0 & 0 & 1 & 1 & 1 & 1 \\ 0 & 1 & 1 & 0 & 0 & 1 & 1 \\ 1 & 0 & 1 & 0 & 1 & 0 & 1 \end{pmatrix} ,   (3.74)

whose columns consist of all non-zero vectors of F_2^3. Clearly, rk(H) = 3 (since H1 , H2 , H4 are linearly in-
dependent) and, therefore, its kernel has dimension dim Ker(H) = 7−3 = 4. It is easy to see that this four-
dimensional kernel has a basis K1 = (1, 0, 0, 0, 0, 1, 1)T , K2 = (0, 1, 0, 0, 1, 0, 1)T , K3 = (0, 0, 1, 0, 1, 1, 0)T ,
K4 = (0, 0, 0, 1, 1, 1, 1)T which we can arrange into the rows of a 4 × 7 matrix
 
K = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 & 1 & 1 \\ 0 & 1 & 0 & 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 1 & 1 & 1 & 1 \end{pmatrix}   (3.75)

The key idea is now to encode the information stored in β1 , . . . , β4 by forming the linear combination
of these numbers with the basis vectors of Ker(H) which we have just determined. In other words, we
encode the information in the seven-dimensional vector
v = \sum_{i=1}^{4} \beta_i K_i = \beta^T K .   (3.76)

Note that, given the structure of the matrix K, the first four bits in v coincide with the actual information
β1 , . . . , β4 . By construction, the vector v is an element of Ker(H).
Now suppose that the transmission of v has resulted in a vector w which can have an error in at most
one bit. How do we detect whether such an error has occurred? We note that the seven-dimensional
standard unit vectors e1 , . . . , e7 are not in the kernel of H. Further, if v is in the kernel then none of the
vectors w = v + ei is. This means the transmitted information w is free of (one-bit) errors if and only if
w ∈ Ker(H), a condition which can be easily tested.
Assuming w ∉ Ker(H) so that the information is faulty, how can the error be corrected? Assume that
bit number i has changed in w so that the correct original vector is v = w − ei . Since v ∈ Ker(H) it
follows that Hw = Hei = Hi , so that the product Hw coincides with one of the columns Hi of H. This
means, if Hw equals column i of H then we should flip bit number i in w to correct for the error.
Let us carry all this out for an explicit example. Suppose that the transmitted message is w =
(1, 1, 0, 0, 0, 1, 1)T and that it contains at most one error. Then we work out
 
Hw = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} = H_2 .   (3.77)

First, w is not in the kernel of H so an error has indeed occurred. Secondly, the vector Hw corresponds
to the second column vector of H so we should flip the second bit to correct for the error. This means,
v = (1, 0, 0, 0, 0, 1, 1)T and the original information (which is contained in the first four entries of v) is
β = (1, 0, 0, 0)T .
By paying the price of enhancing the transmitted information from four bits (in β) to seven bits (in v)
both a detection and correction of one-bit errors can be carried out with this method. Compare this with
the naive method of simply transmitting the information in β twice which corresponds to an enhancement
from four to eight bits. In this case, one-bit errors can of course be detected. However, without further
information they cannot be corrected since it is impossible to decide which of the two transmissions is the
correct one.
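For concreteness, here is a short Python sketch of the (7,4) Hamming scheme just described, with all arithmetic done modulo 2; the matrix names follow the text, while the helper logic is my own illustrative addition.

import numpy as np

H = np.array([[0, 0, 0, 1, 1, 1, 1],
              [0, 1, 1, 0, 0, 1, 1],
              [1, 0, 1, 0, 1, 0, 1]])

K = np.array([[1, 0, 0, 0, 0, 1, 1],
              [0, 1, 0, 0, 1, 0, 1],
              [0, 0, 1, 0, 1, 1, 0],
              [0, 0, 0, 1, 1, 1, 1]])

beta = np.array([1, 0, 0, 0])            # the four information bits
v = beta @ K % 2                         # encoded 7-bit word, lies in Ker(H)

w = v.copy()
w[1] ^= 1                                # flip the second bit to simulate a one-bit error

syndrome = H @ w % 2                     # zero iff w is in Ker(H)
if syndrome.any():
    # the syndrome equals the column H_i of the flipped bit: find it and correct
    i = next(j for j in range(7) if np.array_equal(H[:, j], syndrome))
    w[i] ^= 1
print(w[:4])                             # recovered information bits -> [1 0 0 0]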

Our next task is to devise an algorithm to compute the inverse of a matrix, using elementary row opera-
tions. The basic observation is that every quadratic, invertible n × n matrix A can be converted into the

unit matrix 1n by a sequence of row operations. Schematically, this works as follows:
A \;\xrightarrow{\text{echelon form}}\; \begin{pmatrix} a'_{11} & & \ast \\ & \ddots & \\ 0 & & a'_{nn} \end{pmatrix} \;\xrightarrow{(R1),\,(R2)}\; \begin{pmatrix} a'_{11} & & 0 \\ & \ddots & \\ 0 & & a'_{nn} \end{pmatrix} \;\xrightarrow{(R3)}\; \begin{pmatrix} 1 & & 0 \\ & \ddots & \\ 0 & & 1 \end{pmatrix} = 1_n
In the first step, we bring A into upper echelon form, by the algorithm already discussed. At this point
we can read off the rank of the matrix. If rk(A) < n the inverse does not exist and we can stop. On the
other hand, if rk(A) = n then all diagonal entries a'_{ii} in the upper echelon form must be non-zero (or else we would not have n steps). This means, in a second step, we can make all entries above the diagonal zero. We start with the last column and subtract suitable multiples of the last row from the others until all entries in the last column except a'_{nn} are zero. We proceed in a similar way, column by column from the right to the left, using row operations of type (R1) and (R2). In this way we arrive at a diagonal matrix, with diagonal entries a'_{ii} ≠ 0 which, in the final step, can be converted into the unit matrix by
row operations of type (R3).

This means we can find a set of matrices P1 , . . . , Pk of the type (3.73), generating the appropriate elemen-
tary row operations, such that

1_n = \underbrace{P_1 \cdots P_k}_{A^{-1}}\, A \qquad \Rightarrow \qquad A^{-1} = P_1 \cdots P_k\, 1_n .   (3.78)

These equations imply an explicit algorithm to compute the inverse of a square matrix. We convert A
into the unit matrix 1n using elementary row operations as described above, and then simply carry out
the same operations on 1n in parallel. When we are done the unit matrix will have been converted into
A−1 . Again, we illustrate this procedure by means of an example.

Example 3.12: Computing the inverse of a matrix with row operations

\begin{pmatrix} 1 & 0 & -2 \\ 0 & 3 & -2 \\ 1 & -4 & 0 \end{pmatrix} = A \qquad\qquad \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} = 1_3

R_3 \to R_3 - R_1 : \qquad \begin{pmatrix} 1 & 0 & -2 \\ 0 & 3 & -2 \\ 0 & -4 & 2 \end{pmatrix} \qquad\qquad \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -1 & 0 & 1 \end{pmatrix}

R_3 \to R_3 + \tfrac{4}{3} R_2 : \qquad \begin{pmatrix} 1 & 0 & -2 \\ 0 & 3 & -2 \\ 0 & 0 & -\tfrac{2}{3} \end{pmatrix} \;\leftarrow\; \mathrm{rk}(A) = 3 \qquad \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -1 & \tfrac{4}{3} & 1 \end{pmatrix}

R_2 \to R_2 - 3 R_3 : \qquad \begin{pmatrix} 1 & 0 & -2 \\ 0 & 3 & 0 \\ 0 & 0 & -\tfrac{2}{3} \end{pmatrix} \qquad\qquad \begin{pmatrix} 1 & 0 & 0 \\ 3 & -3 & -3 \\ -1 & \tfrac{4}{3} & 1 \end{pmatrix}

R_1 \to R_1 - 3 R_3 : \qquad \begin{pmatrix} 1 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & -\tfrac{2}{3} \end{pmatrix} \qquad\qquad \begin{pmatrix} 4 & -4 & -3 \\ 3 & -3 & -3 \\ -1 & \tfrac{4}{3} & 1 \end{pmatrix}

R_2 \to \tfrac{R_2}{3} : \qquad \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -\tfrac{2}{3} \end{pmatrix} \qquad\qquad \begin{pmatrix} 4 & -4 & -3 \\ 1 & -1 & -1 \\ -1 & \tfrac{4}{3} & 1 \end{pmatrix}

R_3 \to -\tfrac{3}{2} R_3 : \qquad \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} = 1_3 \qquad\qquad \begin{pmatrix} 4 & -4 & -3 \\ 1 & -1 & -1 \\ \tfrac{3}{2} & -2 & -\tfrac{3}{2} \end{pmatrix} = A^{-1}

As a final check we show that
    
AA^{-1} = \begin{pmatrix} 1 & 0 & -2 \\ 0 & 3 & -2 \\ 1 & -4 & 0 \end{pmatrix} \begin{pmatrix} 4 & -4 & -3 \\ 1 & -1 & -1 \\ \tfrac{3}{2} & -2 & -\tfrac{3}{2} \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} = 1_3 \;\checkmark

and thus confirm that we have correctly computed the inverse of A.
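The algorithm is also easy to automate. The compact Python sketch below (my own code, not part of the notes) row-reduces the augmented matrix (A | 1_n) until the left half becomes 1_n; it clears entries above and below each pivot in a single pass rather than producing the echelon form first, but the end result is the same inverse.

import numpy as np

def inverse_by_row_reduction(A):
    A = np.array(A, dtype=float)
    n = A.shape[0]
    M = np.hstack([A, np.eye(n)])                 # the augmented matrix (A | 1_n)
    for i in range(n):
        pivot = i + np.argmax(np.abs(M[i:, i]))   # find a usable pivot row
        if np.isclose(M[pivot, i], 0.0):
            raise ValueError("matrix is not invertible, rk(A) < n")
        M[[i, pivot]] = M[[pivot, i]]             # (R1) exchange rows
        M[i] /= M[i, i]                           # (R3) scale the pivot to 1
        for r in range(n):
            if r != i:
                M[r] -= M[r, i] * M[i]            # (R2) clear the rest of column i
    return M[:, n:]                               # right half now contains A^{-1}

A = [[1,  0, -2],
     [0,  3, -2],
     [1, -4,  0]]
print(inverse_by_row_reduction(A))   # agrees with the inverse found above and with np.linalg.inv(A)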

3.4 Relation between linear maps and matrices


We have now fully understood linear maps between column vector spaces - they are described by matrices.
The action of such linear maps is given by multiplication of a matrix with column vectors, composition of
maps is via matrix multiplication and the inverse map corresponds to the inverse matrix. Further we have
introduced the computational tools to work with matrices. However, we still do not have a clear picture
of linear maps between arbitrary vector space and this is what we will analyze now.

Start with a linear map f : V → W between two vector spaces V and W over F with dimensions n
and m, respectively. We introduce a basis v_1, . . . , v_n of V and a basis w_1, . . . , w_m of W. Then, all vectors v ∈ V and w ∈ W can be written as linear combinations v = \sum_{i=1}^{n} \alpha_i v_i with coordinate vector \alpha = (\alpha_1, . . . , \alpha_n)^T and w = \sum_{j=1}^{m} \beta_j w_j with coordinate vector \beta = (\beta_1, . . . , \beta_m)^T, respectively. Following Example 3.4 (b), we can introduce coordinate maps \varphi : F^n → V and \psi : F^m → W relative to each basis which act as

\varphi(\alpha) = \sum_{i=1}^{n} \alpha_i v_i , \qquad \psi(\beta) = \sum_{j=1}^{m} \beta_j w_j .   (3.79)

The images f (vi ) of the V basis vectors can always be written as a linear combination of the basis vectors
for W so we have
f(v_j) = \sum_{i=1}^{m} a_{ij} w_i   (3.80)
for some coefficients aij ∈ F . The situation so far can be summarized by the following diagram
\begin{array}{ccc} V & \xrightarrow{\;\;f\;\;} & W \\ \varphi \uparrow & & \uparrow \psi \\ F^n & \xrightarrow{\;A\,=\,?\;} & F^m \end{array}   (3.81)

Essentially, we are describing vectors by their coordinate vectors (relative to the chosen basis) and we
would like to find a matrix A which acts on these coordinate vectors “in the same way” as the original
linear map f on the associated vectors. In this way, we can describe the action of the linear map by a
matrix. How do we find this matrix A? Abstractly, it is given by

A = ψ −1 ◦ f ◦ ϕ , (3.82)

as can be seen by going from F n to F m in the diagram (3.81) using the “upper path”, that is, via V and
W . From Lemma 3.3 we know that we can work out the components of a matrix by letting it act on the
standard unit vectors.
A e_j \overset{(3.82)}{=} \psi^{-1} \circ f \circ \varphi(e_j) \overset{(3.79)}{=} \psi^{-1} \circ f(v_j) \overset{(3.80)}{=} \psi^{-1}\Big( \sum_{i=1}^{m} a_{ij} w_i \Big) \overset{\text{linearity}}{=} \sum_{i=1}^{m} a_{ij}\, \psi^{-1}(w_i) \overset{(3.79)}{=} \sum_{i=1}^{m} a_{ij}\, \tilde{e}_i   (3.83)

Comparing with Lemma 3.3 it follows that aij are the entries of the desired matrix A. Also, Eq. (3.82)
implies that Im(A) = ψ −1 (Im(f )). If we denote by χ := ψ −1 |Im(f ) the restriction of ψ −1 to Im(f ) we
have dim Ker(χ) = 0 since ψ −1 is an isomorphism and hence dim Ker(ψ −1 ) = 0. We conclude, using
Theorem 3.2, that

rk(A) = dim Im(A) = rk(χ) = dim Im(f ) − dim Ker(χ) = rk(f ) , (3.84)

which means that the linear map f and the matrix A representing f have the same rank, that is, rk(A) =
rk(f ).

While this discussion might have been somewhat abstract it has a simple and practically useful conclusion.
To find the matrix A which represents a linear map relative to a basis, compute the images f (vj ) of the
(domain) basis vectors and write them as a linear combinations of the (co-domain) basis vectors wi , as
in Eq. (3.80). The coefficients in these linear combinations form the matrix A. More precisely, by careful
inspection of the indices in Eq. (3.80), it follows that the coefficients which appear in the image of the j th
basis vector form the j th column of the matrix A. We summarize these conclusions in
Lemma 3.5. Let f : V → W be a linear map, v1 , . . . , vn a basis of V and w1 , . . . , wm a basis of W . The
entries aij of the m × n matrix A which describes this linear map relative to this choice of basis can be
read off from the images of the basis vectors as
f(v_j) = \sum_{i=1}^{m} a_{ij} w_i .   (3.85)

We have rk(A) = rk(f ) and, in particular, A is invertible if and only if f is.


The relation between linear maps and matrices is a key fact of linear algebra which we would like to
illustrate with two examples.

Example 3.13: Relation between linear maps and matrices


(a) Consider the linear map B : R2 → R2 defined by the matrix
 
1 0
B= . (3.86)
0 −2

For simplicity, we choose the same basis for the domain and the co-domain, namely v1 = w1 = (1, 2)T
and v2 = w2 = (−1, 1)T . Then, the images of the basis vectors under B, written as linear combinations
of the same basis, are
   
1 −1
Bv1 = = −1v1 − 2v2 , Bv2 = = −1v1 + 0v2 . (3.87)
−4 −2

Arranging the coefficients from Bv1 into the first column of a matrix and the coefficients from Bv2 into
the second column we find  
B' = \begin{pmatrix} -1 & -1 \\ -2 & 0 \end{pmatrix} .   (3.88)
This is the matrix representing the linear map B relative to the basis {v1 , v2 }. It might be useful to be
explicit about what exactly this means. Write an arbitrary 2-dimensional vector as

\begin{pmatrix} x \\ y \end{pmatrix} = x' v_1 + y' v_2 = \begin{pmatrix} x' - y' \\ 2x' + y' \end{pmatrix}   (3.89)

so that a vector (x, y)^T is described, relative to the basis {v_1, v_2}, by the coordinate vector (x', y')^T. Consider the example (x, y) = (1, 8) with associated coordinate vector (x', y') = (3, 2). Then

\begin{array}{ccc} B \begin{pmatrix} 1 \\ 8 \end{pmatrix} & = & \begin{pmatrix} 1 \\ -16 \end{pmatrix} \\ \updownarrow & & \updownarrow \\ B' \begin{pmatrix} 3 \\ 2 \end{pmatrix} & = & \begin{pmatrix} -5 \\ -6 \end{pmatrix} \end{array}   (3.90)

The vectors connected by arrows relate exactly as in Eq. (3.89), that is the vectors in the lower equation
are the coordinate vectors of their counterparts in the upper equation. Eqs. (3.90) are basically a specific
instance of the general diagram (3.81). This illustrates exactly how the representing matrix acts “in the
same way” as the linear map: If the linear map relates two vectors then the representing matrix relates
their two associated coordinate vectors.
(b) It might be useful to discuss a linear map which, originally, is not defined by a matrix. To this end, we
consider the vector space V = \{ a_2 x^2 + a_1 x + a_0 \,|\, a_i ∈ \mathbb{R} \} of all quadratic polynomials with real coefficients and the linear map f = \frac{d}{dx} : V → V, that is, the linear map obtained by taking the first derivative.
As before we choose the same basis for domain and co-domain, namely the standard basis 1, x, x2 of
monomials. We would like to find the matrix A representing the first derivative, relative to this basis.
As before, we work out the images of the basis vectors and write them as linear combinations of the
same basis:
\frac{d}{dx} 1 = 0 \cdot 1 + 0 \cdot x + 0 \cdot x^2
\frac{d}{dx} x = 1 \cdot 1 + 0 \cdot x + 0 \cdot x^2
\frac{d}{dx} x^2 = 0 \cdot 1 + 2 \cdot x + 0 \cdot x^2
Arranging the coefficients in each row into the columns of a matrix we arrive at
 
A = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 2 \\ 0 & 0 & 0 \end{pmatrix} .   (3.91)

This matrix generates the first derivative of quadratic polynomials relative to the standard monomial basis.
As before, let us be very explicit about what this means. Consider the polynomial p(x) = 5x2 + 3x + 7
with coordinate vector (7, 3, 5)^T and its first derivative p'(x) = 10x + 3 with coordinate vector (3, 10, 0)^T. Then we have

A \begin{pmatrix} 7 \\ 3 \\ 5 \end{pmatrix} = \begin{pmatrix} 3 \\ 10 \\ 0 \end{pmatrix} ,   (3.92)

that is, A indeed maps the coordinate vector for p into the coordinate vector for p'.

The correspondence between operators acting on functions and matrices acting on vectors illustrated by
this example is at the heart of quantum mechanics. Historically, Schrödinger’s formulation of quantum
mechanics is in terms of (wave) functions and operators, while Heisenberg’s formulation is in terms of
matrices. The relation between those two formulations is precisely as in the above example.
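As a hedged sympy illustration of Example 3.13(b) (an addition to the notes), the matrix of d/dx on quadratic polynomials relative to the monomial basis {1, x, x^2} can be built column by column and applied to the coordinate vector of p(x) = 5x^2 + 3x + 7.

import sympy as sp

x = sp.symbols('x')
basis = [sp.Integer(1), x, x**2]

A = sp.zeros(3, 3)
for j, b in enumerate(basis):
    image = sp.diff(b, x)            # derivative of the j-th basis vector
    for i in range(3):
        A[i, j] = image.coeff(x, i)  # coefficient of x**i goes into row i of column j
print(A)                             # Matrix([[0, 1, 0], [0, 0, 2], [0, 0, 0]]), as in (3.91)

p = sp.Matrix([7, 3, 5])             # coordinates of 5x^2 + 3x + 7
print(A * p)                         # -> Matrix([[3], [10], [0]]), i.e. 10x + 3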

3.5 Change of basis
We have seen that a linear map can be described, relative to a basis in the domain and co-domain, by a
matrix. It is clear from the previous discussion that, for a fixed linear map, this matrix depends on the
specific choice of basis. In other words, if we choose another basis the matrix describing the same linear
map will change. We would now like to work out how precisely the representing matrix transforms under
a change of basis.
To simplify the situation, we consider a linear map f : V → V from a vector space to itself and choose
the same basis on domain and co-domain. (The general situation of a linear map between two different
vector spaces is a straightforward generalization.) The two sets of basis vectors, coordinate maps and
representing matrices are then denoted by
basis of V          coordinate map                                      coordinate vector                              representing matrix
v_1, ..., v_n       \varphi(\alpha) = \sum_{i=1}^{n} \alpha_i v_i            \alpha = (\alpha_1, . . . , \alpha_n)^T             A = \varphi^{-1} \circ f \circ \varphi
v'_1, ..., v'_n     \varphi'(\alpha') = \sum_{i=1}^{n} \alpha'_i v'_i        \alpha' = (\alpha'_1, . . . , \alpha'_n)^T          A' = \varphi'^{-1} \circ f \circ \varphi'      (3.93)

We would like to find the relationship between A and A', that is, between the representing matrices for f relative to the unprimed and the primed basis. From Eq. (3.82) we know that the two matrices can be written as A = \varphi^{-1} \circ f \circ \varphi and A' = \varphi'^{-1} \circ f \circ \varphi', so that

A' = \varphi'^{-1} \circ f \circ \varphi' = \varphi'^{-1} \circ \varphi \circ \varphi^{-1} \circ f \circ \varphi \circ \varphi^{-1} \circ \varphi' = \underbrace{\varphi'^{-1} \circ \varphi}_{=:\,P} \circ \underbrace{\varphi^{-1} \circ f \circ \varphi}_{=\,A} \circ \underbrace{\varphi^{-1} \circ \varphi'}_{=\,P^{-1}} = P A P^{-1} .   (3.94)

Note that all we have done is to insert two identity maps, \varphi \circ \varphi^{-1}, in the second step and then combined maps differently in the third step. What is the interpretation of P = \varphi'^{-1} \circ \varphi? For a given vector v ∈ V and its coordinate vectors \alpha = \varphi^{-1}(v) and \alpha' = \varphi'^{-1}(v) relative to the unprimed and primed basis we have \alpha' = \varphi'^{-1}(v) = \varphi'^{-1} \circ \varphi(\alpha) = P\alpha, so in summary

\alpha' = P \alpha .   (3.95)

Hence, P converts unprimed coordinate vectors \alpha into the corresponding primed coordinate vectors \alpha' and, as a linear map between column vectors, it is a matrix. In short, P describes the change of basis under consideration. The corresponding transformation of the representing matrix under this basis change is then

A' = P A P^{-1} .   (3.96)
This is one of the key equations of linear algebra. For example, we can ask if we can choose a basis
for which the representing matrix is particularly simple. Eq. (3.96) is the starting point for answering
this question to which we will return later. Note that Eq. (3.96) makes intuitive sense. Acting with the equation on a primed coordinate vector \alpha', the first thing we obtain on the RHS is P^{-1}\alpha'. This is the corresponding unprimed coordinate vector on which the matrix A can sensibly act, thereby converting it into another unprimed coordinate vector. The final action of P converts this back into a primed coordinate vector. Altogether, this is the action of the matrix A' on \alpha' as required by the equation.
Another way to think about the matrix P is by relating the primed and the unprimed basis vectors. In general, from Lemma 3.3, we can write P e_j = \sum_i P_{ij} e_i. Multiplying this equation with \varphi' from the left and using v_j = \varphi(e_j), v'_i = \varphi'(e_i) we find

v_j = \sum_i P_{ij} v'_i \qquad \Longleftrightarrow \qquad v'_j = \sum_i (P^{-1})_{ij} v_i .   (3.97)

Hence, the entries of the matrix P can be calculated by expanding the unprimed basis vectors in terms of
the primed basis.

Example 3.14: Basis transformation of a matrix


Relative to the unprimed basis v1 = e1 , v2 = e2 of standard unit vectors, a linear map is described by
the matrix  
A = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} .
We would like to determine the matrix A' which describes the same linear map relative to the basis

v'_1 = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ -1 \end{pmatrix} , \qquad v'_2 = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 1 \end{pmatrix} .
One way to proceed is as before, by applying Lemma 3.5, and compute the images of the basis vectors in order to read off A'. This leads to

A v'_1 = 0\, v'_1 + 1\, v'_2 , \qquad A v'_2 = 1\, v'_1 + 0\, v'_2 ,

and arranging the coefficients on the right-hand sides into the columns of a matrix gives

A' = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} .   (3.98)

Alternatively, we should be able to determine A' from Eq. (3.96). To work out the relation between the primed and unprimed coordinate vectors \alpha' = (\alpha'_1, \alpha'_2)^T and \alpha = (\alpha_1, \alpha_2)^T we write

\alpha_1 v_1 + \alpha_2 v_2 = \begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix} \overset{!}{=} \alpha'_1 v'_1 + \alpha'_2 v'_2 = \frac{1}{\sqrt{2}} \begin{pmatrix} \alpha'_1 + \alpha'_2 \\ -\alpha'_1 + \alpha'_2 \end{pmatrix} .

Comparing this with the general relation (3.95) between the coordinate vectors we can read off the
coordinate transformation P −1 as
   
−1 1 1 1 1 1 −1
P =√ ⇒ P =√ .
2 −1 1 2 1 1

Applying the basis transformation (3.96) with this matrix P we find


     
A' = P A P^{-1} = \frac{1}{2} \begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} ,

in accordance with the earlier result (3.98).
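A short numerical cross-check of Example 3.14 (added for illustration, not part of the original notes), verifying the change-of-basis formula A' = P A P^{-1} with numpy.

import numpy as np

A = np.array([[1,  0],
              [0, -1]])
P_inv = np.array([[ 1, 1],
                  [-1, 1]]) / np.sqrt(2)
P = np.linalg.inv(P_inv)

A_prime = P @ A @ P_inv
print(np.round(A_prime, 10))    # -> [[0, 1], [1, 0]], as in Eq. (3.98)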

4 Systems of linear equations
We will now apply our general results and methods to the problem of solving linear equations. This will
lead to an understanding of the structure of the solutions and to explicit solution methods.

4.1 General structure of solutions


Consider a linear map f : V → W . We are looking for all solutions x ∈ V of the equation

f (x) = b (4.1)

where b ∈ W is a fixed vector. For b ≠ 0 this is called an inhomogeneous linear equation and

f (x) = 0 (4.2)

is the associated homogeneous equation. Its general solution is Ker(f ). The solutions of the inhomogeneous
and associated homogeneous equations are related in an interesting way.
Lemma 4.1. If x0 ∈ V solves the inhomogeneous equation, that is f (x0 ) = b, then the affine space

x0 + Ker(f )

is the general solution of the inhomogeneous equation.


Proof. If x is a solution to f (x) = b then f (x − x0 ) = f (x) − f (x0 ) = b − b = 0, so x − x0 ∈ Kerf .
Conversely, if x ∈ x0 + Ker(f ), then we can write x = x0 + v for some vector v ∈ Ker(f ). Then,
f (x) = f (x0 + v) = f (x0 ) + f (v) = b + 0 = b.

In short, the Lemma says that the general solution of the inhomogeneous equation is obtained by the
sum of a special solution to the inhomogeneous equation and all solutions to the homogeneous equation.
Recall that Ker(f ) is a sub vector space, so a line, a plane, etc. through 0 with dimension dim(Ker(f )) =
dim(V ) − rk(f ) (see Eq. (3.4)). This shows that the geometry of the solution is schematically as indicated in Fig. 20. Lemma (4.1) is helpful in order to find the general solution to inhomogeneous, linear differential
equations as in the following

Example 4.1: Solution to an inhomogeneous linear differential equation.


The previous Lemma has a prominent application to inhomogeneous, linear (second order) differential
equations, that is differential equations for y = y(x) of the form

p(x) \frac{d^2 y}{dx^2} + q(x) \frac{dy}{dx} + r(x)\, y = s(x) ,
where p, q, r and s are fixed functions. The relevant vector space is the space of (infinitely many times)
differentiable functions, the linear map f corresponds to the linear differential operator p(x) \frac{d^2}{dx^2} + q(x) \frac{d}{dx} + r(x) and the inhomogeneity b is given by s(x). From Lemma 4.1, the general solution to this equation can be
obtained by finding a special solution, y0 , and then adding to it all solutions of the associated homogeneous
equation
p(x) \frac{d^2 y}{dx^2} + q(x) \frac{dy}{dx} + r(x)\, y = 0 .
To be specific consider the differential equation
\frac{d^2 y}{dx^2} + y = x .

Figure 20: Solutions to homogeneous and inhomogeneous linear equations: Ker(f) is the solution set of the homogeneous system and its translate x_0 + Ker(f) is the solution set of the inhomogeneous system.

An obvious special solution is the function y0 (x) = x. The general solution of the associated homogeneous
equation
\frac{d^2 y}{dx^2} + y = 0
is a sin(x) + b cos(x) for arbitrary real constants a, b. Hence, the general solution to the inhomogeneous
equation is
y(x) = x + a sin(x) + b cos(x) .

Our main interest is of course in systems of linear equations, that is, the case where the linear map
is an m × n matrix A : F n → F m with entries aij . For x = (x1 , . . . , xn )T ∈ F n and a fixed vector
b = (b1 , . . . , bm )T ∈ F m the system of linear equations can be written as

Ax = b \qquad \text{or} \qquad \begin{array}{ccc} a_{11} x_1 + \cdots + a_{1n} x_n & = & b_1 \\ \vdots & & \vdots \\ a_{m1} x_1 + \cdots + a_{mn} x_n & = & b_m \end{array}   (4.3)

This is a system of m equations in n variables with associated homogeneous system

Ax = 0 \qquad \text{or} \qquad \begin{array}{ccc} a_{11} x_1 + \cdots + a_{1n} x_n & = & 0 \\ \vdots & & \vdots \\ a_{m1} x_1 + \cdots + a_{mn} x_n & = & 0 \end{array}   (4.4)

The solution space of the homogeneous system is Ker(A), a (sub) vector space whose dimension is given by the dimension formula dim Ker(A) = n − rk(A) (see Eq. (3.4)). If the inhomogeneous system has a solution, x0 , then its general solution is x0 + Ker(A) and such a “special” solution x0 exists if and only if

b ∈ Im(A). If rk(A) = m then Im(A) = F m and a solution exists for any choice of b. On the other hand,
if rk(A) < m, there is no solution for “generic” choices of b. For example, if m = 3 and rk(A) = 2 then
the image of A is a plane in a three-dimensional space and we need to choose b to lie in this plane for a
solution to exist. Clearly this corresponds to a very special choice of b and generic vectors b will not lie
in this plane. To summarize the general structure of the solution to Ax = b, where A is an m × n matrix,
we should, therefore distinguish two cases.

(1) rk(A) = m
In this case there exists a solution, x0 , for any choice of b and the general solution is given by

x0 + Ker(A) (4.5)

The number of free parameters in this solution equals dim Ker(A) = n − rk(A) = n − m.

(2) rk(A) < m


(a) If b ∈ Im(A) we have a solution with dim Ker(A) = n − rk(A) free parameters.
(b) If b ∉ Im(A) there is no solution.

For a quadratic n × n matrix A we can be slightly more specific and the above cases are as follows.

(1) rk(A) = n
A solution exists for any choice of b and there are no free parameters since dim Ker(A) = n − n = 0.
Hence, the solution is unique. Indeed, in this case, the matrix A is invertible (see Lemma 3.4) and
the unique solution is given by x = A−1 b.

(2) rk(A) < n


(a) If b ∈ Im(A) we have a solution with n − rk(A) free parameters.
(b) If b ∉ Im(A) there is no solution.

The main message of this discussion is that, given the size of the matrix A and its rank, we are able to draw
a number of conclusions about the qualitative structure of the solution, without any explicit calculation.
We will see below how this can be applied to explicit examples.

We can also think about the solutions to a system of linear equations in a geometrical way. With the row
vectors Ai of the matrix A, the linear system (4.3) can be re-written as m equations for (hyper) planes
(that is n − 1-dimensional planes) in n dimensions:

Ai · x = bi , i = 1, . . . , m . (4.6)

Geometrically, we should then think of the solutions to the linear system as the common intersection
of these m (hyper) planes. For example, if we consider a 3 × 3 matrix we should consider the common
intersection of three planes in three dimensions. Clearly, depending on the case, these planes can intersect
in a point, a line, a plane or not intersect at all. In other words, we may have no solution or the solution
may have 0, 1 or 2 free parameters. This corresponds precisely to the cases discussed above.

4.2 Solution by ”explicit calculation”


We begin our discussion of solution methods and examples with the most basic approach: Explicit cal-
culation by which we mean the addition of suitable multiples of the various equations to solve for the

variables. To be specific, we consider the following system with three variables x = (x, y, z)T and three
equations

E1 : 2x + 3y − z = −1
E2 : −x − 2y + z = 3 (4.7)
E3 : ax + y − 2z = b .

To make matters more interesting, we have introduced two parameters a, b ∈ R. We would like to find the
solution to this system for arbitrary real values of these parameters. We can also write the above system
in matrix form, Ax = b, with
   
A = \begin{pmatrix} 2 & 3 & -1 \\ -1 & -2 & 1 \\ a & 1 & -2 \end{pmatrix} , \qquad b = \begin{pmatrix} -1 \\ 3 \\ b \end{pmatrix} .   (4.8)

Before we embark on the explicit calculation, let us apply the results of our previous general discussion
and predict the qualitative structure of the solution. The crucial piece of information required for this
discussion is the rank of the matrix A. Of course, this can be determined from the general methods based
on row reduction which we have introduced in Section 3.3. But, as explained before, for small matrices
the rank can often be inferred “by inspection”. For the matrix A in (4.8) it is clear that the second and
third column vectors, A2 and A3 , are linearly independent. Hence, its ranks is at least two. The first
column vector, A1 , depends on the parameter a so we have to be more careful. For generic a values A1
does not lie in the plane spanned by A2 , A3 , so the generic rank of A is three. In this case, from our
general results, there is a unique solution to the linear system for any value of the other parameter b. For
a specific a value A1 will be in the plane spanned by A2 , A3 and the rank is reduced to two. Then, the
image of A is two-dimensional, that is a plane. For generic values of b the vector b will not lie in this
plane so there is no solution. However, for a specific b value, when b does lie in this plane, there is a
solution with dim Ker(A) = 3 − rk(A) = 1 parameter, that is, a solution line. So, in summary we expect
the following qualitative structure for the solution to the system (4.7).
1) For generic values of a the rank of A is three and there is a unique solution for all values of b.

2a) For a specific value of a (when rk(A) = 2) and for a specific value of b there is a line of solutions.

2b) For the above specific value of a and generic b there is no solution.
Let us now confirm this expectation by an explicit calculation. We begin by adding appropriate multiples
of Eqs. (4.7), namely

E1 + E2 : x+y =2 (4.9)
E3 + 2E2 : (a − 2)x − 3y = b + 6 . (4.10)

Eliminating y from these two equations then leads to

(a + 1)x = b + 12 . (4.11)

This equation allows us to explicitly identify the various cases we expect.


1) a ≠ −1: We can divide Eq. (4.11) by (a + 1) to solve for x and then insert into Eq. (4.9) and the
first Eq. (4.7) to get y and z. So in this case we have a unique solution for any b given by

   x = (b + 12)/(a + 1) ,      y = (2a − b − 10)/(a + 1) ,      z = (7a − b − 5)/(a + 1) .      (4.12)

2a) a = −1 and b = −12: In this case, Eq. (4.11) becomes trivial and we are left with only two inde-
pendent equations. Solving Eq. (4.9) and the first Eq. (4.7) for x and z in terms of y we find

   x = 2 − y ,      z = 5 + y ,                                 (4.13)


that is, a line of solutions parametrized by y.
2b) a = −1 and b ≠ −12: In this case, Eq. (4.11) leads to a contradiction so there is no solution.
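
The same case analysis can be reproduced with a computer algebra system. The short sketch below is only an illustrative cross-check (it assumes the sympy library; the exact form of sympy's output may differ between versions).

    import sympy as sp

    x, y, z, a, b = sp.symbols('x y z a b')
    eqs = [sp.Eq(2*x + 3*y - z, -1),
           sp.Eq(-x - 2*y + z, 3),
           sp.Eq(a*x + y - 2*z, b)]

    # generic a: a unique solution, reproducing Eq. (4.12)
    print(sp.solve(eqs, [x, y, z], dict=True))

    # a = -1, b = -12: an underdetermined system, sympy returns x and z in terms of y, cf. Eq. (4.13)
    print(sp.solve([e.subs({a: -1, b: -12}) for e in eqs], [x, y, z], dict=True))

    # a = -1 and b left generic: the equations are inconsistent, so the solution list is empty
    print(sp.solve([e.subs(a, -1) for e in eqs], [x, y, z], dict=True))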

4.3 Solution by row reduction


While “explicit calculation” as in the previous sub-section is probably the fastest “by hand” method for
relatively small systems, larger linear systems require a more systematic method. For a specific case, a
linear system Ax = b with a square and invertible matrix A, we already know how this works. The
unique solution in this case is x = A−1 b and the inverse of A can be computed by the row reduction
method introduced in Section (3.3). We will now generalize this method so it can be applied to all linear
systems.

So let us start with an arbitrary linear system with m equations for n variables, so a system of the form
Ax = b with an m×n matrix A, inhomogeneity b ∈ F m and variables x ∈ F n . We can multiply the linear
system with one of the m × m matrices P from Eq. (3.73), generating the elementary row operations, to
get the linear system P Ax = P b. This new system has the same solutions as the original one since P is
invertible. This means we do not change the solutions to the linear system if we carry out elementary row
operations simultaneously on the matrix A and the inhomogeneity b. This suggests we should encode the
linear system by the augmented matrix defined by
   A′ = (A|b) ,                                                 (4.14)
an m × (n + 1) matrix which consists of A plus one additional column formed by the vector b. We can now
reformulate our previous observation by stating that elementary row operations applied to the augmented
matrix do not change the solutions of the associated linear system. So our solution strategy will be to
simplify the augmented matrix by successive elementary row operations until the solution can be easily
“read off”. Before we formulate this explicitly, we note a useful criterion which helps us to decide whether
or not b ∈ Im(A), that is, whether or not the linear system has solutions.
Lemma 4.2. b ∈ Im(A) ⇐⇒ rk(A) = rk(A′)
Proof. “ ⇒ ”: If b ∈ Im(A) it is a linear combination of the column vectors of A and adding it to the
matrix does not increase the rank.
“ ⇐ ”: If rk(A) = rk(A′) the rank does not increase when b is added to the matrix. Therefore, b ∈
Span(A1 , . . . , An ) = Im(A).
Let us now describe the general algorithm.
1. Apply row operations, as described in Section (3.3), to the augmented matrix A′ until the matrix A within A′ is in upper echelon form. Then, the resulting matrix has the form

   A′  →   ( ··· a1j1   ∗     ···    ···  │ b′1   )
           (        a2j2      ···    ···  │ b′2   )
           (              ⋱               │  ⋮    )
           (                 arjr    ···  │ b′r   )
           (            0                 │ b′r+1 )
           (                              │  ⋮    )
           (                              │ b′m   )

where aiji ≠ 0 for i = 1, . . . , r so that A has rank r. In this form it is easy to apply the criterion, Lemma 4.2. If b′i ≠ 0 for any i > r then rk(A′) > rk(A) and the linear system has no solutions. In this case we can stop. On the other hand, if b′i = 0 for all i > r, which we assume from hereon, then rk(A′) = rk(A) and the system has a solution.

2. As explained, we assume that b′i = 0 for all i > r. For ease of notation we also permute the columns of A (this corresponds to a permutation of the variables that we will have to keep track of) so that the columns with the non-zero entries aiji become the first r of the matrix. The result is

   A′  →   ( a1j1      ∗           ∗  │ b′1 )
           (      a2j2                │ b′2 )
           (            ⋱             │  ⋮  )
           (   0          arjr        │ b′r )
           (                          │  0  )
           (         0                │  ⋮  )
           (                          │  0  )
3. By further row operations we can convert the r × r matrix in the upper left corner of the previous
matrix into a unit matrix 1r . Schematically, the result is

   A′_fin = ( 1r   B │ c )
            (  0   0 │ 0 )                                      (4.15)

where B is an r × (n − r) matrix and c is an r-dimensional column vector.

4. Recall that r = rk(A) is the rank and n − r = dim Ker(A) is the number of free parameters of the
solution. For this reason it makes sense to split our variables as
 
   x = ( ξ )
       ( t )                                                    (4.16)

into an r-dimensional vector ξ and an (n − r)-dimensional vector t. Note that this split is adapted
to the form of the matrix A′_fin so that the associated linear system takes the simple form

ξ + Bt = c . (4.17)

The point is that this system can be easily solved for ξ in terms of t. This leads to the general
solution

   x = ( c − Bt )
       (    t   )  ,                                            (4.18)
which depends on n − r free parameters t, as expected.
Let us see how this works for an explicit example.

Example 4.2: Solving linear systems with row reduction of the augmented matrix
Consider the following system of linear equations and its augmented matrix
 
    x +  y − 2z = 1                     (  1   1  −2   1 )
   2x −  y + 3z = 0           A′ =      (  2  −1   3   0 )  ,   (4.19)
   −x − 4y + 9z = b                     ( −1  −4   9   b )

where b ∈ R is an arbitrary real parameter. We proceed in the four steps outlined above.

1. First we bring A within A′ into upper echelon form, which results in

   A′ →  ( 1   1  −2    1  )
         ( 0  −3   7   −2  )
         ( 0   0   0   b+3 )

For b ≠ −3 we have rk(A′) = 3 > 2 = rk(A) so there are no solutions. So we assume from hereon
that b = −3.

2. Setting b = −3 we have

   ( 1   1  −2   1 )
   ( 0  −3   7  −2 )  .
   ( 0   0   0   0 )
In this case, we do not have to permute columns since the (two) steps of the upper echelon form
already arise in the first two columns.

3. By further elementary row operations we convert the 2 × 2 matrix in the upper left corner into a
unit matrix.
   A′_fin =  ( 1   0    1/3   1/3 )
             ( 0   1   −7/3   2/3 )
             ( 0   0     0     0  )

4. We have r = rk(A) = 2 and dim Ker(A) = 3 − rk(A) = 1 so we expect a solution with one free
variable t (a line). Accordingly, we split the variables as
 
   x = ( x )
       ( y )  ,                                                 (4.20)
       ( t )

where ξ = (x, y)T in our general notation. Writing the linear system for A′_fin in those variables results in

   x + (1/3) t = 1/3                                            (4.21)
   y − (7/3) t = 2/3 .                                          (4.22)

This can be easily solved for x, y in terms of t which was really the point of the exercise. The result is x = 1/3 − (1/3)t and y = 2/3 + (7/3)t and, inserting into Eq. (4.20), this results in the vector form

   x = ( 1/3 )       ( −1/3 )
       ( 2/3 )  + t  (  7/3 )
       (  0  )       (   1  )

for the line of solutions.
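
The row-reduction steps above can also be delegated to a computer. The following sketch (not part of the notes; it assumes the sympy library) computes the reduced row echelon form of the augmented matrix for the consistent case b = −3 and checks the resulting solution line.

    import sympy as sp

    t = sp.symbols('t')
    # augmented matrix (A|b) of Example 4.2 in the consistent case b = -3
    Aug = sp.Matrix([[ 1,  1, -2,  1],
                     [ 2, -1,  3,  0],
                     [-1, -4,  9, -3]])

    R, pivots = Aug.rref()     # reduced row echelon form and the pivot columns
    print(R)                   # rows (1, 0, 1/3, 1/3), (0, 1, -7/3, 2/3), (0, 0, 0, 0)
    print(pivots)              # (0, 1): x and y are pivot variables, z = t is the free variable

    # read off the solution line x = (1/3 - t/3, 2/3 + 7t/3, t) and verify it
    sol = sp.Matrix([sp.Rational(1, 3) - t/3, sp.Rational(2, 3) + 7*t/3, t])
    A, b = Aug[:, :3], Aug[:, 3]
    print(sp.simplify(A*sol - b))   # the zero vector, for every value of t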

Application: Linear algebra and circuits


Electrical circuits with batteries and resistors, such as the circuit in Fig. 21, can be described using methods

from linear algebra. To do this, first assume that the circuit contains n loops and assign (“mesh”) currents
Ii , where i = 1, . . . , n, to each loop. Then, applying Ohm’s law and Kirchhoff’s voltage law (“The voltages
along a closed loop must sum to zero.”) to each loop leads to the linear system

   R11 I1 + · · · + R1n In = V1
      ⋮                       ⋮                                 (4.23)
   Rn1 I1 + · · · + Rnn In = Vn ,

where Rij describe the various resistors and Vi correspond to the voltages of the batteries. If we introduce
the n × n matrix R with entries Rij , the current vector I = (I1 , . . . , In )T and the vector V = (V1 , . . . , Vn )T
for the battery voltages this system can, of course, also be written as

RI = V . (4.24)

This is an n × n linear system, where we think of the resistors and battery voltages as given, while the
currents I1 , . . . , In are a priori unknown and can be determined by solving the system. Of course any of
the methods previously discussed can be used to solve this linear system and determine the currents Ii .

For example, consider the circuit in Fig. 21. To its three loops we assign the currents I1 , I2 , I3 as indicated


Figure 21: A simple three-loop circuit with a battery and resistors.

in the figure. Kirchhoff’s voltage law applied to the three loops then leads to

R1 I1 + R2 (I1 − I2 ) + R3 (I1 − I3 ) = V (R1 + R2 + R3 )I1 − R2 I2 − R3 I3 = V


R2 (I2 − I1 ) + R4 I2 + R6 (I2 − I3 ) = 0 ⇐⇒ −R2 I1 + (R2 + R4 + R6 )I2 − R6 I3 = 0 (4.25)
R3 (I3 − I1 ) + R6 (I3 − I2 ) + R5 I3 = 0 −R3 I1 − R6 I2 + (R3 + R5 + R6 )I3 = 0 .

With the current and voltage vectors I = (I1 , I2 , I3 )T and V = (V, 0, 0)T the matrix R in Eq. (4.24) is
then given by  
R1 + R2 + R3 −R2 −R3
R= −R2 R2 + R4 + R6 −R6  . (4.26)
−R3 −R6 R3 + R5 + R6

For example, for resistances (R1 , . . . , R6 ) = (3, 10, 4, 2, 5, 1) (in units of Ohm) we have the resistance
matrix  
17 −10 −4
R =  −10 13 −1  . (4.27)
−4 −1 10
For a battery voltage V = 12 (in units of volt) we can write down the augmented matrix
 
   R′ = (  17  −10   −4 │ 12 )
        ( −10   13   −1 │  0 )  ,                               (4.28)
        (  −4   −1   10 │  0 )

and solve the linear system by row reduction. This leads to the solution
 
   I = (1/905) ( 1548 )
               ( 1248 )                                         (4.29)
               (  744 )

for the currents (in units of Ampere).
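
As a numerical cross-check (not part of the notes; it assumes the numpy library), the mesh currents can also be obtained directly with a linear-system solver:

    import numpy as np

    R = np.array([[ 17., -10.,  -4.],      # resistance matrix (4.27), in Ohm
                  [-10.,  13.,  -1.],
                  [ -4.,  -1.,  10.]])
    V = np.array([12., 0., 0.])            # battery voltage vector, in Volt

    I = np.linalg.solve(R, V)              # mesh currents, in Ampere
    print(I)                               # approximately [1.710, 1.379, 0.822]
    print(np.allclose(I, np.array([1548., 1248., 744.]) / 905.))   # agrees with Eq. (4.29)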

5 Determinants
Determinants are multi-linear objects and are a useful tool in linear algebra. In Section 2 we have
introduced the three-dimensional determinant as the triple product of three vectors. Here we will study
the generalization to arbitrary dimensions and verify that the three-dimensional case coincides with our
previous definition. As with the other general concepts, we first define the determinant by its properties
before we derive its explicit form and study a few applications. In our discussion in Section 2 we have
observed that the three-dimensional determinant is linear in each of its vector arguments (see Eq. (2.44)),
it changes sign when two vector arguments are swapped (see Eq. (2.46)) and the determinant of the three
standard unit vectors is one (see Eq. (2.48)). We will now use these properties to define the determinant
in arbitrary dimensions.

5.1 Definition of a determinant


Definition 5.1. A determinant maps n vectors a1 , · · · , an ∈ F n to a number, denoted det(a1 , · · · , an ) ∈
F , such that the following properties are satisfied:
(D1) det(· · · , αa + βb, · · · ) = α det(· · · , a, · · · ) + β det(· · · , b, · · · )
This means the determinant is linear in each argument.
(D2) det(· · · , a, · · · , b, · · · ) = − det(· · · , b, · · · , a · · · )
This means the determinant is completely anti-symmetric.
(D3) det(e1 , · · · , en ) = 1
The determinant of the standard unit vectors is one.
The determinant of an n × n matrix A is defined as the determinant of its column vectors, so det(A) :=
det(A1 , . . . , An ).

An easy but important conclusion from these properties is that a determinant with two same arguments
must vanish. Indeed, from the anti-symmetry property (D2) it follows that det(· · · , a, · · · , a, · · · ) =
− det(· · · , a, · · · , a, · · · ), which means that

det(· · · , a, · · · , a, · · · ) = 0 . (5.1)

We know that an object with these properties exists for n = 3 but not yet in other dimensions. To
address this problem we first need to understand a few basic facts about permutations. Here, we will
just present a brief account of the relevant facts. For the formal-minded, Appendix B contains a more
complete treatment which includes the relevant proofs.
Permutations
You probably have an intuitive understanding of a permutation as an operation which changes the order
of a certain set of n objects. Here, we take this set to be the numbers {1, . . . , n}. Mathematically, a
permutation is defined as a bijective map from this set to itself. So the set of all permutations of n objects
is given by
Sn := {σ : {1, · · · , n} → {1, · · · , n} | σ is bijective} , (5.2)
and this set has n! elements. The basic idea is that, under a permutation σ ∈ Sn , a number i ∈ {1, . . . , n}
is permuted to its image σ(i). A useful notation for a permutation mapping 1 → σ(1), . . . , n → σ(n) is

   σ = (   1    ···    n   )
       ( σ(1)   ···  σ(n)  ) .                                  (5.3)

(Note that, despite the similar notation, this is not a matrix in the sense introduced earlier.)

For example, for n = 3, a permutation which swaps 2 and 3 is written as
 
   τ1 = ( 1  2  3 )
        ( 1  3  2 ) .                                           (5.4)

Carrying out two permutations, one after the other, simply corresponds to composition of maps in this
formalism. For example, consider a second permutation
 
   τ2 = ( 1  2  3 )
        ( 2  1  3 )                                             (5.5)

of three objects which swaps the numbers 1 and 2. Permuting first with τ2 and then with τ1 corresponds
to the permutation σ := τ1 ◦ τ2 which is given by
     
   σ = τ1 ◦ τ2 = ( 1  2  3 ) ◦ ( 1  2  3 ) = ( 1  2  3 ) ,      (5.6)
                 ( 1  3  2 )   ( 2  1  3 )   ( 3  1  2 )

a cyclic permutation of the three numbers. A further advantage of describing permutations as bijective
maps is that the inverse of a permutation σ, that is, the permutation which “undoes” the effect of the
original permutation, is simply described by the inverse map σ −1 .
The specific permutations which only swap two numbers and leave all other numbers unchanged are
called transpositions. For example, the permutations (5.4) and (5.5) are transpositions. A basic and
important fact about permutations, proved in Appendix B, is that every permutation can be written as a
composition of transpositions, so any σ ∈ Sn can be written as σ = τ1 ◦ · · · ◦ τk , where τ1 , . . . , τk ∈ Sn are
transpositions. Eq. (5.6) is an illustration of this general fact.
Writing permutations as a composition of transpositions is not unique, that is, two different such
compositions can lead to the same permutation. Not even the number of transpositions required to
generate a given permutation is fixed. For example, the permutation σ in Eq. (5.6) can also be written
as σ = τ1 ◦ τ2 ◦ τ1 ◦ τ1 , that is, as a composition of four transpositions. However, it can be shown
(see Appendix B) that the number of transpositions required is always either even or odd for a given
permutation. For a permutation σ = τ1 ◦ · · · ◦ τk , written as a composition of k transpositions, it,
therefore, makes sense to define the sign of the permutation as
   sgn(σ) := (−1)^k = { +1 : “even” permutation                 (5.7)
                      { −1 : “odd” permutation .

From this definition, transpositions τ are odd permutations, so sgn(τ ) = −1. For the permutation σ in
Eq. (5.6) we have sgn(σ) = 1 since it can be built from two transpositions. It is, therefore, even as we
would expect from a cyclic permutation of three objects. In essence, the definition (5.7) provides the
correct mathematical way to distinguish even and odd permutations.
When two permutations, each written in terms of transpositions, are composed with each other the
number of transpositions simply adds up. From the definition (5.7) this means that

sgn(σ1 ◦ σ2 ) = sgn(σ1 )sgn(σ2 ) . (5.8)

A direct consequence of this rule is that 1 = sgn(σ ◦ σ −1 ) = sgn(σ)sgn(σ −1 ) and, hence,

sgn(σ −1 ) = sgn(σ) . (5.9)

In other words, a permutation and its inverse have the same sign.

We are now ready to return to determinants and derive an explicit formula. We start with an n × n matrix
A with entries aij whose column vectors we write as linear combinations of the standard unit vectors:
 
   Ai = (a1i , . . . , ani )T = Σ_j aji ej .                    (5.10)

By using the properties of the determinant from Def. (5.1) we can then attempt to work out the determi-
nant of A. We find
 
   det(A) = det(A1 , · · · , An ) = det( Σ_{j1} aj1 1 ej1 , · · · , Σ_{jn} ajn n ejn )           [using (5.10)]
          = Σ_{j1 ,··· ,jn} aj1 1 · · · ajn n det(ej1 , · · · , ejn )                            [using (D1)]
          = Σ_{σ∈Sn} aσ(1)1 · · · aσ(n)n det(eσ(1) , · · · , eσ(n) )                             [using (5.1), setting ja = σ(a)]
          = Σ_{σ∈Sn} sgn(σ) aσ(1)1 · · · aσ(n)n det(e1 , · · · , en )                            [using (D2)]
          = Σ_{σ∈Sn} sgn(σ) aσ(1)1 · · · aσ(n)n                                                  [using (D3)]

Hence, having just used the general properties of determinants and some facts about permutations, we
have arrived at a unique expression for the determinant. Conversely, it is straightforward to show that
this expression satisfies all the requirements of Def. 5.1. In summary, we conclude that the determinant,
as defined in Def. 5.1, is unique and explicitly given by
   det(A) = det(A1 , · · · , An ) = Σ_{σ∈Sn} sgn(σ) aσ(1)1 · · · aσ(n)n ,                        (5.11)

where aij are the entries of the n × n matrix A. Note that the sum on the RHS runs over all permutations
in Sn and, therefore, has n! terms. A useful way to think about this sum is as follows. From each column of
the matrix A, choose one entry such that no two entries lie in the same row. A term in Eq. (5.11) consists
of the product of these n entries (times the sign of the permutation involved) and the sum amounts to all
possible ways of making this choice.
Another useful way to write the determinant which is often employed in physics involves the n-
dimensional generalization of the Levi-Civita tensor, defined by

   ε_{i1 ···in} = { +1 if i1 , . . . , in is an even permutation of 1, . . . , n
                  { −1 if i1 , . . . , in is an odd permutation of 1, . . . , n                  (5.12)
                  {  0 otherwise .

Essentially, the Levi-Civita tensor plays the same role as the sign of the permutation (plus it vanishes if an index appears twice, that is, when i1 , . . . , in is not actually a permutation of 1, . . . , n) so that Eq. (5.11) can alternatively be written as

   det(A) = ε_{i1 ···in} ai1 1 · · · ain n ,                    (5.13)
with a sum over the n indices i1 , . . . , in implied.
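
Formula (5.11) can be turned directly into (very inefficient) code. The sketch below is purely illustrative and not part of the notes; it is plain Python, using the standard itertools module, with the sign of a permutation computed by counting inversions.

    from itertools import permutations

    def sgn(sigma):
        """Sign of a permutation: +1 or -1 according to the parity of the number of inversions."""
        inversions = sum(1 for i in range(len(sigma))
                           for j in range(i + 1, len(sigma)) if sigma[i] > sigma[j])
        return -1 if inversions % 2 else 1

    def det(A):
        """Determinant via the explicit sum (5.11) over all n! permutations."""
        n = len(A)
        total = 0
        for sigma in permutations(range(n)):
            term = sgn(sigma)
            for j in range(n):
                term *= A[sigma[j]][j]     # the entry a_{sigma(j) j}, one factor per column j
            total += term
        return total

    print(det([[3, -2], [4, -5]]))                     # -7 (cf. Example 5.1 (a) below)
    print(det([[1, -2, 0], [3, 2, -1], [4, 2, 5]]))    # 50 (cf. Example 5.1 (b) below)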
Low dimensions and some special cases
To get a better feel for the determinant it is useful to look at low dimensions first. For n = 2 we have
 
   det ( a1   b1 )  =  ε_{ij} ai bj = ε_{12} a1 b2 + ε_{21} a2 b1 = a1 b2 − a2 b1 .              (5.14)
       ( a2   b2 )

The two terms on the right-hand side correspond to the two permutations of {1, 2}. In three dimensions
we find
 
   det ( a1   b1   c1 )
       ( a2   b2   c2 )  =  ε_{ijk} ai bj ck = a1 b2 c3 + a2 b3 c1 + a3 b1 c2 − a2 b1 c3 − a3 b2 c1 − a1 b3 c2   (5.15)
       ( a3   b3   c3 )
                         =  ha, b, ci = a · (b × c)                                                              (5.16)

The last line follows by comparison with Eq. (2.43). Hence, the three-dimensional determinant obtained from our general definition is indeed the triple product and coincides with our earlier definition of the determinant.
The six terms in the right-hand side of Eq. (5.15) correspond to the six permutations of {1, 2, 3} and we
recall from Eq. (2.50) that they can be explicitly computed by multiplying the terms along the diagonals
of the matrix.

Example 5.1: Computing determinants for 2 × 2 and 3 × 3 matrices


(a) For a 2 × 2 matrix

   A = ( 3  −2 )
       ( 4  −5 )                                                (5.17)

we have from Eq. (5.14)

   det(A) = det ( 3  −2 )  = 3 · (−5) − (−2) · 4 = −7 .         (5.18)
                ( 4  −5 )

(b) To compute the determinant of a 3 × 3 matrix


 
   A = ( 1  −2   0 )
       ( 3   2  −1 )                                            (5.19)
       ( 4   2   5 )

we need to write down the six terms in Eq. (5.15) which can be obtained by multiplying the terms along the diagonals of A, following the rule (2.50). Explicitly,

   det(A) = det ( 1  −2   0 )
                ( 3   2  −1 )
                ( 4   2   5 )
           = 1 · 2 · 5 + (−2) · (−1) · 4 + 0 · 3 · 2 − 0 · 2 · 4 − (−2) · 3 · 5 − 1 · (−1) · 2
           = 10 + 8 + 30 + 2 = 50 .                             (5.20)

Recall that each of the six terms is obtained by multiplying three entries along a diagonal (where diagonals leaving the matrix at its right edge continue from its left edge, so that all factors are collected). The three diagonals running from top left to bottom right correspond to the three cyclic permutations, which appear with a positive sign, while the three diagonals running from bottom left to top right lead to the anti-cyclic terms, which come with a negative sign.

The determinant of a 4 × 4 matrix has 4! = 24 terms, and that of an n × n matrix has n! terms, so this becomes complicated quickly. An interesting class of matrices for which the determinant is simple consists of upper triangular matrices, that is, matrices with all entries below the diagonal vanishing. In this case

   det ( a1       ∗  )
       (     ⋱       )  =  a1 · · · an ,                        (5.21)
       ( 0       an  )

so the determinant is simply the product of the diagonal elements (an analogous statement of course holds for lower triangular matrices). This can be seen from Eq. (5.11). We
should consider all ways of choosing one entry per column such that no two entries appear in the same row.
For an upper triangular matrix, the only non-zero choice in the first column is the first entry, so that the
first row is “occupied”. In the second column the only available non-trivial choice is, therefore, the entry
in the second row etc. In conclusion, from the n! terms in Eq. (5.11) only the term which corresponds to
the product of the diagonal elements is non-zero. An easy conclusion from Eq. (5.21) is that

det(1n ) = 1 , (5.22)

as must be the case from property (D3) in Def. 5.1.

5.2 Properties of the determinant and calculation


As we have seen from the previous discussion, the explicit expression for the determinant becomes compli-
cated quickly as the dimension increases. To be able to work with determinants in general we, therefore,
need to explore some of their more sophisticated properties. We begin with the relation between the
determinant and the transposition of matrices.
Lemma 5.1. The determinant of a matrix and its transpose are the same, so det(A) = det(AT ).
Proof. By setting ja = σ(a), for a permutation σ ∈ Sn we can re-write a term in the sum (5.11) for the determinant as Aσ(1)1 · · · Aσ(n)n = Aj1 σ⁻¹(j1 ) · · · Ajn σ⁻¹(jn ) = A1σ⁻¹(1) · · · Anσ⁻¹(n) , where the last equality follows simply by re-ordering the factors, given that j1 , . . . , jn is a permutation of 1, . . . , n. From this observation the determinant (5.11) can be written as

   det(A) = Σ_{σ∈Sn} sgn(σ) A1σ⁻¹(1) · · · Anσ⁻¹(n) = Σ_{σ⁻¹∈Sn} sgn(σ⁻¹) A1σ⁻¹(1) · · · Anσ⁻¹(n)        [using (5.9)]
          = Σ_{ρ∈Sn} sgn(ρ) (A^T)ρ(1)1 · · · (A^T)ρ(n)n = det(A^T) .                                     [setting ρ = σ⁻¹]

Another obvious question is about the relation between the determinant and matrix multiplication.
Fortunately, there is a simple and beautiful answer.
Theorem 5.1. det(AB) = det(A) det(B), for any two n × n matrices A, B.
Proof. Recall from Eq. (3.52) the index form of matrix multiplication
   (AB)ij = Σ_k Aik Bkj .

By focusing on a particular value of j in this expression we can write the j th column of AB as

   (AB)j = Σ_k Bkj Ak ,                                         (5.23)
where Ak are the columns of A. Hence,
 
   det(AB) = det((AB)1 , · · · , (AB)n ) = det( Σ_{k1} Bk1 1 Ak1 , · · · , Σ_{kn} Bkn n Akn )            [using (5.23)]
           = Σ_{k1 ,··· ,kn} Bk1 1 · · · Bkn n det(Ak1 , · · · , Akn )                                   [using (D1)]
           = Σ_{σ∈Sn} Bσ(1)1 · · · Bσ(n)n det(Aσ(1) , · · · , Aσ(n) )                                    [setting ka = σ(a), using (5.1)]
           = Σ_{σ∈Sn} sgn(σ) Bσ(1)1 · · · Bσ(n)n det(A1 , · · · , An ) = det(A) det(B) ,                 [using (D2)]

where in the last step the remaining sum Σ_{σ∈Sn} sgn(σ) Bσ(1)1 · · · Bσ(n)n is just det(B).

This simple multiplication rule for determinants of matrix products has a number of profound conse-
quences. First, we can prove a criterion for invertibility of a matrix, based on the determinant, essentially
a more complete version of Claim 2.1.

Corollary 5.1. For an n × n matrix A we have:

A is bijective (that is, A has an inverse) ⇐⇒ det(A) ≠ 0                                        (5.24)

If A is invertible then det(A−1 ) = (det(A))−1 .

Proof. “⇒”: If A is bijective it has an inverse A−1 and 1 = det(1n ) = det(AA−1 ) = det(A) det(A−1 ).
This implies that det(A) ≠ 0 and that det(A−1 ) = (det(A))−1 which is the second part of our assertion.
“⇐”: We prove this indirectly, so we start by assuming that A is not bijective. From Lemma 3.4 this
means that rk(A) < n, so the rank of A is less than maximal. Hence, at least one of the column vectors
of A, say A1 for definiteness, can be expressed as a linear combination of the others, so that
   A1 = Σ_{i=2}^{n} αi Ai

for some coefficients αi . For the determinant of A this means


   det(A) = det(A1 , A2 , . . . , An ) = det( Σ_{i=2}^{n} αi Ai , A2 , . . . , An ) = Σ_{i=2}^{n} αi det(Ai , A2 , . . . , An ) = 0 ,

where the second step uses (D1) and the last step uses (5.1), since each determinant in the sum contains the column Ai twice.

Note that, for invertible matrices A, this provides us with a useful way to calculate the determinant
of the inverse matrix by
det(A−1 ) = (det(A))−1 . (5.25)
Combining this rule and Theorem 5.1 implies that det(P AP −1 ) = det(P ) det(A)(det(P ))−1 = det(A), so,
in short
det(P AP −1 ) = det(A) . (5.26)
This equation says that the determinant remains unchanged under basis transformations (3.96) and, as a
result, the determinant is the same for every matrix representing a given linear map. The determinant is,
therefore, a genuine property of the linear map and we can talk about the determinant of a linear map,
defined as the determinant of any of its representing matrices.

Example 5.2: Using the determinant to check if a matrix is invertible
The above Corollary is useful to check if (small) matrices are invertible. Consider, for example, the family
of 3 × 3 matrices  
1 −1 a
A= 0 a −3  (5.27)
−2 0 1
where a ∈ R is a real parameter. We can ask for which values of a the matrix A is invertible. Computing
the determinant is straightforward and leads to

det(A) = 2a2 + a − 6 . (5.28)

This vanishes precisely when a = −2 or a = 3/2 and, hence, for these values of a the matrix A is not
invertible. For all other values it is invertible.
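
A quick symbolic check of this example (not part of the notes; it assumes the sympy library) can be done as follows.

    import sympy as sp

    a = sp.symbols('a')
    A = sp.Matrix([[ 1, -1,  a],
                   [ 0,  a, -3],
                   [-2,  0,  1]])

    d = sp.expand(A.det())
    print(d)               # 2*a**2 + a - 6, as in Eq. (5.28)
    print(sp.solve(d, a))  # [-2, 3/2]: exactly the values of a for which A is not invertible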

Our next goal is to find a recursive method to calculate the determinant, essentially by writing the
determinant of a matrix in terms of determinants of sub-matrices. To this end, for an n × n matrix A, we
define the associated n × n matrices
                 (          0          )
                 (  “A”     ⋮    “A”   )
                 (          0          )
   Ã(i,j)  =     ( 0 ··· 0  1  0 ··· 0 )   ← i th row           (5.29)
                 (          0          )
                 (  “A”     ⋮    “A”   )
                 (          0          )
                            ↑
                        j th column

They are obtained from A by setting the (i, j) entry to 1, the other entries in row i and column j to zero
and keeping the rest of the matrix unchanged. Note that the subscripts (i, j) indicate the row and column
which have been changed rather than specific entries of the matrix (hence the bracket notation). With
the so-defined matrices we define the co-factor matrix, an n × n matrix C with entries

Cij := det(Ã(i,j) ) . (5.30)

To find a more elegant expression for the co-factor matrix, we also introduce the (n − 1) × (n − 1) matrices
A(i,j) which are obtained from A by simply removing the ith row and the j th column. It takes i − 1 swaps
of neighbouring rows in (5.29) to move row i to the first row (without changing the order of any other
rows) and a further j − 1 swaps to move column j to the first column. After these swaps the matrix Ã(i,j)
becomes

   B(i,j) =  ( 1   0  ···  0 )
             ( 0             )
             ( ⋮    A(i,j)   )  ,                               (5.31)
             ( 0             )
From Def. (5.1) (D2) and Lemma 5.1 it is clear that det(Ã(i,j) ) = (−1)i+j det(B(i,j) ), since we need a total
of i + j − 2 swaps of rows and columns to convert one matrix into the other. Further, the explicit form of
the determinant (5.11) implies that det(B(i,j) ) = det(A(i,j) ) (as the only non-trivial choice of entry in the
first column of B(i,j) is the 1 in the first row). Combining these observations means the co-factor matrix
is given by
Cij = det(Ã(i,j) ) = (−1)i+j det(A(i,j) ) . (5.32)

Hence, the co-factor matrix contains, up to signs, the determinants of the (n − 1) × (n − 1) sub-matrices
of A, obtained by deleting one row and one column from A. As we will see, for explicit calculations, it is
useful to note that the signs in Eq. (5.32) follow a “chess board pattern”, that is, the matrix with entries
(−1)i+j has the form

   ( +  −  +  ··· )
   ( −  +  −  ··· )
   ( +  −  +  ··· )                                             (5.33)
   ( ⋮  ⋮  ⋮  ⋱  )
Our goal is to relate the determinant of A to the determinants of sub-matrices, that is to the entries of
the co-factor matrix. This is accomplished by
Lemma 5.2. For an n × n matrix A with associated co-factor matrix C, defined by Eq. (5.32), we have

C T A = det(A)1n (5.34)

Proof. This follows from the definition of the co-factor matrix, more or less by direct calculation.
   (C^T A)ij = Σ_k (C^T)ik Akj = Σ_k Akj Cki = Σ_k Akj det(Ã(k,i) )                              [using (3.52), (5.32)]
             = Σ_k Akj det(A1 , · · · , Ai−1 , ek , Ai+1 , · · · , An )                          [using (5.29)]
             = det( A1 , · · · , Ai−1 , Σ_k Akj ek , Ai+1 , · · · , An )                         [using (D1)]
             = det(A1 , · · · , Ai−1 , Aj , Ai+1 , · · · , An ) = δij det(A) = (det(A) 1n )ij    [using (5.1)]

An immediate conclusion from Lemma 5.2 is

   det(A) = (C^T A)jj = Σ_i (C^T)ji Aij = Σ_i Cij Aij = Σ_i (−1)^{i+j} Aij det(A(i,j) ) ,

so,

   det(A) = Σ_i (−1)^{i+j} Aij det(A(i,j) ) .                   (5.35)
This identity is referred to as Laplace expansion of the determinant. It realizes our goal of expressing the
determinant of A in terms of determinants of the sub-matrices A(i,j) . More specifically, in Eq. (5.35) we
can choose any column j and compute the determinant of A by summing over the entries i in this column
times the determinants of the corresponding sub-matrices A(i,j) (taking into account the sign). This is
also referred to as expanding the determinant “along the j th column”. Since the determinant remains
unchanged under transposition it can also be computed in a similar way by expanding “along the ith row”.
To see how this works in practice we consider the following

Example 5.3: Laplace expansion of determinant


We would like to compute the determinant of the matrix
 
2 −1 0
A= 1 2 −2  (5.36)
0 3 4

by expanding along its 1st column. From Eq. (5.35), taking into account the signs as indicated in (5.33),
we find
   det(A) = A11 det(A(1,1) ) − A21 det(A(2,1) ) + A31 det(A(3,1) )
          = 2 · det ( 2  −2 )  − 1 · det ( −1  0 )  + 0 · det ( −1   0 )  = 2 · 14 − 1 · (−4) + 0 · 2 = 32 .
                    ( 3   4 )            (  3  4 )            (  2  −2 )
Note that the efficiency of the calculation can be improved by choosing the row or column with the most
zeros.
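
The Laplace expansion translates naturally into a recursive function. The following sketch (plain Python, an illustration rather than an efficient method, and not part of the notes) expands along the first column.

    def laplace_det(A):
        """Determinant by recursive Laplace expansion along the first column."""
        n = len(A)
        if n == 1:
            return A[0][0]
        total = 0
        for i in range(n):
            # the sub-matrix A_(i,1): remove row i and the first column
            minor = [row[1:] for k, row in enumerate(A) if k != i]
            total += (-1) ** i * A[i][0] * laplace_det(minor)
        return total

    print(laplace_det([[2, -1, 0], [1, 2, -2], [0, 3, 4]]))   # 32, as in Example 5.3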

A by-product of Lemma 5.2 is a new method to compute the inverse of a matrix. If A is invertible then,
from Cor. 5.1, det(A) ≠ 0 and we can divide by det(A) to get

   (1/det(A)) C^T A = 1n .

Hence, the inverse of A is given by

   A−1 = (1/det(A)) C^T .                                       (5.37)
Again, it is worth applying this to an example.

Example 5.4: Inverse of a matrix using the co-factor method


(a) We would like to find the inverse of a general 2 × 2 matrix
 
   A = ( a   b )
       ( c   d )                                                (5.38)

using the co-factor method. The co-factor matrix of A is easily obtained by switching around the diagonal and non-diagonal entries and inverting the signs of the latter:

   C = (  d  −c )
       ( −b   a )  .                                            (5.39)

With det(A) = ad − cb (which should be different from zero for the inverse to exist) we have for the inverse

   A−1 = (1/det(A)) C^T = 1/(ad − cb) (  d  −b )
                                      ( −c   a )  .             (5.40)
Note that this provides a rule for inverting 2 × 2 matrices which is relatively easy to remember: Exchange
the diagonal elements, invert the signs of the off-diagonal elements and divide by the determinant.
(b) We consider the matrix A, Eq. (5.36), from the previous example. From Eq. (5.32) we find for the
associated co-factor matrix

   C = ( 14  −4   3 )
       (  4   8  −6 )  .
       (  2   4   5 )

With det(A) = 32 the inverse is

   A−1 = (1/det(A)) C^T = (1/32) ( 14   4   2 )
                                 ( −4   8   4 )  .
                                 (  3  −6   5 )
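
The co-factor construction is easy to code up directly. The sketch below (plain Python, using exact fractions via the standard fractions module; the helper names are not part of the notes) reproduces part (b) of this example.

    from fractions import Fraction

    def minor(A, i, j):
        """A_(i,j): the matrix A with row i and column j removed (indices starting at 0)."""
        return [[A[r][c] for c in range(len(A)) if c != j] for r in range(len(A)) if r != i]

    def det(A):
        """Determinant by Laplace expansion along the first column."""
        if len(A) == 1:
            return A[0][0]
        return sum((-1) ** i * A[i][0] * det(minor(A, i, 0)) for i in range(len(A)))

    def cofactor_inverse(A):
        """Inverse via Eq. (5.37): A^{-1} = C^T / det(A), with C the co-factor matrix."""
        n, d = len(A), det(A)
        C = [[(-1) ** (i + j) * det(minor(A, i, j)) for j in range(n)] for i in range(n)]
        return [[Fraction(C[j][i], d) for j in range(n)] for i in range(n)]   # transpose and divide

    A = [[2, -1, 0], [1, 2, -2], [0, 3, 4]]
    print(cofactor_inverse(A))   # (1/32) C^T as above, with the fractions in lowest terms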

We note that, for larger matrices, the row reduction method discussed in Section 3.3 is a more efficient
way of computing the inverse than the co-factor method. Indeed, for an n × n matrix the number of
operations required for a row reduction grows roughly as n3 while computing a determinant requires ∼ n!
operations.
Despite our improved methods, the calculation of determinants of large matrices remains a problem, essentially because of the aforementioned n! growth of the number of terms in Eq. (5.11). Using a Laplace expansion will improve matters only if the matrix in question has many zeros. However, by using elementary row operations, we can get to an efficient way of computing large determinants. The key observation is that, from the general properties of the determinant in Def. 5.1, row operations of type (R1) (see Def. 3.7) only change the sign of the determinant and row operations of type (R2) leave the determinant unchanged. A given matrix A can be brought into upper echelon form, A′, by a succession of these row operations and, hence, det(A) = (−1)^p det(A′), where p is the number of row swaps used in the process. The matrix A′ is in fact in upper triangular form

   A′ =  ( a1       ∗  )
         (     ⋱       )
         ( 0       an  )

and, as discussed earlier, the determinant of such a matrix is simply the product of its diagonal entries. It follows that det(A) = (−1)^p a1 · · · an .
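
This procedure is straightforward to implement. The following Python sketch (an illustration, not a polished numerical routine, and not part of the notes) reduces the matrix to upper triangular form, keeping track of the sign changes from row swaps, and then multiplies the diagonal entries.

    def det_by_row_reduction(A):
        """Determinant via row reduction: det(A) = (-1)^p a_1 ... a_n for the triangular result."""
        A = [list(map(float, row)) for row in A]   # work on a copy
        n, sign = len(A), 1
        for k in range(n):
            # find a row with a non-zero entry in column k (a row swap flips the sign)
            pivot = next((r for r in range(k, n) if A[r][k] != 0), None)
            if pivot is None:
                return 0.0
            if pivot != k:
                A[k], A[pivot] = A[pivot], A[k]
                sign = -sign
            for r in range(k + 1, n):              # operations of type (R2): determinant unchanged
                factor = A[r][k] / A[k][k]
                A[r] = [A[r][c] - factor * A[k][c] for c in range(n)]
        result = float(sign)
        for k in range(n):
            result *= A[k][k]
        return result

    print(det_by_row_reduction([[2, -1, 0], [1, 2, -2], [0, 3, 4]]))   # 32.0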

5.3 Cramer’s Rule


We have already seen how the determinant of a matrix can be used to decide if an n × n matrix A is
invertible, and how to compute the inverse of a matrix. Here we introduce Cramer’s Rule, which uses
determinants to solve systems of linear equations Ax = b for the case of square and invertible n × n
matrices A. Recall from our general discussion in Section 4.1 that, in this case, the linear system has a
unique solution, x = A−1 b, for any vector b.
To derive Cramer’s rule we first define the matrices

B(i) := (A1 , · · · , Ai−1 , b, Ai+1 , · · · , An ) , (5.41)

which are obtained from A by replacing the ith column with b and keeping all other columns unchanged.
We also note that, in terms of the column vectors Aj of A the linear system Ax = b can be written as
(see, for example, Eq. (3.36))

   Σ_j xj Aj = b ,                                              (5.42)

where x = (x1 , . . . , xn )T . Then we find


   det(B(i) ) = det(A1 , · · · , Ai−1 , b, Ai+1 , · · · , An ) = det(A1 , · · · , Ai−1 , Σ_j xj Aj , Ai+1 , · · · , An )        [using (5.42)]
              = Σ_j xj det(A1 , · · · , Ai−1 , Aj , Ai+1 , · · · , An ) = xi det(A1 , · · · , Ai−1 , Ai , Ai+1 , · · · , An )   [using (D1), then (5.1)]
              = xi det(A) .

Solving for xi we find Cramer’s rule


   xi = det(B(i) ) / det(A) = det(A1 , · · · , Ai−1 , b, Ai+1 , · · · , An ) / det(A)            (5.43)

for the solution x = (x1 , . . . , xn )T of the linear system Ax = b, where A is an invertible n × n matrix.
To solve linear systems explicitly, Cramer’s rule is only useful for relatively small systems, due to the n!
growth of the determinant. For larger linear systems the row reduction method introduced in Section (4.3)
should be used.

Example 5.5: Cramer’s rule


Let us apply Cramer’s rule to a linear system Ax = b with
   
2 −1 0 1
A= 1 2 −2  , b= 2  . (5.44)
0 3 4 0

From Eq. (5.41), that is by replacing one column of A with the vector b, we find the three matrices
     
   B(1) = ( 1  −1   0 )       B(2) = ( 2  1   0 )       B(3) = ( 2  −1  1 )
          ( 2   2  −2 ) ,            ( 1  2  −2 ) ,            ( 1   2  2 ) .                    (5.45)
          ( 0   3   4 )              ( 0  0   4 )              ( 0   3  0 )

By straightforward computation, for example using a Laplace expansion, it follows that det(A) = 32,
det(B(1) ) = 22, det(B(2) ) = 12 and det(B(3) ) = −9. From Eq. (5.43) this leads to the solution
 
   x = (1/32) (  22 )
              (  12 )  .
              (  −9 )
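
For completeness, here is a small Python sketch of Cramer's rule (not part of the notes; it assumes the numpy library); it reproduces the solution of Example 5.5.

    import numpy as np

    def cramer(A, b):
        """Solve A x = b for invertible A via Eq. (5.43): x_i = det(B_(i)) / det(A)."""
        A = np.asarray(A, dtype=float)
        b = np.asarray(b, dtype=float)
        det_A = np.linalg.det(A)
        x = np.empty(len(b))
        for i in range(len(b)):
            B_i = A.copy()
            B_i[:, i] = b                 # replace the i-th column of A by b
            x[i] = np.linalg.det(B_i) / det_A
        return x

    print(cramer([[2, -1, 0], [1, 2, -2], [0, 3, 4]], [1, 2, 0]))
    # [ 0.6875  0.375  -0.28125 ], i.e. (22, 12, -9)/32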

6 Scalar products
In Section 2 we have introduced the standard scalar product on Rn (the dot product) and we have seen
its usefulness, particularly for geometrical applications. Here, we study its generalizations to arbitrary
real and complex vector spaces.

6.1 Real and hermitian scalar products


Definition 6.1. A real (hermitian) scalar product on a vector space V over F = R (F = C) is a map
h · , · i : V × V → R (C) satisfying
(S1) hv, wi = hw, vi, for a real scalar product, F = R
hv, wi = hw, vi∗ , for a hermitian scalar product, F = C
(S2) hv, αu + βwi = αhv, ui + βhv, wi
(S3) hv, vi > 0 if v ≠ 0
for all vectors v, u, w ∈ V and all scalars α, β ∈ F .
If (S1) and (S2), but not necessarily (S3) are satisfied, then h · , · i is called a bi-linear form (in the real
case F = R) or a sesqui-linear form (in the complex case F = C).

Let us discuss this definition, beginning with the case of a real scalar product. The condition (S2) says
that a scalar product is linear in the second argument, in precisely the same sense that a linear map
is linear (see Def. 3.5). For the real case, the scalar product is symmetric in the two arguments from
condition (S1) and, together with (S2), this implies linearity in the first argument, so

hαv + βu, wi = αhv, wi + βhu, wi . (6.1)

So, in the real case, the scalar product is bi-linear. In this sense, we should think of the above definition
as natural, extending our notion of linearity to maps with two vectorial arguments.
The situation is somewhat more complicated in the hermitian case. Here, the complex conjugation in
(S1) together with (S2) leads to

hαv + βu, wi = α∗ hv, wi + β ∗ hu, wi . (6.2)

Hence, sums in the first argument of a hermitian scalar product can still be pulled apart, but scalars are
pulled out with a complex conjugation. This property, together with the linearity in the second argument, is also called sesqui-linearity. (In some parts of the mathematics literature a hermitian scalar product is defined to be linear in the first argument; our definition, based on linearity in the second argument, is the usual convention in the physics literature.)
The property (S3) ensures that we can sensibly define the norm (or length) of a vector as
   |v| := √(hv, vi) .                                           (6.3)

Note that in the hermitian case, (S1) implies that hv, vi = hv, vi∗ so that hv, vi is real. For this reason,
the condition (S3) actually makes sense in the hermitian case (if hv, vi was complex there would be no
well-defined sense in which we could demand it to be positive) and this explains the need for including
the complex conjugation in (S1).
The Cauchy-Schwarz inequality can be shown as in Lemma 2.1 (taking care to include complex con-
jugation in the hermitian case), so we have in general

|hv, wi| ≤ |v||w| . (6.4)


The proof of the triangle inequality in Lemma 2.2 also goes through in general, so for the norm (6.3) of a
general scalar product we have
|v + w| ≤ |v| + |w| . (6.5)
For a real scalar product, in analogy with Eq. (2.10), the Cauchy-Schwarz inequality allows the definition
of the angle ∠(v, w) ∈ [0, π] between two non-zero vectors v, w by

   cos(∠(v, w)) := hv, wi / (|v| |w|) .                         (6.6)

For any scalar product, two vectors v and w are called orthogonal iff hv, wi = 0. Hence, for a real scalar
product, the non-zero vectors v and w are orthogonal precisely when they form an angle ∠(v, w) = π/2.
We should now discuss some examples of scalar products.

Example 6.1: Examples of scalar products


(a) Standard scalar product in Rn
This is the dot product introduced earlier. For two vectors v = (v1 , . . . , vn )T and w = (w1 , . . . , wn )T in
Rn it is defined as
   hv, wi := v · w = v^T w = Σ_{i=1}^{n} vi wi .                (6.7)

We already know from Eq. (2.3) that it satisfies all the requirements in Def. 6.1 for a real scalar product.
(b) Standard scalar product in Cn
For two vectors v = (v1 , . . . , vn )T and w = (w1 , . . . , wn )T in Cn the standard scalar product in Cn is
defined as
   hv, wi := v† w = Σ_{i=1}^{n} vi∗ wi .                        (6.8)

It is easy to check that it satisfies the requirements in Def. 6.1 for a hermitian scalar product. In particular,
the associated norm is given by
   |v|2 = hv, vi = Σ_{i=1}^{n} |vi |2 ,                         (6.9)

where |vi | denotes the modulus of the complex number vi . This is indeed real and positive, as it must,
but note that the inclusion of the complex conjugate in Eq. (6.8) is crucial.
(c) Minkowski product in R4
For two four-vectors v = (v0 , v1 , v2 , v3 )T and w = (w0 , w1 , w2 , w3 )T in R4 , the Minkowski product is
defined as

hv, wi := vT ηw = −v0 w0 + v1 w1 + v2 w2 + v3 w3 where η = diag(−1, 1, 1, 1) . (6.10)

It is easy to show that it satisfies conditions (S1) and (S2) but not condition (S3). For example, for
v = (1, 0, 0, 0)T we have
hv, vi = −1 , (6.11)
which contradicts (S3). Therefore, the Minkowski product is not a scalar product but merely a bi-linear
form. Nevertheless, it plays an important role in physics, specifically in the context of special (and general)
relativity.
(d) Scalar product for function vector spaces

Def. 6.1 applies to arbitrary vector spaces so we should discuss at least one example of a more abstract
vector space. Consider the vector space of continuous (real- or complex-valued) functions f : [a, b] → R
or C on an interval [a, b] ⊂ R. A scalar product for such functions can be defined by the integral
   hf, gi := ∫_a^b dx f (x)∗ g(x) .                             (6.12)

It is easily checked that the conditions (S1)–(S3) are satisfied. Scalar products of this kind are of great
importance in physics, particularly in quantum mechanics.
(e) Scalar product for real matrices
The real n × n matrices form a vector space V with vector addition and scalar multiplication defined
component-wise as in Example 1.4 (e). The dimension of this space is n2 with a basis given by the
matrices E(ij) , defined in Eq. (1.54). On this space, we can introduce a scalar product by
   hA, Bi := tr(A^T B) = Σ_{i,j} Aij Bij ,                      (6.13)

where the symbol tr denotes the trace of a matrix A, defined as the sum over its diagonal entries, so tr(A) := Σ_i Aii . The sum on the RHS of Eq. (6.13) shows that this definition is in complete analogy with the dot product for real vectors, but with the summation running over two indices instead of just
one. It is, therefore, clear that all requirements for a scalar product are satisfied. For complex matrices a
hermitian scalar product can be defined analogously simply by replacing the transpose in Eq. (6.13) with
a hermitian conjugate.
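
A short numerical check of two of these scalar products may be helpful (not part of the notes; it assumes the numpy library, and the vectors and matrices are arbitrary test data):

    import numpy as np

    # hermitian scalar product on C^n, Eq. (6.8): <v, w> = v^dagger w
    v = np.array([1 + 2j, 3j])
    w = np.array([2 - 1j, 1 + 1j])
    print(np.vdot(v, w))                    # np.vdot conjugates its first argument
    print(np.conj(np.vdot(w, v)))           # equals <w, v>^*, i.e. property (S1)
    print(np.vdot(v, v).real > 0)           # <v, v> is real and positive, property (S3)

    # scalar product on real matrices, Eq. (6.13): <A, B> = tr(A^T B) = sum_ij A_ij B_ij
    A = np.array([[1., 2.], [3., 4.]])
    B = np.array([[0., 1.], [-1., 2.]])
    print(np.trace(A.T @ B), np.sum(A * B)) # the two expressions give the same number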

We conclude our introduction to scalar products with a simple but important observation about orthogonal
vectors.

Lemma 6.1. Pairwise orthogonal and non-zero vectors v1 , . . . , vk are linearly independent.

Proof. Start with


   Σ_{i=1}^{k} αi vi = 0

and take the scalar product of this equation with one of the vectors, vj . Since hvi , vj i = 0 for i ≠ j it follows that αj |vj |2 = 0. Since vj ≠ 0 its norm is positive, |vj | > 0, so αj = 0.

6.2 Orthonormal basis, Gram-Schmidt procedure


From the previous Lemma, n pairwise orthogonal, non-zero vectors in an n-dimensional vector space form
a basis. This motivates the following

Definition 6.2. A basis ε1 , . . . , εn of a vector space V with a scalar product is called ortho-normal iff

   hεi , εj i = δij ,                                           (6.14)

that is, if the basis vectors are pairwise orthogonal and have length one.

Example 6.2: Examples of ortho-normal bases


(a) The basis of standard unit vectors, e1 , . . . , en of Rn (Cn ) is an ortho-normal basis with respect to the standard scalar product on Rn (Cn ), as defined in Example 6.1.
(b) The vectors

   ε1 = (1/√2) ( 1 )  ,      ε2 = (1/√2) (  1 )                 (6.15)
               ( 1 )                     ( −1 )

form an orthonormal basis on R2 with respect to the standard scalar product, that is, εi^T εj = δij .
(c) The vectors

   ε1 = (1/√5) ( 2 )  ,      ε2 = (1/√5) (   1 )                (6.16)
               ( i )                     ( −2i )

form an orthonormal basis on C2 with respect to the standard scalar product, that is, εi† εj = δij . Note,
it is crucial to use the proper standard scalar product 6.1 (b) for the complex case which involves the
hermitian conjugate rather than the transpose.
(d) For the vector space of real n × n matrices, the matrices E(ij) , defined in Eq. (1.54), form an ortho-
normal basis with respect to the scalar product (6.13).

An ortho-normal basis has many advantages compared to an arbitrary basis of a vector space. For example,
consider the coordinates of a vector v ∈ V relative to an ortho-normal basis {ε1 , . . . , εn }. Of course, we can write v as a linear combination v = Σ_{i=1}^{n} αi εi with some coordinates αi but, in the general case, these coefficients need to be determined by solving a system of linear equations. For an ortho-normal basis, we can just take the scalar product of this equation with εj , leading to

   hεj , vi = hεj , Σ_{i=1}^{n} αi εi i = Σ_{i=1}^{n} αi hεj , εi i = αj ,

using hεj , εi i = δij . So in summary, the coordinates of a vector v relative to an ortho-normal basis {ε1 , . . . , εn } can be computed as

   v = Σ_{i=1}^{n} αi εi   ⇐⇒   αi = hεi , vi .                 (6.17)
i=1

Example 6.3: Coordinates relative to an ortho-normal basis in R2 and C2


As before, we have to be careful to distinguish the real and the complex case, since the respective standard
scalar products differ by a complex conjugation. We begin with the real case.
(a) Consider R2 with ortho-normal basis {ε1 , ε2 } as in Eq. (6.15) and the vector v = (2, −3)^T . We would like to write this vector as a linear combination v = α1 ε1 + α2 ε2 . Then, the coordinates α1 , α2 are given by

   α1 = ε1^T v = (1/√2) (1, 1) · (2, −3)^T = −1/√2 ,      α2 = ε2^T v = (1/√2) (1, −1) · (2, −3)^T = 5/√2 .

(b) For C2 we use the ortho-normal basis {ε1 , ε2 } from (6.16) and the same vector v = (2, −3)^T which we would like to write as a linear combination v = β1 ε1 + β2 ε2 . Then,

   β1 = ε1† v = (1/√5) (2, −i) · (2, −3)^T = (4 + 3i)/√5 ,      β2 = ε2† v = (1/√5) (1, 2i) · (2, −3)^T = (2 − 6i)/√5 .

Note it is crucial to use the hermitian conjugate, rather than the transpose in this calculation.

Does every (finite-dimensional) vector space have an ortho-normal basis and, if so, how can it be deter-
mined? The Gram-Schmidt procedure answers both of these questions.

Theorem 6.1. (Gram-Schmidt procedure) If {v1 , . . . , vn } is a basis of the vector space V , then there exists an ortho-normal basis {ε1 , . . . , εn } of V such that Span(ε1 , . . . , εk ) = Span(v1 , . . . , vk ) for all k = 1, . . . , n.

Proof. The proof is constructive. The first vector of our prospective ortho-normal basis is obtained by simply normalizing v1 , that is,

   ε1 = v1 / |v1 | .                                            (6.18)

Clearly, |ε1 | = 1 and Span(ε1 ) = Span(v1 ). Suppose we have already constructed the first k − 1 vectors ε1 , . . . , εk−1 , mutually orthogonal, normalized and such that Span(ε1 , . . . , εj ) = Span(v1 , . . . , vj ) for all j = 1, . . . , k − 1. The next vector, εk , is then constructed by first subtracting from vk its projections onto ε1 , . . . , εk−1 and then normalizing, so

   v′k = vk − Σ_{i=1}^{k−1} hεi , vk i εi ,      εk = v′k / |v′k | .      (6.19)

Obviously, |εk | = 1 and for any vector εj with j < k we have

   hεj , v′k i = hεj , vk i − Σ_{i=1}^{k−1} hεi , vk i hεj , εi i = hεj , vk i − hεj , vk i = 0 ,

using hεj , εi i = δij . Hence, εk is orthogonal to all vectors ε1 , . . . , εk−1 . Moreover, since Span(ε1 , . . . , εk−1 ) = Span(v1 , . . . , vk−1 ) and vk and εk only differ by a re-scaling and terms proportional to ε1 , . . . , εk−1 it follows that Span(ε1 , . . . , εk ) = Span(v1 , . . . , vk ).

We have seen that every finitely spanned vector space has a basis. The above theorem, therefore, shows
that every finitely spanned vector space with a scalar product also has an ortho-normal basis. Note that
the proof provides a practical method, summarized by Eqs. (6.18), (6.19), to compute an ortho-normal
basis from a given basis. Let us apply this method to some explicit examples.

Example 6.4: Gram-Schmidt procedure


(a) Start with the basis

   v1 = ( 1 )  ,    v2 = ( 2 )  ,    v3 = (  1 )
        ( 1 )            ( 0 )            ( −2 )
        ( 0 )            ( 1 )            ( −2 )

of R3 . We would like to construct the associated ortho-normal basis with respect to the standard scalar product (the dot product).
1) To find ε1 use Eq. (6.18):

   ε1 = v1 / |v1 | = (1/√2) (1, 1, 0)^T .

2) To find ε2 use Eq. (6.19) for k = 2:

   v′2 = v2 − hε1 , v2 i ε1 = (2, 0, 1)^T − (1, 1, 0)^T = (1, −1, 1)^T ,      ε2 = v′2 / |v′2 | = (1/√3) (1, −1, 1)^T .

3) To find ε3 use Eq. (6.19) for k = 3:

   v′3 = v3 − hε1 , v3 i ε1 − hε2 , v3 i ε2 = (1, −2, −2)^T + (1/2) (1, 1, 0)^T − (1/3) (1, −1, 1)^T = (7/6) (1, −1, −2)^T ,
   ε3 = v′3 / |v′3 | = (1/√6) (1, −1, −2)^T .

So, in summary, the ortho-normal basis is

   ε1 = (1/√2) (1, 1, 0)^T ,      ε2 = (1/√3) (1, −1, 1)^T ,      ε3 = (1/√6) (1, −1, −2)^T .

It is easy (and always advisable) to check that indeed hεi , εj i = δij .
(b) For a somewhat more adventurous application of the Gram-Schmidt procedure consider the vector
space of quadratic polynomials in one variable x ∈ [−1, 1] with real coefficients and a scalar product defined by

   hf, gi = ∫_{−1}^{1} dx f (x) g(x) .

We would like to find the ortho-normal basis associated to the standard monomial basis v1 = 1, v2 = x, v3 = x^2 of this space.
1) To find ε1 :

   hv1 , v1 i = ∫_{−1}^{1} dx = 2 ,      ε1 = v1 / |v1 | = 1/√2 .

2) To find ε2 first compute v′2 ,

   hε1 , v2 i = ∫_{−1}^{1} dx x/√2 = 0 ,      v′2 = v2 − hε1 , v2 i ε1 = x ,

and then normalize:

   hv′2 , v′2 i = ∫_{−1}^{1} dx x^2 = 2/3 ,      ε2 = v′2 / |v′2 | = √(3/2) x .

3) To find ε3 first compute v′3 ,

   hε1 , v3 i = (1/√2) ∫_{−1}^{1} dx x^2 = √2/3 ,      hε2 , v3 i = √(3/2) ∫_{−1}^{1} dx x^3 = 0 ,
   v′3 = v3 − hε1 , v3 i ε1 − hε2 , v3 i ε2 = x^2 − 1/3 ,

and normalize:

   hv′3 , v′3 i = ∫_{−1}^{1} dx (x^2 − 1/3)^2 = 8/45 ,      ε3 = v′3 / |v′3 | = √(5/8) (3x^2 − 1) .

So, in summary, the ortho-normal polynomial basis is

   ε1 = 1/√2 ,      ε2 = √(3/2) x ,      ε3 = √(5/8) (3x^2 − 1) .

These are the first three of an infinite family of ortho-normal polynomials, referred to as Legendre polynomials, which play an important role in mathematical physics.
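
The constructive proof of Theorem 6.1 is essentially an algorithm and can be written down in a few lines of code. The sketch below (not part of the notes; it assumes the numpy library) reproduces part (a) of this example; it is meant as an illustration of Eqs. (6.18) and (6.19), not as a numerically robust routine.

    import numpy as np

    def gram_schmidt(vectors):
        """Ortho-normalize a list of linearly independent vectors, following Eqs. (6.18), (6.19)."""
        basis = []
        for v in vectors:
            v_prime = v.astype(float)
            for eps in basis:
                v_prime = v_prime - np.dot(eps, v) * eps   # subtract the projection <eps_i, v_k> eps_i
            basis.append(v_prime / np.linalg.norm(v_prime))
        return basis

    vs = [np.array([1, 1, 0]), np.array([2, 0, 1]), np.array([1, -2, -2])]
    for eps in gram_schmidt(vs):
        print(np.round(eps, 4))
    # (1, 1, 0)/sqrt(2), (1, -1, 1)/sqrt(3), (1, -1, -2)/sqrt(6), as found above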

We have already seen in Eq. (6.17) that the coordinates of a vector relative to an ortho-normal basis are
easily computed from the scalar product. There are a few more helpful simplifications which arise for an
ortho-normal basis. For their derivation, we start with two vectors
   v = Σ_i αi εi ,      αi = hεi , vi ,                         (6.20)
   w = Σ_i βi εi ,      βi = hεi , wi ,                         (6.21)

and compute their scalar product

   hv, wi = Σ_{i,j} αi∗ βj hεi , εj i = Σ_i αi∗ βi = Σ_i hv, εi i hεi , wi ,        (6.22)

using hεi , εj i = δij . This shows that, relative to an ortho-normal basis, a scalar product can be expressed in terms of the standard scalar product on Rn or Cn . Suppose we would like to compute the representing matrix A of a linear map f : V → V relative to an ortho-normal basis {ε1 , . . . , εn } of V . In general, following Lemma 3.3, the entries Aij of the matrix A can be obtained from

   f (εj ) = Σ_i Aij εi .                                       (6.23)

Taking the scalar product of this equation with εk results in the simple formula

   Aij = hεi , f (εj )i .                                       (6.24)
In physics, the RHS of this expression is often referred to as a matrix element of the map f . It is worth
noting that a linear map is uniquely determined by its matrix elements.
Lemma 6.2. If two linear maps f : V → V and g : V → V satisfy hv, f (w)i = hv, g(w)i (or hf (v), wi =
hg(v), wi) for all v, w ∈ V then f = g.
Proof. By linearity of the scalar product in the second argument the assumption implies that hv, f (w) −
g(w)i = 0 for all v, w ∈ V . In particular, if we choose v = f (w) − g(w), it follows from Def. 6.1 (S3)
that f (w) − g(w) = 0. Since this holds for all w it follows that f = g. The alternative statement follows
simply by applying Def. 6.1 (S1).

Example 6.5: Calculating the matrix representing a linear map relative to an ortho-normal basis
For a fixed vector n ∈ R3 , we consider the linear map f : R3 → R3 defined by
f (v) = (n · v)n . (6.25)
Evidently, this map projects vectors into the direction of n. We would like to compute the matrix A
representing this linear map relative to the ortho-normal basis given by the three standard unit vectors
ei , using Eq. (6.24). We find
Aij = ei · f (ej ) = (n · ei )(n · ej ) = ni nj . (6.26)
and, hence, Aij = ni nj or, in matrix notation
   A = ( n1^2     n1 n2    n1 n3 )
       ( n1 n2    n2^2     n2 n3 )                              (6.27)
       ( n1 n3    n2 n3    n3^2  )
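
A quick numerical illustration of this example (not part of the notes; it assumes the numpy library, and n is taken here to be a unit vector so that f is a genuine projection):

    import numpy as np

    n = np.array([1., 2., 2.]) / 3.           # an example unit vector
    A = np.outer(n, n)                        # the matrix (6.27) with entries A_ij = n_i n_j

    v = np.array([1., 0., -1.])
    print(np.allclose(A @ v, np.dot(n, v) * n))   # A v reproduces f(v) = (n . v) n
    print(np.allclose(A @ A, A))                  # since |n| = 1, applying f twice changes nothing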

We end this discussion of orthogonality with a result on perpendicular spaces. For a sub vector space
W ⊂ V the perpendicular space W ⊥ is defined as

W ⊥ = {v ∈ V | hw, vi = 0 for all w ∈ W } . (6.28)

In other words, W ⊥ consists of all vectors which are orthogonal to all vectors in W . For example, if W ⊂ R3
is a plane through the origin then W ⊥ is the line through the origin perpendicular to this plane. The
following statements are intuitive and will be helpful for our treatment of eigenvectors and eigenvalues in
the next section.

Lemma 6.3. For a sub vector space W ⊂ V of a finite dimensional vector space V with a scalar product
the following holds:
(i) W ⊥ is a sub vector space of V .
(ii) W ∩ W ⊥ = {0}
(iii) dim(W ) + dim(W ⊥ ) = dim(V )

Proof. (i) If v1 , v2 ∈ W ⊥ then clearly αv1 + βv2 ∈ W ⊥ so from Def. 1.2 W ⊥ is a sub vector space.
(ii) If v ∈ W ∩ W ⊥ then hv, vi = 0, but from Def. 6.1 (S3) this implies that v = 0.
(iii) Choose an ortho-normal basis {ε1 , . . . , εk } of W and define the linear map f : V → V by f (v) = Σ_{i=1}^{k} hεi , vi εi (a projection onto W ). Clearly Im(f ) ⊂ W . For w ∈ W it follows from Eq. (6.17) that f (w) = w so that Im(f ) = W . Moreover, Ker(f ) = W ⊥ and the claim follows from the dimension
formula (3.4) applied to the map f .

6.3 Adjoint linear map


A common theme in mathematics is to explore the new structures which arise from consistency require-
ments when two mathematical ideas are combined. In the present case, we have combined vector spaces
and scalar products. Since vector spaces are equipped with linear maps it is, therefore, natural to ask
about the relation between linear maps and scalar product. Specifically, we would like to study, in this
sub-section and the next, specific classes of linear maps which relate to a given scalar product in an
interesting way. We begin by defining adjoint linear maps.

Definition 6.3. For a linear map f : V → V on a vector space V with scalar product, an adjoint linear
map, f † : V → V is a map satisfying
hv, f wi = hf † v, wi (6.29)
for all v, w ∈ V .

In other words, a linear map can be “moved” into the other argument of the scalar product by taking its
adjoint. The following properties of the adjoint map are relatively easy to show.

Lemma 6.4. (Properties of the adjoint)


(i) For a given linear map f the adjoint f † is uniquely determined.
(ii) (f † )† = f
(iii) (f + g)† = f † + g †
(iv)(αf )† = α∗ f †
(v) (f ◦ g)† = g † ◦ f †
(vi) (f −1 )† = (f † )−1 , if f is invertible.

Proof. (i) For two adjoints f1 , f2 for f we have hf1 (v), wi = hv, f (w)i = hf2 (v), wi for all v, w ∈ V .
Then Lemma 6.2 implies that f1 = f2 .
(ii) h(f † )† (v), wi = hv, f † (w)i = hf (v), wi. Comparing the LHS and RHS together with Lemma 6.2 shows
that (f † )† = f .
(iii) h(f +g)† (v), wi = hv, (f +g)(w)i = hv, f (w)i+hv, g(w)i = hf † (v), wi+hg † (v), wi = h(f † +g † )(v), wi
and the claim follows from Lemma 6.2.
(iv) h(αf )† (v), wi = hv, (αf )(w)i = αhv, f (w)i = αhf † (v), wi = h(α∗ f † )(v), wi and Lemma 6.2 leads to
the stated result.
(v) h(f ◦ g)† (v), wi = hv, (f ◦ g)(w)i = hf † (v), g(w)i = hg † ◦ f † (v), wi.
(vi) From (v) we have idV = (f ◦ f −1 )† = f † ◦ (f −1 )† . This means (f −1 )† is the inverse of f † and, hence,
(f † )−1 = (f −1 )† .

Let us now proceed in a more practical way and understand the adjoint map relative to an ortho-normal basis ε1 , . . . , εn of V . From Eq. (6.24) the matrices A, B describing f and f † relative to this basis are given by

   Aij = hεi , f (εj )i ,      Bij = hεi , f † (εj )i .         (6.30)

Using the scalar product property (S1) in Def. (6.1) these matrices are related by

   Bij = hεi , f † (εj )i = hf † (εj ), εi i∗ = hεj , f (εi )i∗ = A∗ji   =⇒   B = A† ,      (6.31)

that is, if A represents f then the hermitian conjugate A† represents f † . This also shows, by reversing the above argument and defining f † as the linear map associated to A† , that the adjoint always exists; this was not immediately clear from the definition.
Previously, we have introduced hermitian conjugation merely as a “mechanical” operation to be carried
out for matrices. Now we understand its proper mathematical context - it leads to the matrix which
describes the adjoint linear map. In the case of a real scalar product we can of course drop the complex
conjugation in Eq. (6.31) and the matrix describing the adjoint becomes AT , the transpose of A. Hence,
we have also found a mathematical interpretation for the transposition of matrices.
We have seen in Eq. (6.22) that, with respect to an ortho-normal basis, a scalar product is described
by the standard (real or complex) scalar product on Rn or Cn . It is, therefore, clear that the adjoint of
a matrix A with respect to the standard scalar product is given by its hermitian conjugate, A† (or AT
in the real case). This is easy to verify explicitly from the definition of the standard scalar product in
Example 6.1.
hv, Awi = v† Aw = (A† v)† w = hA† v, wi . (6.32)
A particularly important class of linear maps are those which coincide with their adjoint.

Definition 6.4. A linear map f : V → V on a vector space V with scalar product is called self-adjoint
(or hermitian) iff f = f † .

In other words, self-adjoint maps can be moved from one argument of the scalar product into another, so

hv, f (w)i = hf (v), wi ⇐⇒ hv, f (w)i = hw, f (v)i∗ . (6.33)

Clearly, relative to an ortho-normal basis, a self-adjoint linear map is described by a hermitian matrix (or
a symmetric matrix in the real case). Further, the self-adjoint linear maps on Rn (Cn ) with respect to the
standard scalar product are the symmetric (hermitian) matrices.

Example 6.6: A self-adjoint derivative map

For a more abstract example of a self-adjoint linear map, consider the vector space of (infinitely many
times) differentiable functions ϕ : [a, b] → C, satisfying ϕ(a) = ϕ(b), with scalar product
   hϕ, ψi = ∫_a^b dx ϕ(x)∗ ψ(x) .

The derivative operator

   D = −i d/dx

defines a linear map on this space and we would like to check that it is self-adjoint. Performing an integration by parts we find

   hϕ, Dψi = −i ∫_a^b dx ϕ(x)∗ (dψ/dx)(x) = −i [ϕ(x)∗ ψ(x)]_a^b + i ∫_a^b dx (dϕ/dx)(x)∗ ψ(x)
           = ∫_a^b dx (Dϕ)(x)∗ ψ(x) = hDϕ, ψi .
Hence, D is indeed hermitian. Note that the boundary term vanishes due to the boundary condition
on our functions and that including the factor of i in the definition of D is crucial for the sign to work
out correctly. In quantum mechanics physical quantities are represented by hermitian operators. In this
context, the present operator D plays an important role as it corresponds to linear momentum.
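A finite-dimensional analogue of this can be checked on a computer: on a periodic grid the central-difference approximation of d/dx is an anti-symmetric matrix, so the corresponding approximation of D = −i d/dx is hermitian. The NumPy sketch below is an illustration only; the grid size and interval are arbitrary choices.

import numpy as np

N = 64                       # number of grid points on [0, 1) with periodic boundary conditions
h = 1.0 / N                  # grid spacing
W = np.zeros((N, N))         # central-difference matrix: (W f)_k = (f_{k+1} - f_{k-1}) / (2h)
for k in range(N):
    W[k, (k + 1) % N] = 1.0 / (2 * h)
    W[k, (k - 1) % N] = -1.0 / (2 * h)

D = -1j * W                              # discrete analogue of D = -i d/dx
print(np.allclose(W.T, -W))              # True: W is anti-symmetric
print(np.allclose(D, D.conj().T))        # True: D is hermitian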

6.4 Orthogonal and unitary maps


Another important class of linear maps which relate to the scalar product in a particular way are orthogonal
and unitary maps. They are the linear maps which leave a scalar product unchanged in the sense of the following definition.
Definition 6.5. Let V be a vector space with a real (hermitian) scalar product. A linear map f : V → V
is called orthogonal (unitary) iff
hf (v), f (w)i = hv, wi (6.34)
for all v, w ∈ V .
In particular, orthogonal or unitary maps f leave lengths of vectors unchanged, so |f (v)| = |v|. In the
real case, we can use the scalar product to define angles between vectors as in Eq. (6.6), so orthogonal
maps f leave such angles unchanged, that is, ^(v, w) = ^(f (v), f (w)).
Lemma 6.5. (Properties of unitary maps)
(i) Unitary maps f can also be characterized by f † ◦ f = idV .
(ii) Unitary maps f are invertible and f −1 = f † .
(iii) The composition of unitary maps is a unitary map.
(iv) The inverse, f † , of a unitary map f is unitary.
Proof. (i) Using the adjoint map, the condition (6.34) can be re-written as hv, f † ◦ f (w)i = hv, idV (w)i.
From Lemma 6.2 a function is uniquely determined by its matrix elements so that orthogonal and unitary
operators can also be defined by the condition
f † ◦ f = idV . (6.35)
(ii) A direct consequence of (i).
(iii) For two unitary maps f , g, satisfying hf (v), f (w)i = hv, wi and hg(v), g(w)i = hv, wi, it follows that
hf ◦ g(v), f ◦ g(w)i = hf (v), f (w)i = hv, wi and hence, that f ◦ g is unitary.
(iv) From hf (v), f (w)i = hv, wi, writing v0 = f (v), w0 = f (w) it follows that hv0 , w0 i = hf −1 (v0 ), f −1 (w0 )i
so that f −1 = f † is unitary.

As before, it is useful to work out what this means relative to an ortho-normal basis. If f is described by
a matrix A relative to this basis then we already know that f † is described by the hermitian conjugate
A† in the complex case or by the transpose AT in the real case.

We begin with the real case where the condition (6.35) turns into

AT A = 1 ⇐⇒ A−1 = AT ⇐⇒ Ai · Aj = δij . (6.36)

Matrices A satisfying this condition are called orthogonal matrices and they can be characterized, equiv-
alently, by either one of the three conditions above. The simplest way to check if a given matrix is
orthogonal is usually to verify the condition on the LHS. The condition in the middle tells us it is easy to
compute the inverse of an orthogonal matrix - it is simply the transpose. And, finally, the condition on
the RHS says that the column vectors of an orthogonal matrix form an ortho-normal basis with respect
to the standard scalar product (the dot product). In fact, since a real scalar product, written in terms of
an ortho-normal basis, corresponds to the dot product, see Eq. (6.22), we expect that orthogonal matrices
are precisely those matrices which leave the dot product invariant. Indeed, we have

AT A = 1n ⇐⇒ (Av)T (Aw) = vT w for all v, w ∈ Rn . (6.37)

The set of all n × n orthogonal matrices is also denoted by O(n). Taking the determinant of the LHS con-
dition in (6.36) and using Lemma 5.1 and Theorem 5.1 gives 1 = det(1) = det(AAT ) = det(A) det(AT ) =
det(A)2 so that
det(A) = ±1 (6.38)
for any orthogonal matrix. The subset of n × n orthogonal matrices A with determinant det(A) = +1
is called special orthogonal matrices or rotations and denoted by SO(n). Note that the term “rotation”
is indeed appropriate for those matrices. Since they leave the dot product invariant they do not change
lengths of vectors and angles between vectors and the det(A) = +1 condition excludes orthogonal ma-
trices which contain reflections. The relation between orthogonal matrices with positive and negative
determinants is easy to understand. Consider an orthogonal matrix A with det(A) = −1 and the specific
orthogonal matrix F = diag(−1, 1, . . . , 1) with det(F ) = −1 which corresponds to a reflection in the first
coordinate direction. Then the matrix R = F A is a rotation since det(R) = det(F ) det(A) = (−1)2 = 1.
This means every orthogonal matrix A can be written as a product

A = FR (6.39)

of a rotation R and a reflection F . To get a better feeling for rotations we should look at some low-
dimensional examples.

Example 6.7: Rotations in two and three dimensions


(a) Two dimensions
To find the explicit form of two-dimensional rotation matrices we start with a general 2 × 2 matrix

R = \begin{pmatrix} a & b \\ c & d \end{pmatrix} ,

where a, b, c, d are real numbers and impose the conditions RT R = 12 and det(R) = 1. This gives

RT R = \begin{pmatrix} a^2 + c^2 & ab + cd \\ ab + cd & b^2 + d^2 \end{pmatrix} \overset{!}{=} \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} ,    det(R) = ad − bc \overset{!}{=} 1 ,

and, hence, the equations a2 + c2 = b2 + d2 = 1, ab + cd = 0 and ad − bc = 1. It is easy to show that a
solution to these equations can always be written as a = d = cos(θ), c = −b = sin(θ), for some angle θ so
that two-dimensional rotation matrices can be written in the form

R(θ) = \begin{pmatrix} \cos θ & −\sin θ \\ \sin θ & \cos θ \end{pmatrix} .    (6.40)

For the rotation of an arbitrary vector x = (x, y)T we get

x′ = Rx = \begin{pmatrix} x \cos θ − y \sin θ \\ x \sin θ + y \cos θ \end{pmatrix} .    (6.41)

It is easy to verify explicitly that |x′| = |x|, as must be the case, and that the cosine of the angle between
x and x′ is given by

cos(∠(x′, x)) = \frac{x′ · x}{|x′||x|} = \frac{(x \cos θ − y \sin θ)x + (x \sin θ + y \cos θ)y}{|x|^2} = \cos θ .    (6.42)
This result means we should interpret R(θ) as a rotation by an angle θ. From the addition theorems of
sin and cos it also follows easily that
R(θ1 )R(θ2 ) = R(θ1 + θ2 ) , (6.43)
that is, the rotation angle adds up for subsequent rotations, as one would expect. Note, Eq. (6.43)
also implies that two-dimensional rotations commute, since R(θ1 )R(θ2 ) = R(θ1 + θ2 ) = R(θ2 + θ1 ) =
R(θ2 )R(θ1 ), again a property intuitively expected.
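The defining properties of R(θ) and the addition rule (6.43) are also easy to confirm numerically; in the NumPy sketch below the angles are arbitrary choices.

import numpy as np

def R(theta):
    # two-dimensional rotation matrix, Eq. (6.40)
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

t1, t2 = 0.7, -1.3
print(np.allclose(R(t1).T @ R(t1), np.eye(2)))     # R^T R = 1
print(np.isclose(np.linalg.det(R(t1)), 1.0))       # det(R) = 1
print(np.allclose(R(t1) @ R(t2), R(t1 + t2)))      # rotation angles add up, Eq. (6.43)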
(b) Three dimensions
To find the explicit form for three-dimensional rotations we could, in principle, use the same approach as
in two dimensions and impose all relevant constraints on an arbitrary 3 × 3 matrix. However, this leads
to a set of equations in 9 variables and is much more complicated. However, it is easy to obtain special
three-dimensional rotations from two-dimensional ones. For example, the matrices

R1 (θ1 ) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos θ1 & −\sin θ1 \\ 0 & \sin θ1 & \cos θ1 \end{pmatrix}    (6.44)
clearly satisfy R1 (θ1 )T R1 (θ1 ) = 13 and det(R1 (θ1 )) = 1 and are, hence, rotation matrices. They describe
a rotation by an angle θ1 around the first coordinate axis. Analogously, rotation matrices around the
other two coordinate axes can be written as

R2 (θ2 ) = \begin{pmatrix} \cos θ2 & 0 & −\sin θ2 \\ 0 & 1 & 0 \\ \sin θ2 & 0 & \cos θ2 \end{pmatrix} ,    R3 (θ3 ) = \begin{pmatrix} \cos θ3 & −\sin θ3 & 0 \\ \sin θ3 & \cos θ3 & 0 \\ 0 & 0 & 1 \end{pmatrix} .    (6.45)
It turns out that general three-dimensional rotation matrices can be obtained as products of the above
three special types. For example, we can write a three-dimensional rotation matrix as R(θ1 , θ2 , θ3 ) =
R1 (θ1 )R2 (θ2 )R3 (θ3 ), that is, as subsequent rotations around the three coordinate axes. Of course, there are
different ways of doing this, another choice frequently used in physics being R(ψ, θ, φ) = R3 (ψ)R1 (θ)R3 (φ).
The angles ψ, θ, φ in this parametrization are also called the Euler angles and in this case, the rotation
is combined from a rotation by φ around the z-axis, then a rotation by θ around the x-axis and finally
another rotation by ψ around the (new) z-axis. The Euler angles are particularly useful to describe the
motion of tops in classical mechanics.
Finally, we note that, unlike their two-dimensional counterparts, three-dimensional rotations do not,
in general, commute. For example, apart from special choices for the angles, R1 (θ1 )R2 (θ2 ) ≠ R2 (θ2 )R1 (θ1 ).

Application: Rotating physical systems
Suppose we have a stationary coordinate system with coordinates x ∈ R3 and another coordinate system
with coordinates y ∈ R3 , which is rotating relative to the first one. Such a set-up can be used to describe
the mechanics of objects in rotating systems and has many applications, for example to the physics of
tops or the laws of motion in rotating systems such as the earth (see below). Mathematically, the relation
between these two coordinate systems can be described by the equation

x = R(t)y , (6.46)

where R(t) are time-dependent rotation matrices. This means the matrices R(t) satisfy

R(t)T R(t) = 13 , (6.47)

(as well as det(R(t)) = 1) for all times t. In practice, we can write rotation matrices in terms of rotation
angles, as we have done in Example 6.7. The time-dependence of R(t) then means that the rotation angles
are functions of time. For example, a rotation around the z-axis with constant angular speed ω can be
written as

R(t) = \begin{pmatrix} \cos(ωt) & −\sin(ωt) & 0 \\ \sin(ωt) & \cos(ωt) & 0 \\ 0 & 0 & 1 \end{pmatrix} .    (6.48)
In physics, a rotation is often described by the angular velocity ω, a vector whose direction indicates the
axis of rotation and whose length gives the angular speed. It is very useful to understand the relation
between R(t) and ω. To do this, define the matrix

W = RT Ṙ , (6.49)

where the dot denotes the time derivative and observe, by differentiating Eq. (6.47) with respect to time,
that
\underbrace{R^T \dot R}_{=W} + \underbrace{\dot R^T R}_{=W^T} = 0 .    (6.50)

Hence, W is an anti-symmetric matrix and can be written in the form

W = \begin{pmatrix} 0 & −ω3 & ω2 \\ ω3 & 0 & −ω1 \\ −ω2 & ω1 & 0 \end{pmatrix}    or    Wij = εikj ωk .    (6.51)

The three independent entries ωi of this matrix define the angular velocity ω = (ω1 , ω2 , ω3 )T . To see that
this makes sense consider the example (6.48) and work out the matrix W .

W = ω \begin{pmatrix} \cos(ωt) & \sin(ωt) & 0 \\ −\sin(ωt) & \cos(ωt) & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} −\sin(ωt) & −\cos(ωt) & 0 \\ \cos(ωt) & −\sin(ωt) & 0 \\ 0 & 0 & 0 \end{pmatrix} = \begin{pmatrix} 0 & −ω & 0 \\ ω & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} .    (6.52)

Comparison with the general form (6.51) of W then shows that the angular velocity for this case is given
by ω = (0, 0, ω), indicating a rotation with angular speed ω around the z-axis, as expected.
In Example 3.8 we have seen that the multiplication of an anti-symmetric 3 × 3 matrix with a vector
can be written as a cross-product, so that

Wb = ω × b (6.53)

for any vector b = (b1 , b2 , b3 )T . This can also be directly verified using the matrix form of W together with
the definition (2.17) of the cross product or, more elegantly, by the index calculation Wij bj = εikj ωk bj =
(ω ×b)i , using the index form (2.29) of the cross product. This relation can be used to re-write expressions
involving W in terms of the angular velocity ω.
For a simple application of this formalism, consider an object moving with velocity ẏ relative to
the rotating system. What is its velocity relative to the stationary coordinate system? Differentiating
Eq. (6.46) gives
ẋ = Rẏ + Ṙy = R (ẏ + W y) = R (ẏ + ω × y) . (6.54)
The velocity ẋ in the stationary system has, therefore, two contributions, namely the velocity ẏ relative to
the rotating system and the velocity ω × y due to the rotation itself.
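Both the form (6.52) of W and the cross-product rule (6.53) lend themselves to a quick numerical check. In the NumPy sketch below the time derivative Ṙ is approximated by a central finite difference; the angular speed, the time and the test vector are arbitrary choices.

import numpy as np

omega, t, dt = 2.0, 0.4, 1e-6            # angular speed, time, finite-difference step

def R(t):
    # rotation about the z-axis with constant angular speed, Eq. (6.48)
    c, s = np.cos(omega * t), np.sin(omega * t)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

Rdot = (R(t + dt) - R(t - dt)) / (2 * dt)        # numerical time derivative
W = R(t).T @ Rdot                                # W = R^T Rdot, Eq. (6.49)
print(np.round(W, 4))                            # approximately [[0, -omega, 0], [omega, 0, 0], [0, 0, 0]]

omega_vec = np.array([0.0, 0.0, omega])          # angular velocity read off from W, Eq. (6.51)
b = np.array([1.0, 2.0, 3.0])
print(np.allclose(W @ b, np.cross(omega_vec, b), atol=1e-4))   # W b = omega x b, Eq. (6.53)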

We now turn to the complex case. In this case, from Eq. (6.35), (complex) matrices A describing unitary
maps relative to an ortho-normal basis are characterized by the three equivalent conditions

A† A = 1 ⇐⇒ A−1 = A† ⇐⇒ (Ai )† Aj = δij . (6.55)

Matrices satisfying these conditions are called unitary. As for orthogonal matrices, checking whether a
given matrix is unitary is usually most easily accomplished using the condition on the LHS. The condition in
the middle states that the inverse of a unitary matrix is simply its hermitian conjugate and the condition
on the RHS says that the column vectors of a unitary matrix form an ortho-normal basis under the
standard hermitian scalar product on Cn . Unitary matrices are precisely those matrices which leave the
standard hermitian scalar product invariant, explicitly

A† A = 1n ⇐⇒ (Av)† (Aw) = v† w for all v, w ∈ Cn . (6.56)

The set of all n × n unitary matrices is denoted by U (n). Orthogonal matrices (being real) also satisfy
the condition for unitary matrices so O(n) ⊂ U (n). For the determinant of unitary matrices we conclude
that 1 = det(1) = det(A† A) = det(A)∗ det(A) = | det(A)|2 . Hence, the determinant of unitary matrices
has complex modulus 1, so
| det(A)| = 1 . (6.57)
The unitary matrices U with det(U ) = 1 are called special unitary matrices, and the set of these matrices
is denoted by SU (n). Clearly, rotations are also special unitary so SO(n) ⊂ SU (n). For an arbitrary
unitary n × n matrix A we can always find a complex phase ζ such that ζ n = det(A). Then, the matrix
U = ζ −1 A is special unitary since det(U ) = det(ζ −1 A) = ζ −n det(A) = 1. This means every unitary
matrix A can be written as a product
A = ζU (6.58)
of a special unitary matrix U and a complex phase ζ.
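The decomposition (6.58) can be made concrete in a few lines of NumPy. In the sketch below the unitary matrix is obtained from the QR decomposition of a random complex matrix, which is just one arbitrary way of producing a unitary matrix; ζ is one choice of n-th root of det(A).

import numpy as np

rng = np.random.default_rng(1)
n = 3
M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A, _ = np.linalg.qr(M)                          # A is unitary: A^dagger A = 1

detA = np.linalg.det(A)
zeta = detA ** (1.0 / n)                        # one choice of phase with zeta^n = det(A)
U = A / zeta                                    # candidate special unitary part

print(np.allclose(A.conj().T @ A, np.eye(n)))   # A is unitary
print(np.isclose(abs(detA), 1.0))               # |det(A)| = 1, Eq. (6.57)
print(np.isclose(np.linalg.det(U), 1.0))        # det(U) = 1, so U = zeta^{-1} A is special unitary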

Example 6.8: Special unitary matrices in two dimensions


We can find all two-dimensional special unitary matrices by using the same method as for two-dimensional
rotations. We start with an arbitrary complex 2 × 2 matrix

U = \begin{pmatrix} α & β \\ γ & δ \end{pmatrix} ,

where α, β, γ, δ are complex numbers and impose the conditions U † U = 12 and det(U ) = 1. After a
short calculation we find that every two-dimensional special unitary matrix can be written in terms of

two complex numbers α, β as

U = \begin{pmatrix} α & β \\ −β∗ & α∗ \end{pmatrix}    where |α|2 + |β|2 = 1 .    (6.59)

This shows that two-dimensional special unitary matrices depend on two complex parameters α, β subject
to the (real) constraint |α|2 + |β|2 = 1 and, hence, on three real parameters. Inserting the special
choice α = cos θ, β = − sin θ into (6.59) we recover the two-dimensional rotation matrices (6.40), so that
SO(2) ⊂ SU (2), as expected from our general discussion.

The general study of orthogonal and unitary matrices is part of the theory of Lie groups, a more advanced
mathematical discipline which is beyond the scope of this introductory text.
Orthogonal and unitary matrices have numerous applications in physics which we would like to illus-
trate with an example from classical mechanics.

Application: Newton’s law in a rotating system


Newton’s law for the motion x = x(t) of a mass point with mass m under the influence of a force F reads

mẍ = F , (6.60)

where the dot denotes the derivative with respect to time t. We would like to work out the form this law
takes if we transform it to rotating coordinates y, related to the original, non-rotating coordinates x by

x = R(t)y . (6.61)

Here R(t) is a (generally time-dependent) rotation, that is, a 3 × 3 matrix satisfying

R(t)T R(t) = 13 (6.62)

for all times t. For example, such a version of Newton’s law is relevant to describing mechanics on earth.
To re-write Eq. (6.60) in terms of y we first multiply both sides with RT = R−1 so that

mRT ẍ = FR , (6.63)

with FR := RT F the force in the rotating coordinate system. If the rotation matrix is time-independent
it can be pulled through the time derivatives on the LHS of Eq. (6.63) and we get mÿ = FR . This simply
says that Newton’s law keeps the same form in any rotated (but not rotating!) coordinate system.
If R is time-dependent so that the system with coordinates y is indeed rotating relative to the coor-
dinate system x we have to be more careful. Taking two time derivatives of Eq. (6.61) gives

ẋ = Rẏ + Ṙy , ẍ = Rÿ + 2Ṙẏ + R̈y . (6.64)

Using the second of these equations to replace ẍ in Eq. (6.63) leads to

mÿ = FR − 2mRT Ṙẏ − mRT R̈y . (6.65)

Compared to Newton’s equation in the standard form (6.60) we have acquired the two additional terms on
the RHS which we should work out further. From Eq. (6.49), recall the definition W = RT Ṙ and further
note that Ẇ = RT R̈ + ṘT Ṙ = RT R̈ + (ṘT R)(RT Ṙ) = RT R̈ − W 2 , so that

RT R̈ = Ẇ + W 2 . (6.66)

With these results we can re-write Newton’s equation (6.65) as

mÿ = FR − 2mW ẏ − mW 2 y − mẆ y . (6.67)

Also, recall that the matrix W is anti-symmetric, encodes the angular velocity ω, as in Eq. (6.51) and its
action on vectors can be re-written as a cross product with the angular velocity ω (see Eq. (6.53)). Then,
Newton’s equation (6.67) in a rotating system can be written in its final form

mÿ = FR \underbrace{−2mω × ẏ}_{\text{Coriolis force}} \underbrace{−mω × (ω × y)}_{\text{centrifugal force}} \underbrace{−mω̇ × y}_{\text{Euler force}} .    (6.68)

The three terms on the RHS represent the additional forces a mass point experiences in a rotating system.
The centrifugal force is well-known. The Coriolis force is proportional to the velocity, ẏ, and, hence,
vanishes for mass points which are at rest in the rotating frame. It is, for example, responsible for the rotation
of the plane of a Foucault pendulum. Finally, the Euler force is proportional to the angular acceleration, ω̇. For the
earth’s rotation, ω is approximately constant so the Euler force is quite small in this case.

6.5 Dual vector space


We have seen in Lemma 3.2 that the linear maps f : V → W , between two vector spaces V and W over
F , form a vector space themselves. An important special case is the set of all linear maps ϕ : V → F ,
where we consider the field F as a (trivial, one-dimensional) vector space. Such linear maps are also called
linear functionals and the set of all linear functionals is called the dual vector space V ∗ of V .
Definition 6.6. For a vector space V over F the dual vector space V* is the set of all linear maps V →
F (where F is seen as a one-dimensional vector space.) The elements of V* are called linear functionals.

Example 6.9: Examples of linear functionals


(a) For V = Rn and a fixed vector w ∈ V we can define ϕw ∈ (Rn )∗ by

ϕw (v) = wT v ∈ R . (6.69)

It is clear that ϕw is a linear functional. Indeed all linear functionals in (Rn )∗ are of this form. To see
this, start with an arbitrary ϕ ∈ (Rn )∗ and define the vector w with components wi = ϕ(ei ). Then

ϕ(v) = ϕ\Big(\sum_i vi ei\Big) = \sum_i vi ϕ(ei ) = \sum_i wi vi = wT v = ϕw (v) .    (6.70)

Hence, ϕ = ϕw and we have written an arbitrary functional in the form (6.69). This result means we can
think of the functionals on Rn as n-dimensional row vectors.
(b) For the vector space of continuous functions h : [a, b] → R the integral
I(h) = \int_a^b dx\, h(x)    (6.71)

is a linear functional. Another interesting functional on the same vector space is

δx0 (h) := h(x0 ) , (6.72)

where x0 ∈ [a, b] is a fixed point. In the physics literature this functional is also called the Dirac delta function.

We know that a linear map f : V → W , with n = dim(V ) and m = dim W , is described by an m × n
matrix relative to a choice of basis on V and W . For W = F , we have m = dim(W ) = dim(F ) = 1, so
relative to a basis on V , linear functionals are described by 1 × n matrices, that is, by row vectors. So,
for a choice of basis, we can think of the vector space V as consisting of column vectors and its dual V ∗
as consisting of row vectors. To make this more precise we prove the following

Theorem 6.2. For a basis ε_1 , ..., ε_n of V there is a basis ε^{1∗} , ..., ε^{n∗} of V ∗ , called the dual basis, such that

ε^{i∗}(ε_j ) = δ^i_j .    (6.73)

In particular, dim(V ∗ ) = dim(V ).

Proof. Recall from Example 3.4 that we can define a coordinate map ψ(α) = \sum_i α^i ε_i which assigns to a
coordinate vector α = (α^1 , . . . , α^n )T the corresponding vector, relative to the chosen basis ε_i . We define

ε^{i∗}(v) := e^T_i ψ^{−1}(v) ,    (6.74)

and claim that this provides the correct dual basis. First we check

ε^{i∗}(ε_j ) = e^T_i ψ^{−1}(ε_j ) = e^T_i e_j = δ_{ij} .    (6.75)

To verify that the ε^{i∗} form a basis we first check linear independence. Applying \sum_i β_i ε^{i∗} = 0 to ε_j and
using Eq. (6.75) shows immediately that β_j = 0, so that the ε^{i∗} are indeed linearly independent. To see
that they span V ∗ start with an arbitrary functional ϕ ∈ V ∗ and a vector v = \sum_i v^i ε_i . Then

ϕ(v) = ϕ\Big(\sum_i v^i ε_i\Big) = \sum_i v^i \underbrace{ϕ(ε_i )}_{:=ϕ_i ∈ F} = \sum_i ϕ_i v^i = \sum_i ϕ_i ε^{i∗}(v) .    (6.76)

This means ϕ = \sum_i ϕ_i ε^{i∗} so that we have written an arbitrary functional ϕ as a linear combination of the
ε^{i∗} .

To summarize the discussion, for a basis {ε_i } and its dual basis {ε^{i∗} } we can write vectors and dual
vectors as in the following table.

                      vectors in V                dual vectors in V ∗
    vectors           v = v^i ε_i                 ϕ = ϕ_j ε^{j∗}                (6.77)
    coordinates       v^i                         ϕ_j

You have probably noticed that we have quietly refined our index convention. Vector space basis elements
have lower indices and their coordinates have upper indices while the situation is reversed for dual vectors.
For one, this allows us to decide the origin of coordinate vectors simply by the position of their index -
for an upper index, v^i , we refer to vectors and for a lower index, ϕ_j , to dual vectors. From Eq. (6.73), the
action of dual vectors on vectors can be written as

ϕ(v) = \sum_{i,j} ϕ_i v^j \underbrace{ε^{i∗}(ε_j )}_{=δ^i_j} = ϕ_i v^i ,    (6.78)

so, as a simple summation over their indices, also referred to as contraction. Note that this corresponds to
a refined Einstein summation convention where the same lower and upper index are being summed over.

From here it is only a few steps to defining tensors. For the curious, a basic introduction into tensors can
be found in Appendix C.
Finally, we would like to have a look at the relation of dual vector spaces and scalar products. In fact,
for reasons which will become clear, we keep the discussion slightly more general and consider symmetric
bi-linear forms with an additional property:
Definition 6.7. A symmetric bi-linear form h · , · i on a (real) vector space V is called non-degenerate if
hv, wi = 0 for all w ∈ V implies that v = 0.
Note that a real scalar product is non-degenerate since already hv, vi = 0 implies that v = 0. Intuitively,
non-degeneracy demands that there is no vector which is orthogonal to all other vectors. It turns out that
a non-degenerate symmetric bi-linear form allows for a “natural” identification of a vector space and its
dual. This is the content of the following
Lemma 6.6. Let V be a real vector space with a symmetric bi-linear form h · , · i and define the map
ı : V → V ∗ by ı(v)(w) = hv, wi. Then we have

h · , · i non-degenerate ⇐⇒ ı is an isomorphism . (6.79)


Proof. The map ı is certainly linear, given the linearity of the bi-linear form in the first argument. Since
dim(V ) = dim(V ∗ ) and from Claim 3.1, ı is bijective precisely when Ker(ı) = {0}. This is the same as
saying that ı(v)(w) = hv, wi = 0 for all w implies that v = 0 which is indeed precisely the definition of
non-degeneracy.

It is useful to work this out in a basis {ε_i } of V with dual basis {ε^{i∗} } of V ∗ . To do this we first introduce
the symmetric matrix g, also called the metric tensor or metric in short, with entries

g_{ij} = hε_i , ε_j i .    (6.80)

We would like to work out the matrix which represents the map ı relative to our basis choice. This means
we should look at the images of the basis vectors, so ı(ε_i )(ε_j ) = hε_i , ε_j i = g_{ij} = g_{ki} ε^{k∗}(ε_j ). Stripping off
the basis vector ε_j we have

ı(ε_i ) = g_{ji} ε^{j∗} ,    (6.81)
and, by comparison with Eq. (3.80), we learn that ı is represented by the metric g. If the bi-linear form
is non-degenerate, so that ı is bijective, then g is invertible. The components of g −1 are usually denoted
by g^{ij} , so that

g^{ij} g_{jk} = δ^i_k .    (6.82)
In the physics literature it is common to use the same symbol for the components of a vector and the dual
vector, related under ı. So if v = \sum_i v^i ε_i then we write ı(v) = \sum_i v_i ε^{i∗} . Since g represents ı this means

v_i = g_{ij} v^j ,    v^i = g^{ij} v_j .    (6.83)

Physicists refer to these equations by saying that we can "lower and raise indices" with the metric g_{ij} and
its inverse g^{ij} . Mathematically, they are simply a component version of the isomorphism ı between V and
V ∗ which is induced from the non-degenerate bi-linear form. With this notation, the bi-linear form on
two vectors v = \sum_i v^i ε_i and w = \sum_j w^j ε_j can be written as

hv, wi = g_{ij} v^i w^j = v^i w_i = v_j w^j .    (6.84)

Application: Minkowski product in R4

The Minkowski product has already been introduced in Example 6.1 (c). For two four-vectors v, w ∈ R4
and η = diag(−1, 1, 1, 1) the symmetric bi-linear form is defined by

hv, wi = vT ηw . (6.85)

It is customary in this context to label coordinates by Greek indices as µ, ν, . . . = 0, 1, 2, 3. The metric
tensor with respect to the basis of standard unit vectors in R4 , ε_µ = e_µ , is

g_{µν} = he_µ , e_ν i = e^T_µ η e_ν = η_{µν} ,    (6.86)

so is simply given by η. Since η is invertible this also shows, from Lemma 6.6, that the Minkowski product
is non-degenerate. From Eq. (6.83) lowering and raising of indices then takes the form

v_µ = η_{µν} v^ν ,    v^µ = η^{µν} v_ν ,    (6.87)

and, from Eq. (6.84), the Minkowski product can be written as

v^T η w = η_{µν} v^µ w^ν = v^µ w_µ = v_ν w^ν .    (6.88)

All these equations are part of the standard covariant formulation of special relativity.
We can go one step further and ask about the linear transformations Λ : R4 → R4 which leave the
Minkowski product invariant, that is, which satisfy

hΛv, Λwi = hv, wi ⇐⇒ Λ^T ηΛ = η ⇐⇒ Λ^µ_ρ Λ^ν_σ η_{µν} = η_{ρσ} .    (6.89)

Note that these linear transformations, which are referred to as Lorentz transformations, relate to the
Minkowski product in the same way that orthogonal linear maps relate to the standard scalar product on
Rn (see Section 6.4). In Special Relativity the linear transformation

x → x′ = Λx ⇐⇒ x^µ → x′^µ = Λ^µ_ν x^ν    (6.90)

generated by Λ is interpreted as a transformation from one inertial system with space-time coordinates
x = (t, x, y, z)T to another one with space-time coordinates x′ = (t′, x′, y′, z′)T .

Lorentz transformations have a number of interesting properties which follow immediately from their
definition (6.89). Taking the determinant of the middle equation (6.89) and using standard properties of
the determinant implies that (det(Λ))2 = 1 so that

det(Λ) = ±1 (6.91)
Further, the ρ = σ = 0 component of the last Eq. (6.89) reads −(Λ^0_0)^2 + \sum_{i=1}^3 (Λ^i_0)^2 = −1 so that

Λ^0_0 ≥ 1    or    Λ^0_0 ≤ −1 .    (6.92)

Combining the two sign ambiguities in Eqs. (6.91) and (6.92) we see that there are four types of Lorentz
transformations. The sign ambiguity in the determinant is analogous to what we have seen for orthogonal
matrices and its interpretation is similar to the orthogonal case. Lorentz transformation with determinant
1 are called “proper” Lorentz transformations while Lorentz transformations with determinant −1 can be
seen as a combination of a proper Lorentz transformation and a reflection. More specifically, consider the
special Lorentz transformation P = diag(1, −1, −1, −1) (note that this matrix indeed satisfies Eq. (6.89))
which is also referred to as “parity”. Then every Lorentz transformation Λ can be written as

Λ = P Λ+ , (6.93)

where Λ+ is a proper Lorentz transformation. The sign ambiguity (6.92) in Λ0 0 is new but has an obvious
physical interpretation. Under a Lorentz transformations Λ with Λ0 0 ≥ 1 the sign of the time component
x0 = t of a vector x remains unchanged, so that the direction of time is unchanged. Correspondingly,
such Lorentz transformation with positive Λ0 0 are called “ortho-chronous”. On the other hand, Lorentz
transformations Λ with Λ0 0 ≤ −1 change the direction of time. If we introduce the special Lorentz
transformation T = diag(−1, 1, 1, 1), also referred to as “time reversal”, then every Lorentz transformation
Λ can be written as
Λ = T Λ↑ , (6.94)
where Λ↑ is an ortho-chronous Lorentz transformation. Combining the above discussion, we see that every
Lorentz transformation Λ can be written in one of four ways, namely

Λ = \begin{cases} Λ^↑_+ & \text{for } det(Λ) = 1 \text{ and } Λ^0_0 ≥ 1 \\ P Λ^↑_+ & \text{for } det(Λ) = −1 \text{ and } Λ^0_0 ≥ 1 \\ P T Λ^↑_+ & \text{for } det(Λ) = 1 \text{ and } Λ^0_0 ≤ −1 \\ T Λ^↑_+ & \text{for } det(Λ) = −1 \text{ and } Λ^0_0 ≤ −1 \end{cases} ,    (6.95)

where Λ↑+ is a proper, ortho-chronous Lorentz transformation. The Lorentz transformations normally
used in Special Relativity are the proper, ortho-chronous Lorentz transformations. However, the other
Lorentz transformations are relevant as well and it is an important question as to whether they constitute
symmetries of nature in the same way that proper, ortho-chronous Lorentz transformations do. More to
the point, the question is whether nature respects parity P and time-reversal T .

What do proper, ortho-chronous Lorentz transformations look like explicitly? To answer this question we
basically have to solve Eq. (6.89) which is clearly difficult to do in full generality. However, some special
Lorentz transformations are more easily obtained. First, we note that matrices of the type

Λ = \begin{pmatrix} 1 & 0 \\ 0 & R \end{pmatrix}    (6.96)

where R is a three-dimensional rotation matrix are proper, ortho-chronous Lorentz transformations. In-
deed, such matrices satisfy Eq. (6.89) by virtue of RT R = 13 and we have det(Λ) = det(R) = 1 and
Λ0 0 = 1. In other words, regular three-dimensional rotations in the spatial directions are proper, ortho-
chronous Lorentz transformations.

To find less trivial examples we start with the Ansatz

Λ = \begin{pmatrix} Λ_2 & 0 \\ 0 & 1_2 \end{pmatrix} ,    Λ_2 = \begin{pmatrix} a & b \\ c & d \end{pmatrix}    (6.97)

of a two-dimensional Lorentz transformation which affects time and the x-coordinate, but leaves y and z
unchanged. Inserting this Ansatz into Eq. (6.89) and, in addition, requiring that det(Λ2 ) = 1 for proper
Lorentz transformations, leads to

a2 − c2 = 1 , d2 − b2 = 1 , ab − cd = 0 , ad − cb = 1 . (6.98)

Also demanding Λ^0_0 = a ≥ 1, so that Λ is ortho-chronous, this set of equations is solved by a = d = cosh(ξ)
and b = c = sinh(ξ) for a real number ξ. Hence, our two-dimensional Lorentz transformation can be written
as

Λ_2 (ξ) = \begin{pmatrix} \cosh(ξ) & \sinh(ξ) \\ \sinh(ξ) & \cosh(ξ) \end{pmatrix} .    (6.99)

Note the close analogy of this form with two-dimensional rotations in Example 6.7 (a). The quantity ξ is
also called “rapidity”. It follows from the addition theorems for hyperbolic functions that Λ(ξ1 )Λ(ξ2 ) =
Λ(ξ1 + ξ2 ), so rapidities add up in the same way that two-dimensional rotation angles do. For a more
common parametrisation introduce the parameter β = tanh(ξ) ∈ (−1, 1) so that

\cosh(ξ) = \frac{1}{\sqrt{1 − β^2}} =: γ ,    \sinh(ξ) = βγ .    (6.100)

In terms of β and γ the two-dimensional Lorentz transformations can then be written in the more familiar
form

Λ_2 = \begin{pmatrix} γ & βγ \\ βγ & γ \end{pmatrix} .    (6.101)
Here, β is interpreted as the relative speed of the two inertial systems (in units of the speed of light).
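A quick NumPy check of the defining relation (6.89), restricted to the t–x block, and of the addition of rapidities might look as follows; the rapidities are arbitrary choices.

import numpy as np

eta = np.diag([-1.0, 1.0])                 # Minkowski metric restricted to the (t, x) block

def boost(xi):
    # two-dimensional Lorentz boost, Eq. (6.99)
    return np.array([[np.cosh(xi), np.sinh(xi)],
                     [np.sinh(xi), np.cosh(xi)]])

xi1, xi2 = 0.5, 1.2
L = boost(xi1)
print(np.allclose(L.T @ eta @ L, eta))                          # Lambda^T eta Lambda = eta, Eq. (6.89)
print(np.allclose(boost(xi1) @ boost(xi2), boost(xi1 + xi2)))   # rapidities add up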

7 Eigenvectors and eigenvalues


In Section 3, we have seen that a linear map f : V → V is represented by a matrix A, relative to a choice
of basis on V . For a different basis, the same linear map is represented by another matrix A0 , related to
A by the basis transformation A0 = P AP −1 . This suggests an obvious problem. How can we find a basis
for which the representing matrix of the linear map is particularly simply, for example diagonal? As we
will see, eigenvectors and eigenvalues are the key to solving this problem. Eigenvectors and eigenvalues
have numerous applications in mathematics and physics some of which will be discussed towards the end
of the section.

7.1 Basic ideas


Recall from Eq. (3.80) that the matrix representing a linear map is computed by writing the images of the
basis vectors as linear combinations of the basis, with the coefficients from each image forming a column
of the matrix. Suppose that the image of a basis vector v is simply a multiple of itself, so f (v) = λv for
some number λ. In this case, the corresponding column of the representing matrix only has one non-zero
entry, λ, in the diagonal. Hence, for such basis vectors, the representing matrix becomes simple. This
observation motivates the following

Definition 7.1. For a linear map f : V → V on a vector space V over F the number λ ∈ F is called an
eigenvalue of f if there is a non-zero vector v such that

f (v) = λv . (7.1)

In this case, v is called an eigenvector of f with eigenvalue λ.

In short, an eigenvector is a vector which is just “scaled” by the action of a linear map.
How can we find eigenvalues and eigenvectors of a linear map? To discuss this we first introduce the
idea of eigenspaces. The eigenspace for λ ∈ F is defined by

Eigf (λ) := Ker(f − λ idV ) , (7.2)

and, hence, from Eq. (7.1) it “collects” all eigenvectors for λ. Being the kernel of a linear map, an
eigenspace is of course a sub vector space of V . Evidently, λ is an eigenvalue of f precisely when
dim Eigf (λ) > 0. If dim Eigf (λ) = 1 the eigenvalue λ is called non-degenerate (up to re-scaling there

is only one eigenvector for λ) and if dim Eigf (λ) > 1 the eigenvalue λ is called degenerate (there are at
least two linearly independent eigenvectors for λ).

We see that λ is an eigenvalue of f precisely when Ker(f − λ idV ) is non-trivial. From Lemma 3.1 this
is the same as saying that f − λ idV is not invertible which is equivalent to det(f − λ idV ) = 0, using
Lemma 5.1. So in summary

λ eigenvalue of f ⇐⇒ Ker(f − λ idV ) 6= {0} ⇐⇒ det(f − λ idV ) = 0 . (7.3)

This leads to an explicit method to calculate eigenvalues and eigenvectors which we develop in the next
sub-section.

7.2 Characteristic polynomial


Definition 7.2. The characteristic polynomial of a linear map f : V → V is defined by

χf (λ) := det(f − λ idV ) . (7.4)

For an n-dimensional vector space V the characteristic polynomials χf (λ) is a polynomial of order n in λ
whose coefficients depend on f . Clearly, from Eq. (7.3), the eigenvalues of f are precisely the zeros of its
characteristic polynomial. So schematically, eigenvalues and eigenvectors of f can be computed as follows.

1. Compute the characteristic polynomial χf (λ) = det(f − λ idV ) of f .

2. Find the zeros, λ, of the characteristic polynomial. They are the eigenvalues of f .

3. For each eigenvalue λ compute the eigenspace Eigf (λ) = Ker(f − λ idV ) by finding all vectors v
which solve the equation
(f − λ idV )(v) = 0 . (7.5)

Example 7.1: Computing eigenvalues and eigenvectors


For V = R3 , we would like to compute the eigenvalues and eigenvectors of the matrix

A = \begin{pmatrix} 1 & −1 & 0 \\ −1 & 2 & −1 \\ 0 & −1 & 1 \end{pmatrix} .    (7.6)

The characteristic polynomial is

χA (λ) = det \begin{pmatrix} 1 − λ & −1 & 0 \\ −1 & 2 − λ & −1 \\ 0 & −1 & 1 − λ \end{pmatrix} = −λ(λ − 1)(λ − 3) ,    (7.7)

so we have three eigenvalues λ1 = 0, λ2 = 1 and λ3 = 3. Writing v = (x, y, z)T , we compute the


eigenvectors for each of these eigenvalues in turn.
λ1 = 0:

(A − 0 · 1)v = \begin{pmatrix} 1 & −1 & 0 \\ −1 & 2 & −1 \\ 0 & −1 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} x − y \\ −x + 2y − z \\ −y + z \end{pmatrix} \overset{!}{=} 0 ⇐⇒ x = y = z

Hence, up to scaling, there is only one eigenvector so the eigenvalue is non-degenerate. Normalizing the
eigenvector with respect to the dot product gives

v_1 = \frac{1}{\sqrt{3}} \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} .

λ2 = 1:

(A − 1 · 1)v = \begin{pmatrix} 0 & −1 & 0 \\ −1 & 1 & −1 \\ 0 & −1 & 0 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} −y \\ −x + y − z \\ −y \end{pmatrix} \overset{!}{=} 0 ⇐⇒ y = 0 , x = −z

Again, the eigenvalue is non-degenerate and the normalized eigenvector is

v_2 = \frac{1}{\sqrt{2}} \begin{pmatrix} −1 \\ 0 \\ 1 \end{pmatrix} .

λ3 = 3:

(A − 3 · 1)v = \begin{pmatrix} −2 & −1 & 0 \\ −1 & −1 & −1 \\ 0 & −1 & −2 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} −2x − y \\ −x − y − z \\ −y − 2z \end{pmatrix} \overset{!}{=} 0 ⇐⇒ y = −2x , z = x

The eigenvalue is non-degenerate and the normalized eigenvector is

v_3 = \frac{1}{\sqrt{6}} \begin{pmatrix} 1 \\ −2 \\ 1 \end{pmatrix} .
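The eigenvalues and eigenvectors of this example can be cross-checked with NumPy; np.linalg.eig returns unit-norm eigenvectors as the columns of its second output, possibly in a different order and up to overall signs.

import numpy as np

A = np.array([[ 1, -1,  0],
              [-1,  2, -1],
              [ 0, -1,  1]], dtype=float)

evals, evecs = np.linalg.eig(A)
print(np.round(np.sort(evals), 6))         # approximately [0, 1, 3]
for lam, v in zip(evals, evecs.T):
    print(np.allclose(A @ v, lam * v))     # True: A v = lambda v for each eigenpair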

Some general properties of the characteristic polynomial are given in the following

Lemma 7.1. (Properties of characteristic polynomial) The characteristic polynomial χA (λ) = c_n λ^n +
c_{n−1} λ^{n−1} + · · · + c_1 λ + c_0 of an n × n matrix A has the following properties:
(i) χ_{P AP^{−1}} = χ_A , so the characteristic polynomial is basis-independent.
(ii) The coefficients c_i of the characteristic polynomial are basis-independent.
(iii) c_n = (−1)^n , c_{n−1} = (−1)^{n−1} \sum_{i=1}^n A_{ii} , c_0 = det(A).

Proof. (i) χP AP −1 (λ) = det(P AP −1 − λ1) = det(P (A − λ1)P −1 ) = det(A − λ1) = χA (λ).
(ii) This is a direct consequence of (i).
(iii) First, it is clear that c0 = χA (0) = det(A). The expressions for the other two coefficients follow by
carefully thinking about the order in λ of the terms in det(A − λ1), by using the general expression (5.11)
for the determinant. Terms of order λn and λn−1 only receive contributions from the product of the
diagonal elements, so that
χA (λ) = \prod_{i=1}^n (A_{ii} − λ) + O(λ^{n−2}) = (−1)^n λ^n + (−1)^{n−1} \Big( \sum_{i=1}^n A_{ii} \Big) λ^{n−1} + O(λ^{n−2}) .

The above Lemma shows that the constant term in the characteristic polynomial equals det(A) and that
this is basis-independent. Of course, we have already shown the basis-independence of the determinant in
Section (5.2). However, we do gain some new insight from the basis-independence of the coefficient cn−1
in the characteristic polynomial. We define the trace of a matrix A by

tr(A) := \sum_{i=1}^n A_{ii} ,    (7.8)

that is, by the sum of its diagonal entries. Since cn−1 = (−1)n−1 tr(A) it follows that the trace is basis-
independent. This can also be seen more directly. First, note that
tr(AB) = \sum_{i,j} A_{ij} B_{ji} = \sum_{i,j} B_{ji} A_{ij} = tr(BA) ,    (7.9)

so matrices inside a trace can be commuted without changing the value of the trace. Hence,

tr(P AP −1 ) = tr((P A)P −1 ) = tr(P −1 (P A)) = tr(A) , (7.10)

and we have another proof for the basis-independence of the trace.
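Basis-independence of trace and determinant is easy to illustrate numerically; in the NumPy sketch below A and P are random matrices (arbitrary choices; a random P is invertible with probability one).

import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4))
P = rng.standard_normal((4, 4))
Ap = P @ A @ np.linalg.inv(P)                            # A' = P A P^{-1}

print(np.isclose(np.trace(Ap), np.trace(A)))             # trace is basis-independent
print(np.isclose(np.linalg.det(Ap), np.linalg.det(A)))   # so is the determinant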

7.3 Diagonalization of matrices


We now come back to our original question. How can we find a basis in which a linear map or a matrix
has a particularly simple form, preferably diagonal? To be precise we start with
Definition 7.3. We say a linear map f : V → V can be diagonalised if there exists a basis of V such that
the matrix which describes f relative to this basis is diagonal.
Further, we say an n × n matrix A with entries in F can be diagonalized if there is an invertible n × n
matrix P with entries in F such that  := P −1 AP is diagonal.
The key statement relating eigenvectors and eigenvalues to diagonalization of a linear map is
Lemma 7.2. A linear map f : V → V can be diagonalised iff there exists a basis of V consisting of
eigenvectors of f . Relative to such a basis of eigenvectors, f is described by a diagonal matrix with the
eigenvalues along the diagonal.
Proof. The entries of the matrix A which describes f relative to a basis v1 , . . . , vn of V are obtained from
f (vj ) = \sum_i A_{ij} v_i , see the discussion around Eq. (3.80). From this equation, if A = diag(λ1 , . . . , λn ) is
diagonal, then f (vj ) = λj vj and the basis vectors vj are eigenvectors with eigenvalues λj . Conversely, if
f has a basis v1 , . . . , vn of eigenvectors with eigenvalues λi , then, from the eigenvalue equation (7.1), we
have f (vi ) = λi vi and, hence, the matrix describing f relative to this basis is A = diag(λ1 , . . . , λn ).

The analogous statement for matrices is


Lemma 7.3. The n × n matrix A with entries in F can be diagonalized iff A has n eigenvectors v1 , . . . , vn
which form a basis of F n . In this case, if we define the matrix

P = (v1 , . . . , vn ) (7.11)

whose columns are the eigenvectors of A it follows that

P −1 AP = diag(λ1 , . . . , λn ) , (7.12)

where λi are the eigenvalues for vi .

Proof. “⇐”: We assume that we have a basis v1 , . . . , vn of eigenvectors with eigenvalues λi so that
Avi = λi vi . Define the matrix P = (v1 , . . . , vn ) whose columns are the eigenvectors of A. Since the
eigenvectors form a basis of F n the matrix P is invertible. Then

P^{−1} AP = P^{−1} A(v_1 , . . . , v_n ) = P^{−1} (Av_1 , . . . , Av_n ) = P^{−1} (λ_1 v_1 , . . . , λ_n v_n ) = P^{−1} \underbrace{(v_1 , . . . , v_n )}_{=P} diag(λ_1 , . . . , λ_n ) = diag(λ_1 , . . . , λ_n ) .

“⇒”: Assume that A can be diagonalized, so we have an invertible matrix P with P −1 AP = Â =


diag(λ1 , . . . , λn ). Denote the column vectors of P by vi so that P ei = vi . Since P is invertible these
column vectors form a basis of F n . Then

Â e_i = λ_i e_i    =⇒    P^{−1} A \underbrace{P e_i}_{=v_i} = λ_i e_i    =⇒    A v_i = λ_i v_i ,

and, hence, v1 , . . . , vn is a basis of eigenvectors of A.

The requirement which is easily overlooked in the previous lemmas is that we are asking for a basis of
eigenvectors. Once we have found all the eigenvectors of a linear map they might or might not form a
basis of the underlying vector space. Only when they do can the linear map be diagonalized.

If a matrix A can be diagonalized, with eigenvalues λ1 , . . . , λn , so that P −1 AP = diag(λ1 , . . . , λn ), then


the basis-independence of the determinant and the trace implies that

det(A) = \prod_{i=1}^n λ_i ,    tr(A) = \sum_{i=1}^n λ_i ,    (7.13)

so, in this case, the determinant is the product of the eigenvalues and the trace is their sum.

Example 7.2: Diagonalizing matrices


(a) We begin with the matrix (7.6) from our previous Example 7.1. We have already determined its eigen-
values and eigenvectors and the latter clearly form a basis of R3 . Hence, this matrix can be diagonalized
and the matrix

P = \begin{pmatrix} \frac{1}{\sqrt{3}} & −\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{6}} \\ \frac{1}{\sqrt{3}} & 0 & −\frac{2}{\sqrt{6}} \\ \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{6}} \end{pmatrix} ,    (7.14)

contains the three eigenvectors v1 , v2 , v3 from Example 7.1 as its columns. Note that these three columns
form an ortho-normal system with respect to the dot product so the above matrix P is orthogonal. This
means that its inverse is easily computed from P −1 = P T . With the matrix A from Eq. (7.6) it can then
be checked explicitly that
P T AP = diag(0, 1, 3) .
Note that the eigenvalues of A appear on the diagonal. It is not an accident that the eigenvectors of A
are pairwise orthogonal and, as we will see shortly, this is related to A being a symmetric matrix.
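These statements can be confirmed directly in NumPy, using the eigenvectors from Example 7.1 as the columns of P:

import numpy as np

A = np.array([[ 1, -1,  0],
              [-1,  2, -1],
              [ 0, -1,  1]], dtype=float)
v1 = np.array([1, 1, 1]) / np.sqrt(3)
v2 = np.array([-1, 0, 1]) / np.sqrt(2)
v3 = np.array([1, -2, 1]) / np.sqrt(6)

P = np.column_stack([v1, v2, v3])          # eigenvectors as columns, Eq. (7.14)
print(np.allclose(P.T @ P, np.eye(3)))     # P is orthogonal, so P^{-1} = P^T
print(np.round(P.T @ A @ P, 10))           # diag(0, 1, 3)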
(b) Consider the 2 × 2 matrix

A = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}

whose characteristic polynomial is

χA (λ) = det \begin{pmatrix} −λ & 1 \\ 0 & −λ \end{pmatrix} = λ^2 .

Hence, there is only one eigenvalue, λ = 0. The associated eigenvectors are found by solving

\begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} y \\ 0 \end{pmatrix} \overset{!}{=} 0 ⇐⇒ y = 0

so the eigenvalue is non-degenerate with eigenvectors proportional to (1, 0)T . This amounts to only one
eigenvector (up to re-scaling) so this matrix does not have a basis of eigenvectors (which requires two
linearly independent vectors in R2 ) and cannot be diagonalized.
(c) Our next example is for the matrix

A = \begin{pmatrix} 0 & 1 \\ −1 & 0 \end{pmatrix}
with characteristic polynomial

χA (λ) = det \begin{pmatrix} −λ & 1 \\ −1 & −λ \end{pmatrix} = λ^2 + 1 .

At this point we have to be a bit more specific about the underlying vector space. If the vector space is
R2 , we have to work with real numbers and there are no eigenvalues since the characteristic polynomial
has no real zeros. Hence, in this case, the matrix cannot be diagonalized. On the other hand, for C2 and
complex scalars, there are two eigenvalues, λ± = ±i. The corresponding eigenvectors v = (x, y)T are:
λ+ = i

(A − i · 1_2)v = \begin{pmatrix} −i & 1 \\ −1 & −i \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} −ix + y \\ −x − iy \end{pmatrix} \overset{!}{=} 0 ⇐⇒ y = ix .
The eigenvalue is non-degenerate and, as in Example (7.1) it is useful to normalize the eigenvector.
However, since we are working over the complex numbers, we should be using the standard hermitian
scalar product and demand that v† v = 1. Then

v_+ = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ i \end{pmatrix} .

λ− = −i

(A + i · 1_2)v = \begin{pmatrix} i & 1 \\ −1 & i \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} ix + y \\ −x + iy \end{pmatrix} \overset{!}{=} 0 ⇐⇒ y = −ix .
Again, this eigenvalue is non-degenerate with corresponding normalized eigenvector

v_− = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ −i \end{pmatrix} .
The diagonalizing basis transformation is

P = (v_+ , v_−) = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ i & −i \end{pmatrix} ,

and its column vectors form an ortho-normal system (under the standard hermitian scalar product on
C2 ). Therefore, P is a unitary matrix and P −1 = P † . Again, the orthogonality of the eigenvectors is not

an accident and is related to the matrix A being anti-symmetric. To check these results we verify that
indeed
P † AP = diag(i, −i) .

(d) Finally, we consider the hermitian matrix

A = \begin{pmatrix} 1 & 2i \\ −2i & 1 \end{pmatrix}

with characteristic polynomial

χA (λ) = det \begin{pmatrix} 1 − λ & 2i \\ −2i & 1 − λ \end{pmatrix} = (λ − 3)(λ + 1) .

Hence, the eigenvalues are λ1 = 3 and λ2 = −1.


λ1 = 3

(A − 3 · 1_2)v = \begin{pmatrix} −2 & 2i \\ −2i & −2 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} \overset{!}{=} 0 ⇐⇒ x = iy .
The eigenvalue is non-degenerate and the corresponding eigenvector can be chosen as

v_1 = \frac{1}{\sqrt{2}} \begin{pmatrix} i \\ 1 \end{pmatrix} ,

so that it is properly normalized with respect to the C2 standard scalar product, v1† v1 = 1.
λ2 = −1

(A + 1_2)v = \begin{pmatrix} 2 & 2i \\ −2i & 2 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} \overset{!}{=} 0 ⇐⇒ x = −iy .
This eigenvalue is also non-degenerate and the normalized eigenvector, satisfying v2† v2 = 1, can be chosen
as

v_2 = \frac{1}{\sqrt{2}} \begin{pmatrix} −i \\ 1 \end{pmatrix} .
Note also that the two eigenvectors are orthogonal, v1† v2 = 0. Consequently, the diagonalizing matrix

U = (v_1 , v_2 ) = \frac{1}{\sqrt{2}} \begin{pmatrix} i & −i \\ 1 & 1 \end{pmatrix}

is unitary, U † U = 1_2 , and it is straightforward to verify that indeed U † AU = diag(3, −1).
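For hermitian matrices NumPy provides np.linalg.eigh, which returns real eigenvalues in ascending order and an ortho-normal set of eigenvectors; applied to the matrix of this example it reproduces the result above, up to the ordering and overall phases of the eigenvectors.

import numpy as np

A = np.array([[1, 2j], [-2j, 1]])
evals, U = np.linalg.eigh(A)                             # eigenvalues ascending, eigenvectors as columns
print(evals)                                             # eigenvalues -1 and 3
print(np.allclose(U.conj().T @ U, np.eye(2)))            # U is unitary
print(np.allclose(U.conj().T @ A @ U, np.diag(evals)))   # U^dagger A U = diag(-1, 3)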

While Lemma 7.3 provides a general criterion for a matrix to be diagonalizable it requires calculation
of all the eigenvectors and checking whether they form a basis. It would be helpful to have a simpler
condition, at least for some classes of matrices, which can simply be “read off” from the matrix. To this
end we prove
Theorem 7.1. Let V be a vector space over R (C) with real (hermitian) scalar product h·, ·i. If f : V → V
is self-adjoint then
(i) All eigenvalues of f are real.
(ii) Eigenvectors for different eigenvalues are orthogonal.

Proof. (i) For the real case, the first part of the statement is of course trivial. For the complex case, we
start with an eigenvector v 6= 0 of f with corresponding eigenvalue λ, so that f (v) = λv. Then

λhv, vi = hv, λvi = hv, f (v)i = hf (v), vi = hλv, vi = λ∗ hv, vi .

In the third step we have used the fact that f is self-adjoint and can, hence, be moved from one argument
of the scalar product into the other. Since v ≠ 0 and, hence, hv, vi ≠ 0 it follows that λ = λ∗ , so the
eigenvalue is real.
(ii) Consider two eigenvectors v1 , v2 , so that f (v1 ) = λ1 v1 and f (v2 ) = λ2 v2 , with different eigenvalues,
λ1 6= λ2 . Then

(λ1 − λ2 )hv1 , v2 i = hλ1 v1 , v2 i − hv1 , λ2 v2 i = hf (v1 ), v2 i − hv1 , f (v2 )i = hv1 , f (v2 )i − hv1 , f (v2 )i = 0 .

Since λ1 − λ2 6= 0 this means hv1 , v2 i = 0 and the two eigenvectors are orthogonal.

Theorem 7.2. Let V be a vector space over C with hermitian scalar product h·, ·i. If f : V → V is
self-adjoint it has an ortho-normal basis, ε1 , . . . , εn , of eigenvectors.

Proof. The proof is by induction in n, the dimension of the vector space V . For n = 1 the assertion is
trivial. Assume that it is true for all dimensions k < n. We would like to show that it is true for dimension
n. The characteristic polynomial χf of f has at least one zero, λ, over the complex numbers. Since f is
self-adjoint, λ is real from the previous theorem. Consider the eigenspace W = Eigf (λ). Since λ is an
eigenvalue, dim(W ) > 0. Vectors v ∈ W ⊥ and w ∈ W are perpendicular, hw, vi = 0, so

hw, f (v)i = hf (w), vi = hλw, vi = λhw, vi = 0 .

This means that f (v) is perpendicular to w so that, whenever v ∈ W ⊥ , then also f (v) ∈ W ⊥ . As a result,
W ⊥ is invariant under f and we can restrict f to W ⊥ , that is, consider g = f |W ⊥ . Since dim W ⊥ < n, there
is an ortho-normal basis ε1 , . . . , εk of W ⊥ consisting of eigenvectors of g (which are also eigenvectors of f )
by the induction assumption. Add to this ortho-normal basis of W ⊥ an ortho-normal basis of W (which,
by definition of W , consists of eigenvectors of f with eigenvalue λ). Since dim(W ) + dim(W ⊥ ) = n
(see Lemma 6.3) and pairwise orthogonal vectors are linearly independent this list of vectors forms an
ortho-normal basis of V , consisting of eigenvectors of f .

In summary, these results mean that every real symmetric (hermitian) matrix can be diagonalized, has
an ortho-normal basis ε1 , . . . , εn of eigenvectors with corresponding real eigenvalues λ1 , . . . , λn and the
diagonalizing matrix P = (ε1 , . . . , εn ) is orthogonal (unitary), such that

P † AP = diag(λ1 , . . . , λn ) . (7.15)

How can this ortho-normal basis of eigenvectors be found? First, the eigenvalues and eigenvectors have to
be computed in the usual way, as outlined above. From Theorem 7.1 eigenvectors for different eigenvalues
are orthogonal so if all eigenvalues are non-degenerate then the eigenvectors will be automatically pairwise
orthogonal. What remains to be done in order to obtain an ortho-normal system is simply to normalize
the eigenvectors. This is what has happened in Example 7.1 (and its continuation, Example 7.2 (a)) where
all eigenvalues were indeed non-degenerate.

The situation is slightly more involved in the presence of degenerate eigenvalues. Of course eigenvectors
for different eigenvalues are still automatically orthogonal. However, for a degenerate eigenvalue we have
two or more linearly independent eigenvectors which are not guaranteed to be orthogonal. The point is
that we can choose such eigenvectors to be orthogonal. To see how this works it is useful to think about

the eigenspaces, EigA (λ), of the hermitian matrix A. Eigenspaces for different eigenvalues are of course
orthogonal to one another (meaning that all vectors of one eigenspace are orthogonal to all vectors of
the other), as a consequence of Theorem 7.1. For each eigenspace, we can find a basis of eigenvectors
and, applying the Gram-Schmidt procedure to this basis, we can convert this into an ortho-normal basis.
If the eigenvalue is non-degenerate, so that dim EigA (λ) = 1, this simply means normalizing the single
basis vector. For degenerate eigenvalues, when dim EigA (λ) > 1, we have to follow the full Gram-Schmidt
procedure as explained in Section 6.2. Combining the ortho-normal sets of basis vectors for each eigenspace
into one list then gives the full basis of ortho-normal eigenvectors. To see how this works explicitly let us
discuss a more complicated example with a degenerate eigenvalue.

Example 7.3: Diagonalizing with degenerate eigenvalues


In R3 , we consider the matrix

A = \frac{1}{4} \begin{pmatrix} 2 & 3\sqrt{2} & 3\sqrt{2} \\ 3\sqrt{2} & −1 & 3 \\ 3\sqrt{2} & 3 & −1 \end{pmatrix}
with characteristic polynomial
 1 3 3

2 −λ √
2 2

2 2
3
χA (λ) = det 
 √
2 2
− 14 − λ 3
4
3 2
 = −λ + 3λ + 2 = (2 − λ)(1 + λ) .

3 3 1

2 2 4 −4 − λ

Hence, there are two eigenvalues, λ1 = 2 and λ2 = −1. For the eigenvectors v = (x, y, z)T we find:
λ1 = 2:

(A − 2 · 1_3)v = \frac{3}{4} \begin{pmatrix} −2 & \sqrt{2} & \sqrt{2} \\ \sqrt{2} & −3 & 1 \\ \sqrt{2} & 1 & −3 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \frac{3}{4} \begin{pmatrix} −2x + \sqrt{2}y + \sqrt{2}z \\ \sqrt{2}x − 3y + z \\ \sqrt{2}x + y − 3z \end{pmatrix} \overset{!}{=} 0    =⇒    y = z = \frac{x}{\sqrt{2}} .

Hence, this eigenvalue is non-degenerate and a suitable normalized eigenvector is

ε_1 = \frac{1}{2} \begin{pmatrix} \sqrt{2} \\ 1 \\ 1 \end{pmatrix} .

λ2 = −1:

(A + 1_3)v = \frac{3}{4} \begin{pmatrix} 2 & \sqrt{2} & \sqrt{2} \\ \sqrt{2} & 1 & 1 \\ \sqrt{2} & 1 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \frac{3}{4} \begin{pmatrix} 2x + \sqrt{2}y + \sqrt{2}z \\ \sqrt{2}x + y + z \\ \sqrt{2}x + y + z \end{pmatrix} \overset{!}{=} 0    =⇒    z = −\sqrt{2}x − y .

Since we have found only one condition on x, y, z there are two linearly independent eigenvectors, so
this eigenvalue has degeneracy 2. Obvious choices for the two eigenvectors are obtained by setting x = 1,
y = 0 and x = 0, y = 1, so

v_2 = \begin{pmatrix} 1 \\ 0 \\ −\sqrt{2} \end{pmatrix} ,    v_3 = \begin{pmatrix} 0 \\ 1 \\ −1 \end{pmatrix} .
Both of these vectors are orthogonal to 1 above, as they must be, but they are not orthogonal to one
another. However, they do form a basis of the two-dimensional eigenspace EigA (−1) = Span(v2 , v3 ) so

that every linear combination of these vectors is also an eigenvector for the same eigenvalue −1. With
this in mind we apply the Gram-Schmidt procedure to v2 and v3 . First normalizing v2 leads to

ε_2 = \frac{v_2}{|v_2|} = \frac{1}{\sqrt{3}} \begin{pmatrix} 1 \\ 0 \\ −\sqrt{2} \end{pmatrix} .

Then, subtracting from v3 its projection onto ε_2 and normalizing results in

v_3′ = v_3 − (ε_2 · v_3 ) ε_2 = \frac{1}{3} \begin{pmatrix} −\sqrt{2} \\ 3 \\ −1 \end{pmatrix} ,    ε_3 = \frac{v_3′}{|v_3′|} = \frac{1}{2\sqrt{3}} \begin{pmatrix} −\sqrt{2} \\ 3 \\ −1 \end{pmatrix} .

The system ε_1 , ε_2 , ε_3 is now an ortho-normal basis of eigenvectors, so the matrix

P = (ε_1 , ε_2 , ε_3 ) = \begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{3}} & −\frac{1}{\sqrt{6}} \\ \frac{1}{2} & 0 & \frac{\sqrt{3}}{2} \\ \frac{1}{2} & −\sqrt{\frac{2}{3}} & −\frac{1}{2\sqrt{3}} \end{pmatrix}

indeed satisfies P^T P = 1_3 and P^T AP = diag(2, −1, −1).
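A sketch of the same computation in NumPy: np.linalg.eigh returns an ortho-normal eigenbasis directly, so it implicitly performs the orthogonalisation within the degenerate eigenspace (its basis of EigA(−1) need not coincide with the ε2 , ε3 chosen above).

import numpy as np

s = 3 * np.sqrt(2)
A = np.array([[2.0,  s,    s  ],
              [s,   -1.0,  3.0],
              [s,    3.0, -1.0]]) / 4.0

evals, P = np.linalg.eigh(A)
print(np.round(evals, 10))                       # approximately [-1, -1, 2]
print(np.allclose(P.T @ P, np.eye(3)))           # columns form an ortho-normal basis
print(np.allclose(P.T @ A @ P, np.diag(evals)))  # P^T A P is diagonal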

7.4 Normal linear maps


We have seen above that symmetric and hermitian matrices can always be diagonalised by an orthogonal
or unitary basis transformation, respectively. What about matrices which are neither symmetric nor
hermitian? It turns out that a useful criterion for the existence of a diagonalising basis change can be
formulated for the more general class of normal linear maps or matrices.

Definition 7.4. Let V be a vector space over C with hermitian scalar product h·, ·i. A linear map
f : V → V is called normal if f ◦ f † = f † ◦ f (or, equivalently, iff the commutator of f and f † vanishes,
that is, [f, f † ] := f ◦ f † − f † ◦ f = 0).

Recall from Section 6.3 that the adjoint map, f † , for a linear map f : V → V is defined relative to a
scalar product on V . Clearly hermitian and unitary linear maps are normal (since f = f † if f is hermitian
and f † ◦ f = f ◦ f † = id if f is unitary), as are anti-hermitian maps, that is, maps satisfying f = −f † .
If we consider the vector space V = Cn over C with the standard hermitian scalar product then (anti-)
hermitian and unitary maps simply correspond to (anti-) hermitian and unitary matrices and we learn
that these classes of matrices are normal.
A useful statement for normal linear maps is

Lemma 7.4. Let V be a vector space over C with hermitian scalar product h·, ·i and f : V → V be a
normal linear map. If λ is an eigenvalue of f with eigenvector v then λ∗ is an eigenvalue of f † for the
same eigenvector v.

Proof. First, we show that the map g = f − λ id is also normal. This follows from the straightforward
calculation

g ◦ g † = (f − λ id) ◦ (f † − λ∗ id) = f ◦ f † − λ∗ f − λf † + |λ|2 id


= f † ◦ f − λ∗ f − λf † + |λ|2 id = g † ◦ g .

Note that some of the properties of the adjoint map in Lemma 6.4 have been used in this calculation.
Now consider an eigenvalue λ of f with eigenvector v, so that f (v) = λv or, equivalently, g(v) = 0. Then
we have
0 = hgv, gvi = hv, g † ◦ gvi = hv, g ◦ g † vi = hg † v, g † vi ,
and, from the positivity property of the scalar product, (S3) in Def. 6.1, it follows that g † (v) = 0. Since
g † = f − λ∗ id this, in turn means that f † (v) = λ∗ v. Hence, λ∗ is indeed an eigenvalue of f † with
eigenvector v.

The key property of normal matrices relevant to matrix diagonalization is

Theorem 7.3. Let V be a vector space over C with hermitian scalar product h·, ·i and f : V → V a linear
map. Then we have: f is normal ⇐⇒ f has an ortho-normal basis of eigenvectors

Proof. "⇐": Start with an ortho-normal basis ε1 , . . . , εn of eigenvectors of f , so that hεi , εj i = δij and
f (εi ) = λi εi . Then

hεj , f † (εi )i = hf (εj ), εi i = λ∗i δij
which holds for all j and, hence, implies that f † (εi ) = λ∗i εi . From this result we have

f ◦ f † (εi ) = |λi |2 εi = f † ◦ f (εi ) ,

for all εi . This means that f ◦ f † = f † ◦ f so that f is normal.


“⇒”: Conversely, assume that f is normal. We will show that f has an ortho-normal basis of eigenvectors
by induction in n = dim(V ). For n = 1 the statement is trivial. Assume that it is valid for all dimensions
k < n. Since we are working over the complex numbers, f has at least one eigenvalue λ with eigenvector
v, so that f (v) = λv. Define the sub vector space W = {w ∈ V | hw, vi = 0}, that is, the space
perpendicular to v. For any w ∈ W it follows

hf (w), vi = hw, f † (v)i = hw, λ∗ vi = λ∗ hw, vi = 0


hf † (w), vi = hw, f (v)i = hw, λvi = λhw, vi = 0

where the above Lemma has been used in the first line. This means that f (W ) ⊂ W and f † (W ) ⊂ W , so
that sub vector space W is invariant under both f and f † . This implies immediately that the restriction,
f |W of f to W is normal as well. Since dim(W ) = n − 1, the induction assumption can be applied and
we conclude that f |W has an ortho-normal basis of eigenvectors. Combining this basis with v/|v| gives
an ortho-normal basis of eigenvectors for f .

As we have seen, unitary maps are normal so the theorem implies that they have an ortho-normal
basis of eigenvectors. Focusing on V = Cn with the standard hermitian scalar product this means that
unitary matrices can be diagonalised. The eigenvalues of unitary maps are constrained by the following

Lemma 7.5. Let V be a vector space over C with hermitian scalar product h·, ·i and U : V → V a unitary
map. If λ is an eigenvalue of U then |λ| = 1.

Proof. Let λ be an eigenvalue of U with eigenvector v, so that U v = λv. From unitarity of U it follows
that
|λ|2 hv, vi = hλv, λvi = hU v, U vi = hv, vi ,
and, dividing by hv, vi (which must be non-zero since v is an eigenvector), gives |λ|2 = 1.

Combining these statements we learn that every unitary matrix U can be diagonalised, by means of a
unitary coordinate transformation P , such that P † U P = diag(e^{iφ_1} , . . . , e^{iφ_n} ). Since orthogonal matrices
are also unitary they can be diagonalised in the same way, provided we are working over the complex
numbers. In fact, we have already seen this explicitly in Example 7.2 (c) where the matrix A is a specific
two-dimensional rotation matrix.

Example 7.4: Three-dimensional rotations – again


Some of our results on eigenvalues and eigenvectors can be used to extract useful properties of three
dimensional rotations without much calculation. We know from the above discussion that, over the
complex numbers, three-dimensional rotations R can be diagonalised (by a unitary basis transformation)
to a matrix diag(eiφ1 , eiφ2 , eiφ3 ) with phases in the diagonal. However, rotations are matrices with real
entries, acting on R3 , and for many purposes it seems more appropriate to work over the real numbers.
Let us see how much we can say allowing for real basis transformation only. First, since a three-
dimensional rotation matrix R has real entries, the coefficients of the characteristic polynomial, χR , are
also real. This means for any eigenvalue λ its complex conjugate λ∗ is also an eigenvalue. Combine this
observation with the fact that all eigenvalues of a rotation matrix have complex modulus one and that
their product needs to be one (since det(R) = 1) and we learn that the eigenvalues of a three-dimensional
rotation R must be of the form 1, eiφ , e−iφ . In particular, for every three-dimensional rotation at least one
of the eigenvalues equals 1 and the corresponding eigenvector n (normalised, so that n · n = 1), which
satisfies
Rn = n , (7.16)
is called the axis of rotation.
Let us consider an ortho-normal basis {n, u1 , u2 } of R3 with the axis of rotation as its first basis vector.
What is the matrix, R̃, representing the rotation R relative to this basis? From Eq. (6.24) this can be
worked out by computing the matrix elements

n · (Rn) = n · n = 1 , ua · (Rn) = ua · n = 0 , n · (Rua ) = (R−1 n) · ua = n · ua = 0 (7.17)

of R. These results show that the representing matrix is of the form

R̃ = \begin{pmatrix} 1 & 0^T \\ 0 & R2 \end{pmatrix} ,                                            (7.18)

where R2 is a 2 × 2 matrix. However, R̃ is also a rotation and, hence, needs to satisfy R̃T R̃ = 13
and det(R̃) = 1. This immediately implies that R2T R2 = 12 and det(R2 ) = 1, so that R2 must be a
two-dimensional rotation, R2 = R(θ), of the form given in Eq. (6.40).
In summary, we learn that for every three-dimensional rotation R we can find an orthonormal basis
(where the first basis vector is the axis of rotation) where it takes the form

R̃ = \begin{pmatrix} 1 & 0^T \\ 0 & R(θ) \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & cos(θ) & −sin(θ) \\ 0 & sin(θ) & cos(θ) \end{pmatrix} .                    (7.19)

The angle θ which appears in this parametrisation is called the angle of rotation. Basis-independence of
the trace means that tr(R) = tr(R̃) = 1 + 2 cos(θ) which leads to the interesting and useful formula
cos(θ) = (tr(R) − 1)/2                                                  (7.20)

for the angle of rotation of a rotation matrix R. This formula allows for an easy computation of the angle
of rotation, even if the rotation matrix is not in the simple form (7.19). The axis of rotation n, on the
other hand, can be found as the eigenvector for eigenvalue one, that is, by solving Eq. (7.16).
For example, consider the matrix

R = (1/2) \begin{pmatrix} √2 & −1 & −1 \\ 0 & √2 & −√2 \\ √2 & 1 & 1 \end{pmatrix} .                    (7.21)

It is easy to verify that RT R = 13 and det(R) = 1 so this is indeed a rotation. By solving Eq. (7.16) for
this matrix (and normalising the eigenvector) we find for the axis of rotation

n = (1/√(5 − 2√2)) (1, −1, √2 − 1)^T .                                  (7.22)

Also, we have tr(R) = √2 + 1/2, so from Eq. (7.20) the angle of rotation satisfies

cos(θ) = (2√2 − 1)/4 .                                                  (7.23)
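
As a numerical cross-check (not part of the original notes), the axis (7.22) and the angle (7.23) can be reproduced with a few lines of Python/NumPy; the code below is only a sketch and the variable names are our own.

import numpy as np

s = np.sqrt(2.0)
R = 0.5 * np.array([[s, -1.0, -1.0],
                    [0.0,  s,   -s],
                    [s,  1.0,  1.0]])

assert np.allclose(R.T @ R, np.eye(3)) and np.isclose(np.linalg.det(R), 1.0)

# axis of rotation: normalised eigenvector for eigenvalue 1, cf. Eq. (7.16)
evals, evecs = np.linalg.eig(R)
k = np.argmin(np.abs(evals - 1.0))
n = np.real(evecs[:, k])
n /= np.linalg.norm(n)
print(n)                                     # proportional to (1, -1, sqrt(2)-1), cf. Eq. (7.22)

# angle of rotation from Eq. (7.20)
print(np.arccos((np.trace(R) - 1.0) / 2.0))  # arccos of (2*sqrt(2)-1)/4, cf. Eq. (7.23)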

7.5 Simultaneous diagonalization


Frequently, one would like to know whether two n × n matrices A, B can be diagonalized simultaneously,
that is, whether a single basis transformation P can be found such that both P AP −1 and P BP −1 are
diagonal. One way to solve this problem is to calculate the eigenvectors for both matrices and to check
if a basis of common eigenvectors can be selected. However, this can be tedious. The following theorem
gives a simple criterion which is easy to check.
Theorem 7.4. Let A, B be two diagonalizable n × n matrices. Then we have:
A, B can be diagonalized simultaneously ⇐⇒ [A, B] = 0 . (7.24)
Proof. “⇒”: This is the easy direction. Assume that A, B can be diagonalized simultaneously so that
there is a basis transformation P such that both  = P −1 AP and B̂ = P −1 BP are diagonal. Then
[A, B] = AB − BA = P Â P −1 P B̂ P −1 − P B̂ P −1 P Â P −1 = P (ÂB̂ − B̂ Â)P −1 = P [Â, B̂]P −1 = 0 ,

since diagonal matrices commute.


“⇐”: The converse is more difficult and to simplify matters we assume that the eigenvalues of A are non-
degenerate. (Without this assumption the proof goes along similar lines but is more involved.) Since, by
assumption, A can be diagonalized we have a basis, v1 , . . . , vn of eigenvectors with eigenvalues λ1 , . . . , λn
such that Avi = λi vi . The two matrices commute so that
A(Bvi ) = B(Avi ) = λi Bvi .
This shows that Bvi is also an eigenvector of A (or it is the zero vector), with eigenvalue λi . Since the
eigenvalue λi is non-degenerate (and this is where our simplifying assumption enters) it follows that Bvi
must be a multiple of vi , so there must be scalars µi such that
Bvi = µi vi .
This means the vi are also eigenvectors of B, although in general for different eigenvalues µi . Hence,
v1 , . . . , vn is a basis of common eigenvectors, so A and B can be diagonalized simultaneously.

Example 7.5: Simultaneous diagonalization
Can the three matrices

A = \begin{pmatrix} 2 & −1 \\ −1 & 2 \end{pmatrix} ,     B = \begin{pmatrix} 3 & 2 \\ 2 & 3 \end{pmatrix} ,     C = \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}

be diagonalized simultaneously? A straightforward explicit computation of the commutators shows that

[A, B] = 0 , [A, C] 6= 0 , [B, C] 6= 0 .

Hence, A, B can be diagonalized simultaneously but not A, C or B, C.
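
A quick numerical confirmation (not part of the notes), assuming NumPy; the final lines also exhibit a common diagonalizing basis for A and B:

import numpy as np

A = np.array([[2, -1], [-1, 2]])
B = np.array([[3, 2], [2, 3]])
C = np.array([[1, 1], [1, 2]])

def commutator(X, Y):
    return X @ Y - Y @ X

print(np.allclose(commutator(A, B), 0))   # True:  A and B commute
print(np.allclose(commutator(A, C), 0))   # False
print(np.allclose(commutator(B, C), 0))   # False

# a common orthonormal eigenbasis: the eigenvectors of A also diagonalize B
_, P = np.linalg.eigh(A)
print(np.round(P.T @ A @ P, 10))          # diagonal
print(np.round(P.T @ B @ P, 10))          # diagonal as well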

7.6 Applications
Eigenvectors and eigenvalues have a wide range of applications, both in mathematics and in physics. Here,
we discuss a small selection of those applications.

7.6.1 Solving Newton-type differential equations with linear forces


We would like to find the solutions q(t) = (q1 (t), . . . , qn (t))T to the differential equation

d²q/dt² = −M q ,                                                        (7.25)
where M is a real symmetric n × n matrix. In a physical context, this differential equation might describe
a system of mass points connected by springs. The practical problem in solving this equation is that, for
a non-diagonal matrix M , its various components are coupled. However, this coupling can be removed by
diagonalizing the matrix M . To this end, we consider an orthogonal matrix P such that P T M P = M̂ =
diag(m1 , . . . , mn ) and introduce new coordinates Q by setting q = P Q. By multiplying Eq. (7.25) with
P T this leads to
d²Q/dt² = −P T M P Q = −M̂ Q     or     d²Qi /dt² = −mi Qi  for i = 1, . . . , n .          (7.26)

In terms of Q the system decouples and the solutions can easily be written down as

Qi (t) =  ai sin(wi t) + bi cos(wi t)    for mi > 0
          ai e^{wi t} + bi e^{−wi t}     for mi < 0          where wi = √|mi | ,            (7.27)
          ai t + bi                      for mi = 0

and ai , bi are arbitrary constants. In terms of the original coordinates, the solution is then obtained by
inserting Eq. (7.27) into q = P Q. One interesting observation is that the nature of the solution depends
on the signs of the eigenvalues mi of the matrix M . For a positive eigenvalue, the solution is oscillatory, for
a negative one exponential and for a vanishing one linear. Physically, a negative or vanishing eigenvalue
mi indicates an instability. In this case, the corresponding Qi (t) becomes large at late times (except for
special choices of the constants ai , bi ). The lesson is that stability of the system can be analyzed simply
by looking at the eigenvalues of M . If they are all positive, the system is fully oscillatory and stable; if
there are vanishing or negative eigenvalues the system generically “runs away” in some directions.

Example 7.6: As an explicit example, consider the differential equations

d²q1 /dt² = −q1 + q2
d²q2 /dt² = q1 − 2q2 + q3
d²q3 /dt² = q2 − q3 .
This system is indeed of the general form (7.25) with

M = \begin{pmatrix} 1 & −1 & 0 \\ −1 & 2 & −1 \\ 0 & −1 & 1 \end{pmatrix} .

This is the same matrix we have studied in Example 7.1 and it has eigenvalues m1 = 0, m2 = 1 and
m3 = 3. Due to the zero eigenvalue this system has a linear instability in one direction. Inserting into
Eq. (7.27), the explicit solution reads

Q(t) = \begin{pmatrix} a1 t + b1 \\ a2 sin(t) + b2 cos(t) \\ a3 sin(√3 t) + b3 cos(√3 t) \end{pmatrix}                    (7.28)

In terms of the original coordinates q, the solution is obtained by inserting (7.28) into q = P Q using the
diagonalizing matrix P given in Eq. (7.14).
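
For readers who like to experiment, here is a minimal numerical sketch (not part of the notes) of this procedure, assuming NumPy; rather than quoting the matrix P of Eq. (7.14), it is recomputed with numpy.linalg.eigh, and the constants a, b are arbitrary.

import numpy as np

M = np.array([[ 1.0, -1.0,  0.0],
              [-1.0,  2.0, -1.0],
              [ 0.0, -1.0,  1.0]])

m, P = np.linalg.eigh(M)                  # eigenvalues 0, 1, 3 and orthogonal P (P^T M P diagonal)
w = np.sqrt(np.abs(m))                    # w_i = sqrt(|m_i|), cf. Eq. (7.27)

def q(t, a, b):
    """General solution q(t) = P Q(t), with Q_i from Eq. (7.27) (here all m_i >= 0)."""
    Q = np.where(m > 1e-12,
                 a * np.sin(w * t) + b * np.cos(w * t),   # oscillatory modes
                 a * t + b)                               # zero mode: linear growth
    return P @ Q

# finite-difference check that q''(t) = -M q(t) for random constants a, b
rng = np.random.default_rng(1)
a, b = rng.standard_normal(3), rng.standard_normal(3)
t, h = 0.7, 1e-4
q_dd = (q(t + h, a, b) - 2 * q(t, a, b) + q(t - h, a, b)) / h**2
print(np.allclose(q_dd, -M @ q(t, a, b), atol=1e-4))      # True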

7.6.2 Functions of matrices


Start with a real or complex function g(x). We would like to “insert” an n × n matrix A into this function,
that is, we would like to make sense of the expression g(A). This can be done whenever the function has
a (suitably convergent) power series expansion

g(x) = a0 + a1 x + a2 x2 + · · · . (7.29)

In this case, we can define g(A) as

g(A) = a0 1n + a1 A + a2 A2 + · · · , (7.30)

that is, by simply “replacing” x with A in the power series expansion. Note that, convergence assumed,
the RHS of Eq. (7.30) is well-defined via addition and multiplication of matrices and the function “value”
g(A) is a matrix of the same size as A.

Example 7.7: The matrix exponential is defined as



e^A = 1 + A + (1/2) A² + (1/6) A³ + · · · = Σ_{k=0}^∞ (1/k!) A^k .                    (7.31)

Since the exponential series converges for all real (and complex) x it can be shown that the matrix
exponential converges for all matrices.
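
As a quick illustration (not in the notes), the truncated series can be compared against SciPy's built-in matrix exponential scipy.linalg.expm; the example matrix below is the one used in Example 7.8 (b), with θ = 1:

import numpy as np
from scipy.linalg import expm

def exp_series(A, terms=30):
    """Truncated power series (7.31): sum over A^k / k!."""
    result = np.zeros_like(A, dtype=float)
    term = np.eye(A.shape[0])
    for k in range(terms):
        result += term
        term = term @ A / (k + 1)
    return result

A = np.array([[0.0, 1.0], [-1.0, 0.0]])       # theta * T with theta = 1, cf. Example 7.8 (b)
print(np.allclose(exp_series(A), expm(A)))    # True
print(np.round(exp_series(A), 6))             # a rotation by one radian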

Computing the function of a non-diagonal matrix can be complicated as it involves computing higher
and higher powers of the matrix A. However, it is easily accomplished for a diagonal matrix Â =
diag(a1 , . . . , an ) since Â^k = diag(a1^k , . . . , an^k ) so that

g(diag(a1 , . . . , an )) = diag(g(a1 ), . . . , g(an )) (7.32)

for a function g. This suggests that we might be able to compute the function of a more general matrix
by diagonalizing and then applying Eq. (7.32). To do this, we first observe that computing the function
of a matrix “commutes” with a change of basis. Indeed from

(P −1 AP )^k = P −1 A P P −1 A P · · · P −1 A P = P −1 A^k P

it follows that
g(P −1 AP ) = P −1 g(A)P . (7.33)
Now suppose that A can be diagonalized and P −1 AP = Â = diag(λ1 , . . . , λn ). Then

g(A) = g(P ÂP −1 ) = P g(Â)P −1 = P diag(g(λ1 ), . . . , g(λn ))P −1 . (7.34)

That is, we can compute the function of the matrix A by first forming the diagonal matrix which contains
the function values of the eigenvalues and then transforming this matrix back to the original basis. Let
us see how this works explicitly.

Example 7.8: Computing functions of matrices


(a) Let us consider the hermitian matrix

A = \begin{pmatrix} 1 & 2i \\ −2i & 1 \end{pmatrix}                                     (7.35)

which we have already diagonalized in Example 7.2 (d). Recall that the eigenvalues of this matrix are 3,
−1 and the diagonalizing basis transformation is given by

U = (1/√2) \begin{pmatrix} i & −i \\ 1 & 1 \end{pmatrix}
so that U † AU = diag(3, −1). We would like to calculate g(A) for the function g(x) = xn , where n is an
arbitrary integer. Then, from Eq. (7.34), we have

g(A) = U diag(3^n , (−1)^n ) U † = (1/2) \begin{pmatrix} (−1)^n + 3^n & −i((−1)^n − 3^n ) \\ i((−1)^n − 3^n ) & (−1)^n + 3^n \end{pmatrix} .
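
In code, Eq. (7.34) for this example can be checked as follows (a sketch, not part of the notes, assuming NumPy; here g(x) = x^n with n = 5):

import numpy as np

A = np.array([[1, 2j], [-2j, 1]])
lam, U = np.linalg.eigh(A)                    # hermitian: U is unitary and U† A U = diag(-1, 3)

def matrix_function(g):
    """Eq. (7.34): g(A) = U diag(g(lambda_1), ..., g(lambda_n)) U†."""
    return U @ np.diag(g(lam)) @ U.conj().T

n = 5
print(np.allclose(matrix_function(lambda x: x**n),
                  np.linalg.matrix_power(A, n)))          # True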

(b) For another example consider the matrix

A = θT ,     T = \begin{pmatrix} 0 & 1 \\ −1 & 0 \end{pmatrix} ,

where θ is an arbitrary real number. Apart from the overall θ factor (which does not affect the eigenvectors
and multiplies the eigenvalues) this is the matrix we have studied in Example 7.2 (c). Hence, we know
that the eigenvalues are ±iθ and the diagonalizing basis transformation is

P = (1/√2) \begin{pmatrix} 1 & 1 \\ i & −i \end{pmatrix} ,

so that P † AP = diag(iθ, −iθ). From Eq. (7.34) we, therefore, find for the matrix exponential of A

e^A = P diag(e^{iθ} , e^{−iθ} ) P † = (1/2) \begin{pmatrix} 1 & 1 \\ i & −i \end{pmatrix} \begin{pmatrix} e^{iθ} & 0 \\ 0 & e^{−iθ} \end{pmatrix} \begin{pmatrix} 1 & −i \\ 1 & i \end{pmatrix} = \begin{pmatrix} cos θ & sin θ \\ −sin θ & cos θ \end{pmatrix} .
It is not an accident that this comes out as a two-dimensional rotation. The theory of Lie groups states
that rotations (and special unitary matrices) in all dimensions can be obtained as matrix exponentials
of certain, relatively simple matrices, such as A in the present example. This fact is particularly useful in
higher dimensions when the rotation matrices are not so easily written down explicitly. To explain this in
detail is well beyond the scope of this lecture.
Sometimes functions of matrices can be computed more straightforwardly without resorting to diago-
nalizing the matrix. This is usually possible when the matrix in question is relatively simple so that its
powers can be computed explicitly. Indeed, this works for the present example and leads to an alternative
calculation of the matrix exponential. To carry this out we first observe that A² = −θ² 1_2 and, hence,
A^{2n} = (−1)^n θ^{2n} 1_2 ,     A^{2n+1} = (−1)^n θ^{2n+1} T .
With these results it is straightforward to work out the matrix exponential explicitly.
e^A = Σ_{n=0}^∞ (1/n!) A^n = Σ_{n=0}^∞ (1/(2n)!) A^{2n} + Σ_{n=0}^∞ (1/(2n+1)!) A^{2n+1}
    = [ Σ_{n=0}^∞ (−1)^n θ^{2n}/(2n)! ] 1_2 + [ Σ_{n=0}^∞ (−1)^n θ^{2n+1}/(2n+1)! ] T = cos(θ) 1_2 + sin(θ) T .

This coincides with the earlier result, as it must.


(c) We can take the previous example somewhat further and consider the three Pauli matrices σi , defined
by

σ1 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} ,     σ2 = \begin{pmatrix} 0 & −i \\ i & 0 \end{pmatrix} ,     σ3 = \begin{pmatrix} 1 & 0 \\ 0 & −1 \end{pmatrix} .                   (7.36)
The three-dimensional vector space L := Span(σ1 , σ2 , σ3 ) over R spanned by the Pauli matrices consists
of all 2 × 2 hermitian, traceless matrices. The Pauli matrices have a number of nice algebraic properties
which can be summarized by the relation
σi σj = 1_2 δij + i εijk σk ,                    (7.37)
which is easily verified by using the explicit matrices above. For example, this relation implies immediately
that their commutator (defined by [A, B] := AB − BA) and anti-commutator (defined by {A, B} :=
AB + BA) are given by
[σi , σj ] = 2i εijk σk ,     {σi , σj } = 2 · 1_2 δij .                    (7.38)
We would like to work out the matrix exponential of an arbitrary linear combination of the Pauli matrices.
Introducing the formal vector σ = (σ1 , σ2 , σ3 )T we can write such a linear combination for a vector a with
components ai as a · σ = ai σi . Multiplying Eq. (7.37) with ai aj shows that (a · σ)² = |a|² 1_2 and, hence,
for an arbitrary positive integer n,

(a · σ)^{2n} = |a|^{2n} 1_2 ,     (a · σ)^{2n+1} = |a|^{2n} a · σ .                    (7.39)
Thanks to these relations it is now easy to work out the matrix exponential of iθ n · σ, where n is a unit
vector and θ a real number, even without prior diagonalization. Using the Eqs. (7.39) with a = n we find

U := exp(iθ n · σ) = Σ_{n=0}^∞ ((iθ)^n /n!) (n · σ)^n = cos(θ) 1_2 + i sin(θ) n · σ .                    (7.40)

Remembering that σi† = σi , it is easy to verify that

U † U = (cos(θ) 1_2 − i sin(θ) n · σ) (cos(θ) 1_2 + i sin(θ) n · σ) = 1_2 ,                    (7.41)

and, hence, the matrix exponentials U are unitary. Writing U out explicitly, using the Pauli matrices (7.36)
and Eq. (7.40), gives

U = \begin{pmatrix} cos θ + i n3 sin θ & (n2 + i n1 ) sin θ \\ −(n2 − i n1 ) sin θ & cos θ − i n3 sin θ \end{pmatrix}                    (7.42)
This shows that det(U ) = cos² θ + |n|² sin² θ = 1, so that U is, in fact, special unitary. It turns out that all 2 × 2 special
unitary matrices can be obtained as matrix exponentials of Pauli matrices in this way, another example of
the general statement from the theory of Lie groups mentioned earlier. Indeed, the matrix (7.42) can be
converted into our earlier general form for SU (2) matrices in Eq. (6.59) by setting α = cos θ + in3 sin θ and
β = (n2 + in1 ) sin θ. In mathematical parlance, the vector space L of 2 × 2 hermitian, traceless matrices
is referred to as the Lie algebra of the 2 × 2 special unitary matrices SU (2).
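
A short numerical sketch (not part of the notes) of Eq. (7.40) and of the unitarity check (7.41), assuming NumPy and SciPy; the unit vector n and the angle θ are arbitrary choices:

import numpy as np
from scipy.linalg import expm

# Pauli matrices, Eq. (7.36)
s1 = np.array([[0, 1], [1, 0]], dtype=complex)
s2 = np.array([[0, -1j], [1j, 0]])
s3 = np.array([[1, 0], [0, -1]], dtype=complex)

theta = 0.8
n = np.array([1.0, 2.0, 2.0]) / 3.0                  # a unit vector
n_sigma = n[0] * s1 + n[1] * s2 + n[2] * s3

U = expm(1j * theta * n_sigma)
closed_form = np.cos(theta) * np.eye(2) + 1j * np.sin(theta) * n_sigma   # Eq. (7.40)

print(np.allclose(U, closed_form))                                       # True
print(np.allclose(U.conj().T @ U, np.eye(2)),                            # unitary, cf. Eq. (7.41)
      np.isclose(np.linalg.det(U), 1.0))                                 # det 1: special unitary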

Example 7.9: Solving differential equations with matrix exponentials


Consider the simple first order ordinary differential equation
dx/dt = ax
for a real function t → x(t) and an arbitrary real constant a. The general solution to this equation is of
course
x(t) = eat c , (7.43)
for an arbitrary “initial value” c. What about its multi-dimensional generalization
dx/dt = Ax ,                                                            (7.44)
where x(t) = (x1 (t), . . . , xn (t))T is a vector of n real functions and A is a constant real n × n matrix? The
straightforward generalization of the solution (7.43) to the multi-dimensional case reads

x(t) = eAt c , (7.45)

where c is an arbitrary n-dimensional vector. Note that, given our definition of the matrix exponential,
Eq. (7.45) makes perfect sense. But does it really solve the differential equation (7.44)? We verify this
by simply inserting Eq. (7.45) into the differential equation (7.44), using the definition of the matrix
exponential.
dx/dt = d/dt (e^{At} c) = d/dt Σ_{n=0}^∞ (1/n!) A^n t^n c = Σ_{n=1}^∞ (1/(n−1)!) A^n t^{n−1} c = A Σ_{n=0}^∞ (1/n!) A^n t^n c = Ax                    (7.46)

Hence, Eq. (7.45) is indeed a solution for arbitrary vectors c.
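
A minimal numerical sketch of Eq. (7.45) (not in the notes), assuming SciPy's scipy.linalg.expm; the matrix A and the initial value c are arbitrary choices:

import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0], [-2.0, -1.0]])          # any constant real matrix
c = np.array([1.0, 0.0])                          # initial value x(0) = c

def x(t):
    return expm(A * t) @ c                        # Eq. (7.45)

# check dx/dt = A x at some t, via a central finite difference
t, h = 1.3, 1e-6
print(np.allclose((x(t + h) - x(t - h)) / (2 * h), A @ x(t), atol=1e-6))   # True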

7.6.3 Quadratic forms
A quadratic form in the (real) variables x = (x1 , . . . , xn )T is an expression of the form
q(x) := Σ_{i,j=1}^n Qij xi xj = x^T Q x .                               (7.47)

where Q is a real symmetric n × n matrix with entries Qij . We have already encountered examples of such
quadratic forms in Eq. (6.84) and the comparison shows that they can be viewed as symmetric bi-linear
forms on Rn . Our present task is to simplify the quadratic form by diagonalizing the matrix Q. With the
diagonalizing basis transformation P and P T QP = Q̂ = diag(λ1 , . . . , λn ) and new coordinates defined by
x = P y we have
q(x) = x^T P Q̂ P^T x = y^T Q̂ y = Σ_{i=1}^n λi yi² .                    (7.48)

Hence, in the new coordinates y the cross terms in the quadratic form have been removed and only the
pure square terms, yi2 , are present. Note that they are multiplied by the eigenvalues of the matrix Q.

Application: Kinetic energy of a rotating rigid body


In Section (2), we have shown that the kinetic energy of a rotating rigid body is given by
Ekin = (1/2) Σ_{i,j} Iij ωi ωj = (1/2) ω^T I ω .                        (7.49)

where ω = (ω1 , ω2 , ω3 )T is the angular velocity and I is the moment of inertia tensor of the rigid body.
Clearly, this is a quadratic form and by diagonalizing the moment of inertia tensor, P IP T = diag(I1 , I2 , I3 )
and introducing Ω = P ω we can write
Ekin = (1/2) Σ_{i=1}^3 Ii Ωi² .                                         (7.50)

This simplification of the kinetic energy is an important step in understanding the dynamics of rigid
bodies.

Quadratic forms can be used to define quadratic curves (in two dimensions) or quadratic surfaces (in three
dimensions) by the set of all points x satisfying

xT Qx = c , (7.51)

with a real constant c. By diagonalizing the quadratic form, as in Eq. (7.48), the nature of the quadratic
curve or surface can be immediately read off from the eigenvalues λi of Q as indicated in the table below.

condition on eigenvalues λi          two dimensions      three dimensions
all λi equal, same sign as c         circle              sphere
all λi have same sign as c           ellipse             ellipsoid
λi with both signs                   hyperbola           hyperboloid

In terms of the coordinates y = P T x which diagonalize the quadratic form as in Eq. (7.48), the curve or
surface defined by Eq. (7.51) can be written as

Σ_i λi yi² = c .                                                        (7.52)

Focus on the case of an ellipse or ellipsoid. The standard form of the equation defining an ellipse or
ellipsoid is given by
Σ_i yi²/li² = 1 ,                                                       (7.53)

where li can be interpreted as the lengths of the semi-axes. By comparison with Eq. (7.52) we see that
these lengths can be computed from the eigenvalues of the matrix Q by
li = √(c/λi ) .                                                         (7.54)
In the basis with coordinates y the semi-axes are in the directions of the standard unit vectors ei . Hence,
in the original basis with coordinates x the semi-axes are in the directions vi = P ei , that is, in the
directions of the eigenvectors vi of Q.

Example 7.10: Quadratic curve in R2


Consider a quadratic curve in R2 which is defined by all points x = (x1 , x2 )T which satisfy the equation
q(x) = 3x1² + 2x1 x2 + 2x2² = 1
The quadratic form can also be written as q(x) = x^T A x where

A = \begin{pmatrix} 3 & 1 \\ 1 & 2 \end{pmatrix} .
The characteristic polynomial of A is

χA (λ) = det \begin{pmatrix} 3 − λ & 1 \\ 1 & 2 − λ \end{pmatrix} = λ² − 5λ + 5 ,

which leads to eigenvalues λ± = (5 ± √5)/2. The corresponding eigenvectors (not normalized) are given
by

v+ = \begin{pmatrix} 1 + √5 \\ 2 \end{pmatrix} ,     v− = \begin{pmatrix} 1 − √5 \\ 2 \end{pmatrix} .                    (7.55)
Since both eigenvalues are positive but different, this curve is an ellipse. In the diagonalizing coordinates
y = (y1 , y2 )T the equation for this ellipse can be written as

q(y) = λ+ y1² + λ− y2² = 1 .
Comparing with the standard form y1²/a² + y2²/b² = 1 of an ellipse shows that the lengths of the two half
axes are given by

a = 1/√λ+ = √(2/(5 + √5)) ,     b = 1/√λ− = √(2/(5 − √5)) .
The directions of these two half-axes, with lengths 1/√λ± , are given by the eigenvectors v± in Eq. (7.55).
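
The same numbers drop out of a few lines of NumPy (a sketch, not part of the notes):

import numpy as np

A = np.array([[3.0, 1.0], [1.0, 2.0]])
lam, V = np.linalg.eigh(A)                    # eigenvalues (5 -/+ sqrt(5))/2, orthonormal eigenvectors

print(lam)                                    # [1.3819..., 3.6180...]
print(1.0 / np.sqrt(lam))                     # half-axis lengths b and a (in this order), cf. Eq. (7.54) with c = 1
print(V)                                      # columns: half-axis directions, parallel to v-, v+ up to sign

# sanity check: the points (1/sqrt(lambda)) * v lie on the curve q(x) = 1
for l, v in zip(1.0 / np.sqrt(lam), V.T):
    x = l * v
    print(np.isclose(x @ A @ x, 1.0))         # True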

Literature
A large number of textbooks on the subject can be found, varying in style from “Vectors and Matrices for
Dummies” to hugely abstract treatises. I suggest a trip to the library in order to pick one or two books in
the middle ground that you feel comfortable with. Below is a small selection which have proved useful in
preparing the course.

• Mathematical Methods for Physics and Engineering, K. F. Riley, M. P. Hobson and S. J. Bence,
CUP 2002.
This is the recommended book for the first year physics course which covers vectors and matrices
and much of the other basic mathematics required. As the title suggests it is a “hands-on” book,
strong on explaining methods and concrete applications, rather weaker on presenting a coherent
mathematical exposition.

• Linear Algebra, S. Lang, Springer, 3rd edition.


A nice mathematics book, written by a famous mathematician and at a fairly informal level, but
following the mathematical logic of the subject.

• Linear Algebra. An Introductory Approach, C. W. Curtis, Springer 1996.


A useful mathematics book but, despite the understating title, more formal than Lang.

• Linear Algebra, K. Jänich, Springer 1994.


A mathematics book but with an attempt at intuitive presentation (many figures) and some con-
nections to physics.

A Definition of groups and fields
Definition A.1. (Definition of a group) A group G is a set with an operation
·:G×G→G, (g, h) → g · h “group multiplication”
satisfying:
(G1) g · (h · k) = (g · h) · k for all g, h, k ∈ G. “associativity”
(G2) There exists a 1 ∈ G such that 1 · g = g for all g ∈ G. “neutral element”
(G3) For all g ∈ G, there exists a g −1 ∈ G, such that g −1 · g = 1. “inverse”
The group is called Abelian if in addition
(G4) g · h = h · g for all g, h ∈ G. “commutativity”

Definition A.2. (Definition of a field) A field F is a set with two operations


(i) addition: + : F × F → F , (a, b) → a + b
(ii) multiplication: · : F × F → F , (a, b) → a · b
such that the following holds:
(F1) F is an Abelian group with respect to addition.
(F2) F \ {0} is an Abelian group with respect to multiplication.
(F3) a · (b + c) = a · b + a · c for all a, b, c ∈ F .

Standard examples of fields are the rational numbers, Q, the real numbers, R and the complex numbers,
C. Somewhat more exotic examples are the finite fields Fp = {0, 1, . . . , p − 1}, where p is a prime number
and addition and multiplication are defined by regular addition and multiplication of integers modulo p,
that is, by the remainder of a division by p. Hence, whenever the result of an addition or multiplication
exceeds p − 1 it is “brought back” into the range {0, 1, . . . , p − 1} by subtracting a suitable multiple of p.
The smallest field is F2 = {0, 1} containing just the neutral elements of addition and multiplication which
must exist in every field.
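
As a toy illustration (not part of the notes), modular arithmetic of this kind is easy to play with in Python; here p = 5 is an arbitrary choice:

p = 5
elements = range(p)

add = lambda a, b: (a + b) % p       # addition modulo p
mul = lambda a, b: (a * b) % p       # multiplication modulo p

# every non-zero element has a multiplicative inverse (this is where primality of p enters)
print(all(any(mul(a, b) == 1 for b in elements) for a in elements if a != 0))   # True

# for example, in F_5:  3 + 4 = 2  and  3 * 4 = 2
print(add(3, 4), mul(3, 4))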

B Some basics of permutations


Permutations of n objects, the numbers {1, . . . , n} to be specific, are mathematically described by bijective
maps σ : {1, . . . , n} → {1, . . . , n} and the set Sn of all such permutations is given by

Sn = {σ : {1, . . . , n} → {1, . . . , n} | σ is bijective} . (B.1)

Clearly, this set has n! elements and it forms a group in the sense of Def. A.1, with composition of maps
(which is associative) as the group operation, the identity map as the neutral element and the inverse map
σ −1 ∈ Sn as the group inverse for σ ∈ Sn . Permutations are also sometimes written as

σ = \begin{pmatrix} 1 & 2 & · · · & n \\ σ(1) & σ(2) & · · · & σ(n) \end{pmatrix} ,                    (B.2)

so as a 2 × n array of numbers (not a matrix in the sense of linear algebra), indicating that a number in
the first row is permuted into the number in the second row underneath. The permutation group S2 has
only two elements

S2 = { \begin{pmatrix} 1 & 2 \\ 1 & 2 \end{pmatrix} , \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix} } ,                    (B.3)

the identity element and the permutation which swaps 1 and 2. Clearly, S2 is Abelian (that is, map
composition commutes) but this is no longer true for n > 2. For example, in S3 , the two permutations

σ1 = \begin{pmatrix} 1 & 2 & 3 \\ 1 & 3 & 2 \end{pmatrix} ,     σ2 = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 1 & 3 \end{pmatrix}                    (B.4)
do not commute since

σ1 ◦ σ2 = \begin{pmatrix} 1 & 2 & 3 \\ 3 & 1 & 2 \end{pmatrix}     but     σ2 ◦ σ1 = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 3 & 1 \end{pmatrix} .                    (B.5)
The special permutations which swap two numbers and leave all other numbers unchanged are called
transpositions.
Lemma B.1. Every permutation σ ∈ Sn (for n > 1) can be written as σ = τ1 ◦ · · · ◦ τk , where τi are
transpositions.
Proof. Suppose that σ maps the first k1 −1 ≥ 0 numbers into themselves, so σ(i) = i for all i = 1, . . . , k1 −1
and this is the maximal such number, so that σ(k1 ) 6= k1 and, indeed, σ(k1 ) > k1 . Then define τ1 as
the transposition which swaps k1 and σ(k1 ). The permutation σ1 = τ1 ◦ σ then leaves the first k2 − 1
numbers unchanged and, crucially, k2 > k1 . We can continue this process until, after at most n steps,
id = τk ◦ · · · ◦ τ1 ◦ σ. Since transpositions are their own inverse it follows that σ = τ1 ◦ · · · ◦ τk .

We would like to distinguish between even and odd permutations. Formally, this is achieved by the
following
Definition B.1. The sign of a permutation σ ∈ Sn is defined as
sgn(σ) = Π_{i<j} (σ(i) − σ(j))/(i − j) .                                (B.6)

A permutation σ is called even if sgn(σ) = 1 and it is called odd if sgn(σ) = −1.


We note that the numerator and the denominator on the RHS of Eq. (B.6) have, up to signs, the same
factors and, therefore, sgn(σ) ∈ {±1}. The sign satisfies the following important property.
Theorem B.1. sgn(σ ◦ ρ) = sgn(σ) sgn(ρ) for all σ, ρ ∈ Sn .
Proof.
sgn(σ ◦ ρ) = Π_{i<j} (σ(ρ(j)) − σ(ρ(i)))/(j − i) = Π_{i<j} (σ(ρ(j)) − σ(ρ(i)))/(ρ(j) − ρ(i)) · Π_{i<j} (ρ(j) − ρ(i))/(j − i)
           = sgn(ρ) Π_{ρ(i)<ρ(j)} (σ(ρ(j)) − σ(ρ(i)))/(ρ(j) − ρ(i)) = sgn(σ) sgn(ρ) .

For a transposition τ ∈ Sn we have from Eq. (B.6) that


sgn(τ ) = −1 . (B.7)
If we write an arbitrary permutation σ ∈ Sn in terms of transpositions, σ = τ1 ◦ · · · ◦ τk , then, from
Theorem B.1, we find
sgn(σ) = sgn(τ1 ) · · · sgn(τk ) = (−1)k . (B.8)
This means that even permutations are precisely those which are generated by combining an even number
of transpositions and odd permutations those which are obtained from an odd number of transpositions.
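
As a small computational aside (not in the notes), the product formula (B.6) and the multiplicativity of Theorem B.1 can be checked directly in Python (permutations are written 0-based here, which does not affect the sign):

from itertools import permutations
from math import prod

def sgn(sigma):
    """Sign of a permutation of {0, ..., n-1} via the product formula (B.6)."""
    n = len(sigma)
    return round(prod((sigma[i] - sigma[j]) / (i - j)
                      for i in range(n) for j in range(i + 1, n)))

def compose(sigma, rho):
    """(sigma o rho)(i) = sigma(rho(i))."""
    return tuple(sigma[r] for r in rho)

S3 = list(permutations(range(3)))
print(all(sgn(compose(s, r)) == sgn(s) * sgn(r) for s in S3 for r in S3))   # Theorem B.1: True
print(sgn((1, 0, 2)))                                                       # a transposition has sign -1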

C Tensors for the curious
Tensors are part of more advanced linear algebra and for this reason they are often not considered in an
introductory text. However, in many physics courses there is no room to return to the subject at a later
time and, as a result, tensors are often not taught at all. To many physicists, they remain mysterious,
despite their numerous applications in physics. If you do not want to remain perplexed, this appendix is
for you. It provides a short, no-nonsense introduction into tensors, starting where the main text has left
off.
We start with a vector space V over a field F with basis ε1 , . . . , εn and its dual V ∗ . Recall that V ∗ is the
vector space of linear functionals V → F . From Section 6.5, we know that V ∗ has a dual basis ε1∗ , . . . , εn∗
satisfying
εi∗ (εj ) = δji .                                                       (C.1)
In particular, V and V ∗ have the same dimension. An obvious, but somewhat abstract problem which we
need to clarify first has to do with the “double-dual” of a vector space. In other words, what is the dual,
V ∗∗ , of the dual vector space V ∗ ? Our chosen terminology suggests that the double-dual V ∗∗ should be the
original vector space V . This is indeed the case, in the sense of the following
Lemma C.1. The linear map ι : V → V ∗∗ defined by ι(v)(ϕ) := ϕ(v) is bijective, that is, it is an
isomorphism between V and V ∗∗ .
Proof. From Lemma 3.1 and since dim(V ) = dim(V ∗ ) = dim(V ∗∗ ) all we need to show is that Ker(ι) =
{0}. Start with a vector v = v i εi ∈ Ker(ι). Then, for all ϕ ∈ V ∗ , we have 0 = ι(v)(ϕ) = ϕ(v). Choose
ϕ = εj∗ and it follows that 0 = εj∗ (v) = v j . Hence, all components v j vanish and v = 0.

Note that the definition of the above map ι does not depend on a choice of basis. For this reason it is
also referred to as a canonical isomorphism between V and V ∗∗ . We should think of V and V ∗∗ as the
same space by identifying vectors v ∈ V with their images ι(v) ∈ V ∗∗ under ι. Since V ∗∗ ∼= V consists
of linear functionals on V ∗ this means that the relation between V and V ∗ is “symmetric”. Not only can
elements ϕ ∈ V ∗ act on vectors v ∈ V but also the converse works. This is the essence of the relation
ι(v)(ϕ) := ϕ(v), defining the map ι, which, by abuse of notation, is often written as

v(ϕ) = ϕ(v) . (C.2)

Having put the vector space and its dual on equal footing we can now proceed to define tensors. We
consider two vector spaces V and W over F and define the tensor space

V ∗ ⊗ W ∗ := {τ : V × W → F | τ bi-linear} . (C.3)

In other words, the tensor space V ∗ ⊗W ∗ consists of all maps τ which assign to their two vector arguments
v ∈ V and w ∈ W a number τ (v, w) and are linear in each argument. Note that we can think of this
as a generalization of a linear functional. While a linear functional assigns a number to a single vector
argument, the tensor τ does the same for two vector arguments. This suggests that tensors might be
“built up” from functionals. To this end, we introduce the tensor product ϕ ⊗ ψ between two functionals
ϕ ∈ V ∗ and ψ ∈ W ∗ by
(ϕ ⊗ ψ)(v, w) := ϕ(v)ψ(w) . (C.4)
Clearly, the so-defined map ϕ ⊗ ψ is an element of the tensor space V ∗ ⊗ W ∗ since it takes two vector
arguments and is linear in each of them (since ϕ and ψ are linear in their respective arguments).

Can we get all tensors from tensor products? In a certain sense, the answer is “yes” as explained in the
following

Lemma C.2. For a basis {εi∗ }, where i = 1, . . . , n, of V ∗ and a basis {ε̃a∗ }, where a = 1, . . . , m, of W ∗
the tensor products {εi∗ ⊗ ε̃a∗ } form a basis of V ∗ ⊗ W ∗ . In particular, dim(V ∗ ⊗ W ∗ ) = dim(V ∗ ) dim(W ∗ ).

Proof. We introduce the dual basis {εj } on V and {ε̃b } on W so that εi∗ (εj ) = δji and ε̃a∗ (ε̃b ) = δba .
As usual, we need to prove that the tensors εi∗ ⊗ ε̃a∗ are linearly independent and span the tensor space
V ∗ ⊗ W ∗ . We begin with linear independence.

Σ_{i,a} τia εi∗ ⊗ ε̃a∗ = 0

Acting with this equation on the vector pair (εj , ε̃b ) gives

0 = Σ_{i,a} τia εi∗ ⊗ ε̃a∗ (εj , ε̃b ) = Σ_{i,a} τia εi∗ (εj ) ε̃a∗ (ε̃b ) = Σ_{i,a} τia δji δba = τjb .

Hence, all coefficients τjb vanish and linear independence follows.


To show that the space is spanned start with an arbitrary tensor τ ∈ V ∗ ⊗ W ∗ . We define its
“components” τia := τ (εi , ε̃a ) and the tensor

µ = Σ_{j,b} τjb εj∗ ⊗ ε̃b∗ .

Since µ(εi , ε̃a ) = τia = τ (εi , ε̃a ), the tensors µ and τ coincide on a basis and, hence, µ = τ . We have,
therefore, written the arbitrary tensor τ as a linear combination of the εj∗ ⊗ ε̃b∗ .

The above Lemma provides us with a simple way to think about the tensors in V ∗ ⊗ W ∗ . Given a
basis {εj } on V and {ε̃b } on W the tensors in V ∗ ⊗ W ∗ are given by

τ = Σ_{i,a} τia εi∗ ⊗ ε̃a∗ ,                                             (C.5)

where τia ∈ F are arbitrary coefficients. Often, the basis elements are omitted and the set of components
τia , labelled by two indices, is referred to as the tensor. This can be viewed as a generalization of vectors
whose components are labelled by one index.
Tensoring can of course be repeated with multiple vector spaces. A particularly important tensor
space, on which we will focus from hereon, is
V ⊗ · · · ⊗ V ⊗ V ∗ ⊗ · · · ⊗ V ∗ = Span{εi1 ⊗ · · · ⊗ εip ⊗ εj1∗ ⊗ · · · ⊗ εjq∗ } ,                    (C.6)

formed from p factors of the vector space V and q factors of its dual V ∗ . Its dimension is dim(V )p+q . A
general element of this space can be written as a linear combination
τ = Σ_{i1 ,...,ip ,j1 ,...,jq} τ^{i1 ···ip}_{j1 ···jq} εi1 ⊗ · · · ⊗ εip ⊗ εj1∗ ⊗ · · · ⊗ εjq∗ ,                    (C.7)

and is also referred to as a (p, q) tensor. It can also be represented by the components τ^{i1 ···ip}_{j1 ···jq} which carry
p upper and q lower indices and practical applications are often phrased in those terms. From a (p, q)
tensor τ^{i1 ···ip}_{j1 ···jq} and an (r, s) tensor µ^{k1 ···kr}_{l1 ···ls} we can create a new tensor by multiplication and contraction
(that is summation) over some (or all) of the upper indices of τ and the lower indices of µ and vice versa.
Such a summation over same upper and lower indices is in line with the Einstein summation convention
and corresponds to the action of dual vectors on vectors, as discussed in Section 6.5.

It is probably best to discuss this explicitly for a number of examples. It turns out that many of the
objects we have introduced in the main part of the text can be phrased in the language of tensors.

Example C.1: Examples of tensors
(a) Vectors as tensors
A vector v = v i εi in a vector space V with basis {εi } is a (1, 0) tensor and, accordingly, its component
form is v i , an object with one upper index.
(b) Dual vectors as tensors
Linear functionals ϕ = ϕi εi∗ in the dual vector space V ∗ with dual basis {εi∗ } are (0, 1) tensors and are,
hence, represented by components ϕi with one lower index. As already discussed in Section 6.5, the action
of a linear functional on a vector is given by

ϕ(v) = ϕi v j εi∗ (εj ) = ϕi v j δji = ϕi v i ,                          (C.8)
which corresponds to the contraction of a (1, 0) and a (0, 1) tensor over their single index to produce a
(0, 0) tensor, that is, a scalar.
(c) Linear maps and matrices as tensors
Consider a linear map f : V → V which is represented by a matrix A relative to the basis {εi } of V .
Then, from Lemma 3.3, the components Ai j of A are obtained from the images of the basis vectors via
f (εj ) = Σ_i Ai j εi . We can re-write this relation as f (εj ) = Ai k εi εk∗ (εj ) and, stripping off the basis vector
εj on either side, this leads to

f = Ai k εi ⊗ εk∗ .                                                     (C.9)
Hence, a linear map can be viewed as a (1, 1) tensor whose components Ai k have one upper and one lower
index (refining our notation from the main part of the text, where we have used two lower indices for
matrices) and form the entries of the representing matrix for f . In components, the action of the linear
map on the vector v = v k εk can, of course, be written as a matrix multiplication
v i → Ai k v k . (C.10)
In tensor language this amounts to the contraction of a (1, 1) tensor (the matrix Ai k ) and a (1, 0) tensor
(the vector v k ) to produce another (1, 0) tensor (the resulting image vector Ai k v k ).
(d) The identity map as a tensor
The identity map, idV is of course represented by the unit matrix, 1, whose entries are given by the
Kronecker delta δki so that
idV = δki εi ⊗ εk∗ .                                                    (C.11)
From this point of view we can interpret the Kronecker delta as a (1, 1) tensor.
(e) Bi-linear forms as tensors
For a bi-linear form h · , · i on a vector space V with basis {εi } define the metric gij := hεi , εj i, as we have
done in Section 6.5. Then we can write

h · , · i = gij εi∗ ⊗ εj∗ ,                                             (C.12)

so that the bi-linear form can be viewed as a (0, 2) tensor with components gij . The scalar product of two
vectors v = v i εi and w = wj εj is
hv, wi = gij v i wj , (C.13)
and can, hence, be viewed as the contraction of a (0, 2) tensor (the metric gij ) with two (1, 0) tensors (the
vectors v i and wj ) to form a scalar (the value gij v i wj of the scalar product).
If the bi-linear form is non-degenerate then, from Lemma 6.6, the metric gij is invertible. Its inverse
is written as a (2, 0) tensor g ij which satisfies
g ij gjk = δki . (C.14)

As we have seen in Lemma 6.6, a non-degenerate bi-linear form leads to an isomorphism ı : V → V ∗
between the vector space and its dual and we can use this to define a scalar product h · , · i∗ on the dual
vector space V ∗ by
hϕ, ψi∗ := hı−1 (ϕ), ı−1 (ψ)i . (C.15)
Since the representing matrix for ı is gij its inverse, ı−1 , is represented by g ij . Hence, we can write the
scalar product on the dual vector space as

hϕ, ψi∗ = gij g ik ϕk g jl ψl = g ij ϕi ψj      or      h · , · i∗ = g ij εi ⊗ εj .                    (C.16)

This shows that the scalar product h · , · i∗ on V ∗ can be viewed as a (2, 0) tensor with component g ij .
The identification of vector space and dual vector space by a non-degenerate bi-linear form (via the
map ı and its representing matrix gij ) can be extended to tensors and used to change their degree. We
can use the metric gij to lower one of the (upper) indices of a (p, q) tensor, thereby converting it into
a (p − 1, q + 1) tensor and the inverse metric g ij to raise a (lower) index of a (p, q) tensor to produce a
(p + 1, q − 1) tensor.
If the basis {εi } is an ortho-normal basis of the scalar product h · , · i, then the metric is gij = δij
and its inverse g ij = δ ij and, from this point of view, the Kronecker delta δij is a (0, 2) tensor and its
upper-index version, δ ij , a (2, 0) tensor.
(f) Determinant as a tensor
We consider V = Rn (or V = Cn ) with the basis of standard unit vectors {ei } and the associated dual
basis {ei∗ }. The determinant is, by definition, linear in each of its n vectorial arguments and is, therefore,
a tensor. To make this more explicit we start with n vectors vi = vij ej and write

det(v1 , . . . , vn ) = εi1 ···in v1^{i1} · · · vn^{in} = εi1 ···in e^{i1 ∗} ⊗ · · · ⊗ e^{in ∗} (v1 , . . . , vn ) .                    (C.17)

Stripping off the vectorial arguments results in

det = εi1 ···in e^{i1 ∗} · · · e^{in ∗} ,                               (C.18)

which shows that the determinant can be viewed as a (0, n) tensor whose components are given by the
Levi-Civita tensor εi1 ···in . Computing the determinant amounts to contracting the Levi-Civita (0, n) tensor
into n (1, 0) tensors (that is, into n vectors), resulting in a scalar.
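
As a closing illustration (not part of the notes), the contraction (C.17) is easy to carry out numerically for n = 3; the sketch below assumes NumPy, builds the Levi-Civita symbol from permutation signs and compares the contraction with numpy.linalg.det:

import numpy as np
from itertools import permutations

# Levi-Civita symbol in three dimensions, built from permutation signs
eps = np.zeros((3, 3, 3))
for p in permutations(range(3)):
    inversions = sum(1 for i in range(3) for j in range(i + 1, 3) if p[i] > p[j])
    eps[p] = (-1) ** inversions

rng = np.random.default_rng(2)
v1, v2, v3 = rng.standard_normal((3, 3))

# Eq. (C.17): contract the (0,3) tensor eps with three vectors
det = np.einsum('ijk,i,j,k->', eps, v1, v2, v3)
print(np.isclose(det, np.linalg.det(np.column_stack([v1, v2, v3]))))     # True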

