Statistical ML
Basics for Statistical Machine Learning

Mikaela Keller
IDIAP Research Institute
Martigny, Switzerland
mkeller[at]idiap.ch
July 2nd, 2007
Outline

- Motivation
- Linear Algebra Basics
Motivation
Concrete Example: Regression

- Determination of abalone age by:
  - Counting the number of rings in the shell through a microscope ← a time-consuming task.
  - Through other measurements: sex, diameter, height, whole weight, shell weight, etc. ← easy to obtain.
- Regression problem: training examples = {(easy measurements, age)}. We want to predict the age of an abalone from the easy measurements alone.
Motivation
Concrete Example: Classification

[Figure: 2-D scatter plot of example data points.]

- Written digit classification:
  - Automatic recognition of postal codes from scanned mail.
- Classification problem: training examples = {(image, actual digit)}. We want to predict the correct digit for a new image.
Motivation
Concrete Example: Density Estimation / Clustering

[Figure: scatter plot of time between two eruptions vs duration of the previous eruption.]

- Data compression / data visualization / data exploration:
  - Time between two eruptions vs duration of the previous eruption.
- Unsupervised problem: training examples = {(measurement)}. We want to "organize" the information contained in the measurements.
Motivation

- Most of the problems described previously end up reformulated into:
  - curves or surfaces to be discovered,
  - i.e. systems of equations with unknowns to be solved,
  - i.e. matrix manipulation operations.
  ⇒ Linear Algebra.
- Diverse sources of uncertainty:
  - limited amount of examples,
  - noise in the measurements,
  - randomness inherent to the observed phenomena, etc.
  ⇒ Probability Theory.
Outline

- Motivation
- Linear Algebra Basics
  - Vectors
  - Matrices
  - Determinant
  - Inverses
  - Matrix Diagonalization
Vectors

- Examples x are usually represented as vectors of m components:

    x = (x1, ..., xm)^T,    x^T = (x1, ..., xm).

- Inner product (aka dot product, scalar product):

    x^T y = x1 y1 + ... + xm ym.
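The inner product above can be sketched in NumPy (a minimal illustration; NumPy is an assumption, the slides are library-agnostic):

```python
import numpy as np

# Two example vectors with m = 3 components.
x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, -1.0, 2.0])

# Inner product x^T y = x1*y1 + ... + xm*ym.
inner = x @ y   # 1*4 + 2*(-1) + 3*2 = 8.0
print(inner)
```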
Vectors

- "x and y are orthogonal (x ⊥ y)" ⇔ x^T y = 0.
- The norm (length) of x:

    ||x|| = sqrt(x^T x).

- The distance between 2 vectors x and y is defined as d(x, y) = ||x − y||:

    d(x, y)^2 = ||x||^2 + ||y||^2 − 2 x^T y.
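A quick NumPy sketch checking the distance expansion above numerically (illustrative, not part of the original slides):

```python
import numpy as np

x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])

norm_x = np.sqrt(x @ x)        # ||x|| = sqrt(x^T x)
dist = np.linalg.norm(x - y)   # d(x, y) = ||x - y||

# Check the expansion d(x, y)^2 = ||x||^2 + ||y||^2 - 2 x^T y.
lhs = dist ** 2
rhs = x @ x + y @ y - 2 * (x @ y)
print(np.isclose(lhs, rhs))    # True
```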
Matrices

n equations with m unknowns x1, ..., xm:

  a11 x1 + ... + a1m xm = b1
     ...
  an1 x1 + ... + anm xm = bn

⇔ each row (ai1, ..., aim) of the coefficient matrix A multiplies x = (x1, ..., xm)^T to give bi

⇔ A_{n×m} x_{m×1} = b_{n×1}, written A x = b.
Matrices
Geometrical view

2-D Example:

  2 x1 − x2 = 0
  x1 + 3 x2 = 2

[Figure: the two lines in the (x1, x2) plane: the first passes through (0,0) and (1,2), the second through (2,0) and (−1,1). The solution of the system is their intersection point (2/7, 4/7).]
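The 2-D system above can be solved directly with NumPy (an illustrative sketch; NumPy is an assumption):

```python
import numpy as np

# The 2-D example from the slides: 2*x1 - x2 = 0, x1 + 3*x2 = 2.
A = np.array([[2.0, -1.0],
              [1.0,  3.0]])
b = np.array([0.0, 2.0])

x = np.linalg.solve(A, b)   # solves A x = b for square invertible A
print(x)                    # [2/7, 4/7]
```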
Matrices

n equations with m unknowns x1, ..., xm:

  A x = b  ⇔  x1 a.1 + ... + xm a.m = b,

where a.j = (a1j, ..., anj)^T denotes the j-th column of A: b is a linear combination of the columns of A.

A real valued matrix A_{n×m} is also seen as a linear transformation:

  A : R^m → R^n
      x  → A x.
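The column view above, A x as a combination of the columns of A, can be verified with a small NumPy sketch (NumPy is an assumption here):

```python
import numpy as np

A = np.array([[2.0, -1.0],
              [1.0,  3.0]])
x = np.array([1.0, 2.0])

# Matrix-vector product as a combination of the columns of A:
# A x = x1 * a.1 + x2 * a.2
combo = x[0] * A[:, 0] + x[1] * A[:, 1]
print(np.allclose(A @ x, combo))   # True
```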
Matrices
Alternate geometrical view

2-D Example:

  2 x1 − x2 = 0
  x1 + 3 x2 = 2    ⇔    x1 (2, 1)^T + x2 (−1, 3)^T = (0, 2)^T.

[Figure: b = (0, 2)^T reached as a combination of the column vectors (2, 1)^T and (−1, 3)^T.]
Matrices
Alternate geometrical view (No solution)

2-D Example:

  2 x1 − 2 x2 = 0
  x1 − x2 = 2    ⇔    x1 (2, 1)^T + x2 (−2, −1)^T = (0, 2)^T.

The column vectors (2, 1)^T and (−2, −1)^T are parallel, so b = (0, 2)^T lies outside their span: the system has no solution.
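NumPy can confirm that this system is singular (a sketch; NumPy is an assumption):

```python
import numpy as np

# Singular system from the slide: the columns (2, 1) and (-2, -1) are parallel.
A = np.array([[2.0, -2.0],
              [1.0, -1.0]])
b = np.array([0.0, 2.0])

print(np.linalg.matrix_rank(A))            # 1: the columns span only a line
print(np.isclose(np.linalg.det(A), 0.0))   # True: A is singular
# np.linalg.solve(A, b) would raise LinAlgError here.
```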
Determinant

Recursive Definition: Let A be a square matrix (m × m),

  det(A) = Σ_{j=1}^{m} (−1)^{1+j} a1j det(M1j),

where Mij is A without its row i and its column j, and det(a) = a for a scalar a.

Example (m = 3):

  det(A) = a11 det(M11) − a12 det(M12) + a13 det(M13)
         = a11 (a22 a33 − a32 a23) − a12 (a21 a33 − a31 a23) + a13 (a21 a32 − a31 a22).
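The recursive definition above translates almost directly into code. A minimal sketch (illustrative only: cofactor expansion costs O(m!), while NumPy's `det` runs in O(m^3)):

```python
import numpy as np

def det_recursive(A):
    """Determinant by cofactor expansion along the first row."""
    m = A.shape[0]
    if m == 1:
        return A[0, 0]                      # det of a scalar is the scalar
    total = 0.0
    for j in range(m):
        # M1j: A without its first row and its j-th column.
        M1j = np.delete(A[1:, :], j, axis=1)
        # (-1)**j matches (-1)^(1+j) of the slide under 0-based indexing.
        total += (-1) ** j * A[0, j] * det_recursive(M1j)
    return total

A = np.array([[2.0, -1.0],
              [1.0,  3.0]])
print(det_recursive(A))                                 # 7.0
print(np.isclose(det_recursive(A), np.linalg.det(A)))   # True
```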
Inverses

- Definition: A square matrix A_{m×m} is called non-singular or invertible if there exists a matrix B_{m×m} such that:

    A B = I_m = B A,

  where I_m is the m × m identity matrix (1s on the diagonal, 0s elsewhere). If such a B exists it is called the inverse of A and denoted A^{−1}.
- "A is invertible" ⇔ det(A) ≠ 0 ⇔ "A x = 0 iff x = 0".
- If A (square) is invertible, the solution of the system A x = b is x = A^{−1} b.
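A short NumPy sketch of the statements above: det(A) ≠ 0 implies A is invertible, and x = A^{−1} b solves the system (NumPy is an assumption):

```python
import numpy as np

A = np.array([[2.0, -1.0],
              [1.0,  3.0]])
b = np.array([0.0, 2.0])

assert not np.isclose(np.linalg.det(A), 0.0)   # invertible since det(A) != 0
A_inv = np.linalg.inv(A)

print(np.allclose(A @ A_inv, np.eye(2)))       # True: A A^-1 = I
x = A_inv @ b                                  # solution of A x = b
print(np.allclose(A @ x, b))                   # True
```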
Determinants and Inverses
Geometrical view

2-D Example:

  A = [ 2  −1 ]
      [ 1   3 ]

  |det(A)| = |2 · 3 − 1 · (−1)| = 7.

[Figure: |det(A)| is the area of the parallelogram spanned by the columns a.1 = (2, 1)^T and a.2 = (−1, 3)^T; equivalently |det(A)| = OP · OQ · sin(θ2 − θ1), with P and Q the endpoints of the two column vectors and θ1, θ2 their angles to the x1-axis.]
Matrices

- If A is rectangular and A^T A is invertible, the least-squares solution of the system A x = b is x = (A^T A)^{−1} A^T b.
- (A^T A)^{−1} A^T is called the pseudo-inverse of A.
- Let X_{n×m} be a collection of examples, with example x_i^T as its i-th row.
- The Gram matrix of this collection is:

    G = X X^T,   with entries G_{ij} = x_i^T x_j.

- A real valued square matrix A is said to be positive semidefinite if for any vector z: z^T A z ≥ 0.
- Gram matrices are positive semidefinite matrices.
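A NumPy sketch of the pseudo-inverse and the Gram matrix above, on a small hypothetical data set (the matrices here are made-up examples, not from the slides):

```python
import numpy as np

# Overdetermined system: 3 equations, 2 unknowns (hypothetical data).
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
b = np.array([1.0, 2.0, 3.0])

# Pseudo-inverse solution x = (A^T A)^-1 A^T b.
x = np.linalg.inv(A.T @ A) @ A.T @ b
print(np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0]))  # True

# Gram matrix of a collection X (rows = examples) is positive semidefinite:
X = np.array([[1.0, 2.0], [3.0, -1.0]])
G = X @ X.T
z = np.array([0.5, -2.0])
print(z @ G @ z >= 0)   # True for any z
```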
Matrix Diagonalization

- An eigenvector u of A (square matrix) is a solution (≠ 0) of the equation A u = λ u ⇔ (A − λI) u = 0, for a particular λ called the associated eigenvalue.
- Eigenvalues are solutions of the characteristic polynomial: det(A − λI) = 0.
- If A_{n×n} is real valued and symmetric then:
  - all eigenvalues λ1, ..., λn are real valued, and
  - we can find n eigenvectors u1, ..., un such that ui ⊥ uj (i ≠ j) and ||uj|| = 1, i.e. a new basis for R^n.
- If P = (u1, ..., un), then A can be rewritten as:

    A = P Λ P^T,   where Λ = diag(λ1, ..., λn).

- "A positive semidefinite" ⇔ λi ≥ 0 for all i.
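The symmetric eigendecomposition above can be reproduced with NumPy's `eigh` (a sketch on a made-up symmetric matrix):

```python
import numpy as np

# A real symmetric matrix.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# eigh returns real eigenvalues and orthonormal eigenvectors (columns of P).
lam, P = np.linalg.eigh(A)

print(np.allclose(P @ np.diag(lam) @ P.T, A))   # True: A = P Lambda P^T
print(np.allclose(P.T @ P, np.eye(2)))          # True: the u_i are orthonormal
print(np.all(lam >= 0))                         # True: A is positive semidefinite
```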
Singular Value Decomposition

- The Singular Value Decomposition is a generalization of matrix diagonalization to rectangular matrices.
- Any real valued matrix M_{n×m} can be rewritten as:

    M = U_{n×n} Σ_{n×m} V^T_{m×m},

  where U and V are orthogonal matrices and Σ is diagonal (σij = 0 unless i = j).
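A NumPy sketch of the decomposition above on a made-up rectangular matrix (note that `np.linalg.svd` returns the singular values as a 1-D array, so the n × m matrix Σ must be rebuilt):

```python
import numpy as np

# A rectangular matrix (2 x 3).
M = np.array([[1.0, 0.0, 2.0],
              [0.0, 3.0, 1.0]])

# full_matrices=True gives U (n x n) and Vt (m x m) as on the slide.
U, s, Vt = np.linalg.svd(M, full_matrices=True)

# Rebuild the n x m matrix Sigma with s on its diagonal.
Sigma = np.zeros(M.shape)
Sigma[:len(s), :len(s)] = np.diag(s)

print(np.allclose(U @ Sigma @ Vt, M))   # True: M = U Sigma V^T
```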
Acknowledgement

- Sources of inspiration:
  - Linear Algebra: Gilbert Strang's MIT course and "Elementary Linear Algebra" by Keith Matthews (both on the web).
  - Some of the motivating figures: Christopher M. Bishop's book "Pattern Recognition and Machine Learning".