ML_Lec 3- Review of Linear Algebra

This document provides a comprehensive review of linear algebra concepts including vector and matrix notation, vector spaces, linear transformations, and eigenvalues. It covers fundamental definitions, properties, and operations related to vectors and matrices, as well as the Gram-Schmidt orthogonalization process and linear transformations. Additionally, it introduces MATLAB® as a tool for implementing these concepts.

L3: Review of linear algebra

• Vector and matrix notation
• Vectors
• Matrices
• Vector spaces
• Linear transformations
• Eigenvalues and eigenvectors
• MATLAB® primer

Vector and matrix notation
– A d-dimensional (column) vector 𝑥 and its transpose are written as:
$$x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_d \end{bmatrix} \qquad \text{and} \qquad x^T = \begin{bmatrix} x_1 & x_2 & \cdots & x_d \end{bmatrix}$$
– An 𝑛 × 𝑑 (rectangular) matrix and its transpose are written as
$$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1d} \\ a_{21} & a_{22} & a_{23} & \cdots & a_{2d} \\ \vdots & & & \ddots & \vdots \\ a_{n1} & a_{n2} & a_{n3} & \cdots & a_{nd} \end{bmatrix} \qquad \text{and} \qquad A^T = \begin{bmatrix} a_{11} & a_{21} & \cdots & a_{n1} \\ a_{12} & a_{22} & \cdots & a_{n2} \\ a_{13} & a_{23} & \cdots & a_{n3} \\ \vdots & & \ddots & \vdots \\ a_{1d} & a_{2d} & \cdots & a_{nd} \end{bmatrix}$$
– The product of two matrices is
$$AB = \begin{bmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1d} \\ a_{21} & a_{22} & a_{23} & \cdots & a_{2d} \\ \vdots & & & \ddots & \vdots \\ a_{m1} & a_{m2} & a_{m3} & \cdots & a_{md} \end{bmatrix} \begin{bmatrix} b_{11} & b_{12} & \cdots & b_{1n} \\ b_{21} & b_{22} & \cdots & b_{2n} \\ b_{31} & b_{32} & \cdots & b_{3n} \\ \vdots & & \ddots & \vdots \\ b_{d1} & b_{d2} & \cdots & b_{dn} \end{bmatrix} = \begin{bmatrix} c_{11} & c_{12} & c_{13} & \cdots & c_{1n} \\ c_{21} & c_{22} & c_{23} & \cdots & c_{2n} \\ \vdots & & & \ddots & \vdots \\ c_{m1} & c_{m2} & c_{m3} & \cdots & c_{mn} \end{bmatrix}$$
where $c_{ij} = \sum_{k=1}^{d} a_{ik} b_{kj}$
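Since the lecture lists a MATLAB® primer among its topics, a minimal MATLAB sketch of the product formula may help; the matrices below are hypothetical examples, not taken from the slides.

  A = [1 2 3; 4 5 6];              % hypothetical 2x3 matrix (m = 2, d = 3)
  B = [7 8; 9 10; 11 12];          % hypothetical 3x2 matrix (d = 3, n = 2)
  C = A * B;                       % the 2x2 product
  c11 = sum(A(1,:) .* B(:,1)');    % c_11 = sum_k a_1k * b_k1, equals C(1,1)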

Vectors
– The inner product (a.k.a. dot product or scalar product) of two vectors is
defined by
$$\langle x, y \rangle = x^T y = y^T x = \sum_{k=1}^{d} x_k y_k$$
– The magnitude of a vector is
$$\|x\| = \sqrt{x^T x} = \left( \sum_{k=1}^{d} x_k x_k \right)^{1/2}$$
– The orthogonal projection of vector 𝑦 onto vector 𝑥 is (𝑦𝑇𝑢𝑥) 𝑢𝑥
• where vector 𝑢𝑥 has unit magnitude and the same direction as 𝑥
– The angle between vectors 𝑥 and 𝑦 is given by
$$\cos\theta = \frac{\langle x, y \rangle}{\|x\| \, \|y\|}$$
– Two vectors 𝑥 and 𝑦 are said to be
• orthogonal if 𝑥𝑇𝑦 = 0
• orthonormal if 𝑥𝑇𝑦 = 0 and |𝑥| = |𝑦| = 1
[Figure: vectors 𝑥 and 𝑦, the unit vector 𝑢𝑥, and the projection 𝑦𝑇𝑢𝑥 of 𝑦 onto 𝑥]
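A short MATLAB sketch of these vector operations; the vectors are hypothetical examples.

  x = [1; 2; 2];  y = [2; 0; 1];           % hypothetical column vectors
  ip    = x' * y;                          % inner product <x, y>
  magx  = sqrt(x' * x);                    % magnitude |x|, same as norm(x)
  ux    = x / norm(x);                     % unit vector with the direction of x
  projy = (y' * ux) * ux;                  % orthogonal projection of y onto x
  costh = (x' * y) / (norm(x) * norm(y));  % cosine of the angle between x and y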

– A set of vectors 𝑥1, 𝑥2, … , 𝑥𝑛 is said to be linearly dependent if there exists a set of coefficients 𝑎1, 𝑎2, … , 𝑎𝑛 (at least one different from zero) such that
$$a_1 x_1 + a_2 x_2 + \cdots + a_n x_n = 0$$
– Alternatively, a set of vectors 𝑥1, 𝑥2, … , 𝑥𝑛 is said to be linearly independent if
$$a_1 x_1 + a_2 x_2 + \cdots + a_n x_n = 0 \;\Rightarrow\; a_k = 0 \;\; \forall k$$
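A simple way to test a set of vectors for linear dependence in MATLAB is through the rank of the matrix whose columns are the vectors; the values below are hypothetical.

  X = [1 0 1; 0 1 1; 1 1 2];   % columns are x1, x2, x3; here x3 = x1 + x2
  r = rank(X);                 % r = 2 < 3, so the set is linearly dependent
  % for a linearly independent set, rank(X) equals the number of vectors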

Matrices
– The determinant of a square matrix 𝐴 (𝑑 × 𝑑) is
$$|A| = \sum_{k=1}^{d} a_{ik} A_{ik} (-1)^{k+i}$$
• where 𝐴𝑖𝑘 is the minor formed by removing the ith row and the kth column of 𝐴
• NOTE: the determinant of a square matrix and its transpose is the same: |𝐴| = |𝐴𝑇 |
– The trace of a square matrix 𝐴 (𝑑 × 𝑑) is the sum of its diagonal elements
$$tr(A) = \sum_{k=1}^{d} a_{kk}$$
– The rank of a matrix is the number of linearly independent rows (or columns)
– A square matrix is said to be non-singular if and only if its rank equals the
number of rows (or columns)
• A non-singular matrix has a non-zero determinant
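These quantities map directly onto MATLAB built-ins; a small sketch with a hypothetical matrix:

  A = [2 1; 1 3];              % hypothetical square matrix
  d = det(A);                  % determinant; det(A') gives the same value
  t = trace(A);                % sum of the diagonal elements
  r = rank(A);                 % number of linearly independent rows (or columns)
  % rank(A) equals size(A,1) and det(A) ~= 0, so this A is non-singular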

– A square matrix is said to be orthonormal if 𝐴𝐴𝑇 = 𝐴𝑇 𝐴 = 𝐼
– For a square matrix A
• if 𝑥𝑇𝐴𝑥 > 0 ∀𝑥 ≠ 0, then 𝐴 is said to be positive-definite (e.g., a covariance matrix)
• if 𝑥𝑇𝐴𝑥 ≥ 0 ∀𝑥 ≠ 0, then 𝐴 is said to be positive-semi-definite
– The inverse of a square matrix 𝐴 is denoted by 𝐴−1 and is such that
𝐴𝐴−1 = 𝐴−1 𝐴 = 𝐼
• The inverse 𝐴−1 of a matrix 𝐴 exists if and only if 𝐴 is non-singular
– The pseudo-inverse matrix 𝐴† is typically used whenever 𝐴−1 does not exist
(because 𝐴 is not square or 𝐴 is singular)
$$A^{\dagger} = (A^T A)^{-1} A^T \quad \text{with} \quad A^{\dagger} A = I \quad \text{(assuming } A^T A \text{ is non-singular)}$$
• Note that A𝐴† ≠ 𝐼 in general
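A minimal MATLAB sketch of the pseudo-inverse for a non-square matrix; the values are hypothetical.

  A = [1 0; 0 2; 1 1];         % hypothetical 3x2 matrix, so inv(A) does not exist
  Adag  = pinv(A);             % pseudo-inverse A†
  Adag2 = inv(A' * A) * A';    % same result from the explicit formula (A'*A is non-singular here)
  Adag * A                     % approximately the 2x2 identity
  A * Adag                     % a 3x3 projection matrix, not the identity in general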

Vector spaces
– The n-dimensional space in which all the n-dimensional vectors reside is called
a vector space
– A set of vectors {𝑢1, 𝑢2, … 𝑢𝑛 } is said to form a basis for a vector space if any
arbitrary vector x can be represented by a linear combination of the 𝑢𝑖
$$x = a_1 u_1 + a_2 u_2 + \cdots + a_n u_n$$
• The coefficients {𝑎1, 𝑎2, … 𝑎𝑛} are called the components of vector 𝑥 with respect to the basis {𝑢𝑖}
• In order to form a basis, it is necessary and sufficient that the {𝑢𝑖} vectors be linearly independent
[Figure: a vector 𝑥 expressed by its components 𝑎1, 𝑎2, 𝑎3 along basis vectors 𝑢1, 𝑢2, 𝑢3]

– A basis {𝑢𝑖} is said to be orthogonal if
$$u_i^T u_j \;\begin{cases} \neq 0 & i = j \\ = 0 & i \neq j \end{cases}$$
– A basis {𝑢𝑖} is said to be orthonormal if
$$u_i^T u_j \;\begin{cases} = 1 & i = j \\ = 0 & i \neq j \end{cases}$$
• As an example, the Cartesian coordinate base is an orthonormal base
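A small MATLAB sketch of components with respect to an orthonormal basis; the basis and vector below are hypothetical.

  U = [1 1; 1 -1] / sqrt(2);   % columns u1, u2 form an orthonormal basis of R^2
  U' * U                       % Gram matrix: the identity, confirming orthonormality
  x = [3; 1];
  a = U' * x;                  % components of x with respect to {u1, u2}
  U * a                        % reconstructs x as a1*u1 + a2*u2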

– Given n linearly independent vectors {𝑥1, 𝑥2, … 𝑥𝑛}, we can construct an
orthonormal base 𝜙1 , 𝜙2 , … 𝜙𝑛 for the vector space spanned by {𝑥𝑖 } with
the Gram-Schmidt orthonormalization procedure (to be discussed in the RBF
lecture)
– The distance between two points in a vector space is defined as the
magnitude of the vector difference between the points
$$d_E(x, y) = \|x - y\| = \left( \sum_{k=1}^{d} (x_k - y_k)^2 \right)^{1/2}$$
• This is also called the Euclidean distance
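In MATLAB the Euclidean distance is just the norm of the difference; the points below are hypothetical.

  x = [1; 2; 3];  y = [4; 6; 3];   % hypothetical points
  dE = norm(x - y)                 % equals sqrt(sum((x - y).^2)), here 5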

The Gram-Schmidt orthogonalization process

Let V be a vector space with an inner product.


Suppose x₁, x₂, …, xₙ is a basis for V. Let
$$v_1 = x_1,$$
$$v_2 = x_2 - \frac{\langle x_2, v_1 \rangle}{\langle v_1, v_1 \rangle} v_1,$$
$$v_3 = x_3 - \frac{\langle x_3, v_1 \rangle}{\langle v_1, v_1 \rangle} v_1 - \frac{\langle x_3, v_2 \rangle}{\langle v_2, v_2 \rangle} v_2,$$
$$\vdots$$
$$v_n = x_n - \frac{\langle x_n, v_1 \rangle}{\langle v_1, v_1 \rangle} v_1 - \cdots - \frac{\langle x_n, v_{n-1} \rangle}{\langle v_{n-1}, v_{n-1} \rangle} v_{n-1}.$$
Then v1 , v2 , . . . , vn is an orthogonal basis for V .
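A direct MATLAB sketch of this (unnormalized) procedure, with the basis vectors stored as the columns of X; the values reuse the basis of the Example slide below and are otherwise hypothetical.

  X = [1 1 1; -1 0 1; 1 1 2];             % columns are x1, x2, x3
  V = zeros(size(X));
  for j = 1:size(X, 2)
      v = X(:, j);
      for i = 1:j-1                       % subtract the projections onto v1, ..., v_{j-1}
          v = v - (X(:, j)' * V(:, i)) / (V(:, i)' * V(:, i)) * V(:, i);
      end
      V(:, j) = v;
  end
  V' * V                                  % diagonal matrix: the columns of V are orthogonal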
Orthogonalization / Normalization
An alternative form of the Gram-Schmidt process combines
orthogonalization with normalization.
Suppose x₁, x₂, …, xₙ is a basis for an inner product space V. Let
$$v_1 = x_1, \qquad w_1 = \frac{v_1}{\|v_1\|},$$
$$v_2 = x_2 - \langle x_2, w_1 \rangle w_1, \qquad w_2 = \frac{v_2}{\|v_2\|},$$
$$v_3 = x_3 - \langle x_3, w_1 \rangle w_1 - \langle x_3, w_2 \rangle w_2, \qquad w_3 = \frac{v_3}{\|v_3\|},$$
$$\vdots$$
$$v_n = x_n - \langle x_n, w_1 \rangle w_1 - \cdots - \langle x_n, w_{n-1} \rangle w_{n-1}, \qquad w_n = \frac{v_n}{\|v_n\|}.$$
Then w₁, w₂, …, wₙ is an orthonormal basis for V.
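The normalized variant translates just as directly into MATLAB; a minimal sketch with the same hypothetical basis:

  X = [1 1 1; -1 0 1; 1 1 2];             % columns are x1, x2, x3
  W = zeros(size(X));
  for j = 1:size(X, 2)
      v = X(:, j);
      for i = 1:j-1
          v = v - (X(:, j)' * W(:, i)) * W(:, i);   % subtract <x_j, w_i> w_i
      end
      W(:, j) = v / norm(v);                        % normalize
  end
  W' * W                                  % the identity matrix (up to round-off)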
Example
Let V = ℝ³ with the Euclidean inner product. We will apply the Gram-Schmidt algorithm to orthogonalize the basis {(1, −1, 1), (1, 0, 1), (1, 1, 2)}.

Step 1: v₁ = (1, −1, 1).

Step 2:
$$v_2 = (1, 0, 1) - \frac{\langle (1,0,1), (1,-1,1) \rangle}{\langle (1,-1,1), (1,-1,1) \rangle} (1,-1,1) = (1, 0, 1) - \frac{2}{3}(1, -1, 1) = \left(\tfrac{1}{3}, \tfrac{2}{3}, \tfrac{1}{3}\right).$$

Step 3:
$$v_3 = (1, 1, 2) - \frac{\langle (1,1,2), (1,-1,1) \rangle}{\langle (1,-1,1), (1,-1,1) \rangle} (1,-1,1) - \frac{\langle (1,1,2), (\tfrac{1}{3}, \tfrac{2}{3}, \tfrac{1}{3}) \rangle}{\langle (\tfrac{1}{3}, \tfrac{2}{3}, \tfrac{1}{3}), (\tfrac{1}{3}, \tfrac{2}{3}, \tfrac{1}{3}) \rangle} \left(\tfrac{1}{3}, \tfrac{2}{3}, \tfrac{1}{3}\right) = (1, 1, 2) - \frac{2}{3}(1, -1, 1) - \frac{5}{2}\left(\tfrac{1}{3}, \tfrac{2}{3}, \tfrac{1}{3}\right) = \left(-\tfrac{1}{2}, 0, \tfrac{1}{2}\right).$$

You can verify that {(1, −1, 1), (1/3, 2/3, 1/3), (−1/2, 0, 1/2)} form an orthogonal basis for ℝ³. Normalizing the vectors in the orthogonal basis, we obtain the orthonormal basis
$$\left\{ \left(\tfrac{1}{\sqrt{3}}, -\tfrac{1}{\sqrt{3}}, \tfrac{1}{\sqrt{3}}\right), \left(\tfrac{1}{\sqrt{6}}, \tfrac{2}{\sqrt{6}}, \tfrac{1}{\sqrt{6}}\right), \left(-\tfrac{1}{\sqrt{2}}, 0, \tfrac{1}{\sqrt{2}}\right) \right\}.$$
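As a numerical check (not part of the original slides), the example can be verified in MATLAB:

  X  = [1 1 1; -1 0 1; 1 1 2];                        % the example basis as columns
  v1 = X(:,1);
  v2 = X(:,2) - (X(:,2)'*v1)/(v1'*v1) * v1;           % (1/3, 2/3, 1/3)
  v3 = X(:,3) - (X(:,3)'*v1)/(v1'*v1) * v1 ...
              - (X(:,3)'*v2)/(v2'*v2) * v2;           % (-1/2, 0, 1/2)
  W  = [v1/norm(v1), v2/norm(v2), v3/norm(v3)];       % orthonormal basis as columns
  W' * W                                              % the 3x3 identity (up to round-off)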

Linear transformations
– A linear transformation is a mapping from a vector space 𝑋^𝑁 onto a vector space 𝑌^𝑀, and is represented by a matrix
• Given a vector 𝑥 ∈ 𝑋^𝑁, the corresponding vector 𝑦 ∈ 𝑌^𝑀 is computed as
$$\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_M \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1N} \\ a_{21} & a_{22} & \cdots & a_{2N} \\ \vdots & & \ddots & \vdots \\ a_{M1} & a_{M2} & \cdots & a_{MN} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_N \end{bmatrix}$$
• Notice that the dimensionality of the two spaces does not need to be the same
• For pattern recognition we typically have 𝑀 < 𝑁 (project onto a lower-dim space)
– A linear transformation represented by a square matrix A is said to be
orthonormal when 𝐴𝐴𝑇 = 𝐴𝑇 𝐴 = 𝐼
• This implies that 𝐴𝑇 = 𝐴−1
• An orthonormal xform has the property of preserving the magnitude of the vectors
$$\|y\| = \sqrt{y^T y} = \sqrt{(Ax)^T (Ax)} = \sqrt{x^T A^T A x} = \sqrt{x^T x} = \|x\|$$
• An orthonormal matrix can be thought of as a rotation of the reference frame
• The row vectors of an orthonormal xform are a set of orthonormal basis vectors
$$Y_{M\times 1} = \begin{bmatrix} \leftarrow a_1 \rightarrow \\ \leftarrow a_2 \rightarrow \\ \vdots \\ \leftarrow a_N \rightarrow \end{bmatrix} X_{N\times 1} \qquad \text{with} \qquad a_i^T a_j = \begin{cases} 0 & i \neq j \\ 1 & i = j \end{cases}$$
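A brief MATLAB sketch of a linear transformation and of the norm-preserving property of an orthonormal matrix; all values are hypothetical.

  A = [1 0 1; 0 2 0];              % hypothetical 2x3 matrix mapping R^3 to R^2 (M < N)
  x = [1; 2; 3];
  y = A * x;                       % the transformed vector in R^2
  theta = pi/6;                    % hypothetical rotation angle
  R = [cos(theta) -sin(theta); sin(theta) cos(theta)];   % orthonormal (rotation) matrix
  norm(R * [3; 4])                 % equals norm([3; 4]) = 5: the magnitude is preserved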

Eigenvectors and eigenvalues
– Given a matrix 𝐴𝑁×𝑁 , we say that 𝑣 is an eigenvector* if there exists a scalar 𝜆
(the eigenvalue) such that
𝐴𝑣 = 𝜆𝑣
– Computing the eigenvalues
$$Av = \lambda v \;\Rightarrow\; (A - \lambda I)v = 0 \;\Rightarrow\; \begin{cases} v = 0 & \text{(trivial solution)} \\ |A - \lambda I| = 0 & \text{(non-trivial solution)} \end{cases}$$
$$|A - \lambda I| = 0 \;\Rightarrow\; \lambda^N + a_1 \lambda^{N-1} + a_2 \lambda^{N-2} + \cdots + a_{N-1}\lambda + a_0 = 0 \qquad \text{(characteristic equation)}$$

*The "eigen-" in "eigenvector"


translates as "characteristic"
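In MATLAB, eig returns the eigenvectors and eigenvalues directly, and poly gives the coefficients of the characteristic polynomial; the matrix below is a hypothetical example.

  A = [2 0 1; 0 3 0; 1 0 2];       % hypothetical square matrix
  [V, D] = eig(A);                 % columns of V are eigenvectors, diag(D) are eigenvalues
  lambda = diag(D);
  A * V(:,1) - lambda(1) * V(:,1)  % approximately zero: A*v = lambda*v
  p = poly(A);                     % coefficients of the characteristic polynomial det(A - lambda*I) = 0
  roots(p)                         % the same eigenvalues (possibly in a different order)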

– The matrix formed by the column eigenvectors is called the modal matrix M
• Matrix Λ is the canonical form of A: a diagonal matrix with eigenvalues on the main
diagonal
$$M = \begin{bmatrix} \uparrow & \uparrow & & \uparrow \\ v_1 & v_2 & \cdots & v_N \\ \downarrow & \downarrow & & \downarrow \end{bmatrix} \qquad \Lambda = \begin{bmatrix} \lambda_1 & & & \\ & \lambda_2 & & \\ & & \ddots & \\ & & & \lambda_N \end{bmatrix}$$
– Properties
• If A is non-singular, all eigenvalues are non-zero
• If A is real and symmetric, all eigenvalues are real
– The eigenvectors associated with distinct eigenvalues are orthogonal
• If A is positive definite, all eigenvalues are positive
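A minimal MATLAB sketch of the modal matrix and canonical form, using a hypothetical symmetric matrix:

  A = [4 1; 1 3];                  % hypothetical real symmetric matrix
  [M, L] = eig(A);                 % M: modal matrix (eigenvectors as columns), L: Lambda
  A * M - M * L                    % approximately zero, since A*M = M*Lambda
  inv(M) * A * M                   % recovers the diagonal canonical form Lambda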

Interpretation of eigenvectors and eigenvalues
– If we view matrix 𝐴 as a linear transformation, an eigenvector represents an
invariant direction in vector space
• When transformed by 𝐴, any point lying on the direction defined by 𝑣 will remain
on that direction, and its magnitude will be multiplied by 𝜆

[Figure: a point P lying on the direction of eigenvector 𝑣 is mapped by 𝑦 = 𝐴𝑥 to a point P′ on the same direction, with its distance from the origin scaled as d′ = 𝜆d]

• For example, the transform that rotates 3-d vectors about the 𝑍 axis has vector
[0 0 1] as its only eigenvector and 𝜆 = 1 as its eigenvalue

$$A = \begin{bmatrix} \cos\beta & -\sin\beta & 0 \\ \sin\beta & \cos\beta & 0 \\ 0 & 0 & 1 \end{bmatrix} \qquad v = \begin{bmatrix} 0 & 0 & 1 \end{bmatrix}^T$$

– Given the covariance matrix Σ of a Gaussian distribution
• The eigenvectors of Σ are the principal directions of the distribution
• The eigenvalues are the variances of the corresponding principal directions
– The linear transformation defined by the eigenvectors of Σ leads to vectors
that are uncorrelated regardless of the form of the distribution
• If the distribution happens to be Gaussian, then the transformed vectors will be
statistically independent
$$\Sigma M = M \Lambda \qquad \text{with} \qquad M = \begin{bmatrix} \uparrow & \uparrow & & \uparrow \\ v_1 & v_2 & \cdots & v_N \\ \downarrow & \downarrow & & \downarrow \end{bmatrix}, \quad \Lambda = \begin{bmatrix} \lambda_1 & & & \\ & \lambda_2 & & \\ & & \ddots & \\ & & & \lambda_N \end{bmatrix}$$
$$f_x(x) = \frac{1}{(2\pi)^{N/2} |\Sigma|^{1/2}} \exp\left( -\frac{1}{2} (x-\mu)^T \Sigma^{-1} (x-\mu) \right) \qquad f_y(y) = \prod_{i=1}^{N} \frac{1}{\sqrt{2\pi\lambda_i}} \exp\left( -\frac{(y_i - \mu_{y_i})^2}{2\lambda_i} \right)$$

[Figure: the linear transformation 𝑦 = 𝑀ᵀ𝑥 aligns the principal directions 𝑣1, 𝑣2 of the distribution with the 𝑦1, 𝑦2 axes]
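A short MATLAB sketch of this decorrelation idea on synthetic data; the data and its generation are assumptions for illustration, not from the slides.

  rng(0);                                   % for reproducibility
  X = randn(1000, 2) * [2 1; 0 1];          % correlated 2-d samples, one observation per row
  S = cov(X);                               % sample covariance matrix Sigma
  [M, L] = eig(S);                          % eigenvectors: principal directions; eigenvalues: variances
  Y = X * M;                                % transformed samples, y = M' * x for each sample
  cov(Y)                                    % approximately diagonal: the components are uncorrelated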

