
Principal Component Analysis (PCA)

1. PCA is a method used to reduce the number of variables in a dataset
by extracting the important ones from a large dataset.
2. It reduces the dimensionality of our data with the aim of retaining
as much information as possible.
3. In other words, this method combines highly correlated variables
to form a smaller set of artificial variables, called principal
components (PCs), that account for most of the variance in the data.
4. A PC can be defined as a linear combination of optimally
weighted observed variables.
5. The PCs retain the maximum variation that was present in the original
variables.
6. The PCs are eigenvectors of the covariance matrix, and hence they are
orthogonal (two vectors are orthogonal to each other if and only if their dot product is zero).
7. The output of PCA is these PCs, the number of which is less than
or equal to the number of original variables.
8. The PCs possess some useful properties, which are listed below:
• PCA transforms a high-dimensional dataset into a lower-dimensional representation while
preserving the most important information. It achieves this by identifying the principal
components, which are linear combinations of the original variables.
• The variation captured by the PCs decreases as we move from the 1st
PC to the last one (see the usage sketch below).
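
In practice, PCA is usually applied through a library rather than coded by hand. The following is a minimal usage sketch, not part of the original notes; it assumes a Python environment with NumPy and scikit-learn installed, and the array X and its shape are made up for the illustration:

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))                   # 100 samples, 4 features (illustrative data)
X[:, 3] = X[:, 0] + 0.1 * rng.normal(size=100)  # make the last feature highly correlated with the first

pca = PCA(n_components=2)                       # keep only the first two PCs
X_reduced = pca.fit_transform(X)                # shape (100, 2)
print(pca.explained_variance_ratio_)            # share of variance explained by each PC

Because the last feature is nearly a copy of the first, most of the variance is captured by the first two components, illustrating property 8 above.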
Steps in PCA:
a. Getting the dataset.
b. Representing the data as a 2-D matrix, where rows refer to data items and columns correspond to
features.
c. Standardizing the data: If the features are on different scales, it is essential to standardize them
(mean = 0, variance = 1) to ensure that no feature dominates the analysis. Here the standardized matrix
will be named Z.
d. Calculating the covariance matrix of Z: Take the matrix Z, transpose it, and multiply the transpose
by Z. The output is the covariance matrix of Z. This matrix shows how the features relate to
each other.
e. Eigenvalue decomposition: PCA finds the eigenvalues and eigenvectors of the covariance matrix.
Each eigenvector represents a principal component, and the corresponding eigenvalue indicates
how much variance that component explains.
f. Sorting the eigenvectors: Sort the eigenvalues in descending order and simultaneously sort the
eigenvectors accordingly.
g. Selecting the principal components: The principal components are ranked by their corresponding
eigenvalues. The first few components that explain most of the variance are selected.
h. Removing less important features from the new dataset: Only the relevant or important features
are kept in the new dataset, and the unimportant ones are removed (steps a–h are traced in the
sketch after this list).
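
The sketch below traces steps a–h directly in NumPy. It is a minimal illustration rather than the only correct implementation: the function name pca and the choice to divide the covariance by n (rather than n – 1) are assumptions made for this example:

import numpy as np

def pca(X, k):
    # b. X is a 2-D matrix: rows are data items, columns are features
    # c. Standardize the data (mean = 0, variance = 1) -> matrix Z
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # d. Covariance matrix: Z transposed, multiplied by Z (scaled by n)
    C = Z.T @ Z / len(Z)
    # e. Eigenvalue decomposition (eigh handles symmetric matrices)
    eigvals, eigvecs = np.linalg.eigh(C)
    # f. Sort eigenvalues in descending order, with matching eigenvectors
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # g./h. Keep the top k principal components and project the data onto them
    return Z @ eigvecs[:, :k], eigvals

X = np.array([[2, 1], [3, 5], [4, 3], [5, 6], [6, 7], [7, 8]], dtype=float)
X_new, eigvals = pca(X, k=1)    # reduce the 2-D patterns to 1 dimension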
Applications of PCA

Field                        Application of PCA
Image Processing             Image compression and feature reduction
Biomedical Data Analysis     Gene expression analysis, biomarker discovery
Finance and Economics        Risk assessment, portfolio optimization, time series
NLP                          Text classification, document clustering, sentiment analysis
Environmental Science        Analyzing environmental data, pollution detection
Remote Sensing               Analyzing satellite and remote sensing data
Speech Recognition           Speech pattern recognition, improving accuracy
Signal Processing            Audio and video compression, denoising
Biometric Authentication     Reducing dimensionality in biometric data

These are just some of the many diverse applications of PCA in different fields.
PCA's ability to reduce dimensionality and uncover meaningful patterns in data makes it a valuable tool across a
wide range of domains.
Example: Compute the principal component using the PCA algorithm.

Given data = { 2, 3, 4, 5, 6, 7 ; 1, 5, 3, 6, 7, 8 }.

Consider the two-dimensional patterns:

(2, 1), (3, 5), (4, 3), (5, 6), (6, 7), (7, 8).
Step-01:

Get data. The given feature vectors are:
x1 = (2, 1), x2 = (3, 5), x3 = (4, 3), x4 = (5, 6), x5 = (6, 7), x6 = (7, 8)

Step-02:

Calculate the mean vector (µ).
Mean vector (µ)
= ((2 + 3 + 4 + 5 + 6 + 7) / 6, (1 + 5 + 3 + 6 + 7 + 8) / 6)
= (4.5, 5)

Step-03:

Subtract the mean vector (µ) from each given feature vector.
x1 – µ = (2 – 4.5, 1 – 5) = (–2.5, –4)
x2 – µ = (3 – 4.5, 5 – 5) = (–1.5, 0)
x3 – µ = (4 – 4.5, 3 – 5) = (–0.5, –2)
x4 – µ = (5 – 4.5, 6 – 5) = (0.5, 1)
x5 – µ = (6 – 4.5, 7 – 5) = (1.5, 2)
x6 – µ = (7 – 4.5, 8 – 5) = (2.5, 3)

These are the feature vectors (xi) after subtracting the mean vector (µ); they are used in Step-04 below.
Step-04:

Calculate the covariance matrix.

The covariance matrix is given by:
Covariance matrix M = (m1 + m2 + m3 + m4 + m5 + m6) / 6,
where each mi = (xi – µ)(xi – µ)ᵀ is the outer product of a mean-subtracted feature vector with itself.

On adding the above matrices and dividing by 6, we get:

M = | 2.92  3.67 |
    | 3.67  5.67 |

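The arithmetic of Steps 02–04 can be checked with a few lines of NumPy. This is a verification sketch, not part of the original solution; note the division by n = 6 as in the example, whereas np.cov would divide by n – 1 by default:

import numpy as np

X = np.array([[2, 1], [3, 5], [4, 3], [5, 6], [6, 7], [7, 8]], dtype=float)
mu = X.mean(axis=0)       # -> [4.5, 5.0], the mean vector from Step-02
D = X - mu                # mean-subtracted vectors from Step-03
M = D.T @ D / len(X)      # sum of outer products divided by 6 (Step-04)
print(M)                  # -> [[2.92, 3.67], [3.67, 5.67]] (rounded)
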
Step-05:

Calculate the eigenvalues and eigenvectors of the covariance matrix.
λ is an eigenvalue of a matrix M if it is a solution of the characteristic equation |M – λI| = 0, where I is the identity matrix.
So, we have:

| 2.92 – λ   3.67     |
| 3.67       5.67 – λ |  =  0

From here,
(2.92 – λ)(5.67 – λ) – (3.67 × 3.67) = 0
16.56 – 2.92λ – 5.67λ + λ² – 13.47 = 0
λ² – 8.59λ + 3.09 = 0

Solving this quadratic equation, we get λ = 8.22, 0.38.

Thus, the two eigenvalues are λ1 = 8.22 and λ2 = 0.38.
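
These eigenvalues can be confirmed numerically. This is a verification sketch; np.linalg.eigvalsh returns the eigenvalues of a symmetric matrix in ascending order:

import numpy as np

M = np.array([[2.92, 3.67],
              [3.67, 5.67]])
print(np.linalg.eigvalsh(M))    # -> [0.38, 8.22] (rounded), matching λ2 and λ1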

Clearly, the second eigenvalue is very small compared to the first, so the second eigenvector can be left out.
The eigenvector corresponding to the greatest eigenvalue is the PC for the given data set.

So, we find the eigenvector corresponding to the eigenvalue λ1.

We use the following equation to find the eigenvector: MX = λX,

where M = covariance matrix, X = eigenvector, λ = eigenvalue.

Substituting the values into the above equation, we get:

| 2.92  3.67 | | X1 |         | X1 |
| 3.67  5.67 | | X2 |  = 8.22 | X2 |

Expanding this, we get:

2.92X1 + 3.67X2 = 8.22X1

3.67X1 + 5.67X2 = 8.22X2

On simplification, we get:

5.3X1 = 3.67X2 ………(1)

3.67X1 = 2.55X2 ………(2)


Let's start with equation (1). Divide both sides by 5.3 to isolate X1:

(5.3X1) / 5.3 = (3.67X2) / 5.3
X1 = (3.67X2) / 5.3

Now, let's work with equation (2). Divide both sides by 3.67 to isolate X1:

(3.67X1) / 3.67 = (2.55X2) / 3.67
X1 = (2.55X2) / 3.67

We now have two expressions for X1:
(i) X1 = (3.67X2) / 5.3
(ii) X1 = (2.55X2) / 3.67

Both simplify (up to rounding) to the same relationship:

X1 = 0.69X2

Therefore, from equations (1) and (2), X1 is equal to 0.69 times X2.

From (2), taking X2 = 3.67 (so that X1 = 2.55), the eigenvector is:

| X1 |     | 2.55 |
| X2 |  =  | 3.67 |

Thus, the principal component for the given data set is the direction [2.55, 3.67] (equivalently [0.69, 1]). The detailed working for this eigenvector is given in Solution (B) below.


Solution (B)

To find the eigenvector, we work from equation (2): 3.67X1 = 2.55X2.

The equation can be rewritten as: 3.67X1 – 2.55X2 = 0.

This represents a single equation with two variables, X1 and X2. To find the eigenvector, solve this equation by
expressing one variable in terms of the other.

Let's solve for X1 in terms of X2: 3.67X1 = 2.55X2

Now, divide both sides by 3.67 to isolate X1: X1 = (2.55X2) / 3.67

So, the relationship between X1 and X2 from equation (2) is: X1 = (2.55X2) / 3.67

To represent this relationship as an eigenvector, choose X2 = 3.67, which gives X1 = 2.55:

| X1 |     | 2.55 |
| X2 |  =  | 3.67 |

So, the eigenvector corresponding to equation (2) is [2.55, 3.67].
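
Finally, the whole example can be verified end to end. This is a verification sketch, not part of the original solution; the sign of an eigenvector returned by a numerical routine is arbitrary, so it is normalized by its second component for comparison:

import numpy as np

X = np.array([[2, 1], [3, 5], [4, 3], [5, 6], [6, 7], [7, 8]], dtype=float)
D = X - X.mean(axis=0)          # mean-subtracted data (Step-03)
M = D.T @ D / len(X)            # covariance matrix (Step-04)

vals, vecs = np.linalg.eigh(M)  # eigenvalues in ascending order
v = vecs[:, -1]                 # eigenvector of the largest eigenvalue
print(v / v[1])                 # -> [0.69, 1.0] (rounded), the direction of [2.55, 3.67]

scores = D @ v                  # projection of each pattern onto the principal component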
