Lecture12-Ch8-ClassBasic-Part2

Data Mining: Concepts and Techniques
— Chapter 8 —
Akhil Chaudhary

1
Chapter 8. Classification: Basic Concepts

• Classification: Basic Concepts
• Decision Tree Induction
• Bayes Classification Methods
• Rule-Based Classification
• Model Evaluation and Selection
• Summary

2
Bayes Classification: Why?

• A statistical classifier: performs probabilistic prediction, i.e., predicts class membership probabilities. It is based on Bayes’ theorem (described on the next slide).
• Performance: a simple Bayesian classifier, the naive Bayesian classifier, has performance comparable to decision tree and selected neural network classifiers.
• Bayesian classifiers can be highly accurate and fast when applied to large data sets.

3
Bayes Classification: Why?

• Incremental: each training example can incrementally increase or decrease the probability that a hypothesis is correct, so prior knowledge can be combined with observed data.
• Class-conditional independence (the key assumption of the naive Bayesian classifier):
  • The naive Bayesian classifier assumes that the effect of an attribute value on a given class is independent of the values of the other attributes. This assumption is called class-conditional independence.

4
Bayes’ Theorem

• Let X be a specific data tuple. In Bayesian terms, X is considered “evidence.”
• Let H be some hypothesis that the data tuple X belongs to a specified class C.
• For classification problems, we want to determine P(H|X), the probability that the hypothesis H holds given the “evidence,” i.e., the observed data tuple X.
• In other words, we are looking for the probability that tuple X belongs to class C, given that we know the attribute description of X.

5
Bayes’ Theorem

• In Bayesian statistics, P(H|X) is the posterior probability of H conditioned on X.
• For example, suppose that:
  • the data set is about customers described by the attributes age and income;
  • X is a 35-year-old customer with an income of $40,000;
  • H is the hypothesis that the customer will buy a computer.
• Then P(H|X) reflects the probability that customer X will buy a computer given that we know the customer’s age and income.

6
Bayes’ Theorem

• In Bayesian statistics, P(H) is the prior probability of H.
• In the previous example, this is the probability that any given customer will buy a computer, regardless of age, income, or any other information.
• The posterior probability P(H|X) is based on more information (e.g., customer information) than the prior probability P(H), which is independent of X.
• Similarly, P(X|H) is the posterior probability of X conditioned on H; that is, the probability that a customer X is 35 years old and earns $40,000, given that we know the customer will buy a computer.

7
Bayes’ Theorem

• P(X) is the prior probability of X. In our example, it is the probability that a person from our set of customers is 35 years old and earns $40,000.
• “How are these probabilities estimated?” P(H), P(X|H), and P(X) may be estimated from the given data set, as we shall see in the next example.
• Bayes’ theorem is useful in that it provides a way of calculating the posterior probability P(H|X) from P(H), P(X|H), and P(X).
• Formally, Bayes’ theorem states (Eq. 8.10; a small numeric illustration follows below):

  P(H|X) = P(X|H) P(H) / P(X)

8
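
• To make the theorem concrete, here is a small worked example with hypothetical numbers (chosen for illustration, not taken from the lecture’s data set). Suppose P(H) = 0.5 (half of all customers buy a computer), P(X|H) = 0.2, and P(X) = 0.16. Then

  P(H|X) = P(X|H) P(H) / P(X) = (0.2 × 0.5) / 0.16 = 0.625,

  so observing X’s age and income raises the estimated probability of a purchase from the prior 0.5 to 0.625.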
Naive Bayesian Classification

The naive Bayesian classifier, or simple Bayesian classifier, works in the following manner:
• 1) Let D be a training set of tuples and their associated class labels. As usual, each tuple is represented by an n-dimensional attribute vector X = (x1, x2, …, xn), whose elements correspond to the attributes A1, A2, …, An, respectively.
• 2) Suppose that there are m classes, C1, C2, …, Cm. Given a tuple X, the naive Bayesian classifier concludes that X belongs to the class with the highest posterior probability conditioned on X.
• That is, the naive Bayesian classifier concludes that tuple X belongs to class Ci if and only if:

  P(Ci|X) > P(Cj|X)   for 1 ≤ j ≤ m, j ≠ i

9
Naive Bayesian Classification

• Therefore, we need to maximize P(Ci|X). The class Ci that maximizes P(Ci|X) is called the maximum posteriori hypothesis. By Bayes’ theorem (Eq. 8.10), we have:

  P(Ci|X) = P(X|Ci) P(Ci) / P(X)

• 3) As P(X) is constant for all classes, to maximize P(Ci|X) we only need to maximize P(X|Ci) P(Ci).
• If the class prior probabilities P(Ci) are unknown, it is commonly assumed that the classes are equally likely, that is, P(C1) = P(C2) = … = P(Cm), and then we only need to maximize P(X|Ci).
• Otherwise, we maximize P(X|Ci) P(Ci). Note that the class prior probabilities can be estimated using P(Ci) = |Ci,D| / |D|, where |Ci,D| is the number of training tuples of class Ci in D (a code sketch of this estimate follows below).

10
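
A minimal sketch of this prior estimate in Python, assuming the class labels are available as a plain list (the names estimate_priors and priors are illustrative, not from the lecture):

  from collections import Counter

  def estimate_priors(labels):
      """Estimate P(Ci) = |Ci,D| / |D| from a list of class labels."""
      counts = Counter(labels)
      total = len(labels)
      return {cls: n / total for cls, n in counts.items()}

  # Example: 9 "yes" and 5 "no" labels give P(yes) = 9/14 and P(no) = 5/14.
  priors = estimate_priors(["yes"] * 9 + ["no"] * 5)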
Naive Bayesian Classification

• 4) Given a data set with many attributes, it would be extremely computationally expensive to compute P(X|Ci) directly.
• To reduce the computation in evaluating P(X|Ci), the assumption of class-conditional independence is made.
• This presumes that the attribute values are conditionally independent of one another, given the class label of the tuple (i.e., that there are no dependence relationships among the attributes).
• Therefore, we have (Eq. 8.12):

  P(X|Ci) = P(x1|Ci) × P(x2|Ci) × … × P(xn|Ci)

11
Naive Bayesian Classification

• We can easily estimate the probabilities P(x1|Ci), P(x2|Ci), …, P(xn|Ci) from the training tuples.
• Recall that xk refers to the value of attribute Ak for tuple X. For each attribute, we need to know whether the attribute is categorical or continuous-valued.
• a) If Ak is categorical, then P(xk|Ci) is the number of tuples of class Ci in D that have the value xk for Ak, divided by |Ci,D|, the number of training tuples belonging to class Ci (a code sketch of this count ratio follows below).

12
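
A minimal sketch of this count ratio, assuming each training tuple is stored as a plain Python tuple of attribute values with a parallel list of class labels (all names are illustrative):

  def categorical_likelihood(rows, labels, attr_index, value, cls):
      """Estimate P(Ak = value | Ci = cls) as a ratio of counts."""
      in_class = [row for row, lab in zip(rows, labels) if lab == cls]
      matching = sum(1 for row in in_class if row[attr_index] == value)
      return matching / len(in_class)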
Naive Bayesian Classification

• b) If Ak is continuous-valued, then we need to do a bit more work, but the calculation is pretty straightforward. A continuous-valued attribute is typically assumed to have a Gaussian distribution with mean μ and standard deviation σ, defined by (Eq. 8.13):

  g(x, μ, σ) = (1 / (√(2π) σ)) · exp( −(x − μ)² / (2σ²) )

  so that:

  P(xk|Ci) = g(xk, μCi, σCi)

• We simply need to compute μCi and σCi, which are the mean (i.e., average) and standard deviation, respectively, of the values of attribute Ak for the training tuples of class Ci. We then plug these two quantities into Eq. (8.13), together with xk, to estimate P(xk|Ci) (see the code sketch below).

13
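
A minimal sketch of Eq. (8.13) in Python (the function name is illustrative):

  import math

  def gaussian_likelihood(x, mu, sigma):
      """Gaussian density g(x, mu, sigma), used to estimate P(xk | Ci)
      for a continuous-valued attribute Ak."""
      coeff = 1.0 / (math.sqrt(2.0 * math.pi) * sigma)
      exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
      return coeff * math.exp(exponent)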
Naive Bayesian Classification

• For example:
  • Let X = (35, $40,000), where A1 and A2 are the attributes age and income, respectively.
  • Let the class label attribute be buys_computer.
  • The associated class label for X is yes (i.e., buys_computer = yes).
  • Suppose that age has not been discretized and therefore exists as a continuous-valued attribute.
  • Suppose that from the training set we find that customers in D who buy a computer are 38 ± 12 years of age. In other words, for attribute age and this class, we have μ = 38 years and σ = 12. We can plug these quantities, along with x1 = 35 for our tuple X, into Eq. (8.13) to estimate P(age = 35 | buys_computer = yes) — the arithmetic is carried out below.

14
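
Carrying this calculation through (the value below is computed here from Eq. (8.13), not quoted from the lecture):

  P(age = 35 | buys_computer = yes) ≈ g(35, 38, 12)
    = (1 / (√(2π) · 12)) · exp( −(35 − 38)² / (2 · 12²) )
    ≈ 0.0332 × 0.969 ≈ 0.032

In code, this is simply gaussian_likelihood(35, 38, 12) using the sketch above.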
Naive Bayesian Classification

• 5) To predict the class label of X, P(X|Ci) P(Ci) is evaluated for each class Ci. The classifier predicts that the class label of tuple X is the class Ci if and only if:

  P(X|Ci) P(Ci) > P(X|Cj) P(Cj)   for 1 ≤ j ≤ m, j ≠ i

• In other words, the predicted class label for X is the class Ci for which P(X|Ci) P(Ci) is the maximum (a compact code sketch combining these steps follows below).

15
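
Putting steps 1)–5) together, here is a compact sketch of training and prediction for categorical attributes (all names are illustrative; it simply combines the counting ideas above):

  from collections import Counter, defaultdict

  def train_naive_bayes(rows, labels):
      """Collect the counts needed for P(Ci) and for each P(xk | Ci)."""
      class_counts = Counter(labels)           # |Ci,D| for each class
      value_counts = defaultdict(Counter)      # (class, k) -> counts of values of Ak
      for row, cls in zip(rows, labels):
          for k, value in enumerate(row):
              value_counts[(cls, k)][value] += 1
      return class_counts, value_counts, len(labels)

  def predict(x, class_counts, value_counts, total):
      """Return the class Ci that maximizes P(X|Ci) * P(Ci)."""
      best_cls, best_score = None, -1.0
      for cls, n_cls in class_counts.items():
          score = n_cls / total                                  # P(Ci)
          for k, value in enumerate(x):
              score *= value_counts[(cls, k)][value] / n_cls     # P(xk | Ci)
          if score > best_score:
              best_cls, best_score = cls, score
      return best_cls

Note that an attribute value never seen with some class makes the corresponding factor zero; this is exactly the problem the Laplacian correction on the later slides addresses.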
Naive Bayesian Classification: Example

• To illustrate how naive Bayesian classification works, we use the same data set used for decision tree induction (i.e., Table 8.1).
• Specifically:
  • The data tuples are described by the attributes age, income, student, and credit_rating.
  • The class label attribute, buys_computer, has two distinct values (namely, {yes, no}).
  • Let C1 correspond to the class buys_computer = yes and C2 correspond to buys_computer = no.
• The tuple we wish to classify is:

  X = (age = youth, income = medium, student = yes, credit_rating = fair)

16
Naive Bayesian Classification: Example

• We need to maximize P(X|Ci) P(Ci) for i = 1, 2.
• P(Ci), the prior probability of each class, can be computed using the training tuples:

  P(buys_computer = yes) = 9/14 = 0.643
  P(buys_computer = no) = 5/14 = 0.357

• To compute P(X|Ci) for i = 1, 2, we compute the following conditional probabilities:

  P(age = youth | buys_computer = yes) = 2/9 = 0.222
  P(age = youth | buys_computer = no) = 3/5 = 0.600
  P(income = medium | buys_computer = yes) = 4/9 = 0.444
  P(income = medium | buys_computer = no) = 2/5 = 0.400
  P(student = yes | buys_computer = yes) = 6/9 = 0.667
  P(student = yes | buys_computer = no) = 1/5 = 0.200
  P(credit_rating = fair | buys_computer = yes) = 6/9 = 0.667
  P(credit_rating = fair | buys_computer = no) = 2/5 = 0.400

17
Naive Bayesian Classification: Example

• Using these probabilities, we obtain:

  P(X | buys_computer = yes) = 0.222 × 0.444 × 0.667 × 0.667 = 0.044

• Similarly, we have:

  P(X | buys_computer = no) = 0.600 × 0.400 × 0.200 × 0.400 = 0.019

• To find the class Ci that maximizes P(X|Ci) P(Ci), we compute:

  P(X | buys_computer = yes) P(buys_computer = yes) = 0.044 × 0.643 = 0.028
  P(X | buys_computer = no) P(buys_computer = no) = 0.019 × 0.357 = 0.007

• Therefore, the naive Bayesian classifier predicts buys_computer = yes for tuple X (a quick check of this arithmetic follows below).

18
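
A quick check of this arithmetic, using only the rounded conditional probabilities and priors listed on the previous slides:

  p_x_yes = 0.222 * 0.444 * 0.667 * 0.667   # ~0.044
  p_x_no  = 0.600 * 0.400 * 0.200 * 0.400   # ~0.019
  print(p_x_yes * 0.643, p_x_no * 0.357)    # ~0.028 vs ~0.007, so predict "yes"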
Naive Bayesian Classification: A Trick

• Recall that Eq. (8.12) says:

  P(X|Ci) = P(x1|Ci) × P(x2|Ci) × … × P(xn|Ci)

• What if we encounter probability values of zero?
• Using the previous example: what if there are no training tuples representing students for the class buys_computer = no, which results in P(student = yes | buys_computer = no) = 0?
• Plugging this zero value into Eq. (8.12) would return a zero probability for P(X|Ci), cancelling the effect of all the other conditional probabilities in the product.

19
Naive Bayesian Classification: A Trick

• There is a simple trick to avoid this problem.
• We can assume that our training data set, D, is so large that adding one to each count that we need would make only a negligible difference to the estimated probability values, yet would conveniently avoid the case of probability values of zero.
• This technique for probability estimation is known as the Laplacian correction or Laplace estimator.
• Note that if we modify the counts, we must revise the denominator used in the probability calculation accordingly (a code sketch follows below).

20
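
A minimal sketch of the corrected estimate (the function name is illustrative; num_values is the number of distinct values of the attribute, e.g. 3 for income in {low, medium, high}):

  def laplace_likelihood(value_count, class_count, num_values):
      """Laplace-corrected estimate of P(Ak = value | Ci): add 1 to the count
      for this value and num_values to the denominator, so that every value
      of Ak receives a nonzero probability."""
      return (value_count + 1) / (class_count + num_values)

  # The income example on the next slide: counts 0, 990 and 10 out of 1000 tuples.
  print(laplace_likelihood(0, 1000, 3))     # 1/1003   ~ 0.001
  print(laplace_likelihood(990, 1000, 3))   # 991/1003 ~ 0.988
  print(laplace_likelihood(10, 1000, 3))    # 11/1003  ~ 0.011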
Naive Bayesian Classification: A Trick

• Suppose that for the class buys_computer = yes in some training database, D, containing 1,000 tuples, we have:
  • 0 tuples with income = low,
  • 990 tuples with income = medium,
  • 10 tuples with income = high.
• The probabilities of these events, without the Laplacian correction, are 0, 0.990 (from 990/1000), and 0.010 (from 10/1000), respectively.
• Using the Laplacian correction for the three quantities, we pretend that we have 1 more tuple for each income-value pair.

21
Naive Bayesian Classification: A Trick

• In this way, we instead obtain the following probabilities (rounded to three decimal places):

  P(income = low | buys_computer = yes) = 1/1003 ≈ 0.001
  P(income = medium | buys_computer = yes) = 991/1003 ≈ 0.988
  P(income = high | buys_computer = yes) = 11/1003 ≈ 0.011

• The “corrected” probability estimates are close to the “uncorrected” original probabilities, yet the zero probability value is avoided.

22
