CIND119 Module 4 – How To Build A Decision Tree

How to Build a Decision Tree?

Dr. Kanchana Padmanabhan

1
From Data to Decision Tree

     FEATURES                                    CLASS LABEL

ID   Married   Homeowner   Children   Car       Tax Evasion

1    YES       NO          YES        NO        NO
2    YES       NO          NO         NO        NO
3    YES       YES         NO         YES       YES
4    NO        YES         NO         YES       YES
5    NO        NO          NO         YES       YES
6    NO        YES         NO         YES       YES

2
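
As a quick aside (a minimal sketch; the 1/0 encoding and the names X, y, and feature_names are my own, not part of the slides), the table above can be written down in Python as a feature matrix and a class-label vector:

# Features: Married, Homeowner, Children, Car (YES -> 1, NO -> 0).
X = [
    [1, 0, 1, 0],  # record 1
    [1, 0, 0, 0],  # record 2
    [1, 1, 0, 1],  # record 3
    [0, 1, 0, 1],  # record 4
    [0, 0, 0, 1],  # record 5
    [0, 1, 0, 1],  # record 6
]
feature_names = ["Married", "Homeowner", "Children", "Car"]

# Class label: Tax Evasion (YES -> 1, NO -> 0).
y = [0, 0, 1, 1, 1, 1]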
How to Use a Decision Tree

3
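
As a rough sketch of the idea on this slide (the tree below is my own illustration, consistent with the slide-2 data but not necessarily the tree shown in the deck), classifying a new record means answering one YES/NO question per node until a leaf is reached:

# Illustrative tree: ask about "Married" first, then "Car".
tree = {
    "feature": "Married",
    "yes": {"feature": "Car",
            "yes": {"label": "YES"},   # Married and owns a car -> Tax Evasion = YES
            "no":  {"label": "NO"}},   # Married, no car         -> Tax Evasion = NO
    "no":  {"label": "YES"},           # Not married             -> Tax Evasion = YES
}

def classify(record, node):
    """Walk from the root to a leaf, answering one YES/NO question per node."""
    while "label" not in node:
        node = node["yes"] if record[node["feature"]] == "YES" else node["no"]
    return node["label"]

new_record = {"Married": "YES", "Homeowner": "NO", "Children": "NO", "Car": "YES"}
print(classify(new_record, tree))   # -> "YES"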
Hunt’s Algorithm & Example 1

1. Examine the data and find the best feature for the first node, also known as the "root node".
2. Split the data based on this feature.
3. Recurse on each corresponding "YES" and "NO" path:
   a. If a path contains datapoints that all belong to the same class, it becomes a leaf node.
   b. If a path contains datapoints that belong to different classes, identify the next feature to add to the tree.
(See the Python sketch after this slide.)

4
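
A minimal Python sketch of these three steps, under the assumption that the "best feature" is the one with the lowest GINI split score (GINI is defined two slides later); the function and variable names are mine, not the slides':

def gini(labels):
    """GINI impurity of a set of class labels: 1 - sum_j p(j)^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def split_gini(records, labels, feature):
    """Sum of the branch GINI scores for a YES/NO split on `feature` (as on the GINI slide)."""
    yes = [lab for rec, lab in zip(records, labels) if rec[feature] == "YES"]
    no  = [lab for rec, lab in zip(records, labels) if rec[feature] == "NO"]
    return gini(yes) + gini(no)

def hunt(records, labels, features):
    if not labels:                                 # empty branch: no datapoints reached this path
        return {"label": None}
    majority = max(set(labels), key=labels.count)
    if len(set(labels)) == 1 or not features:      # step 3a: pure node (or nothing left to split on)
        return {"label": majority}
    best = min(features, key=lambda f: split_gini(records, labels, f))   # step 1: best feature
    remaining = [f for f in features if f != best]
    yes_idx = [i for i, rec in enumerate(records) if rec[best] == "YES"] # step 2: split the data
    no_idx  = [i for i, rec in enumerate(records) if rec[best] == "NO"]
    return {                                                             # step 3b: recurse on each path
        "feature": best,
        "yes": hunt([records[i] for i in yes_idx], [labels[i] for i in yes_idx], remaining),
        "no":  hunt([records[i] for i in no_idx],  [labels[i] for i in no_idx],  remaining),
    }

# Example call on the slide-2 data, with records given as dicts of "YES"/"NO" values:
# tree = hunt(records, labels, ["Married", "Homeowner", "Children", "Car"])

Running hunt on the six records from slide 2 returns a nested dictionary: internal nodes carry a feature name with "yes"/"no" subtrees, and leaves carry a class label.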
Hunt’s Algorithm & Example 2

The same steps as in Example 1: find the best feature, split the data on it, and recurse on the "YES" and "NO" paths until every path ends in a pure leaf. This slide applies them to a second worked example.

5
Impurity Measure: GINI Index

GINI(t) = 1 − Σ_j [ p(j | t) ]²

• p(j | t) is the relative frequency of class j at branch t: the number of datapoints that belong to class j at branch t divided by the total number of datapoints at branch t.

For the Married split from slide 2:

GINI(Married = YES) = 1 − [ (1/3)² + (2/3)² ] = 0.44
GINI(Married = NO)  = 1 − (3/3)² = 0.0

Total GINI score for using Married is 0.44 + 0.0 = 0.44

6
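
A quick numeric check of the calculation above (a minimal sketch; the function name gini and the hard-coded label lists are mine):

def gini(labels):
    """GINI(t) = 1 - sum over classes j of p(j | t)^2."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

married_yes = ["NO", "NO", "YES"]      # Tax Evasion labels for records 1-3 (Married = YES)
married_no  = ["YES", "YES", "YES"]    # Tax Evasion labels for records 4-6 (Married = NO)

print(round(gini(married_yes), 2))                      # 0.44 = 1 - [(1/3)^2 + (2/3)^2]
print(round(gini(married_no), 2))                       # 0.0  = 1 - (3/3)^2
print(round(gini(married_yes) + gini(married_no), 2))   # 0.44, the slide's total for Married

Some formulations weight each branch's GINI by the fraction of records reaching that branch before summing; the total shown here follows the slide's unweighted sum.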
References
• Hunt, J. W., & Szymanski, T. G. (1977, May). A fast algorithm for computing longest common subsequences. Communications of the ACM, 20(5), 350–353. https://doi.org/10.1145/359581.359603
