
Unit-3

A decision tree is a non-parametric supervised learning method used for classification and
regression tasks. It has a hierarchical structure made up of a root node, branches, internal
nodes, and leaf nodes.

Introduction to Decision Trees

• A decision tree (DT) is a common machine learning structure that splits a dataset into
subsets to improve purity and reduce entropy.
• It is a feature-based decision-making model that provides transparency and is easy to
interpret.
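As a rough illustration (not part of the original notes), a minimal scikit-learn sketch of training such a tree might look as follows; the iris dataset and the parameter choices are assumptions made purely for demonstration:

    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    # Load a small example dataset and split it into training and test parts
    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # criterion="entropy" uses Shannon's entropy as the impurity measure for splits
    clf = DecisionTreeClassifier(criterion="entropy", random_state=0)
    clf.fit(X_train, y_train)
    print("Test accuracy:", clf.score(X_test, y_test))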

Properties of Decision Tree (DT) Algorithm:

1. Greedy Nature:
o The feature that produces the purest split is chosen at the root node.
o Feature selection at each child node depends on the split made at its parent
node.
o This greedy, sequential process does not guarantee a globally optimal tree.
2. Computational Cost:
o Feature selection depends on the number of features and pattern complexity.
o Simple tests are required at each node to reduce computational demand.
o Large dimensionality increases computational difficulty.
3. Overfitting Risk:
o Deep decision trees (DTs) tend to overfit training data and perform poorly on
validation/test data.
o Overfitting can be managed through pruning, where deeper subtrees are
trimmed when they do not improve performance on validation data (see the
sketch below).
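A minimal sketch of validation-based pruning, assuming scikit-learn's cost-complexity pruning (the ccp_alpha parameter) as the pruning mechanism and the iris data purely for illustration:

    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)
    X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

    # Candidate pruning strengths computed from the training data
    path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)

    best_alpha, best_score = 0.0, 0.0
    for alpha in path.ccp_alphas:
        # Larger alpha trims more of the deeper subtrees
        tree = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X_train, y_train)
        score = tree.score(X_val, y_val)
        if score > best_score:
            best_alpha, best_score = alpha, score

    print("Best ccp_alpha:", best_alpha, "validation accuracy:", best_score)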

Solutions to Overfitting and Computational Cost:

• These issues are often tackled using combinations of classifiers like:


o AdaBoost
o Random Forest
o Gradient Boosting

Structure of a decision tree:

• Root Node: The initial node at the beginning of a decision tree, where the entire population
or dataset starts dividing based on various features or conditions.
• Decision Nodes: Nodes that result from splitting the root node (or another decision node)
are known as decision nodes. These nodes represent intermediate decisions or conditions
within the tree.
• Leaf Nodes: Nodes where further splitting is not possible, often indicating the final
classification or outcome. Leaf nodes are also referred to as terminal nodes.
• Sub-Tree: Just as a subsection of a graph is called a sub-graph, a sub-section of a decision
tree is referred to as a sub-tree. It represents a specific portion of the decision tree.
• Pruning: The process of removing or cutting down specific nodes in a tree to prevent
overfitting and simplify the model.
• Branch / Sub-Tree: A subsection of the entire tree is referred to as a branch or sub-tree. It
represents a specific path of decisions and outcomes within the tree.
• Parent and Child Node: In a decision tree, a node that is divided into sub-nodes is known as
a parent node, and the sub-nodes emerging from it are referred to as child nodes. The
parent node represents a decision or condition, while the child nodes represent the
potential outcomes or further decisions based on that condition.

Example of a decision tree:

[Figure: sample decision tree (not reproduced in this text version)]
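As a substitute illustration, the following sketch (assuming scikit-learn and the iris data, chosen only for demonstration) prints a small fitted tree as text, showing the root split, the internal (decision) nodes, and the leaf nodes:

    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier, export_text

    data = load_iris()
    clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(data.data, data.target)

    # The first condition printed is the root node's split, indented "|---" lines
    # are internal decision nodes, and "class: ..." lines are leaf nodes
    print(export_text(clf, feature_names=list(data.feature_names)))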
Popular Combinational ML Models (multiple decision trees):

1. Random Forest:
o A collection of decision trees.
o The final prediction is based on the combined results of multiple trees.
2. AdaBoost:
o Uses multiple weak learners to create a strong model.
o A weighted majority voting approach is used to improve overall accuracy.
3. Gradient Boosting:
o A more general form of boosting, of which AdaBoost can be seen as a special case.
o Each new model is fit to the errors of the previous models.

These models are preferred in large-scale applications due to their ability to handle high-dimensional
data.
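A hedged sketch comparing these three ensembles with scikit-learn; the dataset, estimator counts, and cross-validation settings are illustrative assumptions only:

    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                                  RandomForestClassifier)
    from sklearn.model_selection import cross_val_score

    X, y = load_breast_cancer(return_X_y=True)

    models = {
        "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
        "AdaBoost": AdaBoostClassifier(n_estimators=100, random_state=0),
        "Gradient Boosting": GradientBoostingClassifier(n_estimators=100, random_state=0),
    }

    # Compare the ensembles with 5-fold cross-validation accuracy
    for name, model in models.items():
        scores = cross_val_score(model, X, y, cv=5)
        print(name, "mean accuracy:", round(scores.mean(), 3))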

The bias–variance trade-off is a key factor:

o Simple models → Higher bias


o Complex models → Higher variance

Decision Trees for Classification

• A decision tree consists of:


o A root node representing the entire dataset.
o Child nodes formed by splitting data based on feature values.

Splitting Criteria:

• The quality of a split is measured using an impurity function.


• The most common impurity measure is Shannon's Entropy, given by:

H = -\sum_{i=1}^{C} p_i \log_2(p_i)

where:

o p_i is the probability of class i (assuming C classes).


o Probabilities are estimated from the fraction of elements in each class.
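A small computational sketch of this estimate (assuming NumPy; the helper name shannon_entropy is introduced here for illustration and does not come from the notes):

    import numpy as np

    def shannon_entropy(labels):
        # Estimate class probabilities as the fraction of elements in each class
        _, counts = np.unique(labels, return_counts=True)
        p = counts / counts.sum()
        # H = -sum(p_i * log2(p_i)); classes absent from the node contribute nothing
        return float(-np.sum(p * np.log2(p)))

    print(shannon_entropy([0, 0, 1, 1]))   # 1.0  (maximum impurity for two classes)
    print(shannon_entropy([0, 0, 0, 0]))   # 0.0  (perfectly pure node)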

Example 1:

• Consider a binary classification problem (C = 2).


• A dataset with 100 elements is used to compute probabilities.
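For instance, assuming (purely for illustration, since the notes do not give the split) that 40 of the 100 elements belong to class 1 and 60 to class 2: p_1 = 0.4 and p_2 = 0.6, so H = -0.4 log2(0.4) - 0.6 log2(0.6) ≈ 0.971 bits; an even 50/50 split would give the maximum value of 1 bit.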

Key Observations on Entropy:

• Entropy is always non-negative (H ≥ 0).


• When the probability of one class approaches zero (so that, in the binary case, the other approaches one), the entropy approaches zero.
• The minimum entropy value is 0, which occurs when data is perfectly classified.
Notes:

Example:
Show that, for a C-class problem, the maximum entropy is log(C).
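One standard argument (a sketch, not necessarily the derivation intended in the notes) uses Jensen's inequality applied to the concave log function:

H(p) = \sum_{i=1}^{C} p_i \log(1/p_i) \le \log\left(\sum_{i=1}^{C} p_i \cdot (1/p_i)\right) = \log(C),

with equality exactly when p_i = 1/C for every class i; hence the uniform distribution attains the maximum entropy log(C).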
