
Unit-3

A decision tree is a non-parametric supervised learning method used for classification and
regression tasks. It has a hierarchical structure made up of a root node, branches, internal
nodes, and leaf nodes.

Introduction to Decision Trees

• A decision tree (DT) is a common machine learning structure that splits a dataset into
subsets to improve purity and reduce entropy.
• It is a feature-based decision-making model that provides transparency and is easy to
interpret.
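As a rough illustration (not part of the original notes), a minimal scikit-learn sketch of training such a tree might look as follows; the iris dataset and the parameter choices are assumptions made purely for demonstration:

    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    # Load a small example dataset and split it into training and test parts
    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # criterion="entropy" uses Shannon's entropy as the impurity measure for splits
    clf = DecisionTreeClassifier(criterion="entropy", random_state=0)
    clf.fit(X_train, y_train)
    print("Test accuracy:", clf.score(X_test, y_test))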

Properties of Decision Tree (DT) Algorithm:

1. Greedy Nature:
o The feature that produces the purest split is chosen at the root node.
o Feature selection at each child node depends on the split made at its parent
node.
o This greedy, sequential process does not guarantee a globally optimal tree.
2. Computational Cost:
o Feature selection depends on the number of features and pattern complexity.
o Simple tests are required at each node to reduce computational demand.
o Large dimensionality increases computational difficulty.
3. Overfitting Risk:
o Deep decision trees (DTs) tend to overfit training data and perform poorly on
validation/test data.
o Overfitting can be managed through pruning, where deeper subtrees are
trimmed when they do not improve performance on validation data (see the
sketch below).
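A minimal sketch of validation-based pruning, assuming scikit-learn's cost-complexity pruning (the ccp_alpha parameter) as the pruning mechanism and the iris data purely for illustration:

    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)
    X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

    # Candidate pruning strengths computed from the training data
    path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)

    best_alpha, best_score = 0.0, 0.0
    for alpha in path.ccp_alphas:
        # Larger alpha trims more of the deeper subtrees
        tree = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X_train, y_train)
        score = tree.score(X_val, y_val)
        if score > best_score:
            best_alpha, best_score = alpha, score

    print("Best ccp_alpha:", best_alpha, "validation accuracy:", best_score)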

Solutions to Overfitting and Computational Cost:

• These issues are often tackled using combinations of classifiers like:


o AdaBoost
o Random Forest
o Gradient Boosting

Structure of a decision tree:

• Root Node: The initial node at the beginning of a decision tree, where the entire population
or dataset starts dividing based on various features or conditions.
• Decision Nodes: Nodes that result from splitting the root node (or another decision node)
are known as decision nodes. These nodes represent intermediate decisions or conditions
within the tree.
• Leaf Nodes: Nodes where further splitting is not possible, often indicating the final
classification or outcome. Leaf nodes are also referred to as terminal nodes.
• Sub-Tree: Just as a subsection of a graph is called a sub-graph, a sub-section of a decision
tree is referred to as a sub-tree. It represents a specific portion of the decision tree.
• Pruning: The process of removing or cutting down specific nodes in a tree to prevent
overfitting and simplify the model.
• Branch / Sub-Tree: A subsection of the entire tree is referred to as a branch or sub-tree. It
represents a specific path of decisions and outcomes within the tree.
• Parent and Child Node: In a decision tree, a node that is divided into sub-nodes is known as
a parent node, and the sub-nodes emerging from it are referred to as child nodes. The
parent node represents a decision or condition, while the child nodes represent the
potential outcomes or further decisions based on that condition.

Example of a decision tree:

[Figure: sample decision tree (not reproduced in this text version)]
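As a substitute illustration, the following sketch (assuming scikit-learn and the iris data, chosen only for demonstration) prints a small fitted tree as text, showing the root split, the internal (decision) nodes, and the leaf nodes:

    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier, export_text

    data = load_iris()
    clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(data.data, data.target)

    # The first condition printed is the root node's split, indented "|---" lines
    # are internal decision nodes, and "class: ..." lines are leaf nodes
    print(export_text(clf, feature_names=list(data.feature_names)))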
Popular Combinational ML Models (multiple decision trees):

1. Random Forest:
o A collection of decision trees.
o The final prediction is based on the combined results of multiple trees.
2. AdaBoost:
o Uses multiple weak learners to create a strong model.
o A weighted majority voting approach is used to improve overall accuracy.
3. Gradient Boosting:
o A more general form of boosting, of which AdaBoost can be seen as a special case.
o Each new model is fit to the errors of the previous models.

These models are preferred in large-scale applications due to their ability to handle high-dimensional
data.
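A hedged sketch comparing these three ensembles with scikit-learn; the dataset, estimator counts, and cross-validation settings are illustrative assumptions only:

    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                                  RandomForestClassifier)
    from sklearn.model_selection import cross_val_score

    X, y = load_breast_cancer(return_X_y=True)

    models = {
        "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
        "AdaBoost": AdaBoostClassifier(n_estimators=100, random_state=0),
        "Gradient Boosting": GradientBoostingClassifier(n_estimators=100, random_state=0),
    }

    # Compare the ensembles with 5-fold cross-validation accuracy
    for name, model in models.items():
        scores = cross_val_score(model, X, y, cv=5)
        print(name, "mean accuracy:", round(scores.mean(), 3))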

The bias–variance trade-off is a key factor:

o Simple models → Higher bias


o Complex models → Higher variance

Decision Trees for Classification

• A decision tree consists of:


o A root node representing the entire dataset.
o Child nodes formed by splitting data based on feature values.

Splitting Criteria:

• The quality of a split is measured using an impurity function.


• The most common impurity measure is Shannon's Entropy, given by:

H = -\sum_{i=1}^{C} p_i \log_2(p_i)

where:

o p_i is the probability of class i (assuming C classes).


o Probabilities are estimated from the fraction of elements in each class.
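A small computational sketch of this estimate (assuming NumPy; the helper name shannon_entropy is introduced here for illustration and does not come from the notes):

    import numpy as np

    def shannon_entropy(labels):
        # Estimate class probabilities as the fraction of elements in each class
        _, counts = np.unique(labels, return_counts=True)
        p = counts / counts.sum()
        # H = -sum(p_i * log2(p_i)); classes absent from the node contribute nothing
        return float(-np.sum(p * np.log2(p)))

    print(shannon_entropy([0, 0, 1, 1]))   # 1.0  (maximum impurity for two classes)
    print(shannon_entropy([0, 0, 0, 0]))   # 0.0  (perfectly pure node)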

Example 1:

• Consider a binary classification problem (C = 2).


• A dataset with 100 elements is used to compute probabilities.
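For instance, assuming (purely for illustration, since the notes do not give the split) that 40 of the 100 elements belong to class 1 and 60 to class 2: p_1 = 0.4 and p_2 = 0.6, so H = -0.4 log2(0.4) - 0.6 log2(0.6) ≈ 0.971 bits; an even 50/50 split would give the maximum value of 1 bit.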

Key Observations on Entropy:

• Entropy is always non-negative (H ≥ 0).


• When the probability of one class approaches zero (so that, in the binary case, the other approaches one), the entropy approaches zero.
• The minimum entropy value is 0, which occurs when data is perfectly classified.
Notes:

Example:
Show that, for a C-class problem, the maximum entropy is log(C).
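One standard argument (a sketch, not necessarily the derivation intended in the notes) uses Jensen's inequality applied to the concave log function:

H(p) = \sum_{i=1}^{C} p_i \log(1/p_i) \le \log\left(\sum_{i=1}^{C} p_i \cdot (1/p_i)\right) = \log(C),

with equality exactly when p_i = 1/C for every class i; hence the uniform distribution attains the maximum entropy log(C).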
