
Assignment 6: Practical Implementation of Decision Tree

What is a Decision Tree?


A Decision Tree is a supervised learning algorithm used for both classification and regression tasks. It is a tree-like structure where:
- Internal nodes represent decisions or tests on features (attributes).
- Branches represent the outcomes of those tests (true/false or different feature values).
- Leaf nodes represent the final prediction or class label.
Each path from the root to a leaf represents a classification decision or regression prediction based on the features of the input data.
How Does a Decision Tree Work?
- The tree is built by recursively splitting the dataset into subsets based on feature values, aiming to increase the homogeneity of the resulting subsets.
- For classification, homogeneity means that the data points in a subset belong to the same class.
- For regression, it means minimizing the variance within each subset.
At each step the algorithm searches for the feature and threshold that give the best split, typically scored with criteria such as Gini Impurity, Entropy, or Variance Reduction, as illustrated in the sketch below.
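A minimal sketch of this search, assuming NumPy and a single numeric feature; the toy data and threshold strategy are illustrative, not part of the assignment:

import numpy as np

def gini(labels):
    # Gini impurity: 1 minus the sum of squared class proportions.
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_threshold(feature, labels):
    # Try every observed value as a threshold and keep the one whose two
    # subsets have the lowest weighted Gini impurity.
    best_t, best_score = None, np.inf
    for t in np.unique(feature):
        left, right = labels[feature <= t], labels[feature > t]
        if len(left) == 0 or len(right) == 0:
            continue
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score

# Toy data: daily screen-on time (hours) and a binary behaviour class.
x = np.array([1.0, 2.0, 3.0, 6.0, 7.0, 8.0])
y = np.array([0, 0, 0, 1, 1, 1])
print(best_threshold(x, y))   # the clean split at 3.0 gives weighted impurity 0.0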
Key Concepts:
1. Root Node: The starting point of the tree, where the first decision is made based on a feature.
2. Splitting: Dividing a node into sub-nodes based on a feature. The goal is to find the best split.
3. Decision Node: A node that splits further into more sub-nodes.
4. Leaf Node: The end node that holds the final output (a class label in classification or a value in regression).
5. Pruning: Reducing the size of the tree to prevent overfitting by removing branches of little importance.
6. Impurity Measures (worked definitions below):
   - Gini Impurity: Measures how often a randomly chosen element would be incorrectly classified.
   - Entropy: Measures the uncertainty of a dataset; the higher the entropy, the more mixed the dataset is in terms of classes.
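For reference, with p_i the proportion of class i in a node, the standard definitions are Gini = 1 - Σ p_i² and Entropy = -Σ p_i log₂(p_i). A quick numeric check, sketched with NumPy:

import numpy as np

def entropy(labels):
    # Shannon entropy (in bits) of a set of class labels.
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

print(entropy(np.array([0, 0, 1, 1])))   # 50/50 split -> 1.0 bit (most mixed for two classes)
print(entropy(np.array([0, 0, 0, 1])))   # 75/25 split -> about 0.81 bits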

Advantages of Decision Trees:


- Easy to Interpret: Decision trees are easy to visualize and explain, even to non-technical stakeholders.
- Non-linear Relationships: They can capture non-linear relationships between features and the target variable.
- Handles Categorical Data: They naturally support both numerical and categorical features.
- No Need for Feature Scaling: Decision trees do not require normalization or standardization of features.
Disadvantages of Decision Trees:
- Overfitting: Decision trees can easily overfit the training data, especially if they grow too deep.
- Bias towards Dominant Features: Splitting criteria may favour features with many levels or many distinct numeric values.
- Instability: Small changes in the data can lead to significantly different tree structures.

General Steps to Build a Decision Tree:


Step 1: Collect and Prepare the Data
- Gather Data: Assemble the dataset with input features (e.g., App Usage, Screen On Time, Battery Drain) and a target label (e.g., User Behavior Class).
- Clean Data: Handle missing values and ensure that all categorical features are encoded numerically (e.g., convert Gender into 0 and 1 for male and female, respectively), as in the sketch below.
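A minimal preparation sketch, assuming pandas is available; the file name and column values are hypothetical placeholders, not part of the assignment dataset:

import pandas as pd

# Hypothetical file name and column values, used only for illustration.
df = pd.read_csv("user_behavior.csv")

# One simple strategy for missing values: drop incomplete rows.
df = df.dropna()

# Encode the categorical Gender column as 0/1.
df["Gender"] = df["Gender"].map({"Male": 0, "Female": 1})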
Step 2: Choose Features and Label
- Select Features: Decide which features to use for prediction (e.g., App Usage, Age, Battery Drain).
- Select Target Label: The class you want to predict (e.g., User Behavior Class), as shown below.
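Continuing the same sketch (column names remain hypothetical):

# Feature matrix X and target vector y.
feature_cols = ["App Usage", "Screen On Time", "Battery Drain", "Age", "Gender"]
X = df[feature_cols]
y = df["User Behavior Class"]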
Step 3: Split the Data into Training and Testing Sets
- Training Set: Used to train the decision tree model (typically 70-80% of the data).
- Testing Set: Used to evaluate the model's performance (the remaining 20-30%); see the split below.
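With scikit-learn, one way to make an 80/20 split:

from sklearn.model_selection import train_test_split

# Hold out 20% of the data for testing; the fixed seed keeps the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)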
Step 4: Train the Decision Tree
- Tree Construction: The decision tree algorithm splits the training data by recursively choosing the features and thresholds that give the best classification or prediction (training code is sketched below).
  - Criterion for Splitting:
    - Gini Impurity: Measures how often a randomly chosen data point would be misclassified.
    - Entropy: Measures the uncertainty in a node; the information gain of a split is the reduction in entropy it achieves.
  - Stopping Criteria: The splitting stops when:
    - A certain tree depth is reached.
    - The subsets are pure (i.e., all data points belong to the same class).
    - Further splitting doesn't add significant improvement.
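Training with scikit-learn's DecisionTreeClassifier; the criterion and depth used here are illustrative choices, not required values:

from sklearn.tree import DecisionTreeClassifier

# criterion can be "gini" or "entropy"; max_depth is one simple stopping rule.
clf = DecisionTreeClassifier(criterion="gini", max_depth=5, random_state=42)
clf.fit(X_train, y_train)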
Step 5: Pruning (Optional)
- Prune the Tree: After the initial tree is constructed, pruning removes branches that do not contribute to accuracy, thus avoiding overfitting (both styles are sketched below).
  - Pre-pruning: Limit the maximum depth of the tree or the minimum number of samples required to split a node.
  - Post-pruning: Remove branches from a fully grown tree by evaluating performance on a validation set.
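A sketch of both styles with scikit-learn. Pre-pruning uses constructor limits; post-pruning here uses cost-complexity pruning (ccp_alpha), and choosing alpha on the held-out test set is an assumption of this example (a separate validation set is the cleaner choice):

from sklearn.tree import DecisionTreeClassifier

# Pre-pruning: cap the depth and require a minimum number of samples per split.
pre_pruned = DecisionTreeClassifier(max_depth=4, min_samples_split=10, random_state=42)
pre_pruned.fit(X_train, y_train)

# Post-pruning: compute the cost-complexity path, then keep the alpha
# that scores best on the held-out data.
path = DecisionTreeClassifier(random_state=42).cost_complexity_pruning_path(X_train, y_train)
best_alpha, best_acc = 0.0, 0.0
for alpha in path.ccp_alphas:
    tree = DecisionTreeClassifier(ccp_alpha=alpha, random_state=42).fit(X_train, y_train)
    acc = tree.score(X_test, y_test)
    if acc > best_acc:
        best_alpha, best_acc = alpha, acc
post_pruned = DecisionTreeClassifier(ccp_alpha=best_alpha, random_state=42).fit(X_train, y_train)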
Step 6: Test and Evaluate the Model
- Predict: Use the trained model to classify or predict the outcomes for the test set.
- Evaluate: Measure performance using metrics such as (see the snippet below):
  - Accuracy: The proportion of correctly predicted instances.
  - Confusion Matrix: Shows true positives, false positives, true negatives, and false negatives.
  - Precision, Recall, F1-Score: Especially useful for imbalanced datasets.
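Evaluating the fitted classifier with scikit-learn's standard metric helpers:

from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

y_pred = clf.predict(X_test)

print("Accuracy:", accuracy_score(y_test, y_pred))
print("Confusion matrix:\n", confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))   # precision, recall and F1 per class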
Step 7: Visualize the Decision Tree
- Most tools (such as Python’s scikit-learn or Excel add-ons) can draw the tree structure. Visualizing the tree makes it easy to interpret the decision paths and to see which features were important; one option is shown below.
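One way to draw the fitted tree with scikit-learn and Matplotlib (the feature names come from the hypothetical columns used earlier):

import matplotlib.pyplot as plt
from sklearn.tree import plot_tree

plt.figure(figsize=(14, 8))
plot_tree(clf, feature_names=feature_cols,
          class_names=[str(c) for c in clf.classes_],
          filled=True, rounded=True)
plt.show()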
Step 8: Use the Model for Predictions
- After the model has been evaluated, use it to classify or predict outcomes on new, unseen data, as in the final snippet below.
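Predicting for a single new user; the feature values below are made up for illustration:

import pandas as pd

# One hypothetical new observation, with the same columns (and order) as training.
new_user = pd.DataFrame([{
    "App Usage": 250, "Screen On Time": 4.5, "Battery Drain": 1200,
    "Age": 27, "Gender": 0,
}])[feature_cols]
print("Predicted class:", clf.predict(new_user)[0])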
