Unit 4: Learning
1 Supervised Learning
Definition: The model learns from labeled data, i.e., input-output pairs in which both the input features and the corresponding outputs are known. The goal is to predict the output for new, unseen data based on the patterns learned from the training set.
Examples:
Classification: Predicting discrete labels (e.g., email spam vs. not spam).
Evaluation Metrics:
Precision & Recall: Precision measures how many of the predicted positives are correct; recall measures how many of the actual positives are found. Both are especially important on imbalanced datasets.
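A minimal Python sketch of these metrics, assuming scikit-learn is available; the imbalanced dataset below is synthetic and purely illustrative, standing in for spam vs. not spam:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced binary dataset (about 10% positives)
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = model.predict(X_test)

# Precision: of the items flagged positive, how many truly are?
# Recall: of the actual positives, how many were caught?
print("precision:", precision_score(y_test, y_pred))
print("recall:   ", recall_score(y_test, y_pred))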
2 Unsupervised Learning
Definition: The model learns from unlabeled data. The goal is to uncover hidden patterns or structures without
predefined outputs.
Examples:
Dimensionality Reduction: Reducing the number of features while maintaining essential information (e.g., PCA).
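A minimal sketch of PCA with scikit-learn; the random data below is purely illustrative:

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))           # unlabeled data with 4 features

pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)        # project onto the top 2 components

print(X_reduced.shape)                  # (100, 2)
print(pca.explained_variance_ratio_)    # variance retained per component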
3 Semi-supervised Learning
Definition: Uses both labeled and unlabeled data. It is particularly useful when labeling data is costly or time-
consuming.
Example: Labeling a small portion of a dataset and using a large portion of unlabeled data to train a model.
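One common approach is self-training, sketched below with scikit-learn's SelfTrainingClassifier; the synthetic data and the fraction of hidden labels are illustrative assumptions:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

X, y = make_classification(n_samples=500, random_state=0)

# Hide ~90% of the labels; scikit-learn marks unlabeled points with -1
y_partial = y.copy()
rng = np.random.default_rng(0)
y_partial[rng.random(len(y)) < 0.9] = -1

# The base classifier is retrained as confident pseudo-labels are added
model = SelfTrainingClassifier(LogisticRegression(max_iter=1000))
model.fit(X, y_partial)                 # learns from labeled + unlabeled data
print(model.score(X, y))                # accuracy against the true labels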
4 Reinforcement Learning
Definition: Learning through trial and error. The model (agent) interacts with the environment and receives feedback
in the form of rewards or penalties, which helps improve future decisions.
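A minimal sketch of tabular Q-learning, one classic reinforcement learning algorithm, on a hypothetical five-state corridor; the environment, reward, and hyperparameters are illustrative assumptions, not from the text:

import numpy as np

# Hypothetical corridor: states 0..4, reward only on reaching state 4
n_states, n_actions = 5, 2                  # actions: 0 = left, 1 = right
Q = np.zeros((n_states, n_actions))         # action-value table
alpha, gamma, epsilon = 0.1, 0.9, 0.1       # learning rate, discount, exploration
rng = np.random.default_rng(0)

for episode in range(300):
    s = 0
    while s != n_states - 1:
        if rng.random() < epsilon:
            a = int(rng.integers(n_actions))                         # explore
        else:
            a = int(rng.choice(np.flatnonzero(Q[s] == Q[s].max())))  # greedy
        s_next = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
        r = 1.0 if s_next == n_states - 1 else 0.0                   # reward at goal
        # Q-learning update: nudge Q(s, a) toward r + gamma * max_a' Q(s', a')
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next

print(Q.round(2))    # "go right" should dominate in every state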
Linear Regression
Definition: Used for regression tasks; models the output as a linear function of the input features.
Equation:
y = wX + b
Goal: Find weights w (and bias b) that minimize the error, typically using the Mean Squared Error (MSE) loss.
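A minimal sketch of fitting w and b by minimizing MSE, here via NumPy's closed-form least-squares solver; the toy data is illustrative:

import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(50, 1))                        # one input feature
y = 3.0 * X[:, 0] + 2.0 + rng.normal(scale=0.5, size=50)    # true w = 3, b = 2

# Append a column of ones so the bias b is learned as an extra weight
X_aug = np.hstack([X, np.ones((50, 1))])

# Closed-form least squares minimizes the MSE between X_aug @ params and y
params, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
w, b = params
print(f"w = {w:.2f}, b = {b:.2f}")                          # close to 3 and 2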
Logistic Regression
Definition: Used for binary classification tasks. Outputs probabilities between 0 and 1 using the sigmoid function.
Equation:
p = 1 / (1 + exp(-z)) where z = wX + b.
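A minimal sketch of the sigmoid turning scores z = wX + b into probabilities; the weights and inputs below are illustrative:

import numpy as np

def sigmoid(z):
    # Maps any real score z to a probability in (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

w = np.array([1.5, -2.0])                 # illustrative weights
b = 0.5                                   # illustrative bias
X = np.array([[1.0, 0.2], [0.1, 1.0]])    # two example inputs

z = X @ w + b                             # linear score per input
p = sigmoid(z)                            # P(class = 1) per input
print(p)                                  # probabilities in (0, 1)
print((p >= 0.5).astype(int))             # thresholded class labels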
Key Differences:
Linear regression predicts a continuous value and is fit by minimizing MSE; logistic regression predicts a probability (thresholded into a class label) and is fit by minimizing cross-entropy (log) loss.
Applications:
Price and demand forecasting for linear regression; spam detection, disease diagnosis, and credit scoring for logistic regression.
Decision Trees
Overview:
A Decision Tree is a supervised learning algorithm that is used for both classification and regression tasks. It models
decisions and their possible consequences, including chance event outcomes, resource costs, and utility.
Structure:
• Nodes: Each internal node represents a test on an attribute (feature).
• Edges: Each branch represents the outcome of the test.
• Leaves: Terminal nodes represent class labels or continuous output values.
How it Works:
1. Splitting: The data is split into subsets based on feature values. The goal is to create subsets that are as pure as possible, i.e., the data in each subset is as homogeneous as possible.
2. Choosing the Best Split: Common criteria for choosing the feature to split on are:
o Gini Index: Measures the "impurity" of a node; the goal is to minimize it (see the sketch after this list).
o Information Gain: Measures how effectively a feature separates the classes; the feature with the highest information gain is chosen for splitting.
3. Tree Depth: A deeper tree can fit the training data more closely but is also more prone to overfitting.
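A minimal sketch of the Gini impurity computation referenced above; the labels and the split are illustrative:

import numpy as np

def gini(labels):
    # Gini = 1 - sum(p_k^2) over class proportions p_k; 0 means a pure node
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

y_left = np.array([0, 0, 0, 1])     # mostly class 0
y_right = np.array([1, 1, 1])       # pure class 1

# Weighted Gini of the split; the tree prefers splits that minimize this
n = len(y_left) + len(y_right)
split = (len(y_left) / n) * gini(y_left) + (len(y_right) / n) * gini(y_right)
print(f"left={gini(y_left):.3f} right={gini(y_right):.3f} split={split:.3f}")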
Support Vector Machines (SVM)
1. Linear SVM:
o Definition: Used when the data is linearly separable (i.e., a straight line or hyperplane can separate
the data into classes).
o Working: The SVM algorithm finds the hyperplane that best divides the classes by maximizing the
margin between them.
o Applications: Email spam filtering, sentiment analysis.
2. Non-Linear SVM:
o Definition: Used when the data is not linearly separable. The Kernel Trick is applied to map data
into a higher-dimensional space where it becomes linearly separable.
3. Common kernels include:
o Linear Kernel: For linearly separable data.
o Polynomial Kernel: For more complex decision boundaries.
o Radial Basis Function (RBF) Kernel: For highly complex and non-linear boundaries.
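A minimal sketch contrasting the linear and RBF kernels on data that is not linearly separable, using scikit-learn's SVC; the concentric-circles dataset is synthetic and illustrative:

from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric rings: no straight line can separate the classes
X, y = make_circles(n_samples=300, noise=0.05, factor=0.5, random_state=0)

linear_svm = SVC(kernel="linear").fit(X, y)
rbf_svm = SVC(kernel="rbf").fit(X, y)   # kernel trick: implicit higher-dim map

print("linear kernel accuracy:", linear_svm.score(X, y))  # poor fit
print("RBF kernel accuracy:   ", rbf_svm.score(X, y))     # near-perfect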
Advantages:
• Effective in High Dimensions: SVM works well in spaces with many features (e.g., text classification).
• Robust to Overfitting: Especially in high-dimensional spaces.
• Flexible: Can be adapted for both classification and regression (SVR - Support Vector Regression).
Disadvantages:
• Computationally Expensive: Especially for large datasets, training an SVM can be time-consuming.
• Memory Intensive: Requires large memory for storing support vectors.
• Sensitivity to Parameters: Performance is sensitive to the choice of the kernel, regularization parameter,
and other hyperparameters.
Applications:
• Text classification (e.g., spam filtering).
• Image classification (e.g., face detection).
• Bioinformatics (e.g., protein structure classification).
Neural Networks
How it Works:
1. Training: The network is trained on a dataset using a loss function (e.g., Mean Squared Error for regression, Cross-Entropy for classification). The goal is to minimize the error between the predicted outputs and the true labels.
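A minimal sketch of training a small feed-forward network for classification, using scikit-learn's MLPClassifier (which minimizes cross-entropy internally via backpropagation); the dataset and layer size are illustrative:

from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Two interleaving half-moons: a non-linear decision boundary is required
X, y = make_moons(n_samples=500, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# One hidden layer of 16 units; the non-linear activation lets the
# network model the curved boundary between the two classes
net = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
net.fit(X_train, y_train)
print("test accuracy:", net.score(X_test, y_test))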
Advantages:
• Adaptability: Can be used for a variety of tasks, from classification to regression to complex tasks like
natural language processing.
• Non-linearity: Neural networks can model complex, non-linear relationships.
• Scalability: Can scale to very large datasets, particularly with deep networks.
Disadvantages:
• Require Large Datasets: Neural networks require a lot of data to perform well.
• Computationally Intensive: Training large networks requires significant computational resources (e.g.,
GPUs).
• Limited Interpretability: Neural networks are often described as "black-box" models because it can be difficult to understand why a network makes a particular decision.
Applications:
• Image Recognition: Convolutional Neural Networks (CNNs) are the standard architecture for image recognition tasks.
• Speech Recognition: Used in virtual assistants like Siri and Alexa.
• Natural Language Processing (NLP): Used in tasks like translation, sentiment analysis, and text
generation.