
Multiple Regression

Linear regression using multiple inputs, often referred to as multiple linear regression, is a statistical method used to model the relationship between a dependent variable and two or more independent variables. It is an extension of simple linear regression, which deals with only one independent variable.
**Data Collection: Gather data for the dependent variable and multiple independent variables, ensuring the data is suitable for regression (e.g., no severe multicollinearity).
**Model Fitting: Use the least squares method to fit the regression model, minimizing the difference between observed and predicted values.
**Coefficient Estimation: Calculate coefficients that show how much the dependent variable changes with a one-unit change in each independent variable, assuming the others remain constant.
**Statistical Significance: Test the significance of each coefficient with t-tests and the overall model with an F-test to determine whether the relationships are statistically meaningful.
**Model Evaluation: 1. R-squared: measures the proportion of variance in the dependent variable explained by the independent variables. 2. Residual Analysis: check whether the errors are randomly distributed. 3. Multicollinearity: ensure the independent variables aren't too highly correlated.
**Prediction: Use the validated model to predict the dependent variable for new data (a minimal fitting sketch follows below).
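As a rough illustration of this workflow, the sketch below fits a multiple linear regression with scikit-learn (assuming it is installed); the feature matrix and target values are invented for illustration.

import numpy as np
from sklearn.linear_model import LinearRegression

# Invented data: two independent variables (columns) and one dependent variable.
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]])
y = np.array([3.1, 3.9, 7.2, 8.1, 10.0])

model = LinearRegression().fit(X, y)   # least squares fit
print(model.intercept_, model.coef_)   # beta_0 and the coefficients beta_1, beta_2
print(model.score(X, y))               # R-squared on the training data
print(model.predict([[6.0, 4.0]]))     # prediction for a new observation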
Linear regression with multiple outputs is a variation of traditional
linear regression that is used when you want to predict multiple
dependent variables simultaneously, rather than just a single
dependent variable. This technique is also known as multivariate
linear regression. In standard linear regression, you have a single dependent variable (Y) and one or more independent variables (X), and the aim is to find the linear relationship that best describes how they are related. The general equation for simple linear regression is: Y = β0 + β1X + ε
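For the multiple-output case, here is a minimal sketch, assuming scikit-learn; LinearRegression accepts a two-dimensional target and fits one set of coefficients per output column (the data are made up).

import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1.0], [2.0], [3.0], [4.0]])   # one input feature
Y = np.array([[2.1, 0.9],                    # two dependent variables per row
              [4.2, 2.1],
              [5.8, 2.9],
              [8.1, 4.2]])

model = LinearRegression().fit(X, Y)
print(model.coef_)             # one coefficient row per output
print(model.predict([[5.0]]))  # predicts both outputs at once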
Qualitative vs. Quantitative Attributes
1. Qualitative Attributes:#Definition: Qualitative attributes (also called
categorical attributes) are non-numeric and describe qualities or
characteristics of an entity.//Examples:*Color of a car (Red,
Blue)*Gender (Male, Female).//Usage: Used for classification tasks;
converted to numeric form using encoding techniques like One-Hot or
Label Encoding.2. Quantitative Attributes:#Definition: Quantitative
attributes are numeric and describe measurable quantities or
amounts.//Examples:*Height (170 cm)*Price ($19.99).//Usage: Used
directly in calculations and regression tasks; may require normalization.
When to Use:#Qualitative: When dealing with categories (e.g.,
predicting customer preferences).#Quantitative: When measuring
quantities (e.g., predicting prices based on numeric features).
Example Scenario: Analyzing retail customer data:*Qualitative: Gender,
Payment Method for understanding categories.*Quantitative: Age,
Purchase Amount for predicting spending patterns.
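As a small sketch of handling both attribute types, assuming pandas, the hypothetical columns below mirror the retail example: the qualitative columns are one-hot encoded while the quantitative columns pass through as numbers.

import pandas as pd

df = pd.DataFrame({
    "Gender": ["Male", "Female", "Female", "Male"],      # qualitative
    "PaymentMethod": ["Card", "Cash", "Card", "Card"],   # qualitative
    "Age": [25, 34, 29, 41],                             # quantitative
    "PurchaseAmount": [19.99, 54.10, 23.50, 75.00],      # quantitative
})

# One-hot encode the qualitative columns; quantitative columns are left unchanged.
encoded = pd.get_dummies(df, columns=["Gender", "PaymentMethod"])
print(encoded.head())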
Method of Least Squares
1. Purpose://The method of least squares is used to find the best-
fitting line (or model) by minimizing the residual sum of squares
(RSS), which represents the difference between observed and
predicted values. 2. Residual Sum of Squares (RSS)://RSS Formula:
RSS = ∑(yᵢ − ŷᵢ)², where the yᵢ are the observed values and the ŷᵢ are the predicted values. 3. Choosing Coefficients (β)://The coefficients β0, β1, …, βn are chosen by minimizing the RSS.//Minimization Process:**Calculate the partial derivatives of the RSS with respect to each β.**Set these derivatives to zero and solve the resulting equations to find the values of β that minimize the RSS. 4. Outcome:*The resulting coefficients β provide the line or model that best fits the data by minimizing the overall prediction error.
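The minimization described above has a closed form. A small numpy sketch (with invented data) solves the normal equations XᵀXβ = Xᵀy, which is what setting the partial derivatives of the RSS to zero produces, and then reports the RSS.

import numpy as np

# Invented data: a column of ones for the intercept plus one predictor.
X = np.column_stack([np.ones(5), np.array([1.0, 2.0, 3.0, 4.0, 5.0])])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.9])

# Solve the normal equations X^T X beta = X^T y.
beta = np.linalg.solve(X.T @ X, X.T @ y)

residuals = y - X @ beta
rss = np.sum(residuals ** 2)   # residual sum of squares at the minimizing beta
print(beta, rss)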
Linear Discriminant Analysis (LDA) is a statistical and machine learning
technique used for dimensionality reduction and classification. It is primarily
employed in the field of pattern recognition and machine learning for tasks like
face recognition, image classification, and data compression. LDA differs from
other dimensionality reduction techniques, such as Principal Component
Analysis (PCA), because it takes into account the class labels of the data points.
● Data Preprocessing: LDA begins with a labeled dataset, where each data
point is associated with a class label. This dataset is used for both
dimensionality reduction and classification tasks. ● Calculate Class Means: For
each class in the dataset, LDA calculates the mean (average) of the feature
vectors belonging to that class. This results in as many class means as there are
classes in the data. ● Calculate Scatter Matrices: LDA then computes two
scatter matrices: ● Within-class scatter matrix (Sw): This measures the variance
of data points within each class. It is calculated by summing up the covariance
matrices of individual classes. ● Between-class scatter matrix (Sb): This
measures the variance between class means. It is calculated by finding the
covariance between the class means and then scaling it by the number of data
points in each class. ● Eigenvalue Decomposition: The next step involves
calculating the eigenvectors and eigenvalues of the matrix Sw^-1 * Sb. These
eigenvectors represent the directions in the feature space along which the
classes are best separated. ● Selecting Discriminant Vectors: The eigenvectors
with the highest eigenvalues are selected as the discriminant vectors. These
vectors capture the most important information for class discrimination. ●
Projecting Data: To reduce the dimensionality of the data, you can project the
original data onto the discriminant vectors. The number of discriminant vectors
chosen typically depends on the desired dimensionality reduction. ●
Classification: LDA can also be used for classification tasks. After reducing the
dimensionality of the data using the discriminant vectors, you can apply a
classifier (e.g., linear discriminant analysis, logistic regression) to classify new
data points. Usage in Predictions:*Training: Fit the LDA model on labeled training data, learning the means, covariance matrix, and prior probabilities of each class.*Prediction: For new data points, LDA calculates the probability of belonging to each class and assigns the point to the class with the highest probability.
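A minimal usage sketch, assuming scikit-learn and an invented two-class toy dataset; LinearDiscriminantAnalysis computes the scatter matrices internally and can both project (reduce dimensionality) and classify.

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Invented two-class data with three features.
X = np.array([[1.0, 2.0, 0.5], [1.2, 1.9, 0.6], [0.9, 2.2, 0.4],
              [3.0, 0.5, 2.0], [3.2, 0.4, 2.1], [2.9, 0.6, 1.9]])
y = np.array([0, 0, 0, 1, 1, 1])

lda = LinearDiscriminantAnalysis(n_components=1)
X_reduced = lda.fit_transform(X, y)          # project onto the discriminant direction
print(X_reduced.ravel())
print(lda.predict([[1.1, 2.0, 0.5]]))        # class with the highest probability
print(lda.predict_proba([[1.1, 2.0, 0.5]]))  # per-class probabilities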
Perceptron is a basic linear classification method suitable for simple
tasks with linearly separable data, while logistic regression offers
more flexibility and sophistication, making it a common choice for
various classification problems. More advanced neural networks are
typically preferred for handling complex, nonlinear classification
tasks. Perceptron learning algorithm is a linear classification method,
and it's important to understand it in the context of other linear
classification methods. Linear classification methods are used for
separating data points into different classes using linear decision
boundaries, such as lines or hyperplanes.

This step function, or activation function, is vital in ensuring that the output is mapped to (0,1) or (-1,1). Take note that the weight of an input indicates that node's strength; similarly, an input's bias value gives the ability to shift the activation function curve.
Context in Predictive Algorithms:**The Perceptron algorithm is used in situations where the data is linearly separable. It is a foundation for more complex neural networks and a precursor to other linear classifiers such as Support Vector Machines (SVM).**In predictive modeling, it serves as a simple, interpretable model for binary classification but is limited to linearly separable data (a minimal update-rule sketch follows below).
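A bare-bones sketch of the perceptron update rule on invented, linearly separable data; misclassified points nudge the weights toward the correct side of the boundary.

import numpy as np

# Invented, linearly separable data with labels in {-1, +1}.
X = np.array([[2.0, 1.0], [1.5, 2.0], [3.0, 3.0],
              [-1.0, -2.0], [-2.0, -1.5], [-3.0, -1.0]])
y = np.array([1, 1, 1, -1, -1, -1])

w = np.zeros(X.shape[1])   # weights
b = 0.0                    # bias
lr = 0.1                   # learning rate

for _ in range(20):                           # a few passes over the data
    for xi, yi in zip(X, y):
        if yi * (np.dot(w, xi) + b) <= 0:     # misclassified (or on the boundary)
            w += lr * yi * xi                 # move the boundary toward the point's side
            b += lr * yi

print(w, b)
print(np.sign(X @ w + b))   # should match y once the data are separated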
Ridge regression shrinks the regression coefficients by imposing a penalty on their size; the ridge coefficients minimize a penalized residual sum of squares. Ridge regression extends linear regression by adding a regularization term to the OLS cost function. **Purpose: Addresses multicollinearity and stabilizes coefficient estimates by adding a penalty term to the regression model.**Penalty Term: Adds the square of the magnitude of the coefficients (L2 regularization).**Formula: Minimize(∑(y − ŷ)² + λ∑βᵢ²).**Effect: Shrinks coefficients towards zero but does not set any coefficient exactly to zero, keeping all features in the model.**Example: Predicting house prices with features such as size, location, and number of bedrooms. Ridge regression will reduce the influence of correlated features like size and location while keeping all features in the model, mitigating the effects of multicollinearity (a small sketch follows below).
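Here is a small sketch with scikit-learn's Ridge on invented, deliberately correlated features; alpha plays the role of λ, and the fitted coefficients shrink but none become exactly zero.

import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
size = rng.normal(100, 20, 50)
location_score = 0.9 * size + rng.normal(0, 2, 50)   # correlated with size on purpose
bedrooms = rng.integers(1, 5, 50)
price = 2.0 * size + 1.5 * location_score + 10 * bedrooms + rng.normal(0, 5, 50)

X = np.column_stack([size, location_score, bedrooms])
ridge = Ridge(alpha=10.0).fit(X, price)   # alpha is the L2 penalty weight (lambda)
print(ridge.coef_)                        # shrunken, but none exactly zero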
Lasso regression, short for "Least Absolute Shrinkage and Selection Operator" regression, is a linear regression technique used for feature selection and regularization in statistical modeling and machine learning.**Purpose: Performs both regularization and variable selection by adding a penalty term that can shrink some coefficients to zero.**Penalty Term: Adds the absolute value of the magnitude of the coefficients (L1 regularization).**Formula: Minimize(∑(y − ŷ)² + λ∑|βᵢ|).**Effect: Shrinks some coefficients to exactly
zero, effectively excluding less important features from the model.**
Example: Predicting house prices with many features, including less
relevant ones like the number of fireplaces. Lasso regression might set the
coefficients of less important features (e.g., number of fireplaces) to zero,
simplifying the model and improving interpretability.
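The analogous sketch with scikit-learn's Lasso, adding a barely relevant feature (fireplaces) to invented housing data; with a sufficiently large alpha its coefficient is driven to exactly zero (the data and alpha here are purely illustrative).

import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
size = rng.normal(100, 20, 50)
bedrooms = rng.integers(1, 5, 50)
fireplaces = rng.integers(0, 3, 50)      # irrelevant feature: not used to build price
price = 2.0 * size + 10 * bedrooms + rng.normal(0, 5, 50)

X = np.column_stack([size, bedrooms, fireplaces])
lasso = Lasso(alpha=1.0).fit(X, price)   # alpha is the L1 penalty weight (lambda)
print(lasso.coef_)                       # the fireplaces coefficient is typically shrunk to 0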
Lasso regression is particularly useful when dealing with datasets that have a large
number of features, as it helps prevent overfitting and simplifies the
model by automatically selecting a subset of the most important features.
A Generalised Additive Model (GAM) is an extension of the multiple linear model, which, recall, is
Y = β0 + β1x1 + β2x2 + ⋯ + βpxp + ε.
In order to allow for non-linear effects, a GAM replaces each linear component βjxj with a smooth non-linear function fj(xj), giving
Y = β0 + f1(x1) + f2(x2) + ⋯ + fp(xp) + ε.
This is called an additive model because we estimate each fj(xj) for j = 1, 2, …, p and then add together all of these individual contributions.
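A brief fitting sketch, assuming the third-party pygam package is available; s(i) declares a smooth term for feature column i, and the non-linear relationship is invented for illustration.

import numpy as np
from pygam import LinearGAM, s   # assumes pygam is installed

rng = np.random.default_rng(2)
X = rng.uniform(0, 10, size=(200, 2))
# Invented non-linear relationship: a sine effect plus a quadratic effect.
y = np.sin(X[:, 0]) + 0.1 * X[:, 1] ** 2 + rng.normal(0, 0.2, 200)

gam = LinearGAM(s(0) + s(1)).fit(X, y)   # one smooth function per predictor, added together
gam.summary()                            # prints smoothing details for each term
print(gam.predict(X[:5]))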
1. Generalized Additive Models (GAMs):**Description: GAMs extend generalized linear models by allowing non-linear relationships between predictors and the response variable using smooth functions.**Form: g(E(Y)) = β0 + f1(X1) + f2(X2) + ⋯ + fn(Xn), where g is a link function, β0 is the intercept, and the fi are smooth functions of the predictors.**Use Case: Useful when the relationship between predictors and the response is not strictly linear but can be modeled with smooth functions.
2. Additive Models (AMs):**Description: AMs are a subset of GAMs where the predictors are added linearly, but each predictor can be transformed non-linearly.**Form: Y = β0 + f1(X1) + f2(X2) + ⋯ + fn(Xn) + ε, where the fi are non-linear functions of the predictors and ε is the error term.**Use Case: Useful for modeling complex, non-linear relationships without assuming a specific functional form for the relationship.
3. Smooth Additive Models:**Description: A type of additive model where smooth functions are used to model relationships between predictors and the response.**Form: Similar to GAMs, with smooth functions applied to predictors but focusing specifically on smoothness in the modeling process.**Use Case: Suitable for capturing smooth, non-linear effects in the data.
Regression Tree://Purpose: Predicts a continuous target variable by splitting data into subsets based on feature values.//Process:**Splitting: At each node, the tree splits the data based on a feature that best separates the target variable into homogeneous groups.**Leaf Nodes: Each terminal leaf node represents a predicted value, calculated as the mean of the target variable in that subset.
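A short sketch with scikit-learn's DecisionTreeRegressor on invented step-shaped data; its default splitting criterion minimizes within-node squared error, i.e. the variance-reduction idea discussed next.

import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(3)
X = rng.uniform(0, 10, size=(100, 1))
y = np.where(X[:, 0] < 5, 2.0, 8.0) + rng.normal(0, 0.5, 100)   # step-shaped target

tree = DecisionTreeRegressor(max_depth=2).fit(X, y)
print(tree.predict([[3.0], [7.0]]))   # leaf predictions are the subset means (near 2 and 8)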
Gini Index and Split Criteria:**Gini Index: Primarily used in classification trees,
it measures the impurity of a node. For regression trees, the focus is on
variance reduction rather than the Gini Index.**Split Criteria for Regression Trees: Variance Reduction: Chooses splits that minimize the variance of the
target variable within the resulting subsets. The goal is to reduce the variance
in the target variable as much as possible with each split, improving predictive
accuracy.
Gradient definition: The gradient is a vector that indicates the direction and
rate of the steepest increase of a function. It is used in optimization to find the
minimum or maximum of a function.
Components:*For a function f(x) with multiple variables x1, x2, …, xn, the gradient is the vector of partial derivatives:
∇f = (∂f/∂x1, ∂f/∂x2, …, ∂f/∂xn)
Interpretation://Positive Gradient:**Indicates that the function is increasing in the direction of the gradient.**If the gradient is positive for a specific variable,
increasing that variable will increase the function value. This suggests moving
in the direction of the gradient to increase the function's value.//Negative
Gradient:**Indicates that the function is decreasing in the direction of the
gradient.**If the gradient is negative for a specific variable, increasing that
variable will decrease the function value. This suggests moving in the opposite
direction of the gradient to decrease the function's value.
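To make the "move opposite the gradient" idea concrete, here is a tiny gradient-descent sketch on the invented quadratic f(x1, x2) = x1² + 3x2², whose analytic gradient is (2x1, 6x2).

import numpy as np

def grad(x):
    # Gradient of f(x1, x2) = x1**2 + 3 * x2**2
    return np.array([2 * x[0], 6 * x[1]])

x = np.array([4.0, -2.0])   # arbitrary starting point
lr = 0.1                    # step size

for _ in range(100):
    x = x - lr * grad(x)    # step opposite the gradient to decrease f

print(x)                    # approaches the minimum at (0, 0)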
Gaining Insight into the Exponential Loss Function
1. Exponential Loss Function://Definition: The exponential loss function, used in boosting algorithms, is defined as L(y, ŷ) = exp(−y · ŷ), where y is the true label and ŷ is the predicted value.
2. Insights from Properties://Sensitivity to Misclassification: The exponential loss penalizes misclassifications heavily, especially when the prediction is far from the true label. This makes it sensitive to outliers.//Boosting Effect: In boosting, this loss function drives the model to focus more on hard-to-classify instances by increasing their weights, improving overall model performance.//Gradient Behavior: The gradient of the exponential loss function increases with the magnitude of the prediction error, guiding the optimization process to correct large errors effectively.
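A quick numeric sketch of the exponential loss for labels in {-1, +1}; note how the loss and the magnitude of its gradient grow rapidly for confident wrong predictions, which is what pushes boosting to re-weight hard examples.

import numpy as np

def exp_loss(y_true, y_pred):
    # L(y, y_hat) = exp(-y * y_hat), with y in {-1, +1} and y_pred a real-valued score
    return np.exp(-y_true * y_pred)

y_true = np.array([1, 1, -1, -1])
y_pred = np.array([2.0, -0.5, -2.0, 1.5])   # two correct and two incorrect predictions

print(exp_loss(y_true, y_pred))             # confident mistakes get by far the largest loss
# Gradient with respect to the prediction: dL/dy_hat = -y * exp(-y * y_hat)
print(-y_true * np.exp(-y_true * y_pred))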
