Multiple Linear Regression Slides

Multiple linear regression is a statistical method used to estimate the relationship between multiple independent variables and a dependent variable, with specific assumptions that must be met for valid results. Key assumptions include the independence of observations, absence of multicollinearity, and normal distribution of residuals. The analysis output includes various statistics and plots to assess model fit, significance of predictors, and overall predictive capability of the model.


Multiple Regression

Mahira Ahmad
Multiple Linear Regression
 Multiple linear regression is used to estimate the relationship
between two or more independent variables and one
dependent variable.
 For example, you could use multiple regression to understand
whether exam performance can be predicted based on revision
time, test anxiety, lecture attendance and gender.
 Assumption #1: Your dependent variable should be measured
on a continuous scale.
 Assumption #2: You have two or more independent variables,
which can be either continuous (i.e.,
an interval or ratio variable) or categorical.
Multiple Linear Regression
 Assumption #3: You should have independence of observations. The Durbin–Watson statistic is a test statistic used to detect the presence of autocorrelation (a supplementary code sketch of this check appears at the end of this slide).
The Durbin–Watson statistic always takes a value between 0 and 4. A value of 2.0 means that no autocorrelation is detected in the sample; values from 0 to less than 2 indicate positive autocorrelation, and values from above 2 to 4 indicate negative autocorrelation. Autocorrelation is a mathematical representation of the degree of similarity between a given time series and a lagged version of itself over successive time intervals.
 Assumption #4: There should be no significant outliers, high leverage points or highly influential points.
 The standardized residual values should not exceed about ±3.
 Assumption #5: There needs to be a linear relationship between (a) the dependent variable and each of your independent variables, and (b) the dependent variable and the independent variables collectively.
 Path: Graphs → Chart Builder → Scatter/Dot → IV (x-axis) → DV (y-axis) → OK
 Assumption #6: Your data needs to show homoscedasticity, which is where the
variances along the line of best fit remain similar as you move along the line.
 Path: Analyze → Regression → Linear → move IV(s) → move DV → Plots → ZRESID (y-axis) → ZPRED (x-axis) → OK
 Assumption #7: Your data must not show multicollinearity, which occurs when you have two
or more independent variables that are highly correlated with each other. This leads to problems
with understanding which independent variable contributes to the variance explained in the
dependent variable, as well as technical issues in calculating a multiple regression model.
 Path: Analyze → Regression → Linear → move IV(s) → move DV → Statistics → Collinearity diagnostics → OK
Multicollinearity may be checked in multiple ways:
1) Correlation matrix – When computing a matrix of Pearson's bivariate correlations among all independent variables, the magnitude of the correlation coefficients should be less than .90.
2) Tolerance – Tolerance below 0.1 indicates a serious problem; tolerance below 0.2 indicates a potential problem (Menard, 1995).
3) Variance Inflation Factor (VIF) – The VIFs of the linear regression indicate the degree to which the variances of the regression estimates are inflated by multicollinearity. VIF values higher than 10 indicate that multicollinearity is a problem.
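 The slides run these checks through the SPSS menus; as a supplementary illustration only, the same Durbin–Watson and Tolerance/VIF diagnostics can be computed in Python with statsmodels. This is a minimal sketch using made-up data and hypothetical variable names (x1, x2, anxiety), not the dataset behind the slides.

# Minimal sketch with hypothetical data: Durbin-Watson (Assumption #3)
# and Tolerance/VIF (Assumption #7) checks via statsmodels.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(0)
n = 134                                           # made-up sample size
df = pd.DataFrame({"x1": rng.normal(size=n), "x2": rng.normal(size=n)})
df["anxiety"] = 0.5 * df["x1"] + 0.3 * df["x2"] + rng.normal(size=n)

X = sm.add_constant(df[["x1", "x2"]])             # predictors plus intercept
model = sm.OLS(df["anxiety"], X).fit()            # fit the multiple regression

# Durbin-Watson: values near 2 suggest no autocorrelation (range 0-4).
print("Durbin-Watson:", round(durbin_watson(model.resid), 2))

# Tolerance/VIF: VIF > 10 (tolerance < 0.1) signals a multicollinearity problem.
for i, name in enumerate(["x1", "x2"], start=1):  # column 0 is the constant
    vif = variance_inflation_factor(X.values, i)
    print(f"{name}: VIF = {vif:.2f}, tolerance = {1 / vif:.2f}")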
Multiple Linear Regression
 Assumption #8: Finally, you need to check that the residuals (errors) are approximately normally distributed. A residual is the difference between the observed value of the dependent variable (y) and the predicted value (ŷ). Two common methods to check this assumption include using: (a) a histogram (with a superimposed normal curve) and a Normal P-P Plot; or (b) a Normal Q-Q Plot.
 Path: Analyze → Regression → Linear → move IV(s) → move DV → Plots → ZRESID (y-axis) → ZPRED (x-axis) → Normal probability plot → Continue → OK
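 As a supplementary sketch (assuming the fitted model from the earlier Python example), the same normality check can be drawn as a histogram of standardized residuals plus a Q-Q plot:

# Sketch: residual normality diagnostics for a fitted statsmodels OLS model
# ('model' is assumed from the earlier hypothetical example).
import matplotlib.pyplot as plt
import statsmodels.api as sm

resid = model.get_influence().resid_studentized_internal  # standardized residuals

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
ax1.hist(resid, bins=20)                   # roughly bell-shaped if assumption holds
ax1.set_title("Standardized residuals")
sm.qqplot(resid, line="45", ax=ax2)        # points should follow the 45-degree line
ax2.set_title("Normal Q-Q plot")
plt.tight_layout()
plt.show()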
Running Analysis
 Click Analyze > Regression > Linear... on the main menu.
Interpreting the Output of Multiple
Regression Analysis
 SPSS Statistics generates several tables of output for a multiple regression analysis. If no assumptions have been violated, only three main tables are required to understand the results from the multiple regression procedure.
 Checking the assumptions beforehand involves the relevant scatterplots and partial regression plots, a histogram (with superimposed normal curve), the Normal P-P Plot and Normal Q-Q Plot, correlation coefficients and Tolerance/VIF values, and casewise diagnostics with studentized deleted residuals.
How to write about Assumptions
 First, multiple regression was run to test the predictors of social interaction anxiety. The assumption of independent errors was met, as the Durbin–Watson value (2.19) was between 0 and 4. The assumption of no perfect multicollinearity was tested by checking the tolerance values, and the assumption was met as all values (.48) were greater than 0.2. The assumptions of homoscedasticity, linearity and normally distributed errors were also met.
Determining How Well the Model Fits
 Residuals: The differences between what the model predicts and the observed data are
usually called residuals.
 R2 tells us how much of the variance in the outcome variable is accounted for by the regression model from our sample.
 Adjusted R2: The adjusted value tells us how much variance in the outcome variable would be accounted for if the model had been derived from the population from which the sample was taken.
 R2 change. This measure is a useful way to assess the contribution of new predictors (or
blocks) to explaining variance in the outcome.

 ANOVA tells us whether the model, overall, results in a significantly good degree of prediction of the outcome variable; in other words, whether the model is a significant fit of the data overall. However, the ANOVA doesn't tell us about the individual contribution of variables in the model. (These fit statistics are illustrated in the sketch at the end of this slide.)

 Regression Coefficient-b: The value of b represents the change in the outcome resulting
from a unit change in the predictor. A regression coefficient of 0 means a unit change in the
predictor variable results in no change in the predicted value of the outcome (the predicted
value of the outcome does not change at all).
 Logically if a variable significantly predicts an outcome, then it should have a b-value that
is different from zero. This hypothesis is tested using a t-test. The t-statistic tests the null
hypothesis that the value of b is 0; therefore, if it is significant we gain confidence in the
hypothesis that the b-value is significantly different from 0 and that the predictor variable
contributes significantly to our ability to estimate values of the outcome.
 The b-value tells us the strength of the relationship between a predictor and the outcome
variable. If it is significant (Sig. < .05 in the SPSS table) then the predictor variable
significantly predicts the outcome variable. If the value is positive we can tell that there is
a positive relationship between the predictor and the outcome, whereas a negative
coefficient represents a negative relationship. They also tell us to what degree each
predictor affects the outcome if the effects of all other predictors are held constant.
 The standardized beta values are all measured in standard deviation units and so are
directly comparable: therefore, they provide a better insight into the ‘importance’ of a
predictor in the model.

 The standard error tells us something about how different b values would be across
different samples. If the standard error is very small, then it means that most samples are
likely to have a b-value similar to the one in our sample (because there is little variation
across samples).
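 As a supplementary illustration of the statistics listed above, the following minimal Python/statsmodels sketch (made-up data and variable names, not the slides' dataset) shows where each quantity can be read from a fitted model:

# Sketch with hypothetical data: R2, adjusted R2, the overall F-test,
# and the unstandardized/standardized coefficients with their SEs and t-tests.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 134
df = pd.DataFrame({"x1": rng.normal(size=n), "x2": rng.normal(size=n)})
df["y"] = 0.6 * df["x1"] + 0.4 * df["x2"] + rng.normal(size=n)

X = sm.add_constant(df[["x1", "x2"]])
model = sm.OLS(df["y"], X).fit()

print("R2:", round(model.rsquared, 3))              # variance explained in the sample
print("Adj. R2:", round(model.rsquared_adj, 3))     # adjusted for number of predictors
print("F:", round(model.fvalue, 2), "p:", model.f_pvalue)  # overall fit (ANOVA)
print(model.params)     # unstandardized b: change in outcome per unit change in predictor
print(model.bse)        # standard errors of the coefficients
print(model.tvalues)    # t = b / SE(b), testing H0: b = 0
print(model.pvalues)    # Sig. values for each predictor

# Standardized betas: refit on z-scored variables so coefficients are comparable.
z = (df - df.mean()) / df.std()
betas = sm.OLS(z["y"], sm.add_constant(z[["x1", "x2"]])).fit().params
print(betas.drop("const"))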
Determining How Well the Model Fits
 It can be seen from the R2 value of 0.66 that the independent variables explain 66% of the variability of the dependent variable (social interaction anxiety). Moreover, the Adjusted R Square (adj. R2) corrects this estimate for the number of predictors in the model and is therefore always less than R Square.
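 For reference (a standard formula, not stated on the slide), the adjustment applied to R2 is

\text{Adjusted } R^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1}

where n is the sample size and p is the number of predictors; since the correction factor is at least 1, the adjusted value cannot exceed R2.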
Statistical Significance
 The F-ratio in the ANOVA table tests whether the overall regression model is a
good fit for the data. The table shows that the independent variables statistically
significantly predict the dependent variable, F(2, 131) = 129.88, p < .001 (i.e., the
regression model is a good fit of the data).
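 For reference (a standard definition, not written out on the slide), the F-ratio is the model mean square divided by the residual mean square,

F = \frac{SS_{model}/p}{SS_{residual}/(n - p - 1)}

with df1 = p (here 2) and df2 = n - p - 1 (here 131) degrees of freedom.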
Estimated Model Coefficients
 These estimates are obtained from the Coefficients table of the SPSS output.
Statistical Significance of the Independent
Variables
 An unstandardized coefficient (B) has units and a 'real life' scale. An unstandardized coefficient represents the amount of change in the dependent variable Y due to a change of 1 unit in the independent variable X.
 A standardized beta coefficient compares the strength of the effect of each individual independent variable on the dependent variable. The higher the absolute value of the beta coefficient, the stronger the effect.
 If p < .05, it can be concluded that the coefficients are statistically significantly different from 0 (zero). The t-value and corresponding p-value are located in the "t" and "Sig." columns of the Coefficients table.
 It can be seen from the "Sig." column that all independent variable coefficients are statistically significantly different from 0.
 Std. Error – These are the standard errors associated with the coefficients. The standard error is used for testing whether a parameter is significantly different from 0: dividing the parameter estimate by its standard error gives the t-value.
 The error term is the difference between the value the model predicts for a particular case and the value that was actually observed.
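 For reference, the t-value reported in the Coefficients table is the standard ratio implied above,

t = \frac{B}{SE(B)}

which is evaluated against a t distribution with n - p - 1 degrees of freedom; a large |t| (small Sig.) indicates that B is reliably different from 0.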
Report Write Up in APA 7 Format

 Table Format (see table # 7.16, page # 219)


 Interpretation format
 Table description
