Multiple Regression
Mahira Ahmad
Multiple Linear Regression
Multiple linear regression is used to estimate the relationship
between two or more independent variables and one
dependent variable.
For example, you could use multiple regression to understand
whether exam performance can be predicted based on revision
time, test anxiety, lecture attendance and gender.
Assumption #1: Your dependent variable should be measured
on a continuous scale.
Assumption #2: You have two or more independent variables,
which can be either continuous (i.e.,
an interval or ratio variable) or categorical.
Assumption #3: You should have independence of observations.
The Durbin–Watson statistic is a test statistic used to detect the presence of autocorrelation in the residuals.
The Durbin–Watson statistic will always have a value between 0 and 4.
A value of 2.0 means that there is no autocorrelation detected in the sample. Values
from 0 to less than 2 indicate positive autocorrelation and values from 2 to 4 indicate negative
autocorrelation. Autocorrelation is a mathematical representation of the degree of similarity
between a given time series and a lagged version of itself over successive time intervals.
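Although the slides use SPSS, the same check can be reproduced in code. Below is a minimal sketch in Python with statsmodels; the data file (exam_data.csv) and variable names (exam_score, revision_time, test_anxiety) are hypothetical placeholders for your own data.

import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

# Hypothetical data: exam_score predicted by revision_time and test_anxiety
data = pd.read_csv("exam_data.csv")
X = sm.add_constant(data[["revision_time", "test_anxiety"]])
model = sm.OLS(data["exam_score"], X).fit()

# Durbin-Watson on the residuals: about 2 means no autocorrelation;
# values well below 2 suggest positive, well above 2 negative autocorrelation
print("Durbin-Watson =", round(durbin_watson(model.resid), 2))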
Assumption #4: There should be no significant outliers, high leverage
points or highly influential points.
As a guideline, standardized residuals should not fall above +3 or below −3.
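One way to screen for such cases, sketched here in Python with statsmodels (same hypothetical dataset and variable names as above), is to inspect the studentized deleted residuals, leverage values and Cook's distances, the same quantities SPSS can save or report as diagnostics.

import pandas as pd
import statsmodels.api as sm

data = pd.read_csv("exam_data.csv")                   # hypothetical data file
X = sm.add_constant(data[["revision_time", "test_anxiety"]])
model = sm.OLS(data["exam_score"], X).fit()

influence = model.get_influence()
student_resid = influence.resid_studentized_external  # studentized deleted residuals
leverage = influence.hat_matrix_diag                  # leverage (hat) values
cooks_d = influence.cooks_distance[0]                 # Cook's distance per case

# Flag cases whose studentized residual falls outside +/- 3
print("Cases with |studentized residual| > 3:", (abs(student_resid) > 3).sum())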
Assumption #5: There needs to be a linear relationship between the
dependent variable and each of your independent variables.
Path: Graphs → Chart Builder → Scatter/Dot → IV (x-axis) → DV (y-axis) → OK
Assumption #6: Your data needs to show homoscedasticity, which is where the
variances along the line of best fit remain similar as you move along the line.
Path: Analyze → Regression → Linear → move IVs → move DV → Plots → *ZRESID (y-axis) → *ZPRED (x-axis) → Continue → OK
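The SPSS plot of *ZRESID against *ZPRED can be approximated outside SPSS as well; the following is a minimal Python sketch (hypothetical dataset and variable names as before) in which a roughly even, patternless spread of points around zero is consistent with homoscedasticity.

import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm

data = pd.read_csv("exam_data.csv")                   # hypothetical data file
X = sm.add_constant(data[["revision_time", "test_anxiety"]])
model = sm.OLS(data["exam_score"], X).fit()

# Standardize residuals and predicted values, mirroring *ZRESID and *ZPRED
zresid = (model.resid - model.resid.mean()) / model.resid.std()
zpred = (model.fittedvalues - model.fittedvalues.mean()) / model.fittedvalues.std()

plt.scatter(zpred, zresid)
plt.axhline(0)
plt.xlabel("Standardized predicted value (*ZPRED)")
plt.ylabel("Standardized residual (*ZRESID)")
plt.show()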
Assumption #7: Your data must not show multicollinearity, which occurs when you have two
or more independent variables that are highly correlated with each other. This leads to problems
with understanding which independent variable contributes to the variance explained in the
dependent variable, as well as technical issues in calculating a multiple regression model.
Path: Analyze → Regression → Linear → move IVs → move DV → Statistics → Collinearity diagnostics → Continue → OK
Multicollinearity may be checked in multiple ways:
1) Correlation matrix – When computing a matrix of Pearson’s bivariate correlations among all
independent variables, the magnitude of the correlation coefficients should be less than .90.
2) Tolerance – A tolerance below 0.1 indicates a serious problem; a tolerance below 0.2 indicates a potential
problem (Menard, 1995).
3) Variance Inflation Factor (VIF) – The VIFs of the linear regression indicate the degree that the
variances in the regression estimates are increased due to multicollinearity. VIF values higher than
10 indicate that multicollinearity is a problem.
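Tolerance and VIF can also be computed directly, as in this minimal Python sketch with statsmodels (hypothetical dataset and variable names as in the earlier sketches); recall that Tolerance is simply 1 / VIF.

import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

data = pd.read_csv("exam_data.csv")                   # hypothetical data file
X = sm.add_constant(data[["revision_time", "test_anxiety", "attendance"]])

# VIF for each predictor (the constant is skipped); Tolerance = 1 / VIF
for i, name in enumerate(X.columns):
    if name == "const":
        continue
    vif = variance_inflation_factor(X.values, i)
    print(f"{name}: VIF = {vif:.2f}, Tolerance = {1 / vif:.2f}")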
Assumption #8: Finally, you need to check that the residuals
(errors) are approximately normally distributed. A residual is the
difference between the observed value of the dependent variable
(y) and the predicted value (ŷ). Two common methods to check
this assumption are: (a) a histogram (with a superimposed
normal curve) and a Normal P-P Plot; or (b) a Normal Q-Q Plot.
Path: Analyze → Regression → Linear → move IVs → move DV → Plots → *ZRESID (y-axis) → *ZPRED (x-axis) → Histogram and Normal probability plot → Continue → OK
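If you want to produce the corresponding plots outside SPSS, a minimal Python sketch looks like the following (hypothetical dataset and variable names as before); points lying close to the diagonal of the Q-Q plot suggest approximately normal residuals.

import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
from scipy import stats

data = pd.read_csv("exam_data.csv")                   # hypothetical data file
X = sm.add_constant(data[["revision_time", "test_anxiety"]])
model = sm.OLS(data["exam_score"], X).fit()

# Histogram of the residuals (a normal curve can be superimposed if desired)
plt.hist(model.resid, bins=20)
plt.title("Histogram of residuals")
plt.show()

# Normal Q-Q plot of the residuals
stats.probplot(model.resid, dist="norm", plot=plt)
plt.show()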
Running Analysis
Click Analyze > Regression > Linear... on the main menu, as shown below:
Interpreting the Output of Multiple
Regression Analysis
SPSS Statistics generates several tables of output for a
multiple regression analysis. Only three main tables
are required to understand the results from the
multiple regression procedure, assuming that no
assumptions have been violated.
The output used to check those assumptions includes the
relevant scatterplots and partial regression plots, the histogram
(with superimposed normal curve), the Normal P-P Plot and
Normal Q-Q Plot, the correlation coefficients and Tolerance/VIF
values, and the casewise diagnostics and studentized deleted
residuals.
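For readers who prefer code to the point-and-click route, the whole analysis can be mirrored with one fitted model; the sketch below (Python, statsmodels, hypothetical file and variable names as in the earlier sketches) prints a summary containing the same information as the Model Summary, ANOVA and Coefficients tables discussed next.

import pandas as pd
import statsmodels.api as sm

data = pd.read_csv("exam_data.csv")                   # hypothetical data file
y = data["exam_score"]                                # dependent variable
X = sm.add_constant(data[["revision_time", "test_anxiety", "attendance"]])

model = sm.OLS(y, X).fit()
print(model.summary())    # R-squared, F-test and coefficient table in one output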
How to write about Assumptions
First, multiple regression was run to test the predictors of social
interaction anxiety. The assumption of independent errors was met, as the
Durbin–Watson value (2.19) was close to 2. The assumption of no perfect
multicollinearity was tested by checking the tolerance values, and the
assumption was met as all the values (.48) were greater than 0.2. The
assumptions of homoscedasticity, linearity and normally distributed errors
were also met.
Determining How Well the Model Fits
Residuals: The differences between what the model predicts and the observed data are
usually called residuals.
R2 tells us how much of the variance in the outcome variable is accounted for by the regression
model from our sample.
Adjusted R2. The adjusted value tells us how much variance in the outcome variable would be
accounted for if the model had been derived from the population from which the sample was
taken.
R2 change. This measure is a useful way to assess the contribution of new predictors (or
blocks) to explaining variance in the outcome.
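For reference (a standard formula, not shown on the slides), the adjustment works as follows: adjusted R2 = 1 − (1 − R2)(n − 1) / (n − k − 1), where n is the sample size and k is the number of predictors, so the adjusted value shrinks as more predictors are added relative to the sample size.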
ANOVA tells us whether the model, overall, results in a significantly good degree of
prediction of the outcome variable, that is, whether the model is a significant fit of the data
overall. However, the ANOVA doesn't tell us about the individual contribution of variables
in the model.
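For reference, the F-ratio reported in the ANOVA table is the ratio of the mean squares, F = MS_regression / MS_residual, evaluated with k and n − k − 1 degrees of freedom (k predictors, n cases).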
Regression Coefficient-b: The value of b represents the change in the outcome resulting
from a unit change in the predictor. A regression coefficient of 0 means a unit change in the
predictor variable results in no change in the predicted value of the outcome (the predicted
value of the outcome does not change at all).
Logically if a variable significantly predicts an outcome, then it should have a b-value that
is different from zero. This hypothesis is tested using a t-test. The t-statistic tests the null
hypothesis that the value of b is 0; therefore, if it is significant we gain confidence in the
hypothesis that the b-value is significantly different from 0 and that the predictor variable
contributes significantly to our ability to estimate values of the outcome.
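In symbols (a standard result, not taken from the slides), the test statistic is t = b / SE(b), compared against a t-distribution with n − k − 1 degrees of freedom.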
The b-value tells us the strength of the relationship between a predictor and the outcome
variable. If it is significant (Sig. < .05 in the SPSS table) then the predictor variable
significantly predicts the outcome variable. If the value is positive we can tell that there is
a positive relationship between the predictor and the outcome, whereas a negative
coefficient represents a negative relationship. They also tell us to what degree each
predictor affects the outcome if the effects of all other predictors are held constant.
The standardized beta values are all measured in standard deviation units and so are
directly comparable: therefore, they provide a better insight into the ‘importance’ of a
predictor in the model.
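A standardized beta can be recovered from the unstandardized b as beta = b × (SD of the predictor / SD of the outcome); this rescaling is what makes the betas directly comparable across predictors.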
The standard error tells us something about how different b values would be across
different samples. If the standard error is very small, then it means that most samples are
likely to have a b-value similar to the one in our sample (because there is little variation
across samples).
Determining How Well the Model Fits
It can be seen from the R2 value of 0.66 that the independent variables explain 66% of the variability of
the dependent variable (social interaction anxiety). Moreover, the Adjusted R Square (adj. R2) corrects this
estimate for the number of predictors in the model and is therefore always somewhat smaller than R Square.
Statistical Significance
The F-ratio in the ANOVA table tests whether the overall regression model is a
good fit for the data. The table shows that the independent variables statistically
significantly predict the dependent variable, F(2, 131) = 129.88, p < .001 (i.e., the
regression model is a good fit of the data).
Estimated Model Coefficients
This is obtained from the Coefficients table, as shown below:
Statistical Significance of the Independent
Variables
An unstandardized coefficient (B) has units and a 'real life' scale. An
unstandardized coefficient represents the amount of change in the dependent variable Y due to a change of
1 unit in the independent variable X.
A standardized beta coefficient compares the strength of the effect of each individual independent
variable on the dependent variable. The higher the absolute value of the beta coefficient, the stronger
the effect.
If p < .05, it can be concluded that the coefficients are statistically significantly different to 0 (zero).
The t-value and corresponding p-value are located in the "t" and "Sig." columns of the table displayed
above.
It can be seen from the "Sig." column that all independent variable coefficients are statistically
significantly different from 0.
Std. Error – These are the standard errors associated with the coefficients. The standard error is used
for testing whether the parameter is significantly different from 0, by dividing the parameter estimate by
the standard error to obtain a t-value.
The error term is the difference between the value of the outcome predicted by the model and the value
that was actually observed.
Report Write Up in APA 7 Format
Table Format (see table # 7.16, page # 219)
Interpretation format
Table description