Regression and Correlation

- Regression and correlation analyze relationships between variables: correlation determines the strength of linear relationships, while regression determines the form or nature of relationships.
- Scatter plots graphically show the relationship between two variables. Correlation coefficients measure the strength of linear relationships on a scale from -1 to 1. Simple linear regression fits a line to data to predict the dependent variable from the independent variable.
- Statistical tests assess whether correlation or regression coefficients differ significantly from hypothesized values. Confidence intervals provide ranges within which the true coefficients are likely to lie.

Regression and Correlation

- Relationships between variables. For example:

• How do the sales of a product depend on the price charged?
• How does the strength of a material depend on temperature?
• To what extent is metal pitting related to pollution?
• How strong is the link between inflation and employment rates?
• How can we use the amount of fertilizer applied to predict crop yields?

There are essentially two types of problem:
• CORRELATION problems, which involve measuring the strength of a relationship.
• REGRESSION problems, which are concerned with the form or nature of a relationship.

SCATTER PLOT – a graphical presentation of two variables, obtained by plotting the paired observations as points in the XY-plane.
- Gives an idea of the strength and form of the relationship between the two variables.
CORRELATION ANALYSIS

Objective: to determine the degree or strength of the linear association between the values of two variables, X and Y.

The analysis does not distinguish between the dependent and the independent variable.

A correlation coefficient measures how weak or strong the linear relationship is.

Pearson's Correlation Coefficient, ρ
• Most commonly used measure of linear association between two (interval or ratio) variables, X and Y
• Denoted by ρ = σXY / (σX σY)
• Estimated by the sample correlation coefficient,

r = SPXY / √(SSX · SSY)

where:
SPXY = ΣXY − (ΣX)(ΣY)/n
SSX = ΣX² − (ΣX)²/n
SSY = ΣY² − (ΣY)²/n
• Range of values: −1 ≤ ρ ≤ 1 and −1 ≤ r ≤ 1
• Qualitative interpretation of ρ and r:

Absolute Value of          Strength of Linear Relationship
Correlation Coefficient    Between X and Y
0.01 – 0.20                Very weak
0.21 – 0.40                Weak
0.41 – 0.60                Moderate
0.61 – 0.80                Strong
0.81 – 0.99                Very strong
Example: It is suspected that there is some relationship between relative humidity and the tensile strength of a certain material. The following measurements are obtained.

Relative Humidity (%)    Tensile Strength
45                       80
55                       67
65                       58
80                       55
95                       30

Estimate the strength/degree of the relationship between relative humidity (%) and tensile strength.
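As a check on the hand computation, the sample correlation coefficient for these five observations can be computed directly from the SPXY/SSX/SSY formulas above (a minimal pure-Python sketch; variable names are illustrative):

```python
import math

# Paired observations from the example
x = [45, 55, 65, 80, 95]   # relative humidity (%)
y = [80, 67, 58, 55, 30]   # tensile strength
n = len(x)

# Corrected sums of squares and cross-products
sp_xy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
ss_x = sum(a * a for a in x) - sum(x) ** 2 / n
ss_y = sum(b * b for b in y) - sum(y) ** 2 / n

# Sample correlation coefficient
r = sp_xy / math.sqrt(ss_x * ss_y)
print(round(r, 3))  # → -0.966: a very strong negative linear relationship
```

Since |r| ≈ 0.97 falls in the 0.81 – 0.99 band of the table above, the linear relationship is classified as very strong (and negative: tensile strength decreases as humidity rises).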
Test of Hypothesis on ρ

Ho: There is no linear relationship between X and Y (ρ = 0).

Ha: i) ρ ≠ 0   ii) ρ > 0   iii) ρ < 0

Test statistic: tc = r√(n − 2) / √(1 − r²)  ~  t(n − 2)

Decision rule: Reject Ho if
i) |tc| > tα/2(n − 2)
ii) tc > tα(n − 2)
iii) tc < −tα(n − 2).
Else, fail to reject Ho.
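Applying this test to the humidity/tensile-strength example (r ≈ −0.966, n = 5) gives a sketch like the following; the critical value t0.025(3) = 3.182 is taken from a standard t table:

```python
import math

x = [45, 55, 65, 80, 95]
y = [80, 67, 58, 55, 30]
n = len(x)

sp_xy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
ss_x = sum(a * a for a in x) - sum(x) ** 2 / n
ss_y = sum(b * b for b in y) - sum(y) ** 2 / n
r = sp_xy / math.sqrt(ss_x * ss_y)

# Test statistic for Ho: rho = 0 against Ha: rho != 0
t_c = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

t_crit = 3.182  # t_{0.025}(3) from a t table, alpha = 0.05, two-sided
reject = abs(t_c) > t_crit
print(round(t_c, 2), reject)  # t_c ≈ -6.47 and |t_c| > 3.182, so Ho is rejected
```

At α = 0.05 we conclude there is a statistically significant linear relationship between relative humidity and tensile strength.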
REGRESSION ANALYSIS

OBJECTIVE: To determine the probable form of the relationship between X and Y, where X and Y are paired variables.

The relationship between the variables X and Y is represented by a statistical model of the form:

Y = f(X) + ε

where Y – the response or dependent variable
X – the explanatory or independent variable (attempts to explain the outcomes)
ε – the random error component
SIMPLE LINEAR REGRESSION MODEL:

Yi = β0 + β1·Xi + εi

where
β0 – the regression constant; true Y-intercept
β1 – the regression coefficient; measure of the true change in Y per unit change in X
εi – the random error associated with Yi for a given Xi
Yi, Xi – the ith observed values of Y and X, respectively
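The model and its error term can be illustrated by simulating data from it (a hypothetical sketch; the parameter values β0 = 2, β1 = 0.5 and the error standard deviation are made up purely for illustration):

```python
import random

random.seed(0)  # fixed seed so the simulation is reproducible

beta0, beta1 = 2.0, 0.5  # hypothetical true intercept and slope
sigma = 1.0              # standard deviation of the random error term

x = [float(i) for i in range(1, 21)]
# Each Y_i is the value on the true line plus a normally distributed random error
y = [beta0 + beta1 * xi + random.gauss(0.0, sigma) for xi in x]

print(len(y))  # 20 simulated (X, Y) pairs
```

Regression analysis works in the opposite direction: given only the simulated pairs, it tries to recover β0 and β1.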
ASSUMPTIONS UNDERLYING THE SLR MODEL

• The values of the independent variable X may either be fixed or random.
• The X's are measured without error.
• The Y's are statistically independent.
• For each value of X, there is a sub-population of Y-values that is normally distributed.
• The variances of the sub-populations are all equal.
• The means of the sub-populations of Y all lie on the same straight line.
Based on a SRS of size n, an estimate of the model is:

Ŷi = b0 + b1·Xi

where b0 – the estimated regression constant
b1 – the estimated regression coefficient

The estimators b0 and b1 are obtained by minimizing the sum of squares of errors (LEAST SQUARES ESTIMATION PROCEDURE – LSE). That is,

min Σ(i=1 to n) ei² = Σ(i=1 to n) [Yi − (β0 + β1·Xi)]²

The results of the minimization are as follows:

b1 = SPXY / SSX
and
b0 = Ȳ − b1·X̄
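For the humidity/tensile-strength data, these formulas give the fitted line directly (pure-Python sketch):

```python
x = [45, 55, 65, 80, 95]   # relative humidity (%)
y = [80, 67, 58, 55, 30]   # tensile strength
n = len(x)

sp_xy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
ss_x = sum(a * a for a in x) - sum(x) ** 2 / n

# Least squares estimates
b1 = sp_xy / ss_x                    # slope
b0 = sum(y) / n - b1 * sum(x) / n    # intercept: Ybar - b1 * Xbar

print(round(b0, 2), round(b1, 4))  # fitted line: Yhat = 118.9 - 0.8956 X
```

The negative slope matches the sign of r from the correlation analysis: each additional percentage point of humidity reduces predicted tensile strength by about 0.9 units.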
Adequacy of the Predicting Equation
• An overall measure of adequacy of the equation is provided by the coefficient of determination, R².
• R² gives the proportion of the total variation in Y that is accounted for by the independent variable X.
• R² ranges from 0 to 1, or 0% to 100%. The nearer it is to 1 (100%), the better the fit of the model.

R² = (b1·SPXY / SSY) × 100%
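For the example data, R² can be computed from the fitted slope; in simple linear regression it equals the square of the sample correlation coefficient, which the sketch below verifies:

```python
import math

x = [45, 55, 65, 80, 95]
y = [80, 67, 58, 55, 30]
n = len(x)

sp_xy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
ss_x = sum(a * a for a in x) - sum(x) ** 2 / n
ss_y = sum(b * b for b in y) - sum(y) ** 2 / n

b1 = sp_xy / ss_x
r2 = b1 * sp_xy / ss_y   # coefficient of determination

r = sp_xy / math.sqrt(ss_x * ss_y)
assert abs(r2 - r ** 2) < 1e-12   # in SLR, R^2 = r^2

print(round(100 * r2, 1))  # ≈ 93.3% of the variation in Y is explained by X
```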
Test of Hypothesis
A test of hypothesis about the regression coefficient β1 can be performed at a certain level of significance α.

I. Using the t-test
Ho: β1 = β1* vs i) Ha: β1 ≠ β1* or
ii) Ha: β1 > β1* or
iii) Ha: β1 < β1*

Test statistic: tc = (b1 − β1*) / s.e.(b1)

where:
s.e.(b1) = √[ (SSY − b1·SPXY) / (SSX (n − 2)) ]

Decision rule: Reject Ho if
i) |tc| > tα/2(n − 2)
ii) tc > tα(n − 2)
iii) tc < −tα(n − 2).
Else, fail to reject Ho.

II. Using the F-test in the ANOVA
Ho: β1 = 0 vs Ha: β1 ≠ 0
Test statistic: Fc = MSR / MSE
Decision rule: Reject Ho if Fc > Fα(1, n − 2). Else, fail to reject Ho.

Analysis of Variance Table:

Sources of
Variation     df       SS                  Mean Square
Regression    1        b1·SPXY             MSR
Error         n − 2    SSY − b1·SPXY       MSE
TOTAL         n − 1    SSY
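For the example data and Ho: β1 = 0, the ANOVA quantities work out as in the sketch below; the critical value F0.05(1, 3) = 10.13 is taken from a standard F table, and Fc equals the square of the t statistic for the slope:

```python
x = [45, 55, 65, 80, 95]
y = [80, 67, 58, 55, 30]
n = len(x)

sp_xy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
ss_x = sum(a * a for a in x) - sum(x) ** 2 / n
ss_y = sum(b * b for b in y) - sum(y) ** 2 / n
b1 = sp_xy / ss_x

ss_reg = b1 * sp_xy          # regression sum of squares, df = 1
ss_err = ss_y - ss_reg       # error sum of squares, df = n - 2
msr = ss_reg / 1
mse = ss_err / (n - 2)
f_c = msr / mse

f_crit = 10.13  # F_{0.05}(1, 3) from an F table
print(round(f_c, 1), f_c > f_crit)  # Fc ≈ 41.9 > 10.13: reject Ho
```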
Using the t-test for the regression constant:
Ho: β0 = β0* vs i) Ha: β0 ≠ β0* or
ii) Ha: β0 > β0* or
iii) Ha: β0 < β0*

Test statistic: tc = (b0 − β0*) / s.e.(b0)

where:
s.e.(b0) = √[ (SSY − b1·SPXY)/(n − 2) · ΣX² / (n·SSX) ]
Interval Estimation of β0 and β1:

A (1 − α) × 100% confidence interval for β0:

b0 ± tα/2(n − 2) · se(b0)

where
se(b0) = √( MSE·ΣX² / (n·SSX) )

A (1 − α) × 100% confidence interval for β1:

b1 ± tα/2(n − 2) · se(b1)

where
se(b1) = √( MSE / SSX )
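For the example data, 95% confidence intervals for β0 and β1 can be sketched as follows (t0.025(3) = 3.182 from a standard t table):

```python
import math

x = [45, 55, 65, 80, 95]
y = [80, 67, 58, 55, 30]
n = len(x)

sp_xy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
ss_x = sum(a * a for a in x) - sum(x) ** 2 / n
ss_y = sum(b * b for b in y) - sum(y) ** 2 / n
b1 = sp_xy / ss_x
b0 = sum(y) / n - b1 * sum(x) / n
mse = (ss_y - b1 * sp_xy) / (n - 2)

t_crit = 3.182  # t_{0.025}(3)
se_b1 = math.sqrt(mse / ss_x)
se_b0 = math.sqrt(mse * sum(a * a for a in x) / (n * ss_x))

ci_b1 = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)
ci_b0 = (b0 - t_crit * se_b0, b0 + t_crit * se_b0)
print([round(v, 2) for v in ci_b1])  # slope CI ≈ [-1.34, -0.46]
```

Since the interval for β1 excludes 0, it agrees with the t- and F-tests above: the slope is significantly different from zero at α = 0.05.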
