The Line of Best Fit: LESSON 19.2
Table of Contents
Learning Competency
Learning Objectives
Essential Questions
Prerequisite Skills and Topics
Lesson Proper
A. Introduction to the Lesson
B. Discussion
C. Practice & Feedback
Performance Assessment
Worksheet Answer Key
Synthesis
Bibliography
Grade 11 • Unit 19: The Regression Line
Learning Competency
At the end of the lesson, the learners should be able to draw the best-fit line on
a scatter plot.
Learning Objectives
At the end of this lesson, the learners should be able to do the following:
Essential Questions
At the end of this lesson, the student should be able to answer the following questions:
● How does the coefficient of determination help us in using the regression line?
● How can you make use of the regression line of two variables if the correlation
coefficient is not significant?
Topics:
● Math 8 Unit 3: Linear Equations | Lesson 3: Slope of a Line
● Math 8 Unit 3: Linear Equations | Lesson 4: Writing Linear Equations in Standard
and Slope-Intercept Form
● Math 8 Unit 3: Linear Equations | Lesson 5: Graphing Linear Equations
● Math 10 Unit 21: Research Design | Lesson 2: Basics of Data Collection
● Math 10 Unit 21: Research Design | Lesson 4: Data Analysis
Lesson Proper
Duration: 5 minutes
Methodology:
1. Present a pair of concepts (e.g., travel time and distance).
2. Ask the students to stand if the given concepts are directly proportional.
3. Call volunteers to explain their answers.
4. Present another pair of concepts. Continue in this manner as many times as
desired.
Expected Results:
Sample results:
travel time and distance: direct proportion
The longer the travel time, the farther the distance traveled.
Guide Questions:
1. What are other common examples involving direct proportion? Inverse proportion?
2. Can the relationship between dependent and independent variables be both
direct and inverse? Cite examples.
3. How can proportions illustrate the relationship between two variables?
Duration: 10 minutes
Methodology:
1. Divide the class into four groups.
2. Give the following sets of coordinates.
Groups 1 and 3: {(0,1), (1,3), (−1, −1)}
Groups 2 and 4: {(−2,3), (1, −1), (4, −5)}
3. Ask each group to graph the points and determine the slope of the line
formed.
4. Let a representative from each group briefly present their work.
Expected Results:
Groups 1 and 3:
Slope: 2
Groups 2 and 4:
Slope: −4/3
Guide Questions:
1. How do you define the slope of a line?
2. How do you find the slope of a line?
3. What is the relationship between the two variables if they have a negative
slope? Positive slope?
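The slope computation in this activity can be checked with a short script. This is only a sketch: the helper names `slope` and `common_slope` are illustrative and not part of the lesson, and exact fractions are used so that −4/3 is not rounded.

```python
from fractions import Fraction


def slope(p, q):
    """Slope of the line through points p and q (rise over run)."""
    (x1, y1), (x2, y2) = p, q
    if x1 == x2:
        raise ValueError("vertical line: the slope is undefined")
    return Fraction(y2 - y1, x2 - x1)


def common_slope(points):
    """Return the slope shared by consecutive pairs, or None if the points are not collinear."""
    slopes = {slope(points[i], points[i + 1]) for i in range(len(points) - 1)}
    return slopes.pop() if len(slopes) == 1 else None


# Coordinates given to the groups in the activity.
group_1_3 = [(0, 1), (1, 3), (-1, -1)]
group_2_4 = [(-2, 3), (1, -1), (4, -5)]

print(common_slope(group_1_3))  # 2
print(common_slope(group_2_4))  # -4/3
```

A positive result means 𝑌 increases as 𝑋 increases; a negative result means 𝑌 decreases as 𝑋 increases.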
Teacher’s Notes
To help better gauge students’ readiness for this lesson, you may assign the short test
given in the Test Your Prerequisite Skills section of the corresponding study guide.
B. Discussion
Teacher’s Notes
You may use the Learn about It! Slides in the presentation file to discuss the following
key concepts and examples. Make sure to address student questions before jumping
from one concept to another.
• Scatter Plot – a diagram that uses coordinates to show values for two
variables
Example:
The diagram below is an example of a scatter plot.
[Figure: scatter plot with 𝑋 on the horizontal axis and 𝑌 on the vertical axis]
$$t = r\sqrt{\frac{n-2}{1-r^2}}$$
Then, we compare 𝑡 to the critical values obtained from the 𝑡-table using the
significance level 𝛼 and the degrees of freedom 𝑑𝑓 = 𝑛 − 2.
• Line of Best Fit or the Regression Line – a straight line that best represents
the data on a scatter plot
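For the regression line 𝑌 = 𝑏𝑋 + 𝑎, the 𝑦-intercept 𝑎 and the slope 𝑏 are computed as follows:
$$a = \frac{(\sum Y)(\sum X^2) - (\sum X)(\sum XY)}{n(\sum X^2) - (\sum X)^2}$$
$$b = \frac{n(\sum XY) - (\sum X)(\sum Y)}{n(\sum X^2) - (\sum X)^2}$$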
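The two formulas translate directly into code. The sketch below assumes the data are given as two equal-length lists; the function name `regression_coefficients` is illustrative. It is tried on the three collinear points from the introductory activity, which lie exactly on a line with slope 2 and 𝑦-intercept 1.

```python
def regression_coefficients(xs, ys):
    """Return (a, b): the y-intercept and slope of the line of best fit."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))

    # Both formulas share the same denominator.
    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        raise ValueError("all X values are equal; the slope is undefined")

    a = (sum_y * sum_x2 - sum_x * sum_xy) / denominator
    b = (n * sum_xy - sum_x * sum_y) / denominator
    return a, b


# Points (0, 1), (1, 3), (-1, -1) from the slope activity.
a, b = regression_coefficients([0, 1, -1], [1, 3, -1])
print(a, b)  # 1.0 2.0
```

Because these points are perfectly collinear, the line of best fit passes through all of them; with real data the line only approximates the points.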
Example:
The line of best fit of the scatter plot above is shown below.
[Figure: the same scatter plot with the line of best fit drawn through the points]
Example:
If the value of the correlation coefficient is 𝑟 = 0.86, we can solve for the coefficient
of determination by squaring it.
$$r^2 = (0.86)^2$$
$$r^2 = 0.7396$$
Example 1
The correlation coefficient 𝑟 between an independent variable 𝑋 and a dependent
variable 𝑌 is 𝑟 = 0.65. Determine how well a value of 𝑌 can be predicted from a value of
𝑋 using the regression line.
Solution:
To determine how well the dependent variable can be predicted from the
independent variable, we obtain the coefficient of determination by squaring the
correlation coefficient 𝑟 = 0.65.
$$r^2 = (0.65)^2$$
$$r^2 = 0.4225$$
Therefore, 42.25% of the variation in 𝑌 is explained by 𝑋 through the regression line.
The remaining 57.75% is due to other factors.
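The squaring step can be reproduced for both values of 𝑟 used so far. This is a minimal sketch; the function name is illustrative.

```python
def coefficient_of_determination(r):
    """Square the correlation coefficient r, which must lie in [-1, 1]."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and 1")
    return r * r


for r in (0.86, 0.65):
    r2 = coefficient_of_determination(r)
    print(f"r = {r}: r^2 = {r2:.4f} ({r2:.2%})")
```

Note that 𝑟 and −𝑟 give the same coefficient of determination, so the sign of the relationship must be read from 𝑟 itself.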
Example 2
Test the significance of the correlation coefficient 𝑟 = 0.24 of an independent variable
𝑋 and a dependent variable 𝑌 at 𝛼 = 0.05. The sample size is 𝑛 = 20.
Solution:
1. State the null and the alternative hypotheses.
𝑑𝑓 = 𝑛 − 2
= 20 − 2
= 18.
Using 𝑑𝑓 = 18, we obtain the critical values using the 𝑡-table. At 𝛼 = 0.05, the
critical values are ±2.101. Thus, the rejection region lies at 𝑡 > 2.101 or 𝑡 < −2.101.
$$t = r\sqrt{\frac{n-2}{1-r^2}}$$
$$t = (0.24)\sqrt{\frac{20-2}{1-(0.24)^2}}$$
$$t = 1.049$$
Since 𝑡 = 1.049 < 2.101, it does not lie in the rejection region. Thus, we fail to
reject 𝐻0 .
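The test in Example 2 can be sketched in a few lines. The critical value 2.101 is taken from the lesson rather than computed, and the function name `t_statistic` is illustrative.

```python
import math


def t_statistic(r, n):
    """Test statistic for the significance of a correlation coefficient."""
    if abs(r) >= 1:
        raise ValueError("the statistic is undefined when |r| = 1")
    return r * math.sqrt((n - 2) / (1 - r ** 2))


r, n = 0.24, 20
critical = 2.101  # two-tailed critical value quoted in the lesson for alpha = 0.05, df = 18

t = t_statistic(r, n)
reject = abs(t) > critical  # rejection region: t > 2.101 or t < -2.101
print(f"df = {n - 2}, t = {t:.3f}, reject H0: {reject}")
# df = 18, t = 1.049, reject H0: False
```

Comparing the absolute value of 𝑡 with the positive critical value covers both tails of the rejection region at once.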
Example 3
Perform a regression analysis on the following data at 𝛼 = 0.2. Assume that 𝑋 is the
independent variable and 𝑌 is the dependent variable.
Solution:
1. Determine the dependent and the independent variables.
𝑿     𝒀     𝑿²    𝒀²    𝑿𝒀
21    12    441   144   252
20    14    400   196   280
29    17    841   289   493
21    15    441   225   315
24    18    576   324   432
19    17    361   289   323
27    18    729   324   486
25    17    625   289   425
∑𝑋 = 186   ∑𝑌 = 128   ∑𝑋² = 4 414   ∑𝑌² = 2 080   ∑𝑋𝑌 = 3 006
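The columns and totals of this table can be rebuilt from the 𝑋 and 𝑌 values alone. This is a sketch; the variable names are illustrative.

```python
xs = [21, 20, 29, 21, 24, 19, 27, 25]
ys = [12, 14, 17, 15, 18, 17, 18, 17]

# One row per data point: X, Y, X^2, Y^2, XY.
rows = [(x, y, x * x, y * y, x * y) for x, y in zip(xs, ys)]

# Column totals: sum X, sum Y, sum X^2, sum Y^2, sum XY.
totals = tuple(sum(column) for column in zip(*rows))

for row in rows:
    print(*row, sep="\t")
print(totals)  # (186, 128, 4414, 2080, 3006)
```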
$$r = \frac{n\sum XY - (\sum X)(\sum Y)}{\sqrt{[n\sum X^2 - (\sum X)^2][n\sum Y^2 - (\sum Y)^2]}}$$
$$r = \frac{8 \cdot 3\,006 - (186)(128)}{\sqrt{[8 \cdot 4\,414 - (186)^2][8 \cdot 2\,080 - (128)^2]}}$$
$$r = 0.561$$
$$t = r\sqrt{\frac{n-2}{1-r^2}}$$
$$t = 0.561\sqrt{\frac{8-2}{1-(0.561)^2}}$$
$$t = 1.660$$
Since 𝑡 = 1.660 > 1.440, it lies in the rejection region. Thus, we reject 𝐻0 . Therefore,
we say that there is a significant relationship between the independent variable
and the dependent variable.
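The values of 𝑟 and 𝑡 in this example can be reproduced from the column totals. The sketch below rounds 𝑟 to three decimal places before computing 𝑡, as the lesson does; using the unrounded 𝑟 gives a slightly smaller 𝑡 but the same decision. The critical value 1.440 is the one quoted in the lesson.

```python
import math

n = 8
sum_x, sum_y, sum_x2, sum_y2, sum_xy = 186, 128, 4414, 2080, 3006

# Correlation coefficient from the column totals.
numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

# Follow the lesson: round r first, then compute the test statistic.
r_rounded = round(r, 3)
t = r_rounded * math.sqrt((n - 2) / (1 - r_rounded ** 2))

critical = 1.440  # critical value used in the lesson for alpha = 0.2, df = 6
reject_h0 = abs(t) > critical
print(f"r = {r_rounded}, t = {t:.3f}, reject H0: {reject_h0}")
```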
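$$a = \frac{(\sum Y)(\sum X^2) - (\sum X)(\sum XY)}{n(\sum X^2) - (\sum X)^2}$$
$$a = \frac{128 \cdot 4\,414 - 186 \cdot 3\,006}{8 \cdot 4\,414 - (186)^2}$$
$$a = 8.207$$
$$b = \frac{n(\sum XY) - (\sum X)(\sum Y)}{n(\sum X^2) - (\sum X)^2}$$
$$b = \frac{8 \cdot 3\,006 - 186 \cdot 128}{8 \cdot 4\,414 - (186)^2}$$
$$b = 0.335$$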
The coefficient of determination, 𝑟², measures how well the regression line predicts
the value of 𝑌 from a given value of 𝑋. Since the correlation coefficient is 𝑟 = 0.561,
we square it to obtain the coefficient of determination.
$$r^2 = (0.561)^2$$
$$r^2 = 0.3147$$
Since 𝑟² = 0.3147, the regression line 𝑌 = 0.335𝑋 + 8.207 accounts for 31.47% of the
variation in 𝑌.
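To see what the regression line of Example 3 predicts, the rounded coefficients can be applied to each observed 𝑋. This is a sketch only: `predict` is an illustrative name, and "residual" here simply means the observed 𝑌 minus the predicted 𝑌.

```python
# Data from Example 3 and the rounded coefficients reported in the lesson.
xs = [21, 20, 29, 21, 24, 19, 27, 25]
ys = [12, 14, 17, 15, 18, 17, 18, 17]
A, B = 8.207, 0.335  # y-intercept and slope of Y = 0.335X + 8.207


def predict(x):
    """Predicted Y on the regression line for a given X."""
    return B * x + A


for x, y in zip(xs, ys):
    y_hat = predict(x)
    print(f"X = {x}: observed Y = {y}, predicted Y = {y_hat:.3f}, residual = {y - y_hat:+.3f}")

# The residuals of a least-squares line sum to roughly zero
# (not exactly zero here because A and B are rounded).
total_residual = sum(y - predict(x) for x, y in zip(xs, ys))
print(f"sum of residuals = {total_residual:.3f}")
```

The individual residuals are fairly large relative to the spread of 𝑌, which is consistent with the modest coefficient of determination.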
Problem 1
The correlation coefficient 𝑟 between an independent variable 𝑋 and a dependent
variable 𝑌 is 𝑟 = 0.95. Determine how well 𝑌 can be predicted from 𝑋 using the
regression line.
Solution:
To determine how well the dependent variable can be predicted from the
independent variable, we obtain the coefficient of determination by squaring the
correlation coefficient 𝑟 = 0.95.
$$r^2 = (0.95)^2$$
$$r^2 = 0.9025$$
Thus, 90.25% of the variation in 𝑌 is explained by 𝑋 through the regression line.
The remaining 9.75% is due to other factors.
Problem 2
Test the significance of the correlation coefficient 𝑟 = 0.62 of an independent variable
𝑋 and a dependent variable 𝑌 at 𝛼 = 0.05. The sample size is 𝑛 = 10.
Solution:
1. State the null and the alternative hypotheses.
$$t = r\sqrt{\frac{n-2}{1-r^2}}$$
$$t = (0.62)\sqrt{\frac{10-2}{1-(0.62)^2}}$$
$$t = 2.235$$
Since 𝑡 = 2.235 < 2.306, it does not lie in the rejection region. Thus, we fail to
reject 𝐻0 .
Problem 3
Eliza is going to perform a regression analysis on her collected data. She obtained a
correlation coefficient 𝑟 = 0.261 from the data that she took from a sample of size 26.
Is there a significant relationship between the two variables that she has at 𝛼 = 0.01?
Solution:
1. State the null and the alternative hypotheses.
$$t = r\sqrt{\frac{n-2}{1-r^2}}$$
$$t = (0.261)\sqrt{\frac{26-2}{1-(0.261)^2}}$$
$$t = 1.325$$
Since 𝑡 = 1.325 < 2.797, it does not lie in the rejection region. Thus, we fail to
reject 𝐻0 .
4. Ask each group to assign a representative to show their solution on the board and
discuss as a group how they came up with it.
5. Tell the students whether their answer and solution are correct. If there is a
misconception, give them the opportunity to work with their peers to re-analyze the
problem, and then guide them toward the correct answer.
Problem 4
Test the significance of the correlation coefficient 𝑟 = 0.43 of an independent variable
𝑋 and a dependent variable 𝑌 at 𝛼 = 0.05. The sample size is 𝑛 = 10.
Solution:
1. State the null and the alternative hypotheses.
$$t = r\sqrt{\frac{n-2}{1-r^2}}$$
$$t = (0.43)\sqrt{\frac{10-2}{1-(0.43)^2}}$$
$$t = 1.347$$
Since 𝑡 = 1.347 < 2.306, it does not lie in the rejection region. Thus, we fail to
reject 𝐻0 .
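Problems 2 to 4 follow the same procedure, so they can be checked together. In this sketch the critical values are the ones quoted in the lesson, stored in a small lookup table keyed by (𝛼, 𝑑𝑓); the names `CRITICAL_T` and `check_significance` are illustrative.

```python
import math

# Two-tailed critical values quoted in the lesson, keyed by (alpha, df).
CRITICAL_T = {
    (0.05, 8): 2.306,
    (0.01, 24): 2.797,
}


def check_significance(r, n, alpha):
    """Return (df, critical value, t, reject H0?) for a correlation coefficient."""
    df = n - 2
    critical = CRITICAL_T[(alpha, df)]
    t = r * math.sqrt(df / (1 - r ** 2))
    return df, critical, t, abs(t) > critical


# (r, n, alpha) as given in each problem.
problems = {
    "Problem 2": (0.62, 10, 0.05),
    "Problem 3": (0.261, 26, 0.01),
    "Problem 4": (0.43, 10, 0.05),
}

results = {name: check_significance(*args) for name, args in problems.items()}
for name, (df, critical, t, reject) in results.items():
    decision = "reject H0" if reject else "fail to reject H0"
    print(f"{name}: df = {df}, critical = ±{critical}, t = {t:.3f} -> {decision}")
```

In all three problems the statistic falls short of the critical value, so the null hypothesis is not rejected.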
Problem 5
Perform a regression analysis on the following data at 𝛼 = 0.05. Assume that 𝑋 is the
independent variable and 𝑌 is the dependent variable.
𝑿     𝒀
14    9
15    10
19    7
23    9
20    8
21    9
17    8
24    11
Solution:
1. Determine the dependent and the independent variables.
𝑿     𝒀     𝑿²    𝒀²    𝑿𝒀
14    9     196   81    126
15    10    225   100   150
19    7     361   49    133
23    9     529   81    207
20    8     400   64    160
21    9     441   81    189
17    8     289   64    136
24    11    576   121   264
∑𝑋 = 153   ∑𝑌 = 71   ∑𝑋² = 3 017   ∑𝑌² = 641   ∑𝑋𝑌 = 1 365
$$r = \frac{n\sum XY - (\sum X)(\sum Y)}{\sqrt{[n\sum X^2 - (\sum X)^2][n\sum Y^2 - (\sum Y)^2]}}$$
$$r = \frac{8 \cdot 1\,365 - (153)(71)}{\sqrt{[8 \cdot 3\,017 - (153)^2][8 \cdot 641 - (71)^2]}}$$
$$r = 0.227$$
$$t = r\sqrt{\frac{n-2}{1-r^2}}$$
$$t = 0.227\sqrt{\frac{8-2}{1-(0.227)^2}}$$
$$t = 0.571$$
Since 𝑡 = 0.571 < 2.447, it does not lie in the rejection region. Thus, we fail to reject
the null hypothesis. It follows that there is no significant relationship between the
independent variable and the dependent variable.
Performance Assessment
This performance assessment serves as a formative assessment, divided into three sets
based on the student's level of learning. Click on the link provided on the lesson page to
access each worksheet.
Teacher’s Notes
For a standard performance assessment, regardless of the student's level of learning,
you may give the problem items provided in the Check Your Understanding section of
the study guide.
Worksheet I
A.
1. 4.41%
2. 33.64%
3. 38.44%
4. 8.41%
5. 14.44%
B.
1. 𝑑𝑓 = 10
Rejection region: 𝑡 < −3.169, 𝑡 > 3.169
𝑡 = 1.068
failed to reject 𝐻0
2. 𝑑𝑓 = 8
Rejection region: 𝑡 < −2.306, 𝑡 > 2.306
𝑡 = 2.419
reject 𝐻0
3. 𝑑𝑓 = 13
Rejection region: 𝑡 < −2.160, 𝑡 > 2.160
𝑡 = 4.351
reject 𝐻0
Worksheet II
A.
1. 𝑑𝑓 = 13
Rejection region: 𝑡 < −2.160, 𝑡 > 2.160
𝑡 = 1.347
failed to reject 𝐻0
2. 𝑑𝑓 = 8
Rejection region: 𝑡 < −3.355, 𝑡 > 3.355
𝑡 = 1.863
failed to reject 𝐻0
3. 𝑑𝑓 = 10
Rejection region: 𝑡 < −2.228, 𝑡 > 2.228
𝑡 = 4.706
reject 𝐻0
B.
𝑿     𝒀     𝑿²      𝒀²      𝑿𝒀
24    21    576     441     504
20    42    400     1 764   840
30    34    900     1 156   1 020
48    29    2 304   841     1 392
56    46    3 136   2 116   2 576
∑𝑋 = 178   ∑𝑌 = 172   ∑𝑋² = 7 316   ∑𝑌² = 6 318   ∑𝑋𝑌 = 6 332
𝑟 = 0.333
𝑎 = 26.809
𝑏 = 0.213
𝑌 = 0.213𝑋 + 26.809
Worksheet III
A.
1. 𝑑𝑓 = 6
Rejection region: 𝑡 < −2.447, 𝑡 > 2.447
𝑡 = 1.234
failed to reject 𝐻0
2. 𝑑𝑓 = 8
Rejection region: 𝑡 < −2.306, 𝑡 > 2.306
𝑡 = 2.067
failed to reject 𝐻0
B.
1.
𝑿     𝒀     𝑿²      𝒀²      𝑿𝒀
24    51    576     2 601   1 224
30    42    900     1 764   1 260
40    44    1 600   1 936   1 760
48    59    2 304   3 481   2 832
60    66    3 600   4 356   3 960
∑𝑋 = 202   ∑𝑌 = 262   ∑𝑋² = 8 980   ∑𝑌² = 14 138   ∑𝑋𝑌 = 11 036
𝑟 = 0.779
𝑎 = 30.148
𝑏 = 0.551
𝑌 = 0.551𝑋 + 30.148
2.
𝑿     𝒀     𝑿²      𝒀²      𝑿𝒀
25    41    625     1 681   1 025
30    34    900     1 156   1 020
40    42    1 600   1 764   1 680
38    34    1 444   1 156   1 292
44    29    1 936   841     1 276
27    43    729     1 849   1 161
36    54    1 296   2 916   1 944
∑𝑋 = 240   ∑𝑌 = 277   ∑𝑋² = 8 530   ∑𝑌² = 11 363   ∑𝑋𝑌 = 9 398
𝑟 = −0.285
𝑎 = 50.848
𝑏 = −0.329
𝑌 = −0.329𝑋 + 50.848
Synthesis
Application and Values Integration
To integrate values and build a connection to the real world, ask students the following questions:
1. In research, when is it important to know the best line that would represent a set of data?
2. Why do you think variables need to have a significant correlation before finding their regression line?
Bridge to the Next Topic
To spark interest for the next lesson, ask students the following questions:
1. What does it mean when a line has a negative slope? Positive slope?
2. What does it mean when a set of data forms a straight line?
Bibliography
Foley, Ben. “What is Regression Analysis and Why Should I Use It?” Surveygizmo. Retrieved
17 September 2019 from https://bit.ly/2XAyMC4.
Glen, Stephanie. “Regression Analysis: Step by Step Articles, Videos, Simple Definitions.”
Statistics How To. Retrieved 17 September 2019 from
https://www.statisticshowto.com/probability-and-statistics/regression-analysis/.