HW12

The document outlines a homework assignment focused on visualizing and analyzing the relationship between weight, acceleration, and miles per gallon (mpg) using R programming. It includes tasks such as creating subsets for light and heavy cars, generating scatter plots with regression lines, and reporting regression summaries for both subsets. Additionally, it explores moderation effects through various regression models, including interaction terms and their correlations.

Uploaded by

kan000911

The answers are colored in red except the code and the plot.
109090023
Student number: 108099036
BACS - HW 12
Question 1) Let’s visualize how weight and acceleration are related to mpg.
a. Let’s visualize how weight might moderate the relationship between acceleration and
mpg:

library(magrittr)

## Warning: package 'magrittr' was built under R version 4.0.5

# Create a data.frame called cars_log with log-transformed columns for mpg, weight, and acceleration
# model_year and origin don't have to be transformed
auto <- read.table("C:/Users/user/Downloads/auto-data (2).txt", header = FALSE, na.strings = "?",
                   stringsAsFactors = FALSE)
names(auto) <- c("mpg", "cylinders", "displacement", "horsepower", "weight",
"acceleration", "model_year", "origin", "car_name")
cars_log <- with(auto, data.frame(log(mpg), log(weight), log(acceleration), model_year, origin))

i. Create two subsets of your data, one for light-weight cars (less than mean weight)
and one for heavy cars (higher than the mean weight)
HINT: consider carefully how you compare log weights to mean weight

# Use subset() to separate the two parts: subset(data frame, filter condition)
light_weight_cars <- subset(cars_log, log.weight. < log(mean(auto$weight)))
heavy_weight_cars <- subset(cars_log, log.weight. >= log(mean(auto$weight)))
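The hint matters because the mean of the logs is not the log of the mean. The sketch below uses a small made-up weight vector (hypothetical values, not the assignment's data) to show that comparing log weights against log(mean(weight)) reproduces the raw-scale split, while comparing against mean(log(weight)) can move a car into the other group.

```r
# Hypothetical weights for illustration only
weight <- c(1800, 2200, 2900, 3400, 4800)

mean_of_logs <- mean(log(weight))  # average taken on the log scale
log_of_mean  <- log(mean(weight))  # raw-scale mean, then moved to the log scale

# Split on the raw scale: the reference the question asks for
split_raw <- weight < mean(weight)

# Same split expressed on the log scale (log is monotonic, so it agrees)
split_log <- log(weight) < log_of_mean

# Splitting at the mean of the logs uses a lower cutoff and can disagree
split_mean_of_logs <- log(weight) < mean_of_logs

print(data.frame(weight, split_raw, split_log, split_mean_of_logs))
```

Here the 2900 car is lighter than the arithmetic mean but heavier than the cutoff implied by mean(log(weight)), so only the log(mean(weight)) comparison keeps it in the light group.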

ii. Create a single scatter plot of acceleration vs. mpg, with different colors and/or shapes for light
versus heavy cars

# Add a new column weight_level to cars_log
library(dplyr)

cars_log <- cars_log %>% mutate(weight_level = ifelse(log.weight. >= log(mean(auto$weight)),
                                                      "heavy", "light"))
library(ggplot2)

ggplot(data = cars_log, aes(x = log.acceleration., y = log.mpg., col = factor(weight_level))) +
  geom_point()

iii. Draw two slopes of acceleration-vs-mpg over the scatter plot:
one slope for light cars and one slope for heavy cars (distinguish them by appearance)

# Regression lines in ggplot: geom_smooth(); the color aesthetic gives one line per weight_level
ggplot(data = cars_log, aes(x = log.acceleration., y = log.mpg., col = factor(weight_level))) +
  geom_point() + geom_smooth(method = "lm", se = FALSE)

## `geom_smooth()` using formula 'y ~ x'



b. Report the full summaries of two separate regressions for light and heavy cars where
log.mpg. is dependent on log.weight., log.acceleration., model_year and origin

# log.mpg. is dependent on log.weight., log.acceleration., model_year and origin
heavy <- lm(data = heavy_weight_cars,
            log.mpg. ~ log.weight. + log.acceleration. + model_year + factor(origin))
light <- lm(data = light_weight_cars,
            log.mpg. ~ log.weight. + log.acceleration. + model_year + factor(origin))
summary(heavy)

##
## Call:
## lm(formula = log.mpg. ~ log.weight. + log.acceleration. + model_year +
## factor(origin), data = heavy_weight_cars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.36811 -0.06937 0.00607 0.06969 0.43736
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.188679 0.759983 9.459 < 2e-16 ***
## log.weight. -0.822352 0.077206 -10.651 < 2e-16 ***
## log.acceleration. 0.040140 0.057380 0.700 0.4852
## model_year 0.030317 0.003573 8.486 1.14e-14 ***
## factor(origin)2 0.091641 0.040392 2.269 0.0246 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1212 on 166 degrees of freedom
## Multiple R-squared: 0.7179, Adjusted R-squared: 0.7111
## F-statistic: 105.6 on 4 and 166 DF, p-value: < 2.2e-16

summary(light)

##
## Call:
## lm(formula = log.mpg. ~ log.weight. + log.acceleration. + model_year +
## factor(origin), data = light_weight_cars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.36464 -0.07181 0.00349 0.06273 0.31339
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.86661 0.52767 13.013 <2e-16 ***
## log.weight. -0.83437 0.05662 -14.737 <2e-16 ***
## log.acceleration. 0.10956 0.05630 1.946 0.0529 .
## model_year 0.03383 0.00198 17.079 <2e-16 ***
## factor(origin)2 0.05129 0.01980 2.590 0.0102 *
## factor(origin)3 0.02621 0.01846 1.420 0.1571
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1112 on 221 degrees of freedom
## Multiple R-squared: 0.7292, Adjusted R-squared: 0.7231
## F-statistic: 119 on 5 and 221 DF, p-value: < 2.2e-16

c. (not graded) Using your intuition only: What do you observe about light versus heavy
cars so far?
From the plot, the slopes for the two subsets look similar. However, light_weight_cars has more
samples (227 versus 171), so its coefficients may reach statistical significance more easily.

Question 2) Use the fully transformed dataset from above (cars_log) to test whether we have
moderation.
a. (not graded) Between weight and acceleration ability (in seconds), use your intuition and
experience to state which variable might be a moderating versus independent variable, in
affecting mileage.
Acceleration ability (in seconds) might be the moderating variable, with weight as the independent
variable affecting mileage.

b. Use various regression models to model the possible moderation on log.mpg.:
(use log.weight., log.acceleration., model_year and origin as independent variables)
i. Report a regression without any interaction terms
rgr <- lm(log.mpg. ~ log.weight. + log.acceleration. + model_year + factor(origin),
          data = cars_log)
summary(rgr)
##
## Call:
## lm(formula = log.mpg. ~ log.weight. + log.acceleration. + model_year +
## factor(origin), data = cars_log)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.38275 -0.07032 0.00491 0.06470 0.39913
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.431155 0.312248 23.799 < 2e-16 ***
## log.weight. -0.876608 0.028697 -30.547 < 2e-16 ***
## log.acceleration. 0.051508 0.036652 1.405 0.16072
## model_year 0.032734 0.001696 19.306 < 2e-16 ***
## factor(origin)2 0.057991 0.017885 3.242 0.00129 **
## factor(origin)3 0.032333 0.018279 1.769 0.07770 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1156 on 392 degrees of freedom
## Multiple R-squared: 0.8856, Adjusted R-squared: 0.8841
## F-statistic: 606.8 on 5 and 392 DF, p-value: < 2.2e-16

ii. Report a regression with an interaction between weight and acceleration

# Note: this model omits the model_year and origin terms used in (i)
rgr_inter <- lm(log.mpg. ~ log.weight. + log.acceleration. + log.weight.*log.acceleration.,
                data = cars_log)
summary(rgr_inter)
##
## Call:
## lm(formula = log.mpg. ~ log.weight. + log.acceleration. + log.weight. *
## log.acceleration., data = cars_log)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.49728 -0.10145 -0.01102 0.09665 0.56416
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 16.0249 3.6950 4.337 1.84e-05 ***
## log.weight. -1.6878 0.4578 -3.687 0.000259 ***
## log.acceleration. -1.8252 1.3537 -1.348 0.178351
## log.weight.:log.acceleration. 0.2529 0.1681 1.505 0.133123
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1613 on 394 degrees of freedom
## Multiple R-squared: 0.7763, Adjusted R-squared: 0.7746
## F-statistic: 455.7 on 3 and 394 DF, p-value: < 2.2e-16

iii. Report a regression with a mean-centered interaction term

log.weight.mc <- scale(cars_log$log.weight., center = TRUE, scale = FALSE)
log.acc.mc <- scale(cars_log$log.acceleration., center = TRUE, scale = FALSE)
rgr_mc <- lm(log.mpg. ~ log.weight.mc + log.acc.mc + log.weight.mc*log.acc.mc,
             data = cars_log)
summary(rgr_mc)
##
## Call:
## lm(formula = log.mpg. ~ log.weight.mc + log.acc.mc + log.weight.mc *
## log.acc.mc, data = cars_log)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.49728 -0.10145 -0.01102 0.09665 0.56416
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.106831 0.008857 350.791 < 2e-16 ***
## log.weight.mc -0.997466 0.031930 -31.239 < 2e-16 ***
## log.acc.mc 0.187500 0.051862 3.615 0.000339 ***
## log.weight.mc:log.acc.mc 0.252948 0.168071 1.505 0.133123
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1613 on 394 degrees of freedom
## Multiple R-squared: 0.7763, Adjusted R-squared: 0.7746
## F-statistic: 455.7 on 3 and 394 DF, p-value: < 2.2e-16
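The raw and mean-centered summaries report the same interaction estimate (0.2529, p = 0.133) and the same R-squared; only the intercept and main-effect rows change. The sketch below reproduces that pattern on R's built-in mtcars data, which is an assumed stand-in for the assignment's auto data (wt and qsec play the roles of weight and acceleration; the variable names are illustrative).

```r
# mtcars ships with R; it stands in for the assignment's auto data
d <- data.frame(
  log_mpg = log(mtcars$mpg),
  log_wt  = log(mtcars$wt),
  log_acc = log(mtcars$qsec)
)

# Raw interaction model
fit_raw <- lm(log_mpg ~ log_wt * log_acc, data = d)

# Mean-centered predictors (as.numeric drops the matrix attributes from scale())
d$log_wt_mc  <- as.numeric(scale(d$log_wt,  center = TRUE, scale = FALSE))
d$log_acc_mc <- as.numeric(scale(d$log_acc, center = TRUE, scale = FALSE))
fit_mc <- lm(log_mpg ~ log_wt_mc * log_acc_mc, data = d)

# The interaction slope and the overall fit are unchanged by centering
b_raw <- unname(coef(fit_raw)["log_wt:log_acc"])
b_mc  <- unname(coef(fit_mc)["log_wt_mc:log_acc_mc"])
print(c(raw = b_raw, centered = b_mc))
print(c(raw = summary(fit_raw)$r.squared, centered = summary(fit_mc)$r.squared))
```

Centering only shifts where the main effects are evaluated (at the mean of the other variable instead of at zero), which is why the main-effect estimates differ while the interaction row stays the same.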

iv. Report a regression with an orthogonalized interaction term

lw_x_lc <- cars_log$log.weight.*cars_log$log.acceleration.
inter_rgr <- lm(lw_x_lc~cars_log$log.weight.+cars_log$log.acceleration., data = cars_log)
inter_ortho <- inter_rgr$residuals
rgr_ortho <- lm(log.mpg. ~ log.weight. + log.acceleration. + inter_ortho, data = cars_log)
summary(rgr_ortho)
##
## Call:
## lm(formula = log.mpg. ~ log.weight. + log.acceleration. + inter_ortho,
## data = cars_log)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.49728 -0.10145 -0.01102 0.09665 0.56416
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 10.48669 0.33430 31.369 < 2e-16 ***
## log.weight. -1.00048 0.03187 -31.395 < 2e-16 ***
## log.acceleration. 0.21084 0.04949 4.260 2.56e-05 ***
## inter_ortho 0.25295 0.16807 1.505 0.133
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1613 on 394 degrees of freedom
## Multiple R-squared: 0.7763, Adjusted R-squared: 0.7746
## F-statistic: 455.7 on 3 and 394 DF, p-value: < 2.2e-16

c. For each of the interaction term strategies above (raw, mean-centered, orthogonalized)
what is the correlation between that interaction term and the two variables that you multiplied
together?
# raw
#the correlation between interaction term and weight
cor(cars_log$log.weight., cars_log$log.weight.*cars_log$log.acceleration.)
## [1] 0.1083055

#the correlation between interaction term and acceleration

cor(cars_log$log.acceleration., cars_log$log.weight.*cars_log$log.acceleration.)

## [1] 0.852881

#mean-centered

#the correlation between interaction term and weight


cor(log.weight.mc, log.weight.mc*log.acc.mc)

## [,1]
## [1,] -0.2026948

#the correlation between interaction term and acceleration

cor(log.acc.mc, log.weight.mc*log.acc.mc)

## [,1]
## [1,] 0.3512271

#orthogonalized interaction terms

#the correlation between interaction term and weight


cor(cars_log$log.weight., inter_ortho)

## [1] 2.468461e-17

#the correlation between interaction term and acceleration

cor(cars_log$log.acceleration., inter_ortho)
## [1] -6.804111e-17
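The orthogonalized correlations above are on the order of 1e-17, which is floating-point noise around zero: OLS residuals are uncorrelated with the regressors they were fitted on. A minimal sketch on the built-in mtcars data (an assumed stand-in for the auto data, with illustrative variable names) shows the same behavior.

```r
# mtcars ships with R; wt and qsec stand in for weight and acceleration
log_wt  <- log(mtcars$wt)
log_acc <- log(mtcars$qsec)

# Raw product term and its orthogonalized version:
# the residuals left after regressing the product on both components
product <- log_wt * log_acc
ortho   <- residuals(lm(product ~ log_wt + log_acc))

# Correlations of each version with the component variables
cor_raw   <- c(wt = cor(log_wt, product), acc = cor(log_acc, product))
cor_ortho <- c(wt = cor(log_wt, ortho),   acc = cor(log_acc, ortho))
print(cor_raw)
print(cor_ortho)
```

The raw product stays correlated with its components, while the residualized term is uncorrelated with both up to rounding error.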

Question 3) We saw earlier that the number of cylinders does not seem to directly influence mpg
when car weight is also considered. But might cylinders have an indirect relationship with mpg
through its weight?

Let’s check whether weight mediates the relationship between cylinders and mpg, even when
other factors are controlled for. Use log.mpg., log.weight., and log.cylinders as your main
variables, and keep log.acceleration., model_year, and origin as control variables (see gray
variables in diagram).
a. Let's try computing the direct effects first:
i. Model 1: Regress log.weight. over log.cylinders. only
(check whether number of cylinders has a significant direct effect on weight)
Yes, the number of cylinders has a significant direct effect on weight (slope 0.820, p < 2e-16).
cars_log <- with(auto, data.frame(log(mpg), log(weight), log(acceleration),log(cylinders),
model_year, origin))
weight_cylinder_regr <- lm(log.weight.~log.cylinders., data = cars_log)
summary(weight_cylinder_regr)
##
## Call:
## lm(formula = log.weight. ~ log.cylinders., data = cars_log)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.35473 -0.09076 -0.00147 0.09316 0.40374
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.60365 0.03712 177.92 <2e-16 ***
## log.cylinders. 0.82012 0.02213 37.06 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1329 on 396 degrees of freedom
## Multiple R-squared: 0.7762, Adjusted R-squared: 0.7757
## F-statistic: 1374 on 1 and 396 DF, p-value: < 2.2e-16

ii. Model 2: Regress log.mpg. over log.weight. and all control variables
(check whether weight has a significant direct effect on mpg with other variables statistically
controlled?)
Yes, weight has a significant direct effect on mpg with the other variables statistically
controlled (slope -0.877, p < 2e-16).
mpg_weight_regr <- lm(log.mpg.~log.weight. + log.acceleration. + model_year +
factor(origin), data = cars_log)
summary(mpg_weight_regr)
##
## Call:
## lm(formula = log.mpg. ~ log.weight. + log.acceleration. + model_year +
## factor(origin), data = cars_log)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.38275 -0.07032 0.00491 0.06470 0.39913
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.431155 0.312248 23.799 < 2e-16 ***
## log.weight. -0.876608 0.028697 -30.547 < 2e-16 ***
## log.acceleration. 0.051508 0.036652 1.405 0.16072
## model_year 0.032734 0.001696 19.306 < 2e-16 ***
## factor(origin)2 0.057991 0.017885 3.242 0.00129 **
## factor(origin)3 0.032333 0.018279 1.769 0.07770 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1156 on 392 degrees of freedom
## Multiple R-squared: 0.8856, Adjusted R-squared: 0.8841
## F-statistic: 606.8 on 5 and 392 DF, p-value: < 2.2e-16

b. What is the indirect effect of cylinders on mpg? (use the product of slopes between model
1 & 2)
weight_cylinder_regr$coefficients[2] * mpg_weight_regr$coefficients[2]
## log.cylinders.
## -0.7189275
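As a worked check of the product above, write a for the log.cylinders. slope in Model 1 and b for the log.weight. slope in Model 2 (a and b are labels introduced here, not names from the assignment):

```latex
\[
\text{indirect effect} = a \times b = 0.82012 \times (-0.876608) \approx -0.7189
\]
```

Because both slopes come from log-log relationships, they can be read as elasticities: a 1% increase in cylinders goes with roughly 0.82% more weight, a 1% increase in weight goes with roughly 0.88% lower mpg, so the path through weight corresponds to roughly 0.72% lower mpg per 1% more cylinders.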

c. Let's bootstrap for the confidence interval of the indirect effect of cylinders on mpg
i. Bootstrap regression models 1 & 2, and compute the indirect effect each time:
boot_mediation <- function(model1, model2, dataset) {
  # Resample rows with replacement
  boot_index <- sample(1:nrow(dataset), replace = TRUE)
  data_boot <- dataset[boot_index, ]
  # Refit both models on the resampled data
  regr1 <- lm(model1, data_boot)
  regr2 <- lm(model2, data_boot)
  # Indirect effect: product of the cylinders-to-weight and weight-to-mpg slopes
  return(regr1$coefficients[2] * regr2$coefficients[2])
}
set.seed(42)
indirect <- replicate(2000, boot_mediation(weight_cylinder_regr, mpg_weight_regr, cars_log))

What is the 95% CI of the indirect effect of log.cylinders. on log.mpg.?

quantile(indirect, probs=c(0.025, 0.975))

## 2.5% 97.5%
## -0.7784044 -0.6610106
ii. Show a density plot of the distribution of the 95% CI of the indirect effect
plot(density(indirect), main = "density plot")
abline(v=quantile(indirect, probs=c(0.025, 0.975)))
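For readers without the auto data file, the percentile-bootstrap procedure can be sketched on R's built-in mtcars data. This is an assumed stand-in, not the assignment's dataset: cyl, wt, and qsec play the roles of cylinders, weight, and acceleration, and the function and variable names are illustrative. The formulas are written out explicitly here instead of passing fitted model objects.

```r
# mtcars ships with R; it stands in for the assignment's auto data
d <- data.frame(
  log_mpg = log(mtcars$mpg),
  log_wt  = log(mtcars$wt),
  log_cyl = log(mtcars$cyl),
  log_acc = log(mtcars$qsec)
)

# One bootstrap replicate: resample rows, refit both models,
# and return the product of the two slopes of interest
boot_indirect <- function(data) {
  rows <- sample(nrow(data), replace = TRUE)
  resampled <- data[rows, ]
  a <- coef(lm(log_wt ~ log_cyl, data = resampled))["log_cyl"]
  b <- coef(lm(log_mpg ~ log_wt + log_acc, data = resampled))["log_wt"]
  unname(a * b)
}

set.seed(42)
indirect_demo <- replicate(2000, boot_indirect(d))

# Percentile 95% confidence interval of the indirect effect
ci <- quantile(indirect_demo, probs = c(0.025, 0.975))
print(ci)
```

If the interval excludes zero, the indirect path through weight is distinguishable from no effect at the 5% level, which is the same reading applied to the interval reported above.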
