HW12
The answers are colored in red except the code and the plot.
109090023
Student number: 108099036
BACS - HW 12
Question 1) Let’s visualize how weight and acceleration are related to mpg.
a. Let’s visualize how weight might moderate the relationship between acceleration and
mpg:
library(magrittr)
# Create a data.frame called cars_log with log-transformed columns for mpg, weight, and acceleration
# model_year and origin don’t have to be transformed
auto <- read.table("C:/Users/user/Downloads/auto-data (2).txt", header = FALSE, na.strings = "?",
                   stringsAsFactors = FALSE)
names(auto) <- c("mpg", "cylinders", "displacement", "horsepower", "weight",
                 "acceleration", "model_year", "origin", "car_name")
cars_log <- with(auto, data.frame(log(mpg), log(weight), log(acceleration), model_year, origin))
i. Create two subsets of your data, one for light-weight cars (less than mean weight)
and one for heavy cars (higher than the mean weight)
HINT: consider carefully how you compare log weights to mean weight
ii. Create a single scatter plot of acceleration vs. mpg, with different colors and/or shapes for light
versus heavy cars
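The split and the plot described above can be sketched as follows. This is only an illustration under stated assumptions: it uses R's built-in mtcars data as a stand-in (wt for weight, qsec for acceleration time) instead of the auto dataset, and the names demo_log, light_demo and heavy_demo are hypothetical. The point is the hint: log weights must be compared against log(mean(weight)), not against the raw mean.

```r
# Stand-in data: built-in mtcars (wt = weight, qsec = quarter-mile time),
# NOT the auto dataset used in this homework.
demo_log <- with(mtcars, data.frame(log(mpg), log(wt), log(qsec)))

# Weights are stored as logs, so the cutoff must be on the log scale too.
# Note: log(mean(wt)) is not the same value as mean(log(wt)).
log_mean_wt <- log(mean(mtcars$wt))

light_demo <- subset(demo_log, log.wt. <  log_mean_wt)
heavy_demo <- subset(demo_log, log.wt. >= log_mean_wt)

# Single scatter plot, colored by weight group
is_light <- demo_log$log.wt. < log_mean_wt
plot(demo_log$log.qsec., demo_log$log.mpg.,
     col = ifelse(is_light, "blue", "red"), pch = 19,
     xlab = "log(acceleration time)", ylab = "log(mpg)")
legend("topright", legend = c("light", "heavy"),
       col = c("blue", "red"), pch = 19)
```

Every row ends up in exactly one of the two groups, because the same cutoff is used for both subsets.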
b. Report the full summaries of two separate regressions for light and heavy cars where
log.mpg. is dependent on log.weight., log.acceleration., model_year and origin
##
## Call:
## lm(formula = log.mpg. ~ log.weight. + log.acceleration. + model_year +
## factor(origin), data = heavy_weight_cars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.36811 -0.06937 0.00607 0.06969 0.43736
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.188679 0.759983 9.459 < 2e-16 ***
## log.weight. -0.822352 0.077206 -10.651 < 2e-16 ***
## log.acceleration. 0.040140 0.057380 0.700 0.4852
## model_year 0.030317 0.003573 8.486 1.14e-14 ***
## factor(origin)2 0.091641 0.040392 2.269 0.0246 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1212 on 166 degrees of freedom
## Multiple R-squared: 0.7179, Adjusted R-squared: 0.7111
## F-statistic: 105.6 on 4 and 166 DF, p-value: < 2.2e-16
summary(light)
##
## Call:
## lm(formula = log.mpg. ~ log.weight. + log.acceleration. + model_year +
## factor(origin), data = light_weight_cars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.36464 -0.07181 0.00349 0.06273 0.31339
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.86661 0.52767 13.013 <2e-16 ***
## log.weight. -0.83437 0.05662 -14.737 <2e-16 ***
## log.acceleration. 0.10956 0.05630 1.946 0.0529 .
## model_year 0.03383 0.00198 17.079 <2e-16 ***
## factor(origin)2 0.05129 0.01980 2.590 0.0102 *
## factor(origin)3 0.02621 0.01846 1.420 0.1571
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1112 on 221 degrees of freedom
## Multiple R-squared: 0.7292, Adjusted R-squared: 0.7231
## F-statistic: 119 on 5 and 221 DF, p-value: < 2.2e-16
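As a rough illustration of how two such group-wise regressions can be fitted and compared, here is a minimal sketch. It assumes the built-in mtcars data as a stand-in for the auto dataset and leaves out model_year and origin, which mtcars does not have; fit_group, light_fit and heavy_fit are hypothetical names.

```r
# Stand-in data: built-in mtcars, NOT the auto dataset from this homework.
demo_log <- with(mtcars, data.frame(log(mpg), log(wt), log(qsec)))
is_light <- demo_log$log.wt. < log(mean(mtcars$wt))

# Same formula for both groups, so the coefficients are comparable
fit_group <- function(d) lm(log.mpg. ~ log.wt. + log.qsec., data = d)
light_fit <- fit_group(demo_log[is_light, ])
heavy_fit <- fit_group(demo_log[!is_light, ])

# Coefficients of the two models side by side
round(cbind(light = coef(light_fit), heavy = coef(heavy_fit)), 3)
```

Calling summary(light_fit) and summary(heavy_fit) would give the full tables, in the same layout as the output above.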
c. (not graded) Using your intuition only: What do you observe about light versus heavy
cars so far?
Judging from the plot, the slopes for the two groups look similar. However, there are more
observations in light_weight_cars, so effects in that group may reach significance more easily.
Question 2) Use the fully transformed dataset from above (cars_log) to test whether we have
moderation.
a. (not graded) Between weight and acceleration ability (in seconds), use your intuition and
experience to state which variable might be a moderating versus independent variable, in
affecting mileage.
Acceleration ability (in seconds) might be the moderating variable, leaving weight as the
independent variable affecting mileage.
c. For each of the interaction term strategies above (raw, mean-centered, orthogonalized)
what is the correlation between that interaction term and the two variables that you multiplied
together?
# raw
# correlation between the interaction term and weight
cor(cars_log$log.weight., cars_log$log.weight.*cars_log$log.acceleration.)
## [1] 0.1083055
# correlation between the interaction term and acceleration
cor(cars_log$log.acceleration., cars_log$log.weight.*cars_log$log.acceleration.)
## [1] 0.852881
# mean-centered
cor(log.weight.mc, log.weight.mc*log.acc.mc)
## [,1]
## [1,] -0.2026948
cor(log.acc.mc, log.weight.mc*log.acc.mc)
## [,1]
## [1,] 0.3512271
# orthogonalized
cor(cars_log$log.weight., inter_ortho)
## [1] 2.468461e-17
cor(cars_log$log.acceleration., inter_ortho)
## [1] -6.804111e-17
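The three interaction-term strategies compared above can be sketched in a self-contained way. This is an illustration only, assuming the built-in mtcars data as a stand-in for cars_log (wt for weight, qsec for acceleration time); all object names are hypothetical. It shows why the orthogonalized term has essentially zero correlation with both variables: it is the residual of the raw product after regressing it on them.

```r
# Stand-in data: built-in mtcars, NOT cars_log from this homework.
d <- with(mtcars, data.frame(log_wt = log(wt), log_acc = log(qsec)))

# 1. Raw: plain product of the two variables
inter_raw <- d$log_wt * d$log_acc

# 2. Mean-centered: center each variable first, then multiply
wt_mc    <- d$log_wt  - mean(d$log_wt)
acc_mc   <- d$log_acc - mean(d$log_acc)
inter_mc <- wt_mc * acc_mc

# 3. Orthogonalized: residuals of the raw product regressed on both variables
inter_ortho_demo <- residuals(lm(inter_raw ~ log_wt + log_acc, data = d))

# Correlation of each interaction term with the weight variable
c(raw      = cor(d$log_wt, inter_raw),
  centered = cor(wt_mc, inter_mc),
  ortho    = cor(d$log_wt, inter_ortho_demo))
```

OLS residuals are uncorrelated with the regressors by construction, so the orthogonalized correlations differ from zero only by floating-point error, which matches the e-17 values reported above.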
Question 3) We saw earlier that the number of cylinders does not seem to directly influence mpg
when car weight is also considered. But might cylinders have an indirect relationship with mpg
through its weight?
Let’s check whether weight mediates the relationship between cylinders and mpg, even when
other factors are controlled for. Use log.mpg., log.weight., and log.cylinders as your main
variables, and keep log.acceleration., model_year, and origin as control variables (see gray
variables in diagram).
a. Let’s try computing the direct effects first:
i. Model 1: Regress log.weight. over log.cylinders. only
(check whether number of cylinders has a significant direct effect on weight)
Yes, the number of cylinders has a significant direct effect on weight (slope 0.82, p < 2e-16).
cars_log <- with(auto, data.frame(log(mpg), log(weight), log(acceleration),log(cylinders),
model_year, origin))
weight_cylinder_regr <- lm(log.weight.~log.cylinders., data = cars_log)
summary(weight_cylinder_regr)
##
## Call:
## lm(formula = log.weight. ~ log.cylinders., data = cars_log)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.35473 -0.09076 -0.00147 0.09316 0.40374
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.60365 0.03712 177.92 <2e-16 ***
## log.cylinders. 0.82012 0.02213 37.06 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1329 on 396 degrees of freedom
## Multiple R-squared: 0.7762, Adjusted R-squared: 0.7757
## F-statistic: 1374 on 1 and 396 DF, p-value: < 2.2e-16
b. What is the indirect effect of cylinders on mpg? (use the product of slopes between model
1 & 2)
weight_cylinder_regr$coefficients[2] * mpg_weight_regr$coefficients[2]
## log.cylinders.
## -0.7189275
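The product-of-slopes idea can be illustrated with a minimal, self-contained sketch. It assumes the built-in mtcars data as a stand-in for cars_log, and it assumes a Model 2 with no control variables, which may differ from the Model 2 used in this homework; model1_demo, model2_demo and indirect_demo are hypothetical names.

```r
# Stand-in data: built-in mtcars, NOT cars_log from this homework.
m <- with(mtcars, data.frame(log_mpg = log(mpg),
                             log_wt  = log(wt),
                             log_cyl = log(cyl)))

# Path a: cylinders -> weight
model1_demo <- lm(log_wt ~ log_cyl, data = m)
# Path b: weight -> mpg (assumption: no control variables in this sketch)
model2_demo <- lm(log_mpg ~ log_wt, data = m)

# Indirect effect of cylinders on mpg through weight: a * b
indirect_demo <- unname(coef(model1_demo)["log_cyl"] * coef(model2_demo)["log_wt"])
indirect_demo
```

Selecting the slopes by name instead of by position avoids picking the wrong coefficient if the order of terms in a formula changes.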
c. Let’s bootstrap for the confidence interval of the indirect effect of cylinders on mpg
i. Bootstrap regression models 1 & 2, and compute the indirect effect each time:
boot_mediation <- function(model1, model2, dataset) {
  # Resample rows with replacement
  boot_index <- sample(1:nrow(dataset), replace = TRUE)
  data_boot <- dataset[boot_index, ]
  # Refit both models on the bootstrap sample
  regr1 <- lm(model1, data_boot)
  regr2 <- lm(model2, data_boot)
  # Indirect effect: product of the two slopes
  return(regr1$coefficients[2] * regr2$coefficients[2])
}
set.seed(42)
indirect <- replicate(2000, boot_mediation(weight_cylinder_regr, mpg_weight_regr, cars_log))
quantile(indirect, probs = c(0.025, 0.975))
## 2.5% 97.5%
## -0.7784044 -0.6610106
ii. Show a density plot of the distribution of the 95% CI of the indirect effect
plot(density(indirect), main = "density plot")
abline(v=quantile(indirect, probs=c(0.025, 0.975)))
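A self-contained variant of the bootstrap above, which also checks whether the 95% interval excludes zero, might look like the sketch below. It assumes the built-in mtcars data as a stand-in for cars_log and a Model 2 without control variables; boot_indirect, indirect_boot and ci are hypothetical names, and the formulas are written out explicitly instead of being passed in as fitted models.

```r
# Stand-in data: built-in mtcars, NOT cars_log from this homework.
m <- with(mtcars, data.frame(log_mpg = log(mpg),
                             log_wt  = log(wt),
                             log_cyl = log(cyl)))

boot_indirect <- function(data) {
  # Resample rows with replacement
  b <- data[sample(nrow(data), replace = TRUE), ]
  # Refit both paths on the bootstrap sample
  a_path <- coef(lm(log_wt ~ log_cyl, data = b))["log_cyl"]
  b_path <- coef(lm(log_mpg ~ log_wt, data = b))["log_wt"]
  unname(a_path * b_path)
}

set.seed(1)
indirect_boot <- replicate(1000, boot_indirect(m))

# Percentile 95% confidence interval of the indirect effect
ci <- quantile(indirect_boot, probs = c(0.025, 0.975))
ci

# TRUE when both bounds are on the same side of zero
excludes_zero <- unname(ci[1] * ci[2] > 0)
excludes_zero
```

If the interval excludes zero, the indirect effect is considered significant at the 5% level, which is how the interval reported above (both bounds negative) would be read.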