This tutorial covers hypothesis testing and interpreting interaction effects using tidy regression output in R, specifically the broom and margins packages.
Consider this example: Women at the Deer Valley Utility Company claim that the company does not reward their job performances to the same degree as the job performances of men. The Director of the Office of Equal Opportunity wants to know if there is statistical evidence to support this complaint.
We have the following data for 60 people:
Salary: thousands of dollars.
Gender: “1” for men and “0” for women.
Rating: The employee’s average performance rating over the last two years. The scale has a top score of 100. The company claims that performance rating is the primary factor in the determination of salary.
Credits: earned either in college courses or company programs.
Note: The R Markdown file that made this document is available online. You can reproduce and/or edit it by cloning this repository: https://github.com/judgelord/813/
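The code below assumes a setup chunk along these lines (a minimal sketch; the tidyverse, magrittr, broom, and margins packages are used throughout):
# setup: packages used throughout this tutorial
library(tidyverse) # ggplot2, dplyr, etc.
library(magrittr)  # the %<>% assignment pipe
library(broom)     # tidy(), augment(), glance()
library(margins)   # margins() and cplot()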
load("data/EX3.Rdata") # loads data as "d"
First, let’s examine the raw data.
# scatterplots
ggplot(d) +
aes(x = Rating, y = Salary) +
geom_point(aes(alpha = Credits))
# add color by Gender, and save plot as p for future use
p <- ggplot(d) +
aes(x = Rating, y = Salary, color = Gender) +
geom_point(aes(alpha = Credits)) + scale_color_discrete()
p
# means per group
p + geom_hline(aes(yintercept = mean_salary, color = Gender),
               data = d %>% group_by(Gender) %>% summarise(mean_salary = mean(Salary)))
m <- lm(Salary ~ Gender + Rating, data = d) %>%
augment()
p + geom_line(aes(y = m$.fitted))
# quick y ~ mx + b linear regression per group
p + geom_smooth(method = lm, se = F, fullrange = T)
H1: Job performances of women are rewarded differently than the job performances of men. That is, the relationship between salary and performance differs by gender.
H0: There is no difference in how men’s performance and women’s performance are rewarded. That is, the relationship between salary and performance does not differ by gender.
(There are at least two other ways to write this hypothesis and at least one slightly different hypothesis that might better address the question.)
The dependent variable is salary. For employee \(i\), let their salary be \(y_i\) in the model \(y_i = \beta_0 + ... + \epsilon_i\). \(\beta_0\) is the predicted salary, \(\hat{y}\), when all other variables in the model are 0.
Does the model, \(y_i = \beta_0 + \beta_1*Gender_i + \epsilon_i\), test the relationship of interest?
model <- lm(Salary ~ Gender, data = d)
m <- model %>%
tidy(conf.int = TRUE)
m
## # A tibble: 2 x 7
## term estimate std.error statistic p.value conf.low conf.high
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 62.8 0.875 71.7 2.47e-58 61.0 64.5
## 2 GenderMen 7.43 1.20 6.20 6.39e- 8 5.03 9.83
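Because the only predictor is binary, the fitted values are just the two group means: \(\hat{y}_{Women} = \beta_0 = 62.8\) and \(\hat{y}_{Men} = \beta_0 + \beta_1 = 62.8 + 7.43 = 70.2\) thousand dollars. Compare these to the t-test estimates below.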
ggplot(m %>% filter(term != "(Intercept)")) +
aes(x = term,
y = estimate,
ymin = conf.low,
ymax = conf.high) +
geom_pointrange() +
geom_hline(yintercept = 0, color = "grey") +
coord_flip() +
labs(x="", y="OLS Estimate")
# illustrating with yhat formula; more easily done with augment()
b0 <- m$estimate[1]
b1 <- m$estimate[2]
p +
geom_line(aes(color = "Men", # yhat for men
y = b0 + b1*1) ) +
geom_line(aes(color = "Women", # yhat for women
y = b0 + b1*0) ) +
geom_ribbon(aes(ymax = b0 + b1*1,
ymin = b0 + b1*0), alpha = .1, color = NA)
Basically, a t-test (compare the model output to a simple t-test of the difference in mean Salary by Gender).
# tidy model output object
m
## # A tibble: 2 x 7
## term estimate std.error statistic p.value conf.low conf.high
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 62.8 0.875 71.7 2.47e-58 61.0 64.5
## 2 GenderMen 7.43 1.20 6.20 6.39e- 8 5.03 9.83
# t-test
t.test(Salary ~ Gender, data = d) %>% tidy()
## # A tibble: 1 x 10
## estimate estimate1 estimate2 statistic p.value parameter conf.low
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 -7.43 62.8 70.2 -6.38 4.39e-8 53.5 -9.77
## # … with 3 more variables: conf.high <dbl>, method <chr>,
## # alternative <chr>
Does the model, \(y_i = \beta_0 + \beta_1*Gender_i + \beta_2*Rating_i + \epsilon_i\), test the relationship of interest?
model_1 <- lm(Salary ~ Gender + Rating, data = d)
m1 <- model_1 %>%
tidy(conf.int = TRUE)
m1
## # A tibble: 3 x 7
## term estimate std.error statistic p.value conf.low conf.high
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 44.7 1.10 40.6 9.61e-44 42.5 46.9
## 2 GenderMen 6.57 0.485 13.5 1.91e-19 5.60 7.54
## 3 Rating 0.257 0.0149 17.3 2.20e-24 0.228 0.287
ggplot(m1 %>% filter(term != "(Intercept)")) +
aes(x = term,
y = estimate,
ymin = conf.low,
ymax = conf.high) +
geom_pointrange() +
geom_hline(yintercept = 0, color = "grey") +
coord_flip() +
labs(x="", y="OLS Estimate")
# illustrating with yhat formula; more easily done with augment()
b0 <- m1$estimate[1]
b1 <- m1$estimate[2]
b2 <- m1$estimate[3]
p +
geom_line(aes(color = "Men", # yhat for men
y = b0 + b1*1 + b2*Rating) ) +
geom_line(aes(color = "Women", # yhat for women
y = b0 + b1*0 + b2*Rating) ) +
geom_ribbon(aes(ymax = b0 + b1*1+ b2*Rating,
ymin = b0 + b1*0+ b2*Rating), alpha = .1, color = NA)
Why does this model fail to test the hypothesis? What hypothesis did it test? How should we interpret the coefficient of 6.6 on Gender? How should we interpret the coefficient of 0.26 on Rating?
m1 <- augment(model_1)
p +
geom_line(aes(y = m1$.fitted)) + # with .fitted from augment()
scale_x_continuous(limits = c(-20, max(d$Rating))) +
scale_y_continuous(limits = c(b0-5, max(d$Salary))) +
#geom_hline(yintercept = b0) +
# (check_overlap is a geom_text() argument; geom_label() ignores it, so it is dropped here)
geom_label(aes(color = "Men"), x = 0, y = b0 + b1, label = expression(beta[0]+beta[1]*1), hjust = 0, vjust = 0, show.legend = FALSE) +
geom_label(aes(color = "Women"), x = 0, y = b0, label = expression(beta[0]+beta[1]*0), hjust = 0, vjust = 1, show.legend = FALSE) +
geom_label(aes(color = "Men"), x = 0, y = b0 + b1, label = round(b0 + b1, 1), hjust = 1, color = "black", show.legend = FALSE) +
geom_label(aes(color = "Women"), x = 0, y = b0, label = round(b0, 1), hjust = 1, color = "black", show.legend = FALSE) +
geom_point(aes(color = "Men"), x = 0, y = b0 + b1, shape = 1)+
geom_point(aes(color = "Women"), x = 0, y = b0, shape = 1)
Let’s also plot the residuals. Aside from interpretation, we want to know where our model is a better or worse fit with the data, especially if residuals seem to vary systematically over the range of our data. augment computes tidy residuals, among other cool things.
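For example, a quick peek at the columns augment() appends (a sketch; the column selection is illustrative):
augment(model_1) %>%
  select(Salary, Gender, Rating, .fitted, .resid) %>%
  head()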
p +
geom_line(aes(y = m1$.fitted)) + # with .fitted from augment()
geom_point(aes(y = m1$.fitted), shape = 1, alpha = .2) + # with .fitted from augment()
geom_segment(aes(xend = Rating, yend = m1$.fitted ), alpha = .2, size = 2)
ggplot(m1) +
aes(y = .resid, x = Rating) +
geom_point(aes(color = Gender)) +
scale_color_discrete() +
## to show how residuals are the distance between an
## observation and the regression line:
geom_hline(yintercept = 0, color = "dark grey") +
geom_text(x= mean(m1$Rating), y = 0,
label = "Regression line") +
geom_col(aes(fill = Gender), alpha = .2, position ="identity") +
## + labels:
labs(title = "Residuals (Observed - Predicted Salary)",
y = "Residuals (in thousands of dollars)")
The model, \(y_i = \beta_0 + \beta_1*Gender_i + \beta_2*Rating_i + \beta_3*Gender_i*Rating_i + \epsilon_i\), does test the relationship of interest: how gender may affect the relationship between performance and pay. That is, is there a significant interaction of gender and performance on predicted pay?
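Equivalently, the model fits a separate line for each gender: for women (Gender = 0), \(\hat{y} = \beta_0 + \beta_2*Rating\); for men (Gender = 1), \(\hat{y} = (\beta_0 + \beta_1) + (\beta_2 + \beta_3)*Rating\). \(\beta_3\) is thus the difference in slopes, which is exactly what H1 concerns.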
## Note: when we include the interaction, lm() adds the direct effects
model_2 <- lm(Salary ~ Gender*Rating, data = d)
m2 <- model_2 %>%
tidy(conf.int = TRUE)
m2
## # A tibble: 4 x 7
## term estimate std.error statistic p.value conf.low conf.high
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 49.5 1.21 40.8 2.50e-43 47.0 51.9
## 2 GenderMen -3.25 1.75 -1.85 6.92e- 2 -6.76 0.264
## 3 Rating 0.189 0.0168 11.3 5.04e-16 0.156 0.223
## 4 GenderMen:Rating 0.137 0.0238 5.74 4.02e- 7 0.0889 0.184
ggplot(m2 %>% filter(term != "(Intercept)")) +
aes(x = term,
y = estimate,
ymin = conf.low,
ymax = conf.high) +
geom_pointrange() +
geom_hline(yintercept = 0, color = "grey") +
coord_flip() +
labs(x="", y="OLS Estimate")
# illustrating with yhat equation; more easily done with augment()
b0 <- m2$estimate[1]
b1 <- m2$estimate[2]
b2 <- m2$estimate[3]
b3 <- m2$estimate[4]
p +
geom_line(aes(color = "Men", # yhat for men
y = b0 + b1*1 + b2*Rating + b3*1*Rating) ) +
geom_line(aes(color = "Women", # yhat for women
y = b0 + b1*0 + b2*Rating+ b3*0*Rating) ) +
geom_ribbon(aes(ymax = b0 + b1*1+ b2*Rating+ b3*1*Rating,
ymin = b0 + b1*0+ b2*Rating+ b3*0*Rating), alpha = .1, color = NA)
How should we interpret a \(\beta_0\) of 49.47?
How should we interpret the coefficient of -3.25 on Gender?
How should we interpret the coefficient of 0.189 on Rating?
How should we interpret the coefficient of 0.137 on Gender*Rating?
This model seems to fit the data well, for this range of Ratings, but do we have theoretical reasons for suspicion? Why or why not? If so, how might we adjust our data or model to correct this?
m2 <- augment(model_2)
p +
geom_line(aes(y = m2$.fitted)) + # with .fitted from augment()
scale_x_continuous(limits = c(0, max(d$Rating))) +
scale_y_continuous(limits = c(45, max(d$Salary))) +
geom_hline(yintercept = b0) +
geom_label(aes(color = "Men"), x = 0, y = b0 + b1, label = expression(beta[0]+beta[1]*1), hjust = 0, vjust = 1, show.legend = FALSE)+
geom_label(aes(color = "Women"), x = 0, y = b0, label = expression(beta[0]+beta[1]*0), hjust = 0,vjust = 0, show.legend = FALSE)+
geom_point(aes(color = "Men"), x = 0, y = b0 + b1, shape = 1)+
geom_point(aes(color = "Women"), x = 0, y = b0, shape = 1)
What is a more meaningful interpretation?
The Rating terms contribute \(\beta_2*Rating + \beta_3*Gender*Rating\) to \(\hat{y}\), so predicted salary changes by \(\beta_2 + \beta_3*Gender\) per Rating unit.
For every additional performance rating point, women get paid an additional $189 and men get paid an additional $326.
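We can verify those slopes from the fitted coefficients (a quick sketch, using the coefficient names lm() assigns):
# slope of Salary on Rating, by group
b <- coef(model_2)
b["Rating"]                         # women: ~0.189
b["Rating"] + b["GenderMen:Rating"] # men:   ~0.326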
We can still calculate an overall effect of Gender or of Rating. When there is an interaction term in the model, this is called the marginal effect. The margins package calculates confidence intervals on marginal effects.
The Average Marginal Effect is the average change in y when x increases by one unit. Note the similar effect size as we found for the coefficient on Gender in the model with no interaction term. In theory, how are these estimates similar? How are they different?
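To build intuition, here is a by-hand sketch of the AME of Gender, roughly what margins() computes for a factor: predict every employee's salary as a man and as a woman, then average the difference.
pred_men   <- predict(model_2, newdata = mutate(d, Gender = "Men"))
pred_women <- predict(model_2, newdata = mutate(d, Gender = "Women"))
mean(pred_men - pred_women) # ~6.58, matching the AME below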
marginal_effects <- margins(model_2)
summary(marginal_effects)
## factor AME SE z p lower upper
## GenderMen 6.5831 0.3886 16.9386 0.0000 5.8213 7.3448
## Rating 0.2621 0.0119 21.9875 0.0000 0.2387 0.2855
me <- as_tibble(summary(marginal_effects))
ggplot(me) +
aes(x = factor,
y = AME,
ymin = lower,
ymax = upper)+
geom_pointrange() +
geom_hline(yintercept = 0, color = "gray80") +
coord_flip() +
labs(x = NULL, y = "Average Marginal Effect")
## use the cplot method in the margins library to do the work of calculating effects but without drawing its plot.
cplot_points <- cplot(model_2, x = "Gender", draw = F)
## xvals yvals upper lower
## 1 Women 63.10542 63.66189 62.54896
## 2 Men 69.68850 70.20867 69.16833
ggplot(data = cplot_points) +
aes(x = reorder(xvals, yvals),
y = yvals,
ymin = lower,
ymax = upper) +
geom_pointrange() +
coord_flip() +
labs(x = NULL, y = "Predicted Salary (thousands of dollars)")
Let’s also plot the residuals. Aside from interpretation, we want to know where our model is a better or worse fit with the data, especially if residuals seem to vary systematically over the range of our data. augment computes tidy residuals, among other cool things.
p +
geom_line(aes(y = m2$.fitted)) + # with .fitted from augment()
geom_point(aes(y = m2$.fitted), shape = 1, alpha = .2) + # with .fitted from augment()
geom_segment(aes(xend = Rating, yend = m2$.fitted ), alpha = .2, size = 2)
ggplot(m2) +
aes(y = .resid, x = Rating) +
geom_point(aes(color = Gender)) +
scale_color_discrete() +
## to show how residuals are the distance between an
## observation and the regression line:
geom_hline(yintercept = 0, color = "dark grey") +
geom_text(x= mean(m2$Rating), y = 0,
label = "Regression line") +
geom_col(aes(fill = Gender), alpha = .2, position ="identity") +
## + labels:
labs(title = "Residuals (Observed - Predicted Salary)",
y = "Residuals (in thousands of dollars)")
Testing the hypothesis that adding the interaction term affects goodness of fit against the null hypothesis that the two models are equivalent:
library(lmtest)
lrtest(model_1, model_2)
## Likelihood ratio test
##
## Model 1: Salary ~ Gender + Rating
## Model 2: Salary ~ Gender * Rating
## #Df LogLik Df Chisq Pr(>Chisq)
## 1 4 -121.03
## 2 5 -107.15 1 27.758 1.375e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
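The test statistic is twice the difference in log-likelihoods, \(2 \times (-107.15 - (-121.03)) = 27.76\), compared against a \(\chi^2\) distribution with 1 degree of freedom (the one added parameter), so we reject the null that the two models fit equally well.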
summary() returns some statistics about our model. R-squared (\(1 - RSS/TSS\)) indicates how much of the variance in salary our models explain. The F statistic is the mean regression sum of squares divided by the mean error sum of squares, so a larger F means more variance explained by the model and/or less unexplained variance. Its value ranges from zero (a model that explains nothing compared to how much it does not explain) to an arbitrarily large number (a model that explains a lot compared to how much it does not explain). The p-value is for the null hypothesis that the full model explains nothing (i.e., that all of the regression coefficients are zero).
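In symbols, for a model with \(p\) predictors and \(n\) observations, \(F = \frac{(TSS - RSS)/p}{RSS/(n - p - 1)}\).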
Which model has a higher R-Squared? Why? Is a higher R-Squared value always better?
summary(model_1)
##
## Call:
## lm(formula = Salary ~ Gender + Rating, data = d)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.8982 -1.5364 0.0581 1.3178 4.4643
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 44.68752 1.10151 40.57 <2e-16 ***
## GenderMen 6.56843 0.48546 13.53 <2e-16 ***
## Rating 0.25737 0.01485 17.33 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.866 on 57 degrees of freedom
## Multiple R-squared: 0.904, Adjusted R-squared: 0.9006
## F-statistic: 268.4 on 2 and 57 DF, p-value: < 2.2e-16
summary(model_2)
##
## Call:
## lm(formula = Salary ~ Gender * Rating, data = d)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.96811 -1.06116 0.07077 1.17007 2.72592
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 49.47020 1.21324 40.775 < 2e-16 ***
## GenderMen -3.24973 1.75423 -1.853 0.0692 .
## Rating 0.18929 0.01680 11.270 5.04e-16 ***
## GenderMen:Rating 0.13650 0.02378 5.739 4.02e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.494 on 56 degrees of freedom
## Multiple R-squared: 0.9396, Adjusted R-squared: 0.9363
## F-statistic: 290.2 on 3 and 56 DF, p-value: < 2.2e-16
glance() gives us even more statistics with which to compare model fit. AIC and BIC are penalized-likelihood criteria; they penalize models for adding more variables (“measure of fit + complexity penalty”) because adding more variables will usually give a better fit for any sample of data, but possibly a worse model for the broader population (and for parsimony of theory!). Thus, lower AIC and BIC are better.
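As a sketch of where those numbers come from (using model_2; k counts all estimated parameters, including the error variance):
k <- attr(logLik(model_2), "df") # 5 parameters
n <- nobs(model_2)               # 60 observations
-2 * as.numeric(logLik(model_2)) + 2 * k      # AIC, ~224
-2 * as.numeric(logLik(model_2)) + log(n) * k # BIC, ~235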
glance(model_1)
## # A tibble: 1 x 11
## r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC
## <dbl> <dbl> <dbl> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
## 1 0.904 0.901 1.87 268. 9.83e-30 3 -121. 250. 258.
## # … with 2 more variables: deviance <dbl>, df.residual <int>
glance(model_2)
## # A tibble: 1 x 11
## r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC
## <dbl> <dbl> <dbl> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
## 1 0.940 0.936 1.49 290. 4.41e-34 4 -107. 224. 235.
## # … with 2 more variables: deviance <dbl>, df.residual <int>
Suppose, rather than forcing a binary response, the company surveyed gender as a continuous variable, asking employees to report the extent to which their gender performance conformed with hegemonic masculinity on a scale from 0-100.
The data:
library(truncnorm)
## recode gender as truncated normal,
## with the "Men" and "Women" means at the upper and lower quartiles
## and 0 or 100 being two standard deviations from each mean, respectively
d %<>% mutate(Masculinity = ifelse(Gender == "Men",
rtruncnorm(60, a=0, b=100, mean = 75, sd = 12.5) ,
rtruncnorm(60, a=0, b=100, mean = 25, sd = 12.5)
))
library(ggridges)
ggplot(d) +
aes(x = Masculinity, fill = ..x.., y = 0) +
geom_density_ridges_gradient(scale = 3, rel_min_height = 0.01) +
labs(y = "Density", x = "Hegemonic Masculinity Conformance") +
guides(fill = "none")
# scatterplot
p <- ggplot(d) +
aes(x = Rating, y = Salary, color = Masculinity) +
geom_point() + scale_color_viridis_c()
p
The model: \(y_i = \beta_0 + \beta_1*Masculinity_i + \beta_2*Rating_i + \beta_3*Masculinity_i*Rating_i + \epsilon_i\)
## Note: when we include the interaction, lm() adds the direct effects
model_3 <- lm(Salary ~ Masculinity*Rating, data = d)
m3 <- model_3 %>%
tidy(conf.int = TRUE)
m3
## # A tibble: 4 x 7
## term estimate std.error statistic p.value conf.low conf.high
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 49.2 2.69 18.3 2.75e-25 4.38e+1 54.6
## 2 Masculinity -0.0402 0.0481 -0.835 4.07e- 1 -1.37e-1 0.0562
## 3 Rating 0.160 0.0372 4.29 7.03e- 5 8.52e-2 0.234
## 4 Masculinity:Rat… 0.00215 0.000660 3.26 1.92e- 3 8.27e-4 0.00347
ggplot(m3 %>% filter(term != "(Intercept)")) +
aes(x = term,
y = estimate,
ymin = conf.low,
ymax = conf.high) +
geom_pointrange() +
geom_hline(yintercept = 0, color = "grey") +
coord_flip() +
labs(x="", y="OLS Estimate")
# illustrating with yhat formula; more easily done with augment()
b0 <- m3$estimate[1]
b1 <- m3$estimate[2]
b2 <- m3$estimate[3]
b3 <- m3$estimate[4]
# We can pick a few values (e.g., quartiles)
p +
geom_line(aes(linetype = "25% Masculinity",
y = b0 + b1*25 + b2*Rating+ b3*25*Rating), color = "black") +
geom_line(aes(linetype = "75% Masculinity",
y = b0 + b1*75 + b2*Rating + b3*75*Rating), color = "black" ) +
geom_ribbon(aes(ymax = b0 + b1*75+ b2*Rating+ b3*75*Rating,
ymin = b0 + b1*25+ b2*Rating+ b3*25*Rating), alpha = .1, color = NA)
# Or we can pick a bunch of values (e.g., all combinations of values present in our data)
m3 <- augment(model_3) %>%
# a unique id for each obs
mutate(id = row_number())
# add an intercept observation for each person
m3intercept <- m3 %>% full_join(m3 %>% mutate(.fitted = b0 + Masculinity*b1,
Salary = b0 + Masculinity*b1,
Rating = 0))
ggplot(m3intercept) +
aes(x = Rating, y = Salary, color = Masculinity) +
geom_line(aes(y = .fitted, group = id))+
geom_point() +
scale_color_viridis_c()
How should we interpret a \(\beta_0\) of 49.188?
How should we interpret the coefficient of -0.04 on Masculinity?
How should we interpret the coefficient of 0.16 on Rating?
How should we interpret the coefficient of 0.002 on Masculinity*Rating?
The Rating terms contribute \(\beta_2*Rating + \beta_3*Masculinity*Rating\) to \(\hat{y}\), so predicted salary changes by \(\beta_2 + \beta_3*Masculinity\) per Rating unit.
For every additional performance rating point, a person reporting no conformance with hegemonic masculinity gets paid an additional $160 and a person reporting perfect conformance with hegemonic masculinity gets paid an additional $375.
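The same check from the fitted coefficients (a sketch; 0 and 100 are the endpoints of the Masculinity scale):
b <- coef(model_3)
b["Rating"] + b["Masculinity:Rating"] * 0   # slope at Masculinity = 0:   ~0.160
b["Rating"] + b["Masculinity:Rating"] * 100 # slope at Masculinity = 100: ~0.375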
We can still calculate an overall effect of Masculinity or of Rating. When there is an interaction term in the model, this is called the marginal effect. The margins package calculates confidence intervals on marginal effects.
The Average Marginal Effect is the average change in y when x increases by one unit. Compare these estimates to the coefficients in the model above. In theory, how are these estimates similar? How are they different?
Notice that the AME of Masculinity is significantly different from 0, even though the direct effect was not.
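For a continuous regressor, the AME is the average of the observation-level derivatives, here \(\beta_1 + \beta_3*Rating_i\); a by-hand sketch:
b <- coef(model_3)
mean(b["Masculinity"] + b["Masculinity:Rating"] * d$Rating) # ~0.115, the AME below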
marginal_effects <- margins(model_3)
summary(marginal_effects)
## factor AME SE z p lower upper
## Masculinity 0.1146 0.0114 10.0823 0.0000 0.0923 0.1369
## Rating 0.2718 0.0180 15.0996 0.0000 0.2365 0.3070
me <- as_tibble(summary(marginal_effects))
ggplot(me) +
aes(x = factor,
y = AME,
ymin = lower,
ymax = upper)+
geom_hline(yintercept = 0, color = "gray80") +
geom_pointrange() +
coord_flip() +
labs(x = NULL, y = "Average Marginal Effect")
## use the cplot method in the margins library to do the work of calculating effects but without drawing its plot.
cplot_points <- cplot(model_3, x = "Masculinity", draw = F)
## xvals yvals upper lower
## 1 7.45953 61.54732 62.69767 60.39696
## 2 11.23511 61.98010 63.05827 60.90193
## 3 15.01068 62.41289 63.42074 61.40504
## 4 18.78626 62.84568 63.78547 61.90588
## 5 22.56184 63.27846 64.15301 62.40392
## 6 26.33742 63.71125 64.52402 62.89848
## 7 30.11299 64.14404 64.89936 63.38871
## 8 33.88857 64.57682 65.28010 63.87354
## 9 37.66415 65.00961 65.66752 64.35170
## 10 41.43972 65.44240 66.06307 64.82172
## 11 45.21530 65.87518 66.46830 65.28206
## 12 48.99088 66.30797 66.88460 65.73134
## 13 52.76645 66.74076 67.31291 66.16860
## 14 56.54203 67.17354 67.75353 66.59356
## 15 60.31761 67.60633 68.20596 67.00670
## 16 64.09319 68.03912 68.66911 67.40912
## 17 67.86876 68.47190 69.14152 67.80229
## 18 71.64434 68.90469 69.62165 68.18773
## 19 75.41992 69.33748 70.10808 68.56687
## 20 79.19549 69.77026 70.59959 68.94093
ggplot(data = cplot_points) +
aes(x = xvals,
y = yvals,
ymin = lower,
ymax = upper) +
geom_pointrange() +
geom_hline(yintercept = 0, color = "grey") +
labs(x = "Masculinity",
y = "Predicted Salary")
Let’s also plot the residuals. Aside from interpretation, we want to know where our model is a better or worse fit with the data, especially if residuals seem to vary systematically over the range of our data. augment computes tidy residuals, among other cool things.
ggplot(m3) +
aes(y = .resid, x = Rating) +
geom_point(aes(color = Masculinity)) +
## to show how residuals are the distance between an
## observation and the regression line:
geom_hline(yintercept = 0,
color = "dark grey") +
scale_color_viridis_c() +
geom_text(x= mean(m3$Rating),
y = 0,
label = "Regression line") +
geom_col(aes(fill = Masculinity),
alpha = .4,
position ="identity") +
## + labels:
labs(title = "Residuals (Observed - Predicted Salary)",
y = "Residuals (in thousands of dollars)")
Which model has a higher R-Squared? Why? Is a higher R-Squared always better?
summary(model_2)
##
## Call:
## lm(formula = Salary ~ Gender * Rating, data = d)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.96811 -1.06116 0.07077 1.17007 2.72592
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 49.47020 1.21324 40.775 < 2e-16 ***
## GenderMen -3.24973 1.75423 -1.853 0.0692 .
## Rating 0.18929 0.01680 11.270 5.04e-16 ***
## GenderMen:Rating 0.13650 0.02378 5.739 4.02e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.494 on 56 degrees of freedom
## Multiple R-squared: 0.9396, Adjusted R-squared: 0.9363
## F-statistic: 290.2 on 3 and 56 DF, p-value: < 2.2e-16
summary(model_3)
##
## Call:
## lm(formula = Salary ~ Masculinity * Rating, data = d)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.9400 -1.3512 -0.0496 1.3754 5.0054
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 49.1884790 2.6890212 18.292 < 2e-16 ***
## Masculinity -0.0401978 0.0481305 -0.835 0.40716
## Rating 0.1597006 0.0371901 4.294 7.03e-05 ***
## Masculinity:Rating 0.0021494 0.0006602 3.256 0.00192 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.256 on 56 degrees of freedom
## Multiple R-squared: 0.8622, Adjusted R-squared: 0.8548
## F-statistic: 116.8 on 3 and 56 DF, p-value: < 2.2e-16
Which model has a lower AIC/BIC? Why?
glance(model_2)
## # A tibble: 1 x 11
## r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC
## <dbl> <dbl> <dbl> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
## 1 0.940 0.936 1.49 290. 4.41e-34 4 -107. 224. 235.
## # … with 2 more variables: deviance <dbl>, df.residual <int>
glance(model_3)
## # A tibble: 1 x 11
## r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC
## <dbl> <dbl> <dbl> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
## 1 0.862 0.855 2.26 117. 4.48e-24 4 -132. 274. 284.
## # … with 2 more variables: deviance <dbl>, df.residual <int>