This tutorial covers hypothesis testing and interpreting interaction effects using tidy regression output in R, specifically the broom and margins packages.

Consider this example: Women at the Deer Valley Utility Company claim that the company does not reward their job performances to the same degree as the job performances of men. The Director of the Office of Equal Opportunity wants to know if there is statistical evidence to support this complaint.

We have the following data for 60 people:

Salary: thousands of dollars.

Gender: “1” for men and “0” for women.

Rating: The employee’s average performance rating over the last two years. The scale has a top score of 100. The company claims that performance rating is the primary factor in the determination of salary.

Credits earned either in college courses or company programs.

Note: Click on the Code button to see the R code for each table and figure. The R Markdown file that made this document is here. You can reproduce and/or edit it by cloning this repository: https://github.com/judgelord/813/

load("data/EX3.Rdata") # loads data as "d"

First, let’s examine the raw data.

# scatterplots
ggplot(d) + 
  aes(x = Rating, y = Salary) + 
  geom_point(aes(alpha = Credits)) 

# add color by Gender, and save plot as p for future use
p <- ggplot(d) + 
  aes(x = Rating, y = Salary, color = Gender) + 
  geom_point(aes(alpha = Credits)) + scale_color_discrete()
p

What do we mean when we say there is a pay gap?

# means per group
p + geom_hline(aes(yintercept = `mean(Salary)`, color = Gender), data = d %>% group_by(Gender) %>% summarise(mean(Salary)))

m <- lm(Salary ~ Gender + Rating, data = d) %>% 
  augment() 

p + geom_line(aes(y = m$.fitted)) 

# quick y ~ mx + b linear regression per group
p + geom_smooth(method = lm, se = F, fullrange = T)

Hypotheses

H1: Job performances of women are rewarded differently than the job performances of men. That is, the relationship between salary and performance differs by gender.

H0: There is no difference in how men’s performance and women’s performance are rewarded. That is, the relationship between salary and performance does not differ by gender.

(There are least two other ways to write this hypothesis and at least one slightly different hypothesis that might better address the question.)

Linear regression

The dependent variable is salary. For employee \(i\), let their salary be \(y_i\) in the model \(y_i = \beta_0 + ... + \epsilon_i\). \(\beta_0\) is the predicted salary, \(\hat{y}\), when all other variables in the model are 0.


A model

Does the model, \(y_i = \beta_0 + \beta_1*Gender_i + \epsilon_i\), test the relationship of interest?

model <- lm(Salary ~ Gender, data = d) 
m <- model %>% 
  tidy(conf.int = TRUE) 
m
## # A tibble: 2 x 7
##   term        estimate std.error statistic  p.value conf.low conf.high
##   <chr>          <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>
## 1 (Intercept)    62.8      0.875     71.7  2.47e-58    61.0      64.5 
## 2 GenderMen       7.43     1.20       6.20 6.39e- 8     5.03      9.83
ggplot(m %>% filter(term != "(Intercept)")) + 
  aes(x = term,
      y = estimate, 
      ymin = conf.low, 
      ymax = conf.high) + 
  geom_pointrange() + 
  geom_hline(yintercept = 0, color = "grey") + 
  coord_flip() + 
  labs(x="", y="OLS Estimate")

Let’s plot the results against our data!

# illustrating with yhat formula; more easily done with augment()
b0 <- m$estimate[1]
b1 <- m$estimate[2]

p +
  geom_line(aes(color = "Men", # yhat for men
                y = b0 + b1*1) ) + 
  geom_line(aes(color = "Women", # yhat for women
                y = b0 + b1*0) )  +
  geom_ribbon(aes(ymax = b0 + b1*1, 
                  ymin = b0 + b1*0), alpha = .1, color = NA)

Basically, a t-test (compare model output to simple t-test of difference in mean Salary by Gender).

# tidy model output object
m 
## # A tibble: 2 x 7
##   term        estimate std.error statistic  p.value conf.low conf.high
##   <chr>          <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>
## 1 (Intercept)    62.8      0.875     71.7  2.47e-58    61.0      64.5 
## 2 GenderMen       7.43     1.20       6.20 6.39e- 8     5.03      9.83
# t-test
t.test(Salary ~ Gender, data = d) %>% tidy()
## # A tibble: 1 x 10
##   estimate estimate1 estimate2 statistic p.value parameter conf.low
##      <dbl>     <dbl>     <dbl>     <dbl>   <dbl>     <dbl>    <dbl>
## 1    -7.43      62.8      70.2     -6.38 4.39e-8      53.5    -9.77
## # … with 3 more variables: conf.high <dbl>, method <chr>,
## #   alternative <chr>

Another model

Does the model, \(y_i = \beta_0 + \beta_1*Gender_i + \beta_2*Rating_i + \epsilon_i\), test the relationship of interest?

model_1 <- lm(Salary ~ Gender + Rating, data = d) 
m1 <- model_1 %>% 
  tidy(conf.int = TRUE) 
m1
## # A tibble: 3 x 7
##   term        estimate std.error statistic  p.value conf.low conf.high
##   <chr>          <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>
## 1 (Intercept)   44.7      1.10        40.6 9.61e-44   42.5      46.9  
## 2 GenderMen      6.57     0.485       13.5 1.91e-19    5.60      7.54 
## 3 Rating         0.257    0.0149      17.3 2.20e-24    0.228     0.287
ggplot(m1 %>% filter(term != "(Intercept)")) + 
  aes(x = term,
      y = estimate, 
      ymin = conf.low, 
      ymax = conf.high) + 
  geom_pointrange() + 
  geom_hline(yintercept = 0, color = "grey") + 
  coord_flip() + 
  labs(x="", y="OLS Estimate")

Let’s plot the results against our data!

# illustrating with yhat formula; more easily done with augment()
b0 <- m1$estimate[1]
b1 <- m1$estimate[2]
b2 <- m1$estimate[3]

p +
  geom_line(aes(color = "Men", # yhat for men
                y = b0 + b1*1 + b2*Rating) ) + 
  geom_line(aes(color = "Women", # yhat for women
                y = b0 + b1*0 + b2*Rating) )  +
  geom_ribbon(aes(ymax = b0 + b1*1+ b2*Rating, 
                  ymin = b0 + b1*0+ b2*Rating), alpha = .1, color = NA)

Interpretation

Why does this model fail to test the hypothesis? What hypothesis did it test? How should we interpret the coefficient of 6.6 on Gender? How should we interpret the coefficient of 0.3 on Performance Rating?

m1 <- augment(model_1)

p + 
  geom_line(aes(y = m1$.fitted)) + # with .fitted from augment()
  scale_x_continuous(limits = c(-20, max(d$Rating))) + 
  scale_y_continuous(limits = c(b0-5, max(d$Salary))) + 
  #geom_hline(yintercept =  b0) + 
  geom_label(aes(color = "Men"), x = 0, y = b0 + b1, label = expression(beta[0]+beta[1]*1), hjust = 0, vjust = 0,show.legend = FALSE, check_overlap = T)+ 
  geom_label(aes(color = "Women"), x = 0, y = b0, label = expression(beta[0]+beta[1]*0), hjust = 0, vjust = 1, show.legend = FALSE, check_overlap = T)+
    geom_label(aes(color = "Men"), x = 0, y = b0 + b1, label = round(b0 + b1,1), hjust = 1,  color = "black",  show.legend = FALSE, check_overlap = T)+
  geom_label(aes(color = "Women"), x = 0, y = b0, label = round(b0,1), hjust = 1, color = "black", show.legend = FALSE, check_overlap = T)+
  geom_point(aes(color = "Men"), x = 0, y = b0 + b1, shape = 1)+ 
  geom_point(aes(color = "Women"), x = 0, y = b0, shape = 1) 

Fit

Let’s also plot the residuals. Aside from interpretation, we want to know where our model is a better or worse fit with the data, especially if residuals seem to vary systematically over the range of our data.

augment computes tidy residuals, among other cool things.

p + 
  geom_line(aes(y = m1$.fitted)) + # with .fitted from augment()
  geom_point(aes(y = m1$.fitted), shape = 1, alpha = .2) + # with .fitted from augment()
  geom_segment(aes(xend = Rating, yend = m1$.fitted ), alpha = .2, size = 2)

ggplot(m1) +
  aes(y = .resid, x = Rating) + 
  geom_point(aes(color = Gender)) + 
  scale_color_discrete() + 
  ## to show how risiduals are the distance between an 
  ## observation and the regression line:
  geom_hline(yintercept = 0, color = "dark grey") +
  geom_text(x= mean(m1$Rating), y = 0, 
            label = "Regression line") +
  geom_col(aes(fill = Gender), alpha = .2, position ="identity") +
  ## + labels:
  labs(title = "Residuals (Observed - Predicted Salary)",
       y = "Residuals (in thousands of dollars)") 

Yet another model

The model, \(y_i = \beta_0 + \beta_1*Gender_i + \beta_2*Rating_i + \beta_3*Gender_i*Rating_i + \epsilon_i\), does test the relationship of interest; how gender may affect the relationship between performance and pay, i.e. is there a significant interaction of gender and performance on predicted pay?

## Note: when we include the interaction, lm() adds the direct effects
model_2 <- lm(Salary ~ Gender*Rating, data = d) 

m2 <- model_2 %>% 
  tidy(conf.int = TRUE) 
m2
## # A tibble: 4 x 7
##   term             estimate std.error statistic  p.value conf.low conf.high
##   <chr>               <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>
## 1 (Intercept)        49.5      1.21       40.8  2.50e-43  47.0       51.9  
## 2 GenderMen          -3.25     1.75       -1.85 6.92e- 2  -6.76       0.264
## 3 Rating              0.189    0.0168     11.3  5.04e-16   0.156      0.223
## 4 GenderMen:Rating    0.137    0.0238      5.74 4.02e- 7   0.0889     0.184
ggplot(m2 %>% filter(term != "(Intercept)")) + 
  aes(x = term,
      y = estimate, 
      ymin = conf.low, 
      ymax = conf.high) + 
  geom_pointrange() + 
  geom_hline(yintercept = 0, color = "grey") + 
  coord_flip() + 
  labs(x="", y="OLS Estimate")

Let’s plot the results against our data!

# illustrating with yhat equasion; more easily done with augment()
b0 <- m2$estimate[1]
b1 <- m2$estimate[2]
b2 <- m2$estimate[3]
b3 <- m2$estimate[4]

p +
  geom_line(aes(color = "Men", # yhat for men
                y = b0 + b1*1 + b2*Rating + b3*1*Rating) ) + 
  geom_line(aes(color = "Women", # yhat for women
                y = b0 + b1*0 + b2*Rating+ b3*0*Rating) )  +
  geom_ribbon(aes(ymax = b0 + b1*1+ b2*Rating+ b3*1*Rating, 
                  ymin = b0 + b1*0+ b2*Rating+ b3*0*Rating), alpha = .1, color = NA)

Interpretation

  • How should we interpret a \(\beta_0\) of 49.47?

  • How should we interpret the coefficient of -3.25 on Gender?

  • How should we interpret the coefficient of 0.189 on Rating?

  • How should we interpret the coefficient of 0.137 on Gender*Rating?

  • This model seems to fit the data well, for this range of Ratings, but do we have theoretical reasons for suspicion? Why or why not? If so, how might we adjust our data or model correct this?

m2 <-  augment(model_2)

p + 
  geom_line(aes(y = m2$.fitted)) + # with .fitted from augment()
  scale_x_continuous(limits = c(0, max(d$Rating))) + 
  scale_y_continuous(limits = c(45, max(d$Salary))) + 
  geom_hline(yintercept =  b0) + 
  geom_label(aes(color = "Men"), x = 0, y = b0 + b1, label = expression(beta[0]+beta[1]*1), hjust = 0, vjust = 1, show.legend = FALSE)+ 
  geom_label(aes(color = "Women"), x = 0, y = b0, label = expression(beta[0]+beta[1]*0), hjust = 0,vjust = 0, show.legend = FALSE)+
  geom_point(aes(color = "Men"), x = 0, y = b0 + b1, shape = 1)+ 
  geom_point(aes(color = "Women"), x = 0, y = b0, shape = 1) 

Better interpretation

What is a more meaningful interpretation?

The joint effect.

\(\beta_2*Rating + \beta_3*Gender*Rating = \beta_2 + \beta_3*Gender\) per Rating unit.

For every additional performance rating point, women get paid an additional $189 and men get paid an additional $326.


We can still calculate the effect of Gender or Rating. When there is an interaction term in the model, this is called the marginal effect.

The margins package calculates confidence intervals on marginal effects.

The Average Marginal Effect is the average change in y when x increases by one unit. Note the similar effect size as we found for the coefficient on Gender in the model with no interaction term. In theory, how are these estimates similar? How are they different?

marginal_effects <- margins(model_2)
summary(marginal_effects)
##     factor    AME     SE       z      p  lower  upper
##  GenderMen 6.5831 0.3886 16.9386 0.0000 5.8213 7.3448
##     Rating 0.2621 0.0119 21.9875 0.0000 0.2387 0.2855
me <- as_tibble(summary(marginal_effects))

ggplot(me) + 
  aes(x = factor,
      y = AME, 
      ymin = lower, 
      ymax = upper)+
  geom_pointrange() + 
  geom_hline(yintercept = 0, color = "gray80") +
  coord_flip() +
  labs(x = NULL, y = "Average Marginal Effect") 

## use the cplot method in the margins library to do the work of calculating effects but without drawing its plot.
cplot_points <- cplot(model_2, x = "Gender", draw = F)
##   xvals    yvals    upper    lower
## 1 Women 63.10542 63.66189 62.54896
## 2   Men 69.68850 70.20867 69.16833
ggplot(data = cplot_points) +
  aes(x = reorder(xvals, yvals),
      y = yvals, 
      ymin = lower, 
      ymax = upper) + 
  geom_pointrange() + 
  coord_flip() +
  labs(x = NULL, y = "Predicted Salary (thousands of dollars)") 

Fit

Let’s also plot the residuals. Aside from interpretation, we want to know where our model is a better or worse fit with the data, especially if residuals seem to vary systematically over the range of our data.

augment computes tidy residuals, among other cool things.

p + 
  geom_line(aes(y = m2$.fitted)) + # with .fitted from augment()
  geom_point(aes(y = m2$.fitted), shape = 1, alpha = .2) + # with .fitted from augment()
  geom_segment(aes(xend = Rating, yend = m2$.fitted ), alpha = .2, size = 2)

ggplot(m2) +
  aes(y = .resid, x = Rating) + 
  geom_point(aes(color = Gender)) + 
  scale_color_discrete() + 
  ## to show how risiduals are the distance between an 
  ## observation and the regression line:
  geom_hline(yintercept = 0, color = "dark grey") +
  geom_text(x= mean(m2$Rating), y = 0, 
            label = "Regression line") +
  geom_col(aes(fill = Gender), alpha = .2, position ="identity") +
  ## + labels:
  labs(title = "Residuals (Observed - Predicted Salary)",
       y = "Residuals (in thousands of dollars)") 

Likelihood Ratio Test of Nested Models

Testing the hypothesis that adding the interaction term affects goodness of fit against the null hypothesis that the two models are equivalent:

library(lmtest)
lrtest(model_1, model_2)
## Likelihood ratio test
## 
## Model 1: Salary ~ Gender + Rating
## Model 2: Salary ~ Gender * Rating
##   #Df  LogLik Df  Chisq Pr(>Chisq)    
## 1   4 -121.03                         
## 2   5 -107.15  1 27.758  1.375e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Comparing models

F-statistics and R-squared

summary() returns some statistics about our model. R-squared (\(1- RSS/TSS\)) indicates how much of the variance in salary our models explain. The F statistic is the mean regression sum of squares divided by the mean error sum of squares, so a larger F means more variance explained by the model and/or less unexplained variance. Its value will range from zero (a model that explains nothing compared to how much it does not explain) to an arbitrarily large number (a model that explains a lot compared to how much it does not explain). The p-value is for the null hypothesis for the full model is true (i.e., that all of the regression coefficients are zero).

Which model has a higher R-Squared? Why? Is a higher R-Squared value always better?

summary(model_1)
## 
## Call:
## lm(formula = Salary ~ Gender + Rating, data = d)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.8982 -1.5364  0.0581  1.3178  4.4643 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 44.68752    1.10151   40.57   <2e-16 ***
## GenderMen    6.56843    0.48546   13.53   <2e-16 ***
## Rating       0.25737    0.01485   17.33   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.866 on 57 degrees of freedom
## Multiple R-squared:  0.904,  Adjusted R-squared:  0.9006 
## F-statistic: 268.4 on 2 and 57 DF,  p-value: < 2.2e-16
summary(model_2)
## 
## Call:
## lm(formula = Salary ~ Gender * Rating, data = d)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.96811 -1.06116  0.07077  1.17007  2.72592 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      49.47020    1.21324  40.775  < 2e-16 ***
## GenderMen        -3.24973    1.75423  -1.853   0.0692 .  
## Rating            0.18929    0.01680  11.270 5.04e-16 ***
## GenderMen:Rating  0.13650    0.02378   5.739 4.02e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.494 on 56 degrees of freedom
## Multiple R-squared:  0.9396, Adjusted R-squared:  0.9363 
## F-statistic: 290.2 on 3 and 56 DF,  p-value: < 2.2e-16

AIC and BIC

glance() gives us even more statistics with which to compare model fit. AIC and BIC are penalized-likelihood criteria; they penalize models for adding more variables (“measure of fit + complexity penalty”) because adding more variables will usually give a better fit for any sample of data, but possibly a worse model for the broader population (and for parsimony of theory!). Thus, lower AIC and BIC are better.

glance(model_1)
## # A tibble: 1 x 11
##   r.squared adj.r.squared sigma statistic  p.value    df logLik   AIC   BIC
##       <dbl>         <dbl> <dbl>     <dbl>    <dbl> <int>  <dbl> <dbl> <dbl>
## 1     0.904         0.901  1.87      268. 9.83e-30     3  -121.  250.  258.
## # … with 2 more variables: deviance <dbl>, df.residual <int>
glance(model_2)
## # A tibble: 1 x 11
##   r.squared adj.r.squared sigma statistic  p.value    df logLik   AIC   BIC
##       <dbl>         <dbl> <dbl>     <dbl>    <dbl> <int>  <dbl> <dbl> <dbl>
## 1     0.940         0.936  1.49      290. 4.41e-34     4  -107.  224.  235.
## # … with 2 more variables: deviance <dbl>, df.residual <int>

Interactions with continuous variables

Suppose, rather than forcing a binary response, the company surveyed gender as a continuous variable, asking employees to report the extent to which their gender performance conformed with hegemonic masculinity on a scale from 0-100.

The data:

library(truncnorm)
## recode gender as tuncated normal 
## with mean "men"" and "women" at upper and lower quartiles
## and 0 or 100 being two standard deviations from each mean, respectivly
d %<>% mutate(Masculinity = ifelse(Gender == "Men", 
                         rtruncnorm(60, a=0, b=100, mean = 75, sd = 12.5) ,
                         rtruncnorm(60, a=0, b=100, mean = 25, sd = 12.5)
                         ))

library(ggridges)
ggplot(d) +
  aes(x = Masculinity, fill = ..x.., y = 0) +
  geom_density_ridges_gradient(scale = 3, rel_min_height = 0.01) + 
  labs(y = "Density", x = "Hegemonic Masculinity Conformance") +
  guides(fill = "none")

# scatterplot
p <- ggplot(d) + 
  aes(x = Rating, y = Salary, color = Masculinity) + 
  geom_point() + scale_color_viridis_c()
p

The model: \(y_i = \beta_0 + \beta_1*Gender_i + \beta_2*Rating_i + \beta_3*Gender_i*Rating_i + \epsilon_i\)

## Note: when we include the interaction, lm() adds the direct effects
model_3 <- lm(Salary ~ Masculinity*Rating, data = d) 

m3 <- model_3 %>% 
  tidy(conf.int = TRUE) 
m3
## # A tibble: 4 x 7
##   term             estimate std.error statistic  p.value conf.low conf.high
##   <chr>               <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>
## 1 (Intercept)      49.2      2.69        18.3   2.75e-25  4.38e+1  54.6    
## 2 Masculinity      -0.0402   0.0481      -0.835 4.07e- 1 -1.37e-1   0.0562 
## 3 Rating            0.160    0.0372       4.29  7.03e- 5  8.52e-2   0.234  
## 4 Masculinity:Rat…  0.00215  0.000660     3.26  1.92e- 3  8.27e-4   0.00347
ggplot(m3 %>% filter(term != "(Intercept)")) + 
  aes(x = term,
      y = estimate, 
      ymin = conf.low, 
      ymax = conf.high) + 
  geom_pointrange() + 
  geom_hline(yintercept = 0, color = "grey") + 
  coord_flip() + 
  labs(x="", y="OLS Estimate")

Let’s plot the results against our data!

# illustrating with yhat formula; more easily done with augment()
b0 <- m3$estimate[1]
b1 <- m3$estimate[2]
b2 <- m3$estimate[3]
b3 <- m3$estimate[4]

# We can pick a few values (e.g., quartiles)
p +
  geom_line(aes(linetype = "25% Masculinity", 
                y = b0 + b1*25 + b2*Rating+ b3*25*Rating), color = "black")  +
  geom_line(aes(linetype = "75% Masculinity", 
                y = b0 + b1*75 + b2*Rating + b3*75*Rating), color = "black" ) + 
  geom_ribbon(aes(ymax = b0 + b1*75+ b2*Rating+ b3*75*Rating, 
                  ymin = b0 + b1*25+ b2*Rating+ b3*25*Rating), alpha = .1, color = NA)

# Or we can pick a bunch of values (e.g., all combinations of values present in our data)
m3 <-  augment(model_3) %>%
  # a unique id for each obs
  mutate(id = row_number()) 

# add an intercept observation for each person
m3intercept <- m3 %>% full_join(m3 %>% mutate(.fitted = b0 + Masculinity*b1,
                                Salary = b0 + Masculinity*b1,
                                Rating = 0)) 

ggplot(m3intercept) + 
  aes(x = Rating, y = Salary, color = Masculinity) + 
  geom_line(aes(y = .fitted, group = id))+ 
  geom_point() + 
  scale_color_viridis_c() 

Interpretation

  • How should we interpret a \(\beta_0\) of 49.188?

  • How should we interpret the coefficient of -0.04 on Masculinity?

  • How should we interpret the coefficient of 0.16 on Rating?

  • How should we interpret the coefficient of 0.002 on Masculinity*Rating?

Better interpretation

The joint effect.

\(\beta_2*Rating + \beta_3*Masculinity*Rating = \beta_2 + \beta_3*Masculinity\) per Rating unit.

For every additional performance rating point, a person reporting no conformance with hegemonic masculinity gets paid an additional $160 and a person reporting perfect conformance with hegemonic masculinity get paid an additional $375.


We can still calculate the effect of Masculinity or Rating. When there is an interaction term in the model, this is called the marginal effect.

The margins package calculates confidence intervals on marginal effects.

The Average Marginal Effect is the average change in y when x increases by one unit. Note the similar effect size as we found for the coefficient of Masculinity in the model with no interaction term. In theory, how are these estimates similar? How are they different?

Notice that the AME of Masculinity is significantly different from 0, even though the direct effect was not.

marginal_effects <- margins(model_3)
summary(marginal_effects)
##       factor    AME     SE       z      p  lower  upper
##  Masculinity 0.1146 0.0114 10.0823 0.0000 0.0923 0.1369
##       Rating 0.2718 0.0180 15.0996 0.0000 0.2365 0.3070
me <- as_tibble(summary(marginal_effects))

ggplot(me) + 
  aes(x = factor,
      y = AME, 
      ymin = lower, 
      ymax = upper)+
  geom_hline(yintercept = 0, color = "gray80") +
  geom_pointrange() + 
  coord_flip() +
  labs(x = NULL, y = "Average Marginal Effect") 

## use the cplot method in the margins library to do the work calculating effects but without drawing its plot.
cplot_points <- cplot(model_3, x = "Masculinity", draw = F)
##       xvals    yvals    upper    lower
## 1   7.45953 61.54732 62.69767 60.39696
## 2  11.23511 61.98010 63.05827 60.90193
## 3  15.01068 62.41289 63.42074 61.40504
## 4  18.78626 62.84568 63.78547 61.90588
## 5  22.56184 63.27846 64.15301 62.40392
## 6  26.33742 63.71125 64.52402 62.89848
## 7  30.11299 64.14404 64.89936 63.38871
## 8  33.88857 64.57682 65.28010 63.87354
## 9  37.66415 65.00961 65.66752 64.35170
## 10 41.43972 65.44240 66.06307 64.82172
## 11 45.21530 65.87518 66.46830 65.28206
## 12 48.99088 66.30797 66.88460 65.73134
## 13 52.76645 66.74076 67.31291 66.16860
## 14 56.54203 67.17354 67.75353 66.59356
## 15 60.31761 67.60633 68.20596 67.00670
## 16 64.09319 68.03912 68.66911 67.40912
## 17 67.86876 68.47190 69.14152 67.80229
## 18 71.64434 68.90469 69.62165 68.18773
## 19 75.41992 69.33748 70.10808 68.56687
## 20 79.19549 69.77026 70.59959 68.94093
ggplot(data = cplot_points) +
  aes(x = xvals,
      y = yvals, 
      ymin = lower, 
      ymax = upper) + 
  geom_pointrange() + 
  geom_hline(yintercept = 0, color = "grey") + 
  labs(x = "Masculinity",
       y = "Predicted Salary") 

Fit

Let’s also plot the residuals. Aside from interpretation, we want to know where our model is a better or worse fit with the data, especially if residuals seem to vary systematically over the range of our data.

augment computs tidy residuals, among other cool things.

ggplot(m3) +
  aes(y = .resid, x = Rating) + 
  geom_point(aes(color = Masculinity)) + 
  ## to show how risiduals are the distance between an 
  ## observation and the regression line:
  geom_hline(yintercept = 0, 
             color = "dark grey") +
  scale_color_viridis_c() +
  geom_text(x= mean(m3$Rating), 
            y = 0, 
            label = "Regression line") +
  geom_col(aes(fill = Masculinity), 
           alpha = .4, 
           position ="identity") +
  ## + labels:
  labs(title = "Residuals (Observed - Predicted Salary)",
       y = "Residuals (in thousands of dollars)") 


Comparing the same model with different measures of gender.

F-statistics and R-squared

Which model has a higher R-Squared? Why? Is a higher R-Squared always better?

summary(model_2)
## 
## Call:
## lm(formula = Salary ~ Gender * Rating, data = d)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.96811 -1.06116  0.07077  1.17007  2.72592 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      49.47020    1.21324  40.775  < 2e-16 ***
## GenderMen        -3.24973    1.75423  -1.853   0.0692 .  
## Rating            0.18929    0.01680  11.270 5.04e-16 ***
## GenderMen:Rating  0.13650    0.02378   5.739 4.02e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.494 on 56 degrees of freedom
## Multiple R-squared:  0.9396, Adjusted R-squared:  0.9363 
## F-statistic: 290.2 on 3 and 56 DF,  p-value: < 2.2e-16
summary(model_3)
## 
## Call:
## lm(formula = Salary ~ Masculinity * Rating, data = d)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.9400 -1.3512 -0.0496  1.3754  5.0054 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        49.1884790  2.6890212  18.292  < 2e-16 ***
## Masculinity        -0.0401978  0.0481305  -0.835  0.40716    
## Rating              0.1597006  0.0371901   4.294 7.03e-05 ***
## Masculinity:Rating  0.0021494  0.0006602   3.256  0.00192 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.256 on 56 degrees of freedom
## Multiple R-squared:  0.8622, Adjusted R-squared:  0.8548 
## F-statistic: 116.8 on 3 and 56 DF,  p-value: < 2.2e-16

AIC and BIC

Which model has a lower AIC/BIC? Why?

glance(model_2)
## # A tibble: 1 x 11
##   r.squared adj.r.squared sigma statistic  p.value    df logLik   AIC   BIC
##       <dbl>         <dbl> <dbl>     <dbl>    <dbl> <int>  <dbl> <dbl> <dbl>
## 1     0.940         0.936  1.49      290. 4.41e-34     4  -107.  224.  235.
## # … with 2 more variables: deviance <dbl>, df.residual <int>
glance(model_3)
## # A tibble: 1 x 11
##   r.squared adj.r.squared sigma statistic  p.value    df logLik   AIC   BIC
##       <dbl>         <dbl> <dbl>     <dbl>    <dbl> <int>  <dbl> <dbl> <dbl>
## 1     0.862         0.855  2.26      117. 4.48e-24     4  -132.  274.  284.
## # … with 2 more variables: deviance <dbl>, df.residual <int>

Bonus: How does the standard deviation we pick for simulating our masculinity data affect our ability to explain differences in pay?