Building on the possible dependent and explanatory variables compiled in this Google Sheet, this page describes possible designs for studying the effects of redistricting, starting with the most basic models.
As political scientists on the UW 2020 project, our comparative advantage is that our colleagues in other disciplines are creating and comparing alternative maps, both those that exist and hypothetical alternatives. For our purposes, these maps will come with two key types of information:
Measures of both district partisanship and community relate theoretically to measures of representation. If we can estimate relationships between district characteristics and representation for the districts we observe, we can then estimate the quality of representation for hypothetical districts. Many kinds of representation may be of interest; the list below is incomplete and will grow as we develop measures of different forms of representation.
For example, NOMINATE scores.
Our preliminary analysis counting keywords in floor speeches is here.
To see what comparing district partisanship to speeches will look like, see this comparison of district partisanship to the total number of speeches. (There is no relationship between presidential vote share and one’s total number of speeches, but there may be one in the content of those speeches.)
The relationship between district partisanship and the number of references to one’s district.
The relationship between district partisanship and the amount of bipartisan rhetoric.
The relationship between district partisanship and the amount of partisan vitriol.
Another possible line of inquiry lies in assessing tradeoffs among the normative goals of districting. (See the relationships among goals in the exploratory DAG.)
For simplicity, assume a perfect correlation of “interest” and partisanship. (Interest and community are squishy concepts anyway.) Maximizing the goal of “preserving” “communities of interest” (by which we mean grouping communities of interest, even where they have previously been split) then results in maximizing the difference in two-party vote share. In a world without other constraints, maximizing the grouping of communities of interest would yield some districts that are all majority party and others that are all minority party. In the extreme, unconstrained world, grouping communities of interest into districts leads to stable districts that are always represented by the same party that enjoys an overwhelming majority in the district. These districts are never competitive in a general election but are presumably very competitive in primary elections.
The following hypothetical outcomes each take one goal to the extreme. All have the same total number of majority- and minority-party voters, allocated differently across 15 districts, yielding different numbers of seats for the majority party.
# imagined data maximizing interest coherence
interest <- tibble(district = 1:15,
                   majority_party_vote = c(rep(100, 8),
                                           rep(0, 7))) %>%
  mutate(minority_party_vote = 100 - majority_party_vote)
# imagined data maximizing competitiveness
competitive <- tibble(district = 1:15,
                      majority_party_vote = c(100,
                                              rep(50, 14))) %>%
  mutate(minority_party_vote = 100 - majority_party_vote)
# imagined data maximizing majority seats
majority <- tibble(district = factor(1:15),
                   majority_party_vote = c(rep(55, 14),
                                           rep(30, 1))) %>%
  mutate(minority_party_vote = 100 - majority_party_vote)
# imagined data maximizing minority-party seats
minority <- tibble(district = factor(1:15),
                   minority_party_vote = c(rep(10, 2),
                                           20,
                                           rep(55, 12))) %>%
  mutate(majority_party_vote = 100 - minority_party_vote)
dplot <- function(d){
  ggplot(d) +
    aes(x = factor(district),
        y = majority_party_vote) +
    geom_col() +
    geom_hline(yintercept = 50) +
    geom_text(aes(label = ifelse(majority_party_vote == 45 | minority_party_vote == 45,
                                 "Cracked", NA)),
              vjust = -1) +
    geom_text(aes(label = ifelse(majority_party_vote %in% c(10, 20, 30) | minority_party_vote %in% c(10, 20, 30),
                                 "Packed", NA)),
              vjust = -1) +
    scale_y_continuous(limits = c(0, 100)) +
    labs(x = "District")
}
dplot(interest) +
labs(title = "Districting that maximizes interest alignment within districts",
y = str_c("Majority Party Vote Share \n Expected Seats = ", sum(interest$majority_party_vote>50)))
dplot(competitive) +
labs(title = "Districting that maximizes the number of competitive districts\n (and thus proportionality, in expectation)",
y = str_c("Majority Party Vote Share \n Expected Seats = 1 + (14*.5) = 8"))
dplot(majority) +
labs(title = "Districting that maximizes majority-party seats",
y = str_c("Majority Party Vote Share \n Expected Seats = ", sum(majority$majority_party_vote>50)))
dplot(minority) +
labs(title = "Districting that maximizes minority-party seats",
y = str_c("Majority Party Vote Share \n Expected Seats = ", sum(minority$majority_party_vote>50)))
Assume continuous measures of representation, polarization, and ideological extremity in the range \([0,1]\), like those described in the “DVs” sheet.
Also, assume a measure of partisan advantage, “packed” and “cracked” indicators, and other explanatory variables like those described in the “EVs” sheet.
Let \(y_i\) be a measure of representation for official \(i\).
Let \(v_i\) be the margin of victory of official \(i\) (or the margin for official \(i\)’s party in their district, etc.).
Let \(p_i\) be an indicator of whether \(i\)’s district is packed.
We can simulate data for a hypothetical set of safe districts (e.g., \(v > 5\%\)). For illustration, I set the mean of \(y\) to .4 for packed districts and .6 for non-packed districts.
packed <- tibble(packed = TRUE,
                 vote_margin = sample(seq(5, 45, 1),
                                      100,
                                      replace = TRUE),
                 representation = rnorm(100,
                                        .4,
                                        .1))
notpacked <- tibble(packed = FALSE,
                    vote_margin = sample(seq(5, 45, 1),
                                         100,
                                         replace = TRUE),
                    representation = rnorm(100,
                                           .6,
                                           .1))
d <- full_join(packed, notpacked)
ggplot(d) +
aes(x = representation,
fill = packed) +
geom_histogram()
ggplot(d) +
aes(x = vote_margin,
fill = packed) +
geom_histogram()
We can then estimate representation given the vote margin and whether a district is packed, \(y|p,v\).
A linear fit would look like this:
\(\hat{y_i} = \beta_0 + \beta_1 p_i + \beta_2 v_i\)
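A minimal sketch of fitting this additive model directly (re-simulating the data so the chunk is self-contained; the seed and the object names `d_sim` and `m1` are arbitrary choices, not project conventions):

```r
set.seed(313)  # arbitrary seed, for reproducibility

# re-simulate 100 packed and 100 non-packed districts as above
d_sim <- data.frame(
  packed = rep(c(TRUE, FALSE), each = 100),
  vote_margin = sample(seq(5, 45, 1), 200, replace = TRUE),
  representation = c(rnorm(100, .4, .1), rnorm(100, .6, .1))
)

# the additive model: representation on a packed indicator and vote margin
m1 <- lm(representation ~ packed + vote_margin, data = d_sim)
coef(m1)  # packedTRUE should sit near the simulated difference of -.2
```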
ggplot(d) +
aes(x = vote_margin,
y = representation,
color = packed) +
geom_point(alpha = .2) +
geom_smooth(method = "lm") +
scale_color_viridis_d()
Now let \(y_i\) be a measure of ideological extremity.
## d %<>% mutate(ideological_extremity = ifelse(packed == 1,...))
d$ideological_extremity <- rbeta(200, 1, 3)
ggplot(d) +
aes(x = ideological_extremity,
fill = packed) +
geom_histogram()
We can then estimate ideological extremity given the vote margin and whether a district is packed, \(y|p,v\).
A linear fit would look like this:
\(\hat{y_i} = \beta_0 + \beta_1 p_i + \beta_2 v_i\)
m2 <- lm(ideological_extremity ~ packed + vote_margin,
         data = d) %>%
  augment(se_fit = TRUE)  # augment() is from the broom package
ggplot(m2) +
aes(x = vote_margin,
y = ideological_extremity,
color = packed,
fill = packed ) +
geom_point(alpha = .2) +
geom_ribbon(aes(ymin = .fitted - .se.fit,
ymax = .fitted + .se.fit),
alpha = .5,
color = NA) +
geom_line(aes(y = .fitted)) +
scale_color_viridis_d()+
scale_fill_viridis_d()
Now consider the set of districts with voter splits like those engineered by a partisan gerrymander (margins \(v\) of about +5% for the advantaged party).
Let \(c_i\) be a measure of whether representative \(i\)’s district is “cracked.”
For \(v_i \in[5,10]\), we can then test for a difference in mean ideological extremity by whether a district is cracked \(y|c\).
## use the same sample dist
d$cracked <- d$packed
d1 <- d %>% filter(vote_margin >= 5,
                   vote_margin <= 10)
d1 %<>% group_by(cracked) %>%
  mutate(mean = mean(ideological_extremity))
d1 %>% ggplot() +
  aes(x = vote_margin,
      y = ideological_extremity,
      color = cracked) +
  geom_boxplot() +
  geom_point() +
  facet_grid(. ~ cracked) +
  # geom_line(aes(y = mean)) +
  scale_color_viridis_d()
t.test(x = d1 %>% filter(cracked) %>%
.$ideological_extremity,
y = d1 %>% filter(!cracked) %>%
.$ideological_extremity)
##
## Welch Two Sample t-test
##
## data: d1 %>% filter(cracked) %>% .$ideological_extremity and d1 %>% filter(!cracked) %>% .$ideological_extremity
## t = 2.088, df = 32.877, p-value = 0.04462
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 0.003480229 0.269571174
## sample estimates:
## mean of x mean of y
## 0.3485650 0.2120393
Now let \(y_i\) be a measure of the representation of Black residents.
Let \(b_i\) be the percent of \(i\)’s district’s residents that identify as Black.
## use the same sample dist
d$black_representation <- d$representation

## draw percent Black
d$percent_black <- rbeta(200, 1, 3)
ggplot(d) +
aes(x = percent_black,
fill = cracked) +
geom_histogram()
A linear fit would look like this:
\(\hat{y_i} = \beta_0 + \beta_1 c_i + \beta_2 b_i + \beta_3 c_i b_i\)
m2 <- lm(black_representation ~ cracked * percent_black,
         data = d) %>%
  augment(se_fit = TRUE)
ggplot(m2) +
aes(x = percent_black,
y = black_representation,
color = cracked,
fill = cracked ) +
geom_point(alpha = .2) +
geom_ribbon(aes(ymin = .fitted - .se.fit,
ymax = .fitted + .se.fit),
alpha = .5,
color = NA) +
geom_line(aes(y = .fitted)) +
scale_color_viridis_d()+
scale_fill_viridis_d()
There are two sub-questions:
Ideally, what we want is counterfactual levels of representation and polarization at different levels of partisan advantage.
The level of partisan advantage is always confounded by time (in within-state analysis) or by state characteristics (in across-state analysis).
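One standard response to this confounding, offered as an illustrative sketch rather than a committed design, is a two-way fixed-effects specification that absorbs time-invariant state characteristics and common time shocks. All names in the chunk below (`panel`, `partisan_advantage`, `fe_fit`) are hypothetical placeholders, and the data are simulated:

```r
set.seed(60)  # arbitrary seed, for reproducibility

# hypothetical district-year panel; every variable name is a placeholder
panel <- expand.grid(state = factor(1:10), year = factor(2012:2020))
panel$partisan_advantage <- runif(nrow(panel))
panel$representation <- .5 - .2 * panel$partisan_advantage +
  rnorm(nrow(panel), 0, .05)

# state and year fixed effects absorb time-invariant state traits and
# common shocks, so identification comes from within-state variation
fe_fit <- lm(representation ~ partisan_advantage + state + year, data = panel)
coef(fe_fit)["partisan_advantage"]
```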