This vignette shows how to assess bias in literature review networks using covariates from metadata on the studies and authors included in (or excluded from) the redistricting literature review in the main manuscript. Specifically, for each study, we collect metadata on the lead author’s gender, H-Index, and total number of citations. We then assess the impact of selecting studies based on these covariates in two ways:

  1. First, we subset the network (e.g., to studies where the lead author is a man) and observe how many nodes and edges are missing in these subsets. This reveals the contributions of underrepresented scholars to the network by showing what we lose if they are excluded.

  2. Second, we draw random samples of 100 studies weighted by covariates. This simulates a literature review that is biased (e.g., toward scholars who are men or have many citations). We then compare these biased samples to an unweighted random sample of studies in the network.
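The first approach is set-difference bookkeeping: an edge is missing from a subset when none of the retained studies support it, and a node is missing when it no longer appears in any retained edge. The base-R sketch below shows this on a small hypothetical edge list. The node names, study ids, and `author_is_man` values are invented for illustration, and the sketch assumes the subset is built by filtering rows of the long edge list before deduplicating edges; the vignette itself builds networks with `netlit::review()` and dplyr.

```r
# hypothetical long edge list: one row per (edge, supporting study) pair,
# the same shape as literature_long after unnest()
lit_long <- data.frame(
  from = c("redistricting", "redistricting", "competition", "competition", "turnout"),
  to   = c("competition", "competition", "turnout", "polarization", "representation"),
  id   = c("Study 1", "Study 2", "Study 3", "Study 4", "Study 5"),
  author_is_man = c(TRUE, FALSE, TRUE, FALSE, NA),
  stringsAsFactors = FALSE
)

# all edges, and edges supported by at least one study led by a man
# (%in% TRUE also drops studies whose lead author's gender is NA)
full_edges <- unique(lit_long[, c("from", "to")])
sub_edges  <- unique(lit_long[lit_long$author_is_man %in% TRUE, c("from", "to")])

# label edges as "from -> to" so they can be compared as strings
edge_key <- function(e) paste(e$from, e$to, sep = " -> ")

# edges and nodes of the full network that the subset loses
missing_edges <- setdiff(edge_key(full_edges), edge_key(sub_edges))
missing_nodes <- setdiff(unique(c(full_edges$from, full_edges$to)),
                         unique(c(sub_edges$from, sub_edges$to)))
missing_edges
missing_nodes
```

Here the subset keeps only `redistricting -> competition` and `competition -> turnout`, so the edges to `polarization` and `representation`, and those two nodes, are lost.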

1 Metadata

1.1 Lead Author Gender, H-Index, and Citation

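# packages used in this vignette
library(tidyverse)   # dplyr, tidyr, stringr, purrr, tibble, ggplot2
library(magrittr)    # %<>% assignment pipe
library(knitr)       # kable()
library(kableExtra)  # kable_styling()
library(igraph)      # cluster_walktrap(), diameter()
library(netlit)      # review()

# Load replication version of main data and metadata on citations
load(here::here("replication_data", "literature_metadata.rda"))
load(here::here("replication_data", "literature.rda"))

# standardize column names to snake_case
names(literature_metadata) %<>% janitor::make_clean_names()

literature_metadata %<>% 
  rename(author_gender = author_sex)

literature_metadata %>% kable()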
| id | author | year | publication | title | citations | outside_u_s | author_gender | author_h_index | author_citations |
|:---|:---|---:|:---|:---|---:|---:|:---|---:|---:|
| Hayes & McKee 2011 | Hayes & McKee | 2011 | AJPS | The Intersection of Redistricting, Race, and Participation | 41 | NA | M | 19 | 2675 |
| Katz, King & Rosenblatt 2020 | Katz, King & Rosenblatt | 2020 | APSR | Theoretical Foundations and Empirical Evaluations of Partisan Fairness in District-Based Democracies | 20 | NA | M | 30 | 18684 |
| Lo 2013 | Lo | 2013 | QJPS | Legislative Responsiveness to Gerrymandering: Evidence from the 2003 Texas Redistricting | 20 | NA | M | 18 | 1611 |
| Chen & Rodden 2013 | Chen & Rodden | 2013 | QJPS | Unintentional Gerrymandering: Political Geography and Electoral Bias in Legislatures | 361 | NA | M | 8 | 375 |
| Matsusaka 2010 | Matsusaka | 2010 | QJPS | Popular Control of Public Policy: A Quantitative Approach | 138 | NA | M | 38 | 10435 |
| Moskowitz & Schneer 2019 | Moskowitz & Schneer | 2019 | QJPS | Reevaluating Competition and Turnout in US House Elections | 10 | NA | M | 2 | 18 |
| McGhee & Shor 2017 | McGhee & Shor | 2017 | Perspectives on Politics | Has the Top Two Primary Elected More Moderates? | 20 | NA | M | 17 | 1809 |
| Wildgen & Engstrom 1980 | Wildgen & Engstrom | 1980 | Legislative Studies Quarterly | Spatial Distribution of Partisan Support and the Seats/Votes Relationship | 30 | NA | NA | NA | NA |
| Cain 1985 | Cain | 1985 | APSR | Assessing the Partisan Effects of Redistricting | 243 | NA | M | 44 | 9625 |
| Buchler 2005 | Buchler | 2005 | Journal of Theoretical Politics | Competition, Representation and Redistricting: The Case Against Competitive Congressional Districts | 81 | NA | M | NA | NA |
| Lublin & McDonald 2006 | Lublin & McDonald | 2006 | Election Law Journal | Is It Time to Draw the Line?: The Impact of Redistricting on Competition in State House Elections | 44 | NA | M | 26 | 3164 |
| Caughey et al. 2017 | Caughey et al. | 2017 | JOP | Incremental Democracy: The Policy Effects of Partisan Control of State Government | 92 | NA | M | 17 | 1959 |
| Glazer et al. 1987 | Glazer et al. | 1987 | AJPS | Partisan and Incumbency Effects of 1970s Congressional Redistricting | 102 | NA | NA | NA | NA |
| Jacobson 2005 | Jacobson | 2005 | Political Science Quarterly | Polarized Politics and the 2004 Congressional and Presidential Elections | 134 | NA | M | 49 | 16104 |
| Abramowitz et al. 2006 | Abramowitz et al. | 2006 | JOP | Incumbency, Redistricting, and the Decline of Competition in US House Elections | 461 | NA | M | 53 | 14228 |
| Cain et al. 2005 | Cain et al. | 2005 | Brookings Institution | From Equality to Fairness: The Path of Political Reform since Baker v. Carr | 56 | NA | M | 44 | 9625 |
| Griffin & Newman 2007 | Griffin & Newman | 2007 | JOP | The Unequal Representation of Latinos and Whites | 145 | NA | M | 16 | 1923 |
| Grofman et al. 2000 | Grofman et al. | 2000 | NCL Review | Drawing Effective Minority Districts: A Conceptual Framework and Some Empirical Evidence | 165 | NA | M | 74 | 22114 |
| McDonald 2006 | McDonald | 2006 | PS: Political Science & Politics | Drawing the Line on District Competition | 76 | NA | M | 30 | 3441 |
| Desposato & Petrocik 2003 | Desposato & Petrocik | 2003 | AJPS | The Variable Incumbency Advantage: New Voters, Redistricting, and the Personal Vote | 186 | NA | M | 27 | 3003 |
| Ashworth & Bueno de Mesquita 2006 | Ashworth & Bueno de Mesquita | 2006 | JOP | Delivering the Goods: Legislative Particularism in Different Electoral and Institutional Settings | 201 | NA | M | 17 | 2856 |
| Hayes & McKee 2008 | Hayes & McKee | 2008 | American Politics Research | Toward a One-Party South? | 65 | NA | M | 19 | 2675 |
| Winburn & Wagner 2010 | Winburn & Wagner | 2010 | Political Research Quarterly | Carving Voters Out: Redistricting’s Influence on Political Information, Turnout, and Voting Behavior | 46 | NA | M | 11 | 450 |
| Hayes & McKee 2009 | Hayes & McKee | 2009 | AJPS | The Participatory Effects of Redistricting | 85 | NA | M | 19 | 2675 |
| Chen 2010 | Chen | 2010 | AJPS | The Effect of Electoral Geography on Pork Barreling in Bicameral Legislatures | 38 | NA | M | 8 | 375 |
| Cameron et al. 1996 | Cameron et al. | 1996 | APSR | Do Majority-Minority Districts Maximize Substantive Black Representation in Congress? | 709 | NA | M | 28 | 6226 |
| Canon 1999 | Canon | 1999 | Legislative Studies Quarterly | Electoral Systems and the Representation of Minority Interests in Legislatures | 77 | NA | M | 21 | 2864 |
| Gay 2007 | Gay | 2007 | JOP | Legislating Without Constraints: The Effect of Minority Districting on Legislators’ Responsiveness to Constituency Preferences | 47 | NA | F | NA | NA |
| Bratton & Haynie 1999 | Bratton & Haynie | 1999 | JOP | Agenda Setting and Legislative Success in State Legislatures: The Effects of Gender and Race | 740 | NA | F | NA | NA |
| Wyrick 1991 | Wyrick | 1991 | American Politics Quarterly | Management of Political Influence: Gerrymandering in the 1980s | 10 | NA | M | NA | NA |
| Barabas & Jerit 2004 | Barabas & Jerit | 2004 | State Politics & Policy Quaterly | Redistricting Principles and Racial Representation | 44 | NA | M | 20 | 3789 |
| Shotts 2003 | Shotts | 2003 | JOP | Does Racial Redistricting Cause Conservative Policy Outcomes? Policy Preferences of Southern Representatives in the 1980s and 1990s | 76 | NA | M | 22 | 3066 |
| Bullock 1995 | Bullock | 1995 | American Politics Quarterly | The Impact of Changing the Racial Composition of Congressional Districts on Legislators’ Roll Call Behavior | 63 | NA | M | NA | NA |
| Overby & Cosgrove 1996 | Overby & Cosgrove | 1996 | JOP | Unintended Consequences? Racial Redistricting and the Representation of Minority Interests | 164 | NA | M | NA | NA |
| Sharpe & Garand 2001 | Sharpe & Garand | 2001 | Political Research Quarterly | Race, Roll Calls, and Redistricting: The Impact of Race-Based Redistricting on Congressional Roll-Call | 48 | NA | M | NA | NA |
| LeVeaux & Garand 2003 | LeVeaux & Garand | 2003 | Social Science Quarterly | Race‐Based Redistricting, Core Constituencies, and Legislative Responsiveness to Constituency Change* | 13 | NA | F | NA | NA |
| Lyons & Galderisi 1995 | Lyons & Galderisi | 1995 | Political Research Quarterly | Incumbency, Reapportionment, and US House Redistricting | 43 | NA | M | NA | NA |
| Hirsch 2003 | Hirsch | 2003 | Election Law Journal | The United States House of Unrepresentatives: What Went Wrong in the Latest Round of Congressional Redistricting | 159 | NA | NA | NA | NA |
| Grofman 1982 | Grofman | 1982 | Political Geography Quarterly | Reformers, Politicians, and the Courts: A Preliminary Look at US Redistricting in the 1980s | 12 | NA | M | 74 | 22114 |
| Forgette & Winkle 2006 | Forgette & Winkle | 2006 | Social Science Quarterly | Partisan Gerrymandering and the Voting Rights Act | 16 | NA | M | NA | NA |
| Hetherington et al. 2003 | Hetherington et al. | 2003 | JOP | The Redistricting Cycle and Strategic Candidate Decisions in US House Races | 91 | NA | M | 24 | 10690 |
| Carson et al. 2006 | Carson et al. | 2006 | AJPS | The Electoral Costs of Party Loyalty in Congress | 342 | NA | M | 24 | 5637 |
| Lublin 1999 | Lublin | 1999 | APSR | Racial Redistricting and African-American Representation: A Critique of “Do Majority-Minority Districts Maximize Substantive Black Representation in Congress?” | 216 | NA | M | 26 | 3164 |
| Forgette & Platt 2005 | Forgette & Platt | 2005 | Political Geography | Redistricting Principles and Incumbency Protection in the US Congress | 33 | NA | M | NA | NA |
| Carson & Crespin 2004 | Carson & Crespin | 2004 | State Politics & Policy Quaterly | The Effect of State Redistricting Methods on Electoral Competition in United States House of Representatives Races | 100 | NA | M | 24 | 5637 |
| Katz et al. 2020 | Katz et al. | 2020 | APSR | Theoretical Foundations and Empirical Evaluations of Partisan Fairness in District-Based Democracies | 20 | NA | M | 30 | 18684 |
| Bertelli & Carson 2011 | Bertelli & Carson | 2011 | Electoral Studies | Small Changes, Big Results: Legislative Voting Behavior in the Presence of New Voters | 12 | NA | M | 29 | 3204 |
| Chen & Cottrell 2016 | Chen & Cottrell | 2016 | Electoral Studies | Evaluating Partisan Gains from Congressional Gerrymandering: Using Computer Simulations to Estimate the Effect of Gerrymandering in the U.S. House | 62 | NA | M | 8 | 375 |
| Hunt 2018 | Hunt | 2018 | Electoral Studies | When Does Redistricting Matter? Changing Conditions and Their Effects on Voter Turnout | 8 | NA | M | 2 | 25 |
| Sauger & Grofman 2016 | Sauger & Grofman | 2016 | Electoral Studies | Partisan Bias and Redistricting in France | 15 | 1 | M | 21 | 1492 |
| Wong 2019 | Wong | 2019 | BJPS | Gerrymandering in Electoral Autocracies: Evidence from Hong Kong | 16 | 1 | NA | 11 | 454 |
| Incerti 2018 | Incerti | 2018 | Electoral Studies | The Optimal Allocation of Campaign Funds in US House Elections | 5 | NA | M | 11 | 454 |
| Limbocker & You 2020 | Limbocker & You | 2020 | Electoral Studies | Campaign Styles: Persistency in Campaign Resource Allocation | 2 | NA | M | 6 | 119 |
| Carson et al. 2014 | Carson et al. | 2014 | State Politics & Policy Quaterly | Reevaluating the Effects of Redistricting on Electoral Competition, 1972–2012 | 38 | NA | M | 24 | 5637 |
| Makse 2014 | Makse | 2014 | State Politics & Policy Quaterly | The Redistricting Cycle, Partisan Tides, and Party Strategy in State Legislative Elections | 12 | NA | M | 9 | 474 |
| Hood & McKee 2013 | Hood & McKee | 2013 | State Politics & Policy Quaterly | Unwelcome Constituents: Redistricting and Countervailing Partisan Tides | 7 | NA | M | 21 | 1948 |
| Goedert 2017 | Goedert | 2017 | State Politics & Policy Quaterly | The Pseudoparadox of Partisan Mapmaking and Congressional Competition | 6 | NA | M | 5 | 183 |
| Kirkland 2013 | Kirkland | 2013 | State Politics & Policy Quaterly | Wallet-Based Redistricting: Evidence for the Concentration of Wealth in Majority Party Districts | 6 | NA | M | 15 | 927 |
| Carsey et al. 2017 | Carsey et al. | 2017 | State Politics & Policy Quaterly | Rethinking the Normal Vote, the Personal Vote, and the Impact of Legislative Professionalism in U.S. State Legislative Elections | 6 | NA | M | 25 | 5340 |
| Stephanopoulos & McGhee 2015 | Stephanopoulos & McGhee | 2015 | University of Chicago Law Review | Partisan Gerrymandering and the Efficiency Gap | 345 | NA | M | 23 | 1917 |
| Chen & Rodden 2015 | Chen & Rodden | 2015 | Election Law Journal | Cutting Through the Thicket: Redistricting Simulations and the Detection of Partisan Gerrymanders | 87 | NA | M | 8 | 375 |
| Barnes & Solomon 2020 | Barnes & Solomon | 2020 | Political Analysis | Gerrymandering and Compactness: Implementation Flexibility and Abuse | 13 | NA | M | 11 | 652 |
| Atsusaka 2021 | Atsusaka | 2021 | APSR | A Logical Model for Predicting Minority Representation: Application to Redistricting and Voting Rights Cases | 0 | NA | M | 1 | 3 |
| Gatesman & Unwin 2021 | Gatesman & Unwin | 2021 | Political Analysis | Lattice Studies of Gerrymandering Strategies | 0 | NA | M | 1 | 1 |
| Magleby & Mosesson 2018 | Magleby & Mosesson | 2018 | Political Analysis | A New Approach for Developing Neutral Redistricting Plans | 23 | NA | M | 6 | 148 |
| Deford, Eubank & Rodden 2020 | Deford, Eubank & Rodden | 2020 | Political Analysis | Partisan Dislocation: A Precinct-Level Measure of Representation and Gerrymandering | 0 | NA | M | 10 | 309 |
| Krasa & Polborn 2018 | Krasa & Polborn | 2018 | APSR | Political Competition in Legislative Elections | 41 | NA | M | 14 | 673 |
| Saxon 2020 | Saxon | 2020 | Political Analysis | Reviving Legislative Avenues for Gerrymandering Reform with a Flexible, Automated Tool | 3 | NA | M | 5 | 457 |
| Kang 2017 | Kang | 2017 | Michigan Law Review | Gerrymandering and the Constitutional Norm Against Government Partisanship | 61 | NA | M | 22 | 1501 |
| Stephanopoulos 2012 | Stephanopoulos | 2012 | University of Pennsylvania Law Review | Redistricting and the Territorial Community | 61 | NA | M | 23 | 1917 |
| Altman & McDonald 2010 | Altman & McDonald | 2010 | Duke Journal of Constitutional Law and Public Policy | The Promise and Perils of Computers in Redistricting | 87 | NA | M | 31 | 7948 |
| McDonald & Best 2015 | McDonald & Best | 2015 | Election Law Journal | Unfair Partisan Gerrymanders in Politics and Law: A Diagnostic Applied to Six Cases | 73 | NA | M | 25 | 5220 |
| Tam Cho & Liu 2016 | Tam Cho & Liu | 2016 | Election Law Journal | Toward a Talismanic Redistricting Tool: A Computational Method for Identifying Extreme Redistricting Plans | 76 | NA | F | 18 | 1171 |
| McGhee 2014 | McGhee | 2014 | Legislative Studies Quarterly | Measuring Partisan Bias in Single-Member District Electoral Systems | 97 | NA | M | 17 | 1809 |
| Wang 2016 | Wang | 2016 | Stanford Law Review | Three Tests for Practical Evaluation of Partisan Gerrymandering | 97 | NA | M | 44 | 9711 |
| Cox & Holden 2011 | Cox & Holden | 2011 | University of Chicago Law Review | Reconsidering Racial and Partisan Gerrymandering | 76 | NA | M | 20 | 2488 |
| Stewart et al. 2019 | Stewart et al. | 2019 | Nature | Information Gerrymandering and Undemocratic Decisions | 81 | NA | M | 14 | 1090 |
| Siegel-Hawley 2013 | Siegel-Hawley | 2013 | Harvard Educational Review | Educational Gerrymandering? Race and Attendance Boundaries in a Demographically Changing Suburb | 58 | NA | F | 27 | 2839 |
| Richards 2014 | Richards | 2014 | American Educational Research Journal | The Gerrymandering of School Attendance Zones and the Segregation of Public Schools | 91 | NA | F | 11 | 815 |
| Fraga 2016 | Fraga | 2016 | JOP | Redistricting and the Causal Impact of Race on Voter Turnout | 67 | NA | M | 12 | 714 |
| De Assis et al. 2014 | De Assis et al. | 2014 | Computers & Operations Research | A Redistricting Problem Applied to Meter Reading in Power Distribution Networks | 52 | NA | F | 5 | 220 |
| Hayes et al. 2010 | Hayes | 2010 | Legislative Studies Quarterly | Redistricting, Responsiveness, and Issue Attention | 52 | NA | M | 11 | 544 |
| Liu et al. 2016 | Liu et al. | 2016 | Swarm and Evolutionary Computation | PEAR: A Massively Parallel Evolutionary Computation Approach for Political Redistricting Optimization and Analysis | 62 | NA | M | 22 | 1415 |
| Yoshinaka & Murphy 2011 | Yoshinaka & Murphy | 2011 | Political Research Quarterly | The Paradox of Redistricting: How Partisan Mapmakers Foster Competition but Disrupt Representation | 53 | NA | M | 15 | 1486 |
| Webster 2013 | Webster | 2013 | Political Geography | Reflections on Current Criteria to Evaluate Redistricting Plans | 53 | NA | M | 24 | 1875 |
| Gentry et al. 2013 | Gentry et al. | 2013 | American Journal of Transplantation | Addressing Geographic Disparities in Liver Transplantation Through Redistricting | 137 | NA | F | 30 | 3498 |
| Grainger 2010 | Grainger | 2010 | The Journal of Law and Economics | Redistricting and Polarization: Who Draws the Lines in California? | 50 | NA | M | 13 | 972 |
| Masket et al. 2012 | Masket et al. | 2012 | PS: Political Science & Politics | The Gerrymanderers are Coming! Legislative Redistricting Won’t Affect Competition or Polarization Much, No Matter Who Does It | 57 | NA | M | 23 | 3217 |
| Altman & McDonald 2011 | Altman & McDonald | 2011 | Journal of Statistical Software | BARD: Better Automated Redistricting | 84 | NA | M | 31 | 7948 |
| Gul & Pesendorfer 2010 | Gul & Pesendorfer | 2010 | American Economic Review | Strategic Redistricting | 67 | NA | M | 24 | 3381 |
| Cain 2011 | Cain | 2011 | Yale Law Journal | Redistricting Commissions: A Better Political Buffer | 128 | NA | M | 44 | 9625 |
| Arrington 2016 | Arrington | 2016 | Election Law Journal | A Practical Procedure for Detecting a Partisan Gerrymander | 5 | NA | M | 7 | 126 |
| Ladewig 2018 | Ladewig | 2018 | Election Law Journal | “Appearances Do Matter”: Congressional District Compactness and Electoral Turnout | 0 | NA | M | 10 | 498 |
| Campisi et al. 2019 | Campisi et al. | 2019 | Election Law Journal | Declination as a Metric to Detect Partisan Gerrymandering | 5 | NA | F | 3 | 19 |
| Makse 2012 | Makse | 2012 | Election Law Journal | Defining Communities of Interest in Redistricting Through Initiative Voting | 17 | NA | M | 9 | 474 |
| Gimpel & Harbridge-Yong 2020 | Gimpel & Harbridge-Yong | 2020 | Election Law Journal | Conflicting Goals of Redistricting: Do Districts That Maximize Competition Reckon with Communities of Interest? | 1 | NA | M | 42 | 7339 |
| Chen 2017 | Chen | 2017 | Election Law Journal | The Impact of Political Geography on Wisconsin Redistricting: An Analysis of Wisconsin’s Act 43 Assembly Districting Plan | 22 | NA | M | 8 | 375 |
| Ansolabehere & Snyder 2012 | Ansolabehere & Snyder | 2012 | Election Law Journal | The Effects of Redistricting on Incumbents | 26 | NA | M | 37 | 5606 |
| Sabouni & Shelton 2021 | Sabouni & Shelton | 2021 | Election Law Journal | State Legislative Redistricting: The Effectiveness of Traditional Districting Principles in the 2010 Wave | 0 | NA | M | 1 | 4 |
| Williamson 2019 | Williamson | 2019 | Election Law Journal | Examining the Effects of Partisan Redistricting on Candidate Entry Decisions | 2 | NA | M | 7 | 155 |
| Veomett 2018 | Veomett | 2018 | Election Law Journal | Efficiency Gap, Voter Turnout, and the Efficiency Principle | 23 | NA | F | 3 | 37 |
| Tamas 2019 | Tamas | 2019 | Election Law Journal | American Disproportionality: A Historical Analysis of Partisan Bias in Elections to the U.S. House of Representatives | 5 | NA | M | 6 | 91 |
| Duchin et al. 2019 | Duchin et al. | 2019 | Election Law Journal | Locating the Representational Baseline: Republicans in Massachusetts | 25 | NA | F | 8 | 154 |
| McGhee 2017 | McGhee | 2017 | Election Law Journal | Measuring Efficiency in Redistricting | 29 | NA | M | 17 | 1809 |
| Wang et al. 2018 | Wang et al. | 2018 | Election Law Journal | An Antidote for Gobbledygook: Organizing the Judge’s Partisan Gerrymandering Toolkit into Tests of Opportunity and Outcome | 5 | NA | M | 44 | 9711 |
| Caughey et al. 2017b | Caughey et al. | 2017 | Election Law Journal | Partisan Gerrymandering and the Political Process: Effects on Roll-Call Voting and State Policies | 31 | NA | M | 17 | 1959 |
| Powell et al. 2020 | Powell et al. | 2020 | Election Law Journal | Partisan Gerrymandering, Clustering, or Both? A New Approach to a Persistent Question | 2 | NA | M | 5 | 110 |
| Fougere et al. 2010 | Fougere et al. | 2010 | Election Law Journal | Partisanship, Public Opinion, and Redistricting | 33 | NA | M | 2 | 17 |
| Best et al. 2018 | Best et al. | 2018 | Election Law Journal | Considering the Prospects for Establishing a Packing Gerrymandering Standard | 32 | NA | F | 10 | 522 |
| Warrington 2018 | Warrington | 2018 | Election Law Journal | Quantifying Gerrymandering Using the Vote Distribution | 34 | NA | M | 15 | 736 |
| Gardner 2012 | Gardner | 2012 | Election Law Journal | How to Do Things with Boundaries: Redistricting and the Construction of Politics | 14 | NA | M | 10 | 881 |
| Goedert 2014 | Goedert | 2014 | Election Law Journal | Redistricting, Risk, and Representation: How Five State Gerrymanders Weathered the Tides of the 2000s | 8 | NA | M | 5 | 183 |
| Wang 2016b | Wang | 2016 | Election Law Journal | Three Practical Tests for Gerrymandering: Application to Maryland and Wisconsin | 38 | NA | M | 44 | 9711 |
| Ramachandran & Gold 2018 | Ramachandran & Gold | 2018 | Election Law Journal | Using Outlier Analysis to Detect Partisan Gerrymanders: A Survey of Current Approaches and Future Directions | 9 | NA | F | 5 | 104 |
| Nagle 2019 | Nagle | 2019 | Election Law Journal | What Criteria Should Be Used for Redistricting Reform? | 16 | NA | M | 82 | 25088 |
# split out multiple cites per edge 
literature_long <- literature %>% 
  mutate(id = str_split(cites, ";")) %>% 
  unnest(id)
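Because `cites` can list several studies for a single edge, `literature_long` has one row per (edge, cited study) pair, so an edge supported by k studies appears k times. The base-R sketch below reproduces that reshaping on a hypothetical two-edge list (node names and study ids are invented), assuming `;` as the separator as in the code above. The `trimws()` call is an extra safeguard against spaces after the separator and is not part of the vignette code.

```r
# hypothetical edge list: each edge lists its supporting studies in `cites`
edges <- data.frame(
  from  = c("redistricting", "competition"),
  to    = c("competition", "turnout"),
  cites = c("Cain 1985;Grofman 1982", "McGhee 2014"),
  stringsAsFactors = FALSE
)

# split each `cites` string into a vector of study ids
ids <- strsplit(edges$cites, ";", fixed = TRUE)

# repeat each edge once per supporting study, like tidyr::unnest()
edges_long <- data.frame(
  from = rep(edges$from, lengths(ids)),
  to   = rep(edges$to, lengths(ids)),
  id   = trimws(unlist(ids)),
  stringsAsFactors = FALSE
)
edges_long
```

The first edge now occupies two rows, one for each study that supports it.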

# merge edgelist with metadata
literature_long %<>% full_join(literature_metadata)
literature_long %>% 
  ggplot() +
  aes(x = author_h_index, fill = author_gender)+
  geom_histogram()

literature_long %>% 
  ggplot() +
  aes(x = author_citations, fill = author_gender)+
  geom_histogram()


library(ggraph)

1.2 The Full Graph

lit <- literature_long %>% 
  distinct(to, from) %>% 
  review()

lit
## A netlit_review object with the following components:
## 
## $edgelist
##  - 69 edges
##  - edge attributes: edge_betweenness
## $nodelist
##  - 56 nodes
##  - node attributes: degree_in, degree_out, degree_total, betweenness
## $graph
##    an igraph object
# fix the layout seed (seeds 1, 4, and 5 gave the best layouts; 5 is used)
set.seed(5)

netlit_plot <- function(g){
  ggraph(g, layout = 'fr') + 
    # nodes colored by total degree centrality
    geom_node_point(
      aes(color = degree_total %>% as.factor()),
      size = 6, 
      alpha = .7
    ) + 
    # directed edges colored by edge betweenness
    geom_edge_arc2(
      start_cap = circle(3, 'mm'),
      end_cap = circle(6, 'mm'),
      aes(color = edge_betweenness),
      curvature = 0,
      arrow = arrow(length = unit(2, 'mm'), type = "open")
    ) +
    # self-loops
    geom_edge_loop(
      start_cap = circle(5, 'mm'),
      end_cap = circle(2, 'mm'),
      aes(color = edge_betweenness),
      n = 300,
      strength = .6,
      arrow = arrow(length = unit(2, 'mm'), type = "open")
    ) +
    geom_node_text(aes(label = name), size = 2.3) + 
    ggplot2::theme_void() + 
    theme(legend.position = "bottom") + 
    labs(edge_color = "Edge Betweenness",
         color = "Total Degree\nCentrality",
         edge_linetype = "") + 
    scale_edge_color_viridis(option = "plasma", 
                             begin = 0, 
                             end = .9, 
                             direction = -1, 
                             guide = "legend") +
    scale_color_viridis_d(option = "mako", 
                          begin = 1, 
                          end = .5)
}


g <- literature_long %>% 
  distinct(to, from) %>% 
  review()  %>% 
  .$graph 

g %>% 
  netlit_plot()


# for plotting bias: draw the full network and flag the nodes and edges
# that are missing from `subgraph` (a netlit_review object)
netlit_bias_plot <- function(subgraph){
  
  # full edgelist with an attribute indicating edges missing from subgraph 
  lit <- literature_long %>% 
    distinct(to, from) %>% 
    left_join(subgraph$edgelist %>% 
                distinct(to, from) %>% 
                mutate(missing_edges = "Not missing"),
              by = c("to", "from")) %>% 
    mutate(missing_edges = replace_na(missing_edges, "Missing")) 
  
  lit %<>% 
    review(edge_attributes = names(lit))  
  
  # nodes in the full network that do not appear in subgraph
  missing_nodes <- lit$nodelist$node[!lit$nodelist$node %in% subgraph$nodelist$node]
  
  # same layout seed as the full-graph plot
  set.seed(5)
  
  ggraph(lit$graph, layout = 'fr') + 
    geom_node_point(
      aes(color = ifelse(name %in% missing_nodes, "Missing", "Not missing")),
      size = 6, 
      alpha = .7
    ) + 
    geom_edge_arc2(
      start_cap = circle(3, 'mm'),
      end_cap = circle(6, 'mm'),
      aes(color = missing_edges),
      curvature = 0,
      arrow = arrow(length = unit(2, 'mm'), type = "open")
    ) +
    geom_edge_loop(
      start_cap = circle(5, 'mm'),
      end_cap = circle(2, 'mm'),
      aes(color = missing_edges),
      n = 300,
      strength = .6,
      arrow = arrow(length = unit(2, 'mm'), type = "open")
    ) +
    geom_node_text(aes(label = name), size = 2.3) + 
    ggplot2::theme_void() + 
    theme(legend.position = "bottom") + 
    labs(edge_color = "",
         color = "",
         edge_linetype = "") +
    scale_color_discrete() + 
    scale_edge_color_discrete()
}


literature_long %<>%
  mutate(author_is_man = author_gender == "M")

2 Biased Samples

# biased sample weights 
literature_long %<>% 
    mutate(unbiased = .5,
           weight = case_when(
      author_is_man ~ .6,
      !author_is_man ~ .4,
      TRUE~ .5 
    ))
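`case_when()` returns the value of the first condition that is `TRUE`. When `author_gender` is missing, `author_is_man` is `NA`, so neither of the first two conditions is `TRUE` and the row falls through to the default weight of .5. The same logic, sketched in base R for three hypothetical rows:

```r
# hypothetical lead-author indicators: man, woman, unknown gender
author_is_man <- c(TRUE, FALSE, NA)

# mirror the case_when() above: .6 for men, .4 for women, .5 otherwise
weight <- ifelse(is.na(author_is_man), 0.5,
                 ifelse(author_is_man, 0.6, 0.4))
weight
```

Because `sample()` rescales `prob` to sum to one, only the ratios matter: under these weights, a row led by a man is 1.5 times as likely as a row led by a woman to be chosen at any given draw, among the rows not yet drawn.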


# draw one sample of `size` rows from literature_long (without replacement),
# with selection probability proportional to `prob`, and build its network.
# `i` is the iteration index passed in by map() and is otherwise unused.
sample_lit <- function(i, literature_long, prob, size = 100){
  
  # index of the sampled rows
  samp_idx <- sample(seq_len(nrow(literature_long)), 
                     size = size, # number of studies to draw 
                     prob = prob) # relative selection weights
  
  # subset to the sampled rows and build the review network
  literature_long %>% 
    rowid_to_column() %>% 
    filter(rowid %in% samp_idx) %>% 
    distinct(to, from) %>% 
    review()
}

n_samples <- 1000

2.1 Random draws of 100 studies (1000 draws)

There are 165 studies in the original literature review. We draw 100 of them, first uniformly at random and then with weights based on author covariates. For each type of simulated bias, we repeat the draw 1,000 times.
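Passing a covariate to `prob` does not fix how many sampled rows come from each group; it changes each row's relative chance of selection at every draw. The sketch below uses invented group sizes (120 rows led by men and 45 led by women, chosen only so the total matches the 165 studies above) and the 1 versus .3 weights from Section 2.2.2 to show how weighting shifts the expected composition of a 100-row draw.

```r
set.seed(1)

# hypothetical population of 165 rows: 120 led by men, 45 by women
author_is_man <- rep(c(TRUE, FALSE), times = c(120, 45))

# weights as in Section 2.2.2: 1 for men, .3 for women
weight <- ifelse(author_is_man, 1, 0.3)

# share of rows led by men in each of 1000 weighted draws of 100 rows
share_men <- replicate(1000, {
  idx <- sample(seq_along(author_is_man), size = 100, prob = weight)
  mean(author_is_man[idx])
})

# compare with the share in the population (120 / 165)
c(population = 120 / 165, weighted_draws = mean(share_men))
```

Because the draw is without replacement and covers most of the population, even strong weights shift the composition only partway toward the favored group.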

random_samples <- map(1:n_samples, # 1000 samples
                      sample_lit,
                      literature_long = literature_long, 
                      prob = literature_long$unbiased) # equal weights

samples <- random_samples

# helpers: mean edge betweenness, mean node betweenness, and mean total degree
mean_edge_betw <- . %>% pull(edge_betweenness) %>% mean()
mean_node_betw <- . %>% pull(betweenness) %>% mean()
mean_node_degree <- . %>% pull(degree_total) %>% mean()

# summarise each sampled network: number of edges and nodes, mean betweenness
# and degree, number of walktrap communities, diameter, and the graph object
summarise_samples <- function(samples){
  edgelists <- map(samples, "edgelist")
  nodelists <- map(samples, "nodelist")
  graphs    <- map(samples, "graph")
  
  tibble(
    # edge stats
    edges = map_int(edgelists, nrow),
    edge_between_mean = map_dbl(edgelists, mean_edge_betw),
    # node stats
    nodes = map_int(nodelists, nrow),
    node_between_mean = map_dbl(nodelists, mean_node_betw),
    node_degree_mean = map_dbl(nodelists, mean_node_degree),
    # graph stats
    communities = map_int(graphs, ~ length(igraph::cluster_walktrap(.x))),
    diameter = map_dbl(graphs, igraph::diameter),
    graph = graphs
  )
}

summary <- summarise_samples(samples)

random <- summary %>% mutate(
  sample = "Random"
)

# map(random$graph, netlit_plot)
map(random_samples[1:10], netlit_bias_plot) 

Average nodes recovered: 43.8

Average node betweenness recovered: 2.9607115

Average edges recovered: 46.94

Average edge betweenness recovered: 5.1360215

Average node degree recovered: 2.1470984

Average communities recovered: 10.12

Average diameter recovered: 4.65
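These averages are easier to read relative to the full network, which has 56 nodes and 69 edges. A short worked calculation using only the numbers reported above:

```r
# full network size, from the review() printout in Section 1.2
full <- c(nodes = 56, edges = 69)

# average recovered in 1000 uniform random draws (reported above)
random_draws <- c(nodes = 43.8, edges = 46.94)

# percentage of the full network recovered
pct <- round(100 * random_draws / full, 1)
pct
```

This returns 78.2 for nodes and 68.0 for edges, so a uniform random draw of 100 studies recovers roughly 78% of the nodes and 68% of the edges; the biased draws below can be read against this baseline.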


2.2 Gender-biased draws

2.2.1 pr(cite|man) = .60, pr(cite|woman) = .40

# gender-biased samples
gender_samples <- map(1:n_samples, sample_lit, literature_long = literature_long, prob = literature_long$weight)

samples <- gender_samples

summary <- summarise_samples(samples)

# note: `gender` is reassigned in Section 2.2.2, so the comparison table in
# Section 3 reflects the stronger bias (1 vs .3), not these .6/.4 weights
gender <- summary %>% mutate(sample = "Gender bias favoring men")

map(gender_samples[1:10], netlit_bias_plot)

Average nodes recovered: 44.25

Average node betweenness recovered: 2.9773211

Average edges recovered: 47.472

Average edge betweenness recovered: 5.1271756

Average node degree recovered: 2.1480552

Average communities recovered: 10.328

Average diameter recovered: 4.667


2.2.2 pr(man) = 1, pr(woman) = .30

# biased sample weights 
literature_long %<>% 
    mutate(weight = case_when(
      author_is_man ~ 1,
      !author_is_man ~ .3,
      TRUE~ .5 
    ))


# gender-biased samples
gender_samples <- map(1:n_samples, sample_lit, literature_long = literature_long, prob = literature_long$weight)

samples <- gender_samples

summary <- summarise_samples(samples)

gender <- summary %>% mutate(
  sample = "Gender bias favoring men"
)
  
#map(gender$graph, netlit_plot)
map(gender_samples[1:10], netlit_bias_plot)

Average nodes recovered: 45.339

Average node betweenness recovered: 3.3160159

Average edges recovered: 48.951

Average edge betweenness recovered: 5.5354191

Average node degree recovered: 2.1615376

Average communities recovered: 10.722

Average diameter recovered: 4.837


2.2.3 pr(man) = .30, pr(woman) = 1

# biased sample weights 
literature_long %<>% 
    mutate(weight = case_when(
      author_is_man ~ .3,
      !author_is_man ~ 1,
      TRUE~ .5 
    ))

# samples biased toward studies led by women
gender_samples2 <- samples <- map(1:n_samples, sample_lit, literature_long = literature_long, prob = literature_long$weight)

summary <- summarise_samples(samples)

gender2 <- summary %>% mutate(
  sample = "Gender bias favoring women"
)
  
# map(gender2$graph, netlit_plot)
map(gender_samples2[1:10], netlit_bias_plot)

Average nodes recovered: 42.591

Average node betweenness recovered: 2.3101483

Average edges recovered: 44.627

Average edge betweenness recovered: 4.3232846

Average node degree recovered: 2.0983178

Average communities recovered: 9.96

Average diameter recovered: 4.249


2.3 H-Index-biased draws

(replacing missing H-Index values with 0)

literature_long %<>%
  mutate(author_h_index = replace_na(author_h_index, 0 ))
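This replacement matters because of how `sample()` treats `prob`: an `NA` weight is an error, and a weight of 0 means the row is never drawn. So when H-Index is used as the sampling weight, studies whose lead author has no recorded H-Index are excluded from the draws rather than down-weighted. A base-R sketch with invented H-Index values:

```r
set.seed(2)

# hypothetical H-Index values, with one missing and one zero
h_index <- c(19, 30, NA, 8, 0)

# an NA weight is rejected outright by sample()
na_result <- tryCatch(sample(seq_along(h_index), size = 1, prob = h_index),
                      error = function(e) conditionMessage(e))

# replace missing values with 0, as above
w <- ifelse(is.na(h_index), 0, h_index)

# 500 weighted draws of 2 rows each (without replacement)
draws <- replicate(500, sample(seq_along(w), size = 2, prob = w))

# rows 3 and 5 have weight 0 and are never selected
any(draws %in% c(3, 5))
```

The same holds for the citation-weighted draws below, where missing citation counts are also replaced with 0.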

# H-Index-biased samples: weight each row by the lead author's H-Index
hindex_samples <- samples <- map(1:n_samples, sample_lit, literature_long = literature_long, prob = literature_long$author_h_index)

summary <- summarise_samples(samples)

hindex <- summary %>% mutate(
  sample = "H-Index bias"
)
  
# map(hindex$graph, netlit_plot)
map(hindex_samples[1:10], netlit_bias_plot)

Average nodes recovered: 42.591

Average node betweenness recovered: 2.3101483

Average edges recovered: 44.627

Average edge betweenness recovered: 4.3232846

Average node degree recovered: 2.0983178

Average communities recovered: 9.96

Average diameter recovered: 4.249


2.4 Citation-biased draws

(replacing missing author citation counts with 0)

literature_long %<>%
  mutate(author_citations = replace_na(author_citations, 0 ))

# citation-biased samples: weight each row by the lead author's total citations
citations_samples <- map(1:n_samples, sample_lit, literature_long = literature_long, prob = literature_long$author_citations)

samples <- citations_samples

summary <- summarise_samples(samples)

citations <- summary %>% mutate(
  sample = "Citations bias"
)
  
# map(citations$graph, netlit_plot)
map(citations_samples[1:10], netlit_bias_plot)

Average nodes recovered: 46.811

Average node betweenness recovered: 3.754296

Average edges recovered: 51.905

Average edge betweenness recovered: 6.1469967

Average node degree recovered: 2.2184447

Average communities recovered: 10.823

Average diameter recovered: 4.638

3 Comparing Biases

# stack the per-draw summaries from every simulation
s <- bind_rows(random, gender, gender2, hindex, citations)

round2 <- . %>% round(1)

# mean of each statistic by type of sample, rounded to one decimal place
s_table <- s %>% 
  group_by(sample) %>% 
  summarise(across(where(is.numeric), mean)) %>% 
  mutate(across(where(is.numeric), round2)) %>% 
  arrange(desc(sample))

color.me <- which(s_table$sample == "Random")

names(s_table) %<>% str_remove("_mean")

s_table %>% 
  kable(booktabs = TRUE) %>% 
  kable_styling()  
| sample | edges | edge_between | nodes | node_between | node_degree | communities | diameter |
|:---|---:|---:|---:|---:|---:|---:|---:|
| Random | 46.9 | 5.1 | 43.8 | 3.0 | 2.1 | 10.1 | 4.7 |
| H-Index bias | 44.6 | 4.3 | 42.6 | 2.3 | 2.1 | 10.0 | 4.2 |
| Gender bias favoring women | 44.6 | 4.3 | 42.6 | 2.3 | 2.1 | 10.0 | 4.2 |
| Gender bias favoring men | 49.0 | 5.5 | 45.3 | 3.3 | 2.2 | 10.7 | 4.8 |
| Citations bias | 51.9 | 6.1 | 46.8 | 3.8 | 2.2 | 10.8 | 4.6 |
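One way to read the table is as differences from the uniform random baseline. The sketch below copies the rounded means for edges and nodes from the table above and subtracts the Random row; positive values mean a sample type recovered more edges or nodes on average than uniform random draws.

```r
# rounded means copied from the comparison table above
means <- data.frame(
  sample = c("Random", "H-Index bias", "Gender bias favoring women",
             "Gender bias favoring men", "Citations bias"),
  edges  = c(46.9, 44.6, 44.6, 49.0, 51.9),
  nodes  = c(43.8, 42.6, 42.6, 45.3, 46.8),
  stringsAsFactors = FALSE
)

# difference from the uniform random baseline, in edges and nodes recovered
baseline <- means[means$sample == "Random", ]
means$edges_vs_random <- round(means$edges - baseline$edges, 1)
means$nodes_vs_random <- round(means$nodes - baseline$nodes, 1)
means
```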
# density of a summary statistic across the 1000 draws, by type of sample
plot_sample_density <- function(s, var, x_label){
  s %>% 
    ggplot() + 
    aes(x = {{ var }}, fill = sample, color = sample) +
    geom_density(alpha = .3) + 
    scale_color_viridis_d() + 
    scale_fill_viridis_d() +
    theme_minimal() + 
    labs(color = "", 
         fill = "", y = "Density",
         x = x_label) + 
    theme(axis.text.y = element_blank(),
          panel.grid.major.y = element_blank(),
          panel.grid.minor.y = element_blank())
}

plot_sample_density(s, nodes, "Nodes Recovered (out of 56)")

plot_sample_density(s, edges, "Edges Recovered (out of 69)")

plot_sample_density(s, edge_between_mean, "Average Edge Betweenness")

plot_sample_density(s, node_between_mean, "Average Node Betweenness")

plot_sample_density(s, node_degree_mean, "Average Degree")

plot_sample_density(s, communities, "Communities")

plot_sample_density(s, diameter, "Diameter")