Iterative Human Coding and Computational Text Analysis: Assessing the Effects of Public Pressure on Policy

Devin Judge-Lord | Harvard University


Human coding and computational text analysis are more powerful when combined in an iterative workflow.

  1. Text analysis tools can strategically select texts for human coders: texts representative of larger samples and outlier texts of high inferential value.
  2. Preprocessing can speed up hand-coding by extracting features such as names and key sentences.
  3. Humans and computers can iteratively tag entities using regex tables and group texts by key features (e.g., identifying lobbying coalitions by shared policy demands).

Applying simple search and text-reuse methods to public comments on all U.S. federal agency rules, a hand-coded sample of 10,894 comments yields 41 million as-good-as-hand-coded comments, coded both for the organizations that mobilized them and for the extent to which policy changed in the direction they sought.

Hand-coding dynamic data

Workflow: The googlesheets4 R package lets coders and analysts read and update the same data in real time; a sketch of this loop follows the list below. For example, in Fig. 1:

  • The “org_name” column is populated with a guess from automated methods. As humans identify new organizations and aliases, other documents containing the same entity strings are auto-coded to match the human coding.
  • As humans identify each organization’s policy “ask,” other texts with the same ask are assigned to the same coalition.
  • Once a comment’s organization and coalition are known, it no longer needs hand-coding.
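
A minimal sketch of this read-and-update loop, assuming a hypothetical sheet ID, a “comments” worksheet, and illustrative entity_string and org_name columns:

```r
library(googlesheets4)
library(dplyr)

sheet_id <- "YOUR_SHEET_ID"  # hypothetical; replace with a real sheet ID or URL

# Pull the current state of the coding sheet
coded <- read_sheet(sheet_id, sheet = "comments")

# Entity strings that human coders have already resolved to organizations
known <- coded %>%
  filter(!is.na(org_name)) %>%
  distinct(entity_string, org_name)

# Auto-code uncoded rows whose entity string matches a resolved alias
updated <- coded %>%
  left_join(known, by = "entity_string", suffix = c("", "_known")) %>%
  mutate(org_name = coalesce(org_name, org_name_known)) %>%
  select(-org_name_known)

# Push the guesses back so coders can confirm or correct them
write_sheet(updated, ss = sheet_id, sheet = "comments")
```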

Fig. 1: Coded Public Comments in a Google Sheet

Regex tables to tag entities

  • Deductive: Start with databases of known entities.
Table 1: Lookup Table Deduced from Center for Responsive Politics Lobbying Data, Collapsed into an Initial Regular Expression Table

  Entity: 3M Co
  Pattern: 3M Co|3M Health Information Systems|Ceradyne|Cogent Systems|Hybrivet Systems

  Entity: Teamsters Union
  Pattern: Brotherhood of Locomotive Engineers (and|&) Trainmen|Brotherhood of Maint[a-z]* of Way Employ|Teamsters
  • Inductive: Add entity strings that frequently appear in the data to regex tables.
  • Iterative: Add to regex tables as humans identify new entities or new aliases for known entities. Update the data (Google Sheets) to speed hand-coding.

Fig. 2: Iteratively Building Regex Tables

For example, the legislators package uses a regex table, adding variants (e.g., “AOC”) to standard legislator names to detect them in messy text; the sketch below illustrates the approach.
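
A minimal sketch of this tagging step, using the Table 1 patterns and invented comment text (column names are illustrative):

```r
library(dplyr)
library(stringr)

regex_table <- tribble(
  ~entity,           ~pattern,
  "3M Co",           "3M Co|3M Health Information Systems|Ceradyne|Cogent Systems|Hybrivet Systems",
  "Teamsters Union", "Brotherhood of Locomotive Engineers (and|&) Trainmen|Brotherhood of Maint[a-z]* of Way Employ|Teamsters"
)

comments <- tibble(
  comment_id = 1:2,
  text = c("Comment submitted on behalf of 3M Health Information Systems.",
           "The Teamsters urge the agency to strengthen this rule.")
)

# Tag each comment with the first entity whose pattern matches its text
tagged <- comments %>%
  rowwise() %>%
  mutate(org_name = {
    hits <- regex_table$entity[str_detect(text, regex(regex_table$pattern, ignore_case = TRUE))]
    if (length(hits) > 0) hits[1] else NA_character_
  }) %>%
  ungroup()
```

When coders find a new alias, appending it to the relevant pattern (e.g., adding |AOC to a legislator’s pattern) extends coverage without changing the matching code.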


Results: Who mobilizes public comments?

Of 58 million public comments on proposed agency rules, the top 100 organizations mobilized 43,938,811. The top ten organizations mobilized 25,947,612.

Table 2: Top Organizations by Public Comments Mobilized (the top five account for nearly 20 million)

| Organization | Rules Lobbied On | Pressure Campaigns | Campaigns / Rules | Comments Mobilized | Average per Campaign |
|---|---|---|---|---|---|
| NRDC | 530 | 62 | 11.7% | 5,939,264 | 95,795 |
| Sierra Club | 591 | 110 | 18.6% | 5,111,922 | 46,472 |
| CREDO | 90 | 41 | 45.6% | 3,019,150 | 73,638 |
| Environmental Defense Fund | 111 | 31 | 27.9% | 2,849,517 | 91,920 |
| Center for Biological Diversity | 572 | 86 | 15.0% | 2,815,509 | 32,738 |
| Earthjustice | 235 | 59 | 25.1% | 2,080,583 | 35,264 |

Grouping with text reuse

Fig. 3: Iteratively Grouping Documents

Fig. 4: Identifying Groups of Linked Documents with Text Reuse (a 10-gram Window Function)

  • Document A shares no 10-word phrases with the others.
  • B, C, and D share some text (they are part of an organized mass comment campaign).
  • E and F are the same text, submitted twice; a pared-down sketch of this linking follows.
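
A minimal sketch with three toy documents (one unique, two sharing a 10-word phrase); the documents and phrase are invented:

```r
library(dplyr)
library(tidyr)
library(tokenizers)
library(igraph)

# Toy documents: B and C share one 10-word campaign phrase; A is unique
phrase <- "we urge the agency to withdraw this harmful proposed rule"
docs <- tibble(
  doc_id = c("A", "B", "C"),
  text = c(
    "I support the rule because it protects clean air and water in my town.",
    paste("As a lifelong resident,", phrase),
    paste(phrase, "before it harms our community and local economy.")
  )
)

# One row per (document, 10-gram)
ngrams <- docs %>%
  mutate(ngram = tokenize_ngrams(text, n = 10)) %>%
  select(doc_id, ngram) %>%
  unnest(ngram)

# Link any two documents that share at least one 10-gram
edges <- ngrams %>%
  inner_join(ngrams, by = "ngram") %>%
  filter(doc_id.x < doc_id.y) %>%
  distinct(doc_id.x, doc_id.y)

# Connected components are the groups: here {A} and {B, C}
g <- graph_from_data_frame(edges, directed = FALSE,
                           vertices = data.frame(name = docs$doc_id))
components(g)$membership
```

Counting how many other comments share a 10-gram with each comment yields the 99-or-more threshold used below to flag mass comment campaigns.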

Results: Most public comments come from organized pressure campaigns

Fig. 5: Public Comments on Regulations.gov, 2005-2020

Comments that share a 10-gram with 99 or more other comments are classified as part of a mass comment campaign.


Grouping with key phrases

  1. Humans identify groups among selected documents (e.g., lobbying coalitions).
  2. Humans copy and paste key phrases.
  3. The computer assigns other documents containing those phrases to the same group (coalition), as in the sketch below.
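
A minimal sketch of step 3, with invented coalitions and phrases; fixed() matches the pasted phrases literally rather than as regular expressions:

```r
library(dplyr)
library(stringr)

# Key phrases pasted by human coders (illustrative)
key_phrases <- tribble(
  ~coalition,    ~phrase,
  "clean water", "protect our nation's streams and wetlands",
  "industry",    "unreasonable compliance costs for small businesses"
)

comments <- tibble(
  comment_id = 1:2,
  text = c("We must protect our nation's streams and wetlands for future generations.",
           "This rule imposes unreasonable compliance costs for small businesses.")
)

# Put each comment in the coalition whose key phrase it contains
grouped <- comments %>%
  rowwise() %>%
  mutate(coalition = {
    hits <- key_phrases$coalition[str_detect(text, fixed(key_phrases$phrase))]
    if (length(hits) > 0) hits[1] else NA_character_
  }) %>%
  ungroup()
```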

Preprocessing tip: Summaries speed hand-coding (e.g., use textrank to select representative sentences).
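
A minimal sketch of that tip, following the textrank package’s standard udpipe workflow; the comment text is invented:

```r
library(udpipe)
library(textrank)

comment <- paste(
  "This proposed rule will devastate family farms across the region.",
  "The economic analysis ignores costs to rural communities.",
  "We ask the agency to withdraw the rule.",
  "My family has farmed this land for three generations."
)

# Part-of-speech annotation (downloads the English model on first run)
udmodel <- udpipe_download_model(language = "english")
model   <- udpipe_load_model(udmodel$file_model)
tokens  <- as.data.frame(udpipe_annotate(model, x = comment))

# Sentences to rank, and the terms (nouns/adjectives) used to compare them
sentences   <- unique(tokens[, c("sentence_id", "sentence")])
terminology <- subset(tokens, upos %in% c("NOUN", "ADJ"),
                      select = c("sentence_id", "lemma"))

# Rank sentences by centrality; show coders the top two in original order
tr <- textrank_sentences(data = sentences, terminology = terminology)
summary(tr, n = 2, keep.sentence.order = TRUE)
```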


Results: Larger coalitions → more likely to win

Fig. 6: Lobbying Success by Campaign Size

Public pressure on climate and environmental justice substantially changed the text of policy documents (Fig. 7), but a few organizations dominate lobbying coalitions (Table 2). When tribal governments or local groups lobby without the support of national advocacy organizations, policymakers typically ignore them.

Fig. 7: Policy Text Change by Coalition Size

Next steps

  • Compare exact entity linking (regex tables) to probabilistic methods (linkit, fastLink, supervised classifiers trained on the hand-coded sample); see the sketch below
  • Compare exact grouping (e.g., by policy demands) to supervised probabilistic classifiers and clustering
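
As a minimal stand-in for the probabilistic side of that comparison (fastLink and linkit implement fuller record-linkage models), a fuzzy string-matching sketch using the stringdist package and made-up names:

```r
library(stringdist)

# Hand-coded canonical names vs. raw strings from comments (illustrative)
canonical <- c("Natural Resources Defense Council", "Sierra Club",
               "Environmental Defense Fund")
raw <- c("natural resources defence council", "Sierra Club Fdn",
         "Env. Defense Fund")

# Jaro-Winkler distances: rows are raw strings, columns are canonical names
d <- stringdistmatrix(tolower(raw), tolower(canonical), method = "jw")

# Nearest canonical name for each raw string; a distance threshold
# would separate true matches from non-matches
best <- apply(d, 1, which.min)
data.frame(raw, match = canonical[best], dist = round(apply(d, 1, min), 3))
```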