Human coding and computational text analysis are more powerful when combined in an iterative workflow.
Simple search and text-reuse methods, applied to public comments on all U.S. federal agency rules, turn a sample of 10,894 hand-coded comments into 41 million as-good-as-hand-coded comments, covering both the organizations that mobilized them and the extent to which policy changed in the direction they sought.
Workflow: googlesheets4 allows analyzing and improving data in real time. For example, in Fig. 1:
Fig. 1: Coded Public Comments in a Google Sheet
Entity | Pattern |
---|---|
3M Co | 3M Co|3M Health Information Systems|Ceradyne|Cogent Systems|Hybrivet Systems |
Teamsters Union | Brotherhood of Locomotive Engineers (and|&) Trainmen|Brotherhood of Maint[a-z]* of Way Employ|Teamsters |
Fig. 2: Iteratively Building Regex Tables
For example, the legislators package uses a regex table, adding variants (e.g., “AOC”) to standard legislator names to detect them in messy text.
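The regex-table idea can be sketched in a few lines. The example below is in Python for illustration only; it is not the legislators package or its API. It reuses the two rows shown in the table above, and the function names (`find_entities`, `unmatched`) and the case-insensitive matching are assumptions.

```python
import re

# Regex table copied from the two rows shown above: entity -> pattern.
REGEX_TABLE = {
    "3M Co": r"3M Co|3M Health Information Systems|Ceradyne|Cogent Systems|Hybrivet Systems",
    "Teamsters Union": (
        r"Brotherhood of Locomotive Engineers (and|&) Trainmen"
        r"|Brotherhood of Maint[a-z]* of Way Employ|Teamsters"
    ),
}

# Compile once. Case-insensitive matching is an assumption, not stated in the source.
COMPILED = {
    entity: re.compile(pattern, re.IGNORECASE)
    for entity, pattern in REGEX_TABLE.items()
}


def find_entities(text):
    """Return the sorted names of all entities whose pattern matches the text."""
    return sorted(entity for entity, rx in COMPILED.items() if rx.search(text))


def unmatched(texts):
    """Texts that no pattern matched: candidates for hand-coding and new table rows."""
    return [t for t in texts if not find_entities(t)]
```

In an iterative workflow, the texts returned by `unmatched` would be the ones a human coder reviews next, adding a row or a pattern variant to the table before re-running the search.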
Of 58 million public comments on proposed agency rules, the top 100 organizations mobilized 43,938,811. The top ten organizations mobilized 25,947,612.
Organization | Rules Lobbied On | Pressure Campaigns | Percent (Campaigns / Rules) | Comments | Average per Campaign |
---|---|---|---|---|---|
NRDC | 530 | 62 | 11.7% | 5,939,264 | 95,795 |
Sierra Club | 591 | 110 | 18.6% | 5,111,922 | 46,472 |
CREDO | 90 | 41 | 45.6% | 3,019,150 | 73,638 |
Environmental Defense Fund | 111 | 31 | 27.9% | 2,849,517 | 91,920 |
Center For Biological Diversity | 572 | 86 | 15.0% | 2,815,509 | 32,738 |
Earthjustice | 235 | 59 | 25.1% | 2,080,583 | 35,264 |
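The two derived columns follow directly from the counts: the percent is campaigns divided by rules lobbied on, and the average is comments divided by campaigns. A minimal Python sketch of that arithmetic, using the values in the table above (the function name `derived` is hypothetical):

```python
# (organization, rules lobbied on, pressure campaigns, comments) from the table above.
ROWS = [
    ("NRDC", 530, 62, 5_939_264),
    ("Sierra Club", 591, 110, 5_111_922),
    ("CREDO", 90, 41, 3_019_150),
    ("Environmental Defense Fund", 111, 31, 2_849_517),
    ("Center For Biological Diversity", 572, 86, 2_815_509),
    ("Earthjustice", 235, 59, 2_080_583),
]


def derived(rules, campaigns, comments):
    """Return (percent of rules with a campaign, average comments per campaign)."""
    percent = round(100 * campaigns / rules, 1)
    average = round(comments / campaigns)
    return percent, average


for name, rules, campaigns, comments in ROWS:
    percent, average = derived(rules, campaigns, comments)
    print(f"{name}: {percent}% of rules, {average:,} comments per campaign")
```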
Fig. 3: Iteratively Grouping Documents
Fig. 4: Identifying Groups of Linked Documents with Text Reuse (a 10-gram Window Function)
Fig. 5: Public Comments on Regulations.gov, 2005-2020
Comments that share a 10-gram with 99 or more others are part of a mass comment campaign.
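The 10-gram rule can be illustrated with a small Python sketch. This is not the authors' implementation. It assumes word-level 10-grams over lowercased text, and it reads "shares a 10-gram with 99 or more others" as: the comment has at least one 10-gram in common with each of at least 99 other comments. All function names are hypothetical.

```python
import re
from collections import defaultdict


def ngrams(text, n=10):
    """Set of word n-grams (sliding window) after lowercasing and dropping punctuation."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def reuse_counts(comments, n=10):
    """For each comment id, count the other comments that share at least one n-gram."""
    index = defaultdict(set)  # n-gram -> ids of comments containing it
    for cid, text in comments.items():
        for gram in ngrams(text, n):
            index[gram].add(cid)
    linked = {cid: set() for cid in comments}
    for ids in index.values():
        if len(ids) > 1:  # n-gram reused across comments
            for cid in ids:
                linked[cid] |= ids
    return {cid: len(ids - {cid}) for cid, ids in linked.items()}


def mass_campaign_ids(comments, n=10, min_others=99):
    """Ids of comments linked by text reuse to at least `min_others` other comments."""
    counts = reuse_counts(comments, n)
    return {cid for cid, count in counts.items() if count >= min_others}
```

The pairwise linking step grows quadratically with group size, so at the scale of millions of comments a real pipeline would likely group comments by shared n-gram rather than materialize every link.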
Preprocessing tip: Summaries speed hand-coding (e.g., use textrank to select representative sentences).
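To show what selecting representative sentences involves, here is a simplified TextRank-style sketch in Python. It is not the textrank package or its API. The sentence splitter, the Jaccard word-overlap similarity, and the parameter defaults are all assumptions chosen to keep the example short.

```python
import re


def split_sentences(text):
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def summarize(text, top_n=2, damping=0.85, iterations=50):
    """Return the top_n highest-ranked sentences, in their original order."""
    sentences = split_sentences(text)
    n = len(sentences)
    if n == 0:
        return []
    tokens = [set(re.findall(r"[a-z']+", s.lower())) for s in sentences]

    # Edge weights: Jaccard overlap between the word sets of two sentences.
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            union = tokens[i] | tokens[j]
            if i != j and union:
                weights[i][j] = len(tokens[i] & tokens[j]) / len(union)
    out_weight = [sum(row) for row in weights]

    # Weighted PageRank by power iteration over the sentence graph.
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(
                weights[j][i] * scores[j] / out_weight[j]
                for j in range(n)
                if out_weight[j] > 0
            )
            for i in range(n)
        ]

    best = sorted(range(n), key=lambda i: -scores[i])[:top_n]
    return [sentences[i] for i in sorted(best)]
```

A coder could then read `summarize(comment_text)` first and open the full comment only when the summary is ambiguous.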
Fig. 6: Lobbying Success by Campaign Size