Analyzing hundreds of thousands of letters, emails, and phone calls between legislators and federal agencies.
This repository contains code to merge, augment, and analyze data on congressional correspondence with the federal bureaucracy.
“Legislator Advocacy on Behalf of Constituents and Corporate Donors”
data_id, as well as a LetterID that is unique to each letter or phone call.
Agency IDs are preserved in ID. Otherwise, this is the row number of the datasheet. See the metadata documentation (see the AJPS Dataverse linked above).
Member data from https://www.voteview.com/ via the legislators package.
Committee membership data come from https://github.com/judgelord/committees, which includes Charles Stewart III and Jonathan Woon, Congressional Committee Assignments, 103rd to 114th Congresses, 1993–2017, http://web.mit.edu/17.251/www/data_page.html, with corrections (originally discussed in #12, now in https://github.com/judgelord/committees/issues) and then merged with historical committee membership data from the version history of @unitedstates-project committee membership data.
State population data are from the U.S. Census.
Oversight committee jurisdiction data come from Lewis and Selin, crosswalked with committee data above in https://github.com/judgelord/committees
The FOIA data are cleaned using scripts in the repo and linked to other data via ICPSR numbers using the legislators R package: https://judgelord.github.io/legislators/
(There will soon be an agencies package to link agency names to datasets.)
Here are some tasks that anyone can do:

All datasheets must have these columns:
- FROM is the column with the name(s) of the Member(s) of Congress that signed the letter. If names are in multiple columns, a new FROM column will be created in the script to clean that data.
- DATE is the date of the letter (or the best approximation).
- SUBJECT is a summary of the letter’s content. If more than one column contains substantive information, these are added to SUBJECT in the script cleaning the data.

Most datasheets have additional columns, such as the letter’s text, priority level, date of reply, or the person in the agency tasked with responding to the letter. Because such information is not consistent across agencies, these are dropped when sheets are merged. They can be added back in for a more detailed analysis of specific departments or agencies. For example, see the more detailed analysis of FERC.
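The following is a minimal sketch of how sheets with differing extra columns could be reduced to the required columns before merging. It is an illustration under stated assumptions, not the repository's actual merge code: the helper `keep_required()`, the `agency` column, and the toy data are all hypothetical.

```r
# Required columns named in this README
required_cols <- c("FROM", "DATE", "SUBJECT")

# Hypothetical helper (not part of this repository): check that a sheet has
# the required columns, drop everything else, and record the agency name.
keep_required <- function(sheet, agency) {
  missing_cols <- setdiff(required_cols, names(sheet))
  if (length(missing_cols) > 0) {
    stop(agency, " is missing required column(s): ",
         paste(missing_cols, collapse = ", "))
  }
  out <- sheet[, required_cols, drop = FALSE]
  out$agency <- agency
  out
}

# Toy sheets for illustration only; each has a different extra column
sheet_a <- data.frame(FROM = "Sen. Example", DATE = "2010-01-05",
                      SUBJECT = "Grant inquiry", PRIORITY = "High",
                      stringsAsFactors = FALSE)
sheet_b <- data.frame(FROM = "Rep. Sample", DATE = "2011-03-12",
                      SUBJECT = "Constituent casework",
                      REPLY_DATE = "2011-04-01",
                      stringsAsFactors = FALSE)

# Stack the sheets; the inconsistent extra columns are dropped
merged <- rbind(keep_required(sheet_a, "AgencyA"),
                keep_required(sheet_b, "AgencyB"))
```

Failing early on a missing required column makes it clear which sheet still needs cleaning before it can be merged.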
Other columns required for applying the codebook are added by the function in prep sheets.R.
data_list.R
If extractMemberName() fails to match:
pattern variable. There are two main causes of failing to match:
members data that comes with the legislators package. Please add to or open an issue on the legislators-data repo.
- If you suspect it is an uncommon typo or unique to that agency, fix it with find and replace in the Google Sheet or with a regex in the clean script.

There will eventually be a process for users to submit additional permutations and typos to the legislators package data. Until then, use GitHub issues.
If the pattern exists but extractMemberName() fails to find it, this may be a new or existing bug. Please add to or open an issue on the legislators repo.
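For the agency-specific typo case, a regex fix in a clean script might look like the sketch below. It uses only base R; the misspelling "Senater" and the names are made-up placeholders, and the vector `from_raw` stands in for a sheet's FROM column.

```r
# Toy FROM values; "Senater" is an assumed, agency-specific typo
from_raw <- c("Senater Jane Doe", "Sen. John Roe", "senater Jane Doe")

# Replace the typo wherever it appears as a whole word, ignoring case
from_clean <- gsub("\\bSenater\\b", "Senator", from_raw,
                   ignore.case = TRUE, perl = TRUE)
```

Anchoring the pattern with word boundaries keeps the replacement from touching longer strings that merely contain the typo.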
Where there is insufficient information to identify a letter’s date or author, the NOTES column should include “FOIA,” and commits tagging observations to FOIA should reference #76.
Data that are ready for coding should have an open issue named “apply codebook to AGENCY.”
Where there is insufficient information to code a letter, the NOTES column should include “FOIA” and #76 should be tagged in the “apply codebook” issue.
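As a rough illustration of the NOTES convention, the sketch below adds “FOIA” to rows that lack an author or a date. The helper `flag_foia()` and the toy data are hypothetical, and treating an empty or missing FROM or DATE as "insufficient information" is an assumption; in practice this judgment may need to be made by hand.

```r
# Hypothetical helper (not part of this repository): append "FOIA" to NOTES
# for rows with no usable author (FROM) or date (DATE).
flag_foia <- function(d) {
  d$NOTES[is.na(d$NOTES)] <- ""
  insufficient <- is.na(d$FROM) | d$FROM == "" |
    is.na(d$DATE) | d$DATE == ""
  already_flagged <- grepl("FOIA", d$NOTES, fixed = TRUE)
  add <- insufficient & !already_flagged
  d$NOTES[add] <- trimws(paste(d$NOTES[add], "FOIA"))
  d
}

# Toy data for illustration only
letters_df <- data.frame(
  FROM  = c("Rep. Sample", NA, "Sen. Example"),
  DATE  = c("", "2012-06-01", "2013-02-14"),
  NOTES = c("", "illegible signature", ""),
  stringsAsFactors = FALSE
)

flagged <- flag_foia(letters_df)
```

Existing notes are preserved, and rows already marked “FOIA” are not flagged twice.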

