This repository contains code to merge, augment, and analyze data on congressional correspondence with the federal bureaucracy.

Publications using these data

“How Shifting Priorities and Capacity Affect Policy Work and Constituency Service: Evidence from a Census of Legislator Requests to U.S. Federal Agencies”
- Documentation and replication code https://github.com/judgelord/corr
- Replication data on Harvard Dataverse: https://doi.org/10.7910/DVN/LWOCWO
“Legislator Advocacy on Behalf of Constituents and Corporate Donors”
- Documentation and replication code https://github.com/judgelord/ferc

Data

Correspondence data come from FOIA requests, FOIA reading rooms, and web scraping disclosed correspondence. Some data include the full text of letters, but most are in the form of correspondence logs maintained by agencies, which may include phone, email, letterhead contacts (#92). Some letters are signed by more than one member, so each member-level observation is given a unique data_id, as well as a LetterID that is unique to each letter or phone call. Agency id’s are preserved in ID. Otherwise, this is the row number of the datasheet. See metadata documentation.

(See AJPS Dataverse linked above)

Member data from https://www.voteview.com/ via the legislators package.
- chamber and party from J. B. Lewis et al. (2022) via voteview.com (also available on dataverse)
Committee membership data come from https://github.com/judgelord/committees, which includes Charles Stewart III and Jonathan Woon, Congressional Committee Assignments, 103rd to 114th Congresses, 1993–2017, http://web.mit.edu/17.251/www/data_page.html, with corrections (originally discussed in #12, now in https://github.com/judgelord/committees/issues) and then merged with historical committee membership data from the version history of @unitedstates-project committee membership data.
State Population is from the U.S. Census
- state population from U.S. Census Bureau (2019)
Oversight committee jurisdiction data come from Lewis and Selin, crosswalked with committee data above in https://github.com/judgelord/committees

Software

The FOIA data are cleaned using scripts in the repo and linked to other data via ICPSR numbers using the legislators R package: https://judgelord.github.io/legislators/ (There will soon be an agencies package to link agency names to datasets.)

Want to help?

Here are some tasks that anyone can do:

Find letters that Members of Congress write to agencies (e.g., letters they post on their website) and email them to CorrespondenceResearch@gmail.com. We will check to see if they are in our data and add them.

For collaborators

Data are stored in Google Sheets in the project’s Google Drive in the “datasheets” folder
Some still need to be extracted from PDFs [#77]
Data extracted from PDFs but not yet uploaded to Google Drive should have an open issue named “add AGENCY data to drive.”
Memes should be posted to #158

All datasheets must have these columns:

FROM is the column with the name(s) of the Member(s) of Congress that signed the letter. If names are in multiple columns, a new FROM column will be created in the script to clean that data.
DATE is the date of the letter (or the best approximation).
SUBJECT is a summary of the letter’s content. If more than one column contains substantive information, these are added to SUBJECT in the script cleaning the data.

Most datasheets have additional columns, such as the letter’s text, priority level, date of reply, or the person in the agency tasked with responding to the letter. Because such information is not consistent across agencies, these are dropped when sheets are merged. They can be added back in for a more detailed analysis of specific departments or agencies. For example, see the more detailed analysis of FERC.

Other columns required for applying the codebook are added by the function in prep sheets.R.

Cleaning

Sheets that need cleaning should have an open issue named “clean script for AGENCY” (e.g., “clean script EPA”)
When the clean script is done, remember to add it to data_list.R
If additional work is needed, there may be an issue called “debug AGENCY” (e.g., “debug EPA”)

If extractMemberName() fails to match:

Inspect the pattern variable. There are two main causes of failing to match:
Missing permutations of names in the members data that comes with the legislators package. Please add to or open an issue on the legislators-data repo
Typos - If it is a common typo that we are likely to see in other data (not just from that agency), please add to or open an issue on the legislators-data repo - If you suspect it is an uncommon typo or unique to that agency, fix it with find and replace in the Google Sheet or with a regex in the clean script

There will eventually be a process for users to submit additional permutations and typos to the legislators package data. Until then, use github issues.

If the pattern exists, but extractMemberName() fails to find it, this may be a new or existing bug. please add to or open an issue on the legislators repo

Where there is insufficient information to identify a letter’s date or author, the NOTES column should include “FOIA,” and commits tagging observations to FOIA should reference #76

Coding

Codebook

Data that are ready for coding should have an open issue named “apply codebook to AGENCY.”

Interesting letters/anecdotes should be tagged with [#172]

Where there is insufficient information to code a letter, the NOTES column should include “FOIA” and #76 should be tagged in the “apply codebook” issue.

Validating

Validation issues should begin with “validate.”

correspondence