3.2 Success metrics (editorial choice)
Let’s read this data in! Since we’re using the same dataset that FOIA Predictor uses, it comes with a lot of extras. Along with the text of the request itself, it includes calculated features like average sentence length and a readability score.
```python
import pandas as pd

# Show up to 20 columns and up to 100 characters per cell
pd.set_option("display.max_columns", 20)
pd.set_option("display.max_colwidth", 100)

# Load just these columns from the FOIA Predictor dataset
columns = ['trackingID', 'title', 'agency', 'date_submitted',
           'closed_date', 'url', 'status', 'char_count', 'word_count', 'ref_data',
           'sen_count', 'avg_sen_len', 'closed_datetime', 'ref_foia',
           'ref_fees', 'phone_number', 'hyperlink', 'email_address', 'ref_date',
           'readability', 'specificity', 'high_success_rate_agency', 'request']
df = pd.read_csv("data/recent-requests-data-for-model.csv", usecols=columns)
df.head(3)
```
trackingID | title | agency | date_submitted | closed_date | url | status | request | char_count | word_count | sen_count | avg_sen_len | closed_datetime | ref_foia | ref_fees | phone_number | hyperlink | email_address | ref_date | readability | ref_data | specificity | high_success_rate_agency |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
24540 | “02/29/16 - SLCPD Abdi Mohamed Protest Action Plan and Debrief Docs” | 4223 | 2016-03-18 | 2016-04-11 | /foi/salt-lake-city-359/022916-slcpd-abdi-mohamed-protest-action-plan-and-debrief-docs-24540/ | no_docs | “- Action Plan(s) for policing demonstrations, protests and other events on February 29, 2016]. Please include all draft and final versions along with any associated metadata. - After Action Reports a... | 1008 | 194 | 9 | 21.55556 | 2016-04-11 | 0 | 1 | 0 | 0 | 0 | 1 | 13.820355 | 1 | 8 | 0 |
34051 | “1026812 documents” | 503 | 2017-02-26 | 2017-03-20 | /foi/chicago-169/1026812-documents-34051/ | done | “The following documents from the IAD investigation under the CR number 1026812, identified by their attachment number and description: 6. Synoptic report of Sgt. Richard Downs 19. Handwritten stateme... | 568 | 114 | 12 | 9.50000 | 2017-03-20 | 0 | 0 | 1 | 0 | 0 | 0 | 6.787368 | 0 | 32 | 1 |
31682 | “1033 MOU and annual (2015/16) inventory form (Illinois Dept of CMS)” | 4074 | 2017-01-05 | 2017-01-20 | /foi/illinois-168/1033-mou-and-annual-201516-inventory-form-illinois-dept-of-cms-31682/ | done | "-The current memorandum of agreement (MOA) or memorandum of understanding (MOU) with the Defense Logistics Agency, Disposition Services regarding the 1033 equipment surplus program administered by th... | 473 | 79 | 2 | 39.50000 | 2017-01-20 | 0 | 0 | 0 | 1 | 0 | 1 | 20.000000 | 0 | 9 | 1 |
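Some of the extra columns are derived from others. In the three rows above, `avg_sen_len` matches `word_count` divided by `sen_count`. Here's a quick sanity check with those three rows typed in by hand. It's a sketch based only on the rows shown, so it assumes (rather than confirms) that FOIA Predictor computes the column this way for every request.

```python
import pandas as pd

# The three rows shown above, typed in by hand (only the columns we need)
sample = pd.DataFrame({
    "trackingID": [24540, 34051, 31682],
    "word_count": [194, 114, 79],
    "sen_count": [9, 12, 2],
    "avg_sen_len": [21.55556, 9.50000, 39.50000],
})

# Recompute average sentence length as words per sentence
recomputed = sample["word_count"] / sample["sen_count"]

# Compare with a small tolerance, since the table shows rounded values
print((recomputed - sample["avg_sen_len"]).abs().max() < 1e-4)  # True
```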
When you read in the dataset of FOIA requests, you immediately need to make some editorial decisions. The thing we’re looking for, whether a request was fulfilled or denied, isn’t actually in the dataset.
What our dataset has instead is a `status` column. The statuses look like this:
```
## done 2749
## no_docs 1879
## processed 1491
## ack 945
## rejected 739
## fix 455
## abandoned 304
## payment 247
## appealing 176
## partial 82
## submitted 27
## Name: status, dtype: int64
```
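Raw counts are easier to judge as shares of the whole. On the real DataFrame, `df['status'].value_counts(normalize=True)` returns the fractions directly. The sketch below copies the counts from the output above so the arithmetic runs on its own. Out of 9,094 requests, `done` works out to about 30%.

```python
import pandas as pd

# Status counts copied from the output above
status_counts = pd.Series({
    "done": 2749, "no_docs": 1879, "processed": 1491, "ack": 945,
    "rejected": 739, "fix": 455, "abandoned": 304, "payment": 247,
    "appealing": 176, "partial": 82, "submitted": 27,
})

# Each status as a percentage of all requests, rounded to one decimal
shares = (status_counts / status_counts.sum() * 100).round(1)
print(status_counts.sum())  # 9094
print(shares)
```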
We see `rejected` in there, and `done` technically means fulfilled, but we also see a lot of other things. There are requests where the agency says the documents don’t exist (`no_docs`), where the agency isn’t responding to emails, where the requester isn’t responding to emails, and others whose meaning might not be totally clear.
The question is: what counts as fulfilled, and do we use all of these requests? Let’s look at some of our options.
- Option One: Remove everything except requests marked as either `done` or `rejected`. `done` counts as fulfilled, `rejected` counts as not fulfilled.
- Option Two: Keep everything. `done` counts as fulfilled, and everything that isn’t `done` counts as denied.
- Option Three: `done` counts as fulfilled and `rejected` counts as not fulfilled. “Documents don’t exist” (`no_docs`) also counts as denied, because maybe you were just too vague, or too specific. `abandoned` counts as denied too, because again, maybe your original request was too vague or too specific.
- Options Four through One Hundred: You have a lot of options here: picking which statuses matter, what counts as fulfilled, and what counts as denied. There might not be a right answer; maybe you and someone else have different ideas about what counts as a fulfilled request.
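Each option is really just a different rule for turning a status into a label. Here's a sketch using a hypothetical `label_requests` helper and a tiny made-up DataFrame. It treats the `done` status (the "accepted" requests) as the only success under every option, and, as an assumption, drops any status that Option Three doesn't mention, the same way Option One does.

```python
import pandas as pd

# Hypothetical helper: add a 0/1 `successful` column based on `status`.
# If keep_statuses is given, rows with any other status are dropped first.
def label_requests(df, keep_statuses=None):
    if keep_statuses is not None:
        df = df[df["status"].isin(keep_statuses)]
    df = df.copy()
    # Only `done` counts as a success under all three options
    df["successful"] = (df["status"] == "done").astype(int)
    return df

# Which statuses each option keeps (None means keep everything)
OPTIONS = {
    "Option One": ["done", "rejected"],
    "Option Two": None,
    # ASSUMPTION: statuses Option Three doesn't mention are dropped
    "Option Three": ["done", "rejected", "no_docs", "abandoned"],
}

# A tiny made-up DataFrame standing in for the real data
toy = pd.DataFrame({"status": ["done", "rejected", "no_docs", "ack", "abandoned", "done"]})

for name, keep in OPTIONS.items():
    labeled = label_requests(toy, keep)
    print(f"{name}: {len(labeled)} requests, {labeled['successful'].sum()} successful")
# Option One: 3 requests, 2 successful
# Option Two: 6 requests, 2 successful
# Option Three: 5 requests, 2 successful
```

Swapping in a different list of statuses is all it takes to try another editorial choice, which makes it easy to see how much each choice shifts the balance between successes and denials.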
The FOIA Predictor uses Option Two, which casts the narrowest net for fulfilled requests and the widest net for denied ones. Because machine learning loves to work with numbers, we’re now going to count `1` as a success and `0` as a denial.
```
## 0 6345
## 1 2749
## Name: successful, dtype: int64
```
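These counts follow directly from the status breakdown: the `1`s are exactly the 2,749 `done` requests, and the `0`s are the sum of every other status. The notebook line that builds the `successful` column isn't shown here, so the comparison below is an assumption about how it could be written. The sketch rebuilds a stand-in DataFrame (one row per request, not the real data) from the status counts and applies the Option Two rule.

```python
import pandas as pd

# Status counts copied from the value_counts() output earlier in this section
status_counts = {
    "done": 2749, "no_docs": 1879, "processed": 1491, "ack": 945,
    "rejected": 739, "fix": 455, "abandoned": 304, "payment": 247,
    "appealing": 176, "partial": 82, "submitted": 27,
}

# Stand-in DataFrame with one row per request (NOT the real data)
statuses = pd.Series(list(status_counts)).repeat(list(status_counts.values()))
df = pd.DataFrame({"status": statuses.reset_index(drop=True)})

# Option Two: `done` -> 1 (success), every other status -> 0 (denied)
df["successful"] = (df["status"] == "done").astype(int)
print(df["successful"].value_counts())  # 0 -> 6345, 1 -> 2749
```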
More denials than successful requests (6,345 vs. 2,749, so roughly 30% of requests succeed), but I’m actually impressed at how many successes we have!