3.2 Success metrics (editorial choice)

Let’s read this data in! Since we’re using the same dataset that FOIA Predictor uses, it comes with a lot of extras. Along with the text of the request itself, it includes pre-calculated features like average sentence length and a readability score.

import pandas as pd

# Display up to 20 columns, and up to 100 characters per cell
pd.set_option("display.max_columns", 20)
pd.set_option("display.max_colwidth", 100)

# Only load the columns we need from the FOIA Predictor dataset
columns = ['trackingID', 'title', 'agency', 'date_submitted',
          'closed_date', 'url', 'status', 'char_count', 'word_count', 'ref_data',
          'sen_count', 'avg_sen_len', 'closed_datetime', 'ref_foia',
          'ref_fees', 'phone_number', 'hyperlink', 'email_address', 'ref_date',
          'readability', 'specificity', 'high_success_rate_agency', 'request']
df = pd.read_csv("data/recent-requests-data-for-model.csv", usecols=columns)
df.head(3)

trackingID	title	agency	date_submitted	closed_date	url	status	request	char_count	word_count	sen_count	avg_sen_len	closed_datetime	ref_fees	phone_number	hyperlink	ref_date	readability	ref_data	specificity	high_success_rate_agency
24540	“02/29/16 - SLCPD Abdi Mohamed Protest Action Plan and Debrief Docs”	4223	2016-03-18	2016-04-11	/foi/salt-lake-city-359/022916-slcpd-abdi-mohamed-protest-action-plan-and-debrief-docs-24540/	no_docs	“- Action Plan(s) for policing demonstrations, protests and other events on February 29, 2016]. Please include all draft and final versions along with any associated metadata. - After Action Reports a...	1008	194	9	21.55556	2016-04-11	1	0	0	1	13.820355	1	8	0
34051	“1026812 documents”	503	2017-02-26	2017-03-20	/foi/chicago-169/1026812-documents-34051/	done	“The following documents from the IAD investigation under the CR number 1026812, identified by their attachment number and description: 6. Synoptic report of Sgt. Richard Downs 19. Handwritten stateme...	568	114	12	9.50000	2017-03-20	0	1	0	0	6.787368	0	32	1
31682	“1033 MOU and annual (2015/16) inventory form (Illinois Dept of CMS)”	4074	2017-01-05	2017-01-20	/foi/illinois-168/1033-mou-and-annual-201516-inventory-form-illinois-dept-of-cms-31682/	done	"-The current memorandum of agreement (MOA) or memorandum of understanding (MOU) with the Defense Logistics Agency, Disposition Services regarding the 1033 equipment surplus program administered by th...	473	79	2	39.50000	2017-01-20	0	0	1	1	20.000000	0	9	1
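Judging from these three rows, `avg_sen_len` appears to be `word_count` divided by `sen_count`. That's an inference from the sample, not something the dataset documents here, so the sketch below just checks it against the three rows shown above.

```python
import pandas as pd

# The three rows from df.head(3) above, keeping only the columns we need
sample = pd.DataFrame({
    "trackingID": [24540, 34051, 31682],
    "word_count": [194, 114, 79],
    "sen_count": [9, 12, 2],
    "avg_sen_len": [21.55556, 9.50000, 39.50000],
})

# Assumption: average sentence length = words per sentence
sample["avg_sen_len_check"] = sample.word_count / sample.sen_count
print(sample[["trackingID", "avg_sen_len", "avg_sen_len_check"]])
```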

When you read in the dataset of FOIA requests, you immediately need to make some editorial decisions. The thing we’re looking for, whether a request was fulfilled or denied, isn’t directly in the dataset.

What our dataset has instead is a status column. The statuses look like this:

df.status.value_counts()

## done         2749
## no_docs      1879
## processed    1491
## ack           945
## rejected      739
## fix           455
## abandoned     304
## payment       247
## appealing     176
## partial        82
## submitted      27
## Name: status, dtype: int64
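Raw counts are easier to compare as shares of the whole. On the full DataFrame, `df.status.value_counts(normalize=True)` gives these directly. The sketch below rebuilds the counts from the output above so it runs on its own.

```python
import pandas as pd

# Status counts copied from df.status.value_counts() above
status_counts = pd.Series({
    "done": 2749, "no_docs": 1879, "processed": 1491, "ack": 945,
    "rejected": 739, "fix": 455, "abandoned": 304, "payment": 247,
    "appealing": 176, "partial": 82, "submitted": 27,
})

# Fraction of all requests in each status
status_share = status_counts / status_counts.sum()
print(status_share.round(3))
```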

We see rejected in there, and done means fulfilled, but we also see a lot of other statuses. There are requests where the agency says the documents don’t exist, where the agency isn’t responding to emails, where the requester isn’t responding to emails, and others whose meaning isn’t totally clear.

The question is what counts as fulfilled, and whether we should use all of these requests at all. Let’s look at some of our options.

Option One: Remove every request except those marked as either done or rejected. Done counts as fulfilled, rejected counts as not fulfilled.
Option Two: Keep everything. Done counts as fulfilled, everything that isn’t done counts as denied.
Option Three: Done counts as fulfilled, rejected counts as not fulfilled. “Documents don’t exist” (no_docs) also counts as denied, because maybe you were too vague, or too specific. “Abandoned” counts as denied, because again, maybe you were too vague or too specific in your original request.
Options Four through One Hundred: You have a lot of options here: picking which statuses matter, picking what counts as fulfilled, picking what counts as denied. There might not be a right answer, and you and someone else might have different ideas about what counts as a fulfilled request.
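To see how much these choices matter, here is a minimal sketch that labels the same requests under Options One, Two and Three. It rebuilds a status column from the counts printed earlier, and `label` is a hypothetical helper written for this illustration. Option Three as described doesn’t say what happens to statuses like processed or ack, so this sketch assumes they are dropped.

```python
import pandas as pd

# Status counts copied from df.status.value_counts() above
counts = {
    "done": 2749, "no_docs": 1879, "processed": 1491, "ack": 945,
    "rejected": 739, "fix": 455, "abandoned": 304, "payment": 247,
    "appealing": 176, "partial": 82, "submitted": 27,
}
# One row per request, like df.status
status = pd.Series([s for s, n in counts.items() for _ in range(n)])

def label(status, negative=None):
    """Hypothetical helper: 1 for "done", 0 for every other kept request.

    negative=None keeps every request (Option Two). Otherwise only "done"
    plus the listed statuses are kept, and all other statuses are dropped.
    """
    if negative is None:
        kept = status
    else:
        kept = status[status.isin(["done", *negative])]
    return (kept == "done").astype(int)

option_one = label(status, negative=["rejected"])
option_two = label(status)
# Assumption: statuses Option Three doesn't mention are dropped
option_three = label(status, negative=["rejected", "no_docs", "abandoned"])

for name, y in [("One", option_one), ("Two", option_two), ("Three", option_three)]:
    print(f"Option {name}: {len(y)} requests, success rate {y.mean():.3f}")
```

From the very same requests, Option Two gives a success rate of about 30% while Option One gives about 79%, which is why this is an editorial choice rather than a purely technical one.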

The FOIA Predictor uses Option Two, which casts the narrowest net for fulfilled and the widest net for denied. Because machine learning loves to work with numbers, we’re going to use 1 to mean success and 0 to mean denied.

df['successful'] = (df.status == "done").astype(int)
df.successful.value_counts()

## 0    6345
## 1    2749
## Name: successful, dtype: int64

More than twice as many denials as successful requests, but I’m actually impressed at how many successes we have!
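Because `successful` is a 0/1 column, its mean is the share of requests that succeeded. On the real DataFrame that’s just `df.successful.mean()`; the sketch below rebuilds the column from the counts above so it runs on its own.

```python
import pandas as pd

# Rebuild the 0/1 column from the counts above: 2749 ones, 6345 zeros
successful = pd.Series([1] * 2749 + [0] * 6345, name="successful")

# The mean of a 0/1 column is the fraction of 1s
success_rate = successful.mean()
print(f"{success_rate:.1%} of requests succeeded")
```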