3.2 Success metrics (editorial choice)

Let’s read this data in! Since we’re using the same dataset that FOIA Predictor uses, it comes with a lot of extras. Along with the request itself, it includes some calculated features, like average sentence length and a readability score.

import pandas as pd

# Show up to 20 columns and up to 100 characters per cell when displaying DataFrames
pd.set_option("display.max_columns", 20)
pd.set_option("display.max_colwidth", 100)

# Load only the columns we need: request metadata, the request text,
# and the features FOIA Predictor calculated from it
columns = ['trackingID', 'title', 'agency', 'date_submitted',
          'closed_date', 'url', 'status', 'char_count', 'word_count', 'ref_data',
          'sen_count', 'avg_sen_len', 'closed_datetime', 'ref_foia',
          'ref_fees', 'phone_number', 'hyperlink', 'email_address', 'ref_date',
          'readability', 'specificity', 'high_success_rate_agency', 'request']
df = pd.read_csv("data/recent-requests-data-for-model.csv", usecols=columns)
df.head(3)
| trackingID | title | agency | date_submitted | closed_date | url | status | request | char_count | word_count | sen_count | avg_sen_len | closed_datetime | ref_foia | ref_fees | phone_number | hyperlink | email_address | ref_date | readability | ref_data | specificity | high_success_rate_agency |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 24540 | 02/29/16 - SLCPD Abdi Mohamed Protest Action Plan and Debrief Docs | 4223 | 2016-03-18 | 2016-04-11 | /foi/salt-lake-city-359/022916-slcpd-abdi-mohamed-protest-action-plan-and-debrief-docs-24540/ | no_docs | - Action Plan(s) for policing demonstrations, protests and other events on February 29, 2016]. Please include all draft and final versions along with any associated metadata. - After Action Reports and/or written debriefs of demonstrations, protests and other events on the day. Please include all draft and final versions along with any associated metadata.. Please include in your search, any and all documents, correspondence within and outside the department, emails, training presentations, PowerPoint slides, and Microsoft Word documents, which discuss the demonstrations and or protests through the date of this request. The requested documents will be made available to the general public free of charge as part of the public information service at MuckRock.com, and is not being made for commercial usage. In the event that fees cannot be waived, I would be grateful if you would inform me of the total charges in advance of fulfilling my request. I would prefer the request filled electronically, by e-mail attachment if available or CD-ROM if not. Thank you in advance for your anticipated cooperation in this matter. I look forward to receiving your response to this request within 5 business days, as the statute requires. | 1008 | 194 | 9 | 21.55556 | 2016-04-11 | 0 | 1 | 0 | 0 | 0 | 1 | 13.820355 | 1 | 8 | 0 |
| 34051 | 1026812 documents | 503 | 2017-02-26 | 2017-03-20 | /foi/chicago-169/1026812-documents-34051/ | done | The following documents from the IAD investigation under the CR number 1026812, identified by their attachment number and description: 6. Synoptic report of Sgt. Richard Downs 19. Handwritten statement of Lt. John Brundage 24-30. Interviews with Cmdr. Leo Schmitz, Lt. John Brundage, Sgt. Patrick Quinn, P.O. Brenda Gomez-Sanchez, P.P.O Milton Kinnison, and Sgt. Sean Ronan 54-55. Results of email account searches of Cmdr. Leo Schmitz and Sgt. Sean Ronan 56-59. All in-car camera footage 78. Cmdr. Leo Schmitz’s Blackberry log 80. Cmdr. Leo Schmitz’s response to OCIC report 81. OCIC report retrieved from Sgt. Sean Ronan’s email 87. Cmdr. Leo Schmitz’s disciplinary history 89. Lt. John Brundage’s disciplinary history | 568 | 114 | 12 | 9.50000 | 2017-03-20 | 0 | 0 | 1 | 0 | 0 | 0 | 6.787368 | 0 | 32 | 1 |
| 31682 | 1033 MOU and annual (2015/16) inventory form (Illinois Dept of CMS) | 4074 | 2017-01-05 | 2017-01-20 | /foi/illinois-168/1033-mou-and-annual-201516-inventory-form-illinois-dept-of-cms-31682/ | done | -The current memorandum of agreement (MOA) or memorandum of understanding (MOU) with the Defense Logistics Agency, Disposition Services regarding the 1033 equipment surplus program administered by the DLA Law Enforcement Support Office -The annual inventory form for years 2015 and 2016 required to be completed by the state coordinator of the 1033 program According to the DLA FAQ page regarding the 1033 program, CMS is the agency in charge of coordinating the 1033 program for Illinois. See http://www.dispositionservices.dla.mil/leso/Pages/StateCoordinatorList.aspx | 473 | 79 | 2 | 39.50000 | 2017-01-20 | 0 | 0 | 0 | 1 | 0 | 1 | 20.000000 | 0 | 9 | 1 |
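Some of those extras look like simple derivations from the request text. In the three rows above, for example, avg_sen_len matches word_count divided by sen_count (194 / 9, 114 / 12 and 79 / 2). The sketch below checks that using only numbers copied from those rows. It's an observation about this sample, not a description of how FOIA Predictor actually computes the column.

```python
import pandas as pd

# Counts copied from the three rows shown above (not loaded from the CSV)
sample = pd.DataFrame({
    "trackingID": [24540, 34051, 31682],
    "word_count": [194, 114, 79],
    "sen_count": [9, 12, 2],
    "avg_sen_len": [21.55556, 9.50000, 39.50000],
})

# Recompute average sentence length as words per sentence
recomputed = sample.word_count / sample.sen_count

# Compare with the stored column, allowing for rounding in the displayed values
matches = (recomputed - sample.avg_sen_len).abs() < 1e-4
print(matches.all())
```

To run the same check on the whole dataset, use df in place of sample.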

When you read in the dataset of FOIA requests, you immediately need to make some editorial decisions. The thing we’re looking for, whether a request was fulfilled or denied, isn’t recorded directly in the dataset.

What our dataset has instead is a status column. The statuses look like this:

df.status.value_counts()
## done         2749
## no_docs      1879
## processed    1491
## ack           945
## rejected      739
## fix           455
## abandoned     304
## payment       247
## appealing     176
## partial        82
## submitted      27
## Name: status, dtype: int64

We see rejected in there, and done effectively means fulfilled, but there are a lot of other statuses too. Some mean the agency says the documents don’t exist (no_docs), some mean the agency isn’t responding to emails, some mean the requester isn’t responding to emails, and others aren’t totally clear.

The question is: what counts as fulfilled, and do we use all of these requests? Let’s look at some of our options.

  • Option One: Remove everything except requests marked either done or rejected. Requests marked done count as fulfilled, and requests marked rejected count as not fulfilled.
  • Option Two: Keep everything. Requests marked done count as fulfilled, and every other status counts as denied.
  • Option Three: done counts as fulfilled and rejected counts as not fulfilled. “Documents don’t exist” (no_docs) also counts as denied, because maybe you were just too vague, or too specific. “Abandoned” counts as denied too, because again, maybe your original request was too vague or too specific.
  • Options Four through One Hundred: You have a lot of options here: picking which statuses matter, what counts as fulfilled, and what counts as denied. There might not be a right answer; maybe you and someone else have different ideas about what counts as a fulfilled request.
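To see how much the choice matters, here’s a minimal sketch that labels requests under Options One, Two and Three. It rebuilds a stand-in for df.status from the counts above instead of reading the CSV, and the helper label() is hypothetical. It also assumes that under Option Three, statuses other than done, rejected, no_docs and abandoned are dropped, since that option doesn’t say what happens to them.

```python
import pandas as pd

# Status counts copied from the value_counts() output above
counts = pd.Series({
    "done": 2749, "no_docs": 1879, "processed": 1491, "ack": 945,
    "rejected": 739, "fix": 455, "abandoned": 304, "payment": 247,
    "appealing": 176, "partial": 82, "submitted": 27,
})
# Expand back into one row per request (a stand-in for df.status)
status = pd.Series(counts.index.repeat(counts.values), name="status")


def label(status, negative=None):
    """Hypothetical helper: 1 for "done", 0 for denied statuses.

    If negative is None, every non-"done" status counts as 0 (Option Two).
    Otherwise, rows that are neither "done" nor in `negative` are dropped.
    """
    successful = (status == "done").astype(int)
    if negative is None:
        return successful
    keep = (status == "done") | status.isin(negative)
    return successful[keep]


options = {
    "one": label(status, negative=["rejected"]),
    "two": label(status),
    # Assumption: statuses not named in Option Three are dropped
    "three": label(status, negative=["rejected", "no_docs", "abandoned"]),
}
for name, y in options.items():
    # Number of requests kept, number of successes, share of successes
    print(name, len(y), y.sum(), round(y.mean(), 3))
```

With these counts, Option One keeps 3,488 requests (about 79% successful), Option Three keeps 5,671 (about 48%), and Option Two keeps all 9,094 (about 30%). Same data, very different pictures of how often requests succeed.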

The FOIA Predictor uses Option Two, which casts the narrowest net for fulfilled and the widest net for denied. Because machine learning loves numbers, we’re going to count a success as 1 and a denial as 0.

# 1 if the request's status is "done", 0 for every other status (Option Two)
df['successful'] = (df.status == "done").astype(int)
df.successful.value_counts()
## 0    6345
## 1    2749
## Name: successful, dtype: int64

More denials than successes (roughly 30% of requests succeeded), but I’m actually impressed at how many successes we have!
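If you’d rather see shares than raw counts, value_counts(normalize=True) returns proportions; on the real data that’s df.successful.value_counts(normalize=True). This sketch rebuilds a stand-in Series from the counts above so it runs without the CSV.

```python
import pandas as pd

# Stand-in for df.successful: 6345 zeros (not "done") and 2749 ones ("done")
successful = pd.Series([0] * 6345 + [1] * 2749, name="successful")

# normalize=True divides each count by the total, giving proportions
shares = successful.value_counts(normalize=True)
print(shares.round(3))
```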