3.3 Features (editorial choice)

Usually at this point we would think about what measurements to take for each FOIA request. These features are what we use to describe a request.

In this case we don’t have to think very hard: the work was already done for us! The FOIA Predictor CSV file came with a lot of pre-computed features, and we’re just going to use the ones that FOIA Predictor uses:

  • word count
  • average sentence length
  • references to FOIA law
  • references to fees
  • inclusion of hyperlinks
  • inclusion of email address
  • specificity
  • whether the agency fulfills more than 50% of the FOIA requests it receives

There are more features available in our dataset (readability, for example), but FOIA Predictor only uses the ones above, so we’ll copy them.

features = df[[
  'successful', 
  'high_success_rate_agency', 
  'word_count', 'avg_sen_len', 
  'ref_foia', 'ref_fees', 'hyperlink', 
  'email_address', 'specificity'
]]
features.head()
successful  high_success_rate_agency  word_count  avg_sen_len  ref_foia  ref_fees  hyperlink  email_address  specificity
         0                         0         194     21.55556         0         1          0              0            8
         1                         1         114      9.50000         0         0          0              0           32
         1                         1          79     39.50000         0         0          1              0            9
         1                         0          44     14.66667         0         0          1              0            2
         0                         1          24     24.00000         0         0          0              0            3
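
One thing worth noticing in the selection above: successful sits alongside the features, but it is presumably the outcome we want to predict rather than a measurement of the request. Here is a minimal sketch of separating the two. It uses a small stand-in DataFrame built from the five rows printed above, and the names X and y are our own convention, not something taken from FOIA Predictor.

```python
import pandas as pd

# Stand-in for the real `features` DataFrame: just the five rows printed above
features = pd.DataFrame({
    'successful': [0, 1, 1, 1, 0],
    'high_success_rate_agency': [0, 1, 1, 0, 1],
    'word_count': [194, 114, 79, 44, 24],
    'avg_sen_len': [21.55556, 9.5, 39.5, 14.66667, 24.0],
    'ref_foia': [0, 0, 0, 0, 0],
    'ref_fees': [1, 0, 0, 0, 0],
    'hyperlink': [0, 0, 1, 1, 0],
    'email_address': [0, 0, 0, 0, 0],
    'specificity': [8, 32, 9, 2, 3],
})

# Assumption: `successful` is the label and every other column is an input
X = features.drop(columns='successful')
y = features['successful']

print(X.shape, y.shape)
```

Keeping the label in its own variable makes it harder to accidentally hand the answer to a model as one of its inputs.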

It’s easy enough to just follow in their tracks, right? The high_success_rate_agency field is going to be an interesting one, but let’s move on to our analysis for now.
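
As a brief aside on that field: unlike the other features, high_success_rate_agency describes the agency, not the text of the request. We don't know how FOIA Predictor actually computed it, but a flag like this could plausibly be derived from each agency's share of successful requests. The sketch below is a guess at that logic using invented data; the agency column and the rows in requests are hypothetical and do not come from the dataset.

```python
import pandas as pd

# Hypothetical data: the `agency` column and these rows are invented for illustration
requests = pd.DataFrame({
    'agency': ['A', 'A', 'A', 'B', 'B', 'C'],
    'successful': [1, 1, 0, 0, 1, 0],
})

# Share of successful requests per agency, repeated on every row of that agency
agency_rate = requests.groupby('agency')['successful'].transform('mean')

# 1 if the agency fulfills more than half of its requests, otherwise 0
requests['high_success_rate_agency'] = (agency_rate > 0.5).astype(int)

print(requests)
```

If a flag like this were computed from the same requests whose outcomes we are trying to predict, it would carry some information about the answer, which is one reason to treat it with care.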