3.3 Features (editorial choice)
Usually at this point we would think about what measurements to take for each FOIA request. These features are what we use to describe a request.
In this case we don’t have to think very hard: the work was already done for us! The FOIA Predictor CSV file came with a lot of pre-computed features, and we’re just going to use the ones that FOIA Predictor uses:
- word count
- average sentence length
- references to FOIA law
- references to fees
- inclusion of hyperlinks
- inclusion of email address
- specificity
- whether the agency fulfills more than 50% of the FOIA requests it receives
There are more features available in our dataset - readability, for example - but FOIA Predictor only uses the ones above, so we’ll copy them.
features = df[[
'successful',
'high_success_rate_agency',
'word_count', 'avg_sen_len',
'ref_foia', 'ref_fees', 'hyperlink',
'email_address', 'specificity'
]]
features.head()
successful | high_success_rate_agency | word_count | avg_sen_len | ref_foia | ref_fees | hyperlink | email_address | specificity |
---|---|---|---|---|---|---|---|---|
0 | 0 | 194 | 21.55556 | 0 | 1 | 0 | 0 | 8 |
1 | 1 | 114 | 9.50000 | 0 | 0 | 0 | 0 | 32 |
1 | 1 | 79 | 39.50000 | 0 | 0 | 1 | 0 | 9 |
1 | 0 | 44 | 14.66667 | 0 | 0 | 1 | 0 | 2 |
0 | 1 | 24 | 24.00000 | 0 | 0 | 0 | 0 | 3 |
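These columns arrive pre-computed, but it helps to have a rough idea of what sits behind them. In the rows above, `word_count` divided by `avg_sen_len` comes out to a whole number every time (194 / 21.55556 ≈ 9, 114 / 9.5 = 12), which suggests `avg_sen_len` is simply words per sentence. Below is a minimal sketch of how a few of the text features might be derived from a raw request. The function name `text_features`, the sentence-splitting rule, and both regular expressions are assumptions made for illustration, not FOIA Predictor's actual code.

```python
import re

def text_features(text):
    """Illustrative only: rough versions of a few text-based features."""
    words = text.split()
    # Assumption: a sentence ends at ., ! or ? followed by whitespace.
    # The tokenizer used to build the real dataset is not known.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return {
        "word_count": len(words),
        "avg_sen_len": len(words) / len(sentences) if sentences else 0.0,
        # 1 if the text contains something that looks like a URL
        "hyperlink": int(bool(re.search(r"https?://", text))),
        # 1 if the text contains something that looks like an email address
        "email_address": int(bool(re.search(r"[\w.+-]+@[\w-]+\.[\w.-]+", text))),
    }

sample = "I request all emails about the bridge project. Please send them to me."
print(text_features(sample))
```

The sample text has 13 words in 2 sentences, so the sketch reports an average sentence length of 6.5. Features like `ref_foia`, `ref_fees`, and `specificity` are left out because their definitions aren't given here.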
It’s easy enough to just follow in their tracks, right? The high_success_rate_agency field is going to be an interesting one, but let’s move on to our analysis for now.
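One note to keep in mind on the way: unlike the text features, `high_success_rate_agency` describes the agency rather than the request, and by its definition (more than 50% of requests fulfilled) it has to be built from request outcomes. How FOIA Predictor actually computed it isn't stated here. The sketch below shows one plausible way to derive such a flag with pandas, using made-up toy data and a hypothetical `agency` column that doesn't appear in the table above.

```python
import pandas as pd

# Hypothetical toy data: the `agency` column and all values are made up
toy = pd.DataFrame({
    "agency": ["A", "A", "A", "B", "B", "B"],
    "successful": [1, 1, 0, 0, 0, 1],
})

# Share of successful requests per agency, broadcast back onto every row
agency_rate = toy.groupby("agency")["successful"].transform("mean")

# Flag rows whose agency fulfills more than half of its requests
toy["high_success_rate_agency"] = (agency_rate > 0.5).astype(int)
print(toy)
```

In this toy data, agency A fulfills two of its three requests and gets a 1, while agency B fulfills one of three and gets a 0. If the real flag was computed in a similar way, it carries information about the very outcome we're trying to predict, which is one reason it deserves a second look later.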