4.1 Designing our model

Now that our data is prepared, we need to build the thing that’s going to make the predictions for us, the thing we ask: “excuse me, but is this FOIA request going to succeed?” In technical terms this “thing” is called a model, because it models the relationship between features of a request (sentence length, etc.) and whether that request is accepted or denied.

4.1.1 Classification problems

Any time you’re trying to predict a yes/no answer or a category, you’re looking at a classification problem. With a classification problem you have examples of both categories, and you train the model on those examples so it learns to tell the two groups apart.

In this case, we have two classes of documents: successful and unsuccessful, labeled 1 and 0. We also have a set of documents that we already know belong to one class or the other. We’ll use these known documents to make predictions about unknown documents (the FOIA request you submit to the Predictor).
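
To make the idea of learning from known documents concrete, here is a minimal sketch of a classifier in plain Python. It is not necessarily the model the Predictor uses: it is a simple nearest-centroid classifier, chosen only because it fits in a few lines. The function names (train, predict), the two features (average sentence length and number of sentences), and every number in the known list are invented for illustration and are not real FOIA data.

```python
# Toy illustration only: the feature values and labels below are invented,
# not taken from real FOIA requests.

def train(examples):
    """Compute the mean feature vector (centroid) for each class label."""
    sums, counts = {}, {}
    for features, label in examples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
            counts[label] = 0
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def predict(centroids, features):
    """Return the label whose centroid is closest to the given features."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: squared_distance(centroids[label], features))


# Known documents: [average sentence length, number of sentences] and a label,
# where 1 means successful and 0 means unsuccessful (hypothetical values).
known = [
    ([12.0, 5.0], 1),
    ([14.0, 6.0], 1),
    ([30.0, 20.0], 0),
    ([28.0, 18.0], 0),
]

model = train(known)

# Unknown documents: the model guesses which class each one belongs to.
print(predict(model, [13.0, 4.0]))
print(predict(model, [29.0, 22.0]))
```

The shape is the same no matter which kind of model you end up using: first show it labeled examples, then ask it about a document it hasn’t seen.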