1 Introduction
2 Introduction
2.1 The original story
2.2 More stuff here?
3 Reading in and preparing our data
4 Defining our terms
4.1 Manually downgrading Part I offenses
5 Teaching computers to read
5.0.1 Data cleaning
5.1 Tokenizing
5.2 Vectorizing
6 Classification
6.1 What is a classifier?
6.2 Training our classifier
6.3 Making predictions with our model
7 Finding misclassified offenses
7.1 Trying other classifiers
7.1.1 Trying: LogisticRegression
7.1.2 Trying: LinearSVC
7.2 Sparse vs dense data
8 Improving our classifier
8.1 Predictions vs probabilities
8.2 Improving our vectorizer
9 Shortcomings
10 Something else