6.3 Making predictions with our model
Now that our model has read through almost 40,000 narratives, it should have a decent idea of what separates an aggravated assault from a simple assault. We can test how good it is with a couple of fake sentences:
# we already learned the words above, so we just use .transform to count them
sample_X = vec.transform([
"S SHOT AND STABBED V WITH A GUN AND THE GUN HAD A KNIFE ON IT",
"S PUNCHED THEIR NEIGHBOR"
])
clf.predict(sample_X)
## array([1, 0])
For the first one our model saw words associated with Part I crimes, so it predicted 1, an aggravated assault. For the second one it saw less serious words, so it predicted 0, a simple assault. Seems reasonable to me!
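If you want to try the transform-then-predict step on its own, here is a minimal, self-contained sketch. It is not the model from this chapter: the six training narratives are made up, every `toy_` name is hypothetical, and `CountVectorizer` plus `LogisticRegression` are assumptions standing in for whatever `vec` and `clf` were built from earlier. It also asks for a probability alongside the 0/1 label, which gives a rough sense of how sure the classifier is.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Made-up narratives standing in for the real training data (assumption)
toy_texts = [
    "S SHOT V WITH A GUN",
    "S STABBED V WITH A KNIFE",
    "S SHOT AT V WITH A HANDGUN",
    "S PUNCHED V IN THE FACE",
    "S SLAPPED V",
    "S PUSHED THEIR NEIGHBOR",
]
toy_labels = [1, 1, 1, 0, 0, 0]  # 1 = aggravated assault, 0 = simple assault

# .fit_transform learns the vocabulary AND counts the words
toy_vec = CountVectorizer()
toy_X = toy_vec.fit_transform(toy_texts)

# Assumed classifier; the chapter's clf may be a different model
toy_clf = LogisticRegression()
toy_clf.fit(toy_X, toy_labels)

new_texts = [
    "S SHOT AND STABBED V WITH A GUN AND THE GUN HAD A KNIFE ON IT",
    "S PUNCHED THEIR NEIGHBOR",
]

# .transform only counts words the vectorizer already knows;
# unseen words such as "HAD" or "IT" are silently ignored
new_X = toy_vec.transform(new_texts)

toy_pred = toy_clf.predict(new_X)
# Column 1 holds the estimated probability of class 1
toy_proba = toy_clf.predict_proba(new_X)[:, 1]

for text, label, p in zip(new_texts, toy_pred, toy_proba):
    print(label, round(p, 2), text)
```

With only six training sentences the probabilities mean very little, but the pattern is the same as above: words the model has tied to one class push the prediction toward that class, and words it has never seen contribute nothing.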