6.1 What is a classifier?

Now that we have the narratives vectorized - turned into numbers, something a computer can understand - we can teach our machine which words to associate with which kinds of crimes.

Just like a human being, the computer will go through each sentence, seeing which words are usually found in a Part I crime and which are found in a Part II crime. Instead of just remembering them, though, it will use the data we created: every row is a sentence, every column is a word, and the values 0, 1, 2, etc. are how many times the word appeared in that sentence.

We’ll start with a random forest, which is just one of many machine learning techniques.

from sklearn.ensemble import RandomForestClassifier
# Build a random forest made up of 100 decision trees
clf = RandomForestClassifier(n_estimators=100)
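Creating the classifier doesn't teach it anything yet. As a rough sketch of what comes next, training means handing it the word counts along with the correct answer for each sentence, after which it can guess the category of a sentence it has never seen. The narratives, the labels, and the random_state value below are all made up for illustration, and with so little data the prediction itself shouldn't be taken seriously.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical narratives and labels, invented for this sketch
narratives = [
    "suspect stole the car",
    "suspect robbed the store",
    "suspect was drunk in public",
    "suspect was loitering in the park",
]
labels = ["Part I", "Part I", "Part II", "Part II"]

# Turn the sentences into word counts
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(narratives)

# random_state is an assumption added here so the sketch is repeatable
clf = RandomForestClassifier(n_estimators=100, random_state=0)
# Show the classifier the counts together with the right answers
clf.fit(X, labels)

# Vectorize a new sentence with the same vocabulary, then predict
new_X = vectorizer.transform(["suspect stole a bike"])
prediction = clf.predict(new_X)
print(prediction)
```

Note that the new sentence goes through transform, not fit_transform, so it is counted against the words the vectorizer already knows.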