6.1 What is a classifier?
Now that we have the narratives vectorized - turned into numbers, something a computer can understand - we can teach our machine which words to associate with which kinds of crimes.
Just like a human being, the computer will go through each sentence, seeing which words are usually found in a Part I crime and which are found in a Part II crime. Instead of just remembering them, though, it will use the data we created: every row is a sentence, every column is a word, and the values 0, 1, 2, etc. are how many times the word appeared in that sentence.
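To make that table of counts concrete, here is a minimal sketch that builds one by hand using only the Python standard library. The three narratives are made up for illustration and are not from the real dataset, and the hand-rolled counting stands in for whatever vectorizer was used earlier; names like `narratives`, `vocabulary`, and `matrix` are assumptions for this example only.

```python
from collections import Counter

# Made-up example narratives (NOT from the real dataset)
narratives = [
    "suspect stole vehicle",
    "suspect vandalized vehicle and fled",
    "victim reported stolen phone and stolen wallet",
]

# Split each narrative into lowercase words
tokenized = [narrative.lower().split() for narrative in narratives]

# The vocabulary is every distinct word, sorted so the column order is stable
vocabulary = sorted({word for tokens in tokenized for word in tokens})

# One row per narrative, one column per word, each cell a count
matrix = []
for tokens in tokenized:
    counts = Counter(tokens)  # words that never appear count as 0
    matrix.append([counts[word] for word in vocabulary])

print(vocabulary)
for row in matrix:
    print(row)
```

In the third row, the column for "stolen" holds a 2 because that word appears twice in the third narrative, while most other cells are 0.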
We’ll start by using a Random Forest, which is just one among many different machine learning techniques.
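As a rough sketch of what training such a classifier might look like, the example below assumes scikit-learn is installed and uses its `CountVectorizer` and `RandomForestClassifier`. The narratives and their Part I / Part II labels are invented toy data, far too small to learn anything meaningful, and the variable names are assumptions for illustration rather than the ones used elsewhere in this chapter.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer

# Invented toy narratives and labels (NOT real data)
train_texts = [
    "suspect stole vehicle from parking lot",
    "suspect broke window and stole purse",
    "suspect vandalized wall with spray paint",
    "suspect trespassed on private property",
]
train_labels = ["Part I", "Part I", "Part II", "Part II"]

# Turn the narratives into a matrix of word counts
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_texts)

# Fit a random forest on the counts; random_state makes runs repeatable
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, train_labels)

# New narratives must be vectorized with the SAME fitted vectorizer
new_texts = ["suspect stole bicycle", "suspect spray painted fence"]
X_new = vectorizer.transform(new_texts)
predictions = [str(label) for label in clf.predict(X_new)]
print(predictions)
```

Note that new text goes through `transform`, not `fit_transform`, so its columns line up with the words the classifier saw during training.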