5.2 Interpreting our classifier

Now we have a machine that can tell us whether a complaint is suspicious or not. We taught it using a list of words to pay attention to (aka features), but how does it really work?

The classifier looks at words the same way we do: some matter more than others, and each one pushes a complaint toward being marked suspicious or not. We can use a library called eli5 to explain which words the classifier is looking at!

import eli5

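# Every column except the is_suspicious label is a feature
feature_names = list(training_features.drop(columns='is_suspicious').columns)
# Show the weight the classifier learned for each feature, as a dataframe
eli5.explain_weights_df(clf, feature_names=feature_names)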
target  feature           weight
1       violent           10.3622783
1       explode            1.2691041
1       air bag            1.2681947
1       airbag             0.9455542
1       <BIAS>            -2.8487556
1       shrapnel          -6.5557720
1       failed            -8.6491094
1       did not deploy   -10.6061532

A positive weight means “this looks suspicious!”, while a negative weight suggests the complaint is not suspicious. We can see that “violent” is a strong signal that we might want to read the complaint, while “did not deploy” means that even though it’s airbag-related it probably doesn’t concern us. The <BIAS> row is not a word: it is the classifier’s starting score before any feature is counted.
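
To make the weights more concrete, here is a rough sketch of how a linear classifier could combine them into a single score for one complaint. It is an illustration, not the code behind clf: it assumes each feature is a simple yes/no "does this phrase appear?" check, and the helper names score and looks_suspicious are made up for this example. The weights are copied from the table above.

```python
# Weights copied from the eli5 table; <BIAS> is kept separately as the starting score.
weights = {
    "violent": 10.3622783,
    "explode": 1.2691041,
    "air bag": 1.2681947,
    "airbag": 0.9455542,
    "shrapnel": -6.5557720,
    "failed": -8.6491094,
    "did not deploy": -10.6061532,
}
bias = -2.8487556


def score(complaint):
    """Start from the bias, then add the weight of every phrase found in the text.

    Assumption: a feature is "on" when the phrase appears anywhere in the
    lowercased complaint. The real feature extraction may differ.
    """
    text = complaint.lower()
    return bias + sum(w for phrase, w in weights.items() if phrase in text)


def looks_suspicious(complaint):
    """A score above zero leans suspicious; below zero leans not suspicious."""
    return score(complaint) > 0


example_a = "The airbag exploded in a violent manner"
example_b = "The air bag did not deploy"

print(example_a, "->", round(score(example_a), 4), looks_suspicious(example_a))
print(example_b, "->", round(score(example_b), 4), looks_suspicious(example_b))
```

The first complaint picks up “violent”, “explode” and “airbag”, which easily outweigh the negative bias. The second picks up “air bag” but is dragged far below zero by “did not deploy”, matching the reading of the table above.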