5.2 Interpreting our classifier
Now we have a machine that can tell us whether a complaint is suspicious or not. We taught it using a list of words to pay attention to (aka features), but how does it really work?
The classifier looks at words the same way we do: some are more important than others, and each one pushes a complaint toward being marked as suspicious or not. We can use a library called eli5 to explain which words the classifier is looking at!
import eli5
feature_names = list(training_features.drop(columns='is_suspicious').columns)
eli5.explain_weights_df(clf, feature_names=feature_names)
target | feature | weight |
---|---|---|
1 | violent | 10.3622783 |
1 | explode | 1.2691041 |
1 | air bag | 1.2681947 |
1 | airbag | 0.9455542 |
1 | <BIAS> | -2.8487556 |
1 | shrapnel | -6.5557720 |
1 | failed | -8.6491094 |
1 | did not deploy | -10.6061532 |
A positive weight means “this looks suspicious!”, while a negative weight suggests the complaint is not suspicious. The <BIAS> row is the model’s intercept, the starting score before any words are counted. We can see that “violent” is a strong suggestion that we might want to read the complaint, while “did not deploy” means that even though it’s airbag-related it probably doesn’t concern us.
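To make the weights concrete, here is a minimal sketch of how a linear classifier might combine them into a single score. It assumes the classifier is linear and that each feature is a simple 0/1 flag for whether the phrase appears, neither of which is confirmed above. The `score` function and the example complaints are hypothetical, and only the eight weights from the table are used, so the real model may behave differently.

```python
# Weights copied from the eli5 table above (only the rows shown there).
weights = {
    "violent": 10.3622783,
    "explode": 1.2691041,
    "air bag": 1.2681947,
    "airbag": 0.9455542,
    "shrapnel": -6.5557720,
    "failed": -8.6491094,
    "did not deploy": -10.6061532,
}
bias = -2.8487556  # the <BIAS> row: the score before any words are counted


def score(present_features):
    """Add the bias to the weight of every feature found in a complaint.

    Assumes each feature is a 0/1 flag (hypothetical simplification).
    """
    return bias + sum(weights[name] for name in present_features)


# Hypothetical complaint mentioning "violent" and "airbag"
violent_score = score(["violent", "airbag"])

# Hypothetical complaint mentioning "airbag" and "did not deploy"
no_deploy_score = score(["airbag", "did not deploy"])

for label, value in [("violent + airbag", violent_score),
                     ("airbag + did not deploy", no_deploy_score)]:
    verdict = "suspicious" if value > 0 else "not suspicious"
    print(f"{label}: {value:.4f} -> {verdict}")
```

Under these assumptions the first complaint ends up with a positive score and the second with a negative one, which matches the reading of the table: “violent” outweighs the negative bias, while “did not deploy” pulls an airbag complaint well below zero.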