5.2 Interpreting our classifier

Now we have a machine that can tell us whether a complaint is suspicious or not. We taught it using a list of words to pay attention to (aka features), but how does it really work?

The classifier looks at words the same way we do - some are more important than others, and each one pushes a complaint toward being marked as suspicious or not. We can use a library called eli5 to explain which words the classifier is looking at!

import eli5

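# Every column except the label is a feature the classifier was trained on
feature_names = list(training_features.drop(columns='is_suspicious').columns)
# Show the weight the classifier learned for each feature, as a dataframe
eli5.explain_weights_df(clf, feature_names=feature_names)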

target	feature	weight
1	violent	10.3622783
1	explode	1.2691041
1	air bag	1.2681947
1	airbag	0.9455542
1	<BIAS>	-2.8487556
1	shrapnel	-6.5557720
1	failed	-8.6491094
1	did not deploy	-10.6061532

A positive number means “this looks suspicious!”, while a negative number suggests the complaint is not suspicious. The further a weight is from zero, the harder that word pushes in its direction. We can see that “violent” is a strong suggestion that we might want to read the complaint, while “did not deploy” means that even though it’s airbag-related it probably doesn’t concern us.
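
To see how these weights turn into a decision, here is a minimal sketch that adds them up by hand. It assumes the classifier is a linear model (the <BIAS> row suggests so, but the section does not say), that each feature is 1 when the phrase appears and 0 otherwise, and that a logistic function converts the total into a probability. The two example complaints and the helper names score and probability are made up for illustration.

```python
import math

# Weights copied from the table above; <BIAS> is the model's starting point
weights = {
    "violent": 10.3622783,
    "explode": 1.2691041,
    "air bag": 1.2681947,
    "airbag": 0.9455542,
    "shrapnel": -6.5557720,
    "failed": -8.6491094,
    "did not deploy": -10.6061532,
}
bias = -2.8487556


def score(features):
    """Bias plus the weight of every feature, scaled by its value."""
    return bias + sum(weights[name] * value for name, value in features.items())


def probability(features):
    """Squash the score into the 0-1 range (assumes a logistic model)."""
    return 1 / (1 + math.exp(-score(features)))


# Hypothetical complaints: 1 means the phrase appears in the text
violent_complaint = {"violent": 1, "airbag": 1}
no_deploy_complaint = {"airbag": 1, "did not deploy": 1}

for name, complaint in [("violent", violent_complaint),
                        ("did not deploy", no_deploy_complaint)]:
    print(name, round(score(complaint), 2), round(probability(complaint), 4))
```

Under these assumptions, a score above zero leans toward “suspicious” and a score below zero leans away from it, so the first complaint would be flagged and the second would not. The real classifier may scale or count features differently, so treat the numbers as an illustration of the arithmetic rather than its actual output.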