7 Finding misclassified offenses

But how will we use this to find potentially misclassified cases?

First, we’ll show our model all of our case narratives, and it will predict whether each one should be Part I or Part II. More violent ones should be predicted as 1 (Part I), and less violent ones should be predicted as 0 (Part II).

X = vec.transform(df.DO_NARRATIVE)
df['prediction'] = clf.predict(X)
df.prediction.value_counts()
## 0    25579
## 1     5873
## Name: prediction, dtype: int64

Now that we have a prediction for each offense, we’ll compare those predictions to the classification that was actually reported. If a report is classified as simple assault but the classifier thinks it should be aggravated assault, it’s probably worth investigating!

df[(df.reported == 0) & (df.prediction == 1)].head()
| | CCDESC | DO_NARRATIVE | is_part_i | reported | prediction |
|---|---|---|---|---|---|
| 86 | ASSAULT WITH DEADLY WEAPON, AGGRAVATED ASSAULT | DO-VICT AND SUSP WERE INVOLVED IN AN ARGUMENT SUSP BRANDISHED A KNIFE ANDBEGAN SWINGING AT VICTS VICTS IN FEAR COULD NOT LEAVE THE APT | 1 | 0 | 1 |
| 140 | BATTERY - SIMPLE ASSAULT | DO-SUSP ATTEMPTED TO STEAL CIGARETTES SUSP WAS CONFRONTED STRUCK VICT INTHE HEAD WITH 12 PACK OF COKE | 0 | 0 | 1 |
| 179 | ASSAULT WITH DEADLY WEAPON, AGGRAVATED ASSAULT | DO-VICTS WERE PKD IN VEH VICTS HEARD SUSP YELL TO THEM SUSP THEN PRODUCED A HANDGUN FIRD 5 SHOTS AT VEH STRIKING IT 1 TIME VICTS FLED LOC IN VEH | 1 | 0 | 1 |
| 275 | ASSAULT WITH DEADLY WEAPON, AGGRAVATED ASSAULT | DO-SUSP CHASED V1 V2 SUSP THEN TC INTO V1 VEH 3 TIMES SUSP MACED V2 ON FACE CAUSING INJURIES SUSP THEN DROVE SB ON WESTERN TO UNK LOC | 1 | 0 | 1 |
| 277 | ASSAULT WITH DEADLY WEAPON, AGGRAVATED ASSAULT | DO-DURING A DISPUTE ON THE CAMPUS OF USC SUSP PRODUCED FIREARM AND FIRED SEVERAL GUNSHOTS STRIKING FOUR VICTS | 1 | 0 | 1 |
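To see every combination of reported and predicted labels at once, rather than only the flagged rows, a cross-tabulation can help. The following is a minimal sketch on a tiny made-up frame: the name `toy` and its six rows are assumptions for illustration and are not taken from the real dataset. Only the column names `reported` and `prediction` come from the text.

```python
import pandas as pd

# Made-up rows for illustration only; not taken from the real dataset.
toy = pd.DataFrame({
    "reported":   [0, 0, 0, 1, 1, 0],
    "prediction": [1, 0, 1, 1, 0, 0],
})

# Same filter as above: reported as Part II (0) but predicted as Part I (1).
flagged = toy[(toy.reported == 0) & (toy.prediction == 1)]

# Rows are the reported label, columns are the predicted label.
table = pd.crosstab(toy.reported, toy.prediction)

print(flagged)
print(table)
```

In the cross-tabulation, the cell at row 0, column 1 counts the reports worth a second look: filed as Part II but predicted as Part I.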

How many of our misclassified crimes did we catch? Let’s see what was predicted for the ones that started off as Part I and that we then downgraded to Part II.

df[(df.is_part_i == 1) & (df.reported == 0)].prediction.value_counts()
## 0    1819
## 1     125
## Name: prediction, dtype: int64

…well, that was pretty terrible. Of the 1,944 Part I crimes that we downgraded to Part II, only 125, or about 6%, were correctly identified as Part I.
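The share of downgraded cases that the classifier caught can be checked directly from the counts printed above. This is a small arithmetic sketch; the variable names are hypothetical, and the two numbers are copied from the `value_counts()` output.

```python
# Counts copied from the value_counts() output above.
predicted_part_ii = 1819  # downgraded cases the classifier left as Part II (0)
predicted_part_i = 125    # downgraded cases the classifier flagged as Part I (1)

total_downgraded = predicted_part_i + predicted_part_ii
caught_share = predicted_part_i / total_downgraded

print(f"{predicted_part_i} of {total_downgraded} caught ({caught_share:.1%})")
# prints: 125 of 1944 caught (6.4%)
```

With the real DataFrame, passing `normalize=True` to `value_counts()` should give the same proportions without the manual division.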