Taking a closer look at our crime classifier's shortcomings#
In our last project, we built a classifier to identify serious assaults that had been downgraded to simple assaults. In this notebook we'll look more closely at how our algorithm operates and dig into what this technique might miss.
Imports and setup#
First we'll set up a few options to make everything display correctly. It's mostly because these assault descriptions can be quite long, and by default pandas truncates text after a few words.
import pandas as pd
import numpy as np
from sklearn.svm import LinearSVC
pd.set_option('display.max_colwidth', 200)
pd.set_option('display.max_columns', 100)
pd.set_option('display.max_rows', 300)
%matplotlib inline
Repeat our analysis#
First we'll repeat the majority of our processing and analysis from the first notebook, then we'll get into the critique.
Read in our data#
Our dataset is a database of crime reports between 2008 and 2012. It starts off with two columns:
- CCDESC, the criminal code that was violated
- DO_NARRATIVE, a short text description of what happened
We're going to use this description to see if we can separate serious assaults from non-serious ones.
We won't be covering the process of vectorizing our dataset and creating our classifier in this notebook. Instead, we're going to focus on analyzing the possible shortcomings of our analysis, both conceptually and technically.
# Read in our dataset
df = pd.read_csv("data/2008-2012.csv")
# Only use reports classified as types of assault
df = df[df.CCDESC.str.contains("ASSAULT")].copy()
# Classify as serious or non-serious
df['serious'] = df.CCDESC.str.contains("AGGRAVATED") | df.CCDESC.str.contains("DEADLY")
df['serious'] = df['serious'].astype(int)
# Downgrade 15% from aggravated to simple assault
serious_subset = df[df.serious == 1].sample(frac=0.15)
df['downgraded'] = 0
df.loc[serious_subset.index, 'downgraded'] = 1
df.loc[serious_subset.index, 'serious'] = 0
# Examine the first few
df.head()
Vectorize#
%%time
from nltk.stem import SnowballStemmer
from sklearn.feature_extraction.text import TfidfVectorizer
stemmer = SnowballStemmer('english')
class StemmedTfidfVectorizer(TfidfVectorizer):
    def build_analyzer(self):
        analyzer = super(StemmedTfidfVectorizer, self).build_analyzer()
        return lambda doc: (stemmer.stem(word) for word in analyzer(doc))
vectorizer = StemmedTfidfVectorizer(min_df=20, max_df=0.5)
X = vectorizer.fit_transform(df.DO_NARRATIVE)
words_df = pd.DataFrame(X.toarray(), columns=vectorizer.get_feature_names())
words_df.head(5)
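If you're curious what the stemming analyzer actually does to one of these narratives, you can run it by hand. A quick sketch, using an invented example sentence:
# Tokenize and stem a made-up narrative the same way the vectorizer does
analyzer = vectorizer.build_analyzer()
list(analyzer("DO-SUSP STRUCK VICTS WITH BASEBALL BATS AND KNIVES"))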
Classify#
%%time
X = words_df
y = df.serious
clf = LinearSVC()
clf.fit(X, y)
import eli5
eli5.show_weights(clf, vec=vectorizer, top=(20,20), horizontal_layout=True)
We can throw it into graph form, too.
eli5.explain_weights_df(
    clf,
    vec=vectorizer,
    top=(20, 20)
).plot(
    x='feature',
    y='weight',
    kind='barh',
    figsize=(10, 10)
)
Lots of interesting stuff in there!
- Does it sound reasonable which terms imply aggravated vs simple assault?
- Which ones are misspellings? Does that worry you?
- Are there any terms in there you don't quite understand?
I personally don't understand ppa, so I'm going to look it up in the dataset.
df[df.DO_NARRATIVE.str.contains("PPA")].head()
PPA stands for Private Person's Arrest; you can read more about it and even see the form that gets filled out. PPAs are often performed by private security guards.
plastic also seems to be popular under non-serious assaults.
df[df.DO_NARRATIVE.str.contains("PLASTIC")].head()
Guess they aren't too dangerous?
Making predictions to find downgraded crimes#
To see if our algorithm can find downgraded reports, we'll first ask it to make predictions on each of the descriptions we have. If a report is listed as not serious, but the algorithm thinks it should be serious, we should examine the report further.
# Feed the classifier the word counts (X) to have it make the prediction
df['prediction'] = clf.predict(X)
# Let's also see how certain the classifier is
df['prediction_dist'] = clf.decision_function(X)
df.head()
Crimes with a 1 in serious are serious, and ones with a 1 in downgraded were downgraded. If either of those columns is 1, then prediction would also be 1 for a correct prediction.
We also added how certain the classifier is about each prediction. Different classifiers measure this differently, but for a LinearSVC it's the .decision_function method. The closer the score is to 0, the less sure the algorithm is.
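If you'd like to see which reports the classifier is least sure about, you can sort by the absolute value of that score. A minimal sketch:
# Reports closest to the decision boundary are the ones the classifier is least sure about
df.loc[df.prediction_dist.abs().sort_values().index].head()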
Let's evaluate our classifier#
When you build a classifier, you'll talk about your evaluation metric - the measure you use to judge how well your algorithm performed. Typically this is accuracy: how often was your prediction correct?
How often did our prediction match whether a crime was listed as serious?#
(df.prediction == df.serious).value_counts(normalize=True)
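If you'd rather not compute this by hand, scikit-learn gives you the same number. A quick sketch using accuracy_score:
from sklearn.metrics import accuracy_score
# Fraction of reports where the prediction matches the (possibly downgraded) serious label
accuracy_score(df.serious, df.prediction)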
88% doesn't seem that bad!
Remember, though, 15% of the serious crimes have been downgraded, so for those reports the serious column is misleading. What we actually need to check is whether we correctly predicted the reports that were either marked as serious or downgraded.
How often did we match the true serious/not serious value?#
Since we're interested in uncovering the secretly-serious reports, we'll compare the prediction to whether a report was serious or downgraded.
(df.prediction == (df.serious | df.downgraded)).value_counts(normalize=True)
We actually did better when including the secrets! 89%!
While this seems good, it isn't what we're actually after. We're specifically doing research on finding downgraded reports, so what we're interested in is how often we found reports marked as non-serious that were downgraded from serious.
How often did we catch downgrades?#
# Only select downgraded reports
downgraded_df = df[df.downgraded == 1]
# How often did we predict they were serious?
(downgraded_df.prediction == 1).value_counts(normalize=True)
# And again, without the percentage
(downgraded_df.prediction == 1).value_counts()
We were able to find around 4,500 of our 7,000 downgraded offenses. That's about 65% of them.
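That percentage is effectively the recall on the downgraded reports, and you can get it in one line:
# Share of downgraded reports that the classifier flagged as serious
(downgraded_df.prediction == 1).mean()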
Whether this is good or bad is up for discussion at the end, but first let's examine the cases we got wrong. We'll start with crimes involving a machete - a weapon you'd expect to make an assault serious.
df[df.DO_NARRATIVE.str.contains("MACHETE")].head(10)
For those first ten, it looks like we predicted all of them accurately. But what about the machete-related crimes that our algorithm is very certain are non-serious assaults?
df[df.DO_NARRATIVE.str.contains("MACHETE")].sort_values(by='prediction_dist').head(10)
Some of them involve a machete being used as a threat, instead of it actually striking the person. Others involve actual violence with the machete, and seem like they should be classified as serious.
Reading through the descriptions carefully, you'll notice that the last description has a little typo in it - MACHETET instead of MACHETE. If we correct the typo, will the classifier predict it as serious?
sentence = "DO-SUSP GRABBED MACHETET AND SWANG IT AT 3 VICT MISSING VICT S STRUCK V1 ON EYE LEFT WITH CLOSED FIST"
sample_X = vectorizer.transform([sentence])
clf.predict(sample_X)
sentence = "DO-SUSP GRABBED MACHETE AND SWANG IT AT 3 VICT MISSING VICT S STRUCK V1 ON EYE LEFT WITH CLOSED FIST"
sample_X = vectorizer.transform([sentence])
clf.predict(sample_X)
Yes! The vectorizer we used to count words doesn't know that MACHETET is supposed to be MACHETE, so it treated it as a completely different word. It's only because we're aware of spelling errors that we were able to correct it to MACHETE.
While this isn't the case for all of the misclassified reports - for example, the first three would still be incorrectly predicted as non-serious - a small typo can definitely derail our classifier.
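One way to catch this sort of problem is to check which spellings actually made it into the vectorizer's vocabulary. A quick sketch - remember that the terms are stemmed and lowercased, and anything appearing in fewer than 20 reports was dropped by min_df:
# Vocabulary terms that look like "machete" - rare misspellings won't show up at all
[term for term in vectorizer.get_feature_names() if 'machet' in term]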
Examining more general misses#
Let's look at a handful of the reports we predicted incorrectly. We'll sort by using the distance measurement to find the ones that the classifier was very sure were non-serious.
predicted_correctly = (df.prediction == (df.serious | df.downgraded))
df[~predicted_correctly].sort_values(by='prediction_dist').head(15)
While the most important words for the classifier were weapons - baseball bats, knives, guns - causing visible injury through punching can also be classified as aggravated assault. It looks like the classifier doesn't agree.
punch_df = df[df.DO_NARRATIVE.str.contains("PUNCH")]
print("Predicted")
print(punch_df.prediction.value_counts(normalize=True))
print("Actual")
print((punch_df.serious | punch_df.downgraded).value_counts(normalize=True))
While the LAPD classifies assault as aggravated 13% of the time if punching is involved, the algorithm only classifies it as aggravated 5% of the time.
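If you want to read the reports behind that gap, a quick filter (just a sketch) pulls out the punch-related reports that were actually serious or downgraded but that the classifier called non-serious:
# Punch reports that are truly serious (or were downgraded) but predicted as non-serious
missed_punches = punch_df[((punch_df.serious | punch_df.downgraded) == 1) & (punch_df.prediction == 0)]
missed_punches.head()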
Now we can look at the opposite - cases where our predictor was certain something was a serious assault, but it was filed as a simple assault.
df[~predicted_correctly].sort_values(by='prediction_dist', ascending=False).head(n=20)
Again, mostly reports of domestic violence. This time they often involve stabbing and knives, words that we've already seen are strong triggers for a report to be marked as aggravated.
knife_df = df[df.DO_NARRATIVE.str.contains("KNIF")]
print("Predicted")
print(knife_df.prediction.value_counts(normalize=True))
print("Actual")
print((knife_df.serious | knife_df.downgraded).value_counts(normalize=True))
Oddly, though, it looks like the classifier tends to under-report stabbings, not over-report. Although 86% of reports involving the word "STAB" are marked as aggravated, the classifier only marks them as serious 81% of the time.
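A quick cross-tab of the true label against the prediction for these knife reports (just a sketch) shows where the missing cases end up:
# Rows are the true serious/downgraded label, columns are the classifier's prediction
pd.crosstab(knife_df.serious | knife_df.downgraded, knife_df.prediction)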
It appears that domestic abuse cases might be an especially problematic topic in terms of classification, with more going into the situation than just a clear-cut definition of assault categories.
Review#
We reproduced an ersatz version of a Los Angeles Times piece where they uncovered serious assaults that had been downgraded by the LAPD to simple assault. We don't have access to the original classifications, so we used a dataset of assaults between 2008 and 2012 and downgraded a random 15% of the serious assaults.
Using text analysis, we first analyzed the words used in a description of assault - less common words were given more weight, and incredibly common words were left out altogether. Using these results, we then created a classifier, teaching the classifier which words were associated with simple assault compared to aggravated assault.
Finally, we used the classifier to predict whether each assault was aggravated or simple assault. If a crime was predicted as serious but marked as non-serious, it needed to be examined as a possible downgrade. Our algorithm correctly pointed out around 65% of the randomly downgraded crimes.
Discussion topics#
- Our algorithm had 88% accuracy overall, but only 65% in detecting downgraded crimes. What's the difference here? How important is one score compared to the other?
- We only hit around 65% accuracy in finding downgraded crimes. Is this a useful score? How does it compare to random guessing, or going one-by-one through the crimes marked as non-serious?
- What techniques could we have used to find downgraded crimes if we didn't use machine learning?
- Is there a difference between looking at the prediction - the 0 or 1 - and looking at the output of decision_function?
- What happens if our algorithm errs on the side of calling non-serious crimes serious crimes? What if it errs on the side of calling serious crimes non-serious crimes?
- If we want to find more downgraded cases (but do more work), we'll want to err on the side of examining more potentially-serious cases. Is there a better method than picking random cases? (One option is sketched in the code below this list.)
- One of our first steps was to eliminate all crimes that weren't assaults. How do you think this helped or hindered our analysis?
- Why did we use LinearSVC instead of another classifier such as LogisticRegression, RandomForest or Naive Bayes (MultinomialNB)? Why might we try or not try those?
- You don't work for the LAPD, so you can only be so sure what should and shouldn't be a serious crime. What can you do to help feel confident that a case should be one or the other, or that our algorithm is working as promised?
- In this case, we randomly picked serious crimes to downgrade. Would it be easier or more difficult if the LAPD was systematically downgrading certain types of serious crimes? Can you think of a way to get around that sort of trickery?
- Many people say you need to release your data and analysis in order to have people trust what you've done. With something like this dataset, however, you're dealing with real things that happened to real people, many of whom would probably prefer to keep these things private. Is that a reasonable expectation? If it is, what can be done to bridge the gap between releasing all of the original data and keeping our process secret?
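As a starting point for the question about doing better than picking random cases: one option is to rank the reports the classifier called non-serious by how close they sat to the decision boundary, and review the most borderline ones first. A rough sketch, with an arbitrary review budget chosen purely for illustration:
# Reports predicted non-serious, ordered from most borderline (closest to the decision boundary) on down
borderline = df[df.prediction == 0].sort_values(by='prediction_dist', ascending=False)
# An arbitrary review budget of 500 reports, purely for illustration
borderline[['CCDESC', 'DO_NARRATIVE', 'prediction_dist']].head(500)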