3 Reading in and preparing our data

While the LA Times analyzed all crimes between October 2012 and September 2013 (and later 2005-2012), we’re going to simplify things a bit and only look at assaults in 2012.

from tabulate import tabulate
import pandas as pd
pd.set_option("display.max_columns", 20)
pd.set_option("display.max_colwidth", 200)

df = pd.read_csv("data/2012_assaults.csv")
df.shape
## (31452, 2)
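Since `df.shape` is a `(rows, columns)` tuple, it can be unpacked for a quick sanity check right after loading. Here is a minimal sketch of that check, using a tiny in-memory stand-in rather than the real file. The first two narratives are shortened from this dataset, and the missing third narrative is made up purely to show the missing-value count.

```python
import pandas as pd

# Tiny stand-in for the real CSV (illustrative only, not the actual file).
sample = pd.DataFrame({
    "CCDESC": [
        "BATTERY - SIMPLE ASSAULT",
        "INTIMATE PARTNER - SIMPLE ASSAULT",
        "BATTERY - SIMPLE ASSAULT",
    ],
    "DO_NARRATIVE": [
        "DO-S1 V1 HAVE AND ALTERCATION OVER MONEY",
        "DO-SUSP PUSHED THE VICT",
        None,  # invented missing narrative, for demonstration
    ],
})

# shape is (number of rows, number of columns)
n_rows, n_cols = sample.shape

# Count missing values per column before doing any text analysis
missing = sample.isna().sum()

print(n_rows, n_cols)
print(missing)
```

Running the same two checks on the real `df` would show whether any narratives are empty before we start working with the text.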

Overall, that's about 31 thousand cases to analyze, which isn’t too bad! The data itself isn’t too crazy, either:

df.head(2)
CCDESC | DO_NARRATIVE
ASSAULT WITH DEADLY WEAPON, AGGRAVATED ASSAULT | DO-SUSPS PULL UP NEXT TO VICT IN VEH SUSP1 SUSP2 SUSP3 EXIT VEH RUSH VICTSUSP1 PRODUCED FOLDING KNIFE AND STABBED VICT IN STOMACH SUSPS FLEE IN VEH
INTIMATE PARTNER - SIMPLE ASSAULT | DO-VICT AND SUSP HAVE 2 CHILDREN IN COMMON BOTH INV IN A VEBAL ARGUMENT SUSP BECOMES IRATE AND HITS VICT

Our dataset has a lot of columns in it, including the date and time of the crime, some classification codes, and a brief description. We’re most interested in the classification and the description, so we’ll keep only those two columns to keep things a little cleaner.

df = df[['CCDESC','DO_NARRATIVE']].copy()
df.head()
CCDESC | DO_NARRATIVE
ASSAULT WITH DEADLY WEAPON, AGGRAVATED ASSAULT | DO-SUSPS PULL UP NEXT TO VICT IN VEH SUSP1 SUSP2 SUSP3 EXIT VEH RUSH VICTSUSP1 PRODUCED FOLDING KNIFE AND STABBED VICT IN STOMACH SUSPS FLEE IN VEH
INTIMATE PARTNER - SIMPLE ASSAULT | DO-VICT AND SUSP HAVE 2 CHILDREN IN COMMON BOTH INV IN A VEBAL ARGUMENT SUSP BECOMES IRATE AND HITS VICT
INTIMATE PARTNER - SIMPLE ASSAULT | DO-SUSP PUSHED THE VICT AND SPANKED VICTIM APPROX THREE TIMES NOT CAUSING VISIBLE INJURY
BATTERY - SIMPLE ASSAULT | DO-S1 V1 HAVE AND ALTERCATION OVER MONEY S1 BECAME ANGRY WITH V1 FOR NOT GIVING HER MONEY S1 THEN GOT ON TOP OF V1 AND ATTEMP TO WRESTLE THE MONEY AWAY
BATTERY - SIMPLE ASSAULT | DO-S WAS VERBALLY CONFRONTED BY V WHO WAS ACCROSS THE STREET AFTER S DOG DEFACATED S APPROACHED V AND HIT PR HAND
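
Another way to keep only the columns we need is to select them while reading the file, using the `usecols` parameter of `pd.read_csv`. This is just a sketch, not what the code above does: it reads from an in-memory stand-in instead of `data/2012_assaults.csv`, the narratives are shortened, and the `EXTRA` column is a hypothetical placeholder for whatever other columns a file might contain.

```python
import io
import pandas as pd

# In-memory stand-in for the CSV file. EXTRA is a hypothetical unwanted column;
# the narratives are shortened versions of rows from this dataset.
sample_csv = io.StringIO(
    "EXTRA,CCDESC,DO_NARRATIVE\n"
    'x,"ASSAULT WITH DEADLY WEAPON, AGGRAVATED ASSAULT",DO-SUSPS PULL UP NEXT TO VICT IN VEH\n'
    "y,INTIMATE PARTNER - SIMPLE ASSAULT,DO-VICT AND SUSP HAVE 2 CHILDREN IN COMMON\n"
    "z,INTIMATE PARTNER - SIMPLE ASSAULT,DO-SUSP PUSHED THE VICT\n"
)

wanted = ["CCDESC", "DO_NARRATIVE"]

# usecols drops every other column at read time
subset = pd.read_csv(sample_csv, usecols=wanted)

# usecols does not control column order, so select explicitly to fix it
subset = subset[wanted]

# How many reports fall under each classification?
counts = subset["CCDESC"].value_counts()

print(subset.shape)
print(counts)
```

Selecting after loading, as we did above, gives the same result. Reading only the needed columns mostly helps when the file is large.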