3 Reading in and preparing our data
While the LA Times analyzed all crimes between October 2012 and September 2013 (and later 2005-2012), we’re going to simplify things a bit and only look at assaults in 2012.
from tabulate import tabulate
import pandas as pd
pd.set_option("display.max_columns", 20)
pd.set_option("display.max_colwidth", 200)
df = pd.read_csv("data/2012_assaults.csv")
df.shape
## (31452, 2)
Overall that’s about 31 thousand cases to analyze, which isn’t too bad! The data itself isn’t too crazy, either:
CCDESC | DO_NARRATIVE |
---|---|
ASSAULT WITH DEADLY WEAPON, AGGRAVATED ASSAULT | DO-SUSPS PULL UP NEXT TO VICT IN VEH SUSP1 SUSP2 SUSP3 EXIT VEH RUSH VICTSUSP1 PRODUCED FOLDING KNIFE AND STABBED VICT IN STOMACH SUSPS FLEE IN VEH |
INTIMATE PARTNER - SIMPLE ASSAULT | DO-VICT AND SUSP HAVE 2 CHILDREN IN COMMON BOTH INV IN A VEBAL ARGUMENT SUSP BECOMES IRATE AND HITS VICT |
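As a quick sanity check after loading, it can help to confirm the shape and see how the rows split across crime categories. The sketch below is only an illustration: it reads a tiny in-memory CSV instead of data/2012_assaults.csv, and the narratives in it are shortened, invented stand-ins rather than real rows.

```python
import io
import pandas as pd

# Tiny in-memory stand-in for data/2012_assaults.csv.
# The narratives here are invented for illustration, not taken from the file.
csv_text = """CCDESC,DO_NARRATIVE
"ASSAULT WITH DEADLY WEAPON, AGGRAVATED ASSAULT",DO-SUSP STABBED VICT
INTIMATE PARTNER - SIMPLE ASSAULT,DO-SUSP HITS VICT
BATTERY - SIMPLE ASSAULT,DO-S HIT V
BATTERY - SIMPLE ASSAULT,DO-S PUSHED V
"""

sample = pd.read_csv(io.StringIO(csv_text))

# shape is (number of rows, number of columns)
rows, cols = sample.shape
print(rows, cols)

# How many reports fall under each crime description, most common first
counts = sample["CCDESC"].value_counts()
print(counts)
```

Running the same two checks on the real df would show whether a handful of categories dominate the data.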
Our dataset has a lot of columns in it, including the date and time of the crime, some classification codes, as well as a brief description. It’s the crime classification (CCDESC) and the description (DO_NARRATIVE) that we’ll be most interested in, so we’ll remove the columns we don’t need to keep things a little cleaner.
CCDESC | DO_NARRATIVE |
---|---|
ASSAULT WITH DEADLY WEAPON, AGGRAVATED ASSAULT | DO-SUSPS PULL UP NEXT TO VICT IN VEH SUSP1 SUSP2 SUSP3 EXIT VEH RUSH VICTSUSP1 PRODUCED FOLDING KNIFE AND STABBED VICT IN STOMACH SUSPS FLEE IN VEH |
INTIMATE PARTNER - SIMPLE ASSAULT | DO-VICT AND SUSP HAVE 2 CHILDREN IN COMMON BOTH INV IN A VEBAL ARGUMENT SUSP BECOMES IRATE AND HITS VICT |
INTIMATE PARTNER - SIMPLE ASSAULT | DO-SUSP PUSHED THE VICT AND SPANKED VICTIM APPROX THREE TIMES NOT CAUSING VISIBLE INJURY |
BATTERY - SIMPLE ASSAULT | DO-S1 V1 HAVE AND ALTERCATION OVER MONEY S1 BECAME ANGRY WITH V1 FOR NOT GIVING HER MONEY S1 THEN GOT ON TOP OF V1 AND ATTEMP TO WRESTLE THE MONEY AWAY |
BATTERY - SIMPLE ASSAULT | DO-S WAS VERBALLY CONFRONTED BY V WHO WAS ACCROSS THE STREET AFTER S DOG DEFACATED S APPROACHED V AND HIT PR HAND |
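The column trimming described above can be sketched roughly as follows. This is a minimal example under stated assumptions: only CCDESC and DO_NARRATIVE come from the data shown here, while the DATE_OCC and CRM_CD columns and all of the row values are hypothetical placeholders for whatever extra fields a wider export might carry.

```python
import pandas as pd

# Hypothetical wide table: DATE_OCC and CRM_CD are made-up column names,
# and every row value is invented for illustration.
raw = pd.DataFrame({
    "DATE_OCC": ["2012-01-01", "2012-01-02"],
    "CRM_CD": [230, 624],
    "CCDESC": [
        "ASSAULT WITH DEADLY WEAPON, AGGRAVATED ASSAULT",
        "BATTERY - SIMPLE ASSAULT",
    ],
    "DO_NARRATIVE": ["DO-SUSP STABBED VICT", "DO-S HIT V"],
})

# Keep only the two columns we care about; .copy() gives an independent
# DataFrame so later edits don't touch the original.
keep = ["CCDESC", "DO_NARRATIVE"]
trimmed = raw[keep].copy()

print(trimmed.shape)
```

Selecting the columns to keep, rather than listing the ones to drop, means the result stays the same even if the source file gains new columns later.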