5 Teaching computers to read
Last time we did text analysis, we picked a custom list of words that, if found, might imply sexual assault.
We could do the same thing here, trying to find crimes with especially violent words that were classified as Part II “simple” assault. That’s actually exactly how The LA Times did their original research! In their published piece they say:
Reporters searched the summaries for terms such as “stab” and “knife” to flag incidents that might meet the FBI criteria for serious offenses. They then read thousands of the summaries, which are typically two or three sentences long. They also reviewed court and police records for dozens of cases.
The problem with this is that we would have to guess useful words - like “stab,” “knife,” “gun,” “shot” - and then read through all of the results that come up. But what about other situations, ones that might be less clear-cut to non-experts, or assaults that involved less traditional weapons?
For example, machetes make appearances in plenty of aggravated assaults:
| | CCDESC | DO_NARRATIVE | is_part_i | reported |
---|---|---|---|---|
1469 | ASSAULT WITH DEADLY WEAPON, AGGRAVATED ASSAULT | DO-SUSP STRUCK VICTS NOSE W/MACHETE | 1 | 0 |
1752 | ASSAULT WITH DEADLY WEAPON, AGGRAVATED ASSAULT | DO-SUSP HELD MACHETE HANDLE AT SHOULDER HEIGHT AND CHARGED AT VICTIM CLOSING THE DISTANCE | 1 | 1 |
2339 | INTIMATE PARTNER - SIMPLE ASSAULT | DO-S ENGAGED V IN A VERBAL ARGUMENT S CHOKED THE V UNTIL SHE LOST CONSCIOUSNESS ONCE V REGAINED CONSCIOUSNESS THE S PUT A MACHETE TO THE V NECK | 0 | 0 |
2416 | ASSAULT WITH DEADLY WEAPON, AGGRAVATED ASSAULT | DO-SUSP CONCEALED VICT AND SWUNG A MACHETE A VICT SUSP CHASED VICT UNTIL VICT FLAGGED DOWN PD SUSP WAS ARRESTED FOR ADW | 1 | 0 |
2805 | ASSAULT WITH DEADLY WEAPON, AGGRAVATED ASSAULT | DO-SUSP AND VICT WERE WERE INVOLVED IN ARGUMENT SUSP TOLD VICT HE WAS GOING TO KILL HIM AND STRUCK HIM W A MACHETE | 1 | 1 |
One thing we could do is talk to experts, and see what words might be useful to search for. We could also read many, many narratives, and eventually learn some of the more obscure weapons and techniques that might make something a more serious Part I offense. Once we had our more complete list, we could then search the documents for those words and review the cases.
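To make the keyword approach concrete, here is a minimal sketch of that search using only the Python standard library. The three rows are copied from the table above, the `row` key is just a name assumed here for the unnamed index column, and both the keyword list and the `flag_for_review` helper are hypothetical. It illustrates the idea and is not the LA Times' actual code.

```python
import re

# Three rows copied from the table above; "row" is an assumed name for the index column
cases = [
    {"row": 1469, "CCDESC": "ASSAULT WITH DEADLY WEAPON, AGGRAVATED ASSAULT",
     "DO_NARRATIVE": "DO-SUSP STRUCK VICTS NOSE W/MACHETE", "is_part_i": 1},
    {"row": 2339, "CCDESC": "INTIMATE PARTNER - SIMPLE ASSAULT",
     "DO_NARRATIVE": "DO-S ENGAGED V IN A VERBAL ARGUMENT S CHOKED THE V UNTIL SHE LOST "
                     "CONSCIOUSNESS ONCE V REGAINED CONSCIOUSNESS THE S PUT A MACHETE TO THE V NECK",
     "is_part_i": 0},
    {"row": 2805, "CCDESC": "ASSAULT WITH DEADLY WEAPON, AGGRAVATED ASSAULT",
     "DO_NARRATIVE": "DO-SUSP AND VICT WERE WERE INVOLVED IN ARGUMENT SUSP TOLD VICT HE WAS "
                     "GOING TO KILL HIM AND STRUCK HIM W A MACHETE", "is_part_i": 1},
]

# A hand-picked (hypothetical) keyword list: exactly the part we have to guess
keywords = ["stab", "knife", "machete"]


def flag_for_review(cases, keywords):
    """Return cases classified as Part II whose narrative mentions any keyword."""
    # Substring match, so "stab" also catches "STABBED"
    pattern = re.compile("|".join(re.escape(word) for word in keywords), re.IGNORECASE)
    return [
        case for case in cases
        if case["is_part_i"] == 0 and pattern.search(case["DO_NARRATIVE"])
    ]


flagged = flag_for_review(cases, keywords)
for case in flagged:
    print(case["row"], case["CCDESC"])
```

Every flagged case still has to be read by a person, and anything that uses a word missing from `keywords` is never flagged at all.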
Both of those sound like a lot of work - this is what machine learning was invented for!
Instead of reading thousands of narratives and learning what words are important, we could just have the computer read the narratives for us. The computer can go case-by-case, reading documents, finding the words in each of them, and then figuring out which words are more likely to imply a Part I vs. Part II crime.
When we start, the computer won’t know the difference between “stab” and “punch.” After some training, though, it will actually notice that “stab” appears more often with aggravated assaults, while “punch” is typically for simple assaults. Once we’ve told it to read enough cases, we can give the computer a description of a crime and it’ll be able to guess which type the crime should be classified as!
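As a rough illustration of what "noticing" means, the sketch below counts how often each word appears in each class and uses those counts to score a new narrative. It is a tiny naive Bayes-style word counter written with the standard library, and the training narratives are made up for the example. The `train` and `predict` helpers are hypothetical, and this is not necessarily the classifier used later in this analysis.

```python
import math
from collections import Counter

# Made-up narratives for illustration only; 1 = Part I (aggravated), 0 = Part II (simple)
training = [
    ("SUSP STABBED VICT WITH KNIFE", 1),
    ("SUSP STRUCK VICT WITH MACHETE", 1),
    ("SUSP STABBED VICT IN ARM", 1),
    ("SUSP PUNCHED VICT IN FACE", 0),
    ("SUSP PUSHED VICT AND PUNCHED VICT", 0),
    ("SUSP SLAPPED VICT", 0),
]


def train(examples):
    """Count word occurrences and number of documents for each label."""
    word_counts = {0: Counter(), 1: Counter()}
    doc_counts = Counter()
    for text, label in examples:
        word_counts[label].update(text.lower().split())
        doc_counts[label] += 1
    return word_counts, doc_counts


def predict(text, word_counts, doc_counts):
    """Guess the label whose word counts best match the narrative."""
    vocab = set(word_counts[0]) | set(word_counts[1])
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in (0, 1):
        total_words = sum(word_counts[label].values())
        # Start from how common the label is overall
        score = math.log(doc_counts[label] / total_docs)
        for word in text.lower().split():
            if word not in vocab:
                continue  # words never seen in training carry no evidence
            # Add-one smoothing so a word unseen in one class doesn't zero out the score
            score += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


word_counts, doc_counts = train(training)
print(predict("SUSP STABBED VICT", word_counts, doc_counts))
print(predict("SUSP PUNCHED VICT", word_counts, doc_counts))
```

Nobody told the code that "stabbed" is serious. It only saw the word more often in Part I examples than in Part II ones, which is the behavior described above, just at toy scale.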