4.1 Designing our features

Let’s take a look at what the airbag issue is, according to Consumer Reports:

Vehicles made by 19 different automakers have been recalled to replace frontal airbags on the driver’s side or passenger’s side, or both in what NHTSA has called “the largest and most complex safety recall in U.S. history.” The airbags, made by major parts supplier Takata, were mostly installed in cars from model year 2002 through 2015. Some of those airbags could deploy explosively, injuring or even killing car occupants.

At the heart of the problem is the airbag’s inflator, a metal cartridge loaded with propellant wafers, which in some cases has ignited with explosive force. If the inflator housing ruptures in a crash, metal shards from the airbag can be sprayed throughout the passenger cabin—a potentially disastrous outcome from a supposedly life-saving device.

If we’re going through a list of vehicle complaints, it isn’t too hard for us to figure out which complaints we might want to investigate further. If the complaint’s about seatbelts or rear-view mirrors, we probably don’t care about it. If the word “airbag” shows up in the description, though, we’re going to start paying attention.

We aren’t interested in every complaint that mentions the word “airbag,” though. Since we’re worried about exploding airbags, something like “the airbag did not deploy” would catch our attention because of the word “airbag,” but we could ignore it once we saw that the airbag simply didn’t work.
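The reasoning above can be written as a hand-built rule: pay attention to anything that mentions an airbag, then drop complaints that only say the airbag didn't deploy. This is a minimal sketch, assuming each complaint is a plain string. The function name `looks_interesting` and the sample complaints are hypothetical, made up for illustration rather than taken from the real complaint data.

```python
def looks_interesting(complaint: str) -> bool:
    """Hypothetical hand-written rule mirroring the reasoning above."""
    text = complaint.lower()

    # Step 1: only pay attention if an airbag is mentioned at all
    mentions_airbag = "airbag" in text or "air bag" in text
    if not mentions_airbag:
        return False

    # Step 2: an airbag that simply didn't deploy isn't the explosion problem
    if "did not deploy" in text:
        return False

    return True


# Hypothetical example complaints, not from the real dataset
complaints = [
    "The rear-view mirror fell off while driving.",
    "In the crash the airbag did not deploy.",
    "The air bag exploded and sprayed metal into the cabin.",
]

for complaint in complaints:
    print(looks_interesting(complaint), "-", complaint)
```

Rules like this get brittle fast: a single complaint might say that one airbag did not deploy while another one exploded, and this rule would throw it away.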

4.1.1 Selecting our features

Since we just read a long list of airbag complaints, we can probably brainstorm some words or phrases that might make a complaint interesting or uninteresting. Here are a few to start with:

  • airbag
  • air bag
  • failed
  • did not deploy
  • violent
  • explode
  • shrapnel

These features are the things the machine learning algorithm is going to pay attention to. For each complaint, we’ll answer either YES, it contains this word or phrase, or NO, it does not. There are lots of words in each complaint, but these are the only ones the algorithm will care about!
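One way to turn each complaint into those YES/NO answers is a small function that checks every feature phrase in turn. This is a minimal sketch in plain Python, assuming simple case-insensitive substring matching. The names `FEATURES` and `extract_features` and the sample complaint are hypothetical; a real pipeline might instead use a library vectorizer such as scikit-learn's `CountVectorizer`.

```python
# The feature phrases brainstormed above
FEATURES = [
    "airbag",
    "air bag",
    "failed",
    "did not deploy",
    "violent",
    "explode",
    "shrapnel",
]


def extract_features(complaint: str) -> dict:
    """Return {phrase: True/False} for whether each feature phrase appears."""
    text = complaint.lower()
    # Plain substring matching: "explode" also matches "exploded",
    # but it does not match "explosion"
    return {phrase: phrase in text for phrase in FEATURES}


# Hypothetical complaint, made up for illustration
complaint = "The airbag exploded violently and sent shrapnel into my arm."
for phrase, present in extract_features(complaint).items():
    print(f"{phrase:15} {'YES' if present else 'NO'}")
```

Every other word in the complaint ("sent," "arm," and so on) is ignored: the algorithm only ever sees these seven YES/NO answers.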