Analyzing whether larger cars cause more deadly crashes

Reproducing a research paper on how vehicle weight affects deaths in car crashes, along with a look at a state-level car crash database.

logistic regression, feature engineering, confidence intervals, seaborn

Summary

This chapter reproduces an academic study of car weight and fatalities in car crashes. While it isn't an actual piece of journalism, it's a complicated exercise in finding, cleaning, and combining data, with many decisions made along the way.

Notebooks, Assignments, and Walkthroughs

Feature selection and engineering

Use car crash data from the state of Maryland to learn about feature engineering and feature selection (with a logistic regression classifier).
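As a rough illustration of what feature engineering for a logistic regression can look like, here is a minimal sketch using pandas and scikit-learn. The crash table, its column names (`weather`, `speed_limit`, `injury_severity`) and the 55 mph cutoff are all hypothetical and made up for the example; they are not the Maryland schema. It turns a multi-class severity column into a binary target, derives a simpler feature from a raw one, one-hot encodes a categorical column, and then reads the fitted coefficients as a first step toward feature selection.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical crash-level data; columns and values are invented for illustration.
crashes = pd.DataFrame({
    "weather": ["clear", "rain", "clear", "snow", "rain", "clear", "snow", "clear"],
    "speed_limit": [25, 55, 35, 65, 45, 25, 55, 65],
    "injury_severity": ["none", "fatal", "minor", "fatal", "minor", "none", "fatal", "minor"],
})

# Feature engineering, step 1: collapse the multi-class severity into a binary target.
crashes["is_fatal"] = (crashes["injury_severity"] == "fatal").astype(int)

# Step 2: derive a simpler yes/no feature from a raw numeric column (cutoff is an assumption).
crashes["high_speed_road"] = (crashes["speed_limit"] >= 55).astype(int)

# Step 3: one-hot encode the categorical column, dropping one level as the baseline ("clear").
features = pd.get_dummies(
    crashes[["weather", "high_speed_road"]],
    columns=["weather"],
    drop_first=True,
    dtype=int,
)

model = LogisticRegression()
model.fit(features, crashes["is_fatal"])

# Feature selection often starts by looking at which coefficients matter.
coefs = pd.Series(model.coef_[0], index=features.columns).sort_values()
print(coefs)
```

Dropping one dummy level keeps the weather columns from being perfectly collinear, so each weather coefficient reads as "compared with clear weather".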

Combine Excel files across multiple sheets and save as CSV files

Open a folder full of Excel files from the Maryland DOT, then extract and combine the data into a series of CSV files.
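One way to do this with pandas is sketched below, assuming every workbook in the folder shares the same sheet names (for example, one sheet per table) and that stacking same-named sheets is what we want. The function names and the `source_file` column are hypothetical helpers for illustration. `pd.read_excel(path, sheet_name=None)` returns a dict of every sheet keyed by sheet name; reading `.xlsx` files also requires an Excel engine such as openpyxl to be installed.

```python
from pathlib import Path

import pandas as pd


def load_workbooks(folder):
    """Read every sheet of every .xlsx file into {filename: {sheet_name: DataFrame}}."""
    # sheet_name=None makes read_excel return all sheets as a dict keyed by sheet name
    return {
        path.name: pd.read_excel(path, sheet_name=None)
        for path in sorted(Path(folder).glob("*.xlsx"))
    }


def combine_sheets(workbooks):
    """Stack same-named sheets from all workbooks into one DataFrame per sheet name."""
    frames_by_sheet = {}
    for filename, sheets in workbooks.items():
        for sheet_name, df in sheets.items():
            # Record which file each row came from, which helps when debugging later
            frames_by_sheet.setdefault(sheet_name, []).append(df.assign(source_file=filename))
    return {
        name: pd.concat(frames, ignore_index=True)
        for name, frames in frames_by_sheet.items()
    }


def save_csvs(combined, out_folder):
    """Write one CSV per sheet name, e.g. out_folder/<sheet_name>.csv."""
    out = Path(out_folder)
    out.mkdir(parents=True, exist_ok=True)
    for sheet_name, df in combined.items():
        df.to_csv(out / f"{sheet_name}.csv", index=False)
```

Usage would look something like `save_csvs(combine_sheets(load_workbooks("excel_files")), "csv_files")`, where both folder names are placeholders.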

Create make/model weights CSV

Before we can analyze our data, we'll need to combine vehicle weights with makes and models, as well as clean up the results a bit.
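The cleanup step could look roughly like the sketch below. It assumes a raw weights table with hypothetical `make`, `model`, `year` and `weight_lbs` columns. Names from different sources rarely match exactly ("Honda" vs " HONDA", "F-150" vs "F150"), so the sketch normalizes them first. Averaging all trims of the same make, model and year into one weight is a simplifying design choice of this example, not necessarily what the original analysis did.

```python
import pandas as pd


def normalize_name(series):
    """Uppercase, replace punctuation with spaces, and collapse whitespace so names match."""
    return (
        series.astype(str)
        .str.upper()
        .str.replace(r"[^A-Z0-9 ]", " ", regex=True)
        .str.replace(r"\s+", " ", regex=True)
        .str.strip()
    )


def build_weights_table(weights):
    """Clean a raw weights table and collapse it to one weight per make/model/year."""
    df = weights.copy()
    df["make"] = normalize_name(df["make"])
    df["model"] = normalize_name(df["model"])
    # Rows without a usable weight can't help the analysis
    df = df.dropna(subset=["weight_lbs"])
    # Assumption: average across trims/body styles of the same model year
    return df.groupby(["make", "model", "year"], as_index=False)["weight_lbs"].mean()
```

The result can then be saved with something like `build_weights_table(raw).to_csv("make_model_weights.csv", index=False)`, where the filename is a placeholder.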

Find car data from VINs

Using a car's unique Vehicle Identification Number (VIN), we can query a government database to easily track down its make, model, and year.
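The text doesn't name the database, so the sketch below assumes it is NHTSA's vPIC service and its `DecodeVinValues` endpoint, which returns one flat record per VIN with fields such as `Make`, `Model` and `ModelYear`. Treat the URL and field names as assumptions to check against a live response. Parsing is kept separate from the network call so it can be tested offline, and empty strings (how missing values tend to show up in the response) are turned into `None`.

```python
import json
import urllib.request

# Assumption: the government database is NHTSA's vPIC; this is its DecodeVinValues URL pattern.
VPIC_URL = "https://vpic.nhtsa.dot.gov/api/vehicles/DecodeVinValues/{vin}?format=json"


def parse_vpic_response(payload):
    """Pull make, model, and model year out of a DecodeVinValues JSON payload."""
    record = payload["Results"][0]
    year = record.get("ModelYear")
    return {
        # Empty strings become None so missing values are easy to spot later
        "make": record.get("Make") or None,
        "model": record.get("Model") or None,
        "year": int(year) if year else None,
    }


def decode_vin(vin, timeout=30):
    """Fetch and parse a single VIN from vPIC (makes a network request)."""
    url = VPIC_URL.format(vin=vin.strip().upper())
    with urllib.request.urlopen(url, timeout=timeout) as response:
        payload = json.load(response)
    return parse_vpic_response(payload)
```

When decoding thousands of VINs, it is worth caching results and checking whether vPIC's batch decoding option fits better than one request per VIN.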

Combine VINs and weights

A simple bit of data wrangling: attaching vehicle weights to the makes, models, and years we decoded from the VINs.
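In pandas this is mostly a single merge. The sketch below assumes both tables share hypothetical `make`, `model` and `year` columns that have already been normalized the same way. `validate="many_to_one"` makes pandas raise an error if the weights table has duplicate keys (which would silently multiply rows), and `indicator=True` adds a `_merge` column used here to report how many vehicles found a weight.

```python
import pandas as pd


def attach_weights(vehicles, weights):
    """Left-join weights onto VIN-decoded vehicles; return (merged, share of rows matched)."""
    merged = vehicles.merge(
        weights,
        on=["make", "model", "year"],
        how="left",
        validate="many_to_one",  # raise if weights has duplicate make/model/year rows
        indicator=True,          # adds _merge: "both" when a weight was found
    )
    match_rate = (merged["_merge"] == "both").mean()
    return merged.drop(columns="_merge"), match_rate
```

A low match rate is usually a sign that make or model names still need more cleanup before the join.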

Clean, combine, and filter data

After combining data from so many sources, we need to narrow it down to the car crashes we're interested in: two-car crashes between light vehicles.
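A minimal sketch of that filter, assuming a vehicle-level table with one row per vehicle, a hypothetical `report_no` column identifying the crash, and a `weight_lbs` column from the earlier merge. The 10,000 lb cutoff for "light vehicle" is an assumption for illustration only; the original analysis may define light vehicles differently (for example, by body type).

```python
import pandas as pd

# Hypothetical cutoff: vehicles at or under this weight count as "light".
LIGHT_VEHICLE_MAX_LBS = 10_000


def two_light_vehicle_crashes(vehicles):
    """Keep rows for crashes with exactly two vehicles, both of known light weight."""
    crash_ids = vehicles["report_no"]
    # Number of vehicles in each row's crash
    vehicle_count = crash_ids.map(crash_ids.value_counts())
    # Missing weights compare as False, so crashes with an unknown weight are dropped
    is_light = vehicles["weight_lbs"].le(LIGHT_VEHICLE_MAX_LBS)
    all_light = is_light.groupby(crash_ids).transform("all")
    return vehicles[(vehicle_count == 2) & all_light]
```

Filtering at the crash level (rather than row by row) matters here: dropping one heavy vehicle from a truck-versus-car crash would otherwise leave a misleading "one-car" record behind.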