{
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "# A final cleaning, combining, and filtering of our vehicle data\n",
                "\n",
                "After combining from so many sources, we need to filter out the car crashes we're interested in. We're curious about 2-car accidents that happen between light vehicles (no tractor-trailers)."
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "<p class=\"reading-options\">\n  <a class=\"btn\" href=\"/car-crashes-weight-regression/05-clean-combine-and-filter-data\">\n    <i class=\"fa fa-sm fa-book\"></i>\n    Read online\n  </a>\n  <a class=\"btn\" href=\"/car-crashes-weight-regression/notebooks/05 - Clean combine and filter data.ipynb\">\n    <i class=\"fa fa-sm fa-download\"></i>\n    Download notebook\n  </a>\n  <a class=\"btn\" href=\"https://colab.research.google.com/github/littlecolumns/ds4j-notebooks/blob/master/car-crashes-weight-regression/notebooks/05 - Clean combine and filter data.ipynb\" target=\"_new\">\n    <i class=\"fa fa-sm fa-laptop\"></i>\n    Interactive version\n  </a>\n</p>"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 61,
            "metadata": {},
            "outputs": [],
            "source": [
                "import pandas as pd\n",
                "pd.set_option(\"display.max_columns\", 100)\n",
                "pd.set_option(\"display.max_rows\", 100)"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "# Read in our data"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "## The vehicles involved in the crash"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 49,
            "metadata": {},
            "outputs": [
                {
                    "data": {
                        "text/html": [
                            "<div>\n",
                            "<style scoped>\n",
                            "    .dataframe tbody tr th:only-of-type {\n",
                            "        vertical-align: middle;\n",
                            "    }\n",
                            "\n",
                            "    .dataframe tbody tr th {\n",
                            "        vertical-align: top;\n",
                            "    }\n",
                            "\n",
                            "    .dataframe thead th {\n",
                            "        text-align: right;\n",
                            "    }\n",
                            "</style>\n",
                            "<table border=\"1\" class=\"dataframe\">\n",
                            "  <thead>\n",
                            "    <tr style=\"text-align: right;\">\n",
                            "      <th></th>\n",
                            "      <th>AREA_DAMAGED_CODE1</th>\n",
                            "      <th>AREA_DAMAGED_CODE2</th>\n",
                            "      <th>AREA_DAMAGED_CODE3</th>\n",
                            "      <th>AREA_DAMAGED_CODE_IMP1</th>\n",
                            "      <th>AREA_DAMAGED_CODE_MAIN</th>\n",
                            "      <th>BODY_TYPE_CODE</th>\n",
                            "      <th>COMMERCIAL_FLAG</th>\n",
                            "      <th>CONTI_DIRECTION_CODE</th>\n",
                            "      <th>CV_BODY_TYPE_CODE</th>\n",
                            "      <th>DAMAGE_CODE</th>\n",
                            "      <th>DRIVERLESS_FLAG</th>\n",
                            "      <th>FIRE_FLAG</th>\n",
                            "      <th>GOING_DIRECTION_CODE</th>\n",
                            "      <th>GVW_CODE</th>\n",
                            "      <th>HARM_EVENT_CODE</th>\n",
                            "      <th>HAZMAT_SPILL_FLAG</th>\n",
                            "      <th>HIT_AND_RUN_FLAG</th>\n",
                            "      <th>HZM_NUM</th>\n",
                            "      <th>MOVEMENT_CODE</th>\n",
                            "      <th>NUM_AXLES</th>\n",
                            "      <th>PARKED_FLAG</th>\n",
                            "      <th>REPORT_NO</th>\n",
                            "      <th>SPEED_LIMIT</th>\n",
                            "      <th>TOWED_AWAY_FLAG</th>\n",
                            "      <th>TOWED_VEHICLE_CONFIG_CODE</th>\n",
                            "      <th>VEHICLE_ID</th>\n",
                            "      <th>VEH_MAKE</th>\n",
                            "      <th>VEH_MODEL</th>\n",
                            "      <th>VEH_YEAR</th>\n",
                            "      <th>VIN_NO</th>\n",
                            "    </tr>\n",
                            "  </thead>\n",
                            "  <tbody>\n",
                            "    <tr>\n",
                            "      <th>0</th>\n",
                            "      <td>8.0</td>\n",
                            "      <td>9.0</td>\n",
                            "      <td>10.0</td>\n",
                            "      <td>10.0</td>\n",
                            "      <td>10.0</td>\n",
                            "      <td>23.08</td>\n",
                            "      <td>N</td>\n",
                            "      <td>E</td>\n",
                            "      <td>NaN</td>\n",
                            "      <td>5</td>\n",
                            "      <td>N</td>\n",
                            "      <td>N</td>\n",
                            "      <td>E</td>\n",
                            "      <td>NaN</td>\n",
                            "      <td>9.0</td>\n",
                            "      <td>NaN</td>\n",
                            "      <td>N</td>\n",
                            "      <td>NaN</td>\n",
                            "      <td>1.0</td>\n",
                            "      <td>NaN</td>\n",
                            "      <td>N</td>\n",
                            "      <td>ADJ487004H</td>\n",
                            "      <td>30</td>\n",
                            "      <td>Y</td>\n",
                            "      <td>0</td>\n",
                            "      <td>000238fd-44fa-4cd5-8eb7-41ab30500bec</td>\n",
                            "      <td>CHEVY</td>\n",
                            "      <td>TAHOE</td>\n",
                            "      <td>2005.0</td>\n",
                            "      <td>1GNEK13Q2J285593</td>\n",
                            "    </tr>\n",
                            "    <tr>\n",
                            "      <th>1</th>\n",
                            "      <td>12.0</td>\n",
                            "      <td>NaN</td>\n",
                            "      <td>NaN</td>\n",
                            "      <td>12.0</td>\n",
                            "      <td>12.0</td>\n",
                            "      <td>2.00</td>\n",
                            "      <td>N</td>\n",
                            "      <td>E</td>\n",
                            "      <td>NaN</td>\n",
                            "      <td>4</td>\n",
                            "      <td>N</td>\n",
                            "      <td>N</td>\n",
                            "      <td>E</td>\n",
                            "      <td>NaN</td>\n",
                            "      <td>1.0</td>\n",
                            "      <td>NaN</td>\n",
                            "      <td>N</td>\n",
                            "      <td>NaN</td>\n",
                            "      <td>1.0</td>\n",
                            "      <td>NaN</td>\n",
                            "      <td>N</td>\n",
                            "      <td>MCP2487000M</td>\n",
                            "      <td>40</td>\n",
                            "      <td>Y</td>\n",
                            "      <td>0</td>\n",
                            "      <td>00038116-1bf9-48cc-b317-4f4375d14b60</td>\n",
                            "      <td>INFI</td>\n",
                            "      <td>4S</td>\n",
                            "      <td>2003.0</td>\n",
                            "      <td>JNKCV51E63M013580</td>\n",
                            "    </tr>\n",
                            "  </tbody>\n",
                            "</table>\n",
                            "</div>"
                        ],
                        "text/plain": [
                            "   AREA_DAMAGED_CODE1  AREA_DAMAGED_CODE2  AREA_DAMAGED_CODE3  \\\n",
                            "0                 8.0                 9.0                10.0   \n",
                            "1                12.0                 NaN                 NaN   \n",
                            "\n",
                            "   AREA_DAMAGED_CODE_IMP1  AREA_DAMAGED_CODE_MAIN  BODY_TYPE_CODE  \\\n",
                            "0                    10.0                    10.0           23.08   \n",
                            "1                    12.0                    12.0            2.00   \n",
                            "\n",
                            "  COMMERCIAL_FLAG CONTI_DIRECTION_CODE  CV_BODY_TYPE_CODE  DAMAGE_CODE  \\\n",
                            "0               N                    E                NaN            5   \n",
                            "1               N                    E                NaN            4   \n",
                            "\n",
                            "  DRIVERLESS_FLAG FIRE_FLAG GOING_DIRECTION_CODE  GVW_CODE  HARM_EVENT_CODE  \\\n",
                            "0               N         N                    E       NaN              9.0   \n",
                            "1               N         N                    E       NaN              1.0   \n",
                            "\n",
                            "  HAZMAT_SPILL_FLAG HIT_AND_RUN_FLAG HZM_NUM  MOVEMENT_CODE  NUM_AXLES  \\\n",
                            "0               NaN                N     NaN            1.0        NaN   \n",
                            "1               NaN                N     NaN            1.0        NaN   \n",
                            "\n",
                            "  PARKED_FLAG    REPORT_NO  SPEED_LIMIT TOWED_AWAY_FLAG  \\\n",
                            "0           N   ADJ487004H           30               Y   \n",
                            "1           N  MCP2487000M           40               Y   \n",
                            "\n",
                            "   TOWED_VEHICLE_CONFIG_CODE                            VEHICLE_ID VEH_MAKE  \\\n",
                            "0                          0  000238fd-44fa-4cd5-8eb7-41ab30500bec    CHEVY   \n",
                            "1                          0  00038116-1bf9-48cc-b317-4f4375d14b60     INFI   \n",
                            "\n",
                            "  VEH_MODEL  VEH_YEAR             VIN_NO  \n",
                            "0     TAHOE    2005.0   1GNEK13Q2J285593  \n",
                            "1        4S    2003.0  JNKCV51E63M013580  "
                        ]
                    },
                    "execution_count": 49,
                    "metadata": {},
                    "output_type": "execute_result"
                }
            ],
            "source": [
                "vehicles = pd.read_csv(\"data/cleaned/combined-vehicle-data.csv\")\n",
                "vehicles.head(2)"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "What are we interested in for this dataset? We probably don't need all those columns!"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 50,
            "metadata": {},
            "outputs": [
                {
                    "data": {
                        "text/html": [
                            "<div>\n",
                            "<style scoped>\n",
                            "    .dataframe tbody tr th:only-of-type {\n",
                            "        vertical-align: middle;\n",
                            "    }\n",
                            "\n",
                            "    .dataframe tbody tr th {\n",
                            "        vertical-align: top;\n",
                            "    }\n",
                            "\n",
                            "    .dataframe thead th {\n",
                            "        text-align: right;\n",
                            "    }\n",
                            "</style>\n",
                            "<table border=\"1\" class=\"dataframe\">\n",
                            "  <thead>\n",
                            "    <tr style=\"text-align: right;\">\n",
                            "      <th></th>\n",
                            "      <th>REPORT_NO</th>\n",
                            "      <th>SPEED_LIMIT</th>\n",
                            "      <th>VEH_MAKE</th>\n",
                            "      <th>VEH_MODEL</th>\n",
                            "      <th>VEH_YEAR</th>\n",
                            "      <th>VIN_NO</th>\n",
                            "      <th>VEHICLE_ID</th>\n",
                            "    </tr>\n",
                            "  </thead>\n",
                            "  <tbody>\n",
                            "    <tr>\n",
                            "      <th>0</th>\n",
                            "      <td>ADJ487004H</td>\n",
                            "      <td>30</td>\n",
                            "      <td>CHEVY</td>\n",
                            "      <td>TAHOE</td>\n",
                            "      <td>2005.0</td>\n",
                            "      <td>1GNEK13Q2J285593</td>\n",
                            "      <td>000238fd-44fa-4cd5-8eb7-41ab30500bec</td>\n",
                            "    </tr>\n",
                            "    <tr>\n",
                            "      <th>1</th>\n",
                            "      <td>MCP2487000M</td>\n",
                            "      <td>40</td>\n",
                            "      <td>INFI</td>\n",
                            "      <td>4S</td>\n",
                            "      <td>2003.0</td>\n",
                            "      <td>JNKCV51E63M013580</td>\n",
                            "      <td>00038116-1bf9-48cc-b317-4f4375d14b60</td>\n",
                            "    </tr>\n",
                            "    <tr>\n",
                            "      <th>2</th>\n",
                            "      <td>CB5190006B</td>\n",
                            "      <td>55</td>\n",
                            "      <td>TOYT</td>\n",
                            "      <td>TK</td>\n",
                            "      <td>2011.0</td>\n",
                            "      <td>5TFUY5F1XBX167340</td>\n",
                            "      <td>0003b659-2785-4868-8877-0b786a284827</td>\n",
                            "    </tr>\n",
                            "    <tr>\n",
                            "      <th>3</th>\n",
                            "      <td>ADJ4590035</td>\n",
                            "      <td>5</td>\n",
                            "      <td>HONDA</td>\n",
                            "      <td>CIVIC</td>\n",
                            "      <td>2015.0</td>\n",
                            "      <td>2HGFG4A59FH702545</td>\n",
                            "      <td>00050484-d08f-4b6e-bc7e-9ec270e94660</td>\n",
                            "    </tr>\n",
                            "    <tr>\n",
                            "      <th>4</th>\n",
                            "      <td>ADJ849000Z</td>\n",
                            "      <td>10</td>\n",
                            "      <td>HONDA</td>\n",
                            "      <td>ACCORD</td>\n",
                            "      <td>2003.0</td>\n",
                            "      <td>1HGCM66313A037175</td>\n",
                            "      <td>00057af4-d848-4cee-b854-707f57581f4e</td>\n",
                            "    </tr>\n",
                            "  </tbody>\n",
                            "</table>\n",
                            "</div>"
                        ],
                        "text/plain": [
                            "     REPORT_NO  SPEED_LIMIT VEH_MAKE VEH_MODEL  VEH_YEAR             VIN_NO  \\\n",
                            "0   ADJ487004H           30    CHEVY     TAHOE    2005.0   1GNEK13Q2J285593   \n",
                            "1  MCP2487000M           40     INFI        4S    2003.0  JNKCV51E63M013580   \n",
                            "2   CB5190006B           55     TOYT        TK    2011.0  5TFUY5F1XBX167340   \n",
                            "3   ADJ4590035            5    HONDA     CIVIC    2015.0  2HGFG4A59FH702545   \n",
                            "4   ADJ849000Z           10    HONDA    ACCORD    2003.0  1HGCM66313A037175   \n",
                            "\n",
                            "                             VEHICLE_ID  \n",
                            "0  000238fd-44fa-4cd5-8eb7-41ab30500bec  \n",
                            "1  00038116-1bf9-48cc-b317-4f4375d14b60  \n",
                            "2  0003b659-2785-4868-8877-0b786a284827  \n",
                            "3  00050484-d08f-4b6e-bc7e-9ec270e94660  \n",
                            "4  00057af4-d848-4cee-b854-707f57581f4e  "
                        ]
                    },
                    "execution_count": 50,
                    "metadata": {},
                    "output_type": "execute_result"
                }
            ],
            "source": [
                "vehicles = vehicles[['REPORT_NO', 'SPEED_LIMIT', 'VEH_MAKE', 'VEH_MODEL', 'VEH_YEAR', 'VIN_NO', 'VEHICLE_ID']]\n",
                "vehicles.head()"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "## Details about the crash itself"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 51,
            "metadata": {},
            "outputs": [
                {
                    "data": {
                        "text/html": [
                            "<div>\n",
                            "<style scoped>\n",
                            "    .dataframe tbody tr th:only-of-type {\n",
                            "        vertical-align: middle;\n",
                            "    }\n",
                            "\n",
                            "    .dataframe tbody tr th {\n",
                            "        vertical-align: top;\n",
                            "    }\n",
                            "\n",
                            "    .dataframe thead th {\n",
                            "        text-align: right;\n",
                            "    }\n",
                            "</style>\n",
                            "<table border=\"1\" class=\"dataframe\">\n",
                            "  <thead>\n",
                            "    <tr style=\"text-align: right;\">\n",
                            "      <th></th>\n",
                            "      <th>ACC_DATE</th>\n",
                            "      <th>ACC_TIME</th>\n",
                            "      <th>AGENCY_CODE</th>\n",
                            "      <th>AREA_CODE</th>\n",
                            "      <th>COLLISION_TYPE_CODE</th>\n",
                            "      <th>COUNTY_NO</th>\n",
                            "      <th>C_M_ZONE_FLAG</th>\n",
                            "      <th>DISTANCE</th>\n",
                            "      <th>DISTANCE_DIR_FLAG</th>\n",
                            "      <th>FEET_MILES_FLAG</th>\n",
                            "      <th>FIX_OBJ_CODE</th>\n",
                            "      <th>HARM_EVENT_CODE1</th>\n",
                            "      <th>HARM_EVENT_CODE2</th>\n",
                            "      <th>JUNCTION_CODE</th>\n",
                            "      <th>LANE_CODE</th>\n",
                            "      <th>LATITUDE</th>\n",
                            "      <th>LIGHT_CODE</th>\n",
                            "      <th>LOC_CODE</th>\n",
                            "      <th>LOGMILE_DIR_FLAG</th>\n",
                            "      <th>LOG_MILE</th>\n",
                            "      <th>LONGITUDE</th>\n",
                            "      <th>MAINROAD_NAME</th>\n",
                            "      <th>MUNI_CODE</th>\n",
                            "      <th>RD_COND_CODE</th>\n",
                            "      <th>RD_DIV_CODE</th>\n",
                            "      <th>REFERENCE_NO</th>\n",
                            "      <th>REFERENCE_ROAD_NAME</th>\n",
                            "      <th>REFERENCE_SUFFIX</th>\n",
                            "      <th>REFERENCE_TYPE_CODE</th>\n",
                            "      <th>REPORT_NO</th>\n",
                            "      <th>REPORT_TYPE</th>\n",
                            "      <th>ROUTE_TYPE_CODE</th>\n",
                            "      <th>RTE_NO</th>\n",
                            "      <th>RTE_SUFFIX</th>\n",
                            "      <th>SIGNAL_FLAG</th>\n",
                            "      <th>SURF_COND_CODE</th>\n",
                            "      <th>WEATHER_CODE</th>\n",
                            "    </tr>\n",
                            "  </thead>\n",
                            "  <tbody>\n",
                            "    <tr>\n",
                            "      <th>0</th>\n",
                            "      <td>2018-04-10 00:00:00</td>\n",
                            "      <td>01:50:00</td>\n",
                            "      <td>MSP</td>\n",
                            "      <td>UNK</td>\n",
                            "      <td>17</td>\n",
                            "      <td>18.0</td>\n",
                            "      <td>N</td>\n",
                            "      <td>0.0</td>\n",
                            "      <td>N</td>\n",
                            "      <td>F</td>\n",
                            "      <td>22.03</td>\n",
                            "      <td>16.0</td>\n",
                            "      <td>11.0</td>\n",
                            "      <td>1.0</td>\n",
                            "      <td>NaN</td>\n",
                            "      <td>38.277230</td>\n",
                            "      <td>4.0</td>\n",
                            "      <td>NaN</td>\n",
                            "      <td>N</td>\n",
                            "      <td>2.09</td>\n",
                            "      <td>-76.682876</td>\n",
                            "      <td>NEWTOWNE NECK RD</td>\n",
                            "      <td>0.0</td>\n",
                            "      <td>1.0</td>\n",
                            "      <td>1.0</td>\n",
                            "      <td>166.0</td>\n",
                            "      <td>ROSEBANK RD</td>\n",
                            "      <td>NaN</td>\n",
                            "      <td>CO</td>\n",
                            "      <td>MSP6188002Q</td>\n",
                            "      <td>Injury Crash</td>\n",
                            "      <td>MD</td>\n",
                            "      <td>243.0</td>\n",
                            "      <td>NaN</td>\n",
                            "      <td>N</td>\n",
                            "      <td>2.0</td>\n",
                            "      <td>6.01</td>\n",
                            "    </tr>\n",
                            "    <tr>\n",
                            "      <th>1</th>\n",
                            "      <td>2018-05-02 00:00:00</td>\n",
                            "      <td>11:06:00</td>\n",
                            "      <td>ELKTON</td>\n",
                            "      <td>UNK</td>\n",
                            "      <td>5</td>\n",
                            "      <td>7.0</td>\n",
                            "      <td>N</td>\n",
                            "      <td>15.0</td>\n",
                            "      <td>S</td>\n",
                            "      <td>F</td>\n",
                            "      <td>0.00</td>\n",
                            "      <td>1.0</td>\n",
                            "      <td>1.0</td>\n",
                            "      <td>3.0</td>\n",
                            "      <td>NaN</td>\n",
                            "      <td>39.613747</td>\n",
                            "      <td>1.0</td>\n",
                            "      <td>NaN</td>\n",
                            "      <td>N</td>\n",
                            "      <td>19.41</td>\n",
                            "      <td>-75.837454</td>\n",
                            "      <td>N BRIDGE ST</td>\n",
                            "      <td>52.0</td>\n",
                            "      <td>1.0</td>\n",
                            "      <td>3.0</td>\n",
                            "      <td>362.0</td>\n",
                            "      <td>LAUREL DR</td>\n",
                            "      <td>NaN</td>\n",
                            "      <td>MU</td>\n",
                            "      <td>BK0227001M</td>\n",
                            "      <td>Injury Crash</td>\n",
                            "      <td>MD</td>\n",
                            "      <td>213.0</td>\n",
                            "      <td>NaN</td>\n",
                            "      <td>N</td>\n",
                            "      <td>2.0</td>\n",
                            "      <td>6.01</td>\n",
                            "    </tr>\n",
                            "  </tbody>\n",
                            "</table>\n",
                            "</div>"
                        ],
                        "text/plain": [
                            "              ACC_DATE  ACC_TIME AGENCY_CODE AREA_CODE  COLLISION_TYPE_CODE  \\\n",
                            "0  2018-04-10 00:00:00  01:50:00         MSP       UNK                   17   \n",
                            "1  2018-05-02 00:00:00  11:06:00      ELKTON       UNK                    5   \n",
                            "\n",
                            "   COUNTY_NO C_M_ZONE_FLAG  DISTANCE DISTANCE_DIR_FLAG FEET_MILES_FLAG  \\\n",
                            "0       18.0             N       0.0                 N               F   \n",
                            "1        7.0             N      15.0                 S               F   \n",
                            "\n",
                            "   FIX_OBJ_CODE  HARM_EVENT_CODE1  HARM_EVENT_CODE2  JUNCTION_CODE  LANE_CODE  \\\n",
                            "0         22.03              16.0              11.0            1.0        NaN   \n",
                            "1          0.00               1.0               1.0            3.0        NaN   \n",
                            "\n",
                            "    LATITUDE  LIGHT_CODE LOC_CODE LOGMILE_DIR_FLAG  LOG_MILE  LONGITUDE  \\\n",
                            "0  38.277230         4.0      NaN                N      2.09 -76.682876   \n",
                            "1  39.613747         1.0      NaN                N     19.41 -75.837454   \n",
                            "\n",
                            "      MAINROAD_NAME  MUNI_CODE  RD_COND_CODE  RD_DIV_CODE  REFERENCE_NO  \\\n",
                            "0  NEWTOWNE NECK RD        0.0           1.0          1.0         166.0   \n",
                            "1       N BRIDGE ST       52.0           1.0          3.0         362.0   \n",
                            "\n",
                            "  REFERENCE_ROAD_NAME REFERENCE_SUFFIX REFERENCE_TYPE_CODE    REPORT_NO  \\\n",
                            "0         ROSEBANK RD              NaN                  CO  MSP6188002Q   \n",
                            "1           LAUREL DR              NaN                  MU   BK0227001M   \n",
                            "\n",
                            "    REPORT_TYPE ROUTE_TYPE_CODE  RTE_NO RTE_SUFFIX SIGNAL_FLAG  \\\n",
                            "0  Injury Crash              MD   243.0        NaN           N   \n",
                            "1  Injury Crash              MD   213.0        NaN           N   \n",
                            "\n",
                            "   SURF_COND_CODE  WEATHER_CODE  \n",
                            "0             2.0          6.01  \n",
                            "1             2.0          6.01  "
                        ]
                    },
                    "execution_count": 51,
                    "metadata": {},
                    "output_type": "execute_result"
                }
            ],
            "source": [
                "crashes = pd.read_csv(\"data/cleaned/combined-crash-data.csv\")\n",
                "crashes.head(2)"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "Do we need all of those fields? Again - probably not! What fields are we interested in?"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 52,
            "metadata": {},
            "outputs": [
                {
                    "data": {
                        "text/html": [
                            "<div>\n",
                            "<style scoped>\n",
                            "    .dataframe tbody tr th:only-of-type {\n",
                            "        vertical-align: middle;\n",
                            "    }\n",
                            "\n",
                            "    .dataframe tbody tr th {\n",
                            "        vertical-align: top;\n",
                            "    }\n",
                            "\n",
                            "    .dataframe thead th {\n",
                            "        text-align: right;\n",
                            "    }\n",
                            "</style>\n",
                            "<table border=\"1\" class=\"dataframe\">\n",
                            "  <thead>\n",
                            "    <tr style=\"text-align: right;\">\n",
                            "      <th></th>\n",
                            "      <th>REPORT_NO</th>\n",
                            "    </tr>\n",
                            "  </thead>\n",
                            "  <tbody>\n",
                            "    <tr>\n",
                            "      <th>0</th>\n",
                            "      <td>MSP6188002Q</td>\n",
                            "    </tr>\n",
                            "    <tr>\n",
                            "      <th>1</th>\n",
                            "      <td>BK0227001M</td>\n",
                            "    </tr>\n",
                            "    <tr>\n",
                            "      <th>2</th>\n",
                            "      <td>ZU8005001W</td>\n",
                            "    </tr>\n",
                            "    <tr>\n",
                            "      <th>3</th>\n",
                            "      <td>MCP2891005S</td>\n",
                            "    </tr>\n",
                            "    <tr>\n",
                            "      <th>4</th>\n",
                            "      <td>MSP6743004Q</td>\n",
                            "    </tr>\n",
                            "  </tbody>\n",
                            "</table>\n",
                            "</div>"
                        ],
                        "text/plain": [
                            "     REPORT_NO\n",
                            "0  MSP6188002Q\n",
                            "1   BK0227001M\n",
                            "2   ZU8005001W\n",
                            "3  MCP2891005S\n",
                            "4  MSP6743004Q"
                        ]
                    },
                    "execution_count": 52,
                    "metadata": {},
                    "output_type": "execute_result"
                }
            ],
            "source": [
                "crashes = crashes[['REPORT_NO']]\n",
                "crashes.head()"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "## Weights of the cars"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 55,
            "metadata": {},
            "outputs": [
                {
                    "data": {
                        "text/html": [
                            "<div>\n",
                            "<style scoped>\n",
                            "    .dataframe tbody tr th:only-of-type {\n",
                            "        vertical-align: middle;\n",
                            "    }\n",
                            "\n",
                            "    .dataframe tbody tr th {\n",
                            "        vertical-align: top;\n",
                            "    }\n",
                            "\n",
                            "    .dataframe thead th {\n",
                            "        text-align: right;\n",
                            "    }\n",
                            "</style>\n",
                            "<table border=\"1\" class=\"dataframe\">\n",
                            "  <thead>\n",
                            "    <tr style=\"text-align: right;\">\n",
                            "      <th></th>\n",
                            "      <th>VIN</th>\n",
                            "      <th>Make</th>\n",
                            "      <th>Model</th>\n",
                            "      <th>ModelYear</th>\n",
                            "      <th>weight</th>\n",
                            "    </tr>\n",
                            "  </thead>\n",
                            "  <tbody>\n",
                            "    <tr>\n",
                            "      <th>0</th>\n",
                            "      <td>2FMDA5143TBB45576</td>\n",
                            "      <td>FORD</td>\n",
                            "      <td>WINDSTAR</td>\n",
                            "      <td>1996</td>\n",
                            "      <td>3733.0</td>\n",
                            "    </tr>\n",
                            "    <tr>\n",
                            "      <th>1</th>\n",
                            "      <td>2G1WC5E37E1120089</td>\n",
                            "      <td>CHEVROLET</td>\n",
                            "      <td>IMPALA</td>\n",
                            "      <td>2014</td>\n",
                            "      <td>3618.0</td>\n",
                            "    </tr>\n",
                            "  </tbody>\n",
                            "</table>\n",
                            "</div>"
                        ],
                        "text/plain": [
                            "                 VIN       Make     Model  ModelYear  weight\n",
                            "0  2FMDA5143TBB45576       FORD  WINDSTAR       1996  3733.0\n",
                            "1  2G1WC5E37E1120089  CHEVROLET    IMPALA       2014  3618.0"
                        ]
                    },
                    "execution_count": 55,
                    "metadata": {},
                    "output_type": "execute_result"
                }
            ],
            "source": [
                "weights = pd.read_csv(\"data/cleaned/vins_and_weights.csv\")\n",
                "weights.head(2)"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "Seems good to me!"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "## Person data"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 46,
            "metadata": {},
            "outputs": [
                {
                    "data": {
                        "text/html": [
                            "<div>\n",
                            "<style scoped>\n",
                            "    .dataframe tbody tr th:only-of-type {\n",
                            "        vertical-align: middle;\n",
                            "    }\n",
                            "\n",
                            "    .dataframe tbody tr th {\n",
                            "        vertical-align: top;\n",
                            "    }\n",
                            "\n",
                            "    .dataframe thead th {\n",
                            "        text-align: right;\n",
                            "    }\n",
                            "</style>\n",
                            "<table border=\"1\" class=\"dataframe\">\n",
                            "  <thead>\n",
                            "    <tr style=\"text-align: right;\">\n",
                            "      <th></th>\n",
                            "      <th>AIRBAG_DEPLOYED</th>\n",
                            "      <th>ALCOHOL_TESTTYPE_CODE</th>\n",
                            "      <th>ALCOHOL_TEST_CODE</th>\n",
                            "      <th>BAC_CODE</th>\n",
                            "      <th>CDL_FLAG</th>\n",
                            "      <th>CLASS</th>\n",
                            "      <th>CONDITION_CODE</th>\n",
                            "      <th>DATE_OF_BIRTH</th>\n",
                            "      <th>DRUG_TESTRESULT_CODE</th>\n",
                            "      <th>DRUG_TEST_CODE</th>\n",
                            "      <th>EJECT_CODE</th>\n",
                            "      <th>EMS_UNIT_LABEL</th>\n",
                            "      <th>EQUIP_PROB_CODE</th>\n",
                            "      <th>FAULT_FLAG</th>\n",
                            "      <th>INJ_SEVER_CODE</th>\n",
                            "      <th>LICENSE_STATE_CODE</th>\n",
                            "      <th>MOVEMENT_CODE</th>\n",
                            "      <th>OCC_SEAT_POS_CODE</th>\n",
                            "      <th>PED_LOCATION_CODE</th>\n",
                            "      <th>PED_OBEY_CODE</th>\n",
                            "      <th>PED_TYPE_CODE</th>\n",
                            "      <th>PED_VISIBLE_CODE</th>\n",
                            "      <th>PERSON_ID</th>\n",
                            "      <th>PERSON_TYPE</th>\n",
                            "      <th>REPORT_NO</th>\n",
                            "      <th>SAF_EQUIP_CODE</th>\n",
                            "      <th>SEX_CODE</th>\n",
                            "      <th>VEHICLE_ID</th>\n",
                            "    </tr>\n",
                            "  </thead>\n",
                            "  <tbody>\n",
                            "    <tr>\n",
                            "      <th>0</th>\n",
                            "      <td>1.0</td>\n",
                            "      <td>NaN</td>\n",
                            "      <td>0.0</td>\n",
                            "      <td>NaN</td>\n",
                            "      <td>N</td>\n",
                            "      <td>C</td>\n",
                            "      <td>0.0</td>\n",
                            "      <td>1952-04-20 00:00:00</td>\n",
                            "      <td>NaN</td>\n",
                            "      <td>0.0</td>\n",
                            "      <td>1.0</td>\n",
                            "      <td>NaN</td>\n",
                            "      <td>1.0</td>\n",
                            "      <td>N</td>\n",
                            "      <td>1</td>\n",
                            "      <td>PA</td>\n",
                            "      <td>NaN</td>\n",
                            "      <td>NaN</td>\n",
                            "      <td>NaN</td>\n",
                            "      <td>NaN</td>\n",
                            "      <td>NaN</td>\n",
                            "      <td>NaN</td>\n",
                            "      <td>48dd00ee-e033-47e7-ad1e-0b734020301b</td>\n",
                            "      <td>D</td>\n",
                            "      <td>AB4284000S</td>\n",
                            "      <td>13.0</td>\n",
                            "      <td>F</td>\n",
                            "      <td>eb6aadb8-dacb-4744-a1a7-ab812c96f27f</td>\n",
                            "    </tr>\n",
                            "    <tr>\n",
                            "      <th>1</th>\n",
                            "      <td>1.0</td>\n",
                            "      <td>NaN</td>\n",
                            "      <td>0.0</td>\n",
                            "      <td>NaN</td>\n",
                            "      <td>N</td>\n",
                            "      <td>NaN</td>\n",
                            "      <td>0.0</td>\n",
                            "      <td>1985-05-28 00:00:00</td>\n",
                            "      <td>NaN</td>\n",
                            "      <td>0.0</td>\n",
                            "      <td>0.0</td>\n",
                            "      <td>NaN</td>\n",
                            "      <td>0.0</td>\n",
                            "      <td>N</td>\n",
                            "      <td>1</td>\n",
                            "      <td>MD</td>\n",
                            "      <td>NaN</td>\n",
                            "      <td>NaN</td>\n",
                            "      <td>NaN</td>\n",
                            "      <td>NaN</td>\n",
                            "      <td>NaN</td>\n",
                            "      <td>NaN</td>\n",
                            "      <td>166296bd-ffd3-4c16-aa74-4f4bf4139d8d</td>\n",
                            "      <td>D</td>\n",
                            "      <td>AB4313000X</td>\n",
                            "      <td>13.0</td>\n",
                            "      <td>F</td>\n",
                            "      <td>b463eb20-2f01-4200-9d6f-b18888ce2593</td>\n",
                            "    </tr>\n",
                            "  </tbody>\n",
                            "</table>\n",
                            "</div>"
                        ],
                        "text/plain": [
                            "   AIRBAG_DEPLOYED  ALCOHOL_TESTTYPE_CODE  ALCOHOL_TEST_CODE  BAC_CODE  \\\n",
                            "0              1.0                    NaN                0.0       NaN   \n",
                            "1              1.0                    NaN                0.0       NaN   \n",
                            "\n",
                            "  CDL_FLAG CLASS  CONDITION_CODE        DATE_OF_BIRTH DRUG_TESTRESULT_CODE  \\\n",
                            "0        N     C             0.0  1952-04-20 00:00:00                  NaN   \n",
                            "1        N   NaN             0.0  1985-05-28 00:00:00                  NaN   \n",
                            "\n",
                            "   DRUG_TEST_CODE  EJECT_CODE EMS_UNIT_LABEL  EQUIP_PROB_CODE FAULT_FLAG  \\\n",
                            "0             0.0         1.0            NaN              1.0          N   \n",
                            "1             0.0         0.0            NaN              0.0          N   \n",
                            "\n",
                            "   INJ_SEVER_CODE LICENSE_STATE_CODE  MOVEMENT_CODE  OCC_SEAT_POS_CODE  \\\n",
                            "0               1                 PA            NaN                NaN   \n",
                            "1               1                 MD            NaN                NaN   \n",
                            "\n",
                            "   PED_LOCATION_CODE  PED_OBEY_CODE  PED_TYPE_CODE  PED_VISIBLE_CODE  \\\n",
                            "0                NaN            NaN            NaN               NaN   \n",
                            "1                NaN            NaN            NaN               NaN   \n",
                            "\n",
                            "                              PERSON_ID PERSON_TYPE   REPORT_NO  \\\n",
                            "0  48dd00ee-e033-47e7-ad1e-0b734020301b           D  AB4284000S   \n",
                            "1  166296bd-ffd3-4c16-aa74-4f4bf4139d8d           D  AB4313000X   \n",
                            "\n",
                            "   SAF_EQUIP_CODE SEX_CODE                            VEHICLE_ID  \n",
                            "0            13.0        F  eb6aadb8-dacb-4744-a1a7-ab812c96f27f  \n",
                            "1            13.0        F  b463eb20-2f01-4200-9d6f-b18888ce2593  "
                        ]
                    },
                    "execution_count": 46,
                    "metadata": {},
                    "output_type": "execute_result"
                }
            ],
            "source": [
                "people = pd.read_csv(\"data/cleaned/combined-person-data.csv\")\n",
                "people.head(2)"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 48,
            "metadata": {},
            "outputs": [
                {
                    "data": {
                        "text/html": [
                            "<div>\n",
                            "<style scoped>\n",
                            "    .dataframe tbody tr th:only-of-type {\n",
                            "        vertical-align: middle;\n",
                            "    }\n",
                            "\n",
                            "    .dataframe tbody tr th {\n",
                            "        vertical-align: top;\n",
                            "    }\n",
                            "\n",
                            "    .dataframe thead th {\n",
                            "        text-align: right;\n",
                            "    }\n",
                            "</style>\n",
                            "<table border=\"1\" class=\"dataframe\">\n",
                            "  <thead>\n",
                            "    <tr style=\"text-align: right;\">\n",
                            "      <th></th>\n",
                            "      <th>INJ_SEVER_CODE</th>\n",
                            "      <th>VEHICLE_ID</th>\n",
                            "    </tr>\n",
                            "  </thead>\n",
                            "  <tbody>\n",
                            "    <tr>\n",
                            "      <th>0</th>\n",
                            "      <td>1</td>\n",
                            "      <td>eb6aadb8-dacb-4744-a1a7-ab812c96f27f</td>\n",
                            "    </tr>\n",
                            "    <tr>\n",
                            "      <th>1</th>\n",
                            "      <td>1</td>\n",
                            "      <td>b463eb20-2f01-4200-9d6f-b18888ce2593</td>\n",
                            "    </tr>\n",
                            "    <tr>\n",
                            "      <th>2</th>\n",
                            "      <td>1</td>\n",
                            "      <td>3c8629d0-d524-47c1-bfbc-b18e07f3087e</td>\n",
                            "    </tr>\n",
                            "    <tr>\n",
                            "      <th>3</th>\n",
                            "      <td>1</td>\n",
                            "      <td>c4628cdb-f295-4a24-8a4b-653741ac6ae7</td>\n",
                            "    </tr>\n",
                            "    <tr>\n",
                            "      <th>4</th>\n",
                            "      <td>1</td>\n",
                            "      <td>cdda1580-fd79-4358-8819-c2250f494591</td>\n",
                            "    </tr>\n",
                            "  </tbody>\n",
                            "</table>\n",
                            "</div>"
                        ],
                        "text/plain": [
                            "   INJ_SEVER_CODE                            VEHICLE_ID\n",
                            "0               1  eb6aadb8-dacb-4744-a1a7-ab812c96f27f\n",
                            "1               1  b463eb20-2f01-4200-9d6f-b18888ce2593\n",
                            "2               1  3c8629d0-d524-47c1-bfbc-b18e07f3087e\n",
                            "3               1  c4628cdb-f295-4a24-8a4b-653741ac6ae7\n",
                            "4               1  cdda1580-fd79-4358-8819-c2250f494591"
                        ]
                    },
                    "execution_count": 48,
                    "metadata": {},
                    "output_type": "execute_result"
                }
            ],
            "source": [
                "people = people[['INJ_SEVER_CODE', 'VEHICLE_ID']]\n",
                "people.head()"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "# Merging our data"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 57,
            "metadata": {},
            "outputs": [
                {
                    "data": {
                        "text/html": [
                            "<div>\n",
                            "<style scoped>\n",
                            "    .dataframe tbody tr th:only-of-type {\n",
                            "        vertical-align: middle;\n",
                            "    }\n",
                            "\n",
                            "    .dataframe tbody tr th {\n",
                            "        vertical-align: top;\n",
                            "    }\n",
                            "\n",
                            "    .dataframe thead th {\n",
                            "        text-align: right;\n",
                            "    }\n",
                            "</style>\n",
                            "<table border=\"1\" class=\"dataframe\">\n",
                            "  <thead>\n",
                            "    <tr style=\"text-align: right;\">\n",
                            "      <th></th>\n",
                            "      <th>REPORT_NO</th>\n",
                            "      <th>SPEED_LIMIT</th>\n",
                            "      <th>VEH_MAKE</th>\n",
                            "      <th>VEH_MODEL</th>\n",
                            "      <th>VEH_YEAR</th>\n",
                            "      <th>VIN_NO</th>\n",
                            "      <th>VEHICLE_ID</th>\n",
                            "      <th>VIN</th>\n",
                            "      <th>Make</th>\n",
                            "      <th>Model</th>\n",
                            "      <th>ModelYear</th>\n",
                            "      <th>weight</th>\n",
                            "    </tr>\n",
                            "  </thead>\n",
                            "  <tbody>\n",
                            "    <tr>\n",
                            "      <th>0</th>\n",
                            "      <td>ADJ487004H</td>\n",
                            "      <td>30</td>\n",
                            "      <td>CHEVY</td>\n",
                            "      <td>TAHOE</td>\n",
                            "      <td>2005.0</td>\n",
                            "      <td>1GNEK13Q2J285593</td>\n",
                            "      <td>000238fd-44fa-4cd5-8eb7-41ab30500bec</td>\n",
                            "      <td>1GNEK13Q2J285593</td>\n",
                            "      <td>CHEVROLET</td>\n",
                            "      <td>GMT-400</td>\n",
                            "      <td>1988.0</td>\n",
                            "      <td>4300.0</td>\n",
                            "    </tr>\n",
                            "    <tr>\n",
                            "      <th>1</th>\n",
                            "      <td>MCP2487000M</td>\n",
                            "      <td>40</td>\n",
                            "      <td>INFI</td>\n",
                            "      <td>4S</td>\n",
                            "      <td>2003.0</td>\n",
                            "      <td>JNKCV51E63M013580</td>\n",
                            "      <td>00038116-1bf9-48cc-b317-4f4375d14b60</td>\n",
                            "      <td>JNKCV51E63M013580</td>\n",
                            "      <td>INFINITI</td>\n",
                            "      <td>G35</td>\n",
                            "      <td>2003.0</td>\n",
                            "      <td>3468.0</td>\n",
                            "    </tr>\n",
                            "    <tr>\n",
                            "      <th>2</th>\n",
                            "      <td>CB5190006B</td>\n",
                            "      <td>55</td>\n",
                            "      <td>TOYT</td>\n",
                            "      <td>TK</td>\n",
                            "      <td>2011.0</td>\n",
                            "      <td>5TFUY5F1XBX167340</td>\n",
                            "      <td>0003b659-2785-4868-8877-0b786a284827</td>\n",
                            "      <td>5TFUY5F1XBX167340</td>\n",
                            "      <td>TOYOTA</td>\n",
                            "      <td>TUNDRA</td>\n",
                            "      <td>2011.0</td>\n",
                            "      <td>5480.0</td>\n",
                            "    </tr>\n",
                            "    <tr>\n",
                            "      <th>3</th>\n",
                            "      <td>ADJ4590035</td>\n",
                            "      <td>5</td>\n",
                            "      <td>HONDA</td>\n",
                            "      <td>CIVIC</td>\n",
                            "      <td>2015.0</td>\n",
                            "      <td>2HGFG4A59FH702545</td>\n",
                            "      <td>00050484-d08f-4b6e-bc7e-9ec270e94660</td>\n",
                            "      <td>2HGFG4A59FH702545</td>\n",
                            "      <td>HONDA</td>\n",
                            "      <td>CIVIC</td>\n",
                            "      <td>2015.0</td>\n",
                            "      <td>2754.0</td>\n",
                            "    </tr>\n",
                            "    <tr>\n",
                            "      <th>4</th>\n",
                            "      <td>ADJ849000Z</td>\n",
                            "      <td>10</td>\n",
                            "      <td>HONDA</td>\n",
                            "      <td>ACCORD</td>\n",
                            "      <td>2003.0</td>\n",
                            "      <td>1HGCM66313A037175</td>\n",
                            "      <td>00057af4-d848-4cee-b854-707f57581f4e</td>\n",
                            "      <td>1HGCM66313A037175</td>\n",
                            "      <td>HONDA</td>\n",
                            "      <td>ACCORD</td>\n",
                            "      <td>2003.0</td>\n",
                            "      <td>3023.0</td>\n",
                            "    </tr>\n",
                            "    <tr>\n",
                            "      <th>5</th>\n",
                            "      <td>AE5207008Z</td>\n",
                            "      <td>30</td>\n",
                            "      <td>FORD</td>\n",
                            "      <td>4D</td>\n",
                            "      <td>2016.0</td>\n",
                            "      <td>1FADP3K28GL258987</td>\n",
                            "      <td>00089d4a-7038-4693-9e02-b402676631af</td>\n",
                            "      <td>1FADP3K28GL258987</td>\n",
                            "      <td>FORD</td>\n",
                            "      <td>FOCUS</td>\n",
                            "      <td>2016.0</td>\n",
                            "      <td>2932.0</td>\n",
                            "    </tr>\n",
                            "    <tr>\n",
                            "      <th>6</th>\n",
                            "      <td>MDTA1150000D</td>\n",
                            "      <td>30</td>\n",
                            "      <td>GILL</td>\n",
                            "      <td>BUS</td>\n",
                            "      <td>2004.0</td>\n",
                            "      <td>15GGD211X41076128</td>\n",
                            "      <td>000abd04-058d-4e14-9f1c-4a100493a305</td>\n",
                            "      <td>NaN</td>\n",
                            "      <td>NaN</td>\n",
                            "      <td>NaN</td>\n",
                            "      <td>NaN</td>\n",
                            "      <td>NaN</td>\n",
                            "    </tr>\n",
                            "    <tr>\n",
                            "      <th>7</th>\n",
                            "      <td>MSP5973003Q</td>\n",
                            "      <td>35</td>\n",
                            "      <td>CHEVROLET</td>\n",
                            "      <td>HHR</td>\n",
                            "      <td>2008.0</td>\n",
                            "      <td>3GNDA23DX8S558149</td>\n",
                            "      <td>000e3b5d-b4ae-44a1-af02-df6cce381fc1</td>\n",
                            "      <td>NaN</td>\n",
                            "      <td>NaN</td>\n",
                            "      <td>NaN</td>\n",
                            "      <td>NaN</td>\n",
                            "      <td>NaN</td>\n",
                            "    </tr>\n",
                            "    <tr>\n",
                            "      <th>8</th>\n",
                            "      <td>MCP27070015</td>\n",
                            "      <td>25</td>\n",
                            "      <td>ACUR</td>\n",
                            "      <td>TSX</td>\n",
                            "      <td>2008.0</td>\n",
                            "      <td>JH4CL96848C021626</td>\n",
                            "      <td>000f45d8-bc0e-4f9c-820a-474212e669cd</td>\n",
                            "      <td>NaN</td>\n",
                            "      <td>NaN</td>\n",
                            "      <td>NaN</td>\n",
                            "      <td>NaN</td>\n",
                            "      <td>NaN</td>\n",
                            "    </tr>\n",
                            "    <tr>\n",
                            "      <th>9</th>\n",
                            "      <td>ADJ859000Y</td>\n",
                            "      <td>15</td>\n",
                            "      <td>HONDA</td>\n",
                            "      <td>CIVIC</td>\n",
                            "      <td>2009.0</td>\n",
                            "      <td>2HGFA16689H357624</td>\n",
                            "      <td>0010076b-0a45-45f3-8796-f387b39cd85d</td>\n",
                            "      <td>2HGFA16689H357624</td>\n",
                            "      <td>HONDA</td>\n",
                            "      <td>CIVIC</td>\n",
                            "      <td>2009.0</td>\n",
                            "      <td>2678.0</td>\n",
                            "    </tr>\n",
                            "  </tbody>\n",
                            "</table>\n",
                            "</div>"
                        ],
                        "text/plain": [
                            "      REPORT_NO  SPEED_LIMIT   VEH_MAKE VEH_MODEL  VEH_YEAR  \\\n",
                            "0    ADJ487004H           30      CHEVY     TAHOE    2005.0   \n",
                            "1   MCP2487000M           40       INFI        4S    2003.0   \n",
                            "2    CB5190006B           55       TOYT        TK    2011.0   \n",
                            "3    ADJ4590035            5      HONDA     CIVIC    2015.0   \n",
                            "4    ADJ849000Z           10      HONDA    ACCORD    2003.0   \n",
                            "5    AE5207008Z           30       FORD        4D    2016.0   \n",
                            "6  MDTA1150000D           30       GILL       BUS    2004.0   \n",
                            "7   MSP5973003Q           35  CHEVROLET       HHR    2008.0   \n",
                            "8   MCP27070015           25       ACUR       TSX    2008.0   \n",
                            "9    ADJ859000Y           15      HONDA     CIVIC    2009.0   \n",
                            "\n",
                            "              VIN_NO                            VEHICLE_ID                VIN  \\\n",
                            "0   1GNEK13Q2J285593  000238fd-44fa-4cd5-8eb7-41ab30500bec   1GNEK13Q2J285593   \n",
                            "1  JNKCV51E63M013580  00038116-1bf9-48cc-b317-4f4375d14b60  JNKCV51E63M013580   \n",
                            "2  5TFUY5F1XBX167340  0003b659-2785-4868-8877-0b786a284827  5TFUY5F1XBX167340   \n",
                            "3  2HGFG4A59FH702545  00050484-d08f-4b6e-bc7e-9ec270e94660  2HGFG4A59FH702545   \n",
                            "4  1HGCM66313A037175  00057af4-d848-4cee-b854-707f57581f4e  1HGCM66313A037175   \n",
                            "5  1FADP3K28GL258987  00089d4a-7038-4693-9e02-b402676631af  1FADP3K28GL258987   \n",
                            "6  15GGD211X41076128  000abd04-058d-4e14-9f1c-4a100493a305                NaN   \n",
                            "7  3GNDA23DX8S558149  000e3b5d-b4ae-44a1-af02-df6cce381fc1                NaN   \n",
                            "8  JH4CL96848C021626  000f45d8-bc0e-4f9c-820a-474212e669cd                NaN   \n",
                            "9  2HGFA16689H357624  0010076b-0a45-45f3-8796-f387b39cd85d  2HGFA16689H357624   \n",
                            "\n",
                            "        Make    Model  ModelYear  weight  \n",
                            "0  CHEVROLET  GMT-400     1988.0  4300.0  \n",
                            "1   INFINITI      G35     2003.0  3468.0  \n",
                            "2     TOYOTA   TUNDRA     2011.0  5480.0  \n",
                            "3      HONDA    CIVIC     2015.0  2754.0  \n",
                            "4      HONDA   ACCORD     2003.0  3023.0  \n",
                            "5       FORD    FOCUS     2016.0  2932.0  \n",
                            "6        NaN      NaN        NaN     NaN  \n",
                            "7        NaN      NaN        NaN     NaN  \n",
                            "8        NaN      NaN        NaN     NaN  \n",
                            "9      HONDA    CIVIC     2009.0  2678.0  "
                        ]
                    },
                    "execution_count": 57,
                    "metadata": {},
                    "output_type": "execute_result"
                }
            ],
            "source": [
                "# We'll keep the ones without weights so we can drop those accidents from the analysis\n",
                "vehicles_with_weights = vehicles.merge(weights,\n",
                "                                       how='left',\n",
                "                                       left_on='VIN_NO',\n",
                "                                       right_on='VIN')\n",
                "vehicles_with_weights.head(10)"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "Do the `VEH_MAKE` and `Make` columns match up, generally speaking? **Do we trust what we did by searching out the VIN?**"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 65,
            "metadata": {},
            "outputs": [
                {
                    "data": {
                        "text/html": [
                            "<div>\n",
                            "<style scoped>\n",
                            "    .dataframe tbody tr th:only-of-type {\n",
                            "        vertical-align: middle;\n",
                            "    }\n",
                            "\n",
                            "    .dataframe tbody tr th {\n",
                            "        vertical-align: top;\n",
                            "    }\n",
                            "\n",
                            "    .dataframe thead th {\n",
                            "        text-align: right;\n",
                            "    }\n",
                            "</style>\n",
                            "<table border=\"1\" class=\"dataframe\">\n",
                            "  <thead>\n",
                            "    <tr style=\"text-align: right;\">\n",
                            "      <th></th>\n",
                            "      <th>VEH_MAKE</th>\n",
                            "      <th>Make</th>\n",
                            "      <th>VEH_MODEL</th>\n",
                            "      <th>Model</th>\n",
                            "      <th>VEH_YEAR</th>\n",
                            "      <th>ModelYear</th>\n",
                            "      <th>weight</th>\n",
                            "    </tr>\n",
                            "  </thead>\n",
                            "  <tbody>\n",
                            "    <tr>\n",
                            "      <th>0</th>\n",
                            "      <td>CHEVY</td>\n",
                            "      <td>CHEVROLET</td>\n",
                            "      <td>TAHOE</td>\n",
                            "      <td>GMT-400</td>\n",
                            "      <td>2005.0</td>\n",
                            "      <td>1988.0</td>\n",
                            "      <td>4300.0</td>\n",
                            "    </tr>\n",
                            "    <tr>\n",
                            "      <th>1</th>\n",
                            "      <td>INFI</td>\n",
                            "      <td>INFINITI</td>\n",
                            "      <td>4S</td>\n",
                            "      <td>G35</td>\n",
                            "      <td>2003.0</td>\n",
                            "      <td>2003.0</td>\n",
                            "      <td>3468.0</td>\n",
                            "    </tr>\n",
                            "    <tr>\n",
                            "      <th>2</th>\n",
                            "      <td>TOYT</td>\n",
                            "      <td>TOYOTA</td>\n",
                            "      <td>TK</td>\n",
                            "      <td>TUNDRA</td>\n",
                            "      <td>2011.0</td>\n",
                            "      <td>2011.0</td>\n",
                            "      <td>5480.0</td>\n",
                            "    </tr>\n",
                            "    <tr>\n",
                            "      <th>3</th>\n",
                            "      <td>HONDA</td>\n",
                            "      <td>HONDA</td>\n",
                            "      <td>CIVIC</td>\n",
                            "      <td>CIVIC</td>\n",
                            "      <td>2015.0</td>\n",
                            "      <td>2015.0</td>\n",
                            "      <td>2754.0</td>\n",
                            "    </tr>\n",
                            "    <tr>\n",
                            "      <th>4</th>\n",
                            "      <td>HONDA</td>\n",
                            "      <td>HONDA</td>\n",
                            "      <td>ACCORD</td>\n",
                            "      <td>ACCORD</td>\n",
                            "      <td>2003.0</td>\n",
                            "      <td>2003.0</td>\n",
                            "      <td>3023.0</td>\n",
                            "    </tr>\n",
                            "    <tr>\n",
                            "      <th>5</th>\n",
                            "      <td>FORD</td>\n",
                            "      <td>FORD</td>\n",
                            "      <td>4D</td>\n",
                            "      <td>FOCUS</td>\n",
                            "      <td>2016.0</td>\n",
                            "      <td>2016.0</td>\n",
                            "      <td>2932.0</td>\n",
                            "    </tr>\n",
                            "    <tr>\n",
                            "      <th>6</th>\n",
                            "      <td>GILL</td>\n",
                            "      <td>NaN</td>\n",
                            "      <td>BUS</td>\n",
                            "      <td>NaN</td>\n",
                            "      <td>2004.0</td>\n",
                            "      <td>NaN</td>\n",
                            "      <td>NaN</td>\n",
                            "    </tr>\n",
                            "    <tr>\n",
                            "      <th>7</th>\n",
                            "      <td>CHEVROLET</td>\n",
                            "      <td>NaN</td>\n",
                            "      <td>HHR</td>\n",
                            "      <td>NaN</td>\n",
                            "      <td>2008.0</td>\n",
                            "      <td>NaN</td>\n",
                            "      <td>NaN</td>\n",
                            "    </tr>\n",
                            "    <tr>\n",
                            "      <th>8</th>\n",
                            "      <td>ACUR</td>\n",
                            "      <td>NaN</td>\n",
                            "      <td>TSX</td>\n",
                            "      <td>NaN</td>\n",
                            "      <td>2008.0</td>\n",
                            "      <td>NaN</td>\n",
                            "      <td>NaN</td>\n",
                            "    </tr>\n",
                            "    <tr>\n",
                            "      <th>9</th>\n",
                            "      <td>HONDA</td>\n",
                            "      <td>HONDA</td>\n",
                            "      <td>CIVIC</td>\n",
                            "      <td>CIVIC</td>\n",
                            "      <td>2009.0</td>\n",
                            "      <td>2009.0</td>\n",
                            "      <td>2678.0</td>\n",
                            "    </tr>\n",
                            "  </tbody>\n",
                            "</table>\n",
                            "</div>"
                        ],
                        "text/plain": [
                            "    VEH_MAKE       Make VEH_MODEL    Model  VEH_YEAR  ModelYear  weight\n",
                            "0      CHEVY  CHEVROLET     TAHOE  GMT-400    2005.0     1988.0  4300.0\n",
                            "1       INFI   INFINITI        4S      G35    2003.0     2003.0  3468.0\n",
                            "2       TOYT     TOYOTA        TK   TUNDRA    2011.0     2011.0  5480.0\n",
                            "3      HONDA      HONDA     CIVIC    CIVIC    2015.0     2015.0  2754.0\n",
                            "4      HONDA      HONDA    ACCORD   ACCORD    2003.0     2003.0  3023.0\n",
                            "5       FORD       FORD        4D    FOCUS    2016.0     2016.0  2932.0\n",
                            "6       GILL        NaN       BUS      NaN    2004.0        NaN     NaN\n",
                            "7  CHEVROLET        NaN       HHR      NaN    2008.0        NaN     NaN\n",
                            "8       ACUR        NaN       TSX      NaN    2008.0        NaN     NaN\n",
                            "9      HONDA      HONDA     CIVIC    CIVIC    2009.0     2009.0  2678.0"
                        ]
                    },
                    "execution_count": 65,
                    "metadata": {},
                    "output_type": "execute_result"
                }
            ],
            "source": [
                "vehicles_with_weights[['VEH_MAKE', 'Make', 'VEH_MODEL', 'Model', 'VEH_YEAR', 'ModelYear', 'weight']].head(10)"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "It looks some of these have weights, while others don't. How many do we have weights for?"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 68,
            "metadata": {},
            "outputs": [
                {
                    "data": {
                        "text/plain": [
                            "True     534436\n",
                            "False    207697\n",
                            "Name: weight, dtype: int64"
                        ]
                    },
                    "execution_count": 68,
                    "metadata": {},
                    "output_type": "execute_result"
                }
            ],
            "source": [
                "# isna vs notna\n",
                "vehicles_with_weights.weight.notna().value_counts()"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "## Some of our vehicles don't have weights or years!\n",
                "\n",
                "Let's get the report numbers of vehicles without weights and/or years, then exclude those reports (crashes) from our analysis."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 72,
            "metadata": {},
            "outputs": [
                {
                    "data": {
                        "text/html": [
                            "<div>\n",
                            "<style scoped>\n",
                            "    .dataframe tbody tr th:only-of-type {\n",
                            "        vertical-align: middle;\n",
                            "    }\n",
                            "\n",
                            "    .dataframe tbody tr th {\n",
                            "        vertical-align: top;\n",
                            "    }\n",
                            "\n",
                            "    .dataframe thead th {\n",
                            "        text-align: right;\n",
                            "    }\n",
                            "</style>\n",
                            "<table border=\"1\" class=\"dataframe\">\n",
                            "  <thead>\n",
                            "    <tr style=\"text-align: right;\">\n",
                            "      <th></th>\n",
                            "      <th>REPORT_NO</th>\n",
                            "      <th>SPEED_LIMIT</th>\n",
                            "      <th>VEH_MAKE</th>\n",
                            "      <th>VEH_MODEL</th>\n",
                            "      <th>VEH_YEAR</th>\n",
                            "      <th>VIN_NO</th>\n",
                            "      <th>VEHICLE_ID</th>\n",
                            "      <th>VIN</th>\n",
                            "      <th>Make</th>\n",
                            "      <th>Model</th>\n",
                            "      <th>ModelYear</th>\n",
                            "      <th>weight</th>\n",
                            "    </tr>\n",
                            "  </thead>\n",
                            "  <tbody>\n",
                            "    <tr>\n",
                            "      <th>6</th>\n",
                            "      <td>MDTA1150000D</td>\n",
                            "      <td>30</td>\n",
                            "      <td>GILL</td>\n",
                            "      <td>BUS</td>\n",
                            "      <td>2004.0</td>\n",
                            "      <td>15GGD211X41076128</td>\n",
                            "      <td>000abd04-058d-4e14-9f1c-4a100493a305</td>\n",
                            "      <td>NaN</td>\n",
                            "      <td>NaN</td>\n",
                            "      <td>NaN</td>\n",
                            "      <td>NaN</td>\n",
                            "      <td>NaN</td>\n",
                            "    </tr>\n",
                            "    <tr>\n",
                            "      <th>7</th>\n",
                            "      <td>MSP5973003Q</td>\n",
                            "      <td>35</td>\n",
                            "      <td>CHEVROLET</td>\n",
                            "      <td>HHR</td>\n",
                            "      <td>2008.0</td>\n",
                            "      <td>3GNDA23DX8S558149</td>\n",
                            "      <td>000e3b5d-b4ae-44a1-af02-df6cce381fc1</td>\n",
                            "      <td>NaN</td>\n",
                            "      <td>NaN</td>\n",
                            "      <td>NaN</td>\n",
                            "      <td>NaN</td>\n",
                            "      <td>NaN</td>\n",
                            "    </tr>\n",
                            "    <tr>\n",
                            "      <th>8</th>\n",
                            "      <td>MCP27070015</td>\n",
                            "      <td>25</td>\n",
                            "      <td>ACUR</td>\n",
                            "      <td>TSX</td>\n",
                            "      <td>2008.0</td>\n",
                            "      <td>JH4CL96848C021626</td>\n",
                            "      <td>000f45d8-bc0e-4f9c-820a-474212e669cd</td>\n",
                            "      <td>NaN</td>\n",
                            "      <td>NaN</td>\n",
                            "      <td>NaN</td>\n",
                            "      <td>NaN</td>\n",
                            "      <td>NaN</td>\n",
                            "    </tr>\n",
                            "    <tr>\n",
                            "      <th>14</th>\n",
                            "      <td>DA3276000V</td>\n",
                            "      <td>45</td>\n",
                            "      <td>TOYT</td>\n",
                            "      <td>CAMRY</td>\n",
                            "      <td>2018.0</td>\n",
                            "      <td>JTNB11HK2J3004953</td>\n",
                            "      <td>0018b4d3-b19b-4327-b5f9-f0725fd06844</td>\n",
                            "      <td>NaN</td>\n",
                            "      <td>NaN</td>\n",
                            "      <td>NaN</td>\n",
                            "      <td>NaN</td>\n",
                            "      <td>NaN</td>\n",
                            "    </tr>\n",
                            "    <tr>\n",
                            "      <th>15</th>\n",
                            "      <td>CB5712001X</td>\n",
                            "      <td>50</td>\n",
                            "      <td>JEEP</td>\n",
                            "      <td>COMPASS</td>\n",
                            "      <td>2015.0</td>\n",
                            "      <td>W452866115351</td>\n",
                            "      <td>001968e1-88be-424b-b7cf-bd816ffe340a</td>\n",
                            "      <td>NaN</td>\n",
                            "      <td>NaN</td>\n",
                            "      <td>NaN</td>\n",
                            "      <td>NaN</td>\n",
                            "      <td>NaN</td>\n",
                            "    </tr>\n",
                            "  </tbody>\n",
                            "</table>\n",
                            "</div>"
                        ],
                        "text/plain": [
                            "       REPORT_NO  SPEED_LIMIT   VEH_MAKE VEH_MODEL  VEH_YEAR  \\\n",
                            "6   MDTA1150000D           30       GILL       BUS    2004.0   \n",
                            "7    MSP5973003Q           35  CHEVROLET       HHR    2008.0   \n",
                            "8    MCP27070015           25       ACUR       TSX    2008.0   \n",
                            "14    DA3276000V           45       TOYT     CAMRY    2018.0   \n",
                            "15    CB5712001X           50       JEEP   COMPASS    2015.0   \n",
                            "\n",
                            "               VIN_NO                            VEHICLE_ID  VIN Make Model  \\\n",
                            "6   15GGD211X41076128  000abd04-058d-4e14-9f1c-4a100493a305  NaN  NaN   NaN   \n",
                            "7   3GNDA23DX8S558149  000e3b5d-b4ae-44a1-af02-df6cce381fc1  NaN  NaN   NaN   \n",
                            "8   JH4CL96848C021626  000f45d8-bc0e-4f9c-820a-474212e669cd  NaN  NaN   NaN   \n",
                            "14  JTNB11HK2J3004953  0018b4d3-b19b-4327-b5f9-f0725fd06844  NaN  NaN   NaN   \n",
                            "15      W452866115351  001968e1-88be-424b-b7cf-bd816ffe340a  NaN  NaN   NaN   \n",
                            "\n",
                            "    ModelYear  weight  \n",
                            "6         NaN     NaN  \n",
                            "7         NaN     NaN  \n",
                            "8         NaN     NaN  \n",
                            "14        NaN     NaN  \n",
                            "15        NaN     NaN  "
                        ]
                    },
                    "execution_count": 72,
                    "metadata": {},
                    "output_type": "execute_result"
                }
            ],
            "source": [
                "# Find the ones missing weights\n",
                "missing_weight = vehicles_with_weights[vehicles_with_weights.weight.isna()]\n",
                "missing_weight.head()"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 74,
            "metadata": {},
            "outputs": [
                {
                    "data": {
                        "text/plain": [
                            "6     MDTA1150000D\n",
                            "7      MSP5973003Q\n",
                            "8      MCP27070015\n",
                            "14      DA3276000V\n",
                            "15      CB5712001X\n",
                            "Name: REPORT_NO, dtype: object"
                        ]
                    },
                    "execution_count": 74,
                    "metadata": {},
                    "output_type": "execute_result"
                }
            ],
            "source": [
                "# Get their report numbers\n",
                "missing_weight.REPORT_NO.head()"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {},
            "outputs": [],
            "source": []
        },
        {
            "cell_type": "code",
            "execution_count": 75,
            "metadata": {},
            "outputs": [
                {
                    "name": "stdout",
                    "output_type": "stream",
                    "text": [
                        "Before (396419, 1)\n",
                        "After (218513, 1)\n"
                    ]
                }
            ],
            "source": [
                "# Filter those out of our 'crashes' dataset\n",
                "# We need to outline this real bigtime beforehand, \"what can we keep?\"\n",
                "# because if we're only looking at our own car, then we don't need to drop these\n",
                "print(\"Before\", crashes.shape)\n",
                "crashes = crashes[~crashes.REPORT_NO.isin(missing_weight.REPORT_NO)]\n",
                "print(\"After\", crashes.shape)"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "If we're getting rid of half of our data, is it still okay?"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "# We only want two-vehicle crashes\n",
                "\n",
                "> For analytic purposes, we decompose the data set into three sub-sa three-vehicle crashes, and single-vehicle crashes. The two-vehicle crash data set is the focus of most of our analyses\n",
                "\n",
                "Each vehicle has a `REPORT_NO`. If two vehicles show up with the same `REPORT_NO`, we know there were two vehicles in the crash."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 76,
            "metadata": {},
            "outputs": [
                {
                    "data": {
                        "text/html": [
                            "<div>\n",
                            "<style scoped>\n",
                            "    .dataframe tbody tr th:only-of-type {\n",
                            "        vertical-align: middle;\n",
                            "    }\n",
                            "\n",
                            "    .dataframe tbody tr th {\n",
                            "        vertical-align: top;\n",
                            "    }\n",
                            "\n",
                            "    .dataframe thead th {\n",
                            "        text-align: right;\n",
                            "    }\n",
                            "</style>\n",
                            "<table border=\"1\" class=\"dataframe\">\n",
                            "  <thead>\n",
                            "    <tr style=\"text-align: right;\">\n",
                            "      <th></th>\n",
                            "      <th>REPORT_NO</th>\n",
                            "      <th>SPEED_LIMIT</th>\n",
                            "      <th>VEH_MAKE</th>\n",
                            "      <th>VEH_MODEL</th>\n",
                            "      <th>VEH_YEAR</th>\n",
                            "      <th>VIN_NO</th>\n",
                            "      <th>VEHICLE_ID</th>\n",
                            "      <th>VIN</th>\n",
                            "      <th>Make</th>\n",
                            "      <th>Model</th>\n",
                            "      <th>ModelYear</th>\n",
                            "      <th>weight</th>\n",
                            "    </tr>\n",
                            "  </thead>\n",
                            "  <tbody>\n",
                            "    <tr>\n",
                            "      <th>0</th>\n",
                            "      <td>ADJ487004H</td>\n",
                            "      <td>30</td>\n",
                            "      <td>CHEVY</td>\n",
                            "      <td>TAHOE</td>\n",
                            "      <td>2005.0</td>\n",
                            "      <td>1GNEK13Q2J285593</td>\n",
                            "      <td>000238fd-44fa-4cd5-8eb7-41ab30500bec</td>\n",
                            "      <td>1GNEK13Q2J285593</td>\n",
                            "      <td>CHEVROLET</td>\n",
                            "      <td>GMT-400</td>\n",
                            "      <td>1988.0</td>\n",
                            "      <td>4300.0</td>\n",
                            "    </tr>\n",
                            "    <tr>\n",
                            "      <th>1</th>\n",
                            "      <td>MCP2487000M</td>\n",
                            "      <td>40</td>\n",
                            "      <td>INFI</td>\n",
                            "      <td>4S</td>\n",
                            "      <td>2003.0</td>\n",
                            "      <td>JNKCV51E63M013580</td>\n",
                            "      <td>00038116-1bf9-48cc-b317-4f4375d14b60</td>\n",
                            "      <td>JNKCV51E63M013580</td>\n",
                            "      <td>INFINITI</td>\n",
                            "      <td>G35</td>\n",
                            "      <td>2003.0</td>\n",
                            "      <td>3468.0</td>\n",
                            "    </tr>\n",
                            "    <tr>\n",
                            "      <th>2</th>\n",
                            "      <td>CB5190006B</td>\n",
                            "      <td>55</td>\n",
                            "      <td>TOYT</td>\n",
                            "      <td>TK</td>\n",
                            "      <td>2011.0</td>\n",
                            "      <td>5TFUY5F1XBX167340</td>\n",
                            "      <td>0003b659-2785-4868-8877-0b786a284827</td>\n",
                            "      <td>5TFUY5F1XBX167340</td>\n",
                            "      <td>TOYOTA</td>\n",
                            "      <td>TUNDRA</td>\n",
                            "      <td>2011.0</td>\n",
                            "      <td>5480.0</td>\n",
                            "    </tr>\n",
                            "    <tr>\n",
                            "      <th>3</th>\n",
                            "      <td>ADJ4590035</td>\n",
                            "      <td>5</td>\n",
                            "      <td>HONDA</td>\n",
                            "      <td>CIVIC</td>\n",
                            "      <td>2015.0</td>\n",
                            "      <td>2HGFG4A59FH702545</td>\n",
                            "      <td>00050484-d08f-4b6e-bc7e-9ec270e94660</td>\n",
                            "      <td>2HGFG4A59FH702545</td>\n",
                            "      <td>HONDA</td>\n",
                            "      <td>CIVIC</td>\n",
                            "      <td>2015.0</td>\n",
                            "      <td>2754.0</td>\n",
                            "    </tr>\n",
                            "    <tr>\n",
                            "      <th>4</th>\n",
                            "      <td>ADJ849000Z</td>\n",
                            "      <td>10</td>\n",
                            "      <td>HONDA</td>\n",
                            "      <td>ACCORD</td>\n",
                            "      <td>2003.0</td>\n",
                            "      <td>1HGCM66313A037175</td>\n",
                            "      <td>00057af4-d848-4cee-b854-707f57581f4e</td>\n",
                            "      <td>1HGCM66313A037175</td>\n",
                            "      <td>HONDA</td>\n",
                            "      <td>ACCORD</td>\n",
                            "      <td>2003.0</td>\n",
                            "      <td>3023.0</td>\n",
                            "    </tr>\n",
                            "  </tbody>\n",
                            "</table>\n",
                            "</div>"
                        ],
                        "text/plain": [
                            "     REPORT_NO  SPEED_LIMIT VEH_MAKE VEH_MODEL  VEH_YEAR             VIN_NO  \\\n",
                            "0   ADJ487004H           30    CHEVY     TAHOE    2005.0   1GNEK13Q2J285593   \n",
                            "1  MCP2487000M           40     INFI        4S    2003.0  JNKCV51E63M013580   \n",
                            "2   CB5190006B           55     TOYT        TK    2011.0  5TFUY5F1XBX167340   \n",
                            "3   ADJ4590035            5    HONDA     CIVIC    2015.0  2HGFG4A59FH702545   \n",
                            "4   ADJ849000Z           10    HONDA    ACCORD    2003.0  1HGCM66313A037175   \n",
                            "\n",
                            "                             VEHICLE_ID                VIN       Make  \\\n",
                            "0  000238fd-44fa-4cd5-8eb7-41ab30500bec   1GNEK13Q2J285593  CHEVROLET   \n",
                            "1  00038116-1bf9-48cc-b317-4f4375d14b60  JNKCV51E63M013580   INFINITI   \n",
                            "2  0003b659-2785-4868-8877-0b786a284827  5TFUY5F1XBX167340     TOYOTA   \n",
                            "3  00050484-d08f-4b6e-bc7e-9ec270e94660  2HGFG4A59FH702545      HONDA   \n",
                            "4  00057af4-d848-4cee-b854-707f57581f4e  1HGCM66313A037175      HONDA   \n",
                            "\n",
                            "     Model  ModelYear  weight  \n",
                            "0  GMT-400     1988.0  4300.0  \n",
                            "1      G35     2003.0  3468.0  \n",
                            "2   TUNDRA     2011.0  5480.0  \n",
                            "3    CIVIC     2015.0  2754.0  \n",
                            "4   ACCORD     2003.0  3023.0  "
                        ]
                    },
                    "execution_count": 76,
                    "metadata": {},
                    "output_type": "execute_result"
                }
            ],
            "source": [
                "vehicles_with_weights.head()"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 103,
            "metadata": {},
            "outputs": [
                {
                    "data": {
                        "text/plain": [
                            "MDTA1229000H    69\n",
                            "CB53480003      35\n",
                            "MSP67460095     18\n",
                            "CE4636002C      17\n",
                            "MSP6063005N     17\n",
                            "Name: REPORT_NO, dtype: int64"
                        ]
                    },
                    "execution_count": 103,
                    "metadata": {},
                    "output_type": "execute_result"
                }
            ],
            "source": [
                "# Which technique?\n",
                "\n",
                "# counted = vehicles_with_weights.groupby('REPORT_NO').size().reset_index(name='car_count')\n",
                "# counted.head()\n",
                "\n",
                "counted = vehicles_with_weights.REPORT_NO.value_counts()\n",
                "counted.head()"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "There was a 69-car crash? That's pretty interesting. Better look that one up. To get the ones that have two cars, we just ask if `== 2` - since it's a series, we don't have to ask for the column name."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 105,
            "metadata": {},
            "outputs": [
                {
                    "data": {
                        "text/plain": [
                            "Index(['MSP528100BT', 'MSP5179001M', 'AE5869002N', 'ADH2680025', 'MSP2312000H',\n",
                            "       'HS98300005', 'AC12830021', 'DA3892000P', 'AC2040003L', 'AC2127000B',\n",
                            "       ...\n",
                            "       'ADI179000X', 'DA3889002W', 'ADI946000L', 'ADJ380001Y', 'AC2034002L',\n",
                            "       'MSP61740086', 'AK00410003', 'MCP1366006C', 'ADJ186004B', 'ZH0339000W'],\n",
                            "      dtype='object', length=253011)"
                        ]
                    },
                    "execution_count": 105,
                    "metadata": {},
                    "output_type": "execute_result"
                }
            ],
            "source": [
                "#two_car_report_nos = counted[counted.car_count == 2].REPORT_NO\n",
                "two_car_report_nos = counted[counted == 2].index\n",
                "two_car_report_nos"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "# Filter for only light vehicles\n",
                "\n",
                "> this data set contains all collisions involving two light vehicles built after 1980. We define a light vehicle as any car, pickup truck, SUV, or minivan that weights between 1500 and 6000 pounds."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 106,
            "metadata": {},
            "outputs": [
                {
                    "data": {
                        "text/plain": [
                            "(217074, 1)"
                        ]
                    },
                    "execution_count": 106,
                    "metadata": {},
                    "output_type": "execute_result"
                }
            ],
            "source": [
                "is_not_light_vehicle = (vehicles_with_weights.weight <= 1500) | (vehicles_with_weights.weight > 6000)\n",
                "is_before_1980 = vehicles_with_weights.ModelYear.astype(float) < 1980\n",
                "not_light_vehicle_crash = vehicles_with_weights[is_not_light_vehicle | is_before_1980].REPORT_NO\n",
                "crashes = crashes[~crashes.REPORT_NO.isin(not_light_vehicle_crash)]\n",
                "crashes.shape"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {},
            "outputs": [],
            "source": []
        }
    ],
    "metadata": {
        "kernelspec": {
            "display_name": "Python 3",
            "language": "python",
            "name": "python3"
        },
        "language_info": {
            "codemirror_mode": {
                "name": "ipython",
                "version": 3
            },
            "file_extension": ".py",
            "mimetype": "text/x-python",
            "name": "python",
            "nbconvert_exporter": "python",
            "pygments_lexer": "ipython3",
            "version": "3.6.8"
        },
        "toc": {
            "base_numbering": 1,
            "nav_menu": {},
            "number_sections": true,
            "sideBar": true,
            "skip_h1_title": false,
            "title_cell": "Table of Contents",
            "title_sidebar": "Contents",
            "toc_cell": false,
            "toc_position": {},
            "toc_section_display": true,
            "toc_window_display": false
        }
    },
    "nbformat": 4,
    "nbformat_minor": 2
}