{
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "# Finding faulty airbags in a sea of consumer complaints with a decision tree\n",
                "\n",
                "**The story:**\n",
                "- https://www.nytimes.com/2014/09/12/business/air-bag-flaw-long-known-led-to-recalls.html\n",
                "- https://www.nytimes.com/2014/11/07/business/airbag-maker-takata-is-said-to-have-conducted-secret-tests.html\n",
                "- https://www.nytimes.com/interactive/2015/06/22/business/international/takata-airbag-recall-list.html\n",
                "- https://www.nytimes.com/2016/08/27/business/takata-airbag-recall-crisis.html\n",
                "\n",
                "This story by The New York Times investigates complaints made to the National Highway Traffic Safety Administration (NHTSA) by customers who had bad experiences with Takata airbags in their cars. Eventually, car companies had to recall airbags made by Takata, the airbag supplier that had promised a cheaper alternative.\n",
                "\n",
                "**Author:** Daeil Kim did a more complex version of this particular analysis - [presentation here](https://www.slideshare.net/mortardata/daeil-kim-at-the-nyc-data-science-meetup)\n",
                "\n",
                "**Topics:** Decision Trees, Random Forests\n",
                "\n",
                "**Datasets**\n",
                "\n",
                "* **sampled-labeled.csv:** a sample of vehicle complaints, each labeled as suspicious or not\n",
                "\n",
                "## What's the goal?\n",
                "\n",
                "It was too much work to read twenty years of vehicle complaints to find the ones related to dangerous airbags! Because we're lazy, we wanted the computer to do this for us. We did this before with a classifier that used logistic regression; now we're going to try a different one, a decision tree."
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "<p class=\"reading-options\">\n  <a class=\"btn\" href=\"/nyt-takata-airbags/airbag-classifier-search-decision-tree\">\n    <i class=\"fa fa-sm fa-book\"></i>\n    Read online\n  </a>\n  <a class=\"btn\" href=\"/nyt-takata-airbags/notebooks/Airbag classifier search (Decision Tree).ipynb\">\n    <i class=\"fa fa-sm fa-download\"></i>\n    Download notebook\n  </a>\n  <a class=\"btn\" href=\"https://colab.research.google.com/github/littlecolumns/ds4j-notebooks/blob/master/nyt-takata-airbags/notebooks/Airbag classifier search (Decision Tree).ipynb\" target=\"_new\">\n    <i class=\"fa fa-sm fa-laptop\"></i>\n    Interactive version\n  </a>\n</p>"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "### Prep work: Downloading necessary files\n",
                "Before we get started, we need to download all of the data we'll be using.\n",
                "* **sampled-labeled.csv:** a sample of vehicle complaints, each labeled as suspicious or not\n"
            ]
        },
        {
            "cell_type": "code",
            "metadata": {},
            "source": [
                "# Make data directory if it doesn't exist\n",
                "!mkdir -p data\n",
                "!wget -nc https://nyc3.digitaloceanspaces.com/ml-files-distro/v1/nyt-takata-airbags/data/sampled-labeled.csv -P data"
            ],
            "outputs": [],
            "execution_count": null
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "# Our code\n",
                "\n",
                "## Setup"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 1,
            "metadata": {},
            "outputs": [],
            "source": [
                "import pandas as pd\n",
                "\n",
                "# Allow us to display 100 columns at a time, and 100 characters in each column (instead of ...)\n",
                "pd.set_option(\"display.max_columns\", 100)\n",
                "pd.set_option(\"display.max_colwidth\", 100)"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "## Read in our labeled data\n",
                "\n",
                "We aren't going to use the unlabeled dataset this time; we're only going to look at **how our classifier works.** We'll start by reading in the complaints that have labels attached to them.\n",
                "\n",
                "**Read in `sampled-labeled.csv` and check how many suspicious/not suspicious complaints we have.**"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 2,
            "metadata": {},
            "outputs": [
                {
                    "data": {
                        "text/html": [
                            "<div>\n",
                            "<style scoped>\n",
                            "    .dataframe tbody tr th:only-of-type {\n",
                            "        vertical-align: middle;\n",
                            "    }\n",
                            "\n",
                            "    .dataframe tbody tr th {\n",
                            "        vertical-align: top;\n",
                            "    }\n",
                            "\n",
                            "    .dataframe thead th {\n",
                            "        text-align: right;\n",
                            "    }\n",
                            "</style>\n",
                            "<table border=\"1\" class=\"dataframe\">\n",
                            "  <thead>\n",
                            "    <tr style=\"text-align: right;\">\n",
                            "      <th></th>\n",
                            "      <th>is_suspicious</th>\n",
                            "      <th>CDESCR</th>\n",
                            "    </tr>\n",
                            "  </thead>\n",
                            "  <tbody>\n",
                            "    <tr>\n",
                            "      <th>0</th>\n",
                            "      <td>0.0</td>\n",
                            "      <td>ALTHOUGH I LOVED THE CAR OVERALL AT THE TIME I DECIDED TO OWN, , MY DREAM CAR CADILLAC CTS HAS T...</td>\n",
                            "    </tr>\n",
                            "    <tr>\n",
                            "      <th>1</th>\n",
                            "      <td>0.0</td>\n",
                            "      <td>CONSUMER SHUT SLIDING DOOR WHEN ALL POWER LOCKS ON ALL DOORS LOCKED BY ITSELF, TRAPPING INFANT I...</td>\n",
                            "    </tr>\n",
                            "    <tr>\n",
                            "      <th>2</th>\n",
                            "      <td>0.0</td>\n",
                            "      <td>DRIVERS SEAT BACK COLLAPSED AND BENT WHEN REAR ENDED. PLEASE DESCRIBE DETAILS.  TT</td>\n",
                            "    </tr>\n",
                            "    <tr>\n",
                            "      <th>3</th>\n",
                            "      <td>0.0</td>\n",
                            "      <td>TL* THE CONTACT OWNS A 2009 NISSAN ALTIMA. THE CONTACT STATED THAT THE START BUTTON FOR THE IGNI...</td>\n",
                            "    </tr>\n",
                            "    <tr>\n",
                            "      <th>4</th>\n",
                            "      <td>0.0</td>\n",
                            "      <td>THE FRONT MIDDLE SEAT DOESN'T LOCK IN PLACE.  *AK</td>\n",
                            "    </tr>\n",
                            "  </tbody>\n",
                            "</table>\n",
                            "</div>"
                        ],
                        "text/plain": [
                            "   is_suspicious  \\\n",
                            "0            0.0   \n",
                            "1            0.0   \n",
                            "2            0.0   \n",
                            "3            0.0   \n",
                            "4            0.0   \n",
                            "\n",
                            "                                                                                                CDESCR  \n",
                            "0  ALTHOUGH I LOVED THE CAR OVERALL AT THE TIME I DECIDED TO OWN, , MY DREAM CAR CADILLAC CTS HAS T...  \n",
                            "1  CONSUMER SHUT SLIDING DOOR WHEN ALL POWER LOCKS ON ALL DOORS LOCKED BY ITSELF, TRAPPING INFANT I...  \n",
                            "2                   DRIVERS SEAT BACK COLLAPSED AND BENT WHEN REAR ENDED. PLEASE DESCRIBE DETAILS.  TT  \n",
                            "3  TL* THE CONTACT OWNS A 2009 NISSAN ALTIMA. THE CONTACT STATED THAT THE START BUTTON FOR THE IGNI...  \n",
                            "4                                                    THE FRONT MIDDLE SEAT DOESN'T LOCK IN PLACE.  *AK  "
                        ]
                    },
                    "execution_count": 2,
                    "metadata": {},
                    "output_type": "execute_result"
                }
            ],
            "source": [
                "labeled = pd.read_csv(\"data/sampled-labeled.csv\")\n",
                "labeled.head()"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 3,
            "metadata": {},
            "outputs": [
                {
                    "data": {
                        "text/plain": [
                            "0.0    150\n",
                            "1.0     15\n",
                            "Name: is_suspicious, dtype: int64"
                        ]
                    },
                    "execution_count": 3,
                    "metadata": {},
                    "output_type": "execute_result"
                }
            ],
            "source": [
                "labeled.is_suspicious.value_counts()"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "150 non-suspicious and 15 suspicious is a pretty terrible ratio, but we're remarkably lazy, and not very many of the comments are actually suspicious.\n",
                "\n",
                "Now that we've read a few, let's build our classifier.\n",
                "\n",
                "## Creating features\n",
                "\n",
                "### Selecting our features and building a features dataframe\n",
                "\n",
                "Last time, we thought of some words or phrases that might make a complaint interesting or not interesting. We came up with this list:\n",
                "\n",
                "* airbag\n",
                "* air bag\n",
                "* failed\n",
                "* did not deploy\n",
                "* violent\n",
                "* explode\n",
                "* shrapnel\n",
                "\n",
                "These **features** are the things that the machine learning algorithm is going to look for when it's reading. There are lots of words in each complaint, but these are the only ones we'll tell the classifier to pay attention to!\n",
                "\n",
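                "To determine if a word is in `CDESCR`, we can use `.str.contains`. Since the complaints are written in capital letters, we search for the uppercase version of each word, and we pass `na=False` so a missing description counts as not containing it. Because computers only like numbers, though, we need to use `.astype(int)` to change the result from `True`/`False` to `1`/`0`."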
            ]
        },
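        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "To see what `.str.contains` and `.astype(int)` do on their own, here's a small sketch on a few made-up complaints (hypothetical text, not rows from `sampled-labeled.csv`). It also shows two things worth knowing: the search looks for the word anywhere in the text, so `EXPLODE` also matches `EXPLODED`, and `AIRBAG` does **not** match `AIR BAG`, which is why both spellings are separate features.\n",
                "\n",
                "```python\n",
                "import pandas as pd\n",
                "\n",
                "# A few made-up complaints (hypothetical, not from the dataset)\n",
                "complaints = pd.Series([\n",
                "    'THE AIRBAG DID NOT DEPLOY IN A CRASH.',\n",
                "    'AIR BAG EXPLODED AND SENT SHRAPNEL INTO THE CABIN.',\n",
                "    'THE BRAKES FAILED ON THE HIGHWAY.',\n",
                "    None,  # a complaint with no description\n",
                "])\n",
                "\n",
                "# True/False for each complaint; na=False turns the missing one into False\n",
                "has_airbag = complaints.str.contains('AIRBAG', na=False)\n",
                "has_explode = complaints.str.contains('EXPLODE', na=False)\n",
                "\n",
                "# astype(int) converts True/False into 1/0\n",
                "print(has_airbag.astype(int).tolist())   # [1, 0, 0, 0]\n",
                "print(has_explode.astype(int).tolist())  # [0, 1, 0, 0]\n",
                "```"
            ]
        },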
        {
            "cell_type": "code",
            "execution_count": 4,
            "metadata": {},
            "outputs": [
                {
                    "data": {
                        "text/html": [
                            "<div>\n",
                            "<style scoped>\n",
                            "    .dataframe tbody tr th:only-of-type {\n",
                            "        vertical-align: middle;\n",
                            "    }\n",
                            "\n",
                            "    .dataframe tbody tr th {\n",
                            "        vertical-align: top;\n",
                            "    }\n",
                            "\n",
                            "    .dataframe thead th {\n",
                            "        text-align: right;\n",
                            "    }\n",
                            "</style>\n",
                            "<table border=\"1\" class=\"dataframe\">\n",
                            "  <thead>\n",
                            "    <tr style=\"text-align: right;\">\n",
                            "      <th></th>\n",
                            "      <th>is_suspicious</th>\n",
                            "      <th>airbag</th>\n",
                            "      <th>air bag</th>\n",
                            "      <th>failed</th>\n",
                            "      <th>did not deploy</th>\n",
                            "      <th>violent</th>\n",
                            "      <th>explode</th>\n",
                            "      <th>shrapnel</th>\n",
                            "    </tr>\n",
                            "  </thead>\n",
                            "  <tbody>\n",
                            "    <tr>\n",
                            "      <th>0</th>\n",
                            "      <td>0.0</td>\n",
                            "      <td>0</td>\n",
                            "      <td>0</td>\n",
                            "      <td>0</td>\n",
                            "      <td>0</td>\n",
                            "      <td>0</td>\n",
                            "      <td>0</td>\n",
                            "      <td>0</td>\n",
                            "    </tr>\n",
                            "    <tr>\n",
                            "      <th>1</th>\n",
                            "      <td>0.0</td>\n",
                            "      <td>0</td>\n",
                            "      <td>0</td>\n",
                            "      <td>0</td>\n",
                            "      <td>0</td>\n",
                            "      <td>0</td>\n",
                            "      <td>0</td>\n",
                            "      <td>0</td>\n",
                            "    </tr>\n",
                            "    <tr>\n",
                            "      <th>2</th>\n",
                            "      <td>0.0</td>\n",
                            "      <td>0</td>\n",
                            "      <td>0</td>\n",
                            "      <td>0</td>\n",
                            "      <td>0</td>\n",
                            "      <td>0</td>\n",
                            "      <td>0</td>\n",
                            "      <td>0</td>\n",
                            "    </tr>\n",
                            "    <tr>\n",
                            "      <th>3</th>\n",
                            "      <td>0.0</td>\n",
                            "      <td>0</td>\n",
                            "      <td>0</td>\n",
                            "      <td>0</td>\n",
                            "      <td>0</td>\n",
                            "      <td>0</td>\n",
                            "      <td>0</td>\n",
                            "      <td>0</td>\n",
                            "    </tr>\n",
                            "    <tr>\n",
                            "      <th>4</th>\n",
                            "      <td>0.0</td>\n",
                            "      <td>0</td>\n",
                            "      <td>0</td>\n",
                            "      <td>0</td>\n",
                            "      <td>0</td>\n",
                            "      <td>0</td>\n",
                            "      <td>0</td>\n",
                            "      <td>0</td>\n",
                            "    </tr>\n",
                            "  </tbody>\n",
                            "</table>\n",
                            "</div>"
                        ],
                        "text/plain": [
                            "   is_suspicious  airbag  air bag  failed  did not deploy  violent  explode  \\\n",
                            "0            0.0       0        0       0               0        0        0   \n",
                            "1            0.0       0        0       0               0        0        0   \n",
                            "2            0.0       0        0       0               0        0        0   \n",
                            "3            0.0       0        0       0               0        0        0   \n",
                            "4            0.0       0        0       0               0        0        0   \n",
                            "\n",
                            "   shrapnel  \n",
                            "0         0  \n",
                            "1         0  \n",
                            "2         0  \n",
                            "3         0  \n",
                            "4         0  "
                        ]
                    },
                    "execution_count": 4,
                    "metadata": {},
                    "output_type": "execute_result"
                }
            ],
            "source": [
                "train_df = pd.DataFrame({\n",
                "    'is_suspicious': labeled.is_suspicious,\n",
                "    'airbag': labeled.CDESCR.str.contains(\"AIRBAG\", na=False).astype(int),\n",
                "    'air bag': labeled.CDESCR.str.contains(\"AIR BAG\", na=False).astype(int),\n",
                "    'failed': labeled.CDESCR.str.contains(\"FAILED\", na=False).astype(int),\n",
                "    'did not deploy': labeled.CDESCR.str.contains(\"DID NOT DEPLOY\", na=False).astype(int),\n",
                "    'violent': labeled.CDESCR.str.contains(\"VIOLENT\", na=False).astype(int),\n",
                "    'explode': labeled.CDESCR.str.contains(\"EXPLODE\", na=False).astype(int),\n",
                "    'shrapnel': labeled.CDESCR.str.contains(\"SHRAPNEL\", na=False).astype(int),\n",
                "})\n",
                "train_df.head()"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "Let's see how big our dataset is, and then remove any rows that are missing data (not every complaint has a label)."
            ]
        },
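        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "Why do 185 rows disappear? The feature columns can never be missing, because `na=False` plus `.astype(int)` always produces a `0` or `1`, so `dropna()` is only removing complaints whose `is_suspicious` label is blank. Those blanks are also why the labels show up as `0.0`/`1.0`: pandas stores a numeric column that contains missing values as floats. A minimal sketch on a hypothetical toy table; `dropna(subset=['is_suspicious'])` is a more explicit way to say the same thing:\n",
                "\n",
                "```python\n",
                "import numpy as np\n",
                "import pandas as pd\n",
                "\n",
                "# Hypothetical stand-in for train_df: two of the four labels are missing\n",
                "toy = pd.DataFrame({\n",
                "    'is_suspicious': [0.0, np.nan, 1.0, np.nan],\n",
                "    'airbag': [0, 1, 1, 0],\n",
                "})\n",
                "\n",
                "# How many rows have no label?\n",
                "print(toy.is_suspicious.isna().sum())  # 2\n",
                "\n",
                "# Drop rows with a missing value in any column...\n",
                "labeled_only = toy.dropna()\n",
                "# ...or, more explicitly, only check the label column\n",
                "labeled_only_explicit = toy.dropna(subset=['is_suspicious'])\n",
                "\n",
                "print(labeled_only.shape)  # (2, 2)\n",
                "```"
            ]
        },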
        {
            "cell_type": "code",
            "execution_count": 5,
            "metadata": {},
            "outputs": [
                {
                    "data": {
                        "text/plain": [
                            "(350, 8)"
                        ]
                    },
                    "execution_count": 5,
                    "metadata": {},
                    "output_type": "execute_result"
                }
            ],
            "source": [
                "train_df.shape"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 6,
            "metadata": {},
            "outputs": [
                {
                    "data": {
                        "text/plain": [
                            "(165, 8)"
                        ]
                    },
                    "execution_count": 6,
                    "metadata": {},
                    "output_type": "execute_result"
                }
            ],
            "source": [
                "train_df = train_df.dropna()\n",
                "train_df.shape"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "## Creating our classifier\n",
                "\n",
                "Any time you're building a classifier, doing regression, or almost anything else with machine learning, you're using a **model**. It **models** the relationship between the inputs and the outputs.\n",
                "\n",
                "### Classification with Logistic Regression\n",
                "\n",
                "Last time we used a classifier based on **logistic regression**. First we split the data into `X` (our features) and `y` (our labels), then trained the classifier on them."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 7,
            "metadata": {},
            "outputs": [
                {
                    "data": {
                        "text/plain": [
                            "LogisticRegression(C=1000000000.0, class_weight=None, dual=False,\n",
                            "                   fit_intercept=True, intercept_scaling=1, l1_ratio=None,\n",
                            "                   max_iter=100, multi_class='warn', n_jobs=None, penalty='l2',\n",
                            "                   random_state=None, solver='lbfgs', tol=0.0001, verbose=0,\n",
                            "                   warm_start=False)"
                        ]
                    },
                    "execution_count": 7,
                    "metadata": {},
                    "output_type": "execute_result"
                }
            ],
            "source": [
                "from sklearn.linear_model import LogisticRegression\n",
                "\n",
                "X = train_df.drop(columns='is_suspicious')\n",
                "y = train_df.is_suspicious\n",
                "\n",
                "clf = LogisticRegression(C=1e9, solver='lbfgs')\n",
                "\n",
                "clf.fit(X, y)"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "After we built our classifier, we tested it and found it didn't work very well: it only caught 2 of the 15 suspicious complaints."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 8,
            "metadata": {},
            "outputs": [
                {
                    "data": {
                        "text/html": [
                            "<div>\n",
                            "<style scoped>\n",
                            "    .dataframe tbody tr th:only-of-type {\n",
                            "        vertical-align: middle;\n",
                            "    }\n",
                            "\n",
                            "    .dataframe tbody tr th {\n",
                            "        vertical-align: top;\n",
                            "    }\n",
                            "\n",
                            "    .dataframe thead th {\n",
                            "        text-align: right;\n",
                            "    }\n",
                            "</style>\n",
                            "<table border=\"1\" class=\"dataframe\">\n",
                            "  <thead>\n",
                            "    <tr style=\"text-align: right;\">\n",
                            "      <th></th>\n",
                            "      <th>Predicted not suspicious</th>\n",
                            "      <th>Predicted suspicious</th>\n",
                            "    </tr>\n",
                            "  </thead>\n",
                            "  <tbody>\n",
                            "    <tr>\n",
                            "      <th>Is not suspicious</th>\n",
                            "      <td>150</td>\n",
                            "      <td>0</td>\n",
                            "    </tr>\n",
                            "    <tr>\n",
                            "      <th>Is suspicious</th>\n",
                            "      <td>13</td>\n",
                            "      <td>2</td>\n",
                            "    </tr>\n",
                            "  </tbody>\n",
                            "</table>\n",
                            "</div>"
                        ],
                        "text/plain": [
                            "                   Predicted not suspicious  Predicted suspicious\n",
                            "Is not suspicious                       150                     0\n",
                            "Is suspicious                            13                     2"
                        ]
                    },
                    "execution_count": 8,
                    "metadata": {},
                    "output_type": "execute_result"
                }
            ],
            "source": [
                "from sklearn.metrics import confusion_matrix\n",
                "\n",
                "y_true = y\n",
                "y_pred = clf.predict(X)\n",
                "\n",
                "matrix = confusion_matrix(y_true, y_pred)\n",
                "\n",
                "label_names = pd.Series(['not suspicious', 'suspicious'])\n",
                "pd.DataFrame(matrix,\n",
                "     columns='Predicted ' + label_names,\n",
                "     index='Is ' + label_names)"
            ]
        },
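        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "One way to put a number on *didn't work very well* is to compute accuracy, precision and recall for the suspicious class. The sketch below rebuilds label lists that reproduce the four counts in the confusion matrix above (they are stand-ins, not the real `y` and `clf.predict(X)`) and scores them with scikit-learn. Because 150 of the 165 complaints aren't suspicious, accuracy looks high even though recall is poor.\n",
                "\n",
                "```python\n",
                "from sklearn.metrics import accuracy_score, precision_score, recall_score\n",
                "\n",
                "# Stand-in labels that reproduce the confusion matrix above (1 = suspicious):\n",
                "# 150 true negatives, 0 false positives, 13 false negatives, 2 true positives\n",
                "toy_true = [0] * 150 + [1] * 13 + [1] * 2\n",
                "toy_pred = [0] * 150 + [0] * 13 + [1] * 2\n",
                "\n",
                "# Accuracy looks great because almost everything is not suspicious\n",
                "accuracy = accuracy_score(toy_true, toy_pred)    # 152 / 165, about 0.92\n",
                "# Precision: of the complaints we flagged, how many were suspicious?\n",
                "precision = precision_score(toy_true, toy_pred)  # 2 / 2 = 1.0\n",
                "# Recall: of the suspicious complaints, how many did we flag?\n",
                "recall = recall_score(toy_true, toy_pred)        # 2 / 15, about 0.13\n",
                "\n",
                "print(accuracy, precision, recall)\n",
                "```"
            ]
        },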
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "To understand a logistic regression classifier, we looked at the coefficients and the odds ratios."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 9,
            "metadata": {},
            "outputs": [
                {
                    "data": {
                        "text/html": [
                            "<div>\n",
                            "<style scoped>\n",
                            "    .dataframe tbody tr th:only-of-type {\n",
                            "        vertical-align: middle;\n",
                            "    }\n",
                            "\n",
                            "    .dataframe tbody tr th {\n",
                            "        vertical-align: top;\n",
                            "    }\n",
                            "\n",
                            "    .dataframe thead th {\n",
                            "        text-align: right;\n",
                            "    }\n",
                            "</style>\n",
                            "<table border=\"1\" class=\"dataframe\">\n",
                            "  <thead>\n",
                            "    <tr style=\"text-align: right;\">\n",
                            "      <th></th>\n",
                            "      <th>feature</th>\n",
                            "      <th>coefficient (log odds ratio)</th>\n",
                            "      <th>odds ratio</th>\n",
                            "    </tr>\n",
                            "  </thead>\n",
                            "  <tbody>\n",
                            "    <tr>\n",
                            "      <th>4</th>\n",
                            "      <td>violent</td>\n",
                            "      <td>41.423096</td>\n",
                            "      <td>9.768364e+17</td>\n",
                            "    </tr>\n",
                            "    <tr>\n",
                            "      <th>5</th>\n",
                            "      <td>explode</td>\n",
                            "      <td>1.269048</td>\n",
                            "      <td>3.557500e+00</td>\n",
                            "    </tr>\n",
                            "    <tr>\n",
                            "      <th>1</th>\n",
                            "      <td>air bag</td>\n",
                            "      <td>1.268123</td>\n",
                            "      <td>3.554200e+00</td>\n",
                            "    </tr>\n",
                            "    <tr>\n",
                            "      <th>0</th>\n",
                            "      <td>airbag</td>\n",
                            "      <td>0.945612</td>\n",
                            "      <td>2.574400e+00</td>\n",
                            "    </tr>\n",
                            "    <tr>\n",
                            "      <th>2</th>\n",
                            "      <td>failed</td>\n",
                            "      <td>-27.175214</td>\n",
                            "      <td>0.000000e+00</td>\n",
                            "    </tr>\n",
                            "    <tr>\n",
                            "      <th>3</th>\n",
                            "      <td>did not deploy</td>\n",
                            "      <td>-37.906428</td>\n",
                            "      <td>0.000000e+00</td>\n",
                            "    </tr>\n",
                            "    <tr>\n",
                            "      <th>6</th>\n",
                            "      <td>shrapnel</td>\n",
                            "      <td>-13.204894</td>\n",
                            "      <td>0.000000e+00</td>\n",
                            "    </tr>\n",
                            "  </tbody>\n",
                            "</table>\n",
                            "</div>"
                        ],
                        "text/plain": [
                            "          feature  coefficient (log odds ratio)    odds ratio\n",
                            "4         violent                     41.423096  9.768364e+17\n",
                            "5         explode                      1.269048  3.557500e+00\n",
                            "1         air bag                      1.268123  3.554200e+00\n",
                            "0          airbag                      0.945612  2.574400e+00\n",
                            "2          failed                    -27.175214  0.000000e+00\n",
                            "3  did not deploy                    -37.906428  0.000000e+00\n",
                            "6        shrapnel                    -13.204894  0.000000e+00"
                        ]
                    },
                    "execution_count": 9,
                    "metadata": {},
                    "output_type": "execute_result"
                }
            ],
            "source": [
                "import numpy as np\n",
                "\n",
                "feature_names = X.columns\n",
                "coefficients = clf.coef_[0]\n",
                "\n",
                "pd.DataFrame({\n",
                "    'feature': feature_names,\n",
                "    'coefficient (log odds ratio)': coefficients,\n",
                "    'odds ratio': np.exp(coefficients).round(4)\n",
                "}).sort_values(by='odds ratio', ascending=False)"
            ]
        },
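        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "The odds ratio column is just *e* raised to the power of the coefficient, which is what `np.exp` computes. A quick check using three coefficients copied from the table above; it also shows why some odds ratios display as `0.000000e+00`: they are tiny positive numbers that `.round(4)` rounds down to zero, not true zeros.\n",
                "\n",
                "```python\n",
                "import numpy as np\n",
                "\n",
                "# Coefficients (log odds ratios) copied from the table above\n",
                "coefficients = {\n",
                "    'explode': 1.269048,\n",
                "    'airbag': 0.945612,\n",
                "    'shrapnel': -13.204894,\n",
                "}\n",
                "\n",
                "for feature, coef in coefficients.items():\n",
                "    # odds ratio = e ** coefficient\n",
                "    odds_ratio = np.exp(coef)\n",
                "    # Rounding to 4 decimal places, like the notebook does\n",
                "    print(feature, odds_ratio, round(float(odds_ratio), 4))\n",
                "```"
            ]
        },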
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "### Classification with Decision Trees\n",
                "\n",
                "We can also use a classifier called a **decision tree**. Switching only takes one new import and a change to the line where you create your classifier."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 10,
            "metadata": {},
            "outputs": [
                {
                    "data": {
                        "text/plain": [
                            "DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=None,\n",
                            "                       max_features=None, max_leaf_nodes=None,\n",
                            "                       min_impurity_decrease=0.0, min_impurity_split=None,\n",
                            "                       min_samples_leaf=1, min_samples_split=2,\n",
                            "                       min_weight_fraction_leaf=0.0, presort=False,\n",
                            "                       random_state=None, splitter='best')"
                        ]
                    },
                    "execution_count": 10,
                    "metadata": {},
                    "output_type": "execute_result"
                }
            ],
            "source": [
                "#from sklearn.linear_model import LogisticRegression\n",
                "from sklearn.tree import DecisionTreeClassifier\n",
                "\n",
                "X = train_df.drop(columns='is_suspicious')\n",
                "y = train_df.is_suspicious\n",
                "\n",
                "#clf = LogisticRegression(C=1e9, solver='lbfgs')\n",
                "clf = DecisionTreeClassifier()\n",
                "\n",
                "clf.fit(X, y)"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
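                "The confusion matrix code looks exactly the same. Note that we're predicting on `X`, which contains the same complaints the tree was trained on. This shows how well the tree fits its training data, not how it would do on complaints it has never seen."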
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 11,
            "metadata": {},
            "outputs": [
                {
                    "data": {
                        "text/html": [
                            "<div>\n",
                            "<style scoped>\n",
                            "    .dataframe tbody tr th:only-of-type {\n",
                            "        vertical-align: middle;\n",
                            "    }\n",
                            "\n",
                            "    .dataframe tbody tr th {\n",
                            "        vertical-align: top;\n",
                            "    }\n",
                            "\n",
                            "    .dataframe thead th {\n",
                            "        text-align: right;\n",
                            "    }\n",
                            "</style>\n",
                            "<table border=\"1\" class=\"dataframe\">\n",
                            "  <thead>\n",
                            "    <tr style=\"text-align: right;\">\n",
                            "      <th></th>\n",
                            "      <th>Predicted not suspicious</th>\n",
                            "      <th>Predicted suspicious</th>\n",
                            "    </tr>\n",
                            "  </thead>\n",
                            "  <tbody>\n",
                            "    <tr>\n",
                            "      <th>Is not suspicious</th>\n",
                            "      <td>150</td>\n",
                            "      <td>0</td>\n",
                            "    </tr>\n",
                            "    <tr>\n",
                            "      <th>Is suspicious</th>\n",
                            "      <td>13</td>\n",
                            "      <td>2</td>\n",
                            "    </tr>\n",
                            "  </tbody>\n",
                            "</table>\n",
                            "</div>"
                        ],
                        "text/plain": [
                            "                   Predicted not suspicious  Predicted suspicious\n",
                            "Is not suspicious                       150                     0\n",
                            "Is suspicious                            13                     2"
                        ]
                    },
                    "execution_count": 11,
                    "metadata": {},
                    "output_type": "execute_result"
                }
            ],
            "source": [
                "from sklearn.metrics import confusion_matrix\n",
                "\n",
                "y_true = y\n",
                "y_pred = clf.predict(X)\n",
                "\n",
                "matrix = confusion_matrix(y_true, y_pred)\n",
                "\n",
                "label_names = pd.Series(['not suspicious', 'suspicious'])\n",
                "pd.DataFrame(matrix,\n",
                "             columns='Predicted ' + label_names,\n",
                "             index='Is ' + label_names)"
            ]
        },
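        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "The matrix above shows the tree flagging only 2 of the 15 suspicious complaints. Precision and recall put numbers on that. Precision asks how many of the complaints we flagged were suspicious. Recall asks how many of the suspicious complaints we flagged. Below is a minimal sketch using `sklearn.metrics`. The `_demo` vectors are hypothetical: they are rebuilt from the four counts in the matrix rather than taken from the real predictions.\n",
                "\n",
                "```python\n",
                "import numpy as np\n",
                "from sklearn.metrics import confusion_matrix, precision_score, recall_score\n",
                "\n",
                "# Hypothetical label vectors rebuilt from the counts in the matrix above:\n",
                "# 150 true negatives, 0 false positives, 13 false negatives, 2 true positives\n",
                "y_true_demo = np.array([0] * 150 + [1] * 13 + [1] * 2)\n",
                "y_pred_demo = np.array([0] * 150 + [0] * 13 + [1] * 2)\n",
                "\n",
                "# For binary labels, ravel() returns the cells in the order tn, fp, fn, tp\n",
                "tn, fp, fn, tp = confusion_matrix(y_true_demo, y_pred_demo).ravel()\n",
                "precision = precision_score(y_true_demo, y_pred_demo)  # tp / (tp + fp)\n",
                "recall = recall_score(y_true_demo, y_pred_demo)        # tp / (tp + fn)\n",
                "\n",
                "print(tn, fp, fn, tp)                                    # 150 0 13 2\n",
                "print(f'precision={precision:.3f} recall={recall:.3f}')  # precision=1.000 recall=0.133\n",
                "```\n",
                "\n",
                "A precision of 1.000 means every flagged complaint really was suspicious. A recall of 0.133 means the tree found only 2 of the 15."
            ]
        },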
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
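                "When using a decision tree, **using the classifier is the same, but the code to understand it is a bit different.** Instead of coefficients, we're going to look at **feature importance**. A feature's importance is its share of the total drop in Gini impurity produced by the splits that use it. Importances are never negative and they sum to 1. Unlike coefficients, they tell you which words matter but not whether those words push a complaint toward *suspicious* or *not suspicious*."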
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 12,
            "metadata": {},
            "outputs": [
                {
                    "data": {
                        "text/html": [
                            "\n",
                            "    <style>\n",
                            "    table.eli5-weights tr:hover {\n",
                            "        filter: brightness(85%);\n",
                            "    }\n",
                            "</style>\n",
                            "\n",
                            "\n",
                            "\n",
                            "    \n",
                            "\n",
                            "    \n",
                            "\n",
                            "    \n",
                            "\n",
                            "    \n",
                            "\n",
                            "    \n",
                            "        <table class=\"eli5-weights eli5-feature-importances\" style=\"border-collapse: collapse; border: none; margin-top: 0em; table-layout: auto;\">\n",
                            "    <thead>\n",
                            "    <tr style=\"border: none;\">\n",
                            "        <th style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">Weight</th>\n",
                            "        <th style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">Feature</th>\n",
                            "    </tr>\n",
                            "    </thead>\n",
                            "    <tbody>\n",
                            "    \n",
                            "        <tr style=\"background-color: hsl(120, 100.00%, 80.00%); border: none;\">\n",
                            "            <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
                            "                0.3440\n",
                            "                \n",
                            "            </td>\n",
                            "            <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
                            "                airbag\n",
                            "            </td>\n",
                            "        </tr>\n",
                            "    \n",
                            "        <tr style=\"background-color: hsl(120, 100.00%, 86.19%); border: none;\">\n",
                            "            <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
                            "                0.2026\n",
                            "                \n",
                            "            </td>\n",
                            "            <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
                            "                violent\n",
                            "            </td>\n",
                            "        </tr>\n",
                            "    \n",
                            "        <tr style=\"background-color: hsl(120, 100.00%, 88.66%); border: none;\">\n",
                            "            <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
                            "                0.1529\n",
                            "                \n",
                            "            </td>\n",
                            "            <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
                            "                air bag\n",
                            "            </td>\n",
                            "        </tr>\n",
                            "    \n",
                            "        <tr style=\"background-color: hsl(120, 100.00%, 89.10%); border: none;\">\n",
                            "            <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
                            "                0.1445\n",
                            "                \n",
                            "            </td>\n",
                            "            <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
                            "                explode\n",
                            "            </td>\n",
                            "        </tr>\n",
                            "    \n",
                            "        <tr style=\"background-color: hsl(120, 100.00%, 90.40%); border: none;\">\n",
                            "            <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
                            "                0.1205\n",
                            "                \n",
                            "            </td>\n",
                            "            <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
                            "                did not deploy\n",
                            "            </td>\n",
                            "        </tr>\n",
                            "    \n",
                            "        <tr style=\"background-color: hsl(120, 100.00%, 96.67%); border: none;\">\n",
                            "            <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
                            "                0.0266\n",
                            "                \n",
                            "            </td>\n",
                            "            <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
                            "                failed\n",
                            "            </td>\n",
                            "        </tr>\n",
                            "    \n",
                            "        <tr style=\"background-color: hsl(120, 100.00%, 98.43%); border: none;\">\n",
                            "            <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
                            "                0.0091\n",
                            "                \n",
                            "            </td>\n",
                            "            <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
                            "                shrapnel\n",
                            "            </td>\n",
                            "        </tr>\n",
                            "    \n",
                            "    \n",
                            "    </tbody>\n",
                            "</table>\n",
                            "    \n",
                            "\n",
                            "    \n",
                            "\n",
                            "\n",
                            "    \n",
                            "\n",
                            "    \n",
                            "        \n",
                            "        <pre>\n",
                            "Decision tree feature importances; values are numbers 0 <= x <= 1;\n",
                            "all values sum to 1.\n",
                            "</pre>\n",
                            "    \n",
                            "\n",
                            "    \n",
                            "\n",
                            "    \n",
                            "\n",
                            "    \n",
                            "\n",
                            "    \n",
                            "\n",
                            "\n",
                            "\n"
                        ],
                        "text/plain": [
                            "<IPython.core.display.HTML object>"
                        ]
                    },
                    "execution_count": 12,
                    "metadata": {},
                    "output_type": "execute_result"
                }
            ],
            "source": [
                "import eli5\n",
                "\n",
                "label_names = ['not suspicious', 'suspicious']\n",
                "feature_names = list(X.columns)\n",
                "\n",
                "eli5.show_weights(clf,\n",
                "                  feature_names=feature_names,\n",
                "                  target_names=label_names,\n",
                "                  show=['feature_importances', 'description'])"
            ]
        },
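        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "scikit-learn also exposes these numbers directly as the fitted tree's `feature_importances_` attribute, so you can read them without `eli5`. Below is a minimal sketch on a hypothetical eight-complaint dataset; the `toy` DataFrame is invented for illustration. Each split adds its sample-weighted Gini decrease to the feature it used, and the totals are then scaled to sum to 1.\n",
                "\n",
                "```python\n",
                "import pandas as pd\n",
                "from sklearn.tree import DecisionTreeClassifier\n",
                "\n",
                "# Hypothetical toy data: 1 means the word appears in the complaint\n",
                "toy = pd.DataFrame({\n",
                "    'explode':       [1, 1, 0, 0, 0, 0, 0, 0],\n",
                "    'airbag':        [1, 1, 1, 1, 0, 0, 0, 0],\n",
                "    'violent':       [0, 0, 1, 0, 0, 0, 0, 0],\n",
                "    'is_suspicious': [1, 1, 1, 0, 0, 0, 0, 0],\n",
                "})\n",
                "X_toy = toy.drop(columns='is_suspicious')\n",
                "y_toy = toy.is_suspicious\n",
                "\n",
                "toy_clf = DecisionTreeClassifier(random_state=0).fit(X_toy, y_toy)\n",
                "\n",
                "# One number per feature: its share of the total sample-weighted Gini decrease\n",
                "importances = pd.Series(toy_clf.feature_importances_, index=X_toy.columns)\n",
                "print(importances.sort_values(ascending=False).round(3))\n",
                "```\n",
                "\n",
                "In this toy tree, `airbag` splits the root, `explode` splits the four airbag complaints, and `violent` separates the last two. That gives `airbag` 0.6, `violent` 0.267 and `explode` 0.133. `violent` is used only for the final two complaints, yet it still outranks `explode` because its split removes more impurity."
            ]
        },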
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
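                "The most fun part of using a decision tree is **visualizing it.** Every box lists its `gini` impurity, how many training complaints reached it (`samples`), and how many of those fall in each class (`value`, with *not suspicious* first). Boxes that split again also show the rule they split on, such as `violent <= 0.5`. Complaints without the word follow the *True* arrow, and complaints with it follow *False*."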
            ]
        },
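        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "The drawing below is rendered with Graphviz. If Graphviz isn't available, scikit-learn 0.21 or newer can print a fitted tree as indented text with `sklearn.tree.export_text`. Below is a minimal sketch on a hypothetical toy dataset. To print the complaint tree instead, pass `clf` and `feature_names=list(X.columns)`.\n",
                "\n",
                "```python\n",
                "import pandas as pd\n",
                "from sklearn.tree import DecisionTreeClassifier, export_text\n",
                "\n",
                "# Hypothetical toy data: 1 means the word appears in the complaint\n",
                "toy = pd.DataFrame({\n",
                "    'explode':       [1, 1, 0, 0, 0, 0, 0, 0],\n",
                "    'airbag':        [1, 1, 1, 1, 0, 0, 0, 0],\n",
                "    'violent':       [0, 0, 1, 0, 0, 0, 0, 0],\n",
                "    'is_suspicious': [1, 1, 1, 0, 0, 0, 0, 0],\n",
                "})\n",
                "X_toy = toy.drop(columns='is_suspicious')\n",
                "toy_clf = DecisionTreeClassifier(random_state=0).fit(X_toy, toy.is_suspicious)\n",
                "\n",
                "# Each indented line is one branch; leaves print the predicted class (0 or 1)\n",
                "report = export_text(toy_clf, feature_names=list(X_toy.columns))\n",
                "print(report)\n",
                "```\n",
                "\n",
                "Branches marked `<=` hold the complaints without the word, and branches marked `>` hold the complaints with it. This matches the *True*/*False* arrows in the drawing."
            ]
        },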
        {
            "cell_type": "code",
            "execution_count": 13,
            "metadata": {},
            "outputs": [
                {
                    "data": {
                        "image/svg+xml": [
                            "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n",
                            "<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n",
                            " \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n",
                            "<!-- Generated by graphviz version 2.40.1 (20161225.0304)\n",
                            " -->\n",
                            "<!-- Title: Tree Pages: 1 -->\n",
                            "<svg width=\"770pt\" height=\"870pt\"\n",
                            " viewBox=\"0.00 0.00 769.84 870.00\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n",
                            "<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 866)\">\n",
                            "<title>Tree</title>\n",
                            "<polygon fill=\"#ffffff\" stroke=\"transparent\" points=\"-4,4 -4,-866 765.8369,-866 765.8369,4 -4,4\"/>\n",
                            "<!-- 0 -->\n",
                            "<g id=\"node1\" class=\"node\">\n",
                            "<title>0</title>\n",
                            "<polygon fill=\"#e88e4d\" stroke=\"#000000\" points=\"599.7555,-862 462.0814,-862 462.0814,-784 599.7555,-784 599.7555,-862\"/>\n",
                            "<text text-anchor=\"middle\" x=\"530.9185\" y=\"-846.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">violent &lt;= 0.5</text>\n",
                            "<text text-anchor=\"middle\" x=\"530.9185\" y=\"-832.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">gini = 0.165</text>\n",
                            "<text text-anchor=\"middle\" x=\"530.9185\" y=\"-818.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 165</text>\n",
                            "<text text-anchor=\"middle\" x=\"530.9185\" y=\"-804.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = [150, 15]</text>\n",
                            "<text text-anchor=\"middle\" x=\"530.9185\" y=\"-790.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">class = not suspicious</text>\n",
                            "</g>\n",
                            "<!-- 1 -->\n",
                            "<g id=\"node2\" class=\"node\">\n",
                            "<title>1</title>\n",
                            "<polygon fill=\"#e78d4b\" stroke=\"#000000\" points=\"527.7555,-748 390.0814,-748 390.0814,-670 527.7555,-670 527.7555,-748\"/>\n",
                            "<text text-anchor=\"middle\" x=\"458.9185\" y=\"-732.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">did not deploy &lt;= 0.5</text>\n",
                            "<text text-anchor=\"middle\" x=\"458.9185\" y=\"-718.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">gini = 0.156</text>\n",
                            "<text text-anchor=\"middle\" x=\"458.9185\" y=\"-704.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 164</text>\n",
                            "<text text-anchor=\"middle\" x=\"458.9185\" y=\"-690.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = [150, 14]</text>\n",
                            "<text text-anchor=\"middle\" x=\"458.9185\" y=\"-676.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">class = not suspicious</text>\n",
                            "</g>\n",
                            "<!-- 0&#45;&gt;1 -->\n",
                            "<g id=\"edge1\" class=\"edge\">\n",
                            "<title>0&#45;&gt;1</title>\n",
                            "<path fill=\"none\" stroke=\"#000000\" d=\"M506.1401,-783.7677C500.6531,-775.0798 494.7796,-765.7801 489.1041,-756.794\"/>\n",
                            "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"492.0416,-754.8906 483.7424,-748.3046 486.1232,-758.6285 492.0416,-754.8906\"/>\n",
                            "<text text-anchor=\"middle\" x=\"478.2359\" y=\"-768.4907\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">True</text>\n",
                            "</g>\n",
                            "<!-- 20 -->\n",
                            "<g id=\"node21\" class=\"node\">\n",
                            "<title>20</title>\n",
                            "<polygon fill=\"#399de5\" stroke=\"#000000\" points=\"662.3666,-741 545.4703,-741 545.4703,-677 662.3666,-677 662.3666,-741\"/>\n",
                            "<text text-anchor=\"middle\" x=\"603.9185\" y=\"-725.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">gini = 0.0</text>\n",
                            "<text text-anchor=\"middle\" x=\"603.9185\" y=\"-711.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 1</text>\n",
                            "<text text-anchor=\"middle\" x=\"603.9185\" y=\"-697.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = [0, 1]</text>\n",
                            "<text text-anchor=\"middle\" x=\"603.9185\" y=\"-683.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">class = suspicious</text>\n",
                            "</g>\n",
                            "<!-- 0&#45;&gt;20 -->\n",
                            "<g id=\"edge20\" class=\"edge\">\n",
                            "<title>0&#45;&gt;20</title>\n",
                            "<path fill=\"none\" stroke=\"#000000\" d=\"M556.0409,-783.7677C563.0702,-772.7904 570.7251,-760.8362 577.8101,-749.772\"/>\n",
                            "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"580.9019,-751.434 583.3471,-741.1252 575.0069,-747.6591 580.9019,-751.434\"/>\n",
                            "<text text-anchor=\"middle\" x=\"588.7012\" y=\"-761.3451\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">False</text>\n",
                            "</g>\n",
                            "<!-- 2 -->\n",
                            "<g id=\"node3\" class=\"node\">\n",
                            "<title>2</title>\n",
                            "<polygon fill=\"#e99254\" stroke=\"#000000\" points=\"449.7555,-634 312.0814,-634 312.0814,-556 449.7555,-556 449.7555,-634\"/>\n",
                            "<text text-anchor=\"middle\" x=\"380.9185\" y=\"-618.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">air bag &lt;= 0.5</text>\n",
                            "<text text-anchor=\"middle\" x=\"380.9185\" y=\"-604.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">gini = 0.212</text>\n",
                            "<text text-anchor=\"middle\" x=\"380.9185\" y=\"-590.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 116</text>\n",
                            "<text text-anchor=\"middle\" x=\"380.9185\" y=\"-576.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = [102, 14]</text>\n",
                            "<text text-anchor=\"middle\" x=\"380.9185\" y=\"-562.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">class = not suspicious</text>\n",
                            "</g>\n",
                            "<!-- 1&#45;&gt;2 -->\n",
                            "<g id=\"edge2\" class=\"edge\">\n",
                            "<title>1&#45;&gt;2</title>\n",
                            "<path fill=\"none\" stroke=\"#000000\" d=\"M432.0753,-669.7677C426.131,-661.0798 419.768,-651.7801 413.6196,-642.794\"/>\n",
                            "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"416.3466,-640.5813 407.8111,-634.3046 410.5694,-644.5341 416.3466,-640.5813\"/>\n",
                            "</g>\n",
                            "<!-- 19 -->\n",
                            "<g id=\"node20\" class=\"node\">\n",
                            "<title>19</title>\n",
                            "<polygon fill=\"#e58139\" stroke=\"#000000\" points=\"605.7555,-627 468.0814,-627 468.0814,-563 605.7555,-563 605.7555,-627\"/>\n",
                            "<text text-anchor=\"middle\" x=\"536.9185\" y=\"-611.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">gini = 0.0</text>\n",
                            "<text text-anchor=\"middle\" x=\"536.9185\" y=\"-597.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 48</text>\n",
                            "<text text-anchor=\"middle\" x=\"536.9185\" y=\"-583.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = [48, 0]</text>\n",
                            "<text text-anchor=\"middle\" x=\"536.9185\" y=\"-569.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">class = not suspicious</text>\n",
                            "</g>\n",
                            "<!-- 1&#45;&gt;19 -->\n",
                            "<g id=\"edge19\" class=\"edge\">\n",
                            "<title>1&#45;&gt;19</title>\n",
                            "<path fill=\"none\" stroke=\"#000000\" d=\"M485.7616,-669.7677C493.3468,-658.6817 501.6136,-646.5994 509.2465,-635.4436\"/>\n",
                            "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"512.1798,-637.3547 514.9381,-627.1252 506.4026,-633.4019 512.1798,-637.3547\"/>\n",
                            "</g>\n",
                            "<!-- 3 -->\n",
                            "<g id=\"node4\" class=\"node\">\n",
                            "<title>3</title>\n",
                            "<polygon fill=\"#e78d4b\" stroke=\"#000000\" points=\"371.7555,-520 234.0814,-520 234.0814,-442 371.7555,-442 371.7555,-520\"/>\n",
                            "<text text-anchor=\"middle\" x=\"302.9185\" y=\"-504.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">airbag &lt;= 0.5</text>\n",
                            "<text text-anchor=\"middle\" x=\"302.9185\" y=\"-490.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">gini = 0.156</text>\n",
                            "<text text-anchor=\"middle\" x=\"302.9185\" y=\"-476.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 94</text>\n",
                            "<text text-anchor=\"middle\" x=\"302.9185\" y=\"-462.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = [86, 8]</text>\n",
                            "<text text-anchor=\"middle\" x=\"302.9185\" y=\"-448.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">class = not suspicious</text>\n",
                            "</g>\n",
                            "<!-- 2&#45;&gt;3 -->\n",
                            "<g id=\"edge3\" class=\"edge\">\n",
                            "<title>2&#45;&gt;3</title>\n",
                            "<path fill=\"none\" stroke=\"#000000\" d=\"M354.0753,-555.7677C348.131,-547.0798 341.768,-537.7801 335.6196,-528.794\"/>\n",
                            "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"338.3466,-526.5813 329.8111,-520.3046 332.5694,-530.5341 338.3466,-526.5813\"/>\n",
                            "</g>\n",
                            "<!-- 12 -->\n",
                            "<g id=\"node13\" class=\"node\">\n",
                            "<title>12</title>\n",
                            "<polygon fill=\"#efb083\" stroke=\"#000000\" points=\"527.7555,-520 390.0814,-520 390.0814,-442 527.7555,-442 527.7555,-520\"/>\n",
                            "<text text-anchor=\"middle\" x=\"458.9185\" y=\"-504.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">explode &lt;= 0.5</text>\n",
                            "<text text-anchor=\"middle\" x=\"458.9185\" y=\"-490.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">gini = 0.397</text>\n",
                            "<text text-anchor=\"middle\" x=\"458.9185\" y=\"-476.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 22</text>\n",
                            "<text text-anchor=\"middle\" x=\"458.9185\" y=\"-462.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = [16, 6]</text>\n",
                            "<text text-anchor=\"middle\" x=\"458.9185\" y=\"-448.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">class = not suspicious</text>\n",
                            "</g>\n",
                            "<!-- 2&#45;&gt;12 -->\n",
                            "<g id=\"edge12\" class=\"edge\">\n",
                            "<title>2&#45;&gt;12</title>\n",
                            "<path fill=\"none\" stroke=\"#000000\" d=\"M407.7616,-555.7677C413.7059,-547.0798 420.0689,-537.7801 426.2173,-528.794\"/>\n",
                            "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"429.2675,-530.5341 432.0258,-520.3046 423.4903,-526.5813 429.2675,-530.5341\"/>\n",
                            "</g>\n",
                            "<!-- 4 -->\n",
                            "<g id=\"node5\" class=\"node\">\n",
                            "<title>4</title>\n",
                            "<polygon fill=\"#e58139\" stroke=\"#000000\" points=\"215.7555,-399 78.0814,-399 78.0814,-335 215.7555,-335 215.7555,-399\"/>\n",
                            "<text text-anchor=\"middle\" x=\"146.9185\" y=\"-383.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">gini = 0.0</text>\n",
                            "<text text-anchor=\"middle\" x=\"146.9185\" y=\"-369.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 49</text>\n",
                            "<text text-anchor=\"middle\" x=\"146.9185\" y=\"-355.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = [49, 0]</text>\n",
                            "<text text-anchor=\"middle\" x=\"146.9185\" y=\"-341.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">class = not suspicious</text>\n",
                            "</g>\n",
                            "<!-- 3&#45;&gt;4 -->\n",
                            "<g id=\"edge4\" class=\"edge\">\n",
                            "<title>3&#45;&gt;4</title>\n",
                            "<path fill=\"none\" stroke=\"#000000\" d=\"M249.2321,-441.7677C233.0207,-429.9209 215.2525,-416.9364 199.1382,-405.1606\"/>\n",
                            "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"201.0183,-402.1995 190.8793,-399.1252 196.8881,-407.8513 201.0183,-402.1995\"/>\n",
                            "</g>\n",
                            "<!-- 5 -->\n",
                            "<g id=\"node6\" class=\"node\">\n",
                            "<title>5</title>\n",
                            "<polygon fill=\"#eb9c64\" stroke=\"#000000\" points=\"371.7555,-406 234.0814,-406 234.0814,-328 371.7555,-328 371.7555,-406\"/>\n",
                            "<text text-anchor=\"middle\" x=\"302.9185\" y=\"-390.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">failed &lt;= 0.5</text>\n",
                            "<text text-anchor=\"middle\" x=\"302.9185\" y=\"-376.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">gini = 0.292</text>\n",
                            "<text text-anchor=\"middle\" x=\"302.9185\" y=\"-362.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 45</text>\n",
                            "<text text-anchor=\"middle\" x=\"302.9185\" y=\"-348.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = [37, 8]</text>\n",
                            "<text text-anchor=\"middle\" x=\"302.9185\" y=\"-334.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">class = not suspicious</text>\n",
                            "</g>\n",
                            "<!-- 3&#45;&gt;5 -->\n",
                            "<g id=\"edge5\" class=\"edge\">\n",
                            "<title>3&#45;&gt;5</title>\n",
                            "<path fill=\"none\" stroke=\"#000000\" d=\"M302.9185,-441.7677C302.9185,-433.6172 302.9185,-424.9283 302.9185,-416.4649\"/>\n",
                            "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"306.4186,-416.3046 302.9185,-406.3046 299.4186,-416.3047 306.4186,-416.3046\"/>\n",
                            "</g>\n",
                            "<!-- 6 -->\n",
                            "<g id=\"node7\" class=\"node\">\n",
                            "<title>6</title>\n",
                            "<polygon fill=\"#eb9f68\" stroke=\"#000000\" points=\"215.7555,-292 78.0814,-292 78.0814,-214 215.7555,-214 215.7555,-292\"/>\n",
                            "<text text-anchor=\"middle\" x=\"146.9185\" y=\"-276.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">shrapnel &lt;= 0.5</text>\n",
                            "<text text-anchor=\"middle\" x=\"146.9185\" y=\"-262.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">gini = 0.308</text>\n",
                            "<text text-anchor=\"middle\" x=\"146.9185\" y=\"-248.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 42</text>\n",
                            "<text text-anchor=\"middle\" x=\"146.9185\" y=\"-234.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = [34, 8]</text>\n",
                            "<text text-anchor=\"middle\" x=\"146.9185\" y=\"-220.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">class = not suspicious</text>\n",
                            "</g>\n",
                            "<!-- 5&#45;&gt;6 -->\n",
                            "<g id=\"edge6\" class=\"edge\">\n",
                            "<title>5&#45;&gt;6</title>\n",
                            "<path fill=\"none\" stroke=\"#000000\" d=\"M249.2321,-327.7677C236.1674,-318.2204 222.0916,-307.9342 208.6852,-298.1373\"/>\n",
                            "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"210.5784,-295.1858 200.4394,-292.1115 206.4483,-300.8375 210.5784,-295.1858\"/>\n",
                            "</g>\n",
                            "<!-- 11 -->\n",
                            "<g id=\"node12\" class=\"node\">\n",
                            "<title>11</title>\n",
                            "<polygon fill=\"#e58139\" stroke=\"#000000\" points=\"371.7555,-285 234.0814,-285 234.0814,-221 371.7555,-221 371.7555,-285\"/>\n",
                            "<text text-anchor=\"middle\" x=\"302.9185\" y=\"-269.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">gini = 0.0</text>\n",
                            "<text text-anchor=\"middle\" x=\"302.9185\" y=\"-255.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 3</text>\n",
                            "<text text-anchor=\"middle\" x=\"302.9185\" y=\"-241.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = [3, 0]</text>\n",
                            "<text text-anchor=\"middle\" x=\"302.9185\" y=\"-227.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">class = not suspicious</text>\n",
                            "</g>\n",
                            "<!-- 5&#45;&gt;11 -->\n",
                            "<g id=\"edge11\" class=\"edge\">\n",
                            "<title>5&#45;&gt;11</title>\n",
                            "<path fill=\"none\" stroke=\"#000000\" d=\"M302.9185,-327.7677C302.9185,-317.3338 302.9185,-306.0174 302.9185,-295.4215\"/>\n",
                            "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"306.4186,-295.1252 302.9185,-285.1252 299.4186,-295.1252 306.4186,-295.1252\"/>\n",
                            "</g>\n",
                            "<!-- 7 -->\n",
                            "<g id=\"node8\" class=\"node\">\n",
                            "<title>7</title>\n",
                            "<polygon fill=\"#eba069\" stroke=\"#000000\" points=\"215.7555,-178 78.0814,-178 78.0814,-100 215.7555,-100 215.7555,-178\"/>\n",
                            "<text text-anchor=\"middle\" x=\"146.9185\" y=\"-162.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">explode &lt;= 0.5</text>\n",
                            "<text text-anchor=\"middle\" x=\"146.9185\" y=\"-148.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">gini = 0.314</text>\n",
                            "<text text-anchor=\"middle\" x=\"146.9185\" y=\"-134.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 41</text>\n",
                            "<text text-anchor=\"middle\" x=\"146.9185\" y=\"-120.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = [33, 8]</text>\n",
                            "<text text-anchor=\"middle\" x=\"146.9185\" y=\"-106.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">class = not suspicious</text>\n",
                            "</g>\n",
                            "<!-- 6&#45;&gt;7 -->\n",
                            "<g id=\"edge7\" class=\"edge\">\n",
                            "<title>6&#45;&gt;7</title>\n",
                            "<path fill=\"none\" stroke=\"#000000\" d=\"M146.9185,-213.7677C146.9185,-205.6172 146.9185,-196.9283 146.9185,-188.4649\"/>\n",
                            "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"150.4186,-188.3046 146.9185,-178.3046 143.4186,-188.3047 150.4186,-188.3046\"/>\n",
                            "</g>\n",
                            "<!-- 10 -->\n",
                            "<g id=\"node11\" class=\"node\">\n",
                            "<title>10</title>\n",
                            "<polygon fill=\"#e58139\" stroke=\"#000000\" points=\"371.7555,-171 234.0814,-171 234.0814,-107 371.7555,-107 371.7555,-171\"/>\n",
                            "<text text-anchor=\"middle\" x=\"302.9185\" y=\"-155.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">gini = 0.0</text>\n",
                            "<text text-anchor=\"middle\" x=\"302.9185\" y=\"-141.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 1</text>\n",
                            "<text text-anchor=\"middle\" x=\"302.9185\" y=\"-127.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = [1, 0]</text>\n",
                            "<text text-anchor=\"middle\" x=\"302.9185\" y=\"-113.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">class = not suspicious</text>\n",
                            "</g>\n",
                            "<!-- 6&#45;&gt;10 -->\n",
                            "<g id=\"edge10\" class=\"edge\">\n",
                            "<title>6&#45;&gt;10</title>\n",
                            "<path fill=\"none\" stroke=\"#000000\" d=\"M200.6048,-213.7677C216.8162,-201.9209 234.5845,-188.9364 250.6987,-177.1606\"/>\n",
                            "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"252.9488,-179.8513 258.9577,-171.1252 248.8187,-174.1995 252.9488,-179.8513\"/>\n",
                            "</g>\n",
                            "<!-- 8 -->\n",
                            "<g id=\"node9\" class=\"node\">\n",
                            "<title>8</title>\n",
                            "<polygon fill=\"#eca06a\" stroke=\"#000000\" points=\"137.7555,-64 .0814,-64 .0814,0 137.7555,0 137.7555,-64\"/>\n",
                            "<text text-anchor=\"middle\" x=\"68.9185\" y=\"-48.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">gini = 0.32</text>\n",
                            "<text text-anchor=\"middle\" x=\"68.9185\" y=\"-34.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 40</text>\n",
                            "<text text-anchor=\"middle\" x=\"68.9185\" y=\"-20.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = [32, 8]</text>\n",
                            "<text text-anchor=\"middle\" x=\"68.9185\" y=\"-6.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">class = not suspicious</text>\n",
                            "</g>\n",
                            "<!-- 7&#45;&gt;8 -->\n",
                            "<g id=\"edge8\" class=\"edge\">\n",
                            "<title>7&#45;&gt;8</title>\n",
                            "<path fill=\"none\" stroke=\"#000000\" d=\"M118.317,-99.7647C111.8591,-90.9057 104.9907,-81.4838 98.4936,-72.571\"/>\n",
                            "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"101.1227,-70.236 92.4037,-64.2169 95.4661,-74.3595 101.1227,-70.236\"/>\n",
                            "</g>\n",
                            "<!-- 9 -->\n",
                            "<g id=\"node10\" class=\"node\">\n",
                            "<title>9</title>\n",
                            "<polygon fill=\"#e58139\" stroke=\"#000000\" points=\"293.7555,-64 156.0814,-64 156.0814,0 293.7555,0 293.7555,-64\"/>\n",
                            "<text text-anchor=\"middle\" x=\"224.9185\" y=\"-48.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">gini = 0.0</text>\n",
                            "<text text-anchor=\"middle\" x=\"224.9185\" y=\"-34.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 1</text>\n",
                            "<text text-anchor=\"middle\" x=\"224.9185\" y=\"-20.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = [1, 0]</text>\n",
                            "<text text-anchor=\"middle\" x=\"224.9185\" y=\"-6.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">class = not suspicious</text>\n",
                            "</g>\n",
                            "<!-- 7&#45;&gt;9 -->\n",
                            "<g id=\"edge9\" class=\"edge\">\n",
                            "<title>7&#45;&gt;9</title>\n",
                            "<path fill=\"none\" stroke=\"#000000\" d=\"M175.5199,-99.7647C181.9778,-90.9057 188.8462,-81.4838 195.3433,-72.571\"/>\n",
                            "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"198.3708,-74.3595 201.4333,-64.2169 192.7142,-70.236 198.3708,-74.3595\"/>\n",
                            "</g>\n",
                            "<!-- 13 -->\n",
                            "<g id=\"node14\" class=\"node\">\n",
                            "<title>13</title>\n",
                            "<polygon fill=\"#eda877\" stroke=\"#000000\" points=\"527.7555,-406 390.0814,-406 390.0814,-328 527.7555,-328 527.7555,-406\"/>\n",
                            "<text text-anchor=\"middle\" x=\"458.9185\" y=\"-390.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">airbag &lt;= 0.5</text>\n",
                            "<text text-anchor=\"middle\" x=\"458.9185\" y=\"-376.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">gini = 0.363</text>\n",
                            "<text text-anchor=\"middle\" x=\"458.9185\" y=\"-362.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 21</text>\n",
                            "<text text-anchor=\"middle\" x=\"458.9185\" y=\"-348.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = [16, 5]</text>\n",
                            "<text text-anchor=\"middle\" x=\"458.9185\" y=\"-334.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">class = not suspicious</text>\n",
                            "</g>\n",
                            "<!-- 12&#45;&gt;13 -->\n",
                            "<g id=\"edge13\" class=\"edge\">\n",
                            "<title>12&#45;&gt;13</title>\n",
                            "<path fill=\"none\" stroke=\"#000000\" d=\"M458.9185,-441.7677C458.9185,-433.6172 458.9185,-424.9283 458.9185,-416.4649\"/>\n",
                            "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"462.4186,-416.3046 458.9185,-406.3046 455.4186,-416.3047 462.4186,-416.3046\"/>\n",
                            "</g>\n",
                            "<!-- 18 -->\n",
                            "<g id=\"node19\" class=\"node\">\n",
                            "<title>18</title>\n",
                            "<polygon fill=\"#399de5\" stroke=\"#000000\" points=\"662.3666,-399 545.4703,-399 545.4703,-335 662.3666,-335 662.3666,-399\"/>\n",
                            "<text text-anchor=\"middle\" x=\"603.9185\" y=\"-383.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">gini = 0.0</text>\n",
                            "<text text-anchor=\"middle\" x=\"603.9185\" y=\"-369.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 1</text>\n",
                            "<text text-anchor=\"middle\" x=\"603.9185\" y=\"-355.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = [0, 1]</text>\n",
                            "<text text-anchor=\"middle\" x=\"603.9185\" y=\"-341.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">class = suspicious</text>\n",
                            "</g>\n",
                            "<!-- 12&#45;&gt;18 -->\n",
                            "<g id=\"edge18\" class=\"edge\">\n",
                            "<title>12&#45;&gt;18</title>\n",
                            "<path fill=\"none\" stroke=\"#000000\" d=\"M508.8192,-441.7677C523.7493,-430.0296 540.1,-417.1745 554.9683,-405.485\"/>\n",
                            "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"557.3594,-408.0573 563.0575,-399.1252 553.0329,-402.5544 557.3594,-408.0573\"/>\n",
                            "</g>\n",
                            "<!-- 14 -->\n",
                            "<g id=\"node15\" class=\"node\">\n",
                            "<title>14</title>\n",
                            "<polygon fill=\"#fae6d7\" stroke=\"#000000\" points=\"527.7555,-285 390.0814,-285 390.0814,-221 527.7555,-221 527.7555,-285\"/>\n",
                            "<text text-anchor=\"middle\" x=\"458.9185\" y=\"-269.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">gini = 0.494</text>\n",
                            "<text text-anchor=\"middle\" x=\"458.9185\" y=\"-255.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 9</text>\n",
                            "<text text-anchor=\"middle\" x=\"458.9185\" y=\"-241.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = [5, 4]</text>\n",
                            "<text text-anchor=\"middle\" x=\"458.9185\" y=\"-227.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">class = not suspicious</text>\n",
                            "</g>\n",
                            "<!-- 13&#45;&gt;14 -->\n",
                            "<g id=\"edge14\" class=\"edge\">\n",
                            "<title>13&#45;&gt;14</title>\n",
                            "<path fill=\"none\" stroke=\"#000000\" d=\"M458.9185,-327.7677C458.9185,-317.3338 458.9185,-306.0174 458.9185,-295.4215\"/>\n",
                            "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"462.4186,-295.1252 458.9185,-285.1252 455.4186,-295.1252 462.4186,-295.1252\"/>\n",
                            "</g>\n",
                            "<!-- 15 -->\n",
                            "<g id=\"node16\" class=\"node\">\n",
                            "<title>15</title>\n",
                            "<polygon fill=\"#e78c4b\" stroke=\"#000000\" points=\"683.7555,-292 546.0814,-292 546.0814,-214 683.7555,-214 683.7555,-292\"/>\n",
                            "<text text-anchor=\"middle\" x=\"614.9185\" y=\"-276.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">failed &lt;= 0.5</text>\n",
                            "<text text-anchor=\"middle\" x=\"614.9185\" y=\"-262.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">gini = 0.153</text>\n",
                            "<text text-anchor=\"middle\" x=\"614.9185\" y=\"-248.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 12</text>\n",
                            "<text text-anchor=\"middle\" x=\"614.9185\" y=\"-234.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = [11, 1]</text>\n",
                            "<text text-anchor=\"middle\" x=\"614.9185\" y=\"-220.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">class = not suspicious</text>\n",
                            "</g>\n",
                            "<!-- 13&#45;&gt;15 -->\n",
                            "<g id=\"edge15\" class=\"edge\">\n",
                            "<title>13&#45;&gt;15</title>\n",
                            "<path fill=\"none\" stroke=\"#000000\" d=\"M512.6048,-327.7677C525.6695,-318.2204 539.7453,-307.9342 553.1517,-298.1373\"/>\n",
                            "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"555.3887,-300.8375 561.3975,-292.1115 551.2585,-295.1858 555.3887,-300.8375\"/>\n",
                            "</g>\n",
                            "<!-- 16 -->\n",
                            "<g id=\"node17\" class=\"node\">\n",
                            "<title>16</title>\n",
                            "<polygon fill=\"#e88e4d\" stroke=\"#000000\" points=\"605.7555,-171 468.0814,-171 468.0814,-107 605.7555,-107 605.7555,-171\"/>\n",
                            "<text text-anchor=\"middle\" x=\"536.9185\" y=\"-155.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">gini = 0.165</text>\n",
                            "<text text-anchor=\"middle\" x=\"536.9185\" y=\"-141.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 11</text>\n",
                            "<text text-anchor=\"middle\" x=\"536.9185\" y=\"-127.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = [10, 1]</text>\n",
                            "<text text-anchor=\"middle\" x=\"536.9185\" y=\"-113.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">class = not suspicious</text>\n",
                            "</g>\n",
                            "<!-- 15&#45;&gt;16 -->\n",
                            "<g id=\"edge16\" class=\"edge\">\n",
                            "<title>15&#45;&gt;16</title>\n",
                            "<path fill=\"none\" stroke=\"#000000\" d=\"M588.0753,-213.7677C580.4901,-202.6817 572.2233,-190.5994 564.5904,-179.4436\"/>\n",
                            "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"567.4343,-177.4019 558.8989,-171.1252 561.6572,-181.3547 567.4343,-177.4019\"/>\n",
                            "</g>\n",
                            "<!-- 17 -->\n",
                            "<g id=\"node18\" class=\"node\">\n",
                            "<title>17</title>\n",
                            "<polygon fill=\"#e58139\" stroke=\"#000000\" points=\"761.7555,-171 624.0814,-171 624.0814,-107 761.7555,-107 761.7555,-171\"/>\n",
                            "<text text-anchor=\"middle\" x=\"692.9185\" y=\"-155.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">gini = 0.0</text>\n",
                            "<text text-anchor=\"middle\" x=\"692.9185\" y=\"-141.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 1</text>\n",
                            "<text text-anchor=\"middle\" x=\"692.9185\" y=\"-127.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = [1, 0]</text>\n",
                            "<text text-anchor=\"middle\" x=\"692.9185\" y=\"-113.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">class = not suspicious</text>\n",
                            "</g>\n",
                            "<!-- 15&#45;&gt;17 -->\n",
                            "<g id=\"edge17\" class=\"edge\">\n",
                            "<title>15&#45;&gt;17</title>\n",
                            "<path fill=\"none\" stroke=\"#000000\" d=\"M641.7616,-213.7677C649.3468,-202.6817 657.6136,-190.5994 665.2465,-179.4436\"/>\n",
                            "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"668.1798,-181.3547 670.9381,-171.1252 662.4026,-177.4019 668.1798,-181.3547\"/>\n",
                            "</g>\n",
                            "</g>\n",
                            "</svg>\n"
                        ],
                        "text/plain": [
                            "<graphviz.files.Source at 0x11e68e898>"
                        ]
                    },
                    "execution_count": 13,
                    "metadata": {},
                    "output_type": "execute_result"
                }
            ],
            "source": [
                "from sklearn import tree\n",
                "import graphviz\n",
                "\n",
                "# Class names must be listed in the same order as clf.classes_\n",
                "label_names = ['not suspicious', 'suspicious']\n",
                "feature_names = X.columns\n",
                "\n",
                "# Export the fitted tree to DOT source, coloring each node by its majority class\n",
                "dot_data = tree.export_graphviz(clf,\n",
                "                                feature_names=feature_names,\n",
                "                                filled=True,\n",
                "                                class_names=label_names)\n",
                "graph = graphviz.Source(dot_data)\n",
                "graph"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "You can also display the tree with `eli5`, as below. I used `graphviz` above instead because I thought we could use a little color."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 15,
            "metadata": {},
            "outputs": [
                {
                    "data": {
                        "text/html": [
                            "\n",
                            "    <style>\n",
                            "    table.eli5-weights tr:hover {\n",
                            "        filter: brightness(85%);\n",
                            "    }\n",
                            "</style>\n",
                            "\n",
                            "\n",
                            "\n",
                            "    \n",
                            "\n",
                            "    \n",
                            "\n",
                            "    \n",
                            "\n",
                            "    \n",
                            "\n",
                            "    \n",
                            "\n",
                            "    \n",
                            "\n",
                            "\n",
                            "    \n",
                            "\n",
                            "    \n",
                            "\n",
                            "    \n",
                            "\n",
                            "    \n",
                            "\n",
                            "    \n",
                            "\n",
                            "    \n",
                            "\n",
                            "\n",
                            "    \n",
                            "\n",
                            "    \n",
                            "\n",
                            "    \n",
                            "\n",
                            "    \n",
                            "\n",
                            "    \n",
                            "        <table class=\"eli5-weights eli5-feature-importances\" style=\"border-collapse: collapse; border: none; margin-top: 0em; table-layout: auto;\">\n",
                            "    <thead>\n",
                            "    <tr style=\"border: none;\">\n",
                            "        <th style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">Weight</th>\n",
                            "        <th style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">Feature</th>\n",
                            "    </tr>\n",
                            "    </thead>\n",
                            "    <tbody>\n",
                            "    \n",
                            "        <tr style=\"background-color: hsl(120, 100.00%, 80.00%); border: none;\">\n",
                            "            <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
                            "                0.3440\n",
                            "                \n",
                            "            </td>\n",
                            "            <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
                            "                airbag\n",
                            "            </td>\n",
                            "        </tr>\n",
                            "    \n",
                            "        <tr style=\"background-color: hsl(120, 100.00%, 86.19%); border: none;\">\n",
                            "            <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
                            "                0.2026\n",
                            "                \n",
                            "            </td>\n",
                            "            <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
                            "                violent\n",
                            "            </td>\n",
                            "        </tr>\n",
                            "    \n",
                            "        <tr style=\"background-color: hsl(120, 100.00%, 88.66%); border: none;\">\n",
                            "            <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
                            "                0.1529\n",
                            "                \n",
                            "            </td>\n",
                            "            <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
                            "                air bag\n",
                            "            </td>\n",
                            "        </tr>\n",
                            "    \n",
                            "        <tr style=\"background-color: hsl(120, 100.00%, 89.10%); border: none;\">\n",
                            "            <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
                            "                0.1445\n",
                            "                \n",
                            "            </td>\n",
                            "            <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
                            "                explode\n",
                            "            </td>\n",
                            "        </tr>\n",
                            "    \n",
                            "        <tr style=\"background-color: hsl(120, 100.00%, 90.40%); border: none;\">\n",
                            "            <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
                            "                0.1205\n",
                            "                \n",
                            "            </td>\n",
                            "            <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
                            "                did not deploy\n",
                            "            </td>\n",
                            "        </tr>\n",
                            "    \n",
                            "        <tr style=\"background-color: hsl(120, 100.00%, 96.67%); border: none;\">\n",
                            "            <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
                            "                0.0266\n",
                            "                \n",
                            "            </td>\n",
                            "            <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
                            "                failed\n",
                            "            </td>\n",
                            "        </tr>\n",
                            "    \n",
                            "        <tr style=\"background-color: hsl(120, 100.00%, 98.43%); border: none;\">\n",
                            "            <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
                            "                0.0091\n",
                            "                \n",
                            "            </td>\n",
                            "            <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
                            "                shrapnel\n",
                            "            </td>\n",
                            "        </tr>\n",
                            "    \n",
                            "    \n",
                            "    </tbody>\n",
                            "</table>\n",
                            "    \n",
                            "\n",
                            "    \n",
                            "\n",
                            "\n",
                            "    \n",
                            "\n",
                            "    \n",
                            "\n",
                            "    \n",
                            "\n",
                            "    \n",
                            "\n",
                            "    \n",
                            "\n",
                            "    \n",
                            "        \n",
                            "        <br>\n",
                            "        <pre><svg width=\"853pt\" height=\"870pt\"\n",
                            " viewBox=\"0.00 0.00 852.84 870.00\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n",
                            "<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 866)\">\n",
                            "<title>Tree</title>\n",
                            "<polygon fill=\"#ffffff\" stroke=\"transparent\" points=\"-4,4 -4,-866 848.8369,-866 848.8369,4 -4,4\"/>\n",
                            "<!-- 0 -->\n",
                            "<g id=\"node1\" class=\"node\">\n",
                            "<title>0</title>\n",
                            "<polygon fill=\"none\" stroke=\"#000000\" points=\"679.9563,-862 539.8806,-862 539.8806,-784 679.9563,-784 679.9563,-862\"/>\n",
                            "<text text-anchor=\"middle\" x=\"609.9185\" y=\"-846.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">violent &lt;= 0.5</text>\n",
                            "<text text-anchor=\"middle\" x=\"609.9185\" y=\"-832.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">gini = 0.165</text>\n",
                            "<text text-anchor=\"middle\" x=\"609.9185\" y=\"-818.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 100.0%</text>\n",
                            "<text text-anchor=\"middle\" x=\"609.9185\" y=\"-804.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = [0.909, 0.091]</text>\n",
                            "<text text-anchor=\"middle\" x=\"609.9185\" y=\"-790.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">class = not suspicious</text>\n",
                            "</g>\n",
                            "<!-- 1 -->\n",
                            "<g id=\"node2\" class=\"node\">\n",
                            "<title>1</title>\n",
                            "<polygon fill=\"none\" stroke=\"#000000\" points=\"606.9563,-748 466.8806,-748 466.8806,-670 606.9563,-670 606.9563,-748\"/>\n",
                            "<text text-anchor=\"middle\" x=\"536.9185\" y=\"-732.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">did not deploy &lt;= 0.5</text>\n",
                            "<text text-anchor=\"middle\" x=\"536.9185\" y=\"-718.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">gini = 0.156</text>\n",
                            "<text text-anchor=\"middle\" x=\"536.9185\" y=\"-704.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 99.4%</text>\n",
                            "<text text-anchor=\"middle\" x=\"536.9185\" y=\"-690.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = [0.915, 0.085]</text>\n",
                            "<text text-anchor=\"middle\" x=\"536.9185\" y=\"-676.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">class = not suspicious</text>\n",
                            "</g>\n",
                            "<!-- 0&#45;&gt;1 -->\n",
                            "<g id=\"edge1\" class=\"edge\">\n",
                            "<title>0&#45;&gt;1</title>\n",
                            "<path fill=\"none\" stroke=\"#000000\" d=\"M584.796,-783.7677C579.2327,-775.0798 573.2777,-765.7801 567.5234,-756.794\"/>\n",
                            "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"570.4274,-754.8386 562.0872,-748.3046 564.5324,-758.6135 570.4274,-754.8386\"/>\n",
                            "<text text-anchor=\"middle\" x=\"556.7331\" y=\"-768.5246\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">True</text>\n",
                            "</g>\n",
                            "<!-- 20 -->\n",
                            "<g id=\"node21\" class=\"node\">\n",
                            "<title>20</title>\n",
                            "<polygon fill=\"none\" stroke=\"#000000\" points=\"741.3666,-741 624.4703,-741 624.4703,-677 741.3666,-677 741.3666,-741\"/>\n",
                            "<text text-anchor=\"middle\" x=\"682.9185\" y=\"-725.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">gini = 0.0</text>\n",
                            "<text text-anchor=\"middle\" x=\"682.9185\" y=\"-711.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 0.6%</text>\n",
                            "<text text-anchor=\"middle\" x=\"682.9185\" y=\"-697.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = [0.0, 1.0]</text>\n",
                            "<text text-anchor=\"middle\" x=\"682.9185\" y=\"-683.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">class = suspicious</text>\n",
                            "</g>\n",
                            "<!-- 0&#45;&gt;20 -->\n",
                            "<g id=\"edge20\" class=\"edge\">\n",
                            "<title>0&#45;&gt;20</title>\n",
                            "<path fill=\"none\" stroke=\"#000000\" d=\"M635.0409,-783.7677C642.0702,-772.7904 649.7251,-760.8362 656.8101,-749.772\"/>\n",
                            "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"659.9019,-751.434 662.3471,-741.1252 654.0069,-747.6591 659.9019,-751.434\"/>\n",
                            "<text text-anchor=\"middle\" x=\"667.7012\" y=\"-761.3451\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">False</text>\n",
                            "</g>\n",
                            "<!-- 2 -->\n",
                            "<g id=\"node3\" class=\"node\">\n",
                            "<title>2</title>\n",
                            "<polygon fill=\"none\" stroke=\"#000000\" points=\"528.9563,-634 388.8806,-634 388.8806,-556 528.9563,-556 528.9563,-634\"/>\n",
                            "<text text-anchor=\"middle\" x=\"458.9185\" y=\"-618.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">air bag &lt;= 0.5</text>\n",
                            "<text text-anchor=\"middle\" x=\"458.9185\" y=\"-604.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">gini = 0.212</text>\n",
                            "<text text-anchor=\"middle\" x=\"458.9185\" y=\"-590.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 70.3%</text>\n",
                            "<text text-anchor=\"middle\" x=\"458.9185\" y=\"-576.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = [0.879, 0.121]</text>\n",
                            "<text text-anchor=\"middle\" x=\"458.9185\" y=\"-562.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">class = not suspicious</text>\n",
                            "</g>\n",
                            "<!-- 1&#45;&gt;2 -->\n",
                            "<g id=\"edge2\" class=\"edge\">\n",
                            "<title>1&#45;&gt;2</title>\n",
                            "<path fill=\"none\" stroke=\"#000000\" d=\"M510.0753,-669.7677C504.131,-661.0798 497.768,-651.7801 491.6196,-642.794\"/>\n",
                            "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"494.3466,-640.5813 485.8111,-634.3046 488.5694,-644.5341 494.3466,-640.5813\"/>\n",
                            "</g>\n",
                            "<!-- 19 -->\n",
                            "<g id=\"node20\" class=\"node\">\n",
                            "<title>19</title>\n",
                            "<polygon fill=\"none\" stroke=\"#000000\" points=\"684.7555,-627 547.0814,-627 547.0814,-563 684.7555,-563 684.7555,-627\"/>\n",
                            "<text text-anchor=\"middle\" x=\"615.9185\" y=\"-611.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">gini = 0.0</text>\n",
                            "<text text-anchor=\"middle\" x=\"615.9185\" y=\"-597.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 29.1%</text>\n",
                            "<text text-anchor=\"middle\" x=\"615.9185\" y=\"-583.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = [1.0, 0.0]</text>\n",
                            "<text text-anchor=\"middle\" x=\"615.9185\" y=\"-569.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">class = not suspicious</text>\n",
                            "</g>\n",
                            "<!-- 1&#45;&gt;19 -->\n",
                            "<g id=\"edge19\" class=\"edge\">\n",
                            "<title>1&#45;&gt;19</title>\n",
                            "<path fill=\"none\" stroke=\"#000000\" d=\"M564.1058,-669.7677C571.7882,-658.6817 580.161,-646.5994 587.8917,-635.4436\"/>\n",
                            "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"590.8371,-637.3381 593.6563,-627.1252 585.0836,-633.351 590.8371,-637.3381\"/>\n",
                            "</g>\n",
                            "<!-- 3 -->\n",
                            "<g id=\"node4\" class=\"node\">\n",
                            "<title>3</title>\n",
                            "<polygon fill=\"none\" stroke=\"#000000\" points=\"449.9563,-520 309.8806,-520 309.8806,-442 449.9563,-442 449.9563,-520\"/>\n",
                            "<text text-anchor=\"middle\" x=\"379.9185\" y=\"-504.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">airbag &lt;= 0.5</text>\n",
                            "<text text-anchor=\"middle\" x=\"379.9185\" y=\"-490.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">gini = 0.156</text>\n",
                            "<text text-anchor=\"middle\" x=\"379.9185\" y=\"-476.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 57.0%</text>\n",
                            "<text text-anchor=\"middle\" x=\"379.9185\" y=\"-462.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = [0.915, 0.085]</text>\n",
                            "<text text-anchor=\"middle\" x=\"379.9185\" y=\"-448.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">class = not suspicious</text>\n",
                            "</g>\n",
                            "<!-- 2&#45;&gt;3 -->\n",
                            "<g id=\"edge3\" class=\"edge\">\n",
                            "<title>2&#45;&gt;3</title>\n",
                            "<path fill=\"none\" stroke=\"#000000\" d=\"M431.7311,-555.7677C425.7106,-547.0798 419.2661,-537.7801 413.0388,-528.794\"/>\n",
                            "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"415.7285,-526.5304 407.1559,-520.3046 409.975,-530.5176 415.7285,-526.5304\"/>\n",
                            "</g>\n",
                            "<!-- 12 -->\n",
                            "<g id=\"node13\" class=\"node\">\n",
                            "<title>12</title>\n",
                            "<polygon fill=\"none\" stroke=\"#000000\" points=\"608.9563,-520 468.8806,-520 468.8806,-442 608.9563,-442 608.9563,-520\"/>\n",
                            "<text text-anchor=\"middle\" x=\"538.9185\" y=\"-504.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">explode &lt;= 0.5</text>\n",
                            "<text text-anchor=\"middle\" x=\"538.9185\" y=\"-490.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">gini = 0.397</text>\n",
                            "<text text-anchor=\"middle\" x=\"538.9185\" y=\"-476.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 13.3%</text>\n",
                            "<text text-anchor=\"middle\" x=\"538.9185\" y=\"-462.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = [0.727, 0.273]</text>\n",
                            "<text text-anchor=\"middle\" x=\"538.9185\" y=\"-448.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">class = not suspicious</text>\n",
                            "</g>\n",
                            "<!-- 2&#45;&gt;12 -->\n",
                            "<g id=\"edge12\" class=\"edge\">\n",
                            "<title>2&#45;&gt;12</title>\n",
                            "<path fill=\"none\" stroke=\"#000000\" d=\"M486.4499,-555.7677C492.6095,-546.9903 499.2074,-537.5883 505.5738,-528.5161\"/>\n",
                            "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"508.4569,-530.5007 511.3362,-520.3046 502.727,-526.4797 508.4569,-530.5007\"/>\n",
                            "</g>\n",
                            "<!-- 4 -->\n",
                            "<g id=\"node5\" class=\"node\">\n",
                            "<title>4</title>\n",
                            "<polygon fill=\"none\" stroke=\"#000000\" points=\"291.7555,-399 154.0814,-399 154.0814,-335 291.7555,-335 291.7555,-399\"/>\n",
                            "<text text-anchor=\"middle\" x=\"222.9185\" y=\"-383.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">gini = 0.0</text>\n",
                            "<text text-anchor=\"middle\" x=\"222.9185\" y=\"-369.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 29.7%</text>\n",
                            "<text text-anchor=\"middle\" x=\"222.9185\" y=\"-355.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = [1.0, 0.0]</text>\n",
                            "<text text-anchor=\"middle\" x=\"222.9185\" y=\"-341.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">class = not suspicious</text>\n",
                            "</g>\n",
                            "<!-- 3&#45;&gt;4 -->\n",
                            "<g id=\"edge4\" class=\"edge\">\n",
                            "<title>3&#45;&gt;4</title>\n",
                            "<path fill=\"none\" stroke=\"#000000\" d=\"M325.888,-441.7677C309.5727,-429.9209 291.6905,-416.9364 275.4729,-405.1606\"/>\n",
                            "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"277.3094,-402.1687 267.1611,-399.1252 273.1964,-407.833 277.3094,-402.1687\"/>\n",
                            "</g>\n",
                            "<!-- 5 -->\n",
                            "<g id=\"node6\" class=\"node\">\n",
                            "<title>5</title>\n",
                            "<polygon fill=\"none\" stroke=\"#000000\" points=\"449.9563,-406 309.8806,-406 309.8806,-328 449.9563,-328 449.9563,-406\"/>\n",
                            "<text text-anchor=\"middle\" x=\"379.9185\" y=\"-390.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">failed &lt;= 0.5</text>\n",
                            "<text text-anchor=\"middle\" x=\"379.9185\" y=\"-376.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">gini = 0.292</text>\n",
                            "<text text-anchor=\"middle\" x=\"379.9185\" y=\"-362.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 27.3%</text>\n",
                            "<text text-anchor=\"middle\" x=\"379.9185\" y=\"-348.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = [0.822, 0.178]</text>\n",
                            "<text text-anchor=\"middle\" x=\"379.9185\" y=\"-334.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">class = not suspicious</text>\n",
                            "</g>\n",
                            "<!-- 3&#45;&gt;5 -->\n",
                            "<g id=\"edge5\" class=\"edge\">\n",
                            "<title>3&#45;&gt;5</title>\n",
                            "<path fill=\"none\" stroke=\"#000000\" d=\"M379.9185,-441.7677C379.9185,-433.6172 379.9185,-424.9283 379.9185,-416.4649\"/>\n",
                            "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"383.4186,-416.3046 379.9185,-406.3046 376.4186,-416.3047 383.4186,-416.3046\"/>\n",
                            "</g>\n",
                            "<!-- 6 -->\n",
                            "<g id=\"node7\" class=\"node\">\n",
                            "<title>6</title>\n",
                            "<polygon fill=\"none\" stroke=\"#000000\" points=\"293.7555,-292 156.0814,-292 156.0814,-214 293.7555,-214 293.7555,-292\"/>\n",
                            "<text text-anchor=\"middle\" x=\"224.9185\" y=\"-276.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">shrapnel &lt;= 0.5</text>\n",
                            "<text text-anchor=\"middle\" x=\"224.9185\" y=\"-262.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">gini = 0.308</text>\n",
                            "<text text-anchor=\"middle\" x=\"224.9185\" y=\"-248.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 25.5%</text>\n",
                            "<text text-anchor=\"middle\" x=\"224.9185\" y=\"-234.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = [0.81, 0.19]</text>\n",
                            "<text text-anchor=\"middle\" x=\"224.9185\" y=\"-220.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">class = not suspicious</text>\n",
                            "</g>\n",
                            "<!-- 5&#45;&gt;6 -->\n",
                            "<g id=\"edge6\" class=\"edge\">\n",
                            "<title>5&#45;&gt;6</title>\n",
                            "<path fill=\"none\" stroke=\"#000000\" d=\"M326.5762,-327.7677C313.5953,-318.2204 299.6097,-307.9342 286.2893,-298.1373\"/>\n",
                            "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"288.2258,-295.2169 278.0963,-292.1115 284.0784,-300.8559 288.2258,-295.2169\"/>\n",
                            "</g>\n",
                            "<!-- 11 -->\n",
                            "<g id=\"node12\" class=\"node\">\n",
                            "<title>11</title>\n",
                            "<polygon fill=\"none\" stroke=\"#000000\" points=\"449.7555,-285 312.0814,-285 312.0814,-221 449.7555,-221 449.7555,-285\"/>\n",
                            "<text text-anchor=\"middle\" x=\"380.9185\" y=\"-269.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">gini = 0.0</text>\n",
                            "<text text-anchor=\"middle\" x=\"380.9185\" y=\"-255.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 1.8%</text>\n",
                            "<text text-anchor=\"middle\" x=\"380.9185\" y=\"-241.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = [1.0, 0.0]</text>\n",
                            "<text text-anchor=\"middle\" x=\"380.9185\" y=\"-227.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">class = not suspicious</text>\n",
                            "</g>\n",
                            "<!-- 5&#45;&gt;11 -->\n",
                            "<g id=\"edge11\" class=\"edge\">\n",
                            "<title>5&#45;&gt;11</title>\n",
                            "<path fill=\"none\" stroke=\"#000000\" d=\"M380.2626,-327.7677C380.3541,-317.3338 380.4534,-306.0174 380.5463,-295.4215\"/>\n",
                            "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"384.0487,-295.1555 380.6367,-285.1252 377.049,-295.0941 384.0487,-295.1555\"/>\n",
                            "</g>\n",
                            "<!-- 7 -->\n",
                            "<g id=\"node8\" class=\"node\">\n",
                            "<title>7</title>\n",
                            "<polygon fill=\"none\" stroke=\"#000000\" points=\"216.9563,-178 76.8806,-178 76.8806,-100 216.9563,-100 216.9563,-178\"/>\n",
                            "<text text-anchor=\"middle\" x=\"146.9185\" y=\"-162.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">explode &lt;= 0.5</text>\n",
                            "<text text-anchor=\"middle\" x=\"146.9185\" y=\"-148.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">gini = 0.314</text>\n",
                            "<text text-anchor=\"middle\" x=\"146.9185\" y=\"-134.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 24.8%</text>\n",
                            "<text text-anchor=\"middle\" x=\"146.9185\" y=\"-120.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = [0.805, 0.195]</text>\n",
                            "<text text-anchor=\"middle\" x=\"146.9185\" y=\"-106.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">class = not suspicious</text>\n",
                            "</g>\n",
                            "<!-- 6&#45;&gt;7 -->\n",
                            "<g id=\"edge7\" class=\"edge\">\n",
                            "<title>6&#45;&gt;7</title>\n",
                            "<path fill=\"none\" stroke=\"#000000\" d=\"M198.0753,-213.7677C192.131,-205.0798 185.768,-195.7801 179.6196,-186.794\"/>\n",
                            "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"182.3466,-184.5813 173.8111,-178.3046 176.5694,-188.5341 182.3466,-184.5813\"/>\n",
                            "</g>\n",
                            "<!-- 10 -->\n",
                            "<g id=\"node11\" class=\"node\">\n",
                            "<title>10</title>\n",
                            "<polygon fill=\"none\" stroke=\"#000000\" points=\"372.7555,-171 235.0814,-171 235.0814,-107 372.7555,-107 372.7555,-171\"/>\n",
                            "<text text-anchor=\"middle\" x=\"303.9185\" y=\"-155.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">gini = 0.0</text>\n",
                            "<text text-anchor=\"middle\" x=\"303.9185\" y=\"-141.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 0.6%</text>\n",
                            "<text text-anchor=\"middle\" x=\"303.9185\" y=\"-127.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = [1.0, 0.0]</text>\n",
                            "<text text-anchor=\"middle\" x=\"303.9185\" y=\"-113.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">class = not suspicious</text>\n",
                            "</g>\n",
                            "<!-- 6&#45;&gt;10 -->\n",
                            "<g id=\"edge10\" class=\"edge\">\n",
                            "<title>6&#45;&gt;10</title>\n",
                            "<path fill=\"none\" stroke=\"#000000\" d=\"M252.1058,-213.7677C259.7882,-202.6817 268.161,-190.5994 275.8917,-179.4436\"/>\n",
                            "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"278.8371,-181.3381 281.6563,-171.1252 273.0836,-177.351 278.8371,-181.3381\"/>\n",
                            "</g>\n",
                            "<!-- 8 -->\n",
                            "<g id=\"node9\" class=\"node\">\n",
                            "<title>8</title>\n",
                            "<polygon fill=\"none\" stroke=\"#000000\" points=\"137.7555,-64 .0814,-64 .0814,0 137.7555,0 137.7555,-64\"/>\n",
                            "<text text-anchor=\"middle\" x=\"68.9185\" y=\"-48.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">gini = 0.32</text>\n",
                            "<text text-anchor=\"middle\" x=\"68.9185\" y=\"-34.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 24.2%</text>\n",
                            "<text text-anchor=\"middle\" x=\"68.9185\" y=\"-20.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = [0.8, 0.2]</text>\n",
                            "<text text-anchor=\"middle\" x=\"68.9185\" y=\"-6.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">class = not suspicious</text>\n",
                            "</g>\n",
                            "<!-- 7&#45;&gt;8 -->\n",
                            "<g id=\"edge8\" class=\"edge\">\n",
                            "<title>7&#45;&gt;8</title>\n",
                            "<path fill=\"none\" stroke=\"#000000\" d=\"M118.317,-99.7647C111.8591,-90.9057 104.9907,-81.4838 98.4936,-72.571\"/>\n",
                            "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"101.1227,-70.236 92.4037,-64.2169 95.4661,-74.3595 101.1227,-70.236\"/>\n",
                            "</g>\n",
                            "<!-- 9 -->\n",
                            "<g id=\"node10\" class=\"node\">\n",
                            "<title>9</title>\n",
                            "<polygon fill=\"none\" stroke=\"#000000\" points=\"293.7555,-64 156.0814,-64 156.0814,0 293.7555,0 293.7555,-64\"/>\n",
                            "<text text-anchor=\"middle\" x=\"224.9185\" y=\"-48.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">gini = 0.0</text>\n",
                            "<text text-anchor=\"middle\" x=\"224.9185\" y=\"-34.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 0.6%</text>\n",
                            "<text text-anchor=\"middle\" x=\"224.9185\" y=\"-20.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = [1.0, 0.0]</text>\n",
                            "<text text-anchor=\"middle\" x=\"224.9185\" y=\"-6.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">class = not suspicious</text>\n",
                            "</g>\n",
                            "<!-- 7&#45;&gt;9 -->\n",
                            "<g id=\"edge9\" class=\"edge\">\n",
                            "<title>7&#45;&gt;9</title>\n",
                            "<path fill=\"none\" stroke=\"#000000\" d=\"M175.5199,-99.7647C181.9778,-90.9057 188.8462,-81.4838 195.3433,-72.571\"/>\n",
                            "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"198.3708,-74.3595 201.4333,-64.2169 192.7142,-70.236 198.3708,-74.3595\"/>\n",
                            "</g>\n",
                            "<!-- 13 -->\n",
                            "<g id=\"node14\" class=\"node\">\n",
                            "<title>13</title>\n",
                            "<polygon fill=\"none\" stroke=\"#000000\" points=\"608.9563,-406 468.8806,-406 468.8806,-328 608.9563,-328 608.9563,-406\"/>\n",
                            "<text text-anchor=\"middle\" x=\"538.9185\" y=\"-390.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">airbag &lt;= 0.5</text>\n",
                            "<text text-anchor=\"middle\" x=\"538.9185\" y=\"-376.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">gini = 0.363</text>\n",
                            "<text text-anchor=\"middle\" x=\"538.9185\" y=\"-362.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 12.7%</text>\n",
                            "<text text-anchor=\"middle\" x=\"538.9185\" y=\"-348.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = [0.762, 0.238]</text>\n",
                            "<text text-anchor=\"middle\" x=\"538.9185\" y=\"-334.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">class = not suspicious</text>\n",
                            "</g>\n",
                            "<!-- 12&#45;&gt;13 -->\n",
                            "<g id=\"edge13\" class=\"edge\">\n",
                            "<title>12&#45;&gt;13</title>\n",
                            "<path fill=\"none\" stroke=\"#000000\" d=\"M538.9185,-441.7677C538.9185,-433.6172 538.9185,-424.9283 538.9185,-416.4649\"/>\n",
                            "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"542.4186,-416.3046 538.9185,-406.3046 535.4186,-416.3047 542.4186,-416.3046\"/>\n",
                            "</g>\n",
                            "<!-- 18 -->\n",
                            "<g id=\"node19\" class=\"node\">\n",
                            "<title>18</title>\n",
                            "<polygon fill=\"none\" stroke=\"#000000\" points=\"743.3666,-399 626.4703,-399 626.4703,-335 743.3666,-335 743.3666,-399\"/>\n",
                            "<text text-anchor=\"middle\" x=\"684.9185\" y=\"-383.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">gini = 0.0</text>\n",
                            "<text text-anchor=\"middle\" x=\"684.9185\" y=\"-369.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 0.6%</text>\n",
                            "<text text-anchor=\"middle\" x=\"684.9185\" y=\"-355.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = [0.0, 1.0]</text>\n",
                            "<text text-anchor=\"middle\" x=\"684.9185\" y=\"-341.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">class = suspicious</text>\n",
                            "</g>\n",
                            "<!-- 12&#45;&gt;18 -->\n",
                            "<g id=\"edge18\" class=\"edge\">\n",
                            "<title>12&#45;&gt;18</title>\n",
                            "<path fill=\"none\" stroke=\"#000000\" d=\"M589.1634,-441.7677C604.1964,-430.0296 620.6598,-417.1745 635.6307,-405.485\"/>\n",
                            "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"638.0478,-408.0382 643.7757,-399.1252 633.7397,-402.5209 638.0478,-408.0382\"/>\n",
                            "</g>\n",
                            "<!-- 14 -->\n",
                            "<g id=\"node15\" class=\"node\">\n",
                            "<title>14</title>\n",
                            "<polygon fill=\"none\" stroke=\"#000000\" points=\"608.9563,-285 468.8806,-285 468.8806,-221 608.9563,-221 608.9563,-285\"/>\n",
                            "<text text-anchor=\"middle\" x=\"538.9185\" y=\"-269.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">gini = 0.494</text>\n",
                            "<text text-anchor=\"middle\" x=\"538.9185\" y=\"-255.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 5.5%</text>\n",
                            "<text text-anchor=\"middle\" x=\"538.9185\" y=\"-241.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = [0.556, 0.444]</text>\n",
                            "<text text-anchor=\"middle\" x=\"538.9185\" y=\"-227.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">class = not suspicious</text>\n",
                            "</g>\n",
                            "<!-- 13&#45;&gt;14 -->\n",
                            "<g id=\"edge14\" class=\"edge\">\n",
                            "<title>13&#45;&gt;14</title>\n",
                            "<path fill=\"none\" stroke=\"#000000\" d=\"M538.9185,-327.7677C538.9185,-317.3338 538.9185,-306.0174 538.9185,-295.4215\"/>\n",
                            "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"542.4186,-295.1252 538.9185,-285.1252 535.4186,-295.1252 542.4186,-295.1252\"/>\n",
                            "</g>\n",
                            "<!-- 15 -->\n",
                            "<g id=\"node16\" class=\"node\">\n",
                            "<title>15</title>\n",
                            "<polygon fill=\"none\" stroke=\"#000000\" points=\"767.9563,-292 627.8806,-292 627.8806,-214 767.9563,-214 767.9563,-292\"/>\n",
                            "<text text-anchor=\"middle\" x=\"697.9185\" y=\"-276.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">failed &lt;= 0.5</text>\n",
                            "<text text-anchor=\"middle\" x=\"697.9185\" y=\"-262.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">gini = 0.153</text>\n",
                            "<text text-anchor=\"middle\" x=\"697.9185\" y=\"-248.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 7.3%</text>\n",
                            "<text text-anchor=\"middle\" x=\"697.9185\" y=\"-234.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = [0.917, 0.083]</text>\n",
                            "<text text-anchor=\"middle\" x=\"697.9185\" y=\"-220.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">class = not suspicious</text>\n",
                            "</g>\n",
                            "<!-- 13&#45;&gt;15 -->\n",
                            "<g id=\"edge15\" class=\"edge\">\n",
                            "<title>13&#45;&gt;15</title>\n",
                            "<path fill=\"none\" stroke=\"#000000\" d=\"M593.6372,-327.7677C606.9532,-318.2204 621.2997,-307.9342 634.9639,-298.1373\"/>\n",
                            "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"637.2807,-300.7828 643.3683,-292.1115 633.2019,-295.0939 637.2807,-300.7828\"/>\n",
                            "</g>\n",
                            "<!-- 16 -->\n",
                            "<g id=\"node17\" class=\"node\">\n",
                            "<title>16</title>\n",
                            "<polygon fill=\"none\" stroke=\"#000000\" points=\"688.9563,-171 548.8806,-171 548.8806,-107 688.9563,-107 688.9563,-171\"/>\n",
                            "<text text-anchor=\"middle\" x=\"618.9185\" y=\"-155.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">gini = 0.165</text>\n",
                            "<text text-anchor=\"middle\" x=\"618.9185\" y=\"-141.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 6.7%</text>\n",
                            "<text text-anchor=\"middle\" x=\"618.9185\" y=\"-127.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = [0.909, 0.091]</text>\n",
                            "<text text-anchor=\"middle\" x=\"618.9185\" y=\"-113.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">class = not suspicious</text>\n",
                            "</g>\n",
                            "<!-- 15&#45;&gt;16 -->\n",
                            "<g id=\"edge16\" class=\"edge\">\n",
                            "<title>15&#45;&gt;16</title>\n",
                            "<path fill=\"none\" stroke=\"#000000\" d=\"M670.7311,-213.7677C663.0488,-202.6817 654.676,-190.5994 646.9452,-179.4436\"/>\n",
                            "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"649.7533,-177.351 641.1807,-171.1252 643.9998,-181.3381 649.7533,-177.351\"/>\n",
                            "</g>\n",
                            "<!-- 17 -->\n",
                            "<g id=\"node18\" class=\"node\">\n",
                            "<title>17</title>\n",
                            "<polygon fill=\"none\" stroke=\"#000000\" points=\"844.7555,-171 707.0814,-171 707.0814,-107 844.7555,-107 844.7555,-171\"/>\n",
                            "<text text-anchor=\"middle\" x=\"775.9185\" y=\"-155.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">gini = 0.0</text>\n",
                            "<text text-anchor=\"middle\" x=\"775.9185\" y=\"-141.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 0.6%</text>\n",
                            "<text text-anchor=\"middle\" x=\"775.9185\" y=\"-127.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = [1.0, 0.0]</text>\n",
                            "<text text-anchor=\"middle\" x=\"775.9185\" y=\"-113.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">class = not suspicious</text>\n",
                            "</g>\n",
                            "<!-- 15&#45;&gt;17 -->\n",
                            "<g id=\"edge17\" class=\"edge\">\n",
                            "<title>15&#45;&gt;17</title>\n",
                            "<path fill=\"none\" stroke=\"#000000\" d=\"M724.7616,-213.7677C732.3468,-202.6817 740.6136,-190.5994 748.2465,-179.4436\"/>\n",
                            "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"751.1798,-181.3547 753.9381,-171.1252 745.4026,-177.4019 751.1798,-181.3547\"/>\n",
                            "</g>\n",
                            "</g>\n",
                            "</svg>\n",
                            "</pre>\n",
                            "    \n",
                            "\n",
                            "\n",
                            "\n"
                        ],
                        "text/plain": [
                            "<IPython.core.display.HTML object>"
                        ]
                    },
                    "execution_count": 15,
                    "metadata": {},
                    "output_type": "execute_result"
                }
            ],
            "source": [
                "feature_names = list(X.columns)\n",
                "\n",
                "eli5.show_weights(clf,\n",
                "                  feature_names=feature_names,\n",
                "                  target_names=label_names)"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "And the best part is: almost everything you can do with a logistic regression classifier you can do with a decision tree. Most of the time you can just **change your classifier to see if it does better.**\n",
                "\n",
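                "Decision trees also have a lot of simple options. For example, `max_depth=2` below limits the tree to at most two yes/no questions before it makes a prediction."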
            ]
        },
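        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "To make the \"just change your classifier\" idea concrete, here is a minimal sketch that runs a logistic regression and a depth-limited decision tree through exactly the same `fit` and `predict` calls. It uses a tiny made-up table of keyword columns (everything named `toy_...` is a hypothetical stand-in for our real `train_df`), so its output says nothing about the actual complaints."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {},
            "outputs": [],
            "source": [
                "import pandas as pd\n",
                "from sklearn.linear_model import LogisticRegression\n",
                "from sklearn.tree import DecisionTreeClassifier\n",
                "\n",
                "# Hypothetical toy data, NOT the real complaints: 1 means the complaint uses that word\n",
                "toy_df = pd.DataFrame({\n",
                "    'airbag':        [1, 1, 0, 1, 0, 0, 1, 0],\n",
                "    'explode':       [1, 1, 0, 1, 0, 0, 0, 0],\n",
                "    'violent':       [0, 1, 0, 0, 0, 1, 0, 0],\n",
                "    'is_suspicious': [1, 1, 0, 1, 0, 0, 0, 0],\n",
                "})\n",
                "toy_X = toy_df.drop(columns='is_suspicious')\n",
                "toy_y = toy_df.is_suspicious\n",
                "\n",
                "# Identical fit/predict calls for both models; only the classifier object changes\n",
                "toy_predictions = {}\n",
                "for toy_clf in [LogisticRegression(), DecisionTreeClassifier(max_depth=2)]:\n",
                "    toy_clf.fit(toy_X, toy_y)\n",
                "    toy_predictions[type(toy_clf).__name__] = toy_clf.predict(toy_X)\n",
                "\n",
                "toy_predictions"
            ]
        },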
        {
            "cell_type": "code",
            "execution_count": 16,
            "metadata": {},
            "outputs": [
                {
                    "data": {
                        "text/plain": [
                            "DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=2,\n",
                            "                       max_features=None, max_leaf_nodes=None,\n",
                            "                       min_impurity_decrease=0.0, min_impurity_split=None,\n",
                            "                       min_samples_leaf=1, min_samples_split=2,\n",
                            "                       min_weight_fraction_leaf=0.0, presort=False,\n",
                            "                       random_state=None, splitter='best')"
                        ]
                    },
                    "execution_count": 16,
                    "metadata": {},
                    "output_type": "execute_result"
                }
            ],
            "source": [
                "from sklearn.tree import DecisionTreeClassifier\n",
                "\n",
                "X = train_df.drop(columns='is_suspicious')\n",
                "y = train_df.is_suspicious\n",
                "\n",
                "clf = DecisionTreeClassifier(max_depth=2)\n",
                "\n",
                "clf.fit(X, y)"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 17,
            "metadata": {},
            "outputs": [
                {
                    "data": {
                        "image/svg+xml": [
                            "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n",
                            "<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n",
                            " \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n",
                            "<!-- Generated by graphviz version 2.40.1 (20161225.0304)\n",
                            " -->\n",
                            "<!-- Title: Tree Pages: 1 -->\n",
                            "<svg width=\"358pt\" height=\"300pt\"\n",
                            " viewBox=\"0.00 0.00 358.14 300.00\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n",
                            "<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 296)\">\n",
                            "<title>Tree</title>\n",
                            "<polygon fill=\"#ffffff\" stroke=\"transparent\" points=\"-4,4 -4,-296 354.1421,-296 354.1421,4 -4,4\"/>\n",
                            "<!-- 0 -->\n",
                            "<g id=\"node1\" class=\"node\">\n",
                            "<title>0</title>\n",
                            "<polygon fill=\"#e88e4d\" stroke=\"#000000\" points=\"287.7555,-292 150.0814,-292 150.0814,-214 287.7555,-214 287.7555,-292\"/>\n",
                            "<text text-anchor=\"middle\" x=\"218.9185\" y=\"-276.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">violent &lt;= 0.5</text>\n",
                            "<text text-anchor=\"middle\" x=\"218.9185\" y=\"-262.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">gini = 0.165</text>\n",
                            "<text text-anchor=\"middle\" x=\"218.9185\" y=\"-248.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 165</text>\n",
                            "<text text-anchor=\"middle\" x=\"218.9185\" y=\"-234.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = [150, 15]</text>\n",
                            "<text text-anchor=\"middle\" x=\"218.9185\" y=\"-220.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">class = not suspicious</text>\n",
                            "</g>\n",
                            "<!-- 1 -->\n",
                            "<g id=\"node2\" class=\"node\">\n",
                            "<title>1</title>\n",
                            "<polygon fill=\"#e78d4b\" stroke=\"#000000\" points=\"215.7555,-178 78.0814,-178 78.0814,-100 215.7555,-100 215.7555,-178\"/>\n",
                            "<text text-anchor=\"middle\" x=\"146.9185\" y=\"-162.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">did not deploy &lt;= 0.5</text>\n",
                            "<text text-anchor=\"middle\" x=\"146.9185\" y=\"-148.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">gini = 0.156</text>\n",
                            "<text text-anchor=\"middle\" x=\"146.9185\" y=\"-134.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 164</text>\n",
                            "<text text-anchor=\"middle\" x=\"146.9185\" y=\"-120.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = [150, 14]</text>\n",
                            "<text text-anchor=\"middle\" x=\"146.9185\" y=\"-106.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">class = not suspicious</text>\n",
                            "</g>\n",
                            "<!-- 0&#45;&gt;1 -->\n",
                            "<g id=\"edge1\" class=\"edge\">\n",
                            "<title>0&#45;&gt;1</title>\n",
                            "<path fill=\"none\" stroke=\"#000000\" d=\"M194.1401,-213.7677C188.6531,-205.0798 182.7796,-195.7801 177.1041,-186.794\"/>\n",
                            "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"180.0416,-184.8906 171.7424,-178.3046 174.1232,-188.6285 180.0416,-184.8906\"/>\n",
                            "<text text-anchor=\"middle\" x=\"166.2359\" y=\"-198.4907\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">True</text>\n",
                            "</g>\n",
                            "<!-- 4 -->\n",
                            "<g id=\"node5\" class=\"node\">\n",
                            "<title>4</title>\n",
                            "<polygon fill=\"#399de5\" stroke=\"#000000\" points=\"350.3666,-171 233.4703,-171 233.4703,-107 350.3666,-107 350.3666,-171\"/>\n",
                            "<text text-anchor=\"middle\" x=\"291.9185\" y=\"-155.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">gini = 0.0</text>\n",
                            "<text text-anchor=\"middle\" x=\"291.9185\" y=\"-141.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 1</text>\n",
                            "<text text-anchor=\"middle\" x=\"291.9185\" y=\"-127.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = [0, 1]</text>\n",
                            "<text text-anchor=\"middle\" x=\"291.9185\" y=\"-113.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">class = suspicious</text>\n",
                            "</g>\n",
                            "<!-- 0&#45;&gt;4 -->\n",
                            "<g id=\"edge4\" class=\"edge\">\n",
                            "<title>0&#45;&gt;4</title>\n",
                            "<path fill=\"none\" stroke=\"#000000\" d=\"M244.0409,-213.7677C251.0702,-202.7904 258.7251,-190.8362 265.8101,-179.772\"/>\n",
                            "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"268.9019,-181.434 271.3471,-171.1252 263.0069,-177.6591 268.9019,-181.434\"/>\n",
                            "<text text-anchor=\"middle\" x=\"276.7012\" y=\"-191.3451\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">False</text>\n",
                            "</g>\n",
                            "<!-- 2 -->\n",
                            "<g id=\"node3\" class=\"node\">\n",
                            "<title>2</title>\n",
                            "<polygon fill=\"#e99254\" stroke=\"#000000\" points=\"137.7555,-64 .0814,-64 .0814,0 137.7555,0 137.7555,-64\"/>\n",
                            "<text text-anchor=\"middle\" x=\"68.9185\" y=\"-48.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">gini = 0.212</text>\n",
                            "<text text-anchor=\"middle\" x=\"68.9185\" y=\"-34.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 116</text>\n",
                            "<text text-anchor=\"middle\" x=\"68.9185\" y=\"-20.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = [102, 14]</text>\n",
                            "<text text-anchor=\"middle\" x=\"68.9185\" y=\"-6.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">class = not suspicious</text>\n",
                            "</g>\n",
                            "<!-- 1&#45;&gt;2 -->\n",
                            "<g id=\"edge2\" class=\"edge\">\n",
                            "<title>1&#45;&gt;2</title>\n",
                            "<path fill=\"none\" stroke=\"#000000\" d=\"M118.317,-99.7647C111.8591,-90.9057 104.9907,-81.4838 98.4936,-72.571\"/>\n",
                            "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"101.1227,-70.236 92.4037,-64.2169 95.4661,-74.3595 101.1227,-70.236\"/>\n",
                            "</g>\n",
                            "<!-- 3 -->\n",
                            "<g id=\"node4\" class=\"node\">\n",
                            "<title>3</title>\n",
                            "<polygon fill=\"#e58139\" stroke=\"#000000\" points=\"293.7555,-64 156.0814,-64 156.0814,0 293.7555,0 293.7555,-64\"/>\n",
                            "<text text-anchor=\"middle\" x=\"224.9185\" y=\"-48.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">gini = 0.0</text>\n",
                            "<text text-anchor=\"middle\" x=\"224.9185\" y=\"-34.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 48</text>\n",
                            "<text text-anchor=\"middle\" x=\"224.9185\" y=\"-20.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = [48, 0]</text>\n",
                            "<text text-anchor=\"middle\" x=\"224.9185\" y=\"-6.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">class = not suspicious</text>\n",
                            "</g>\n",
                            "<!-- 1&#45;&gt;3 -->\n",
                            "<g id=\"edge3\" class=\"edge\">\n",
                            "<title>1&#45;&gt;3</title>\n",
                            "<path fill=\"none\" stroke=\"#000000\" d=\"M175.5199,-99.7647C181.9778,-90.9057 188.8462,-81.4838 195.3433,-72.571\"/>\n",
                            "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"198.3708,-74.3595 201.4333,-64.2169 192.7142,-70.236 198.3708,-74.3595\"/>\n",
                            "</g>\n",
                            "</g>\n",
                            "</svg>\n"
                        ],
                        "text/plain": [
                            "<graphviz.files.Source at 0x11e6d9898>"
                        ]
                    },
                    "execution_count": 17,
                    "metadata": {},
                    "output_type": "execute_result"
                }
            ],
            "source": [
                "from sklearn import tree\n",
                "import graphviz\n",
                "\n",
                "label_names = ['not suspicious', 'suspicious']\n",
                "feature_names = X.columns\n",
                "\n",
                "dot_data = tree.export_graphviz(clf,\n",
                "                                feature_names=feature_names,\n",
                "                                filled=True,\n",
                "                                class_names=label_names)\n",
                "graph = graphviz.Source(dot_data)\n",
                "graph"
            ]
        },
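        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "Each box in the tree reports a `gini` score, which measures how mixed the complaints in that box are: 0 means they all share one label. A minimal sketch of the standard Gini impurity formula (1 minus the sum of the squared share of each class), applied to the `value` counts printed in the boxes of the tree diagram, reproduces the scores it shows. The `gini` helper is our own illustration, not part of scikit-learn."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {},
            "outputs": [],
            "source": [
                "# Gini impurity: 1 minus the sum of squared class proportions\n",
                "def gini(counts):\n",
                "    total = sum(counts)\n",
                "    return 1 - sum((count / total) ** 2 for count in counts)\n",
                "\n",
                "# [not suspicious, suspicious] counts copied from the boxes in the tree diagram\n",
                "for counts in ([150, 15], [150, 14], [102, 14], [48, 0]):\n",
                "    print(counts, round(gini(counts), 3))"
            ]
        },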
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "### A random forest is usually even better\n",
                "\n",
                "In this case, though, our inputs are terrible, so it still won't do very well. Garbage in, garbage out.\n",
                "\n",
                "We'll change our classifier to be\n",
                "\n",
                "```python\n",
                "clf = RandomForestClassifier(n_estimators=100)\n",
                "```\n",
                "\n",
                "and it will use 100 decision **trees** to make a **forest**."
            ]
        },
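        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "In scikit-learn, a fitted `RandomForestClassifier` keeps its individual trees in the `estimators_` attribute, and its `predict_proba` averages the probability estimates of those trees. A minimal sketch on random made-up data (`toy_X` and `toy_y` are hypothetical, not our complaints) checks both facts."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {},
            "outputs": [],
            "source": [
                "import numpy as np\n",
                "from sklearn.ensemble import RandomForestClassifier\n",
                "\n",
                "# Hypothetical random 0/1 features and labels, just so there is something to fit\n",
                "rng = np.random.RandomState(0)\n",
                "toy_X = rng.randint(0, 2, size=(50, 4))\n",
                "toy_y = rng.randint(0, 2, size=50)\n",
                "\n",
                "forest = RandomForestClassifier(n_estimators=100, random_state=0)\n",
                "forest.fit(toy_X, toy_y)\n",
                "\n",
                "# The forest is made of 100 separate decision trees\n",
                "print(len(forest.estimators_))\n",
                "\n",
                "# The forest's probabilities are the mean of its trees' probabilities\n",
                "tree_average = np.mean([t.predict_proba(toy_X) for t in forest.estimators_], axis=0)\n",
                "print(np.allclose(tree_average, forest.predict_proba(toy_X)))"
            ]
        },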
        {
            "cell_type": "code",
            "execution_count": 18,
            "metadata": {},
            "outputs": [
                {
                    "data": {
                        "text/plain": [
                            "RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',\n",
                            "                       max_depth=None, max_features='auto', max_leaf_nodes=None,\n",
                            "                       min_impurity_decrease=0.0, min_impurity_split=None,\n",
                            "                       min_samples_leaf=1, min_samples_split=2,\n",
                            "                       min_weight_fraction_leaf=0.0, n_estimators=100,\n",
                            "                       n_jobs=None, oob_score=False, random_state=None,\n",
                            "                       verbose=0, warm_start=False)"
                        ]
                    },
                    "execution_count": 18,
                    "metadata": {},
                    "output_type": "execute_result"
                }
            ],
            "source": [
                "from sklearn.ensemble import RandomForestClassifier\n",
                "\n",
                "X = train_df.drop(columns='is_suspicious')\n",
                "y = train_df.is_suspicious\n",
                "\n",
                "clf = RandomForestClassifier(n_estimators=100)\n",
                "\n",
                "clf.fit(X, y)"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 19,
            "metadata": {},
            "outputs": [
                {
                    "data": {
                        "text/html": [
                            "<div>\n",
                            "<style scoped>\n",
                            "    .dataframe tbody tr th:only-of-type {\n",
                            "        vertical-align: middle;\n",
                            "    }\n",
                            "\n",
                            "    .dataframe tbody tr th {\n",
                            "        vertical-align: top;\n",
                            "    }\n",
                            "\n",
                            "    .dataframe thead th {\n",
                            "        text-align: right;\n",
                            "    }\n",
                            "</style>\n",
                            "<table border=\"1\" class=\"dataframe\">\n",
                            "  <thead>\n",
                            "    <tr style=\"text-align: right;\">\n",
                            "      <th></th>\n",
                            "      <th>Predicted not suspicious</th>\n",
                            "      <th>Predicted suspicious</th>\n",
                            "    </tr>\n",
                            "  </thead>\n",
                            "  <tbody>\n",
                            "    <tr>\n",
                            "      <th>Is not suspicious</th>\n",
                            "      <td>150</td>\n",
                            "      <td>0</td>\n",
                            "    </tr>\n",
                            "    <tr>\n",
                            "      <th>Is suspicious</th>\n",
                            "      <td>13</td>\n",
                            "      <td>2</td>\n",
                            "    </tr>\n",
                            "  </tbody>\n",
                            "</table>\n",
                            "</div>"
                        ],
                        "text/plain": [
                            "                   Predicted not suspicious  Predicted suspicious\n",
                            "Is not suspicious                       150                     0\n",
                            "Is suspicious                            13                     2"
                        ]
                    },
                    "execution_count": 19,
                    "metadata": {},
                    "output_type": "execute_result"
                }
            ],
            "source": [
                "from sklearn.metrics import confusion_matrix\n",
                "\n",
                "# Careful: we're predicting on the same rows the forest was trained on\n",
                "y_true = y\n",
                "y_pred = clf.predict(X)\n",
                "\n",
                "matrix = confusion_matrix(y_true, y_pred)\n",
                "\n",
                "label_names = pd.Series(['not suspicious', 'suspicious'])\n",
                "pd.DataFrame(matrix,\n",
                "     columns='Predicted ' + label_names,\n",
                "     index='Is ' + label_names)"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 20,
            "metadata": {},
            "outputs": [
                {
                    "data": {
                        "text/html": [
                            "\n",
                            "    <style>\n",
                            "    table.eli5-weights tr:hover {\n",
                            "        filter: brightness(85%);\n",
                            "    }\n",
                            "</style>\n",
                            "\n",
                            "\n",
                            "\n",
                            "    \n",
                            "        <p>Explained as: feature importances</p>\n",
                            "    \n",
                            "\n",
                            "    \n",
                            "\n",
                            "    \n",
                            "\n",
                            "    \n",
                            "\n",
                            "    \n",
                            "\n",
                            "    \n",
                            "\n",
                            "\n",
                            "    \n",
                            "\n",
                            "    \n",
                            "        \n",
                            "        <pre>\n",
                            "Random forest feature importances; values are numbers 0 <= x <= 1;\n",
                            "all values sum to 1.\n",
                            "</pre>\n",
                            "    \n",
                            "\n",
                            "    \n",
                            "\n",
                            "    \n",
                            "\n",
                            "    \n",
                            "\n",
                            "    \n",
                            "\n",
                            "\n",
                            "    \n",
                            "\n",
                            "    \n",
                            "\n",
                            "    \n",
                            "\n",
                            "    \n",
                            "\n",
                            "    \n",
                            "\n",
                            "    \n",
                            "\n",
                            "\n",
                            "    \n",
                            "\n",
                            "    \n",
                            "\n",
                            "    \n",
                            "\n",
                            "    \n",
                            "\n",
                            "    \n",
                            "\n",
                            "    \n",
                            "\n",
                            "\n",
                            "    \n",
                            "\n",
                            "    \n",
                            "\n",
                            "    \n",
                            "\n",
                            "    \n",
                            "\n",
                            "    \n",
                            "        <table class=\"eli5-weights eli5-feature-importances\" style=\"border-collapse: collapse; border: none; margin-top: 0em; table-layout: auto;\">\n",
                            "    <thead>\n",
                            "    <tr style=\"border: none;\">\n",
                            "        <th style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">Weight</th>\n",
                            "        <th style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">Feature</th>\n",
                            "    </tr>\n",
                            "    </thead>\n",
                            "    <tbody>\n",
                            "    \n",
                            "        <tr style=\"background-color: hsl(120, 100.00%, 80.00%); border: none;\">\n",
                            "            <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
                            "                0.2394\n",
                            "                \n",
                            "                    &plusmn; 0.3216\n",
                            "                \n",
                            "            </td>\n",
                            "            <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
                            "                did not deploy\n",
                            "            </td>\n",
                            "        </tr>\n",
                            "    \n",
                            "        <tr style=\"background-color: hsl(120, 100.00%, 81.01%); border: none;\">\n",
                            "            <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
                            "                0.2223\n",
                            "                \n",
                            "                    &plusmn; 0.3465\n",
                            "                \n",
                            "            </td>\n",
                            "            <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
                            "                airbag\n",
                            "            </td>\n",
                            "        </tr>\n",
                            "    \n",
                            "        <tr style=\"background-color: hsl(120, 100.00%, 81.75%); border: none;\">\n",
                            "            <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
                            "                0.2101\n",
                            "                \n",
                            "                    &plusmn; 0.4087\n",
                            "                \n",
                            "            </td>\n",
                            "            <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
                            "                air bag\n",
                            "            </td>\n",
                            "        </tr>\n",
                            "    \n",
                            "        <tr style=\"background-color: hsl(120, 100.00%, 85.29%); border: none;\">\n",
                            "            <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
                            "                0.1543\n",
                            "                \n",
                            "                    &plusmn; 0.2972\n",
                            "                \n",
                            "            </td>\n",
                            "            <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
                            "                violent\n",
                            "            </td>\n",
                            "        </tr>\n",
                            "    \n",
                            "        <tr style=\"background-color: hsl(120, 100.00%, 86.85%); border: none;\">\n",
                            "            <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
                            "                0.1314\n",
                            "                \n",
                            "                    &plusmn; 0.2854\n",
                            "                \n",
                            "            </td>\n",
                            "            <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
                            "                explode\n",
                            "            </td>\n",
                            "        </tr>\n",
                            "    \n",
                            "        <tr style=\"background-color: hsl(120, 100.00%, 94.59%); border: none;\">\n",
                            "            <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
                            "                0.0369\n",
                            "                \n",
                            "                    &plusmn; 0.0527\n",
                            "                \n",
                            "            </td>\n",
                            "            <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
                            "                failed\n",
                            "            </td>\n",
                            "        </tr>\n",
                            "    \n",
                            "        <tr style=\"background-color: hsl(120, 100.00%, 98.54%); border: none;\">\n",
                            "            <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
                            "                0.0057\n",
                            "                \n",
                            "                    &plusmn; 0.0190\n",
                            "                \n",
                            "            </td>\n",
                            "            <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
                            "                shrapnel\n",
                            "            </td>\n",
                            "        </tr>\n",
                            "    \n",
                            "    \n",
                            "    </tbody>\n",
                            "</table>\n",
                            "    \n",
                            "\n",
                            "    \n",
                            "\n",
                            "\n",
                            "    \n",
                            "\n",
                            "    \n",
                            "\n",
                            "    \n",
                            "\n",
                            "    \n",
                            "\n",
                            "    \n",
                            "\n",
                            "    \n",
                            "\n",
                            "\n",
                            "\n"
                        ],
                        "text/plain": [
                            "<IPython.core.display.HTML object>"
                        ]
                    },
                    "execution_count": 20,
                    "metadata": {},
                    "output_type": "execute_result"
                }
            ],
            "source": [
                "feature_names = list(X.columns)\n",
                "\n",
                "eli5.show_weights(clf, feature_names=feature_names, show=eli5.formatters.fields.ALL)"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "## Review\n",
                "\n",
                "In our previous two attempts at the Takata airbag investigation, we used a **logistic regression classifier**. This time we tried a new type called a **random forest**, which performed _slightly_ better (although that could have just been chance).\n",
                "\n",
                "Despite this slight improvement, its predictions were still way off: even on the complaints it was trained on, it caught only 2 of the 15 suspicious ones."
            ]
        },
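        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "One way to put a number on how far off the forest is: rebuild label lists from its confusion matrix (150 correctly cleared, 13 suspicious complaints missed, 2 caught) and score them with precision and recall. This is a sketch: the lists are reconstructed from those counts rather than taken from the notebook's `y_true` and `y_pred`, and since the forest was scored on its own training data, its performance on new complaints could well be worse. With these counts, precision is 2 / 2 = 1.0 but recall is only 2 / 15, about 0.13: the few complaints the forest flags are right, but it misses most of the suspicious ones."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {},
            "outputs": [],
            "source": [
                "from sklearn.metrics import precision_score, recall_score\n",
                "\n",
                "# Labels rebuilt from the confusion matrix counts (1 = suspicious, 0 = not suspicious):\n",
                "#   150 not suspicious -> predicted not suspicious\n",
                "#   13 suspicious -> predicted not suspicious (missed)\n",
                "#   2 suspicious -> predicted suspicious (caught)\n",
                "rebuilt_true = [0] * 150 + [1] * 13 + [1] * 2\n",
                "rebuilt_pred = [0] * 150 + [0] * 13 + [1] * 2\n",
                "\n",
                "# Precision: of the complaints we flagged, how many were really suspicious?\n",
                "print('precision:', precision_score(rebuilt_true, rebuilt_pred))\n",
                "# Recall: of the truly suspicious complaints, how many did we flag?\n",
                "print('recall:', recall_score(rebuilt_true, rebuilt_pred))"
            ]
        },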
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "## Discussion topics\n",
                "\n",
                "What's wrong here? Why does nothing work for us, even though we keep throwing more machine learning tools at it?"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {},
            "outputs": [],
            "source": []
        }
    ],
    "metadata": {
        "kernelspec": {
            "display_name": "Python 3",
            "language": "python",
            "name": "python3"
        },
        "language_info": {
            "codemirror_mode": {
                "name": "ipython",
                "version": 3
            },
            "file_extension": ".py",
            "mimetype": "text/x-python",
            "name": "python",
            "nbconvert_exporter": "python",
            "pygments_lexer": "ipython3",
            "version": "3.6.8"
        },
        "toc": {
            "base_numbering": 1,
            "nav_menu": {},
            "number_sections": true,
            "sideBar": true,
            "skip_h1_title": false,
            "title_cell": "Table of Contents",
            "title_sidebar": "Contents",
            "toc_cell": false,
            "toc_position": {},
            "toc_section_display": true,
            "toc_window_display": false
        }
    },
    "nbformat": 4,
    "nbformat_minor": 2
}