{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Finding faulty airbags in a sea of consumer complaints with a decision tree\n", "\n", "**The story:**\n", "- https://www.nytimes.com/2014/09/12/business/air-bag-flaw-long-known-led-to-recalls.html\n", "- https://www.nytimes.com/2014/11/07/business/airbag-maker-takata-is-said-to-have-conducted-secret-tests.html\n", "- https://www.nytimes.com/interactive/2015/06/22/business/international/takata-airbag-recall-list.html\n", "- https://www.nytimes.com/2016/08/27/business/takata-airbag-recall-crisis.html\n", "\n", "This story, by The New York Times, investigates complaints made to the National Highway Traffic Safety Administration (NHTSA) by customers who had bad experiences with Takata airbags in their cars. Eventually, car companies had to recall airbags made by Takata, a supplier that promised a cheaper alternative.\n", "\n", "**Author:** Daeil Kim did a more complex version of this particular analysis - [presentation here](https://www.slideshare.net/mortardata/daeil-kim-at-the-nyc-data-science-meetup)\n", "\n", "**Topics:** Decision Trees, Random Forests\n", "\n", "**Datasets**\n", "\n", "* **sampled-labeled.csv:** a sample of vehicle complaints, labeled as suspicious or not\n", "\n", "## What's the goal?\n", "\n", "It was too much work to read twenty years of vehicle comments to find the ones related to dangerous airbags! Because we're lazy, we wanted the computer to do this for us. We did this before with a classifier that used logistic regression; now we're going to try a different one." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<p class=\"reading-options\">\n <a class=\"btn\" href=\"/nyt-takata-airbags/airbag-classifier-search-decision-tree\">\n <i class=\"fa fa-sm fa-book\"></i>\n Read online\n </a>\n <a class=\"btn\" href=\"/nyt-takata-airbags/notebooks/Airbag classifier search (Decision Tree).ipynb\">\n <i class=\"fa fa-sm fa-download\"></i>\n Download notebook\n </a>\n <a class=\"btn\" href=\"https://colab.research.google.com/github/littlecolumns/ds4j-notebooks/blob/master/nyt-takata-airbags/notebooks/Airbag classifier search (Decision Tree).ipynb\" target=\"_new\">\n <i class=\"fa fa-sm fa-laptop\"></i>\n Interactive version\n </a>\n</p>" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Prep work: Downloading necessary files\n", "Before we get started, we need to download all of the data we'll be using.\n", "* **sampled-labeled.csv:** labeled complaints - a sample of vehicle complaints, labeled as suspicious or not\n" ] }, { "cell_type": "code", "metadata": {}, "source": [ "# Make data directory if it doesn't exist\n", "!mkdir -p data\n", "!wget -nc https://nyc3.digitaloceanspaces.com/ml-files-distro/v1/nyt-takata-airbags/data/sampled-labeled.csv -P data" ], "outputs": [], "execution_count": null }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Our code\n", "\n", "## Setup" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "\n", "# Allow us to display 100 columns at a time, and 100 characters in each column (instead of ...)\n", "pd.set_option(\"display.max_columns\", 100)\n", "pd.set_option(\"display.max_colwidth\", 100)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Read in our labeled data\n", "\n", "We aren't going to be using the unlabeled dataset this time; we're only going to look at **how our classifier works.** We'll start by reading in our complaints that have labels attached to them.\n", "\n", 
"**Read in `sampled-labeled.csv` and check how many suspicious/not suspicious complaints we have.**" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>is_suspicious</th>\n", " <th>CDESCR</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>0</th>\n", " <td>0.0</td>\n", " <td>ALTHOUGH I LOVED THE CAR OVERALL AT THE TIME I DECIDED TO OWN, , MY DREAM CAR CADILLAC CTS HAS T...</td>\n", " </tr>\n", " <tr>\n", " <th>1</th>\n", " <td>0.0</td>\n", " <td>CONSUMER SHUT SLIDING DOOR WHEN ALL POWER LOCKS ON ALL DOORS LOCKED BY ITSELF, TRAPPING INFANT I...</td>\n", " </tr>\n", " <tr>\n", " <th>2</th>\n", " <td>0.0</td>\n", " <td>DRIVERS SEAT BACK COLLAPSED AND BENT WHEN REAR ENDED. PLEASE DESCRIBE DETAILS. TT</td>\n", " </tr>\n", " <tr>\n", " <th>3</th>\n", " <td>0.0</td>\n", " <td>TL* THE CONTACT OWNS A 2009 NISSAN ALTIMA. THE CONTACT STATED THAT THE START BUTTON FOR THE IGNI...</td>\n", " </tr>\n", " <tr>\n", " <th>4</th>\n", " <td>0.0</td>\n", " <td>THE FRONT MIDDLE SEAT DOESN'T LOCK IN PLACE. *AK</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " is_suspicious \\\n", "0 0.0 \n", "1 0.0 \n", "2 0.0 \n", "3 0.0 \n", "4 0.0 \n", "\n", " CDESCR \n", "0 ALTHOUGH I LOVED THE CAR OVERALL AT THE TIME I DECIDED TO OWN, , MY DREAM CAR CADILLAC CTS HAS T... \n", "1 CONSUMER SHUT SLIDING DOOR WHEN ALL POWER LOCKS ON ALL DOORS LOCKED BY ITSELF, TRAPPING INFANT I... \n", "2 DRIVERS SEAT BACK COLLAPSED AND BENT WHEN REAR ENDED. PLEASE DESCRIBE DETAILS. TT \n", "3 TL* THE CONTACT OWNS A 2009 NISSAN ALTIMA. 
THE CONTACT STATED THAT THE START BUTTON FOR THE IGNI... \n", "4 THE FRONT MIDDLE SEAT DOESN'T LOCK IN PLACE. *AK " ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "labeled = pd.read_csv(\"data/sampled-labeled.csv\")\n", "labeled.head()" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.0 150\n", "1.0 15\n", "Name: is_suspicious, dtype: int64" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "labeled.is_suspicious.value_counts()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "150 non-suspicious and 15 suspicious is a pretty terrible ratio, but we're remarkably lazy and not very many of the comments are actually suspicious.\n", "\n", "Now that we've read a few, let's train our classifier.\n", "\n", "## Creating features\n", "\n", "### Selecting our features and building a features dataframe\n", "\n", "Last time, we thought of some words or phrases that might make a comment interesting or not interesting. We came up with this list:\n", "\n", "* airbag\n", "* air bag\n", "* failed\n", "* did not deploy\n", "* violent\n", "* explode\n", "* shrapnel\n", "\n", "These **features** are the things that the machine learning algorithm is going to look for when it's reading. There are lots of words in each complaint, but these are the only ones we'll tell the classifier to pay attention to!\n", "\n", "To determine if a word is in `CDESCR`, we can use `.str.contains`. Because computers only like numbers, though, we need to use `.astype(int)` to change it from `True`/`False` to `1`/`0`. 
" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>is_suspicious</th>\n", " <th>airbag</th>\n", " <th>air bag</th>\n", " <th>failed</th>\n", " <th>did not deploy</th>\n", " <th>violent</th>\n", " <th>explode</th>\n", " <th>shrapnel</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>0</th>\n", " <td>0.0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " </tr>\n", " <tr>\n", " <th>1</th>\n", " <td>0.0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " </tr>\n", " <tr>\n", " <th>2</th>\n", " <td>0.0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " </tr>\n", " <tr>\n", " <th>3</th>\n", " <td>0.0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " </tr>\n", " <tr>\n", " <th>4</th>\n", " <td>0.0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " is_suspicious airbag air bag failed did not deploy violent explode \\\n", "0 0.0 0 0 0 0 0 0 \n", "1 0.0 0 0 0 0 0 0 \n", "2 0.0 0 0 0 0 0 0 \n", "3 0.0 0 0 0 0 0 0 \n", "4 0.0 0 0 0 0 0 0 \n", "\n", " shrapnel \n", "0 0 \n", "1 0 \n", "2 0 \n", "3 0 \n", "4 0 " ] }, "execution_count": 4, "metadata": {}, "output_type": 
"execute_result" } ], "source": [ "train_df = pd.DataFrame({\n", " 'is_suspicious': labeled.is_suspicious,\n", " 'airbag': labeled.CDESCR.str.contains(\"AIRBAG\", na=False).astype(int),\n", " 'air bag': labeled.CDESCR.str.contains(\"AIR BAG\", na=False).astype(int),\n", " 'failed': labeled.CDESCR.str.contains(\"FAILED\", na=False).astype(int),\n", " 'did not deploy': labeled.CDESCR.str.contains(\"DID NOT DEPLOY\", na=False).astype(int),\n", " 'violent': labeled.CDESCR.str.contains(\"VIOLENT\", na=False).astype(int),\n", " 'explode': labeled.CDESCR.str.contains(\"EXPLODE\", na=False).astype(int),\n", " 'shrapnel': labeled.CDESCR.str.contains(\"SHRAPNEL\", na=False).astype(int),\n", "})\n", "train_df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's see how big our dataset is, and then remove any rows that are missing data (not all of them are labeled)." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(350, 8)" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "train_df.shape" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(165, 8)" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "train_df = train_df.dropna()\n", "train_df.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Creating our classifier\n", "\n", "Any time you're bulding a classifier, doing regression, or most anything with machine learning, you're using a **model**. It **models** the relationship between the inputs and the outputs.\n", "\n", "### Classification with Decision Trees\n", "\n", "Last time we used a classifier based on **Logistic Regression**. First we split into `X` (our features) and `y` (our labels), and trained the classifier on them." 
] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "LogisticRegression(C=1000000000.0, class_weight=None, dual=False,\n", " fit_intercept=True, intercept_scaling=1, l1_ratio=None,\n", " max_iter=100, multi_class='warn', n_jobs=None, penalty='l2',\n", " random_state=None, solver='lbfgs', tol=0.0001, verbose=0,\n", " warm_start=False)" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from sklearn.linear_model import LogisticRegression\n", "\n", "X = train_df.drop(columns='is_suspicious')\n", "y = train_df.is_suspicious\n", "\n", "clf = LogisticRegression(C=1e9, solver='lbfgs')\n", "\n", "clf.fit(X, y)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "After we built our classifier, we tested it and found it didn't work very well." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>Predicted not suspicious</th>\n", " <th>Predicted suspicious</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>Is not suspicious</th>\n", " <td>150</td>\n", " <td>0</td>\n", " </tr>\n", " <tr>\n", " <th>Is suspicious</th>\n", " <td>13</td>\n", " <td>2</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " Predicted not suspicious Predicted suspicious\n", "Is not suspicious 150 0\n", "Is suspicious 13 2" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from sklearn.metrics import confusion_matrix\n", "\n", "y_true = y\n", "y_pred = clf.predict(X)\n", "\n", "matrix 
= confusion_matrix(y_true, y_pred)\n", "\n", "label_names = pd.Series(['not suspicious', 'suspicious'])\n", "pd.DataFrame(matrix,\n", " columns='Predicted ' + label_names,\n", " index='Is ' + label_names)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To understand a logistic regression classifier, we looked at the coefficients and the odds ratios." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>feature</th>\n", " <th>coefficient (log odds ratio)</th>\n", " <th>odds ratio</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>4</th>\n", " <td>violent</td>\n", " <td>41.423096</td>\n", " <td>9.768364e+17</td>\n", " </tr>\n", " <tr>\n", " <th>5</th>\n", " <td>explode</td>\n", " <td>1.269048</td>\n", " <td>3.557500e+00</td>\n", " </tr>\n", " <tr>\n", " <th>1</th>\n", " <td>air bag</td>\n", " <td>1.268123</td>\n", " <td>3.554200e+00</td>\n", " </tr>\n", " <tr>\n", " <th>0</th>\n", " <td>airbag</td>\n", " <td>0.945612</td>\n", " <td>2.574400e+00</td>\n", " </tr>\n", " <tr>\n", " <th>2</th>\n", " <td>failed</td>\n", " <td>-27.175214</td>\n", " <td>0.000000e+00</td>\n", " </tr>\n", " <tr>\n", " <th>3</th>\n", " <td>did not deploy</td>\n", " <td>-37.906428</td>\n", " <td>0.000000e+00</td>\n", " </tr>\n", " <tr>\n", " <th>6</th>\n", " <td>shrapnel</td>\n", " <td>-13.204894</td>\n", " <td>0.000000e+00</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " feature coefficient (log odds ratio) odds ratio\n", "4 violent 41.423096 9.768364e+17\n", "5 explode 1.269048 3.557500e+00\n", "1 air 
bag 1.268123 3.554200e+00\n", "0 airbag 0.945612 2.574400e+00\n", "2 failed -27.175214 0.000000e+00\n", "3 did not deploy -37.906428 0.000000e+00\n", "6 shrapnel -13.204894 0.000000e+00" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import numpy as np\n", "\n", "feature_names = X.columns\n", "coefficients = clf.coef_[0]\n", "\n", "pd.DataFrame({\n", " 'feature': feature_names,\n", " 'coefficient (log odds ratio)': coefficients,\n", " 'odds ratio': np.exp(coefficients).round(4)\n", "}).sort_values(by='odds ratio', ascending=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Classification with Decision Trees\n", "\n", "We can also use a classifier called a **decision tree**. All you need to do is have one new import and change the line where you create your classifier." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=None,\n", " max_features=None, max_leaf_nodes=None,\n", " min_impurity_decrease=0.0, min_impurity_split=None,\n", " min_samples_leaf=1, min_samples_split=2,\n", " min_weight_fraction_leaf=0.0, presort=False,\n", " random_state=None, splitter='best')" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#from sklearn.linear_model import LogisticRegression\n", "from sklearn.tree import DecisionTreeClassifier\n", "\n", "X = train_df.drop(columns='is_suspicious')\n", "y = train_df.is_suspicious\n", "\n", "#clf = LogisticRegression(C=1e9, solver='lbfgs')\n", "clf = DecisionTreeClassifier()\n", "\n", "clf.fit(X, y)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Confusion matrix code looks exactly the same." 
] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>Predicted not suspicious</th>\n", " <th>Predicted suspicious</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>Is not suspicious</th>\n", " <td>150</td>\n", " <td>0</td>\n", " </tr>\n", " <tr>\n", " <th>Is suspicious</th>\n", " <td>13</td>\n", " <td>2</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " Predicted not suspicious Predicted suspicious\n", "Is not suspicious 150 0\n", "Is suspicious 13 2" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from sklearn.metrics import confusion_matrix\n", "\n", "y_true = y\n", "y_pred = clf.predict(X)\n", "\n", "matrix = confusion_matrix(y_true, y_pred)\n", "\n", "label_names = pd.Series(['not suspicious', 'suspicious'])\n", "pd.DataFrame(matrix,\n", " columns='Predicted ' + label_names,\n", " index='Is ' + label_names)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When using a decision tree, **using the classifier is the same, but the code to understand the classifier is a bit different.** Instead of coefficients, we're going to look at **feature importance.**" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " <style>\n", " table.eli5-weights tr:hover {\n", " filter: brightness(85%);\n", " }\n", "</style>\n", "\n", "\n", "\n", " \n", "\n", " \n", "\n", " \n", "\n", " \n", "\n", " \n", " <table class=\"eli5-weights eli5-feature-importances\" style=\"border-collapse: 
collapse; border: none; margin-top: 0em; table-layout: auto;\">\n", " <thead>\n", " <tr style=\"border: none;\">\n", " <th style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">Weight</th>\n", " <th style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">Feature</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " \n", " <tr style=\"background-color: hsl(120, 100.00%, 80.00%); border: none;\">\n", " <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n", " 0.3440\n", " \n", " </td>\n", " <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n", " airbag\n", " </td>\n", " </tr>\n", " \n", " <tr style=\"background-color: hsl(120, 100.00%, 86.19%); border: none;\">\n", " <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n", " 0.2026\n", " \n", " </td>\n", " <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n", " violent\n", " </td>\n", " </tr>\n", " \n", " <tr style=\"background-color: hsl(120, 100.00%, 88.66%); border: none;\">\n", " <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n", " 0.1529\n", " \n", " </td>\n", " <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n", " air bag\n", " </td>\n", " </tr>\n", " \n", " <tr style=\"background-color: hsl(120, 100.00%, 89.10%); border: none;\">\n", " <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n", " 0.1445\n", " \n", " </td>\n", " <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n", " explode\n", " </td>\n", " </tr>\n", " \n", " <tr style=\"background-color: hsl(120, 100.00%, 90.40%); border: none;\">\n", " <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n", " 0.1205\n", " \n", " </td>\n", " <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n", " did not deploy\n", " </td>\n", " </tr>\n", " \n", " <tr style=\"background-color: hsl(120, 100.00%, 96.67%); border: 
none;\">\n", " <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n", " 0.0266\n", " \n", " </td>\n", " <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n", " failed\n", " </td>\n", " </tr>\n", " \n", " <tr style=\"background-color: hsl(120, 100.00%, 98.43%); border: none;\">\n", " <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n", " 0.0091\n", " \n", " </td>\n", " <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n", " shrapnel\n", " </td>\n", " </tr>\n", " \n", " \n", " </tbody>\n", "</table>\n", " \n", "\n", " \n", "\n", "\n", " \n", "\n", " \n", " \n", " <pre>\n", "Decision tree feature importances; values are numbers 0 <= x <= 1;\n", "all values sum to 1.\n", "</pre>\n", " \n", "\n", " \n", "\n", " \n", "\n", " \n", "\n", " \n", "\n", "\n", "\n" ], "text/plain": [ "<IPython.core.display.HTML object>" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import eli5\n", "\n", "label_names = ['not suspicious', 'suspicious']\n", "feature_names = list(X.columns)\n", "\n", "eli5.show_weights(clf,\n", " feature_names=feature_names,\n", " target_names=label_names,\n", " show=['feature_importances', 'description'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The most fun part of using a decision tree is **visualizing it.**" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "image/svg+xml": [ "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n", "<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n", " \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n", "<!-- Generated by graphviz version 2.40.1 (20161225.0304)\n", " -->\n", "<!-- Title: Tree Pages: 1 -->\n", "<svg width=\"770pt\" height=\"870pt\"\n", " viewBox=\"0.00 0.00 769.84 870.00\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n", "<g id=\"graph0\" class=\"graph\" 
transform=\"scale(1 1) rotate(0) translate(4 866)\">\n", "<title>Tree</title>\n", "<polygon fill=\"#ffffff\" stroke=\"transparent\" points=\"-4,4 -4,-866 765.8369,-866 765.8369,4 -4,4\"/>\n", "<!-- 0 -->\n", "<g id=\"node1\" class=\"node\">\n", "<title>0</title>\n", "<polygon fill=\"#e88e4d\" stroke=\"#000000\" points=\"599.7555,-862 462.0814,-862 462.0814,-784 599.7555,-784 599.7555,-862\"/>\n", "<text text-anchor=\"middle\" x=\"530.9185\" y=\"-846.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">violent <= 0.5</text>\n", "<text text-anchor=\"middle\" x=\"530.9185\" y=\"-832.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">gini = 0.165</text>\n", "<text text-anchor=\"middle\" x=\"530.9185\" y=\"-818.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 165</text>\n", "<text text-anchor=\"middle\" x=\"530.9185\" y=\"-804.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = [150, 15]</text>\n", "<text text-anchor=\"middle\" x=\"530.9185\" y=\"-790.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">class = not suspicious</text>\n", "</g>\n", "<!-- 1 -->\n", "<g id=\"node2\" class=\"node\">\n", "<title>1</title>\n", "<polygon fill=\"#e78d4b\" stroke=\"#000000\" points=\"527.7555,-748 390.0814,-748 390.0814,-670 527.7555,-670 527.7555,-748\"/>\n", "<text text-anchor=\"middle\" x=\"458.9185\" y=\"-732.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">did not deploy <= 0.5</text>\n", "<text text-anchor=\"middle\" x=\"458.9185\" y=\"-718.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">gini = 0.156</text>\n", "<text text-anchor=\"middle\" x=\"458.9185\" y=\"-704.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 164</text>\n", "<text text-anchor=\"middle\" x=\"458.9185\" y=\"-690.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = [150, 14]</text>\n", "<text 
text-anchor=\"middle\" x=\"458.9185\" y=\"-676.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">class = not suspicious</text>\n", "</g>\n", "<!-- 0->1 -->\n", "<g id=\"edge1\" class=\"edge\">\n", "<title>0->1</title>\n", "<path fill=\"none\" stroke=\"#000000\" d=\"M506.1401,-783.7677C500.6531,-775.0798 494.7796,-765.7801 489.1041,-756.794\"/>\n", "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"492.0416,-754.8906 483.7424,-748.3046 486.1232,-758.6285 492.0416,-754.8906\"/>\n", "<text text-anchor=\"middle\" x=\"478.2359\" y=\"-768.4907\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">True</text>\n", "</g>\n", "<!-- 20 -->\n", "<g id=\"node21\" class=\"node\">\n", "<title>20</title>\n", "<polygon fill=\"#399de5\" stroke=\"#000000\" points=\"662.3666,-741 545.4703,-741 545.4703,-677 662.3666,-677 662.3666,-741\"/>\n", "<text text-anchor=\"middle\" x=\"603.9185\" y=\"-725.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">gini = 0.0</text>\n", "<text text-anchor=\"middle\" x=\"603.9185\" y=\"-711.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 1</text>\n", "<text text-anchor=\"middle\" x=\"603.9185\" y=\"-697.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = [0, 1]</text>\n", "<text text-anchor=\"middle\" x=\"603.9185\" y=\"-683.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">class = suspicious</text>\n", "</g>\n", "<!-- 0->20 -->\n", "<g id=\"edge20\" class=\"edge\">\n", "<title>0->20</title>\n", "<path fill=\"none\" stroke=\"#000000\" d=\"M556.0409,-783.7677C563.0702,-772.7904 570.7251,-760.8362 577.8101,-749.772\"/>\n", "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"580.9019,-751.434 583.3471,-741.1252 575.0069,-747.6591 580.9019,-751.434\"/>\n", "<text text-anchor=\"middle\" x=\"588.7012\" y=\"-761.3451\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">False</text>\n", "</g>\n", "<!-- 2 -->\n", "<g 
id=\"node3\" class=\"node\">\n", "<title>2</title>\n", "<polygon fill=\"#e99254\" stroke=\"#000000\" points=\"449.7555,-634 312.0814,-634 312.0814,-556 449.7555,-556 449.7555,-634\"/>\n", "<text text-anchor=\"middle\" x=\"380.9185\" y=\"-618.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">air bag <= 0.5</text>\n", "<text text-anchor=\"middle\" x=\"380.9185\" y=\"-604.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">gini = 0.212</text>\n", "<text text-anchor=\"middle\" x=\"380.9185\" y=\"-590.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 116</text>\n", "<text text-anchor=\"middle\" x=\"380.9185\" y=\"-576.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = [102, 14]</text>\n", "<text text-anchor=\"middle\" x=\"380.9185\" y=\"-562.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">class = not suspicious</text>\n", "</g>\n", "<!-- 1->2 -->\n", "<g id=\"edge2\" class=\"edge\">\n", "<title>1->2</title>\n", "<path fill=\"none\" stroke=\"#000000\" d=\"M432.0753,-669.7677C426.131,-661.0798 419.768,-651.7801 413.6196,-642.794\"/>\n", "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"416.3466,-640.5813 407.8111,-634.3046 410.5694,-644.5341 416.3466,-640.5813\"/>\n", "</g>\n", "<!-- 19 -->\n", "<g id=\"node20\" class=\"node\">\n", "<title>19</title>\n", "<polygon fill=\"#e58139\" stroke=\"#000000\" points=\"605.7555,-627 468.0814,-627 468.0814,-563 605.7555,-563 605.7555,-627\"/>\n", "<text text-anchor=\"middle\" x=\"536.9185\" y=\"-611.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">gini = 0.0</text>\n", "<text text-anchor=\"middle\" x=\"536.9185\" y=\"-597.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 48</text>\n", "<text text-anchor=\"middle\" x=\"536.9185\" y=\"-583.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = [48, 0]</text>\n", "<text text-anchor=\"middle\" 
x=\"536.9185\" y=\"-569.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">class = not suspicious</text>\n", "</g>\n", "<!-- 1->19 -->\n", "<g id=\"edge19\" class=\"edge\">\n", "<title>1->19</title>\n", "<path fill=\"none\" stroke=\"#000000\" d=\"M485.7616,-669.7677C493.3468,-658.6817 501.6136,-646.5994 509.2465,-635.4436\"/>\n", "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"512.1798,-637.3547 514.9381,-627.1252 506.4026,-633.4019 512.1798,-637.3547\"/>\n", "</g>\n", "<!-- 3 -->\n", "<g id=\"node4\" class=\"node\">\n", "<title>3</title>\n", "<polygon fill=\"#e78d4b\" stroke=\"#000000\" points=\"371.7555,-520 234.0814,-520 234.0814,-442 371.7555,-442 371.7555,-520\"/>\n", "<text text-anchor=\"middle\" x=\"302.9185\" y=\"-504.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">airbag <= 0.5</text>\n", "<text text-anchor=\"middle\" x=\"302.9185\" y=\"-490.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">gini = 0.156</text>\n", "<text text-anchor=\"middle\" x=\"302.9185\" y=\"-476.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 94</text>\n", "<text text-anchor=\"middle\" x=\"302.9185\" y=\"-462.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = [86, 8]</text>\n", "<text text-anchor=\"middle\" x=\"302.9185\" y=\"-448.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">class = not suspicious</text>\n", "</g>\n", "<!-- 2->3 -->\n", "<g id=\"edge3\" class=\"edge\">\n", "<title>2->3</title>\n", "<path fill=\"none\" stroke=\"#000000\" d=\"M354.0753,-555.7677C348.131,-547.0798 341.768,-537.7801 335.6196,-528.794\"/>\n", "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"338.3466,-526.5813 329.8111,-520.3046 332.5694,-530.5341 338.3466,-526.5813\"/>\n", "</g>\n", "<!-- 12 -->\n", "<g id=\"node13\" class=\"node\">\n", "<title>12</title>\n", "<polygon fill=\"#efb083\" stroke=\"#000000\" points=\"527.7555,-520 390.0814,-520 390.0814,-442 
527.7555,-442 527.7555,-520\"/>\n", "<text text-anchor=\"middle\" x=\"458.9185\" y=\"-504.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">explode <= 0.5</text>\n", "<text text-anchor=\"middle\" x=\"458.9185\" y=\"-490.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">gini = 0.397</text>\n", "<text text-anchor=\"middle\" x=\"458.9185\" y=\"-476.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 22</text>\n", "<text text-anchor=\"middle\" x=\"458.9185\" y=\"-462.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = [16, 6]</text>\n", "<text text-anchor=\"middle\" x=\"458.9185\" y=\"-448.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">class = not suspicious</text>\n", "</g>\n", "<!-- 2->12 -->\n", "<g id=\"edge12\" class=\"edge\">\n", "<title>2->12</title>\n", "<path fill=\"none\" stroke=\"#000000\" d=\"M407.7616,-555.7677C413.7059,-547.0798 420.0689,-537.7801 426.2173,-528.794\"/>\n", "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"429.2675,-530.5341 432.0258,-520.3046 423.4903,-526.5813 429.2675,-530.5341\"/>\n", "</g>\n", "<!-- 4 -->\n", "<g id=\"node5\" class=\"node\">\n", "<title>4</title>\n", "<polygon fill=\"#e58139\" stroke=\"#000000\" points=\"215.7555,-399 78.0814,-399 78.0814,-335 215.7555,-335 215.7555,-399\"/>\n", "<text text-anchor=\"middle\" x=\"146.9185\" y=\"-383.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">gini = 0.0</text>\n", "<text text-anchor=\"middle\" x=\"146.9185\" y=\"-369.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 49</text>\n", "<text text-anchor=\"middle\" x=\"146.9185\" y=\"-355.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = [49, 0]</text>\n", "<text text-anchor=\"middle\" x=\"146.9185\" y=\"-341.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">class = not suspicious</text>\n", "</g>\n", "<!-- 3->4 -->\n", "<g 
id=\"edge4\" class=\"edge\">\n", "<title>3->4</title>\n", "<path fill=\"none\" stroke=\"#000000\" d=\"M249.2321,-441.7677C233.0207,-429.9209 215.2525,-416.9364 199.1382,-405.1606\"/>\n", "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"201.0183,-402.1995 190.8793,-399.1252 196.8881,-407.8513 201.0183,-402.1995\"/>\n", "</g>\n", "<!-- 5 -->\n", "<g id=\"node6\" class=\"node\">\n", "<title>5</title>\n", "<polygon fill=\"#eb9c64\" stroke=\"#000000\" points=\"371.7555,-406 234.0814,-406 234.0814,-328 371.7555,-328 371.7555,-406\"/>\n", "<text text-anchor=\"middle\" x=\"302.9185\" y=\"-390.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">failed <= 0.5</text>\n", "<text text-anchor=\"middle\" x=\"302.9185\" y=\"-376.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">gini = 0.292</text>\n", "<text text-anchor=\"middle\" x=\"302.9185\" y=\"-362.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 45</text>\n", "<text text-anchor=\"middle\" x=\"302.9185\" y=\"-348.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = [37, 8]</text>\n", "<text text-anchor=\"middle\" x=\"302.9185\" y=\"-334.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">class = not suspicious</text>\n", "</g>\n", "<!-- 3->5 -->\n", "<g id=\"edge5\" class=\"edge\">\n", "<title>3->5</title>\n", "<path fill=\"none\" stroke=\"#000000\" d=\"M302.9185,-441.7677C302.9185,-433.6172 302.9185,-424.9283 302.9185,-416.4649\"/>\n", "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"306.4186,-416.3046 302.9185,-406.3046 299.4186,-416.3047 306.4186,-416.3046\"/>\n", "</g>\n", "<!-- 6 -->\n", "<g id=\"node7\" class=\"node\">\n", "<title>6</title>\n", "<polygon fill=\"#eb9f68\" stroke=\"#000000\" points=\"215.7555,-292 78.0814,-292 78.0814,-214 215.7555,-214 215.7555,-292\"/>\n", "<text text-anchor=\"middle\" x=\"146.9185\" y=\"-276.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">shrapnel 
<= 0.5</text>\n", "<text text-anchor=\"middle\" x=\"146.9185\" y=\"-262.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">gini = 0.308</text>\n", "<text text-anchor=\"middle\" x=\"146.9185\" y=\"-248.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 42</text>\n", "<text text-anchor=\"middle\" x=\"146.9185\" y=\"-234.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = [34, 8]</text>\n", "<text text-anchor=\"middle\" x=\"146.9185\" y=\"-220.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">class = not suspicious</text>\n", "</g>\n", "<!-- 5->6 -->\n", "<g id=\"edge6\" class=\"edge\">\n", "<title>5->6</title>\n", "<path fill=\"none\" stroke=\"#000000\" d=\"M249.2321,-327.7677C236.1674,-318.2204 222.0916,-307.9342 208.6852,-298.1373\"/>\n", "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"210.5784,-295.1858 200.4394,-292.1115 206.4483,-300.8375 210.5784,-295.1858\"/>\n", "</g>\n", "<!-- 11 -->\n", "<g id=\"node12\" class=\"node\">\n", "<title>11</title>\n", "<polygon fill=\"#e58139\" stroke=\"#000000\" points=\"371.7555,-285 234.0814,-285 234.0814,-221 371.7555,-221 371.7555,-285\"/>\n", "<text text-anchor=\"middle\" x=\"302.9185\" y=\"-269.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">gini = 0.0</text>\n", "<text text-anchor=\"middle\" x=\"302.9185\" y=\"-255.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 3</text>\n", "<text text-anchor=\"middle\" x=\"302.9185\" y=\"-241.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = [3, 0]</text>\n", "<text text-anchor=\"middle\" x=\"302.9185\" y=\"-227.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">class = not suspicious</text>\n", "</g>\n", "<!-- 5->11 -->\n", "<g id=\"edge11\" class=\"edge\">\n", "<title>5->11</title>\n", "<path fill=\"none\" stroke=\"#000000\" d=\"M302.9185,-327.7677C302.9185,-317.3338 302.9185,-306.0174 
302.9185,-295.4215\"/>\n", "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"306.4186,-295.1252 302.9185,-285.1252 299.4186,-295.1252 306.4186,-295.1252\"/>\n", "</g>\n", "<!-- 7 -->\n", "<g id=\"node8\" class=\"node\">\n", "<title>7</title>\n", "<polygon fill=\"#eba069\" stroke=\"#000000\" points=\"215.7555,-178 78.0814,-178 78.0814,-100 215.7555,-100 215.7555,-178\"/>\n", "<text text-anchor=\"middle\" x=\"146.9185\" y=\"-162.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">explode <= 0.5</text>\n", "<text text-anchor=\"middle\" x=\"146.9185\" y=\"-148.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">gini = 0.314</text>\n", "<text text-anchor=\"middle\" x=\"146.9185\" y=\"-134.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 41</text>\n", "<text text-anchor=\"middle\" x=\"146.9185\" y=\"-120.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = [33, 8]</text>\n", "<text text-anchor=\"middle\" x=\"146.9185\" y=\"-106.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">class = not suspicious</text>\n", "</g>\n", "<!-- 6->7 -->\n", "<g id=\"edge7\" class=\"edge\">\n", "<title>6->7</title>\n", "<path fill=\"none\" stroke=\"#000000\" d=\"M146.9185,-213.7677C146.9185,-205.6172 146.9185,-196.9283 146.9185,-188.4649\"/>\n", "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"150.4186,-188.3046 146.9185,-178.3046 143.4186,-188.3047 150.4186,-188.3046\"/>\n", "</g>\n", "<!-- 10 -->\n", "<g id=\"node11\" class=\"node\">\n", "<title>10</title>\n", "<polygon fill=\"#e58139\" stroke=\"#000000\" points=\"371.7555,-171 234.0814,-171 234.0814,-107 371.7555,-107 371.7555,-171\"/>\n", "<text text-anchor=\"middle\" x=\"302.9185\" y=\"-155.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">gini = 0.0</text>\n", "<text text-anchor=\"middle\" x=\"302.9185\" y=\"-141.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 
1</text>\n", "<text text-anchor=\"middle\" x=\"302.9185\" y=\"-127.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = [1, 0]</text>\n", "<text text-anchor=\"middle\" x=\"302.9185\" y=\"-113.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">class = not suspicious</text>\n", "</g>\n", "<!-- 6->10 -->\n", "<g id=\"edge10\" class=\"edge\">\n", "<title>6->10</title>\n", "<path fill=\"none\" stroke=\"#000000\" d=\"M200.6048,-213.7677C216.8162,-201.9209 234.5845,-188.9364 250.6987,-177.1606\"/>\n", "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"252.9488,-179.8513 258.9577,-171.1252 248.8187,-174.1995 252.9488,-179.8513\"/>\n", "</g>\n", "<!-- 8 -->\n", "<g id=\"node9\" class=\"node\">\n", "<title>8</title>\n", "<polygon fill=\"#eca06a\" stroke=\"#000000\" points=\"137.7555,-64 .0814,-64 .0814,0 137.7555,0 137.7555,-64\"/>\n", "<text text-anchor=\"middle\" x=\"68.9185\" y=\"-48.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">gini = 0.32</text>\n", "<text text-anchor=\"middle\" x=\"68.9185\" y=\"-34.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 40</text>\n", "<text text-anchor=\"middle\" x=\"68.9185\" y=\"-20.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = [32, 8]</text>\n", "<text text-anchor=\"middle\" x=\"68.9185\" y=\"-6.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">class = not suspicious</text>\n", "</g>\n", "<!-- 7->8 -->\n", "<g id=\"edge8\" class=\"edge\">\n", "<title>7->8</title>\n", "<path fill=\"none\" stroke=\"#000000\" d=\"M118.317,-99.7647C111.8591,-90.9057 104.9907,-81.4838 98.4936,-72.571\"/>\n", "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"101.1227,-70.236 92.4037,-64.2169 95.4661,-74.3595 101.1227,-70.236\"/>\n", "</g>\n", "<!-- 9 -->\n", "<g id=\"node10\" class=\"node\">\n", "<title>9</title>\n", "<polygon fill=\"#e58139\" stroke=\"#000000\" points=\"293.7555,-64 156.0814,-64 156.0814,0 
293.7555,0 293.7555,-64\"/>\n", "<text text-anchor=\"middle\" x=\"224.9185\" y=\"-48.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">gini = 0.0</text>\n", "<text text-anchor=\"middle\" x=\"224.9185\" y=\"-34.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 1</text>\n", "<text text-anchor=\"middle\" x=\"224.9185\" y=\"-20.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = [1, 0]</text>\n", "<text text-anchor=\"middle\" x=\"224.9185\" y=\"-6.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">class = not suspicious</text>\n", "</g>\n", "<!-- 7->9 -->\n", "<g id=\"edge9\" class=\"edge\">\n", "<title>7->9</title>\n", "<path fill=\"none\" stroke=\"#000000\" d=\"M175.5199,-99.7647C181.9778,-90.9057 188.8462,-81.4838 195.3433,-72.571\"/>\n", "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"198.3708,-74.3595 201.4333,-64.2169 192.7142,-70.236 198.3708,-74.3595\"/>\n", "</g>\n", "<!-- 13 -->\n", "<g id=\"node14\" class=\"node\">\n", "<title>13</title>\n", "<polygon fill=\"#eda877\" stroke=\"#000000\" points=\"527.7555,-406 390.0814,-406 390.0814,-328 527.7555,-328 527.7555,-406\"/>\n", "<text text-anchor=\"middle\" x=\"458.9185\" y=\"-390.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">airbag <= 0.5</text>\n", "<text text-anchor=\"middle\" x=\"458.9185\" y=\"-376.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">gini = 0.363</text>\n", "<text text-anchor=\"middle\" x=\"458.9185\" y=\"-362.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 21</text>\n", "<text text-anchor=\"middle\" x=\"458.9185\" y=\"-348.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = [16, 5]</text>\n", "<text text-anchor=\"middle\" x=\"458.9185\" y=\"-334.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">class = not suspicious</text>\n", "</g>\n", "<!-- 12->13 -->\n", "<g id=\"edge13\" 
class=\"edge\">\n", "<title>12->13</title>\n", "<path fill=\"none\" stroke=\"#000000\" d=\"M458.9185,-441.7677C458.9185,-433.6172 458.9185,-424.9283 458.9185,-416.4649\"/>\n", "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"462.4186,-416.3046 458.9185,-406.3046 455.4186,-416.3047 462.4186,-416.3046\"/>\n", "</g>\n", "<!-- 18 -->\n", "<g id=\"node19\" class=\"node\">\n", "<title>18</title>\n", "<polygon fill=\"#399de5\" stroke=\"#000000\" points=\"662.3666,-399 545.4703,-399 545.4703,-335 662.3666,-335 662.3666,-399\"/>\n", "<text text-anchor=\"middle\" x=\"603.9185\" y=\"-383.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">gini = 0.0</text>\n", "<text text-anchor=\"middle\" x=\"603.9185\" y=\"-369.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 1</text>\n", "<text text-anchor=\"middle\" x=\"603.9185\" y=\"-355.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = [0, 1]</text>\n", "<text text-anchor=\"middle\" x=\"603.9185\" y=\"-341.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">class = suspicious</text>\n", "</g>\n", "<!-- 12->18 -->\n", "<g id=\"edge18\" class=\"edge\">\n", "<title>12->18</title>\n", "<path fill=\"none\" stroke=\"#000000\" d=\"M508.8192,-441.7677C523.7493,-430.0296 540.1,-417.1745 554.9683,-405.485\"/>\n", "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"557.3594,-408.0573 563.0575,-399.1252 553.0329,-402.5544 557.3594,-408.0573\"/>\n", "</g>\n", "<!-- 14 -->\n", "<g id=\"node15\" class=\"node\">\n", "<title>14</title>\n", "<polygon fill=\"#fae6d7\" stroke=\"#000000\" points=\"527.7555,-285 390.0814,-285 390.0814,-221 527.7555,-221 527.7555,-285\"/>\n", "<text text-anchor=\"middle\" x=\"458.9185\" y=\"-269.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">gini = 0.494</text>\n", "<text text-anchor=\"middle\" x=\"458.9185\" y=\"-255.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 
9</text>\n", "<text text-anchor=\"middle\" x=\"458.9185\" y=\"-241.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = [5, 4]</text>\n", "<text text-anchor=\"middle\" x=\"458.9185\" y=\"-227.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">class = not suspicious</text>\n", "</g>\n", "<!-- 13->14 -->\n", "<g id=\"edge14\" class=\"edge\">\n", "<title>13->14</title>\n", "<path fill=\"none\" stroke=\"#000000\" d=\"M458.9185,-327.7677C458.9185,-317.3338 458.9185,-306.0174 458.9185,-295.4215\"/>\n", "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"462.4186,-295.1252 458.9185,-285.1252 455.4186,-295.1252 462.4186,-295.1252\"/>\n", "</g>\n", "<!-- 15 -->\n", "<g id=\"node16\" class=\"node\">\n", "<title>15</title>\n", "<polygon fill=\"#e78c4b\" stroke=\"#000000\" points=\"683.7555,-292 546.0814,-292 546.0814,-214 683.7555,-214 683.7555,-292\"/>\n", "<text text-anchor=\"middle\" x=\"614.9185\" y=\"-276.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">failed <= 0.5</text>\n", "<text text-anchor=\"middle\" x=\"614.9185\" y=\"-262.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">gini = 0.153</text>\n", "<text text-anchor=\"middle\" x=\"614.9185\" y=\"-248.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 12</text>\n", "<text text-anchor=\"middle\" x=\"614.9185\" y=\"-234.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = [11, 1]</text>\n", "<text text-anchor=\"middle\" x=\"614.9185\" y=\"-220.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">class = not suspicious</text>\n", "</g>\n", "<!-- 13->15 -->\n", "<g id=\"edge15\" class=\"edge\">\n", "<title>13->15</title>\n", "<path fill=\"none\" stroke=\"#000000\" d=\"M512.6048,-327.7677C525.6695,-318.2204 539.7453,-307.9342 553.1517,-298.1373\"/>\n", "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"555.3887,-300.8375 561.3975,-292.1115 551.2585,-295.1858 
555.3887,-300.8375\"/>\n", "</g>\n", "<!-- 16 -->\n", "<g id=\"node17\" class=\"node\">\n", "<title>16</title>\n", "<polygon fill=\"#e88e4d\" stroke=\"#000000\" points=\"605.7555,-171 468.0814,-171 468.0814,-107 605.7555,-107 605.7555,-171\"/>\n", "<text text-anchor=\"middle\" x=\"536.9185\" y=\"-155.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">gini = 0.165</text>\n", "<text text-anchor=\"middle\" x=\"536.9185\" y=\"-141.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 11</text>\n", "<text text-anchor=\"middle\" x=\"536.9185\" y=\"-127.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = [10, 1]</text>\n", "<text text-anchor=\"middle\" x=\"536.9185\" y=\"-113.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">class = not suspicious</text>\n", "</g>\n", "<!-- 15->16 -->\n", "<g id=\"edge16\" class=\"edge\">\n", "<title>15->16</title>\n", "<path fill=\"none\" stroke=\"#000000\" d=\"M588.0753,-213.7677C580.4901,-202.6817 572.2233,-190.5994 564.5904,-179.4436\"/>\n", "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"567.4343,-177.4019 558.8989,-171.1252 561.6572,-181.3547 567.4343,-177.4019\"/>\n", "</g>\n", "<!-- 17 -->\n", "<g id=\"node18\" class=\"node\">\n", "<title>17</title>\n", "<polygon fill=\"#e58139\" stroke=\"#000000\" points=\"761.7555,-171 624.0814,-171 624.0814,-107 761.7555,-107 761.7555,-171\"/>\n", "<text text-anchor=\"middle\" x=\"692.9185\" y=\"-155.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">gini = 0.0</text>\n", "<text text-anchor=\"middle\" x=\"692.9185\" y=\"-141.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 1</text>\n", "<text text-anchor=\"middle\" x=\"692.9185\" y=\"-127.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = [1, 0]</text>\n", "<text text-anchor=\"middle\" x=\"692.9185\" y=\"-113.8\" font-family=\"Times,serif\" font-size=\"14.00\" 
fill=\"#000000\">class = not suspicious</text>\n", "</g>\n", "<!-- 15->17 -->\n", "<g id=\"edge17\" class=\"edge\">\n", "<title>15->17</title>\n", "<path fill=\"none\" stroke=\"#000000\" d=\"M641.7616,-213.7677C649.3468,-202.6817 657.6136,-190.5994 665.2465,-179.4436\"/>\n", "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"668.1798,-181.3547 670.9381,-171.1252 662.4026,-177.4019 668.1798,-181.3547\"/>\n", "</g>\n", "</g>\n", "</svg>\n" ], "text/plain": [ "<graphviz.files.Source at 0x11e68e898>" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from sklearn import tree\n", "import graphviz\n", "\n", "label_names = ['not suspicious', 'suspicious']\n", "feature_names = X.columns\n", "\n", "dot_data = tree.export_graphviz(clf,\n", " feature_names=feature_names, \n", " filled=True,\n", " class_names=label_names) \n", "graph = graphviz.Source(dot_data) \n", "graph" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can also see the tree with `eli5`. I just suppressed it because I thought we could use a little color." 
] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " <style>\n", " table.eli5-weights tr:hover {\n", " filter: brightness(85%);\n", " }\n", "</style>\n", "\n", "\n", "\n", " \n", "\n", " \n", "\n", " \n", "\n", " \n", "\n", " \n", "\n", " \n", "\n", "\n", " \n", "\n", " \n", "\n", " \n", "\n", " \n", "\n", " \n", "\n", " \n", "\n", "\n", " \n", "\n", " \n", "\n", " \n", "\n", " \n", "\n", " \n", " <table class=\"eli5-weights eli5-feature-importances\" style=\"border-collapse: collapse; border: none; margin-top: 0em; table-layout: auto;\">\n", " <thead>\n", " <tr style=\"border: none;\">\n", " <th style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">Weight</th>\n", " <th style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">Feature</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " \n", " <tr style=\"background-color: hsl(120, 100.00%, 80.00%); border: none;\">\n", " <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n", " 0.3440\n", " \n", " </td>\n", " <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n", " airbag\n", " </td>\n", " </tr>\n", " \n", " <tr style=\"background-color: hsl(120, 100.00%, 86.19%); border: none;\">\n", " <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n", " 0.2026\n", " \n", " </td>\n", " <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n", " violent\n", " </td>\n", " </tr>\n", " \n", " <tr style=\"background-color: hsl(120, 100.00%, 88.66%); border: none;\">\n", " <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n", " 0.1529\n", " \n", " </td>\n", " <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n", " air bag\n", " </td>\n", " </tr>\n", " \n", " <tr style=\"background-color: hsl(120, 100.00%, 89.10%); border: none;\">\n", " <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n", " 0.1445\n", 
" \n", " </td>\n", " <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n", " explode\n", " </td>\n", " </tr>\n", " \n", " <tr style=\"background-color: hsl(120, 100.00%, 90.40%); border: none;\">\n", " <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n", " 0.1205\n", " \n", " </td>\n", " <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n", " did not deploy\n", " </td>\n", " </tr>\n", " \n", " <tr style=\"background-color: hsl(120, 100.00%, 96.67%); border: none;\">\n", " <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n", " 0.0266\n", " \n", " </td>\n", " <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n", " failed\n", " </td>\n", " </tr>\n", " \n", " <tr style=\"background-color: hsl(120, 100.00%, 98.43%); border: none;\">\n", " <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n", " 0.0091\n", " \n", " </td>\n", " <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n", " shrapnel\n", " </td>\n", " </tr>\n", " \n", " \n", " </tbody>\n", "</table>\n", " \n", "\n", " \n", "\n", "\n", " \n", "\n", " \n", "\n", " \n", "\n", " \n", "\n", " \n", "\n", " \n", " \n", " <br>\n", " <pre><svg width=\"853pt\" height=\"870pt\"\n", " viewBox=\"0.00 0.00 852.84 870.00\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n", "<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 866)\">\n", "<title>Tree</title>\n", "<polygon fill=\"#ffffff\" stroke=\"transparent\" points=\"-4,4 -4,-866 848.8369,-866 848.8369,4 -4,4\"/>\n", "<!-- 0 -->\n", "<g id=\"node1\" class=\"node\">\n", "<title>0</title>\n", "<polygon fill=\"none\" stroke=\"#000000\" points=\"679.9563,-862 539.8806,-862 539.8806,-784 679.9563,-784 679.9563,-862\"/>\n", "<text text-anchor=\"middle\" x=\"609.9185\" y=\"-846.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">violent <= 
0.5</text>\n", "<text text-anchor=\"middle\" x=\"609.9185\" y=\"-832.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">gini = 0.165</text>\n", "<text text-anchor=\"middle\" x=\"609.9185\" y=\"-818.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 100.0%</text>\n", "<text text-anchor=\"middle\" x=\"609.9185\" y=\"-804.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = [0.909, 0.091]</text>\n", "<text text-anchor=\"middle\" x=\"609.9185\" y=\"-790.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">class = not suspicious</text>\n", "</g>\n", "<!-- 1 -->\n", "<g id=\"node2\" class=\"node\">\n", "<title>1</title>\n", "<polygon fill=\"none\" stroke=\"#000000\" points=\"606.9563,-748 466.8806,-748 466.8806,-670 606.9563,-670 606.9563,-748\"/>\n", "<text text-anchor=\"middle\" x=\"536.9185\" y=\"-732.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">did not deploy <= 0.5</text>\n", "<text text-anchor=\"middle\" x=\"536.9185\" y=\"-718.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">gini = 0.156</text>\n", "<text text-anchor=\"middle\" x=\"536.9185\" y=\"-704.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 99.4%</text>\n", "<text text-anchor=\"middle\" x=\"536.9185\" y=\"-690.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = [0.915, 0.085]</text>\n", "<text text-anchor=\"middle\" x=\"536.9185\" y=\"-676.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">class = not suspicious</text>\n", "</g>\n", "<!-- 0->1 -->\n", "<g id=\"edge1\" class=\"edge\">\n", "<title>0->1</title>\n", "<path fill=\"none\" stroke=\"#000000\" d=\"M584.796,-783.7677C579.2327,-775.0798 573.2777,-765.7801 567.5234,-756.794\"/>\n", "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"570.4274,-754.8386 562.0872,-748.3046 564.5324,-758.6135 570.4274,-754.8386\"/>\n", "<text text-anchor=\"middle\" 
x=\"556.7331\" y=\"-768.5246\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">True</text>\n", "</g>\n", "<!-- 20 -->\n", "<g id=\"node21\" class=\"node\">\n", "<title>20</title>\n", "<polygon fill=\"none\" stroke=\"#000000\" points=\"741.3666,-741 624.4703,-741 624.4703,-677 741.3666,-677 741.3666,-741\"/>\n", "<text text-anchor=\"middle\" x=\"682.9185\" y=\"-725.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">gini = 0.0</text>\n", "<text text-anchor=\"middle\" x=\"682.9185\" y=\"-711.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 0.6%</text>\n", "<text text-anchor=\"middle\" x=\"682.9185\" y=\"-697.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = [0.0, 1.0]</text>\n", "<text text-anchor=\"middle\" x=\"682.9185\" y=\"-683.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">class = suspicious</text>\n", "</g>\n", "<!-- 0->20 -->\n", "<g id=\"edge20\" class=\"edge\">\n", "<title>0->20</title>\n", "<path fill=\"none\" stroke=\"#000000\" d=\"M635.0409,-783.7677C642.0702,-772.7904 649.7251,-760.8362 656.8101,-749.772\"/>\n", "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"659.9019,-751.434 662.3471,-741.1252 654.0069,-747.6591 659.9019,-751.434\"/>\n", "<text text-anchor=\"middle\" x=\"667.7012\" y=\"-761.3451\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">False</text>\n", "</g>\n", "<!-- 2 -->\n", "<g id=\"node3\" class=\"node\">\n", "<title>2</title>\n", "<polygon fill=\"none\" stroke=\"#000000\" points=\"528.9563,-634 388.8806,-634 388.8806,-556 528.9563,-556 528.9563,-634\"/>\n", "<text text-anchor=\"middle\" x=\"458.9185\" y=\"-618.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">air bag <= 0.5</text>\n", "<text text-anchor=\"middle\" x=\"458.9185\" y=\"-604.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">gini = 0.212</text>\n", "<text text-anchor=\"middle\" x=\"458.9185\" 
y=\"-590.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 70.3%</text>\n", "<text text-anchor=\"middle\" x=\"458.9185\" y=\"-576.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = [0.879, 0.121]</text>\n", "<text text-anchor=\"middle\" x=\"458.9185\" y=\"-562.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">class = not suspicious</text>\n", "</g>\n", "<!-- 1->2 -->\n", "<g id=\"edge2\" class=\"edge\">\n", "<title>1->2</title>\n", "<path fill=\"none\" stroke=\"#000000\" d=\"M510.0753,-669.7677C504.131,-661.0798 497.768,-651.7801 491.6196,-642.794\"/>\n", "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"494.3466,-640.5813 485.8111,-634.3046 488.5694,-644.5341 494.3466,-640.5813\"/>\n", "</g>\n", "<!-- 19 -->\n", "<g id=\"node20\" class=\"node\">\n", "<title>19</title>\n", "<polygon fill=\"none\" stroke=\"#000000\" points=\"684.7555,-627 547.0814,-627 547.0814,-563 684.7555,-563 684.7555,-627\"/>\n", "<text text-anchor=\"middle\" x=\"615.9185\" y=\"-611.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">gini = 0.0</text>\n", "<text text-anchor=\"middle\" x=\"615.9185\" y=\"-597.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 29.1%</text>\n", "<text text-anchor=\"middle\" x=\"615.9185\" y=\"-583.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = [1.0, 0.0]</text>\n", "<text text-anchor=\"middle\" x=\"615.9185\" y=\"-569.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">class = not suspicious</text>\n", "</g>\n", "<!-- 1->19 -->\n", "<g id=\"edge19\" class=\"edge\">\n", "<title>1->19</title>\n", "<path fill=\"none\" stroke=\"#000000\" d=\"M564.1058,-669.7677C571.7882,-658.6817 580.161,-646.5994 587.8917,-635.4436\"/>\n", "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"590.8371,-637.3381 593.6563,-627.1252 585.0836,-633.351 590.8371,-637.3381\"/>\n", "</g>\n", "<!-- 3 -->\n", "<g 
id=\"node4\" class=\"node\">\n", "<title>3</title>\n", "<polygon fill=\"none\" stroke=\"#000000\" points=\"449.9563,-520 309.8806,-520 309.8806,-442 449.9563,-442 449.9563,-520\"/>\n", "<text text-anchor=\"middle\" x=\"379.9185\" y=\"-504.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">airbag <= 0.5</text>\n", "<text text-anchor=\"middle\" x=\"379.9185\" y=\"-490.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">gini = 0.156</text>\n", "<text text-anchor=\"middle\" x=\"379.9185\" y=\"-476.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 57.0%</text>\n", "<text text-anchor=\"middle\" x=\"379.9185\" y=\"-462.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = [0.915, 0.085]</text>\n", "<text text-anchor=\"middle\" x=\"379.9185\" y=\"-448.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">class = not suspicious</text>\n", "</g>\n", "<!-- 2->3 -->\n", "<g id=\"edge3\" class=\"edge\">\n", "<title>2->3</title>\n", "<path fill=\"none\" stroke=\"#000000\" d=\"M431.7311,-555.7677C425.7106,-547.0798 419.2661,-537.7801 413.0388,-528.794\"/>\n", "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"415.7285,-526.5304 407.1559,-520.3046 409.975,-530.5176 415.7285,-526.5304\"/>\n", "</g>\n", "<!-- 12 -->\n", "<g id=\"node13\" class=\"node\">\n", "<title>12</title>\n", "<polygon fill=\"none\" stroke=\"#000000\" points=\"608.9563,-520 468.8806,-520 468.8806,-442 608.9563,-442 608.9563,-520\"/>\n", "<text text-anchor=\"middle\" x=\"538.9185\" y=\"-504.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">explode <= 0.5</text>\n", "<text text-anchor=\"middle\" x=\"538.9185\" y=\"-490.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">gini = 0.397</text>\n", "<text text-anchor=\"middle\" x=\"538.9185\" y=\"-476.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 13.3%</text>\n", "<text text-anchor=\"middle\" 
x=\"538.9185\" y=\"-462.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = [0.727, 0.273]</text>\n", "<text text-anchor=\"middle\" x=\"538.9185\" y=\"-448.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">class = not suspicious</text>\n", "</g>\n", "<!-- 2->12 -->\n", "<g id=\"edge12\" class=\"edge\">\n", "<title>2->12</title>\n", "<path fill=\"none\" stroke=\"#000000\" d=\"M486.4499,-555.7677C492.6095,-546.9903 499.2074,-537.5883 505.5738,-528.5161\"/>\n", "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"508.4569,-530.5007 511.3362,-520.3046 502.727,-526.4797 508.4569,-530.5007\"/>\n", "</g>\n", "<!-- 4 -->\n", "<g id=\"node5\" class=\"node\">\n", "<title>4</title>\n", "<polygon fill=\"none\" stroke=\"#000000\" points=\"291.7555,-399 154.0814,-399 154.0814,-335 291.7555,-335 291.7555,-399\"/>\n", "<text text-anchor=\"middle\" x=\"222.9185\" y=\"-383.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">gini = 0.0</text>\n", "<text text-anchor=\"middle\" x=\"222.9185\" y=\"-369.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 29.7%</text>\n", "<text text-anchor=\"middle\" x=\"222.9185\" y=\"-355.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = [1.0, 0.0]</text>\n", "<text text-anchor=\"middle\" x=\"222.9185\" y=\"-341.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">class = not suspicious</text>\n", "</g>\n", "<!-- 3->4 -->\n", "<g id=\"edge4\" class=\"edge\">\n", "<title>3->4</title>\n", "<path fill=\"none\" stroke=\"#000000\" d=\"M325.888,-441.7677C309.5727,-429.9209 291.6905,-416.9364 275.4729,-405.1606\"/>\n", "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"277.3094,-402.1687 267.1611,-399.1252 273.1964,-407.833 277.3094,-402.1687\"/>\n", "</g>\n", "<!-- 5 -->\n", "<g id=\"node6\" class=\"node\">\n", "<title>5</title>\n", "<polygon fill=\"none\" stroke=\"#000000\" points=\"449.9563,-406 309.8806,-406 
309.8806,-328 449.9563,-328 449.9563,-406\"/>\n", "<text text-anchor=\"middle\" x=\"379.9185\" y=\"-390.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">failed <= 0.5</text>\n", "<text text-anchor=\"middle\" x=\"379.9185\" y=\"-376.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">gini = 0.292</text>\n", "<text text-anchor=\"middle\" x=\"379.9185\" y=\"-362.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 27.3%</text>\n", "<text text-anchor=\"middle\" x=\"379.9185\" y=\"-348.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = [0.822, 0.178]</text>\n", "<text text-anchor=\"middle\" x=\"379.9185\" y=\"-334.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">class = not suspicious</text>\n", "</g>\n", "<!-- 3->5 -->\n", "<g id=\"edge5\" class=\"edge\">\n", "<title>3->5</title>\n", "<path fill=\"none\" stroke=\"#000000\" d=\"M379.9185,-441.7677C379.9185,-433.6172 379.9185,-424.9283 379.9185,-416.4649\"/>\n", "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"383.4186,-416.3046 379.9185,-406.3046 376.4186,-416.3047 383.4186,-416.3046\"/>\n", "</g>\n", "<!-- 6 -->\n", "<g id=\"node7\" class=\"node\">\n", "<title>6</title>\n", "<polygon fill=\"none\" stroke=\"#000000\" points=\"293.7555,-292 156.0814,-292 156.0814,-214 293.7555,-214 293.7555,-292\"/>\n", "<text text-anchor=\"middle\" x=\"224.9185\" y=\"-276.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">shrapnel <= 0.5</text>\n", "<text text-anchor=\"middle\" x=\"224.9185\" y=\"-262.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">gini = 0.308</text>\n", "<text text-anchor=\"middle\" x=\"224.9185\" y=\"-248.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 25.5%</text>\n", "<text text-anchor=\"middle\" x=\"224.9185\" y=\"-234.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = [0.81, 0.19]</text>\n", "<text 
text-anchor=\"middle\" x=\"224.9185\" y=\"-220.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">class = not suspicious</text>\n", "</g>\n", "<!-- 5->6 -->\n", "<g id=\"edge6\" class=\"edge\">\n", "<title>5->6</title>\n", "<path fill=\"none\" stroke=\"#000000\" d=\"M326.5762,-327.7677C313.5953,-318.2204 299.6097,-307.9342 286.2893,-298.1373\"/>\n", "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"288.2258,-295.2169 278.0963,-292.1115 284.0784,-300.8559 288.2258,-295.2169\"/>\n", "</g>\n", "<!-- 11 -->\n", "<g id=\"node12\" class=\"node\">\n", "<title>11</title>\n", "<polygon fill=\"none\" stroke=\"#000000\" points=\"449.7555,-285 312.0814,-285 312.0814,-221 449.7555,-221 449.7555,-285\"/>\n", "<text text-anchor=\"middle\" x=\"380.9185\" y=\"-269.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">gini = 0.0</text>\n", "<text text-anchor=\"middle\" x=\"380.9185\" y=\"-255.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 1.8%</text>\n", "<text text-anchor=\"middle\" x=\"380.9185\" y=\"-241.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = [1.0, 0.0]</text>\n", "<text text-anchor=\"middle\" x=\"380.9185\" y=\"-227.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">class = not suspicious</text>\n", "</g>\n", "<!-- 5->11 -->\n", "<g id=\"edge11\" class=\"edge\">\n", "<title>5->11</title>\n", "<path fill=\"none\" stroke=\"#000000\" d=\"M380.2626,-327.7677C380.3541,-317.3338 380.4534,-306.0174 380.5463,-295.4215\"/>\n", "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"384.0487,-295.1555 380.6367,-285.1252 377.049,-295.0941 384.0487,-295.1555\"/>\n", "</g>\n", "<!-- 7 -->\n", "<g id=\"node8\" class=\"node\">\n", "<title>7</title>\n", "<polygon fill=\"none\" stroke=\"#000000\" points=\"216.9563,-178 76.8806,-178 76.8806,-100 216.9563,-100 216.9563,-178\"/>\n", "<text text-anchor=\"middle\" x=\"146.9185\" y=\"-162.8\" font-family=\"Times,serif\" 
font-size=\"14.00\" fill=\"#000000\">explode <= 0.5</text>\n", "<text text-anchor=\"middle\" x=\"146.9185\" y=\"-148.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">gini = 0.314</text>\n", "<text text-anchor=\"middle\" x=\"146.9185\" y=\"-134.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 24.8%</text>\n", "<text text-anchor=\"middle\" x=\"146.9185\" y=\"-120.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = [0.805, 0.195]</text>\n", "<text text-anchor=\"middle\" x=\"146.9185\" y=\"-106.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">class = not suspicious</text>\n", "</g>\n", "<!-- 6->7 -->\n", "<g id=\"edge7\" class=\"edge\">\n", "<title>6->7</title>\n", "<path fill=\"none\" stroke=\"#000000\" d=\"M198.0753,-213.7677C192.131,-205.0798 185.768,-195.7801 179.6196,-186.794\"/>\n", "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"182.3466,-184.5813 173.8111,-178.3046 176.5694,-188.5341 182.3466,-184.5813\"/>\n", "</g>\n", "<!-- 10 -->\n", "<g id=\"node11\" class=\"node\">\n", "<title>10</title>\n", "<polygon fill=\"none\" stroke=\"#000000\" points=\"372.7555,-171 235.0814,-171 235.0814,-107 372.7555,-107 372.7555,-171\"/>\n", "<text text-anchor=\"middle\" x=\"303.9185\" y=\"-155.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">gini = 0.0</text>\n", "<text text-anchor=\"middle\" x=\"303.9185\" y=\"-141.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 0.6%</text>\n", "<text text-anchor=\"middle\" x=\"303.9185\" y=\"-127.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = [1.0, 0.0]</text>\n", "<text text-anchor=\"middle\" x=\"303.9185\" y=\"-113.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">class = not suspicious</text>\n", "</g>\n", "<!-- 6->10 -->\n", "<g id=\"edge10\" class=\"edge\">\n", "<title>6->10</title>\n", "<path fill=\"none\" stroke=\"#000000\" 
d=\"M252.1058,-213.7677C259.7882,-202.6817 268.161,-190.5994 275.8917,-179.4436\"/>\n", "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"278.8371,-181.3381 281.6563,-171.1252 273.0836,-177.351 278.8371,-181.3381\"/>\n", "</g>\n", "<!-- 8 -->\n", "<g id=\"node9\" class=\"node\">\n", "<title>8</title>\n", "<polygon fill=\"none\" stroke=\"#000000\" points=\"137.7555,-64 .0814,-64 .0814,0 137.7555,0 137.7555,-64\"/>\n", "<text text-anchor=\"middle\" x=\"68.9185\" y=\"-48.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">gini = 0.32</text>\n", "<text text-anchor=\"middle\" x=\"68.9185\" y=\"-34.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 24.2%</text>\n", "<text text-anchor=\"middle\" x=\"68.9185\" y=\"-20.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = [0.8, 0.2]</text>\n", "<text text-anchor=\"middle\" x=\"68.9185\" y=\"-6.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">class = not suspicious</text>\n", "</g>\n", "<!-- 7->8 -->\n", "<g id=\"edge8\" class=\"edge\">\n", "<title>7->8</title>\n", "<path fill=\"none\" stroke=\"#000000\" d=\"M118.317,-99.7647C111.8591,-90.9057 104.9907,-81.4838 98.4936,-72.571\"/>\n", "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"101.1227,-70.236 92.4037,-64.2169 95.4661,-74.3595 101.1227,-70.236\"/>\n", "</g>\n", "<!-- 9 -->\n", "<g id=\"node10\" class=\"node\">\n", "<title>9</title>\n", "<polygon fill=\"none\" stroke=\"#000000\" points=\"293.7555,-64 156.0814,-64 156.0814,0 293.7555,0 293.7555,-64\"/>\n", "<text text-anchor=\"middle\" x=\"224.9185\" y=\"-48.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">gini = 0.0</text>\n", "<text text-anchor=\"middle\" x=\"224.9185\" y=\"-34.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 0.6%</text>\n", "<text text-anchor=\"middle\" x=\"224.9185\" y=\"-20.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = 
[1.0, 0.0]</text>\n", "<text text-anchor=\"middle\" x=\"224.9185\" y=\"-6.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">class = not suspicious</text>\n", "</g>\n", "<!-- 7->9 -->\n", "<g id=\"edge9\" class=\"edge\">\n", "<title>7->9</title>\n", "<path fill=\"none\" stroke=\"#000000\" d=\"M175.5199,-99.7647C181.9778,-90.9057 188.8462,-81.4838 195.3433,-72.571\"/>\n", "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"198.3708,-74.3595 201.4333,-64.2169 192.7142,-70.236 198.3708,-74.3595\"/>\n", "</g>\n", "<!-- 13 -->\n", "<g id=\"node14\" class=\"node\">\n", "<title>13</title>\n", "<polygon fill=\"none\" stroke=\"#000000\" points=\"608.9563,-406 468.8806,-406 468.8806,-328 608.9563,-328 608.9563,-406\"/>\n", "<text text-anchor=\"middle\" x=\"538.9185\" y=\"-390.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">airbag <= 0.5</text>\n", "<text text-anchor=\"middle\" x=\"538.9185\" y=\"-376.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">gini = 0.363</text>\n", "<text text-anchor=\"middle\" x=\"538.9185\" y=\"-362.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 12.7%</text>\n", "<text text-anchor=\"middle\" x=\"538.9185\" y=\"-348.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = [0.762, 0.238]</text>\n", "<text text-anchor=\"middle\" x=\"538.9185\" y=\"-334.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">class = not suspicious</text>\n", "</g>\n", "<!-- 12->13 -->\n", "<g id=\"edge13\" class=\"edge\">\n", "<title>12->13</title>\n", "<path fill=\"none\" stroke=\"#000000\" d=\"M538.9185,-441.7677C538.9185,-433.6172 538.9185,-424.9283 538.9185,-416.4649\"/>\n", "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"542.4186,-416.3046 538.9185,-406.3046 535.4186,-416.3047 542.4186,-416.3046\"/>\n", "</g>\n", "<!-- 18 -->\n", "<g id=\"node19\" class=\"node\">\n", "<title>18</title>\n", "<polygon fill=\"none\" stroke=\"#000000\" 
points=\"743.3666,-399 626.4703,-399 626.4703,-335 743.3666,-335 743.3666,-399\"/>\n", "<text text-anchor=\"middle\" x=\"684.9185\" y=\"-383.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">gini = 0.0</text>\n", "<text text-anchor=\"middle\" x=\"684.9185\" y=\"-369.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 0.6%</text>\n", "<text text-anchor=\"middle\" x=\"684.9185\" y=\"-355.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = [0.0, 1.0]</text>\n", "<text text-anchor=\"middle\" x=\"684.9185\" y=\"-341.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">class = suspicious</text>\n", "</g>\n", "<!-- 12->18 -->\n", "<g id=\"edge18\" class=\"edge\">\n", "<title>12->18</title>\n", "<path fill=\"none\" stroke=\"#000000\" d=\"M589.1634,-441.7677C604.1964,-430.0296 620.6598,-417.1745 635.6307,-405.485\"/>\n", "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"638.0478,-408.0382 643.7757,-399.1252 633.7397,-402.5209 638.0478,-408.0382\"/>\n", "</g>\n", "<!-- 14 -->\n", "<g id=\"node15\" class=\"node\">\n", "<title>14</title>\n", "<polygon fill=\"none\" stroke=\"#000000\" points=\"608.9563,-285 468.8806,-285 468.8806,-221 608.9563,-221 608.9563,-285\"/>\n", "<text text-anchor=\"middle\" x=\"538.9185\" y=\"-269.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">gini = 0.494</text>\n", "<text text-anchor=\"middle\" x=\"538.9185\" y=\"-255.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 5.5%</text>\n", "<text text-anchor=\"middle\" x=\"538.9185\" y=\"-241.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = [0.556, 0.444]</text>\n", "<text text-anchor=\"middle\" x=\"538.9185\" y=\"-227.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">class = not suspicious</text>\n", "</g>\n", "<!-- 13->14 -->\n", "<g id=\"edge14\" class=\"edge\">\n", "<title>13->14</title>\n", "<path fill=\"none\" 
stroke=\"#000000\" d=\"M538.9185,-327.7677C538.9185,-317.3338 538.9185,-306.0174 538.9185,-295.4215\"/>\n", "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"542.4186,-295.1252 538.9185,-285.1252 535.4186,-295.1252 542.4186,-295.1252\"/>\n", "</g>\n", "<!-- 15 -->\n", "<g id=\"node16\" class=\"node\">\n", "<title>15</title>\n", "<polygon fill=\"none\" stroke=\"#000000\" points=\"767.9563,-292 627.8806,-292 627.8806,-214 767.9563,-214 767.9563,-292\"/>\n", "<text text-anchor=\"middle\" x=\"697.9185\" y=\"-276.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">failed <= 0.5</text>\n", "<text text-anchor=\"middle\" x=\"697.9185\" y=\"-262.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">gini = 0.153</text>\n", "<text text-anchor=\"middle\" x=\"697.9185\" y=\"-248.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 7.3%</text>\n", "<text text-anchor=\"middle\" x=\"697.9185\" y=\"-234.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = [0.917, 0.083]</text>\n", "<text text-anchor=\"middle\" x=\"697.9185\" y=\"-220.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">class = not suspicious</text>\n", "</g>\n", "<!-- 13->15 -->\n", "<g id=\"edge15\" class=\"edge\">\n", "<title>13->15</title>\n", "<path fill=\"none\" stroke=\"#000000\" d=\"M593.6372,-327.7677C606.9532,-318.2204 621.2997,-307.9342 634.9639,-298.1373\"/>\n", "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"637.2807,-300.7828 643.3683,-292.1115 633.2019,-295.0939 637.2807,-300.7828\"/>\n", "</g>\n", "<!-- 16 -->\n", "<g id=\"node17\" class=\"node\">\n", "<title>16</title>\n", "<polygon fill=\"none\" stroke=\"#000000\" points=\"688.9563,-171 548.8806,-171 548.8806,-107 688.9563,-107 688.9563,-171\"/>\n", "<text text-anchor=\"middle\" x=\"618.9185\" y=\"-155.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">gini = 0.165</text>\n", "<text text-anchor=\"middle\" x=\"618.9185\" 
y=\"-141.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 6.7%</text>\n", "<text text-anchor=\"middle\" x=\"618.9185\" y=\"-127.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = [0.909, 0.091]</text>\n", "<text text-anchor=\"middle\" x=\"618.9185\" y=\"-113.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">class = not suspicious</text>\n", "</g>\n", "<!-- 15->16 -->\n", "<g id=\"edge16\" class=\"edge\">\n", "<title>15->16</title>\n", "<path fill=\"none\" stroke=\"#000000\" d=\"M670.7311,-213.7677C663.0488,-202.6817 654.676,-190.5994 646.9452,-179.4436\"/>\n", "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"649.7533,-177.351 641.1807,-171.1252 643.9998,-181.3381 649.7533,-177.351\"/>\n", "</g>\n", "<!-- 17 -->\n", "<g id=\"node18\" class=\"node\">\n", "<title>17</title>\n", "<polygon fill=\"none\" stroke=\"#000000\" points=\"844.7555,-171 707.0814,-171 707.0814,-107 844.7555,-107 844.7555,-171\"/>\n", "<text text-anchor=\"middle\" x=\"775.9185\" y=\"-155.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">gini = 0.0</text>\n", "<text text-anchor=\"middle\" x=\"775.9185\" y=\"-141.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 0.6%</text>\n", "<text text-anchor=\"middle\" x=\"775.9185\" y=\"-127.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = [1.0, 0.0]</text>\n", "<text text-anchor=\"middle\" x=\"775.9185\" y=\"-113.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">class = not suspicious</text>\n", "</g>\n", "<!-- 15->17 -->\n", "<g id=\"edge17\" class=\"edge\">\n", "<title>15->17</title>\n", "<path fill=\"none\" stroke=\"#000000\" d=\"M724.7616,-213.7677C732.3468,-202.6817 740.6136,-190.5994 748.2465,-179.4436\"/>\n", "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"751.1798,-181.3547 753.9381,-171.1252 745.4026,-177.4019 751.1798,-181.3547\"/>\n", "</g>\n", "</g>\n", "</svg>\n", 
"</pre>\n", " \n", "\n", "\n", "\n" ], "text/plain": [ "<IPython.core.display.HTML object>" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "feature_names = list(X.columns)\n", "\n", "eli5.show_weights(clf,\n", " feature_names=feature_names,\n", " target_names=label_names)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And the best part is: almost everything you can do with a logistic regression classifier you can do with a decision tree. Most of the time you can just **change your classifier to see if it does better.**\n", "\n", "Decision trees also have a lot of simple options." ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=2,\n", " max_features=None, max_leaf_nodes=None,\n", " min_impurity_decrease=0.0, min_impurity_split=None,\n", " min_samples_leaf=1, min_samples_split=2,\n", " min_weight_fraction_leaf=0.0, presort=False,\n", " random_state=None, splitter='best')" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from sklearn.tree import DecisionTreeClassifier\n", "\n", "X = train_df.drop(columns='is_suspicious')\n", "y = train_df.is_suspicious\n", "\n", "clf = DecisionTreeClassifier(max_depth=2)\n", "\n", "clf.fit(X, y)" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "image/svg+xml": [ "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n", "<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n", " \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n", "<!-- Generated by graphviz version 2.40.1 (20161225.0304)\n", " -->\n", "<!-- Title: Tree Pages: 1 -->\n", "<svg width=\"358pt\" height=\"300pt\"\n", " viewBox=\"0.00 0.00 358.14 300.00\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n", "<g id=\"graph0\" class=\"graph\" 
transform=\"scale(1 1) rotate(0) translate(4 296)\">\n", "<title>Tree</title>\n", "<polygon fill=\"#ffffff\" stroke=\"transparent\" points=\"-4,4 -4,-296 354.1421,-296 354.1421,4 -4,4\"/>\n", "<!-- 0 -->\n", "<g id=\"node1\" class=\"node\">\n", "<title>0</title>\n", "<polygon fill=\"#e88e4d\" stroke=\"#000000\" points=\"287.7555,-292 150.0814,-292 150.0814,-214 287.7555,-214 287.7555,-292\"/>\n", "<text text-anchor=\"middle\" x=\"218.9185\" y=\"-276.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">violent <= 0.5</text>\n", "<text text-anchor=\"middle\" x=\"218.9185\" y=\"-262.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">gini = 0.165</text>\n", "<text text-anchor=\"middle\" x=\"218.9185\" y=\"-248.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 165</text>\n", "<text text-anchor=\"middle\" x=\"218.9185\" y=\"-234.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = [150, 15]</text>\n", "<text text-anchor=\"middle\" x=\"218.9185\" y=\"-220.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">class = not suspicious</text>\n", "</g>\n", "<!-- 1 -->\n", "<g id=\"node2\" class=\"node\">\n", "<title>1</title>\n", "<polygon fill=\"#e78d4b\" stroke=\"#000000\" points=\"215.7555,-178 78.0814,-178 78.0814,-100 215.7555,-100 215.7555,-178\"/>\n", "<text text-anchor=\"middle\" x=\"146.9185\" y=\"-162.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">did not deploy <= 0.5</text>\n", "<text text-anchor=\"middle\" x=\"146.9185\" y=\"-148.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">gini = 0.156</text>\n", "<text text-anchor=\"middle\" x=\"146.9185\" y=\"-134.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 164</text>\n", "<text text-anchor=\"middle\" x=\"146.9185\" y=\"-120.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = [150, 14]</text>\n", "<text 
text-anchor=\"middle\" x=\"146.9185\" y=\"-106.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">class = not suspicious</text>\n", "</g>\n", "<!-- 0->1 -->\n", "<g id=\"edge1\" class=\"edge\">\n", "<title>0->1</title>\n", "<path fill=\"none\" stroke=\"#000000\" d=\"M194.1401,-213.7677C188.6531,-205.0798 182.7796,-195.7801 177.1041,-186.794\"/>\n", "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"180.0416,-184.8906 171.7424,-178.3046 174.1232,-188.6285 180.0416,-184.8906\"/>\n", "<text text-anchor=\"middle\" x=\"166.2359\" y=\"-198.4907\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">True</text>\n", "</g>\n", "<!-- 4 -->\n", "<g id=\"node5\" class=\"node\">\n", "<title>4</title>\n", "<polygon fill=\"#399de5\" stroke=\"#000000\" points=\"350.3666,-171 233.4703,-171 233.4703,-107 350.3666,-107 350.3666,-171\"/>\n", "<text text-anchor=\"middle\" x=\"291.9185\" y=\"-155.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">gini = 0.0</text>\n", "<text text-anchor=\"middle\" x=\"291.9185\" y=\"-141.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 1</text>\n", "<text text-anchor=\"middle\" x=\"291.9185\" y=\"-127.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = [0, 1]</text>\n", "<text text-anchor=\"middle\" x=\"291.9185\" y=\"-113.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">class = suspicious</text>\n", "</g>\n", "<!-- 0->4 -->\n", "<g id=\"edge4\" class=\"edge\">\n", "<title>0->4</title>\n", "<path fill=\"none\" stroke=\"#000000\" d=\"M244.0409,-213.7677C251.0702,-202.7904 258.7251,-190.8362 265.8101,-179.772\"/>\n", "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"268.9019,-181.434 271.3471,-171.1252 263.0069,-177.6591 268.9019,-181.434\"/>\n", "<text text-anchor=\"middle\" x=\"276.7012\" y=\"-191.3451\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">False</text>\n", "</g>\n", "<!-- 2 -->\n", "<g 
id=\"node3\" class=\"node\">\n", "<title>2</title>\n", "<polygon fill=\"#e99254\" stroke=\"#000000\" points=\"137.7555,-64 .0814,-64 .0814,0 137.7555,0 137.7555,-64\"/>\n", "<text text-anchor=\"middle\" x=\"68.9185\" y=\"-48.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">gini = 0.212</text>\n", "<text text-anchor=\"middle\" x=\"68.9185\" y=\"-34.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 116</text>\n", "<text text-anchor=\"middle\" x=\"68.9185\" y=\"-20.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = [102, 14]</text>\n", "<text text-anchor=\"middle\" x=\"68.9185\" y=\"-6.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">class = not suspicious</text>\n", "</g>\n", "<!-- 1->2 -->\n", "<g id=\"edge2\" class=\"edge\">\n", "<title>1->2</title>\n", "<path fill=\"none\" stroke=\"#000000\" d=\"M118.317,-99.7647C111.8591,-90.9057 104.9907,-81.4838 98.4936,-72.571\"/>\n", "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"101.1227,-70.236 92.4037,-64.2169 95.4661,-74.3595 101.1227,-70.236\"/>\n", "</g>\n", "<!-- 3 -->\n", "<g id=\"node4\" class=\"node\">\n", "<title>3</title>\n", "<polygon fill=\"#e58139\" stroke=\"#000000\" points=\"293.7555,-64 156.0814,-64 156.0814,0 293.7555,0 293.7555,-64\"/>\n", "<text text-anchor=\"middle\" x=\"224.9185\" y=\"-48.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">gini = 0.0</text>\n", "<text text-anchor=\"middle\" x=\"224.9185\" y=\"-34.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 48</text>\n", "<text text-anchor=\"middle\" x=\"224.9185\" y=\"-20.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = [48, 0]</text>\n", "<text text-anchor=\"middle\" x=\"224.9185\" y=\"-6.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">class = not suspicious</text>\n", "</g>\n", "<!-- 1->3 -->\n", "<g id=\"edge3\" class=\"edge\">\n", 
"<title>1->3</title>\n", "<path fill=\"none\" stroke=\"#000000\" d=\"M175.5199,-99.7647C181.9778,-90.9057 188.8462,-81.4838 195.3433,-72.571\"/>\n", "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"198.3708,-74.3595 201.4333,-64.2169 192.7142,-70.236 198.3708,-74.3595\"/>\n", "</g>\n", "</g>\n", "</svg>\n" ], "text/plain": [ "<graphviz.files.Source at 0x11e6d9898>" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from sklearn import tree\n", "import graphviz\n", "\n", "label_names = ['not suspicious', 'suspicious']\n", "feature_names = X.columns\n", "\n", "dot_data = tree.export_graphviz(clf,\n", " feature_names=feature_names, \n", " filled=True,\n", " class_names=label_names) \n", "graph = graphviz.Source(dot_data) \n", "graph" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### A random forest is usually even better\n", "\n", "Although in this case our inputs are terrible so it's still not very good. Garbage in, garbage out.\n", "\n", "We'l change our classifier to be\n", "\n", "```python\n", "clf = RandomForestClassifier(n_estimators=100)\n", "```\n", "\n", "and it will use 100 decision **trees** to make a **forest**." 
] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',\n", " max_depth=None, max_features='auto', max_leaf_nodes=None,\n", " min_impurity_decrease=0.0, min_impurity_split=None,\n", " min_samples_leaf=1, min_samples_split=2,\n", " min_weight_fraction_leaf=0.0, n_estimators=100,\n", " n_jobs=None, oob_score=False, random_state=None,\n", " verbose=0, warm_start=False)" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from sklearn.ensemble import RandomForestClassifier\n", "\n", "X = train_df.drop(columns='is_suspicious')\n", "y = train_df.is_suspicious\n", "\n", "clf = RandomForestClassifier(n_estimators=100)\n", "\n", "clf.fit(X, y)" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>Predicted not suspicious</th>\n", " <th>Predicted suspicious</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>Is not suspicious</th>\n", " <td>150</td>\n", " <td>0</td>\n", " </tr>\n", " <tr>\n", " <th>Is suspicious</th>\n", " <td>13</td>\n", " <td>2</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " Predicted not suspicious Predicted suspicious\n", "Is not suspicious 150 0\n", "Is suspicious 13 2" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from sklearn.metrics import confusion_matrix\n", "\n", "y_true = y\n", "y_pred = clf.predict(X)\n", "\n", "matrix = confusion_matrix(y_true, 
y_pred)\n", "\n", "label_names = pd.Series(['not suspicious', 'suspicious'])\n", "pd.DataFrame(matrix,\n", " columns='Predicted ' + label_names,\n", " index='Is ' + label_names)" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " <style>\n", " table.eli5-weights tr:hover {\n", " filter: brightness(85%);\n", " }\n", "</style>\n", "\n", "\n", "\n", " \n", " <p>Explained as: feature importances</p>\n", " \n", "\n", " \n", "\n", " \n", "\n", " \n", "\n", " \n", "\n", " \n", "\n", "\n", " \n", "\n", " \n", " \n", " <pre>\n", "Random forest feature importances; values are numbers 0 <= x <= 1;\n", "all values sum to 1.\n", "</pre>\n", " \n", "\n", " \n", "\n", " \n", "\n", " \n", "\n", " \n", "\n", "\n", " \n", "\n", " \n", "\n", " \n", "\n", " \n", "\n", " \n", "\n", " \n", "\n", "\n", " \n", "\n", " \n", "\n", " \n", "\n", " \n", "\n", " \n", "\n", " \n", "\n", "\n", " \n", "\n", " \n", "\n", " \n", "\n", " \n", "\n", " \n", " <table class=\"eli5-weights eli5-feature-importances\" style=\"border-collapse: collapse; border: none; margin-top: 0em; table-layout: auto;\">\n", " <thead>\n", " <tr style=\"border: none;\">\n", " <th style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">Weight</th>\n", " <th style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">Feature</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " \n", " <tr style=\"background-color: hsl(120, 100.00%, 80.00%); border: none;\">\n", " <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n", " 0.2394\n", " \n", " ± 0.3216\n", " \n", " </td>\n", " <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n", " did not deploy\n", " </td>\n", " </tr>\n", " \n", " <tr style=\"background-color: hsl(120, 100.00%, 81.01%); border: none;\">\n", " <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n", " 0.2223\n", " \n", " ± 0.3465\n", " \n", " </td>\n", " <td 
style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n", " airbag\n", " </td>\n", " </tr>\n", " \n", " <tr style=\"background-color: hsl(120, 100.00%, 81.75%); border: none;\">\n", " <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n", " 0.2101\n", " \n", " ± 0.4087\n", " \n", " </td>\n", " <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n", " air bag\n", " </td>\n", " </tr>\n", " \n", " <tr style=\"background-color: hsl(120, 100.00%, 85.29%); border: none;\">\n", " <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n", " 0.1543\n", " \n", " ± 0.2972\n", " \n", " </td>\n", " <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n", " violent\n", " </td>\n", " </tr>\n", " \n", " <tr style=\"background-color: hsl(120, 100.00%, 86.85%); border: none;\">\n", " <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n", " 0.1314\n", " \n", " ± 0.2854\n", " \n", " </td>\n", " <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n", " explode\n", " </td>\n", " </tr>\n", " \n", " <tr style=\"background-color: hsl(120, 100.00%, 94.59%); border: none;\">\n", " <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n", " 0.0369\n", " \n", " ± 0.0527\n", " \n", " </td>\n", " <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n", " failed\n", " </td>\n", " </tr>\n", " \n", " <tr style=\"background-color: hsl(120, 100.00%, 98.54%); border: none;\">\n", " <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n", " 0.0057\n", " \n", " ± 0.0190\n", " \n", " </td>\n", " <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n", " shrapnel\n", " </td>\n", " </tr>\n", " \n", " \n", " </tbody>\n", "</table>\n", " \n", "\n", " \n", "\n", "\n", " \n", "\n", " \n", "\n", " \n", "\n", " \n", "\n", " \n", "\n", " \n", "\n", "\n", "\n" ], "text/plain": [ 
"<IPython.core.display.HTML object>" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "feature_names = list(X.columns)\n", "\n", "eli5.show_weights(clf, feature_names=feature_names, show=eli5.formatters.fields.ALL)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Review\n", "\n", "In our previous two attempts to tackle Takata airbag investigation, we used a **logistic regression classifier**. This time we're trying a new type called a **random forest**, which performed _slightly_ better (although it could have just been chance).\n", "\n", "Despite this slight improvement, its predictions were still very off." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Discussion topics\n", "\n", "What's wrong here? Why does nothing work for us, even though we keep throwing more machine learning tools at it?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.8" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 2 }