# Evaluating a logistic regression#

We've been running logistic regressions willy-nilly in these past few sections, but we haven't taken the chance to sit down and ask: **are they even of acceptable quality?**

In this section we'll discuss what makes a logistic regression worthwhile, along with how to analyze all the features you've selected.

## Our dataset#

We're going to use the updated scarf-making dataset from our last section. We're cool, we're crafty, but we're also not very good at finishing scarves we've set out to knit!

```
import pandas as pd
import numpy as np
import statsmodels.formula.api as smf
df = pd.DataFrame([
{ 'length_in': 55, 'large_gauge': 1, 'color': 'orange', 'completed': 1 },
{ 'length_in': 55, 'large_gauge': 0, 'color': 'orange', 'completed': 1 },
{ 'length_in': 55, 'large_gauge': 0, 'color': 'brown', 'completed': 1 },
{ 'length_in': 60, 'large_gauge': 0, 'color': 'brown', 'completed': 1 },
{ 'length_in': 60, 'large_gauge': 0, 'color': 'grey', 'completed': 0 },
{ 'length_in': 70, 'large_gauge': 0, 'color': 'grey', 'completed': 1 },
{ 'length_in': 70, 'large_gauge': 0, 'color': 'orange', 'completed': 0 },
{ 'length_in': 82, 'large_gauge': 1, 'color': 'grey', 'completed': 1 },
{ 'length_in': 82, 'large_gauge': 0, 'color': 'brown', 'completed': 0 },
{ 'length_in': 82, 'large_gauge': 0, 'color': 'orange', 'completed': 0 },
{ 'length_in': 82, 'large_gauge': 1, 'color': 'brown', 'completed': 0 },
])
df
```

Now let's ask some questions about it.

## Statistical significance#

Just because you run a regression **doesn't mean the results are true!** The standard way of judging whether you can trust what a regression is telling you is called the **p-value**. Let's take a look at our most recent regression, and figure out where the p-value is and what it means.

```
model = smf.logit("completed ~ length_in + large_gauge + C(color, Treatment('orange'))", data=df)
results = model.fit()
results.summary()
```

This is our logistic regression on scarf completion: given a scarf's intended length, color, and the size of our needles, can we finish it? Instead of looking at the coefficients and odds ratios, let's **peek at the regression's p value.**

The p-value is listed as **LLR p-value** (bottom of the top right area), and it's a measure of how much we can trust the regression as a whole. Roughly speaking, it's the chance we'd see results at least this strong even if none of our features actually had anything to do with completing a scarf.

Typically a p-value of 0.05 (or 5%) or below is considered "statistically significant," meaning there's only a 5% or less chance that results like these would show up by pure luck. **Our p-value is 0.2138, which is frankly terrible.** I don't think we can use this regression for anything!
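If you'd rather grab that number programmatically instead of squinting at the summary table, the fitted results object exposes it directly as `.llr_pvalue`. Here's a self-contained sketch that rebuilds the same dataset from above:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Same scarf dataset as above
df = pd.DataFrame([
    {'length_in': 55, 'large_gauge': 1, 'color': 'orange', 'completed': 1},
    {'length_in': 55, 'large_gauge': 0, 'color': 'orange', 'completed': 1},
    {'length_in': 55, 'large_gauge': 0, 'color': 'brown', 'completed': 1},
    {'length_in': 60, 'large_gauge': 0, 'color': 'brown', 'completed': 1},
    {'length_in': 60, 'large_gauge': 0, 'color': 'grey', 'completed': 0},
    {'length_in': 70, 'large_gauge': 0, 'color': 'grey', 'completed': 1},
    {'length_in': 70, 'large_gauge': 0, 'color': 'orange', 'completed': 0},
    {'length_in': 82, 'large_gauge': 1, 'color': 'grey', 'completed': 1},
    {'length_in': 82, 'large_gauge': 0, 'color': 'brown', 'completed': 0},
    {'length_in': 82, 'large_gauge': 0, 'color': 'orange', 'completed': 0},
    {'length_in': 82, 'large_gauge': 1, 'color': 'brown', 'completed': 0},
])

model = smf.logit("completed ~ length_in + large_gauge + C(color, Treatment('orange'))", data=df)
results = model.fit(disp=False)  # disp=False silences the optimizer's progress output

# The overall LLR p-value from the summary table, as a plain float
print(results.llr_pvalue)
```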

### p-values for features#

Beyond the p-value for the entire regression, you can also find p-values for each individual feature. They're listed under `P>|z|` down in the bottom features section.

Notice that the p-value for brown is at the nightmarish level of above 80%! Grey is also incredibly high, at around 0.5 (not to be confused with *0.05*).
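Those per-feature numbers also live on the results object as `.pvalues`, a pandas Series indexed by coefficient name, which is handy if you'd rather sort or filter them than eyeball the table. A self-contained sketch:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Same scarf dataset as above, in column form
df = pd.DataFrame({
    'length_in': [55, 55, 55, 60, 60, 70, 70, 82, 82, 82, 82],
    'large_gauge': [1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1],
    'color': ['orange', 'orange', 'brown', 'brown', 'grey', 'grey',
              'orange', 'grey', 'brown', 'orange', 'brown'],
    'completed': [1, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0],
})

model = smf.logit("completed ~ length_in + large_gauge + C(color, Treatment('orange'))", data=df)
results = model.fit(disp=False)

# One p-value per coefficient (including the intercept), worst offenders first
print(results.pvalues.sort_values(ascending=False))
```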

Honestly, **we probably shouldn't have added those in the first place.** You should only be adding features when you have an argument as to why they'd affect the outcome. Bad features have a tendency to not only ruin your entire regression's p value, they also screw around with features that are actually valid and good!

Let's remove the color category from our regression and try again.

```
model = smf.logit("completed ~ length_in + large_gauge", data=df)
results = model.fit()
results.summary()
```

Our regression's overall p-value of 0.07 is now looking a lot closer to statistical significance. Removing unnecessary features improved our regression's chance of being meaningful - **more features isn't always better!**

## Model quality#

When we're looking at a linear regression, we spend a lot of time on **R-squared** values. In logistic regression, we don't have R-squared, but we *kind of* do. They're called (somewhat appropriately) **pseudo R-squared** values.

Pseudo R-squared is listed as **Pseudo R-sq.** up top.

Your pseudo R-squared is on a scale from 0 to 1, with higher values meaning a better fit. Unlike linear regression's R-squared, though, **you can't use it to say "we're explaining such-and-such of the variation."** You can only use it to say "this model is better than that model."

Let's compare the pseudo R-squared value from our `length_in + large_gauge` regression to the `length_in + large_gauge + color` regression.
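To make that comparison without scrolling between two summary tables, you can fit both models and read the pseudo R-squared off each one via `.prsquared`. This is a sketch; the names `results_simple` and `results_color` are just ones I've made up here:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Same scarf dataset as above, in column form
df = pd.DataFrame({
    'length_in': [55, 55, 55, 60, 60, 70, 70, 82, 82, 82, 82],
    'large_gauge': [1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1],
    'color': ['orange', 'orange', 'brown', 'brown', 'grey', 'grey',
              'orange', 'grey', 'brown', 'orange', 'brown'],
    'completed': [1, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0],
})

# Fit both regressions
results_simple = smf.logit("completed ~ length_in + large_gauge", data=df).fit(disp=False)
results_color = smf.logit(
    "completed ~ length_in + large_gauge + C(color, Treatment('orange'))", data=df
).fit(disp=False)

# .prsquared is McFadden's pseudo R-squared
print("without color:", results_simple.prsquared)
print("with color:   ", results_color.prsquared)
```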

It looks like the more complicated regression wins! Does that mean we run back to the regression that includes color, crying and apologizing for casting it aside?

Not even for a moment! **Because the color-including regression's p-value is so high - over 0.2 - we definitely shouldn't take it seriously.** We should only listen to a regression or a coefficient if its p-value is in a respectable place (generally speaking, under 0.05).

Note: Our p-values are generally going to be terrible because we have small datasets that involved me semi-randomly typing numbers. There isn't a secret trick where we're going to hit that 0.05 threshold and solve all our knitting problems forever, sorry.

### Pseudo R-squareds#

Let's have a little chat about logistic regression pseudo R-squareds for a quick second. It turns out there are actually *multiple versions of pseudo R-squared for logistic regression*. Literally different calculations that give different numbers, all called pseudo R-squared! It's a lot more complicated than linear regression, I guess.

The pseudo R-squared we're using here is called **McFadden's R-squared**. Other pieces of statistical modeling software use different calculations! I'm calling out both the complicated nature of determining "goodness of fit" and logistic regression R-squared measurements because it leads to great blog posts like this one:

> For years, I’ve been recommending the Cox and Snell R2 over the McFadden R2, but I’ve recently concluded that that was a mistake. I now believe that McFadden’s R2 is a better choice. However, I’ve also learned about another R2 that has good properties, a lot of intuitive appeal, and is easily calculated. At the moment, I like it better than the McFadden R2. But I’m not going to make a definite recommendation until I get more experience with it.

First off, *it's a secret, he's keeping his favorite R-squared technique a secret!!!* I love that *so much*.

Second, I probably understand about as much of that as you do, but here's the point: non-math people often think that numbers automatically make things true or false, that there's exactly one way to do things, and that math can give you definitive answers. **This is rarely true, especially when we're talking about real-life data!**

The uncertainty in numbers is a big reason why even if you know enough stats to get by, **you should always be running your analyses by someone who knows more**, or someone who is a domain expert. And even the experts don't always agree with one another! On top of that, **readers really trust numbers,** and you need to go above and beyond to make sure you're explaining them correctly.


## Review#

In this section, we talked about **evaluating logistic regression models and features.**

Unlike with a linear regression, we don't have an **R-squared** to explain goodness of fit. Instead we have a **pseudo R-squared**. Like in linear regression, we can use it to compare two different regressions. Higher is better!

The most important measure in your regression is going to be your **p-value**, which is used to measure **statistical significance** (aka the chance your results are a happy accident, not actually meaningful). Traditionally 0.05 is the cutoff, which means there's a less than 5% chance that your findings arose by chance.

Whether your regression does or does not hit the p-value threshold, you can also examine the **p values of your features**. Removing features with high p values tends to improve your regression, as your regression no longer needs to pay attention to the noise they add.

## Discussion topics#

TODO

there’s literally nothing you can do right in stats

which also means there’s no wrong way to do them
