4.2 Interpreting coefficients

The coefficient (coef) is what goes in our sentence: “for every increase of 1 in unemployed_pct, life expectancy goes up (or down) by Y years”. In this case, the coefficient is -0.5214, so our sentence goes something like this:

For every increase of 1 percentage point in the unemployment rate, life expectancy decreases by about 6 months (0.5 years).

It’s a decrease in life expectancy because the coefficient is negative.
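
To make the arithmetic behind that sentence concrete, here is a minimal sketch that turns the coefficient quoted above into years and months. The function name change_in_life_expectancy is made up for illustration and is not part of statsmodels.

```python
# Coefficient for unemployed_pct, copied from the regression results.
coef_unemployed_pct = -0.5214


def change_in_life_expectancy(delta_pct_points):
    """Predicted change in life expectancy (in years) for a given
    change in the unemployment rate (in percentage points)."""
    return coef_unemployed_pct * delta_pct_points


one_point = change_in_life_expectancy(1)
# Multiply by 12 to convert years to months.
print(f"{one_point:.4f} years, or about {one_point * 12:.1f} months")
# prints: -0.5214 years, or about -6.3 months
```

The negative sign is the "decrease" in the sentence, and -6.3 months is where "about 6 months" comes from.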

4.2.1 Understanding const

Under coef there’s another coefficient we’ve been ignoring named const.

                     coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------------------------------------
const             81.1377      0.028   2856.410      0.000      81.082      81.193
unemployed_pct    -0.5214      0.005   -115.595      0.000      -0.530      -0.513

The basic idea is that linear regression loves the number zero. By default, linear regression in statsmodels assumes that if you have an unemployed_pct of zero, life_expectancy will also be zero: with no constant, the fitted line is forced through the point (0, 0). Linear regression doesn’t really think these things through, does it?

By adding this constant, you tell the linear regression that if unemployed_pct is zero, it’s totally okay for life expectancy to be something else. And in this case, since const is 81.1377, that’s the predicted life expectancy at a zero unemployment rate.
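
Put together, const and the unemployed_pct coefficient describe a straight line. The sketch below plugs the two numbers from the table into that line by hand; predict_life_expectancy is a hypothetical helper written for this example, not a statsmodels function.

```python
# Both values are copied from the coef column of the regression table.
const = 81.1377
coef_unemployed_pct = -0.5214


def predict_life_expectancy(unemployed_pct):
    """Predicted life expectancy: const + coef * unemployed_pct."""
    return const + coef_unemployed_pct * unemployed_pct


# At zero unemployment the prediction is just the constant.
print(f"{predict_life_expectancy(0):.4f}")  # prints: 81.1377
# At 5% unemployment we subtract 5 * 0.5214 = 2.607 years.
print(f"{predict_life_expectancy(5):.4f}")  # prints: 78.5307
```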

Also, this is the reason for that weird sm.add_constant line in the regression that we didn’t talk about before. What it does is add a column that’s always 1 to our X, which is a sign to the regression that it’s okay to not start at zero. Take a look:

X = df[['unemployed_pct']]
X = sm.add_constant(X)
X

##        const  unemployed_pct
## 0        1.0        3.474903
## 1        1.0        6.701329
## 2        1.0        6.308411
## 3        1.0        2.695779
## 4        1.0        6.654991
## ...      ...             ...
## 65657    1.0        2.599922
## 65658    1.0        4.372723
## 65659    1.0        6.232427
## 65660    1.0        2.521856
## 65661    1.0        3.797019
## 
## [65662 rows x 2 columns]
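
To see why that column of ones matters, here is a small hand-rolled sketch in plain Python. It is not how statsmodels computes things internally, and the four data points are made up (they sit exactly on the line y = 80 - 0.5x). It fits the line twice: once forced through zero, and once with a column of ones like the one above.

```python
# Made-up, noise-free data on the line y = 80 - 0.5 * x.
x = [2.0, 4.0, 6.0, 8.0]
y = [80 - 0.5 * xi for xi in x]  # [79.0, 78.0, 77.0, 76.0]

n = len(x)
sx = sum(x)
sy = sum(y)
sxx = sum(xi * xi for xi in x)
sxy = sum(xi * yi for xi, yi in zip(x, y))

# Without a constant: the only column is x, so the line must pass
# through (0, 0). The least-squares slope is sum(x*y) / sum(x*x).
slope_no_const = sxy / sxx

# With a constant: each row of X is [1.0, x_i]. Solve the 2x2
# normal equations (X'X) b = X'y for the constant and the slope.
det = n * sxx - sx * sx
const = (sxx * sy - sx * sxy) / det
slope = (n * sxy - sx * sy) / det

print(f"no constant:   slope = {slope_no_const:.2f}")
# prints: no constant:   slope = 12.83
print(f"with constant: const = {const:.2f}, slope = {slope:.2f}")
# prints: with constant: const = 80.00, slope = -0.50
```

Forced through zero, the only way to get near values around 78 is a steep upward slope, so even the sign of the relationship comes out wrong. With the column of ones, the fit recovers the constant of 80 and the slope of -0.5.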