4.2 Interpreting coefficients
The coefficient, coef, is what goes in our sentence: “for every increase of 1 in unemployed_pct, life expectancy goes up (or down) Y years”. In this case, the coefficient is -0.5214, so our sentence goes something like this:
For every increase of 1 percentage point in the unemployment rate, life expectancy decreases by about 6 months (0.5 years).
It’s a decrease in life expectancy because the coefficient is negative.
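As a quick check on that sentence, here is a minimal sketch of the arithmetic. It only uses the coefficient from the regression table; the helper name change_in_life_expectancy is made up for illustration and is not part of statsmodels.

```python
# Slope from the regression table: years of life expectancy
# per 1-percentage-point increase in unemployed_pct.
coef = -0.5214

def change_in_life_expectancy(delta_unemployed_pct):
    """Predicted change in life expectancy (in years) for a given
    change in unemployed_pct. Hypothetical helper for illustration."""
    return coef * delta_unemployed_pct

# A 1-point increase, shown in years and converted to months.
one_point = change_in_life_expectancy(1)
print(f"{one_point:.4f} years, or about {one_point * 12:.1f} months")

# The same slope applies to bigger changes, e.g. a 3-point increase.
three_points = change_in_life_expectancy(3)
print(f"{three_points:.4f} years")
```

The negative sign carries through: a positive change in unemployed_pct gives a negative change in life expectancy.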
4.2.1 Understanding const
Under coef there’s another coefficient we’ve been ignoring, named const.
                       coef    std err          t      P>|t|     [0.025     0.975]
----------------------------------------------------------------------------------
const               81.1377      0.028   2856.410      0.000     81.082     81.193
unemployed_pct      -0.5214      0.005   -115.595      0.000     -0.530     -0.513
The basic idea is that linear regression loves the number zero. By default, linear regression in statsmodels assumes that if you have an unemployed_pct of zero, life_expectancy will also be zero. Linear regression doesn’t really think these things through, does it?
By adding this constant, you tell the linear regression that if unemployed_pct is zero, it’s totally okay for life expectancy to be something else. And in this case, since const is 81.1377, that’s the predicted life expectancy with a zero unemployment rate.
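To make the role of const concrete, here is a rough sketch of the line the regression describes: predicted life expectancy is const plus coef times unemployed_pct. The two numbers come from the table above; the function name predict_life_expectancy and the 5 percent input are just illustrative assumptions.

```python
# Values from the regression table.
const = 81.1377   # predicted life expectancy when unemployed_pct is 0
coef = -0.5214    # change in life expectancy per 1-point rise in unemployed_pct

def predict_life_expectancy(unemployed_pct):
    """Hypothetical helper: evaluate the fitted line at one unemployment rate."""
    return const + coef * unemployed_pct

# At zero unemployment the slope term vanishes, leaving only const.
at_zero = predict_life_expectancy(0)
print(at_zero)

# At an illustrative 5% unemployment, the slope pulls the prediction down.
at_five = predict_life_expectancy(5)
print(round(at_five, 4))
```

Without the constant, the first term would be missing and the prediction at zero unemployment would be stuck at zero.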
Also, this is the reason for that weird sm.add_constant line in the regression that we didn’t talk about before. What it does is add a column that’s always 1 to our X, which is a sign to the regression that it’s okay to not start at zero. Take a look:
## const unemployed_pct
## 0 1.0 3.474903
## 1 1.0 6.701329
## 2 1.0 6.308411
## 3 1.0 2.695779
## 4 1.0 6.654991
## ... ... ...
## 65657 1.0 2.599922
## 65658 1.0 4.372723
## 65659 1.0 6.232427
## 65660 1.0 2.521856
## 65661 1.0 3.797019
##
## [65662 rows x 2 columns]