Examining life expectancy at the local level

Combine geographically granular life expectancy data with the American Community Survey to see how poverty, education, income, and demographics can affect a community.

linear regression, multivariable regression, collinearity, census data

Readings and links

Summary

What relationship does life expectancy have to metrics like unemployment and high school graduation rates? Using USALEEP life expectancy estimates combined with census data, the Associated Press performed a regression analysis to find out.

This chapter covers the basics of linear regression, including how to read coefficients and translate them into a "human-friendly" form. It also provides a brief introduction to Census data.
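As a rough illustration of what translating a coefficient into a "human-friendly" form can look like, the sketch below fits a one-variable line with NumPy. The data is synthetic, and the variable names and the describe helper are hypothetical; none of the numbers come from USALEEP, the ACS, or the AP analysis.

```python
import numpy as np

# Synthetic, made-up data purely for illustration (not USALEEP or ACS figures).
rng = np.random.default_rng(0)
pct_unemployed = rng.uniform(2, 20, size=200)  # hypothetical percent unemployed per tract
life_expectancy = 80 - 0.15 * pct_unemployed + rng.normal(0, 1, size=200)

# Least-squares fit of: life_expectancy = intercept + slope * pct_unemployed
slope, intercept = np.polyfit(pct_unemployed, life_expectancy, deg=1)


def describe(slope, step=10):
    """Hypothetical helper: phrase a slope as the change over `step` percentage points."""
    change_years = slope * step
    direction = "lower" if change_years < 0 else "higher"
    return (
        f"A {step} percentage point increase in unemployment is associated with "
        f"a life expectancy about {abs(change_years):.1f} years {direction}."
    )


sentence = describe(slope)
print(f"raw coefficient: {slope:.3f} years per percentage point")
print(sentence)
```

The raw coefficient describes a one-unit change. Rescaling it to a step that readers can picture, such as ten percentage points, is one way to make it easier to report. The sentence says "associated with" because a regression coefficient alone does not establish cause.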

Notebooks, Assignments, and Walkthroughs

Complete walkthrough

Walkthrough of obtaining the datasets, then building a simple linear regression and improving it into a multivariable regression.

Multi-page walkthrough

Simple linear regression using statsmodels (formula version)

Using the statsmodels package, we perform a series of regressions between life expectancy and Census data. This notebook uses the formula-based technique when performing the regression (uses Patsy, similar to R formulas).

Simple linear regression using statsmodels (dataframes version)

Using the statsmodels package, we perform a series of regressions between life expectancy and Census data. This notebook uses the dataframes technique when performing the regression.

Discussion topics

For this project we used the American Community Survey instead of the Current Population Survey. We were just following the Census Bureau's guidance, but what might the repercussions of this decision be?

USALEEP is for "census-tract life expectancy at birth for the period 2010-2015," but our dataset is for people living between roughly 2010 and 2015. Can we/should we make judgments based on that sort of disconnect?

What impact might p values have on our research?

Many demographic factors are related to one another, giving rise to a problem called collinearity. What could we do to prevent collinearity from being an issue in our research?

There are many, many measures of unemployment with a lot of specifics involved in each. How do you feel about the decision we made when picking our specific measure of unemployment? (Maybe don't click those links, just talk about What Might Be Important, or How Important It Might Be)

Do we need to talk to an expert about this analysis before we publish? Where could we find an expert who might be useful?