Examining life expectancy at the local level from The Associated Press

Combine geographically granular life expectancy data with the American Community Survey to see how poverty, education, income, and demographics can affect a community.

Topics: linear regression, multivariable regression, collinearity, census data

Readings and links

Unemployment, income affect life expectancy
USALEEP, the U.S. Small-area Life Expectancy Estimates Project

Summary

What relationship does life expectancy have to metrics like unemployment and high school graduation rates? Using USALEEP life expectancy estimates combined with census data, the Associated Press performed a regression analysis to find out.

This chapter covers the basics of linear regression, including how to read coefficients and translate them into a "human-friendly" form. It also provides a brief introduction to Census data.
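As a rough illustration of what translating a coefficient into a "human-friendly" form can look like, here is a minimal sketch. The helper name `describe_coefficient` and the coefficient value are made up for illustration; they are not results from the AP analysis. It assumes the predictor is measured in percentage points and the outcome in years.

```python
def describe_coefficient(coef, predictor, step=1.0, outcome_unit="years"):
    """Turn a linear regression coefficient into a plain-language sentence.

    coef is the change in life expectancy per one-unit increase in the
    predictor; step rescales it to a more readable increment.
    """
    change = coef * step
    direction = "higher" if change > 0 else "lower"
    return (
        f"A {step:g} percentage point increase in {predictor} is associated "
        f"with a life expectancy about {abs(change):.1f} {outcome_unit} {direction}."
    )


# Hypothetical coefficient: -0.15 years per percentage point of unemployment
sentence = describe_coefficient(-0.15, "unemployment", step=10)
print(sentence)
```

Rescaling to a 10 point step is only a presentation choice: the model is the same, but "1.5 years per 10 points" is usually easier to read than "0.15 years per point". The wording "is associated with" is deliberate, since a regression coefficient alone does not show causation.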

Notebooks, Assignments, and Walkthroughs

Complete walkthrough

Walkthrough of obtaining the datasets, then building a simple linear regression and improving it into a multivariable regression.
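Combining the two datasets generally means joining them on a shared census tract identifier. The sketch below uses pandas with invented column names (`tract_id`, `life_expectancy`, `pct_unemployed`) and invented values; the real USALEEP and census files use their own column names, so treat everything here as an assumption.

```python
import pandas as pd

# Tract identifiers are 11-digit codes that can start with zero, so they
# should be kept as strings. When reading from a file, something like
# pd.read_csv(path, dtype={"tract_id": str}) avoids losing leading zeros.
life = pd.DataFrame({
    "tract_id": ["01001020100", "01001020200", "06037101110"],
    "life_expectancy": [73.1, 76.9, 80.4],  # made-up values
})
census = pd.DataFrame({
    "tract_id": ["01001020100", "06037101110", "06037101122"],
    "pct_unemployed": [5.2, 8.9, 4.1],  # made-up values
})

# Keep only tracts present in both tables; validate raises an error if a
# tract id appears more than once on either side.
merged = life.merge(census, on="tract_id", how="inner", validate="one_to_one")
print(merged)
```

It is worth comparing the row counts before and after the merge, since tracts that silently drop out can bias the regression.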

Simple linear regression using statsmodels (formula version)

Using the statsmodels package, we perform a series of regressions between life expectancy and Census data. This notebook uses the formula-based interface, which relies on Patsy and resembles R formulas.
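For readers who have not seen the formula interface, a minimal sketch follows. The data is synthetic and the column names (`pct_unemployed`, `life_expectancy`) are placeholders, not the notebook's actual variables.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data standing in for the merged tract-level dataset
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({"pct_unemployed": rng.uniform(2, 20, n)})
df["life_expectancy"] = 82 - 0.2 * df["pct_unemployed"] + rng.normal(0, 1, n)

# "outcome ~ predictor" is the Patsy formula; an intercept is added automatically
model = smf.ols("life_expectancy ~ pct_unemployed", data=df)
results = model.fit()

print(results.params)   # fitted intercept and slope
print(results.pvalues)  # p-value for each coefficient
```

Additional predictors are added with `+` inside the formula string, which is how a simple regression grows into a multivariable one.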

Simple linear regression using statsmodels (dataframes version)

Using the statsmodels package, we perform a series of regressions between life expectancy and Census data. This notebook passes dataframes directly to the regression instead of using a formula.

Discussion topics

For this project we used the American Community Survey instead of the Current Population Survey. We were simply following the Census Bureau's guidance, but what might the repercussions of this decision be?

USALEEP provides "census-tract life expectancy at birth for the period 2010-2015," but our dataset describes people living between roughly 2010 and 2015. Can we, or should we, make judgments based on that sort of disconnect?

What effect might p-values have on our research?

Many demographic factors are related to one another, giving rise to a problem called collinearity. What could we do to prevent collinearity being an issue in our research?

There are many, many measures of unemployment with a lot of specifics involved in each. How do you feel about the decision we made when picking our specific measure of unemployment? (Maybe don't click those links, just talk about What Might Be Important, or How Important It Might Be)

Do we need to talk to an expert about this analysis before we publish? Where could we find an expert that might be useful?
