5 Improving our analysis

While our analysis is technically correct, there’s a real-life problem: unemployment rate isn’t the only thing that might affect life expectancy. There are plenty of other signals, and the AP went ahead and dug out a few possibilities.

  • Unemployment
  • Percent of people just above the poverty line
  • Racial breakdown of a census tract
  • Median income
  • Educational attainment

They grabbed the appropriate tables from the Census (see the data section of this chapter) and made an analysis that took in multiple factors in change of life expectancy across different census tracts, not just unemployment. This is called multivariate linear regression.

Let’s reproduce their model with our own census data.

5.0.1 Reading in and processing our data

Thankfully Social Explorer combined all of those different tables into one csv file for us, which makes reading in our data super convenient. Although the file is from Social Explorer, though, so we again need to make sure we specify Latin-1 encoding when opening the file.

census = pd.read_csv("data/R12221550_SL140.csv", encoding='latin-1')
census.head()
Geo_FIPS Geo_GEOID Geo_NAME Geo_QName Geo_STUSAB Geo_SUMLEV Geo_GEOCOMP Geo_FILEID Geo_LOGRECNO Geo_US Geo_REGION Geo_DIVISION Geo_STATECE Geo_STATE Geo_COUNTY Geo_COUSUB Geo_PLACE Geo_PLACESE Geo_TRACT Geo_BLKGRP Geo_CONCIT Geo_AIANHH Geo_AIANHHFP Geo_AIHHTLI Geo_AITSCE Geo_AITS Geo_ANRC Geo_CBSA Geo_CSA Geo_METDIV Geo_MACC Geo_MEMI Geo_NECTA Geo_CNECTA Geo_NECTADIV Geo_UA Geo_UACP Geo_CDCURR Geo_SLDU Geo_SLDL Geo_VTD Geo_ZCTA3 Geo_ZCTA5 Geo_SUBMCD Geo_SDELM Geo_SDSEC Geo_SDUNI Geo_UR Geo_PCI Geo_TAZ Geo_UGA Geo_BTTR Geo_BTBG Geo_PUMA5 Geo_PUMA1 ACS15_5yr_B03002001 ACS15_5yr_B03002002 ACS15_5yr_B03002003 ACS15_5yr_B03002004 ACS15_5yr_B03002005 ACS15_5yr_B03002006 ACS15_5yr_B03002007 ACS15_5yr_B03002008 ACS15_5yr_B03002009 ACS15_5yr_B03002010 ACS15_5yr_B03002011 ACS15_5yr_B03002012 ACS15_5yr_B03002013 ACS15_5yr_B03002014 ACS15_5yr_B03002015 ACS15_5yr_B03002016 ACS15_5yr_B03002017 ACS15_5yr_B03002018 ACS15_5yr_B03002019 ACS15_5yr_B03002020 ACS15_5yr_B03002021 ACS15_5yr_B03002001s ACS15_5yr_B03002002s ACS15_5yr_B03002003s ACS15_5yr_B03002004s ACS15_5yr_B03002005s ACS15_5yr_B03002006s ACS15_5yr_B03002007s ACS15_5yr_B03002008s ACS15_5yr_B03002009s ACS15_5yr_B03002010s ACS15_5yr_B03002011s ACS15_5yr_B03002012s ACS15_5yr_B03002013s ACS15_5yr_B03002014s ACS15_5yr_B03002015s ACS15_5yr_B03002016s ACS15_5yr_B03002017s ACS15_5yr_B03002018s ACS15_5yr_B03002019s ACS15_5yr_B03002020s ACS15_5yr_B03002021s ACS15_5yr_B06009001 ACS15_5yr_B06009002 ACS15_5yr_B06009003 ACS15_5yr_B06009004 ACS15_5yr_B06009005 ACS15_5yr_B06009006 ACS15_5yr_B06009007 ACS15_5yr_B06009008 ACS15_5yr_B06009009 ACS15_5yr_B06009010 ACS15_5yr_B06009011 ACS15_5yr_B06009012 ACS15_5yr_B06009013 ACS15_5yr_B06009014 ACS15_5yr_B06009015 ACS15_5yr_B06009016 ACS15_5yr_B06009017 ACS15_5yr_B06009018 ACS15_5yr_B06009019 ACS15_5yr_B06009020 ACS15_5yr_B06009021 ACS15_5yr_B06009022 ACS15_5yr_B06009023 ACS15_5yr_B06009024 ACS15_5yr_B06009025 ACS15_5yr_B06009026 ACS15_5yr_B06009027 ACS15_5yr_B06009028 ACS15_5yr_B06009029 ACS15_5yr_B06009030 ACS15_5yr_B06009001s ACS15_5yr_B06009002s ACS15_5yr_B06009003s ACS15_5yr_B06009004s ACS15_5yr_B06009005s ACS15_5yr_B06009006s ACS15_5yr_B06009007s ACS15_5yr_B06009008s ACS15_5yr_B06009009s ACS15_5yr_B06009010s ACS15_5yr_B06009011s ACS15_5yr_B06009012s ACS15_5yr_B06009013s ACS15_5yr_B06009014s ACS15_5yr_B06009015s ACS15_5yr_B06009016s ACS15_5yr_B06009017s ACS15_5yr_B06009018s ACS15_5yr_B06009019s ACS15_5yr_B06009020s ACS15_5yr_B06009021s ACS15_5yr_B06009022s ACS15_5yr_B06009023s ACS15_5yr_B06009024s ACS15_5yr_B06009025s ACS15_5yr_B06009026s ACS15_5yr_B06009027s ACS15_5yr_B06009028s ACS15_5yr_B06009029s ACS15_5yr_B06009030s ACS15_5yr_C17002001 ACS15_5yr_C17002002 ACS15_5yr_C17002003 ACS15_5yr_C17002004 ACS15_5yr_C17002005 ACS15_5yr_C17002006 ACS15_5yr_C17002007 ACS15_5yr_C17002008 ACS15_5yr_C17002001s ACS15_5yr_C17002002s ACS15_5yr_C17002003s ACS15_5yr_C17002004s ACS15_5yr_C17002005s ACS15_5yr_C17002006s ACS15_5yr_C17002007s ACS15_5yr_C17002008s ACS15_5yr_B19013001 ACS15_5yr_B19013001s ACS15_5yr_B23025001 ACS15_5yr_B23025002 ACS15_5yr_B23025003 ACS15_5yr_B23025004 ACS15_5yr_B23025005 ACS15_5yr_B23025006 ACS15_5yr_B23025007 ACS15_5yr_B23025001s ACS15_5yr_B23025002s ACS15_5yr_B23025003s ACS15_5yr_B23025004s ACS15_5yr_B23025005s ACS15_5yr_B23025006s ACS15_5yr_B23025007s
1001020100 14000US01001020100 Census Tract 201, Autauga County, Alabama Census Tract 201, Autauga County, Alabama al 140 0 ACSSF 1760 NaN NaN NaN NaN 1 1 NaN NaN NaN 20100 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1948 1931 1703 150 6 12 0 0 60 0 60 17 17 0 0 0 0 0 0 0 0 123.0303 128.4848 138.7879 76.36364 4.848485 9.696970 6.666667 6.666667 26.66667 6.666667 26.66667 12.727273 12.727273 6.666667 6.666667 6.666667 6.666667 6.666667 6.666667 6.666667 6.666667 1243 184 459 258 166 176 711 118 262 156 84 91 468 44 193 76 82 73 31 5 0 26 0 0 33 17 4 0 0 12 81.81818 44.24242 80.60606 40.60606 36.96970 42.42424 70.90909 30.90909 62.42424 36.36364 32.72727 26.66667 67.87879 24.84848 51.51515 17.57576 23.63636 24.24242 18.787879 5.454546 6.666667 17.575758 6.666667 6.666667 18.18182 12.72727 4.242424 6.666667 6.666667 11.515151 1948 26 132 81 101 125 16 1467 123.0303 18.78788 60.60606 40.60606 58.18182 60.00000 10.90909 127.2727 61838 7212.121 1554 997 997 943 54 0 557 92.12121 85.45455 85.45455 83.63636 18.78788 6.666667 67.87879
1001020200 14000US01001020200 Census Tract 202, Autauga County, Alabama Census Tract 202, Autauga County, Alabama al 140 0 ACSSF 1761 NaN NaN NaN NaN 1 1 NaN NaN NaN 20200 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 2156 2139 872 1149 0 50 0 0 68 0 68 17 14 0 0 0 0 3 0 0 0 162.4242 162.4242 125.4545 151.51515 6.666667 36.969697 6.666667 6.666667 37.57576 6.666667 37.57576 15.151515 13.939394 6.666667 6.666667 6.666667 6.666667 4.242424 6.666667 6.666667 6.666667 1397 356 496 342 133 70 1102 295 391 275 94 47 243 43 86 58 33 23 9 0 0 9 0 0 43 18 19 0 6 0 101.21212 69.69697 72.12121 47.27273 29.09091 21.21212 95.15152 68.48485 69.09091 46.06061 24.24242 16.96970 33.93939 22.42424 26.06061 20.00000 13.93939 9.69697 9.090909 6.666667 6.666667 9.090909 6.666667 6.666667 28.48485 18.18182 20.000000 6.666667 4.848485 6.666667 1983 185 320 232 58 34 25 1129 155.1515 110.90909 74.54545 88.48485 25.45455 18.18182 16.96970 144.8485 32303 8204.848 1731 884 869 753 116 15 847 143.03030 115.15152 114.54545 107.27273 38.18182 14.545454 86.66667
1001020300 14000US01001020300 Census Tract 203, Autauga County, Alabama Census Tract 203, Autauga County, Alabama al 140 0 ACSSF 1762 NaN NaN NaN NaN 1 1 NaN NaN NaN 20300 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 2968 2968 2212 551 15 41 8 0 141 0 141 0 0 0 0 0 0 0 0 0 0 244.8485 244.8485 225.4545 115.15152 13.333333 37.575758 8.484849 6.666667 81.81818 6.666667 81.81818 6.666667 6.666667 6.666667 6.666667 6.666667 6.666667 6.666667 6.666667 6.666667 6.666667 2074 221 747 674 240 192 1330 119 548 414 135 114 683 87 184 252 82 78 31 0 0 8 23 0 30 15 15 0 0 0 154.54545 53.33333 97.57576 101.21212 49.09091 46.06061 123.63636 35.15152 76.36364 92.72727 34.54545 36.36364 112.12121 34.54545 59.39394 64.84848 33.93939 28.48485 20.000000 6.666667 6.666667 8.484849 17.575758 6.666667 26.66667 16.96970 13.939394 6.666667 6.666667 6.666667 2968 164 213 148 207 82 520 1634 244.8485 138.18182 70.30303 60.60606 78.18182 39.39394 189.09091 175.1515 44922 3411.515 2462 1472 1464 1373 91 8 990 169.09091 132.12121 134.54545 123.03030 31.51515 8.484849 120.60606
1001020400 14000US01001020400 Census Tract 204, Autauga County, Alabama Census Tract 204, Autauga County, Alabama al 140 0 ACSSF 1763 NaN NaN NaN NaN 1 1 NaN NaN NaN 20400 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 4423 3959 3662 162 69 0 0 48 18 5 13 464 30 42 0 0 0 372 20 20 0 298.7879 213.9394 207.8788 80.60606 49.090909 6.666667 6.666667 49.696970 10.30303 4.848485 9.69697 264.848485 20.000000 30.303030 6.666667 6.666667 6.666667 276.363636 17.575758 17.575758 6.666667 2899 339 1044 806 453 257 1623 154 605 458 305 101 1107 106 410 301 148 142 36 0 6 30 0 0 133 79 23 17 0 14 156.36364 78.78788 117.57576 97.57576 76.36364 51.51515 129.69697 43.03030 97.57576 72.72727 67.87879 27.27273 107.27273 36.96970 89.09091 56.96970 38.78788 37.57576 17.575758 6.666667 6.060606 16.363636 6.666667 6.666667 58.18182 60.00000 17.575758 12.121212 6.666667 12.727273 4423 18 74 141 182 583 201 3224 298.7879 17.57576 41.81818 53.33333 58.18182 188.48485 140.00000 331.5152 54329 4244.242 3424 2013 1998 1782 216 15 1411 197.57576 157.57576 161.81818 132.12121 58.78788 14.545454 127.87879
1001020500 14000US01001020500 Census Tract 205, Autauga County, Alabama Census Tract 205, Autauga County, Alabama al 140 0 ACSSF 1764 NaN NaN NaN NaN 1 1 NaN NaN NaN 20500 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 10763 10683 7368 2674 0 412 0 0 229 49 180 80 80 0 0 0 0 0 0 0 0 378.1818 373.3333 482.4242 449.69697 10.909091 146.666667 10.909091 10.909091 100.60606 44.848485 90.30303 43.030303 43.030303 10.909091 10.909091 10.909091 10.909091 10.909091 10.909091 10.909091 10.909091 6974 310 1674 1999 1829 1162 3243 127 967 832 865 452 3432 159 573 1129 884 687 127 0 14 38 52 23 172 24 120 0 28 0 265.45455 102.42424 223.63636 263.03030 255.15152 216.96970 290.30303 55.75758 192.72727 198.78788 183.63636 127.27273 283.63636 82.42424 120.60606 210.30303 152.12121 157.57576 58.787879 10.909091 14.545454 38.181818 26.666667 23.636364 93.93939 27.27273 69.696970 10.909091 29.090909 10.909091 10563 251 952 256 1064 289 89 7662 369.6970 94.54545 521.21212 113.33333 385.45455 162.42424 52.12121 641.8182 51965 4203.030 8198 5461 5258 5037 221 203 2737 321.81818 339.39394 356.96970 369.09091 89.09091 103.030303 273.93939

We thought our unemployment dataset had a lot of columns, but this has even more! If we’re curious, we can take a look:

census.shape
## (74001, 189)

Seeing as how that’s going to be too many to reasonably drop, and we’re doing a lot of percentage calculations for our new dataset (% unemployed, % just above poverty line, % white, etc), we’re just going to build a brand-new dataframe all at once. It might look a little cleaner than the alternatives.

By looking at our data dictionary R12221550.txt, we’re able to figure out what each of the codes mean, which ones we’re interested in, and what the proper calculations are to calculate percentages.

census_reg = pd.DataFrame({
    'Geo_FIPS': census.Geo_FIPS,
    # (1.00 to 1.24 + 1.25 to 1.49) / total
    'ritp_100_149_pct': (census.ACS15_5yr_C17002004 + 
                       census.ACS15_5yr_C17002005) / census.ACS15_5yr_C17002001 * 100,
    # Black Alone, Non-Hispanic
    'black_pct': census.ACS15_5yr_B03002004 / census.ACS15_5yr_B03002001 * 100, 
    # White Alone, Non-Hispanic
    'white_pct': census.ACS15_5yr_B03002003 / census.ACS15_5yr_B03002001 * 100,
    # All Hispanic
    'hisp_pct': census.ACS15_5yr_B03002012 / census.ACS15_5yr_B03002001 * 100,
    # Population 16 Years and Over: in Labor Force: Civilian Labor Force: Unemployed
    'unemployed_pct': census.ACS15_5yr_B23025005 / census.ACS15_5yr_B23025001 * 100,
    # Educational attainment less than high school
    'ea_less_than_hs_pct': census.ACS15_5yr_B06009002 / census.ACS15_5yr_B06009001 * 100,
    # Median income in 10,000s
    'median_income_10k': census.ACS15_5yr_B19013001 / 10000
})

census_reg.head()
Geo_FIPS ritp_100_149_pct black_pct white_pct hisp_pct unemployed_pct ea_less_than_hs_pct median_income_10k
1001020100 9.342916 7.700205 87.42300 0.8726899 3.474903 14.802896 6.1838
1001020200 14.624307 53.293135 40.44527 0.7884972 6.701329 25.483178 3.2303
1001020300 11.960916 18.564690 74.52830 0.0000000 3.696182 10.655738 4.4922
1001020400 7.302736 3.662672 82.79448 10.4906172 6.308411 11.693687 5.4329
1001020500 12.496450 24.844374 68.45675 0.7432872 2.695779 4.445082 5.1965

TALK MORE ABOUT THE RITP AND FEATURE SELECTION, MAYBE ITS A SUBSECTION

The black, white, and Hispanic table and calculations are especially interesting from a “selecting your data” perspective. I recommend reading the data section of the Milwaukee Journal Sentinel’s pothole-filling analysis to learn a bit more about that.

Now that we have a nice dataframe, we can merge our data the exact same way we did before.

df = life_expect.merge(census_reg, left_on='Tract ID', right_on='Geo_FIPS')
df.head()
Tract ID life_expectancy Geo_FIPS ritp_100_149_pct black_pct white_pct hisp_pct unemployed_pct ea_less_than_hs_pct median_income_10k
0 1001020100 73.1 1001020100 9.342916 7.700205 87.42300 0.8726899 3.474903 14.802896 6.1838
1 1001020200 76.9 1001020200 14.624307 53.293135 40.44527 0.7884972 6.701329 25.483178 3.2303
2 1001020400 75.4 1001020400 7.302736 3.662672 82.79448 10.4906172 6.308411 11.693687 5.4329
3 1001020500 79.4 1001020500 12.496450 24.844374 68.45675 0.7432872 2.695779 4.445082 5.1965
4 1001020600 73.1 1001020600 10.854324 11.918982 72.91613 13.0615425 6.654991 17.487267 6.3092

Looks good, and all set for analysis