5 Improving our analysis
While our analysis is technically correct, there’s a real-life problem: unemployment rate isn’t the only thing that might affect life expectancy. There are plenty of other signals, and the AP went ahead and dug out a few possibilities.
- Unemployment
- Percent of people just above the poverty line
- Racial breakdown of a census tract
- Median income
- Educational attainment
They grabbed the appropriate tables from the Census (see the data section of this chapter) and made an analysis that took in multiple factors in change of life expectancy across different census tracts, not just unemployment. This is called multivariate linear regression.
Let’s reproduce their model with our own census data.
5.0.1 Reading in and processing our data
Thankfully Social Explorer combined all of those different tables into one csv file for us, which makes reading in our data super convenient. Although the file is from Social Explorer, though, so we again need to make sure we specify Latin-1 encoding when opening the file.
Geo_FIPS | Geo_GEOID | Geo_NAME | Geo_QName | Geo_STUSAB | Geo_SUMLEV | Geo_GEOCOMP | Geo_FILEID | Geo_LOGRECNO | Geo_US | Geo_REGION | Geo_DIVISION | Geo_STATECE | Geo_STATE | Geo_COUNTY | Geo_COUSUB | Geo_PLACE | Geo_PLACESE | Geo_TRACT | Geo_BLKGRP | Geo_CONCIT | Geo_AIANHH | Geo_AIANHHFP | Geo_AIHHTLI | Geo_AITSCE | Geo_AITS | Geo_ANRC | Geo_CBSA | Geo_CSA | Geo_METDIV | Geo_MACC | Geo_MEMI | Geo_NECTA | Geo_CNECTA | Geo_NECTADIV | Geo_UA | Geo_UACP | Geo_CDCURR | Geo_SLDU | Geo_SLDL | Geo_VTD | Geo_ZCTA3 | Geo_ZCTA5 | Geo_SUBMCD | Geo_SDELM | Geo_SDSEC | Geo_SDUNI | Geo_UR | Geo_PCI | Geo_TAZ | Geo_UGA | Geo_BTTR | Geo_BTBG | Geo_PUMA5 | Geo_PUMA1 | ACS15_5yr_B03002001 | ACS15_5yr_B03002002 | ACS15_5yr_B03002003 | ACS15_5yr_B03002004 | ACS15_5yr_B03002005 | ACS15_5yr_B03002006 | ACS15_5yr_B03002007 | ACS15_5yr_B03002008 | ACS15_5yr_B03002009 | ACS15_5yr_B03002010 | ACS15_5yr_B03002011 | ACS15_5yr_B03002012 | ACS15_5yr_B03002013 | ACS15_5yr_B03002014 | ACS15_5yr_B03002015 | ACS15_5yr_B03002016 | ACS15_5yr_B03002017 | ACS15_5yr_B03002018 | ACS15_5yr_B03002019 | ACS15_5yr_B03002020 | ACS15_5yr_B03002021 | ACS15_5yr_B03002001s | ACS15_5yr_B03002002s | ACS15_5yr_B03002003s | ACS15_5yr_B03002004s | ACS15_5yr_B03002005s | ACS15_5yr_B03002006s | ACS15_5yr_B03002007s | ACS15_5yr_B03002008s | ACS15_5yr_B03002009s | ACS15_5yr_B03002010s | ACS15_5yr_B03002011s | ACS15_5yr_B03002012s | ACS15_5yr_B03002013s | ACS15_5yr_B03002014s | ACS15_5yr_B03002015s | ACS15_5yr_B03002016s | ACS15_5yr_B03002017s | ACS15_5yr_B03002018s | ACS15_5yr_B03002019s | ACS15_5yr_B03002020s | ACS15_5yr_B03002021s | ACS15_5yr_B06009001 | ACS15_5yr_B06009002 | ACS15_5yr_B06009003 | ACS15_5yr_B06009004 | ACS15_5yr_B06009005 | ACS15_5yr_B06009006 | ACS15_5yr_B06009007 | ACS15_5yr_B06009008 | ACS15_5yr_B06009009 | ACS15_5yr_B06009010 | ACS15_5yr_B06009011 | ACS15_5yr_B06009012 | ACS15_5yr_B06009013 | ACS15_5yr_B06009014 | ACS15_5yr_B06009015 | ACS15_5yr_B06009016 | ACS15_5yr_B06009017 | ACS15_5yr_B06009018 | ACS15_5yr_B06009019 | ACS15_5yr_B06009020 | ACS15_5yr_B06009021 | ACS15_5yr_B06009022 | ACS15_5yr_B06009023 | ACS15_5yr_B06009024 | ACS15_5yr_B06009025 | ACS15_5yr_B06009026 | ACS15_5yr_B06009027 | ACS15_5yr_B06009028 | ACS15_5yr_B06009029 | ACS15_5yr_B06009030 | ACS15_5yr_B06009001s | ACS15_5yr_B06009002s | ACS15_5yr_B06009003s | ACS15_5yr_B06009004s | ACS15_5yr_B06009005s | ACS15_5yr_B06009006s | ACS15_5yr_B06009007s | ACS15_5yr_B06009008s | ACS15_5yr_B06009009s | ACS15_5yr_B06009010s | ACS15_5yr_B06009011s | ACS15_5yr_B06009012s | ACS15_5yr_B06009013s | ACS15_5yr_B06009014s | ACS15_5yr_B06009015s | ACS15_5yr_B06009016s | ACS15_5yr_B06009017s | ACS15_5yr_B06009018s | ACS15_5yr_B06009019s | ACS15_5yr_B06009020s | ACS15_5yr_B06009021s | ACS15_5yr_B06009022s | ACS15_5yr_B06009023s | ACS15_5yr_B06009024s | ACS15_5yr_B06009025s | ACS15_5yr_B06009026s | ACS15_5yr_B06009027s | ACS15_5yr_B06009028s | ACS15_5yr_B06009029s | ACS15_5yr_B06009030s | ACS15_5yr_C17002001 | ACS15_5yr_C17002002 | ACS15_5yr_C17002003 | ACS15_5yr_C17002004 | ACS15_5yr_C17002005 | ACS15_5yr_C17002006 | ACS15_5yr_C17002007 | ACS15_5yr_C17002008 | ACS15_5yr_C17002001s | ACS15_5yr_C17002002s | ACS15_5yr_C17002003s | ACS15_5yr_C17002004s | ACS15_5yr_C17002005s | ACS15_5yr_C17002006s | ACS15_5yr_C17002007s | ACS15_5yr_C17002008s | ACS15_5yr_B19013001 | ACS15_5yr_B19013001s | ACS15_5yr_B23025001 | ACS15_5yr_B23025002 | ACS15_5yr_B23025003 | ACS15_5yr_B23025004 | ACS15_5yr_B23025005 | ACS15_5yr_B23025006 | ACS15_5yr_B23025007 | ACS15_5yr_B23025001s | ACS15_5yr_B23025002s | ACS15_5yr_B23025003s | ACS15_5yr_B23025004s | ACS15_5yr_B23025005s | ACS15_5yr_B23025006s | ACS15_5yr_B23025007s |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1001020100 | 14000US01001020100 | Census Tract 201, Autauga County, Alabama | Census Tract 201, Autauga County, Alabama | al | 140 | 0 | ACSSF | 1760 | NaN | NaN | NaN | NaN | 1 | 1 | NaN | NaN | NaN | 20100 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1948 | 1931 | 1703 | 150 | 6 | 12 | 0 | 0 | 60 | 0 | 60 | 17 | 17 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 123.0303 | 128.4848 | 138.7879 | 76.36364 | 4.848485 | 9.696970 | 6.666667 | 6.666667 | 26.66667 | 6.666667 | 26.66667 | 12.727273 | 12.727273 | 6.666667 | 6.666667 | 6.666667 | 6.666667 | 6.666667 | 6.666667 | 6.666667 | 6.666667 | 1243 | 184 | 459 | 258 | 166 | 176 | 711 | 118 | 262 | 156 | 84 | 91 | 468 | 44 | 193 | 76 | 82 | 73 | 31 | 5 | 0 | 26 | 0 | 0 | 33 | 17 | 4 | 0 | 0 | 12 | 81.81818 | 44.24242 | 80.60606 | 40.60606 | 36.96970 | 42.42424 | 70.90909 | 30.90909 | 62.42424 | 36.36364 | 32.72727 | 26.66667 | 67.87879 | 24.84848 | 51.51515 | 17.57576 | 23.63636 | 24.24242 | 18.787879 | 5.454546 | 6.666667 | 17.575758 | 6.666667 | 6.666667 | 18.18182 | 12.72727 | 4.242424 | 6.666667 | 6.666667 | 11.515151 | 1948 | 26 | 132 | 81 | 101 | 125 | 16 | 1467 | 123.0303 | 18.78788 | 60.60606 | 40.60606 | 58.18182 | 60.00000 | 10.90909 | 127.2727 | 61838 | 7212.121 | 1554 | 997 | 997 | 943 | 54 | 0 | 557 | 92.12121 | 85.45455 | 85.45455 | 83.63636 | 18.78788 | 6.666667 | 67.87879 |
1001020200 | 14000US01001020200 | Census Tract 202, Autauga County, Alabama | Census Tract 202, Autauga County, Alabama | al | 140 | 0 | ACSSF | 1761 | NaN | NaN | NaN | NaN | 1 | 1 | NaN | NaN | NaN | 20200 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2156 | 2139 | 872 | 1149 | 0 | 50 | 0 | 0 | 68 | 0 | 68 | 17 | 14 | 0 | 0 | 0 | 0 | 3 | 0 | 0 | 0 | 162.4242 | 162.4242 | 125.4545 | 151.51515 | 6.666667 | 36.969697 | 6.666667 | 6.666667 | 37.57576 | 6.666667 | 37.57576 | 15.151515 | 13.939394 | 6.666667 | 6.666667 | 6.666667 | 6.666667 | 4.242424 | 6.666667 | 6.666667 | 6.666667 | 1397 | 356 | 496 | 342 | 133 | 70 | 1102 | 295 | 391 | 275 | 94 | 47 | 243 | 43 | 86 | 58 | 33 | 23 | 9 | 0 | 0 | 9 | 0 | 0 | 43 | 18 | 19 | 0 | 6 | 0 | 101.21212 | 69.69697 | 72.12121 | 47.27273 | 29.09091 | 21.21212 | 95.15152 | 68.48485 | 69.09091 | 46.06061 | 24.24242 | 16.96970 | 33.93939 | 22.42424 | 26.06061 | 20.00000 | 13.93939 | 9.69697 | 9.090909 | 6.666667 | 6.666667 | 9.090909 | 6.666667 | 6.666667 | 28.48485 | 18.18182 | 20.000000 | 6.666667 | 4.848485 | 6.666667 | 1983 | 185 | 320 | 232 | 58 | 34 | 25 | 1129 | 155.1515 | 110.90909 | 74.54545 | 88.48485 | 25.45455 | 18.18182 | 16.96970 | 144.8485 | 32303 | 8204.848 | 1731 | 884 | 869 | 753 | 116 | 15 | 847 | 143.03030 | 115.15152 | 114.54545 | 107.27273 | 38.18182 | 14.545454 | 86.66667 |
1001020300 | 14000US01001020300 | Census Tract 203, Autauga County, Alabama | Census Tract 203, Autauga County, Alabama | al | 140 | 0 | ACSSF | 1762 | NaN | NaN | NaN | NaN | 1 | 1 | NaN | NaN | NaN | 20300 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2968 | 2968 | 2212 | 551 | 15 | 41 | 8 | 0 | 141 | 0 | 141 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 244.8485 | 244.8485 | 225.4545 | 115.15152 | 13.333333 | 37.575758 | 8.484849 | 6.666667 | 81.81818 | 6.666667 | 81.81818 | 6.666667 | 6.666667 | 6.666667 | 6.666667 | 6.666667 | 6.666667 | 6.666667 | 6.666667 | 6.666667 | 6.666667 | 2074 | 221 | 747 | 674 | 240 | 192 | 1330 | 119 | 548 | 414 | 135 | 114 | 683 | 87 | 184 | 252 | 82 | 78 | 31 | 0 | 0 | 8 | 23 | 0 | 30 | 15 | 15 | 0 | 0 | 0 | 154.54545 | 53.33333 | 97.57576 | 101.21212 | 49.09091 | 46.06061 | 123.63636 | 35.15152 | 76.36364 | 92.72727 | 34.54545 | 36.36364 | 112.12121 | 34.54545 | 59.39394 | 64.84848 | 33.93939 | 28.48485 | 20.000000 | 6.666667 | 6.666667 | 8.484849 | 17.575758 | 6.666667 | 26.66667 | 16.96970 | 13.939394 | 6.666667 | 6.666667 | 6.666667 | 2968 | 164 | 213 | 148 | 207 | 82 | 520 | 1634 | 244.8485 | 138.18182 | 70.30303 | 60.60606 | 78.18182 | 39.39394 | 189.09091 | 175.1515 | 44922 | 3411.515 | 2462 | 1472 | 1464 | 1373 | 91 | 8 | 990 | 169.09091 | 132.12121 | 134.54545 | 123.03030 | 31.51515 | 8.484849 | 120.60606 |
1001020400 | 14000US01001020400 | Census Tract 204, Autauga County, Alabama | Census Tract 204, Autauga County, Alabama | al | 140 | 0 | ACSSF | 1763 | NaN | NaN | NaN | NaN | 1 | 1 | NaN | NaN | NaN | 20400 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 4423 | 3959 | 3662 | 162 | 69 | 0 | 0 | 48 | 18 | 5 | 13 | 464 | 30 | 42 | 0 | 0 | 0 | 372 | 20 | 20 | 0 | 298.7879 | 213.9394 | 207.8788 | 80.60606 | 49.090909 | 6.666667 | 6.666667 | 49.696970 | 10.30303 | 4.848485 | 9.69697 | 264.848485 | 20.000000 | 30.303030 | 6.666667 | 6.666667 | 6.666667 | 276.363636 | 17.575758 | 17.575758 | 6.666667 | 2899 | 339 | 1044 | 806 | 453 | 257 | 1623 | 154 | 605 | 458 | 305 | 101 | 1107 | 106 | 410 | 301 | 148 | 142 | 36 | 0 | 6 | 30 | 0 | 0 | 133 | 79 | 23 | 17 | 0 | 14 | 156.36364 | 78.78788 | 117.57576 | 97.57576 | 76.36364 | 51.51515 | 129.69697 | 43.03030 | 97.57576 | 72.72727 | 67.87879 | 27.27273 | 107.27273 | 36.96970 | 89.09091 | 56.96970 | 38.78788 | 37.57576 | 17.575758 | 6.666667 | 6.060606 | 16.363636 | 6.666667 | 6.666667 | 58.18182 | 60.00000 | 17.575758 | 12.121212 | 6.666667 | 12.727273 | 4423 | 18 | 74 | 141 | 182 | 583 | 201 | 3224 | 298.7879 | 17.57576 | 41.81818 | 53.33333 | 58.18182 | 188.48485 | 140.00000 | 331.5152 | 54329 | 4244.242 | 3424 | 2013 | 1998 | 1782 | 216 | 15 | 1411 | 197.57576 | 157.57576 | 161.81818 | 132.12121 | 58.78788 | 14.545454 | 127.87879 |
1001020500 | 14000US01001020500 | Census Tract 205, Autauga County, Alabama | Census Tract 205, Autauga County, Alabama | al | 140 | 0 | ACSSF | 1764 | NaN | NaN | NaN | NaN | 1 | 1 | NaN | NaN | NaN | 20500 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 10763 | 10683 | 7368 | 2674 | 0 | 412 | 0 | 0 | 229 | 49 | 180 | 80 | 80 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 378.1818 | 373.3333 | 482.4242 | 449.69697 | 10.909091 | 146.666667 | 10.909091 | 10.909091 | 100.60606 | 44.848485 | 90.30303 | 43.030303 | 43.030303 | 10.909091 | 10.909091 | 10.909091 | 10.909091 | 10.909091 | 10.909091 | 10.909091 | 10.909091 | 6974 | 310 | 1674 | 1999 | 1829 | 1162 | 3243 | 127 | 967 | 832 | 865 | 452 | 3432 | 159 | 573 | 1129 | 884 | 687 | 127 | 0 | 14 | 38 | 52 | 23 | 172 | 24 | 120 | 0 | 28 | 0 | 265.45455 | 102.42424 | 223.63636 | 263.03030 | 255.15152 | 216.96970 | 290.30303 | 55.75758 | 192.72727 | 198.78788 | 183.63636 | 127.27273 | 283.63636 | 82.42424 | 120.60606 | 210.30303 | 152.12121 | 157.57576 | 58.787879 | 10.909091 | 14.545454 | 38.181818 | 26.666667 | 23.636364 | 93.93939 | 27.27273 | 69.696970 | 10.909091 | 29.090909 | 10.909091 | 10563 | 251 | 952 | 256 | 1064 | 289 | 89 | 7662 | 369.6970 | 94.54545 | 521.21212 | 113.33333 | 385.45455 | 162.42424 | 52.12121 | 641.8182 | 51965 | 4203.030 | 8198 | 5461 | 5258 | 5037 | 221 | 203 | 2737 | 321.81818 | 339.39394 | 356.96970 | 369.09091 | 89.09091 | 103.030303 | 273.93939 |
We thought our unemployment dataset had a lot of columns, but this has even more! If we’re curious, we can take a look:
## (74001, 189)
Seeing as how that’s going to be too many to reasonably drop, and we’re doing a lot of percentage calculations for our new dataset (% unemployed, % just above poverty line, % white, etc), we’re just going to build a brand-new dataframe all at once. It might look a little cleaner than the alternatives.
By looking at our data dictionary R12221550.txt
, we’re able to figure out what each of the codes mean, which ones we’re interested in, and what the proper calculations are to calculate percentages.
census_reg = pd.DataFrame({
'Geo_FIPS': census.Geo_FIPS,
# (1.00 to 1.24 + 1.25 to 1.49) / total
'ritp_100_149_pct': (census.ACS15_5yr_C17002004 +
census.ACS15_5yr_C17002005) / census.ACS15_5yr_C17002001 * 100,
# Black Alone, Non-Hispanic
'black_pct': census.ACS15_5yr_B03002004 / census.ACS15_5yr_B03002001 * 100,
# White Alone, Non-Hispanic
'white_pct': census.ACS15_5yr_B03002003 / census.ACS15_5yr_B03002001 * 100,
# All Hispanic
'hisp_pct': census.ACS15_5yr_B03002012 / census.ACS15_5yr_B03002001 * 100,
# Population 16 Years and Over: in Labor Force: Civilian Labor Force: Unemployed
'unemployed_pct': census.ACS15_5yr_B23025005 / census.ACS15_5yr_B23025001 * 100,
# Educational attainment less than high school
'ea_less_than_hs_pct': census.ACS15_5yr_B06009002 / census.ACS15_5yr_B06009001 * 100,
# Median income in 10,000s
'median_income_10k': census.ACS15_5yr_B19013001 / 10000
})
census_reg.head()
Geo_FIPS | ritp_100_149_pct | black_pct | white_pct | hisp_pct | unemployed_pct | ea_less_than_hs_pct | median_income_10k |
---|---|---|---|---|---|---|---|
1001020100 | 9.342916 | 7.700205 | 87.42300 | 0.8726899 | 3.474903 | 14.802896 | 6.1838 |
1001020200 | 14.624307 | 53.293135 | 40.44527 | 0.7884972 | 6.701329 | 25.483178 | 3.2303 |
1001020300 | 11.960916 | 18.564690 | 74.52830 | 0.0000000 | 3.696182 | 10.655738 | 4.4922 |
1001020400 | 7.302736 | 3.662672 | 82.79448 | 10.4906172 | 6.308411 | 11.693687 | 5.4329 |
1001020500 | 12.496450 | 24.844374 | 68.45675 | 0.7432872 | 2.695779 | 4.445082 | 5.1965 |
TALK MORE ABOUT THE RITP AND FEATURE SELECTION, MAYBE ITS A SUBSECTION
The black, white, and Hispanic table and calculations are especially interesting from a “selecting your data” perspective. I recommend reading the data section of the Milwaukee Journal Sentinel’s pothole-filling analysis to learn a bit more about that.
Now that we have a nice dataframe, we can merge our data the exact same way we did before.
Tract ID | life_expectancy | Geo_FIPS | ritp_100_149_pct | black_pct | white_pct | hisp_pct | unemployed_pct | ea_less_than_hs_pct | median_income_10k | |
---|---|---|---|---|---|---|---|---|---|---|
0 | 1001020100 | 73.1 | 1001020100 | 9.342916 | 7.700205 | 87.42300 | 0.8726899 | 3.474903 | 14.802896 | 6.1838 |
1 | 1001020200 | 76.9 | 1001020200 | 14.624307 | 53.293135 | 40.44527 | 0.7884972 | 6.701329 | 25.483178 | 3.2303 |
2 | 1001020400 | 75.4 | 1001020400 | 7.302736 | 3.662672 | 82.79448 | 10.4906172 | 6.308411 | 11.693687 | 5.4329 |
3 | 1001020500 | 79.4 | 1001020500 | 12.496450 | 24.844374 | 68.45675 | 0.7432872 | 2.695779 | 4.445082 | 5.1965 |
4 | 1001020600 | 73.1 | 1001020600 | 10.854324 | 11.918982 | 72.91613 | 13.0615425 | 6.654991 | 17.487267 | 6.3092 |
Looks good, and all set for analysis