The Milwaukee Journal Sentinel and pothole fill times (No merging)#
Author: Keegan Kyle, Grant Smith and Ben Poston, Milwaukee Journal Sentinel
Topics: Census Data, Geocoding, QGIS Spatial Joins, Linear Regression
- potholes-cleaned-merged.csv: a series of merged datasets (minus the income dataset). The datasets include:
- 2007-2010 POTHOLES.xls: Pothole data, July 2007-July 2010 from the Milwaukee DPW
- 2010-2013 POTHOLES.xls: Pothole data, July 2010-July 2013 from the Milwaukee DPW
- 2013-2017 POTHOLES.xls: Pothole data, July 2013-July 2017 from the Milwaukee DPW
- tl_2013_55_tract.zip: 2013 census tract boundaries from the US Census Bureau
- addresses_geocoded.csv: a large selection of addresses in Milwaukee, geocoded by Geocod.io
- R12216099_SL140.csv: ACS 2013 5-year, tract level, from Social Explorer
- (Table A04001, Hispanic or Latino by Race,
R12216099.txtis the data dictionary
- (Table A04001, Hispanic or Latino by Race,
- R12216226_SL140.csv ACS 2013 5-year, tract level, from Social Explorer
- Table A14006, 2013 Median Household income
- Data dictionary is here
What's the story?#
We're trying to figure out if the time it took Milwaukee to fill pot holes is related to the racial makeup of a census tract.
Read in your data#
We're just reading in
potholes-cleaned-merged.csv for now. It's a lot lot lot of other files, somewhat cleaned and all merged together.
- Tip: Both
Geo_FIPSare census tract identifiers. You'll want to read them in as strings so they don't lose leading zeroes
What is the maximum and minimum
Use this to confirm that your date range is what you expected. If it isn't, take a look at what might have happened with your dataset.
- Tip: Missing data might be a headache
Calculate how long it took to fill potholes in 2013#
Save it into a new column.
- Tip: It's possible to subtract two dates
Hrm, well, I think we need that difference to be an integer#
If your new column isn't an integer, create another column that is.
- Tip: Just like you might use
.str.strip()on a string column, if your column is a datetime you can use
.dt.componentsto get the days, hours, minutes, seconds, etc of the column.
Cleaning up your census data#
SE_ columns are all from the census data, you can find out what they mean by reading the data dictionary
Add new columns to create:
pct_whiteThe percent of the population that is White
pct_blackThe percent of the population that is Black
pct_hispanicThe percent of the population that is Hispanic
pct_minorityThe percent of the population that is a minority (non-White)
The column names don't match exactly, but you can figure it out.
Feel free to drop the census if you're not interested in them any more.
statsmodels package, run a linear regression to find the coefficient relating percent minority and pothole fill times.
- Tip: Be sure to remove missing data with
.dropna()first. How many rows get removed?
- Tip: Don't forget to use
sm.add_constant. Why do we use it?
Translate that into the form "every X percentage point change in the minority population translates to a Y change in pot hole fill times"
Do you feel comfortable that someone can understand that? Can you reword it to make it more easily understandable?
Other methods of explanation#
While the regression is technically correct, it just doesn't sound very nice. What other options do we have?
What's the average wait to fill a pothole between majority-white and majority-minority census tracts?#
You'll need to create a new column to specify whether the census tract is majority White or not.
How does the average wait time to fill a pothole change as more minorities live in an area?#
- Tip: Use
.cutto split the percent minority (or white) into a few different bins.
R12216226_SL140.csv contains income data for each census tract in Wisconsin. Add it into your analysis.
If you run a multivariate regression also including income, how does this change things?
- Tip: Be sure to read in
Geo_FIPSas a string so leading zeroes don't get removed
- Tip: You can use this data dictionary to understand what column you're interested in.
The income coefficient is very unfriendly!#
Try to explain what it means in normal words. Or... don't, and just skip to the next question.
Explain that in normal human-being words#
Controlling for minority population, for an X change in income, there is a Y change in how long it takes to get potholes filled.
This seems unexpected, maybe?#
Not like we were hoping for the race this, but this seems like not what we were going to get. That means there's either a story or we're forgetting something obvious. What might be causing this trend? How could we investigate it?