2.7 Alternative techniques
We have a problem, though: “for every 25 percentage point increase in minorities, it’s an additional 1 day of wait time” just isn’t very understandable. It doesn’t roll off the tongue, it doesn’t make sense very easily, and it’s going to be lost on a lot of your readers.
Even though linear regression is a nice, advanced method, that doesn’t mean it’s always the right one. Let’s try something easier: split the data into majority-white and majority-minority groups, then compare the median wait for each.
df['majority_white'] = (df.pct_minority < 50).astype(int)
df.groupby('majority_white').wait_days.median()
## majority_white
## 0 4.208333
## 1 2.750000
## Name: wait_days, dtype: float64
2.7.1 Binning
Majority white vs. majority minority is easy to understand, but we can break it down into a few more categories. It isn’t as simple as splitting into two groups, but it’s a little more nuanced while still being understandable. This is called binning.
In the example below, we’ll cut pct_minority into brackets of 20 percentage points:
- 0-20% minority
- 20-40% minority
- 40-60% minority
- 60-80% minority
- and 80-100% minority
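The brackets above can be built with pandas' `pd.cut`. The snippet below is a minimal sketch rather than the exact code behind the output that follows: the small DataFrame is made up for illustration, and only the column names `pct_minority`, `wait_days` and `bin` come from this section.

```python
import pandas as pd

# Hypothetical stand-in for the real data: a few made-up rows
df = pd.DataFrame({
    'pct_minority': [97.6, 91.2, 59.7, 12.0, 35.5],
    'wait_days': [1.25, 8.83, 9.75, 2.0, 3.0],
})

# range() stops before its end value, so 101 is needed to get an edge at 100
edges = list(range(0, 101, 20))
print(edges)

# pd.cut places each value in a right-closed interval, such as (80, 100]
df['bin'] = pd.cut(df.pct_minority, bins=edges)
print(df.head())
```

Because the intervals are open on the left, a value of exactly 0 would not land in any bin. If that matters for your data, passing `include_lowest=True` to `pd.cut` makes the first interval include its lower edge.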
## address GEOID Geo_FIPS pct_white pct_minority \
## 0 3839 N 10TH ST 5.507900e+10 5.507900e+10 2.405063 97.594937
## 1 4900 W MELVINA ST 5.507900e+10 5.507900e+10 8.824796 91.175204
## 2 2400 W WISCONSIN AV 5.507901e+10 5.507901e+10 40.313725 59.686275
## 3 1800 W HAMPTON AV 5.507900e+10 5.507900e+10 4.389407 95.610593
## 4 4718 N 19TH ST 5.507900e+10 5.507900e+10 4.389407 95.610593
##
## wait_days majority_white bin
## 0 1.250000 0 (80, 100]
## 1 8.833333 0 (80, 100]
## 2 9.750000 0 (40, 60]
## 3 2.416667 0 (80, 100]
## 4 17.416667 0 (80, 100]
It seems like we’d use range(0, 100, 20), but nope! range stops before its end value, so always go one past it (range(0, 101, 20)) to make sure your range includes the final number. Now we can group by the bin and see how a gradual increase in the minority share affects the wait days.
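As a rough sketch of that grouping step, the snippet below bins a small made-up table and takes the median wait per bin. The numbers are invented for illustration and are not the real dataset. The call mirrors the `groupby(...).wait_days.median()` pattern used earlier for `majority_white`.

```python
import pandas as pd

# Hypothetical data, two rows per 20-point bracket
df = pd.DataFrame({
    'pct_minority': [5, 15, 30, 35, 45, 55, 70, 75, 90, 95],
    'wait_days': [2.0, 3.0, 2.5, 3.5, 3.0, 4.0, 4.0, 5.0, 4.0, 5.0],
})
df['bin'] = pd.cut(df.pct_minority, bins=list(range(0, 101, 20)))

# observed=False keeps every bracket in the result, even one with no rows
medians = df.groupby('bin', observed=False).wait_days.median()
print(medians)
```

Run against the full df, a call of this shape would presumably give the per-bin medians shown next.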
## bin
## (0, 20] 2.208333
## (20, 40] 2.916667
## (40, 60] 3.270833
## (60, 80] 4.291667
## (80, 100] 4.250000
## Name: wait_days, dtype: float64
Way more interesting, right? And much easier to communicate to your readers, to boot.