Analyzing mortgage rejections for racial bias

Based on government-mandated data collection on mortgage granting, are certain banks or areas discriminatory in their lending practices?

Topics: logistic regression, classification, odds ratio, Home Mortgage Disclosure Act, race

Summary

Reveal analyzed a massive trove of public records to measure lending disparities between racial and ethnic groups. Individual-borrower data released under the Home Mortgage Disclosure Act is too large to work with as a CSV, or even to open in pandas, so this project jumps straight into managing a SQL database populated with scripts provided by the Consumer Financial Protection Bureau.
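
Because the loan-level data lives in a database, most of the heavy aggregation happens in SQL before anything reaches Python. As a minimal sketch of that pattern, the example below uses Python's built-in `sqlite3` with an in-memory table. The table name `loans`, its two columns, and the rows are made-up stand-ins, not the CFPB schema, which has many more fields and uses numeric codes.

```python
import sqlite3

# In-memory database standing in for the real HMDA database (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loans (applicant_race TEXT, denied INTEGER)")

# Toy rows for illustration only; these are not real HMDA records.
rows = [
    ("white", 0), ("white", 0), ("white", 0), ("white", 1),
    ("black", 0), ("black", 1), ("black", 1), ("black", 0),
]
conn.executemany("INSERT INTO loans VALUES (?, ?)", rows)

# Aggregate inside the database so only a small summary reaches Python.
query = """
    SELECT applicant_race,
           COUNT(*) AS applications,
           AVG(denied) AS denial_rate
    FROM loans
    GROUP BY applicant_race
    ORDER BY applicant_race
"""
summary = conn.execute(query).fetchall()
for race, applications, denial_rate in summary:
    print(f"{race}: {applications} applications, {denial_rate:.0%} denied")
conn.close()
```

The point of the `GROUP BY` is that the result stays tiny even when the underlying table has millions of rows.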

Reveal published one of the most easily reproducible whitepapers I've ever seen, so it's simple to follow in its footsteps and use logistic regression to pull back the curtain on the mortgage industry.

Reporting and analysis by Aaron Glantz and Emmanuel Martinez.

Notebooks, Assignments, and Walkthroughs

Complete walkthrough

A start-to-finish reproduction of the Reveal analysis. Requires a bit of technical heavy lifting.

Multi-page walkthrough

Cleaning and combining data for the Reveal Mortgage Analysis

A full logistic regression using lending data and demographic data, following the whitepaper published by Reveal.
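
Combining loan records with demographic data usually means joining on a shared geographic key. A minimal pandas sketch, assuming both tables carry a census-tract identifier; the column names (`census_tract`, `pct_minority`) and all values are hypothetical placeholders.

```python
import pandas as pd

# Hypothetical loan-level rows with placeholder tract identifiers.
loans = pd.DataFrame({
    "census_tract": ["A", "A", "B", "C"],
    "denied": [0, 1, 0, 1],
})

# Hypothetical tract-level demographics (one row per tract).
tracts = pd.DataFrame({
    "census_tract": ["A", "B"],
    "pct_minority": [62.0, 15.0],
})

# Left join keeps every loan; validate= raises if a tract appears twice,
# and indicator=True adds a _merge column flagging loans with no match.
merged = loans.merge(
    tracts,
    on="census_tract",
    how="left",
    validate="many_to_one",
    indicator=True,
)
unmatched = merged[merged["_merge"] == "left_only"]
print(merged)
print(f"{len(unmatched)} loan(s) had no demographic match")
```

Checking the unmatched rows before modeling is a cheap way to catch identifier formatting mismatches between the two sources.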

Wild formulas in statsmodels using Patsy (short version)

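Using R-style/Patsy formulas in statsmodels lets you transform variables, treat columns as categories, and change reference groups right in the formula, so you can tweak a regression at execution time without rebuilding your dataframe.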
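
For example, Patsy's `C()` marks a column as categorical and `Treatment(reference=...)` picks the comparison group, while `np.log()` transforms a column on the fly. The sketch below fits a logistic regression on synthetic, randomly generated data (not HMDA data); the column names and the built-in "true" effect are assumptions for illustration. It needs numpy, pandas and statsmodels installed.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic applicants, for illustration only.
rng = np.random.default_rng(0)
n = 5000
race = rng.choice(["white", "black", "latino"], size=n)
income = rng.lognormal(mean=11, sigma=0.5, size=n)

# Assumed "true" model: black applicants get double the odds of denial.
log_odds = -1.0 + np.log(2) * (race == "black") - 0.5 * (np.log(income) - 11)
denied = rng.random(n) < 1 / (1 + np.exp(-log_odds))
df = pd.DataFrame({"race": race, "income": income, "denied": denied.astype(int)})

# C(..., Treatment(reference=...)) sets the comparison group; np.log() is
# applied inside the formula, so the dataframe itself is never modified.
formula = "denied ~ C(race, Treatment(reference='white')) + np.log(income)"
result = smf.logit(formula, data=df).fit(disp=0)

# Exponentiated coefficients are odds ratios relative to the reference group.
odds_ratios = np.exp(result.params)
print(odds_ratios)
```

Swapping the reference group is a one-word change to the formula string, which is what makes this style handy for exploring a model.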

Reveal Mortgage Analysis - Logistic Regression using statsmodels formulas

An introduction to using R-style/Patsy formulas in statsmodels, along with specially-created columns in your dataframe.
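
The alternative to doing everything inside the formula is to build the columns you need in pandas first and refer to them by name. A sketch, again on synthetic data with hypothetical column names: a hand-made 0/1 indicator `is_black` and a precomputed `loan_to_income` ratio. The coefficients used to simulate denials are assumptions, not findings.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data for illustration only; not drawn from HMDA.
rng = np.random.default_rng(1)
n = 5000
df = pd.DataFrame({
    "race": rng.choice(["white", "black"], size=n),
    "income": rng.uniform(30, 200, size=n),
    "loan_amount": rng.uniform(50, 600, size=n),
})

# Specially-created columns: a 0/1 indicator and a ratio.
df["is_black"] = (df["race"] == "black").astype(int)
df["loan_to_income"] = df["loan_amount"] / df["income"]

# Assumed "true" model used to simulate denials.
log_odds = -2.0 + 0.7 * df["is_black"] + 0.3 * df["loan_to_income"]
prob_denied = 1 / (1 + np.exp(-log_odds.to_numpy()))
df["denied"] = (rng.random(n) < prob_denied).astype(int)

# Plain column names in the formula; no Patsy transforms needed.
result = smf.logit("denied ~ is_black + loan_to_income", data=df).fit(disp=0)
print(np.exp(result.params))
```

Precomputed columns make the formula short and the variables easy to inspect, at the cost of editing the dataframe whenever you change the specification.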

Reveal Mortgage Analysis - Logistic Regression

Use logistic regression to investigate lending disparities.
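
Results from these regressions are usually reported as odds ratios. With a single yes/no predictor, the fitted logistic regression coefficient equals the log of the odds ratio from the 2x2 table, so the arithmetic can be checked by hand. The counts below are invented for illustration.

```python
import math

# Invented counts for illustration: (denied, approved) by group.
counts = {
    "black": (30, 70),
    "white": (20, 80),
}


def odds(denied: int, approved: int) -> float:
    """Odds of denial: denials per approval."""
    return denied / approved


odds_black = odds(*counts["black"])  # 30/70
odds_white = odds(*counts["white"])  # 20/80 = 0.25
odds_ratio = odds_black / odds_white  # 12/7

# A logit of denied ~ is_black on these counts would estimate this coefficient.
log_odds_ratio = math.log(odds_ratio)

print(f"odds ratio: {odds_ratio:.2f}")  # odds ratio: 1.71
print(f"log odds ratio: {log_odds_ratio:.3f}")  # log odds ratio: 0.539
```

Note that the odds ratio (1.71) is larger than the ratio of denial rates (30% vs. 20%, or 1.5); the two measures diverge as rates grow, so it matters which one a story reports.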