Disparate impact is one of the hottest topics in the lending industry. The CFPB expects all lenders to regularly monitor and test for potential disparities across the classes protected under the Equal Credit Opportunity Act (ECOA). To help credit unions better understand what disparate impact is and how analytics can address it, CRIF Lending Solutions recently hosted a webinar featuring advice from our expert team. Below are a few of the questions we fielded after the presentation:
Q: How would you infer the race or ethnicity of an applicant named John Smith?
A: Start with the first name: according to Social Security Administration data, John is overwhelmingly a male name. According to census data, however, the surname Smith is more multi-racial than one might think. It is not exclusive to white or black individuals, although those two categories account for most people with the name. So we combine the last name with the applicant's place of residence to build a conditional probability for each race and ethnicity category.
For the sake of this example, let's say the census data tells us the name is 73% white and 22% black. We would then count Mr. Smith as 0.73 of a white applicant and 0.22 of a black applicant, meaning he goes into both pools. Mr. Smith also goes into the other race and ethnicity pools, which account for the remaining 5%. By doing this, we're not making assumptions that could prove to be incorrect; we're using what the data tells us.
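The fractional counting described above can be sketched in a few lines. This is a minimal illustration, not CRIF's implementation: each applicant is a dictionary of category probabilities, and a pool's size is the sum of those probabilities across applicants. The category labels and the split of the remaining 5% are hypothetical.

```python
from collections import defaultdict


def pool_counts(applicants):
    """Sum each applicant's probabilities into fractional pool counts.

    applicants: list of dicts mapping race/ethnicity category -> probability,
    where each dict sums to 1.0.
    """
    totals = defaultdict(float)
    for probs in applicants:
        for category, p in probs.items():
            # An applicant contributes p of one "person" to each pool.
            totals[category] += p
    return dict(totals)


# Hypothetical labels and a hypothetical split of the remaining 5%.
john_smith = {
    "white": 0.73,
    "black": 0.22,
    "hispanic": 0.02,
    "asian_pacific_islander": 0.01,
    "american_indian_alaska_native": 0.01,
    "multiracial": 0.01,
}
counts = pool_counts([john_smith])
```

Because every applicant's probabilities sum to 1.0, the pool counts always add up to the total number of applicants; no one is dropped or double-counted.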
We then update the probabilities based on where that particular John Smith lives. From the census, we know the proportion of each race and ethnicity living within each census block. The key, once again, is to never make assumptions: we conditionally weight the name probability with the address probability. For the sake of this example, assume the race and ethnicity probabilities remain the same after this update. We then weight his APR by 0.73 in the white category and by 0.22 in the black category (and by the remaining probabilities in the other categories) when computing each group's average APR.
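The answer above does not spell out how the name and address probabilities are combined. One widely used approach is Bayesian Improved Surname Geocoding (BISG), described in the CFPB's proxy methodology. It applies Bayes' rule: P(group | surname, geography) is proportional to P(group | surname) × P(geography | group), where P(geography | group) is the share of that group's population living in the applicant's area. The sketch below assumes that method, then computes probability-weighted average APRs per group. Function names, the combined "other" category, and the APR values are hypothetical.

```python
def bisg_posterior(surname_probs, geo_share):
    """Combine surname and geography probabilities with Bayes' rule.

    surname_probs: P(group | surname), e.g. from census surname tables.
    geo_share:     P(geography | group), the fraction of each group's
                   population living in the applicant's census area.
    """
    unnormalized = {
        g: surname_probs[g] * geo_share.get(g, 0.0) for g in surname_probs
    }
    total = sum(unnormalized.values())
    if total == 0:
        # No geographic signal for any group: keep the surname probabilities.
        return dict(surname_probs)
    return {g: v / total for g, v in unnormalized.items()}


def weighted_mean_apr(applicants):
    """Probability-weighted average APR for each group.

    applicants: list of (probs, apr) pairs; probs maps group -> probability.
    """
    weighted_sum, weight = {}, {}
    for probs, apr in applicants:
        for g, p in probs.items():
            weighted_sum[g] = weighted_sum.get(g, 0.0) + p * apr
            weight[g] = weight.get(g, 0.0) + p
    return {g: weighted_sum[g] / weight[g] for g in weight if weight[g] > 0}


john = {"white": 0.73, "black": 0.22, "other": 0.05}
# Equal shares for every group leave the surname probabilities unchanged,
# matching the simplifying assumption in the John Smith example.
posterior = bisg_posterior(john, {"white": 0.002, "black": 0.002, "other": 0.002})
applicants = [
    (posterior, 5.0),                                       # John Smith, hypothetical APR
    ({"white": 0.10, "black": 0.85, "other": 0.05}, 7.0),  # hypothetical applicant
]
avg_apr = weighted_mean_apr(applicants)
```

John Smith's APR contributes to the white average with weight 0.73 and to the black average with weight 0.22, so group averages can be compared without forcing any applicant into a single category.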
Q: How is disparate impact determined for someone who is multi-racial?
A: Multi-racial is a category that can be selected on the census. The categories include white, black, American Indian/Alaska Native, Asian/Pacific Islander, Hispanic (an ethnicity) and multi-racial. Some people check more than one box, and the census data records this, so you can see exactly how many categories were selected.
We reallocate people who select more than one category into the multi-racial category. Sometimes, however, the census treats more than one selection as a single category. For example, someone who selects Asian/Pacific Islander and unknown can be attributed entirely to Asian/Pacific Islander, which counts as one category for this exercise.
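The reallocation rule above can be sketched as a small function. This is a hedged illustration under stated assumptions: the labels are hypothetical, "unknown" is assumed to carry no information, and separate Asian and Pacific Islander selections are assumed to merge into the single Asian/Pacific Islander category. The actual census coding scheme differs.

```python
# Hypothetical labels; real census tabulations use their own codes.
MERGE = {"asian": "api", "pacific_islander": "api"}  # assumed to merge into one category
NON_INFORMATIVE = {"unknown"}                        # assumed to add no information


def reallocate(selections):
    """Map one person's race selections to a single analysis category."""
    categories = {
        MERGE.get(s, s) for s in selections if s not in NON_INFORMATIVE
    }
    if not categories:
        return "unknown"
    if len(categories) == 1:
        # e.g. "api" + "unknown" collapses to "api"
        return categories.pop()
    return "multiracial"
```

Selecting "api" and "unknown" yields "api", while selecting "white" and "black" yields "multiracial".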
Multi-racial is one of the smallest categories, along with American Indian/Alaska Native. Smaller groups tend to be underrepresented, so that is where a lender's data fluctuates the most. The larger categories are much easier to work with.
Q: With mixed neighborhoods becoming more common, what percentage of accounts can you not geocode for use with first and last names to infer race and gender?
A: This is actually very rare. Census data is available at the census block group level, at the census tract level (a somewhat larger rollup) and at the ZIP code level. We use the ZIP code when we are unable to geocode an address, which generally happens for fewer than 5% of applicants. This work is partly a partnership with the lender: we do some address hygiene on our end to make sure addresses are clean before we run them through our geocoding software. If we notice large amounts of bad address data, we work with the lender and recommend either running the addresses through the USPS National Change of Address (NCOA) database or some other publicly available standardization. That said, we very rarely find lenders whose data is so sparse that ZIP codes aren't available, so ZIP codes give us a nice safety net.
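The fallback described above amounts to "use the finest geography you can match, then fall back to the ZIP code." A minimal sketch follows. The address field names and lookup tables are hypothetical; in practice they would be populated by the geocoder and by census tabulations.

```python
def geography_probs(address, block_groups, tracts, zips):
    """Return (level, probabilities) from the finest geography available.

    address: dict with optional "block_group_id", "tract_id" and "zip" keys
             (hypothetical field names filled in by geocoding and address hygiene).
    block_groups, tracts, zips: hypothetical lookup tables mapping a
             geography ID to {group: share} built from census tabulations.
    """
    candidates = (
        ("block_group", address.get("block_group_id"), block_groups),
        ("tract", address.get("tract_id"), tracts),
        ("zip", address.get("zip"), zips),  # safety net when geocoding fails
    )
    for level, key, table in candidates:
        if key is not None and key in table:
            return level, table[key]
    return None, None  # No usable geography: rely on the name alone.


zips = {"12345": {"white": 0.6, "black": 0.3, "other": 0.1}}
level, probs = geography_probs({"zip": "12345"}, {}, {}, zips)
```

An address that failed to geocode but still has a valid ZIP code falls through to the ZIP-level table, so the applicant keeps a geographic component in the estimate.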
The mixing of neighborhoods, where residents aren't as homogeneous as someone might assume, is actually why we don't use the threshold methodology, which assigns an applicant to a single category only when one probability exceeds a cutoff. Your neighborhood tells us as much about you as your last name, but neither tells us everything. So we attribute each applicant to all of the racial categories indicated by the available data.
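To see why the two approaches differ, the sketch below counts the same applicants both ways. The 0.8 cutoff is an assumption chosen for illustration, and the second applicant is hypothetical. Under the threshold approach, John Smith (0.73 white) is not assigned to any group and drops out of the analysis. Under probability weighting, he still contributes to every pool.

```python
def threshold_assign(probs, cutoff=0.8):
    """Threshold approach: assign one group only if its probability clears the cutoff."""
    group, p = max(probs.items(), key=lambda item: item[1])
    return group if p >= cutoff else None


def compare(applicants, cutoff=0.8):
    """Count applicants per group under both approaches."""
    threshold_counts, weighted_counts = {}, {}
    for probs in applicants:
        group = threshold_assign(probs, cutoff)
        if group is not None:
            threshold_counts[group] = threshold_counts.get(group, 0) + 1
        # Probability weighting: every applicant contributes to every pool.
        for g, p in probs.items():
            weighted_counts[g] = weighted_counts.get(g, 0.0) + p
    return threshold_counts, weighted_counts


applicants = [
    {"white": 0.73, "black": 0.22, "other": 0.05},  # John Smith example
    {"white": 0.90, "black": 0.05, "other": 0.05},  # hypothetical applicant
]
threshold_counts, weighted_counts = compare(applicants)
```

With these inputs, the threshold approach counts one white applicant and ignores John Smith entirely, while the weighted approach keeps both applicants and spreads them across all three pools.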
Have Questions of Your Own?
Disparate impact is something that lenders need to monitor on a consistent basis, and these responses will help you better understand how to safeguard your operations. The experts at CRIF Achieve are here to help institutions of any size with disparate impact analysis. To request a link to the full recording of our recent webinar, please click the button below.