BankThink

Your AI credit models are fine, but their training data is problematic

AI systems built to assess creditworthiness are trained on data that implicitly accepts past discriminatory lending decisions as legitimate signals about borrowers today, writes Deon Crasto, of Velocity Global.

Artificial intelligence in lending promises faster decisions and broader access to credit, but it often perpetuates existing inequities. Be wary: Your AI lending model might not be as fair and objective as it appears.

Don't believe me? Let's look at a few instances. First, car loans: Researchers at the University of Bath reported that women were disproportionately rejected for loan originations compared with their male counterparts, even after controlling for other financial factors. With mortgages, we see a very similar story. A 2024 examination that used leading large language models to determine creditworthiness found that Black applicants were at a higher risk of being denied than their white counterparts. And it's not just race or color. The pattern extends to age, postal codes and even the college you attended.

At the end of the day, lenders are looking for deterministic factors to underwrite products, and that's what's going on here. I know this all too well. I ran product for the data science and decisioning team at OnDeck Capital, and we looked at every data point we could get our hands on. And I mean it. Got a bad Yelp rating? It was accounted for in our model. Your Foursquare check-ins were down? Oh, we knew. We even considered factors like seasonality in cash flow and how businesses in your neighborhood were doing. Our machine learning, or ML, models were designed to process thousands of data points to make lending decisions in seconds.

But I'm here to give you an alternate narrative. I think your AI models are fine (for now), but your data is fundamentally flawed. The issue isn't in the algorithms themselves, but in the historical data we're feeding them. You see, models are trained on datasets that go back decades. So, if a certain group has historically been denied loans at higher rates, ML models will learn to associate membership in that group, or any proxy for it, with "high risk." The model doesn't know it's being unfair; it's simply learning from the patterns we've provided.

The problem is exacerbated by what we in the industry call "thin files," meaning credit reports with limited history. This disproportionately affects young adults and recent immigrants, arguably the two groups most in need of access to credit. Their alternative is to take on loans purely to build up credit, often ones they cannot afford and on unfavorable terms, creating a Catch-22 that can trap people in a cycle of debt.

The impact of thin files on creditworthiness is staggering. According to a recent study by LexisNexis, banks in the U.K. could be denying loans to 80% of adults with thin credit files, many of them low-risk customers. These applications, typically deemed high risk by traditional lending models, would often, in theory, be auto-declined through "hard cuts," a process in which applications are eliminated when they fail specific criteria deemed necessary for approval. If the data behind those criteria is missing, models typically disregard any other relevant financial information, even when it is more exhaustive, effectively shutting individuals out of credit.
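To make the "hard cut" idea concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the field names (credit_history_months, rent_payments_on_time, steady_income_months), the 24-month threshold and the outcome labels are hypothetical and do not describe any particular lender's rules. It contrasts a rule that declines a thin file outright with one that routes the same applicant to a review of alternative data.

```python
# Hypothetical illustration only: field names and thresholds are invented.

def is_thin_file(applicant, min_credit_history_months=24):
    """A file is 'thin' when credit history is missing or too short."""
    months = applicant.get("credit_history_months")
    return months is None or months < min_credit_history_months


def hard_cut_decision(applicant):
    """Auto-decline thin files without looking at any other data."""
    if is_thin_file(applicant):
        return "decline"
    return "score"


def holistic_decision(applicant):
    """Send thin files with other evidence to an alternative-data review."""
    if is_thin_file(applicant):
        has_alternative_data = bool(
            applicant.get("rent_payments_on_time")
            or applicant.get("steady_income_months")
        )
        return "review_alternative_data" if has_alternative_data else "decline"
    return "score"


# A recent immigrant with six months of credit history but a solid record
# of rent payments and income (made-up numbers).
thin_file_applicant = {
    "credit_history_months": 6,
    "rent_payments_on_time": 24,
    "steady_income_months": 36,
}

print(hard_cut_decision(thin_file_applicant))
print(holistic_decision(thin_file_applicant))
```

Under the hard cut, the rent and income history never gets looked at. The second function does not approve anyone; it only keeps the application alive long enough for that information to be considered.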

So how do we solve this? I'd argue the future of our lending models needs to account for a more holistic picture to determine creditworthiness. We need to diversify our data sources, implement rigorous back-testing for biases and make our models as transparent as possible. Transparency should also extend to the consumer, by allowing them to understand the factors influencing their creditworthiness.
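One simple form a bias back-test can take is comparing approval rates across groups on past decisions. The sketch below is an illustration under stated assumptions, not a compliance tool: the group labels and sample decisions are made up, and the 0.8 cutoff is the "four-fifths" rule of thumb borrowed from U.S. employment guidance, used here only as an example threshold rather than a legal standard for lending.

```python
from collections import defaultdict


def approval_rates(decisions):
    """Return the share of approved applications for each group.

    `decisions` is an iterable of (group, approved) pairs, where
    `approved` is True or False.
    """
    totals = defaultdict(int)
    approvals = defaultdict(int)
    for group, approved in decisions:
        totals[group] += 1
        if approved:
            approvals[group] += 1
    return {group: approvals[group] / totals[group] for group in totals}


def approval_rate_ratios(decisions, reference_group):
    """Divide each group's approval rate by the reference group's rate.

    Assumes the reference group has at least one approval.
    """
    rates = approval_rates(decisions)
    reference_rate = rates[reference_group]
    return {group: rate / reference_rate for group, rate in rates.items()}


# Made-up back-test data: 100 past decisions per group.
sample = (
    [("group_a", True)] * 80 + [("group_a", False)] * 20
    + [("group_b", True)] * 50 + [("group_b", False)] * 50
)

ratios = approval_rate_ratios(sample, reference_group="group_a")

# Flag any group approved at less than 80% of the reference group's rate.
flagged = [group for group, ratio in ratios.items() if ratio < 0.8]

print(ratios)
print(flagged)
```

A flagged ratio is a prompt to investigate, not proof of discrimination. A real back-test would also control for legitimate risk factors and check whether seemingly neutral inputs, such as postal codes, are acting as proxies for protected characteristics.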

It shouldn't just be about smarter algorithms. It should be about smarter, fairer and more complete data. And on some level, it's about ensuring algorithmic accountability — and the ethical application of AI and ML in products that have a broad-reaching impact on society.
