Machine learning
If lenders adopt ML underwriting without also adopting valid, math-based explainability technologies, their models run the risk of violating the law, and regulators may shoot them down before they get off the ground.
When consumers are denied credit by ML models, the law (quite properly) requires lenders to tell consumers why. This helps consumers know what to do to improve their chances of getting approved next time. It also gives regulators confidence that lenders are making fair, business-justified lending decisions.
But, do lenders truly know why their lending models make the decisions that they do?
Most lenders rely on underwriting algorithms to make these decisions for them — algorithms that rely on hundreds or thousands of variables. Which variables principally caused the lender to deny the loan? Was it variable 127 or variable 18?
That question gets even harder when credit decisions are made using ML models, which make more accurate predictions and decisions based on countless interactions among all those variables.
Trouble comes, however, when lenders try to explain their lending decisions and identify principal denial reasons using math designed for simpler, antiquated models. You can’t use old tools to explain ML models if you want to get the right answer every time, as the law requires.
Yet today most lenders use one of two seemingly reasonable methods to identify principal denial reasons: “drop one” and its cousin, “impute median.”
With drop one, lenders test which model variables contribute most to the model score by removing one variable from the model and measuring the change in a score to quantify the influence of that removed variable. With impute median, lenders do the same thing but instead of dropping a variable, they replace each variable, one at a time, with the median value of that variable in the dataset.
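To make the two methods concrete, here is a minimal sketch in Python using an entirely made-up scikit-learn model and hypothetical credit attributes (utilization, inquiries, months_on_file). It is an illustration of the idea, not any lender's actual implementation, and it reads "drop one" as refitting the model without the variable.

```python
# Minimal sketch of "drop one" and "impute median" reason codes.
# Model, variables, and data are hypothetical and for illustration only.
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
X = pd.DataFrame({
    "utilization": rng.uniform(0, 1, 500),
    "inquiries": rng.integers(0, 10, 500),
    "months_on_file": rng.integers(6, 360, 500),
})
y = ((X["utilization"] < 0.6) & (X["inquiries"] < 5)).astype(int)  # 1 = approve

model = GradientBoostingClassifier(random_state=0).fit(X, y)
applicant = X.iloc[[0]]                      # the application we want to explain
base = model.predict_proba(applicant)[0, 1]  # modeled approval odds

# "Drop one": remove a variable, refit, and measure how the score moves.
drop_one = {}
for col in X.columns:
    reduced = GradientBoostingClassifier(random_state=0).fit(X.drop(columns=col), y)
    drop_one[col] = base - reduced.predict_proba(applicant.drop(columns=col))[0, 1]

# "Impute median": swap each variable for its median value and re-score.
impute_median = {}
for col in X.columns:
    modified = applicant.copy()
    modified[col] = X[col].median()
    impute_median[col] = base - model.predict_proba(modified)[0, 1]

# Each method ranks "principal denial reasons" by the size of the score change.
print(sorted(drop_one.items(), key=lambda kv: -abs(kv[1])))
print(sorted(impute_median.items(), key=lambda kv: -abs(kv[1])))
```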
These methods sound reasonable but in practice, they are often inaccurate. That’s because once you change the data that the model considers, you have moved from the real world into a hypothetical one. You end up trying to explain situations that would never happen in the real world.
These techniques also fail to account for the facts that variables interact, that they are not always independent, and that in an ML model different variables can push the score in different directions.
A better approach is based on a concept from cooperative game theory developed by Nobel laureate Lloyd Shapley to fairly divide credit for a team's outcome among the individual players who produced it.
It turns out that in ML models, variables act a lot like basketball players, making Shapley's methods well suited to explaining how the models operate and to accurately identifying the principal denial reasons every time.
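As a simplified sketch of the concept (continuing with the hypothetical model and applicant above), exact Shapley values can be computed by averaging each variable's marginal contribution over every possible coalition of the other variables. Here a variable that is "out" of a coalition is set to its training median; production tools such as SHAP instead average over a background sample. This illustrates the idea only and is not Zest AI's method.

```python
# Exact Shapley-value attribution for the small hypothetical model above.
from itertools import combinations
from math import factorial

features = list(X.columns)
n = len(features)
background = X.median()  # stand-in values for variables "out" of a coalition

def score_with(present):
    """Score the applicant keeping `present` variables at their real values
    and setting the rest to their training medians."""
    row = applicant.copy()
    for col in features:
        if col not in present:
            row[col] = background[col]
    return model.predict_proba(row)[0, 1]

shapley = {}
for col in features:
    others = [f for f in features if f != col]
    phi = 0.0
    for k in range(len(others) + 1):
        for subset in combinations(others, k):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            phi += weight * (score_with(set(subset) | {col}) - score_with(subset))
    shapley[col] = phi

# The most negative contributions are the principal denial reasons, and the
# contributions sum to the gap between the applicant's score and the baseline
# (all-median) score, so every point of the decision is accounted for.
print(sorted(shapley.items(), key=lambda kv: kv[1]))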
For example, this approach was tested out in October during the Consumer Financial Protection Bureau's virtual tech sprint on improving adverse action notices. During the sprint, Zest AI and our partners from First National Bank of Omaha, WebBank and Citizens Bank showed how game theory-based explanations can be used to generate accurate adverse action notices from ML underwriting models.
Looking further, it would help lenders if the CFPB made clear which explainability methodologies are rigorous enough to support accurate adverse action notices for ML models.
Such methodologies and technologies are available and used by lenders today. But it would help provide clarity to the industry if the CFPB fostered their adoption so that consumers can get the precise information they need to build credit.