6 things banks look for in gen AI models and companies

Using generative AI comes with added risk. Beyond the normal risks of working with any vendor — Is the company sound and secure? Will it be around in five years? Will it protect access to systems and sensitive data? — there are dangers of hallucination, bias and error to which generative AI models are uniquely subject. There are cloud vendor risks, because many generative AI providers offer their models in a public cloud. There are concentration risks, because many financial companies are gravitating to a few popular models, like OpenAI's ChatGPT and Anthropic's Claude. And there is heightened reputational risk with providers that have undergone leadership drama.

Yet generative AI is not going away. Banks have to navigate these risks, starting with proper vetting and due diligence of companies and their models. These are some of the ways banks are evaluating generative AI models and vendors.

Past experience with large banks

At Citi, the first thing Arvind Purushotham, head of Citi Ventures, looks for in any AI company the bank is considering working with or investing in (or both) is whether it has large bank clients.

"Financial services is more complicated given the regulatory and compliance requirements, the data privacy requirements, what can live in the cloud versus what has to live on-prem, things of that nature," Purushotham said in an American Banker podcast that will air on January 21. "So we look at whether the company is targeting its products and services to large financial institutions." Many do. It's a lucrative vertical. 

One generative AI startup Citi has invested in is Lakera, which provides security, safety and soundness for generative AI prompts. Another company in Citi Ventures' portfolio is Norm Ai, an AI-based regulatory compliance platform that can help bank compliance teams review documents such as marketing materials more quickly by comparing their contents to compliance requirements.

Usefulness within the bank

The Citi Ventures team brings internal experts into the generative AI vendor vetting process. 

"As a strategic investment unit, we want to make sure that the companies that we invest in are a fit for us, not just from an investment standpoint but also from a commercial usage standpoint," Purushotham said. "And many of the colleagues within our technology business, whether it's inside the technology group or within our business units, are the experts. They're the practitioners on the ground and so we certainly leverage their expertise as a part of our due diligence as we explore the fit with Citi."

Well-established vendors

SouthState Bank in Florida has approved only two generative AI applications for employee use: Microsoft's Copilot and OpenAI's ChatGPT.

OpenAI's partnership with Microsoft "provided us with a high degree of comfort," said Chris Nichols, director of capital markets at the bank. "We then completed further due diligence on the quality of their models and usability." 

Any time a bank intends to work with a vendor, including a generative AI provider, it has to consider regulatory requirements, data protection and security, pointed out Maria Gotsch, CEO of the Partnership Fund for New York City, which works with banks and early-stage companies through its annual Fintech Innovation Lab.

"That paradigm through which they have to extend regulatory requirements to all vendors is as true for generative AI as everything else: security, protection of data, data provenance, and so forth," she said.

Well-managed startups

Strength of management, across a wide range of characteristics, is a key thing to look for in early-stage startups, Gotsch said.

Leadership and company reputation risk is a factor when evaluating any vendor, she pointed out.

When looking at companies that have been subject to lawsuits, Gotsch advises banks to "respect the process, remember that people are innocent until proven guilty. If anything, it's a distraction, those things take time out of people's day." 

Model testing

After putting a generative AI provider through the bank's normal vendor due diligence process, SouthState tests the model itself with an array of questions, which lets it directly gauge risk. 

"At least half the questions are for accuracy to limit hallucinations," Nichols said. "We have subject matter experts provide both questions and expected answers and then we review and rate how well the model does with its output." 

If a model passes the accuracy test, it's then tested for privacy, security, toxicity and the ability to be jailbroken. These tests usually make up the other 50% of the questions, he said. 

"It is important to note here that we test a model for a particular use case and not just the model," Nichols said. "Thus, if we use the same model for something else, we need to retest. Because these models rapidly change, depending on the changes and risk level, we may test a model monthly to ensure the model continues to meet our expectations." 

Explainability

In Gotsch's view, the biggest question about generative AI models as well as more traditional AI models is explainability. 

"If they're using a tool and got these results, if they can't explain it, they can't use it," Gotsch said. This is why for the past two years banks have been piloting generative AI internally, she said. 

Several companies offer software that provides explainability for generative AI models, but banks need to do their own testing.

"They're not going to take just the word of a startup, they make them demonstrate they can do that," Gotsch said.

Part of that explainability work is uncovering a model's propensity to hallucinate, in other words, to make things up.

Dynamo AI, a company that participated in last year's Fintech Innovation Lab, offers help with uncovering gen AI hallucinations, Gotsch said.

"They had a lot of interest from people wanting to know about them," she said.
