5 ways banks use data science

Banks, like most companies, have long used data science throughout their business. They use it to make decisions about loan approvals. They use it to detect fraud and cybersecurity threats. They use it to identify the best customers to target with marketing promotions and ads. They use it to verify data on potential new customers. The list goes on.

But they are only just starting to use more advanced forms of data science, including cutting-edge AI.

"The question is, what can we do by applying new artificial intelligence technologies to our data that we couldn't do before," said Ryan Favro, a managing principal at Capco. 

Banks typically need to analyze two kinds of data, he pointed out: structured and unstructured.

"Structured data is someone's name, their birthday, their address, their phone number, their bank account balances, the number of times they make a deposit," he said. "Unstructured data could be things like their social media posts or email."

Structured data can be analyzed with business intelligence tools. But such software struggles with unstructured data.

"With the advent of machine learning and generative AI, there's an ability to use data in new ways," Favro said.

Read on for five ways data science and data scientists play a large role in financial institutions.

Lending

Once upon a time, banks loaned to people based mainly on character. The banker in a small town loaned money to the farmers, store owners, restaurateurs and business executives who were his neighbors and whose children went to school with his own.

With the advent of online lending, loan decisions became less local and more data-driven, based largely on FICO scores and credit bureau data. FICO scores are based mainly on an analysis of five data elements: payment history (35%), amounts owed (30%), length of credit history (15%), new credit (10%) and credit mix (10%).
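
To make that weighting concrete, here is a rough Python sketch. The weights mirror the category percentages above, but the sub-scores, the 300-850 mapping and the formula itself are invented for illustration; FICO's actual model is proprietary and far more complex.

```python
# Illustrative only: FICO's real model is proprietary and far more complex.
# This toy sketch just shows how category weights could blend sub-scores
# (each scaled 0-1) into a single number on the familiar 300-850 range.

WEIGHTS = {
    "payment_history": 0.35,
    "amounts_owed": 0.30,
    "length_of_history": 0.15,
    "new_credit": 0.10,
    "credit_mix": 0.10,
}

def blended_score(subscores: dict[str, float]) -> float:
    """Weighted average of 0-1 sub-scores, mapped onto a 300-850 scale."""
    weighted = sum(WEIGHTS[k] * subscores[k] for k in WEIGHTS)
    return 300 + weighted * 550

print(blended_score({
    "payment_history": 0.95,    # pays on time
    "amounts_owed": 0.60,       # moderate utilization
    "length_of_history": 0.40,  # fairly new borrower
    "new_credit": 0.80,
    "credit_mix": 0.70,
}))  # roughly 697
```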

In recent years, fintechs and banks have been building more sophisticated forms of data science into their underwriting models.

At Fifth Third Bank, Kathy Grigg, head of the decision sciences group, has developed a model for credit card origination that is expected to increase approvals at the $207 billion-asset bank by 10%. Instead of relying on credit scores, it uses other data, such as deposit history.

"We have information, obviously, on how our customers are already using our products and services, particularly in the deposit space," Grigg said. "We're able to leverage that information to infer how customers might perform in credit in a way that maybe isn't being captured by bureau data alone. We can see positive deposit history and approve additional customers who might not otherwise qualify, without taking out an additional risk." 

Just looking at credit scores isn't enough, she said, because people who are new to borrowing, whether because they are young or because they recently immigrated, typically have a low FICO score or none at all.
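
A simplified sketch of the idea Grigg describes might look like the following: a model trained on both bureau scores and deposit-account behavior, so an applicant with a thin credit file but a healthy deposit history can still be approved. The features, data and model here are invented for illustration; Fifth Third's actual model is not public.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 1_000

# Invented features: bureau score (thin or low for many new borrowers),
# average monthly deposit balance, and overdraft count over the past year.
bureau_score = rng.normal(650, 80, n)
avg_balance = rng.gamma(2.0, 1_500, n)
overdrafts = rng.poisson(1.0, n)

# Synthetic repayment label that depends on deposit behavior, not just the bureau score.
logit = 0.01 * (bureau_score - 650) + 0.0004 * avg_balance - 0.8 * overdrafts
repaid = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

X = np.column_stack([bureau_score, avg_balance, overdrafts])
model = make_pipeline(StandardScaler(), LogisticRegression()).fit(X, repaid)

# An applicant with a thin credit file but a healthy, overdraft-free deposit history.
applicant = np.array([[580.0, 4_000.0, 0.0]])
print("estimated repayment probability:", model.predict_proba(applicant)[0, 1])
```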

The use of so-called "alternative data" such as bank account activity in loan decisions, and the use of AI to analyze multiple data sets to evaluate a potential borrower's creditworthiness, are both still considered controversial by regulators such as the Consumer Financial Protection Bureau.

Fraud detection

Banks use data science to sift through large amounts of transaction data and flag unusual activity that might be fraud.

Synchrony Bank in Stamford, Connecticut, started applying AI to fraud detection after customers complained that reporting potential fraud on their accounts was a painful experience. When a customer called to say their card was used in, say, a grocery store in California when they were at their home in Connecticut, they used to get grilled by the customer service reps.

About 170 data scientists created a machine learning model that predicts the likelihood that a transaction is truly fraudulent.

The model was trained with information about past fraud investigations. It's constantly fed transaction data for all the accounts a customer might have with Synchrony, and it looks for suspicious patterns. If a customer has a billing address in Stamford, Connecticut, and has had nine transactions in the Stamford area and one transaction in California during the week, the odds that the West Coast transaction is fraudulent are high.
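
That pattern-matching step can be pictured with a short sketch that turns a customer's recent transaction history into features a trained classifier would consume. The feature names and example data below are invented; Synchrony's actual model is not public.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Transaction:
    when: datetime
    state: str      # where the card was used
    amount: float

def fraud_features(home_state: str, history: list[Transaction],
                   candidate: Transaction) -> dict:
    """Summarize how far a new transaction strays from the customer's usual pattern."""
    recent = [t for t in history if (candidate.when - t.when).days <= 7]
    out_of_state = [t for t in recent if t.state != home_state]
    return {
        "recent_txn_count": len(recent),
        "share_out_of_state": len(out_of_state) / max(len(recent), 1),
        "candidate_out_of_state": int(candidate.state != home_state),
        "amount": candidate.amount,
    }

# A Connecticut customer with several local purchases this week, then one attempt in California.
history = [Transaction(datetime(2024, 5, day), "CT", 40.0) for day in range(1, 10)]
candidate = Transaction(datetime(2024, 5, 9), "CA", 600.0)
print(fraud_features("CT", history, candidate))
# A trained classifier would take features like these and return a fraud probability.
```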

Banks are starting to use advanced AI, such as large language models, in this work. 

JPMorgan Chase, for instance, uses advanced AI to detect fraud in areas like business email compromise.

Fraudsters look for the weakest link, the place that is least protected, according to Ryan Schmiedl, global head of payments, trust and safety at the bank. And they often find the vulnerable point somewhere inside a corporate client.

"In a lot of cases, they're attacking corporates because there are so many people in a corporate entity and they don't communicate a lot," Schmiedl said. "I can't tell you how many times we have clients that have been socially engineered," he said.

Often fraudsters send an authentic-looking email that appears to be from a genuine vendor or partner. It might say the company is changing bank accounts and direct the recipient to send payment through a convincing but fake website the fraudsters have created.

JPMorgan Chase uses large language models, a type of technology that can process massive amounts of text, to detect patterns in language that might signal a business email compromise.

"Everything's on the table now, because we have new technology that was not available off the shelf to the majority of the banking industry," Favro said. "In the case of fraud, there's lots of angles where you can use data to predict and protect against fraud. It's easy for me to look at structured data patterns: I know this person's in California, but they had an in-person transaction in Europe five minutes after their domestic transaction. But we can use unstructured data patterns to determine, is this email a social engineering attempt? You can look for patterns, such as someone being evasive or using words or language that are not typical of the situation." 

Cybersecurity

Cybersecurity is another area where data science is used to flag unusual activity that might signal a threat. For example, machine learning algorithms analyze network traffic for suspicious patterns or determine whether a file on a corporate network is moving around in an unexpected way.

Visa, for instance, uses a trick called "devaluing data": making sure that any payment information a hacker or fraudster gets from a bank, merchant or consumer has as little value to the thief as possible. One way of doing this is through encryption.

Paul Fabara, Visa's chief risk officer since 2019, says data devaluing is one of the five principal strategies the company uses to manage its fraud and cybersecurity risk. 
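
Tokenization is another common way to devalue data: downstream systems hold only a random token, and just a tightly controlled vault can map it back to the real card number. The sketch below is conceptual, not Visa's actual implementation.

```python
# Conceptual sketch of devaluing data through tokenization. In practice the
# vault would be an HSM-backed, access-controlled service, not an in-memory dict.
import secrets

_vault: dict[str, str] = {}

def tokenize(card_number: str) -> str:
    token = "tok_" + secrets.token_hex(8)
    _vault[token] = card_number
    return token

def detokenize(token: str) -> str:
    # Callable only by authorized payment systems.
    return _vault[token]

token = tokenize("4111111111111111")
print(token)             # e.g. tok_9f2c... worthless to a thief on its own
print(detokenize(token))
```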

Favro does not see this as an area where generative AI or large language models are a fit.

"A large language model is generating a statistical response to text that you give it," Favro said. "If I give the large language model the text of 1, 2, 3, 4, it's going to statistically respond with 5, 6, 7, 8. "It's hard to make the LLMs or generative AI a good fit for a use case when it comes to first line defenses of cybersecurity." 

Simpler analytics tools make more sense for monitoring IP addresses, packet sizes and the number of requests being made over a network, he said.
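
That kind of simpler analytics can be as basic as flagging an IP address whose request volume sits far outside a normal baseline, as in this toy sketch with invented numbers and thresholds.

```python
# Toy anomaly check: flag IPs whose request count is far above a baseline window.
from statistics import mean, stdev

# Request counts per IP from a normal baseline period (invented data).
baseline = [120, 95, 110, 105, 98, 112, 101, 94, 108, 116]
mu, sigma = mean(baseline), stdev(baseline)

current = {"10.0.0.4": 118, "10.0.0.9": 103, "203.0.113.50": 9_800}

for ip, count in current.items():
    z = (count - mu) / sigma
    if z > 3:   # simple threshold; real systems tune this per environment
        print(f"flag {ip}: {count} requests (z-score {z:.0f})")
```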

Customer analytics

Many banks use data science to predict what customers will want and need.

At TD Bank, data scientists analyze data to try to predict customers' moods.

So-called sentiment analysis software is already used in call centers to help detect when customers are getting frustrated or angry. But TD Bank aims to deploy it to capture mood as people interact with internet-of-things devices, with virtual assistants, and with the bank's mobile app and online banking.

It's challenging, because mood detection has to be of the moment, rather than relying on analysis of what a customer did yesterday or last month.
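
A toy sketch of that in-the-moment scoring: keep only the last few utterances and update a rolling mood signal as each one arrives. The word lists below are invented, and production systems use trained sentiment models rather than hand-written lexicons.

```python
# Toy, lexicon-based mood signal over a live interaction.
from collections import deque

NEGATIVE = {"frustrated", "angry", "ridiculous", "useless", "waiting", "again"}
POSITIVE = {"thanks", "great", "perfect", "helpful"}

recent = deque(maxlen=5)   # only the last few utterances matter

def mood_after(message: str) -> float:
    words = message.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    recent.append(score)
    return sum(recent) / len(recent)   # rolling average: the mood right now

for msg in ["I was charged twice again", "this is ridiculous", "ok thanks that helps"]:
    print(msg, "->", round(mood_after(msg), 2))
```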

"The journey for us is more about getting a holistic view of the human being," said John Thomas, executive vice president and global innovation head for TD Bank Group.

Morgan Stanley uses machine learning, predictive analytics and workflow technology to help predict the "next best action" for advisors to suggest to customers.

The technology uses predictive analytics and machine learning to comb through research reports, market data and client data to come up with insights about clients, market events and the impact of events on clients' portfolios.
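
In spirit, a next-best-action engine scores candidate suggestions against what is known about a client and the market, then surfaces the top one to the advisor. The sketch below uses invented rules where a real system would use model-predicted propensities; Morgan Stanley's actual system is far richer than this.

```python
# Hypothetical next-best-action ranking with invented client and market data.
client = {"cash_pct": 0.35, "holds_bond_fund": True, "risk_tolerance": "moderate"}
market_event = {"rates_rose": True}

candidates = [
    ("Discuss putting idle cash to work", client["cash_pct"] > 0.25),
    ("Review bond fund duration after the rate move",
     client["holds_bond_fund"] and market_event["rates_rose"]),
    ("Offer a margin loan", client["risk_tolerance"] == "aggressive"),
]

# In production the second element would be a model-predicted propensity score,
# not a boolean rule.
ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
print("Next best action:", ranked[0][0])
```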

Banks can also use data science to analyze reviews of their apps in the Apple and Google app stores. 

"In the old days, and by old days, I mean yesterday, the bank would have to collect, collate, analyze and respond to the feedback," Favro said. "Today I can have my LLM do that for me." Capco's Content Generation Workbench, for instance, can respond to reviews and categorize them. 

Virtual assistants

Big banks, including Bank of America, Wells Fargo, U.S. Bank and Capital One, have virtual assistants that use several forms of data science to respond to customer queries. They use natural language processing to understand what customers ask, through voice or text. They parse the request into action items that the bank's core systems can understand and execute.

"If a customer wants to Zelle $20 to someone, the customer says something to Google [Wells Fargo's technology partner on Fargo], Google tells us what the customer said, we then make the call to the Zelle system to initiate the transaction, to move the money and dissect it into who we want to send it to, how much and from what account," said Michelle Moore, head of digital at Wells Fargo. "So all of that is happening in seconds, to get the right customer experience."

Banks are starting to experiment with enterprise-ready versions of large language models for customer interactions. 

Here again, Favro recommends caution.

"The number one thing that I would be worried about if I was a bank going down the road of bringing in LLMs or any sort of similar technology to interact directly with my customer is, I think it's too soon because most of these systems operate on a statistical model, not logical. So there's a possibility that it could give the user the wrong information, and that wrong information could be something that they try to action and it doesn't work. It could be something that is completely embarrassing to the brand." 

Favro recommends always keeping a human in the loop. For instance, the LLM might generate a response email for a person to review.
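
A minimal sketch of that human-in-the-loop pattern: the model only drafts, and nothing reaches a customer until a person approves it. The queue and send function below are illustrative, not any bank's actual workflow.

```python
# Minimal human-in-the-loop pattern: LLM drafts are held until a human approves.
from dataclasses import dataclass

@dataclass
class Draft:
    customer_id: str
    text: str
    approved: bool = False

review_queue: list[Draft] = []

def queue_llm_draft(customer_id: str, draft_text: str) -> None:
    review_queue.append(Draft(customer_id, draft_text))

def send_approved(send) -> None:
    for draft in review_queue:
        if draft.approved:          # a human reviewer flips this flag
            send(draft.customer_id, draft.text)

queue_llm_draft("cust-123", "Hi, here's how to reset your card PIN ...")
review_queue[0].approved = True     # human sign-off
send_approved(lambda cid, text: print("sending to", cid, ":", text))
```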

Even testing large language models with employees is risky, he said.

"You could have 100,000 employees and that's just as bad as putting it in the public," he said. "Because all it takes is one employee to say, look at how crap this is. And they publish it. It's not private anymore and your dirty laundry gets out."