Podcast

'These models will always hallucinate': Seth Dobrin on LLMs

Dr. Seth Dobrin

Transcription:

Transcripts are generated using a combination of speech recognition software and human transcribers, and may contain errors. Please check the corresponding audio for the authoritative record.

Penny Crosman (00:03):

Welcome to the American Banker Podcast. I'm Penny Crosman. Can large language models be responsible? The question is relevant for the financial industry, where banks are giving employees large language models to use for a variety of use cases, and many hope to use the technology to answer queries from customers. JPMorgan Chase, for example, recently rolled out ChatGPT to more than 60,000 employees to help them with tasks like writing emails and reports. It plans to offer a variety of external large language models through a portal called LLM Suite and make them available throughout the bank to all 313,000 employees. Other banks have also provided large language models to employees to help them with their work, for instance, by automatically generating summaries of customer service calls for call center representatives. But according to Dr. Seth Dobrin, former global chief AI officer at IBM, founder of advisory firm Qantm AI and general partner at One Infinity Ventures and Silicon Sands Venture Studio, large language models are not capable of being responsible. Welcome, Dr. Dobrin.

Dr. Seth Dobrin (01:13):

Yeah, thanks for having me, Penny. I appreciate the opportunity.

Penny Crosman (01:16):

Thanks for coming. So what are some of the reasons large language models really can't be responsible today?

Dr. Seth Dobrin (01:24):

Yeah, I mean, I think when it comes down to it, it's really at the heart of the core underlying technology. The base technology that large language models are built on today is something called the transformer architecture, which came out of Google back in 2017 in a paper called "Attention Is All You Need," if you're interested in going back and reading it. The transformer architecture was designed initially for language translation, so it was designed with a specific, small, isolated task in mind; it was not really built for these massive pre-trained models. Essentially, these large language models are trained on the whole of the internet, and these companies — Anthropic, Meta, Google, OpenAI — keep building bigger and bigger models. So they're getting further away from the core foundation of the technology, and the bigger the models get, the more likely they are to hallucinate and the more you get some of these other issues with the models.
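
For readers who want to see the transformer's core operation that Dobrin references, here is a minimal sketch of scaled dot-product attention from "Attention Is All You Need." It is written in plain NumPy; the shapes and variable names are illustrative and not drawn from any particular model.

```python
# Minimal sketch of scaled dot-product attention, the core operation of the
# transformer architecture. Shapes and names are illustrative only.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Q, K, V: arrays of shape (seq_len, d_model)."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)            # how strongly each token attends to each other token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax over the sequence
    return weights @ V                          # weighted mix of the value vectors

# Toy example: 4 tokens with 8-dimensional embeddings.
rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
print(scaled_dot_product_attention(Q, K, V).shape)  # (4, 8)
```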

Penny Crosman (02:36):

So some people in the financial industry have talked about using retrieval augmented generation to gather the data for their large language models and maybe small language models using a very limited dataset and not trying to go out and scrape anything from across the internet. Do you think that use of retrieval augmented generation is kind of an antidote, or are they putting too much hope on that idea?

Dr. Seth Dobrin (03:08):

Yes. So I guess first and foremost, if anyone ever tells you that these transformer-based large language models don't hallucinate, they don't know what they're talking about or they're not being completely honest with you. Fundamentally, these models will always hallucinate. With some technologies, it's possible to get the likelihood of hallucination close to zero. With small, focused, task-specific models, you can get very, very close to zero. With retrieval augmented generation, the large language models are still trained on the whole of the internet. They still use GPT-4, they still use Llama, they still use Claude from Anthropic, they still use Gemini. They still use one of these models that are, for the most part, trained on the whole of the internet, so they still have the hallucination problem. What retrieval augmented generation does is limit the response to a particular set of data.

(04:11):

And there have been some studies from a company called Galileo that look at the likelihood of hallucinations and actually score them. These models still hallucinate at a pretty high rate, a pretty unacceptable rate for an organization like a bank. Now, that's not to say that these models don't have value in things like summarization and helping you write emails, as long as humans are there checking and reworking the output. But you don't want them doing vital tasks. You don't want them fully automating information that's being used in critical customer interactions or critical decisions. Retrieval augmented generation is not a panacea for hallucinations.
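
As a rough illustration of the retrieval augmented generation pattern Dobrin describes, the sketch below retrieves a few passages from a small in-house document set and builds a prompt that asks the model to answer only from that context. The retrieval is a toy keyword match, and `call_llm` is a hypothetical stand-in for whatever hosted model an organization actually uses; the point is that the base model is unchanged and can still hallucinate, retrieval only constrains what goes into the prompt.

```python
# Toy RAG sketch: retrieve passages, then ground the prompt on them.
# `call_llm` is a hypothetical placeholder, not a real vendor API.
from collections import Counter
import math

def cosine_sim(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    num = sum(a[t] * b[t] for t in set(a) & set(b))
    den = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def retrieve(query, documents, k=2):
    """Return the k documents most similar to the query (toy keyword retrieval)."""
    q = Counter(query.lower().split())
    scored = [(cosine_sim(q, Counter(d.lower().split())), d) for d in documents]
    return [d for _, d in sorted(scored, reverse=True)[:k]]

def build_grounded_prompt(query, passages):
    context = "\n".join(f"- {p}" for p in passages)
    return ("Answer using ONLY the context below. "
            "If the context does not contain the answer, say you do not know.\n"
            f"Context:\n{context}\n\nQuestion: {query}")

def call_llm(prompt):
    # Hypothetical stub: replace with a real model client of your choice.
    return f"[model response to a {len(prompt)}-character grounded prompt]"

docs = [
    "Wire transfers over $10,000 require two approvals.",
    "Savings accounts accrue interest monthly.",
    "Branch hours are 9am to 5pm on weekdays.",
]
question = "How many approvals does a large wire transfer require?"
print(call_llm(build_grounded_prompt(question, retrieve(question, docs))))
```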

Penny Crosman (05:07):

That's a good explanation. I've also seen vendors talk about offering hallucination-free LLMs. For instance, one talks about unifying databases into a knowledge graph and then using that knowledge graph with semantic parsing to ground LLM outputs, thereby eliminating hallucinations. Do things like that sound doable?

Dr. Seth Dobrin (05:32):

I mean, these solutions, whether knowledge graphs, semantic graphs or retrieval augmented generation, all reduce the likelihood of hallucinations. They can reduce it to maybe 5% or 10%, where you'll only get 5% or 10% hallucinations, depending on the situation. That's still pretty high, and it's not zero, it's not close to zero. So if someone comes in and says they can get it to zero, it's just not true. I'd like to see the information. I'd like to see it on more than just their test dataset; I'd like to see it on some real-world applications. So yes, it does provide context, it helps the model understand your business better. But again, these models are not being used for what the original architecture was designed for. If you want to get to a place where the models really have a low likelihood of hallucinations, you want them to be smaller, you want them to be task specific, and you want the underlying data to be as complete as possible.
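
A minimal sketch of the knowledge-graph grounding idea discussed here: a question is parsed into a structured query, the answer is looked up in the graph, and the system refuses rather than guesses when the fact is missing. The entities, relations and toy parser are assumptions for illustration, not any vendor's actual product.

```python
# Toy knowledge-graph grounding: the answer comes from a structured lookup,
# never from the language model's memory. All names here are illustrative.
knowledge_graph = {
    ("ACME Bank", "headquartered_in"): "New York",
    ("ACME Bank", "founded"): "1985",
}

def semantic_parse(question):
    """Toy stand-in for semantic parsing: map a question to an (entity, relation) pair."""
    text = question.lower()
    if "where" in text:
        return ("ACME Bank", "headquartered_in")
    if "when" in text or "founded" in text:
        return ("ACME Bank", "founded")
    return None

def grounded_answer(question):
    triple = semantic_parse(question)
    fact = knowledge_graph.get(triple) if triple else None
    if fact is None:
        return "I don't have that information."  # refuse rather than guess
    # A hosted LLM could rephrase this sentence; the fact itself never comes from the model.
    return f"{triple[0]} {triple[1].replace('_', ' ')} {fact}."

print(grounded_answer("Where is ACME Bank headquartered?"))
print(grounded_answer("What is ACME Bank's stock price?"))
```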

(06:55):

One of the biggest drivers of hallucinations is incomplete data for the task you're asking about, for the knowledge you're asking the model to have. It basically makes up the information because it doesn't know it. The other driving force is prompts that are not good. Imagine the LLM is an intern. If you don't ask the intern a good question, it's going to give you a bad answer. If you don't give it a good description of the task, it's not going to give you a good result. An LLM is no different. In fact, it's less capable than an intern: it can't reason, it's not smart. Interns can reason and they're smart. So you need to think of it that way.
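
To make the intern analogy concrete, here is a small sketch comparing a vague prompt with a fully specified one. The template fields (audience, length, required items) are assumptions about what a bank's summarization task might need, not a prescribed standard.

```python
# A vague prompt versus a fully specified one. The fields below are
# illustrative assumptions, not a prescribed prompt format.
vague_prompt = "Summarize this call."

def build_task_prompt(transcript, audience, length_words, must_include):
    return (
        f"You are summarizing a customer service call for {audience}.\n"
        f"Write at most {length_words} words.\n"
        f"Always include: {', '.join(must_include)}.\n"
        "If a detail is not in the transcript, write 'not stated' instead of guessing.\n"
        f"Transcript:\n{transcript}"
    )

transcript = "Customer asked about a duplicate fee; agent issued a $35 refund."
print(build_task_prompt(
    transcript,
    audience="the complaints team",
    length_words=80,
    must_include=["reason for the call", "resolution", "follow-up needed"],
))
```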

Penny Crosman (07:43):

Some banks are talking about hiring prompt engineers who would specialize in exactly what the model should be asked. And just another follow-up question on this overall idea about what kinds of data, and how much, these models should be ingesting. I've had conversations with people who say, well, if you just limit it to your proprietary data, a very strictly defined dataset, then the risk of hallucination goes way down. But I've also spoken with people who say that if you take out all of the data that, say, ChatGPT has been trained on from sources like Shakespeare's works and the New York Times archive, you lose a lot, that the model does learn something from taking in novels and magazine articles and Reddit and everything else. Do you have any thoughts on that? In other words, are large language models still valuable if they're not taking in all this information?

Dr. Seth Dobrin (09:02):

So let's use Reddit as an example. Reddit is a great place to understand how humans converse. It's also horrible; it's a cesspool of information. If you look at what's in it, there's lots of really, really good stuff, but there are also lots of horrible things in there. So yes, you could use Reddit as a source for how people converse, and in fact, that was one of the driving reasons for using it. That's why a lot of the places developing large language models are now willing to pay Reddit for their information. Looking at it that way, yes, it's a valuable source for large language models. If you look at Meta, the assumption is they're using Facebook and Instagram for similar things. However, there are two ways to split this. Companies don't necessarily need all of the capabilities that a large language model has.

(10:07):

Do they need it to have all the information in the New York Times to solve a specific task? I don't know. If you're a bank, do you need it to know the works of Shakespeare to solve a problem about trading? Do you need the information in the New York Times for trading? That's a possibility. You might, or maybe the Wall Street Journal is a better example there, because there's a lot of unstructured financial information in the Wall Street Journal. So that might be a place where, yes, you need that information, but there are more responsible ways to get it than just scraping the internet. Maybe you go engage with the Wall Street Journal, the Financial Times and similar journals, and you say, hey, we want to buy your data and we want to incorporate it into our large language model.

(11:06):

Or maybe we'll share, we'll split the revenues, depending on how much of your information is incorporated into it. You don't need to scrape the information without compensating the people who put their intellectual property into it and spent money developing it, any more than these companies would expect to give their own AI away against their will, with someone just coming in and grabbing their intellectual property. So even if you were to use that data, there are appropriate ways to go and engage for it other than just scraping it off the internet. And in fact, if you look at the EU's GDPR, you know they're already violating those regulations. They're likely violating CCPA in California. GDPR is the EU data protection regulation; CCPA is the one that exists in California. So there are right ways to do it and there are wrong ways to do it. I think many of these companies took the wrong way.

Penny Crosman (12:08):

Sure. And I think copyright law is a factor also.

Dr. Seth Dobrin (12:11):

Yep.

Penny Crosman (12:13):

So bottom line, is there any way that a large language model could really become responsible maybe in the future?

Dr. Seth Dobrin (12:25):

I don't think the models themselves can be responsible, but there is a way you could use large language models responsibly, and that's if you look at the large language model as a vehicle to converse only. Let's take small language models and large language models. Small language models will never be as good as large language models at having a conversation with you. But I've seen startups with extremely small language models that can now outperform even GPT-4 on a specific task; they're just not very good at conversing. So if you wanted to ask one of those very small language models to do something, the conversation with it would be very rudimentary, if anything. You may want to use one of these large language models to interact with it and give it commands in whatever your native language is, which gets us to another topic we could go into for a while. But you wouldn't be using the large language model to do anything, or to give you any information. It would really just be there to interact with those small models. That's a responsible way to be using it. So the model would be more of a vehicle to interact with more responsible AI systems.
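
A minimal sketch of the pattern Dobrin describes, in which a large language model serves only as the conversational front end and small task-specific models produce the substantive output. Both model calls are hypothetical stubs rather than any specific product's API.

```python
# The large model (stubbed) only turns free-form text into a command;
# small task-specific models (also stubbed) do the actual work.
def llm_to_command(user_message):
    """Stand-in for an LLM that maps natural language to a structured command."""
    text = user_message.lower()
    if "fraud" in text:
        return {"task": "fraud_score", "args": {"account": "12345"}}
    if "summar" in text:
        return {"task": "summarize_call", "args": {"call_id": "c-789"}}
    return {"task": "unknown", "args": {}}

# Small, task-specific models: these produce the information the user sees.
SMALL_MODELS = {
    "fraud_score": lambda args: f"Fraud risk for account {args['account']}: low (0.07)",
    "summarize_call": lambda args: f"Call {args['call_id']}: billing question, resolved.",
}

def handle(user_message):
    command = llm_to_command(user_message)
    model = SMALL_MODELS.get(command["task"])
    if model is None:
        return "I can't help with that yet."
    return model(command["args"])

print(handle("Can you check this account for fraud?"))
print(handle("Summarize yesterday's call for me."))
```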

Penny Crosman (13:52):

That makes sense. Well, in a blog, you also said that large language models will never be green, which I think is a really interesting point. We've written a little bit about the energy-guzzling nature of AI models, but if you talk to the folks at NVIDIA, they will tell you that if you're using their graphics processing units, they have reduced power consumption, and that workloads that used to require a hundred nodes can now be done with four nodes. In other words, there are companies saying the hardware is working harder to do more with less in this area. I wonder what you think of that argument and the whole idea that you could run AI in a way that doesn't become an energy hog.

Dr. Seth Dobrin (14:52):

So today's large language models will never be green; the way they're building them today, I don't think will ever be green. And if you look at what NVIDIA is doing, yes, I guess the amount of compute is getting smaller. If you compare CPUs to GPUs, yes, the amount of compute needed is getting smaller. But if you go from the 100-series to the 200-series GPUs, I don't think you're going from a hundred down to four. I obviously haven't used them, because they're not publicly available to anyone but OpenAI and a few other select organizations today.

(15:31):

But I think it's kind of like people saying that we're going to be consuming 70% of America's energy next year with AI. We've been running out of oil for the last 50 years as well, and we haven't run out of oil yet. Technology is going to move us to where we won't be consuming 70% of U.S. energy, but that's going to be through new technologies, where we'll be moving away from these large language models. I think we will see net new compute paradigms, but by that time we won't be using these current architectures for large language models. And there's a whole other component besides just power. Yes, we may build new chips, but most of the components that go into the thousands and tens of thousands of GPU systems that are out there today are not recyclable.

(16:29):

If we get rid of those, is that green? What do we do with them? That contributes to the lack of sustainability, if you will, of these AI systems. So we need to work on how we recycle the GPUs that are out there today, and even older GPU systems. It's more than just carbon, more than just the water lost to evaporation from cooling. It's the actual hardware itself. And there are three scopes, if you will, to ESG. There's scope one, which is what comes before your organization; scope two, which is what you use within your organization; and scope three, which is everything that leaves it. There's a whole lot of scope one that goes into AI that people aren't thinking about, and that's something we need to pay attention to: that's how the chips are made. And then there's a whole lot of scope three, which is what happens when you're done with your chips, and what happens when people start using your AI once it leaves your organization. So there are a lot of ESG implications, beyond carbon specifically or environmental impact, that are not being thought about and need to be taken into consideration.

Penny Crosman (17:50):

Well, that's a great point. Dr. Dobrin, thank you so much for coming on our podcast. And to all of you, thank you for listening to the American Banker Podcast. I produced this episode, with audio production by Kelly Malone Yee. Special thanks this week to Dr. Seth Dobrin at Qantm AI. Rate us, review us and subscribe to our content at www.americanbanker.com/subscribe. For American Banker, I'm Penny Crosman. Thanks for listening.