Podcast

Can chatbots learn to understand human emotions?


Transcription:

Penny Crosman (00:03):

Welcome to the American Banker Podcast, I'm Penny Crosman. Can AI be trained to decipher human emotions? Dr. Aniket Bera, Associate Professor in the Department of Computer Science at Purdue University, is an expert in what's called affective computing: using machine learning and other computer science methods to train artificial intelligence programs to better understand human behavior and emotion. He's here with us today to tell us about some of the projects he's been working on and how AI used in the financial services industry might be able to detect feelings, for instance, when a customer is angry or depressed, or even when an employee is upset or about to commit fraud. Welcome, Dr. Bera. 

Dr. Aniket Bera (00:46):

Thank you. Thank you for having me. 

Penny Crosman (00:48):

Sure. Can you tell us how you got into this kind of work? 

Dr. Aniket Bera (00:52):

So we've been working on this research for quite some time, even before AI, artificial intelligence, sort of took over our tech landscape, so to speak. We've been working on a lot of these applications around robotics, computer graphics and natural language processing, and five, six years ago, we were looking at some of these technologies and we found that one key ingredient was really missing in all of them, and that was bringing in the human angle. To give you a good example, say there was a robot dog, or there was a virtual therapist or virtual doctor, or you were talking to the next version of, let's say, a chatbot. We found that a lot of these technologies are being used at large scale by many companies across different sectors, but they lack a key ingredient of communication, and that is human affect, or human emotions. 

(02:03)

So that's where we realized that if we really want these technologies to understand us, they need to go beyond the content of a communication and also leverage the context of a communication, to understand these nuances which go way beyond the content of speech. I'll give you a good example, and this is something we've been working on for quite some time. We are developing something which we call Project Dose, a virtual therapy-like application that talks to your real therapist as well as to the patient and tries to figure out if everything is fine. Are you taking your medications? Are your therapy sessions working for you? A good example: let's say you talk to your friend and you ask, how are you feeling? Your friend might answer the question in many different ways, but with exactly the same content. 

(03:05)

They might say, "I'm feeling okay," or they might say, "I'm okay." Even though we are not looking at each other visually, there's no video right now, if I were to look at my friend, I could see their facial expressions, their microexpressions, the tonality of the voice, the pitch, the intonation, the prosody. There are so many different variables which we humans subconsciously read, learn and understand when we are talking to other people. AI technologies, or computer science technologies, usually do not take that into account. So in this example, in the first case, "I'm feeling okay" seems like a positive emotion, whereas "I'm okay" may not be as positive as the content suggests. That's what we want to understand. We want to build these technologies, and this was a turning point for us, when we realized that there's so much which goes on beyond the literal text, right? 

(04:09)

We've looked at so many chatbots, so many automated machines that spit out text, but text is one very small component of human communication. There are so many non-verbal features which reveal human behavior, human psychology and human emotion, and which can really help us solve problems that go way beyond what we have done so far. That was the turning point for us. And obviously the recent developments in AI and machine learning helped bridge the gap in terms of what we can do now versus what we could have done 10 years ago. So that's, I would say, the origin of affective computing for us. But even right now we are working on so many different projects where the human is in the limelight. For example, we are working on one project where we are designing a robot dog in our lab to understand human emotions and be able to help people. 

(05:11)

So let's say I'm going to a new place and I am trying to find my way around. When I came to Purdue seven, eight months ago, I looked very confused: where do I find the coffee machine, or where do I find the restroom? If there was a robot around, it could look at my face, see the confusion on my face and immediately come up to me and say, oh, you look confused, do you need some help? Can I point you in the right direction? For our elderly, for my grandparents, when they're at home, we need these robots to help them not just do everyday chores, but to also connect on an emotional, on a more sentimental level, so people feel more engaged, people feel more committed, and people feel, if I can say this, more alive if they see this human connection all around them. 

Penny Crosman (06:05):

Well, there are a few things to unpack there, and one of the reasons I invited you is because I see a really interesting application of your work in financial services. A lot of banks have been experimenting with and using chatbots and virtual assistants, and even, to a much lesser degree, avatars in kiosks. The big banks have millions of customers interacting with their chatbots every day, and they are very interested in figuring out the state of those customers. Are they happy? Are they satisfied? Are they thinking about leaving? I think a lot of what's happened to date has been trying to identify keywords that indicate that somebody's upset. For instance, if they use a curse word, or if somebody repeats the same statement over and over, that tends to be an indicator of frustration. But I think that this sort of keyword-based, text-based sentiment analysis is limited. You gave that example of somebody saying, oh, I'm okay, where it was their tone of voice and their cadence and the, I don't know what you call it, up or down level of their voice that changed. Are you getting some traction on that idea of being able to figure out this person has gone from happy to depressed or from happy to upset? And is that something that might be applied to a virtual assistant or a chatbot? 

Dr. Aniket Bera (07:59):

Yeah, absolutely. Even though the domain is the financial sector, I would imagine the technology in the future to be directly applicable to the problem you're talking about. In our virtual therapy sessions, we are looking at how things change over time: let's say the virtual therapist is spending 30 minutes with their patient. An analogous situation in the financial sector would be a virtual avatar talking to a customer for 30, 40 minutes. What is happening behind the scenes? Is the engagement of the customer going down? Is the customer getting more angry? Is the customer getting more disappointed? Obviously we look at multimodal cues: cues from text as a baseline, but also cues from the tone of your voice, the pitch of your voice, different features in your voice, micro-expressions on the face, body language. 

(09:04)

All of that can be computed in real time using our systems, and we can really find out if, over time, the customer is getting agitated or the customer is getting annoyed with the responses, with the conversation. And we can take that real-time feedback back into the generation pipeline, the AI pipeline: look, we are losing this customer, maybe we need to try and solve their problem from a different point of view, or maybe give different outputs. Eventually we want that, from when a customer joins a call or a conversation to when the customer exits or ends the call, their engagement level should not have gone down and their agitation level should not have gone up, so that overall you have a somewhat better experience with the call. That's why I think the problem is very similar to what you're describing. 

(10:05)

We need to have what we call reinforcement learning, where we take the input from the patient or the customer and bring it back to the AI pipeline, which tells it that, look, we need to change how we are responding to this customer, we need to do something else, or maybe say the same thing in a different way, so that we can get this person to be somewhat happier or somewhat less annoyed than where he or she is at this point. This is a feedback mechanism in AI which we work on on a daily basis. Obviously the current chatbots, with text only, are limited because that's only one modality. We can still do it with text, but the more modes we have, the more modalities we have, the better we can help the customer. 
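To make the idea concrete, here is a minimal, purely illustrative Python sketch of the kind of multimodal feedback loop described above: per-modality affect scores are fused into an engagement estimate, and a downward trend switches the response strategy. The feature names, weights and thresholds are assumptions for illustration, not the actual Purdue system.

```python
# Illustrative sketch only (hypothetical names, weights and thresholds).
from dataclasses import dataclass

@dataclass
class AffectCues:
    text_sentiment: float   # -1 (negative) .. +1 (positive), e.g. from a text model
    voice_arousal: float    # 0 (flat) .. 1 (agitated), e.g. from pitch/energy features
    face_valence: float     # -1 .. +1, e.g. from micro-expression analysis

def engagement_score(cues: AffectCues) -> float:
    """Weighted fusion of modalities; weights are illustrative only."""
    return 0.4 * cues.text_sentiment + 0.4 * cues.face_valence - 0.2 * cues.voice_arousal

def choose_strategy(history: list, window: int = 3) -> str:
    """If engagement is trending down, change how the assistant responds."""
    if len(history) < window:
        return "default"
    recent = history[-window:]
    if recent[-1] - recent[0] < -0.2:
        return "de-escalate"      # acknowledge frustration, offer a human agent
    if recent[-1] > 0.5:
        return "continue"         # current approach is working
    return "clarify"              # restate the problem, ask a follow-up

# Example: engagement drops over three turns, so the strategy changes.
history = []
for cues in [AffectCues(0.6, 0.2, 0.5), AffectCues(0.1, 0.5, 0.0), AffectCues(-0.3, 0.8, -0.4)]:
    history.append(engagement_score(cues))
    print(round(history[-1], 2), choose_strategy(history))
```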

Penny Crosman (11:03):

And you mentioned analysis of people's facial expressions, which I think is really interesting. Has that been proven to work, or is this still kind of experimental? In other words, everyone has such different facial expressions; some people are very stoic and other people are much more expressive. Have you seen it actually proven out that AI can determine this person is happy and engaged, this other person is getting upset, or that sort of thing? 

Dr. Aniket Bera (11:42):

Yes. I've spent the last five years trying to do just that. I would say there are two different aspects to this problem when we are looking at facial and body expressions, or verbal and non-verbal cues, all of this combined. There are two aspects of how we look at a person. One is the macroscopic aspect, which takes into account a person's background, cultural upbringing and geographic upbringing, and in some sense makes the AI less biased towards certain races, certain cultures, certain geographical locations, genders and many other variables. But there are also, as you pointed out, the microscopic variables: every individual is different. Some people do not like to emote much, some people do. So we've built our system not only to learn from a wide variety of data on the interactions this AI has had, but also to learn within every conversation. 

(12:47)

Let's say, in the banking sector or the financial sector, we have a customer coming in and having a conversation with the AI. The AI starts with some blank template based on what it knows about that customer from some background data, maybe data directly from his or her accounts. But over time the AI will learn the specific features, the specific things: what this person likes, how this person reacts. Maybe this person doesn't smile that much, or maybe this person smiles a lot even when he or she is not happy. All of these micro-personality traits are constantly being learned. In the beginning, the AI will make some mistakes, but when it makes mistakes, the hope is that we can understand that person's reaction, take it back into the system, and retrain or remodel the system. 

(13:49)

This happens every second. The AI is not something which is built once and then deployed; it's something which constantly learns based on feedback. To put this in a broader perspective, we humans do this as well when we are meeting new people: we crack a joke and then realize the other person is not laughing, or we say something and the other person does not respond in the way we expected. So we understand, we learn: okay, this person doesn't care for certain jokes, this person may take offense at certain things as opposed to others. These are all things which we humans learn, and we are trying to build a similar model into the AI. Obviously it will make mistakes as it goes, but the more you interact with it, maybe you have five sessions with this AI assistant, the more personalized it becomes and, if I can say it, the more of a friend it becomes, because it understands you. 
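As a rough illustration of the per-person adaptation being described, here is a hypothetical Python sketch in which the system keeps a running baseline of how expressive a given customer usually is and interprets new readings relative to that baseline. The class, scores and thresholds are invented for demonstration and are not the actual research model.

```python
# Hypothetical sketch: interpret the same raw smile score differently per person.
class PersonalAffectModel:
    def __init__(self, prior_expressiveness: float = 0.5, learning_rate: float = 0.2):
        self.baseline = prior_expressiveness   # start from a generic template
        self.lr = learning_rate

    def interpret(self, raw_smile_score: float) -> str:
        """Judge a reading relative to this person's own baseline."""
        delta = raw_smile_score - self.baseline
        if delta > 0.15:
            return "more positive than usual"
        if delta < -0.15:
            return "less positive than usual"
        return "about their usual self"

    def update(self, raw_smile_score: float) -> None:
        """After each turn, nudge the baseline toward what was observed."""
        self.baseline += self.lr * (raw_smile_score - self.baseline)

# A customer who rarely smiles: 0.3 soon reads as "about their usual self".
model = PersonalAffectModel()
for score in [0.3, 0.25, 0.3, 0.28]:
    print(model.interpret(score), "| baseline:", round(model.baseline, 2))
    model.update(score)
```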

Penny Crosman (14:49):

Well, sometimes I have trouble reading other people's expressions and reading between the lines of what they're saying. So maybe someday I'll use your technology to help me understand what I'm seeing and hearing. You also mentioned work with robots, which I think is interesting. And this is again something banks have experimented with a little bit, not a lot. HSBC used to have this robot called Pepper in its lobbies, and people could come up to it and ask basic questions; it was sort of a little white humanoid thing. And here and there, there's been a little bit of experimentation to help people navigate a building or to welcome people to a booth at a trade show, that kind of thing. I guess I feel like it's still early days, and personally I'm not a huge fan of those interactions so far, from what I've seen of these sort of basic robots. Are we getting to a point where these robots really are becoming truly helpful, and can they even intuit who wants to talk to them and who doesn't? 

Dr. Aniket Bera (16:07):

So, the Pepper robot you mentioned, we had one in our lab, and we used it for some human-robot interaction research in terms of understanding how people react to different kinds of robots and whether the visual appearance matters at all. We did a psychology study and some robotics studies as well. I think there are two aspects to robotics in general. The reason why we do robotics the way we do right now is that it is very goal-oriented. You mentioned HSBC; some of these banks have been using Pepper to welcome their customers or do some initial interaction. I mean, it's more about getting people excited than probably solving a real problem. These are the kinds of things we care less about in our research lab, because it's not solving a real-world problem; it's just there to get some kids or some adults excited about the possibility of a robot doing some things. 

(17:16)

And Pepper is a very limited robot. It doesn't do much at all. It looks like a cute human, but beyond that, I always look at problems as: what is the goal here? We are in the business of building the brains, the AI component. So when we are building these AI systems, whether for virtual avatars or on the robotics side, it's the brains. We've been working on many robotics applications. Right now we are working on a problem where we are trying to build a robot dog that can go to places where humans have a difficult time going and then possibly solve a problem. For example, what if there was a natural disaster like the earthquake which happened in Turkey just a few weeks ago? First responders take a lot of time to arrive, and frankly it's very hard for them to access a lot of these areas, especially after a natural disaster. 

(18:18)

People are trapped under rubble, people are trapped without water, and all this for days. Our hope is that with these multi-robot models, where there are drones flying around and this robot dog on the ground, it can quickly move across difficult terrain trying to detect signs of a heartbeat, signs of heat from heat sensors, or signs of breathing, and then immediately go to that place, maybe deliver some medicines, some water, and call the first responders ASAP. So we are trying to take a very goal-directed approach to this whole robotics problem. So far, where robotics has been successful is in very constrained settings like the Amazon warehouse, where robots are picking objects and putting them in other places; in general, warehouse robotics is taking off. I do see some robotics, and we've worked on this as well, in surgical robotics. 

(19:18)

Robots are good for precision. They can solve tasks in a specific way if you teach them, but even in those cases, the human element is taken out. And that's where we feel that whatever the end goal is, whatever the task is, you have to build these robots around the human. In the future I would want a robot my grandparents can use. They are at home, most of the time alone. They need some help with medications, food, daily chores, but they also need a friend at home. If it can do all of these things, then I think we have found a purpose. I don't think we are at a point where we can completely deploy robots in the real world where there is human interaction; there are still a lot of challenges. 

(20:11)

But with the developments in AI and in the hardware of these robots, I think we are getting close. We have deployed many robots in the real world. How do we navigate in a crowd where everybody's around? How do I sense whether somebody needs help? To give you an example, at Purdue during COVID, there was a big NSF call about how to monitor signs of COVID, whether people are wearing masks or not, whether people are taking all these precautions or not, with a robot sort of walking around. So right now it's more about goal-oriented problems which we can solve. A generic AI that can do all kinds of tasks, I think we are years, maybe decades, away from that. That's my take on it. I mean, we are working on it; we've been working on it for many years. 

(21:11)

One of the things we have been doing is: can I transport medicines quickly from place A to place B? We've been talking to some collaborators in Africa, where we could transport lots of lifesaving medicines on a robot, where this robot dog jumps and leaps around forests and uneven roads which we humans would have a very difficult time walking or even driving on. So there are applications. Just being a cute application at an HSBC bank may not be something on our agenda at this point, but I'm sure there is some business there. 

Penny Crosman (21:53):

Right, right. And I don't mean to bash what anybody's doing today. I think it's always a good idea to try new things and experiment; we don't mean to be criticizing anybody for trying these things. It's always worth it. So, another place where banks try to do something like this is sentiment analysis: some banks use sentiment analysis on news articles to see how their brand is being perceived. And they do this on Twitter and other social media channels as well as in news feeds. I wonder if your work in trying to detect emotion could make this kind of thing better. But again, it's text; it's not the tone and facial expression, et cetera. 

Dr. Aniket Bera (22:44):

You mean from newspaper articles? 

Penny Crosman (22:46):

Yes, newspaper articles, Twitter feeds, feeds from Facebook, things of that nature. 

Dr. Aniket Bera (22:53):

Oh yeah, yeah. We are actually working on some of these applications as well. One of the problems we are looking at now is on Twitter: there are so many people, billions and billions of tweets, and some tweets get more viral than others. And we saw that some hate speech tends to have an aspect of virality. What is the sentiment associated with it? What is the emotion or the affect behind it? So we are trying to learn some of those things: what makes a tweet go viral? What makes a newspaper article catch more eyeballs, so to speak? But more from the human behavior and affect, the emotional perspective: is it the way certain things have been written? I mean, and I'm sure you'll know this better than anybody else, we can present the same news in 10 different ways, and people might love one presentation and hate another presentation of the exact same facts. 

(23:54)

So we are trying to remove the factual part of a tweet and understand the way things have been written, what the general affect or emotion behind it is, how that percolates and transfers, and potentially how fake news spreads or how hate speech spreads. Can we detect those things at an earlier stage? Can we understand that something has the potential to spread hatred? If that's the case, can we do something about it? Can we target some of those and make things better? Again, I shouldn't claim to know what we should do after figuring that out; that's a broader public discussion about how to handle things which have the potential to be hate speech or to be negative for society. But the first part of the problem, figuring out whether something has that potential, is what we are looking at. 
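For a rough sense of what "predicting virality from affect rather than content" could look like in code, here is a hypothetical Python sketch using scikit-learn; the features, toy data and labels are invented for illustration and are not the researchers' actual pipeline.

```python
# Purely illustrative sketch: classify virality from affect-style features only,
# with the factual content stripped out. Data and features are made up.
from sklearn.linear_model import LogisticRegression

# Each row: [anger_score, exclamation_ratio, all_caps_ratio, second_person_ratio]
affect_features = [
    [0.9, 0.30, 0.20, 0.40],   # angry, shouty phrasing
    [0.8, 0.25, 0.15, 0.35],
    [0.1, 0.02, 0.00, 0.05],   # neutral, factual phrasing
    [0.2, 0.05, 0.01, 0.10],
]
went_viral = [1, 1, 0, 0]      # toy labels

model = LogisticRegression().fit(affect_features, went_viral)

new_post = [[0.85, 0.28, 0.18, 0.30]]
print("predicted viral probability:", round(model.predict_proba(new_post)[0][1], 2))
```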

Penny Crosman (24:56):

Yeah, that's really interesting. In our case, we can see which stories, which headlines are performing well, but we can't generally figure out why so many people clicked on or retweeted a particular thing. So it would be helpful to a lot of companies to be able to figure out what kinds of messages resonate with people and make people want to actually do something. Another way I've seen banks use, or at least think about using, some kind of sentiment analysis, or trying to read people's emotions, is in trying to discover employee misconduct. I remember seeing a demonstration of IBM Watson several years ago where it was looking at messages from traders, and what it discovered is that traders typically stop using profanity and angry language just before they're about to do something they shouldn't do. It was sort of an interesting pattern of human behavior. Do you think your work could be applied to things like that? Or, on the flip side, detecting that some people are unhappy and that there might be something going on, that people are the victim of some sort of policy change or something that they don't like? 

Dr. Aniket Bera (26:32):

Can I ask you to clarify the question a little bit? Do you mean from a social media point of view, or... 

Penny Crosman (26:38):

I was just talking about trying to analyze behavioral patterns that indicate something, whether it's an indicator that people are about to commit fraud, or that people are about to leave your company, or that they're just miserable, based on what they're saying in their emails, their chat messages, their phone conversations. 

Dr. Aniket Bera (27:09):

So there is a lot of work in understanding employee behavior, not just from physical body cues and video cameras, but also from the way they're composing their emails, the way they're responding to conversations. There is a lot of research along these lines: which employees are potentially likely to get agitated or frustrated, or eventually leave the company, or even leak personal or private company information. So there is good research work out there. I'm not sure how much of it is being applied in the real world, primarily because of issues related to privacy. 

Penny Crosman (27:56):

That's true. 

Dr. Aniket Bera (27:56):

But there is research. There's definitely research which, again, combines multimodal data, email responses, chat messages, video footage, to identify which person is most likely to leak personal or company information to the outside, who gets so agitated that they leave the company and then start spilling all the secrets. 

Penny Crosman (28:18):

Right, exactly. So before you leave, I just want to ask you about, speaking of things that have gone viral, ChatGPT. What's your overall reaction to all the buzz around it? There's also discussion about whether it could have human emotions, because New York Times reporter Kevin Roose was interacting with it and it told him it loved him, and there's been talk of AI maybe becoming sentient at some point; there's a Google engineer who feels that there is sentient AI today. What do you think about ChatGPT, and could it, or something like it, ever be part of this work that you are doing? 

Dr. Aniket Bera (29:11):

So we do a lot of this. We have a ChatGPT-like system for our virtual characters and AI avatars, because we've done research on generating facial expressions for these virtual agents, virtual characters that look like humans, but we need the content of the speech. So obviously there's a lot of overlap between what ChatGPT does and what we do. But we do it from the non-text side of things, and ChatGPT does only text, right? So there is a common goal there, but ChatGPT is fundamentally different in that it's not human. Of course, as you mentioned, there are some aspects where it seems to have some sentiments or some emotions, but at the same time, with programs like ChatGPT, if you've used them, you'll realize that they give you responses as if from a dictionary. 

(30:16)

It's like Wikipedia in that sense. Most of the data that ChatGPT has been trained on is internet data, like articles and Wikipedia pages, but not necessarily human conversations. When humans chat with each other, we have a lot of different ways to communicate. We are not perfect. We make mistakes while we are speaking. We have something called disfluencies in speech, hesitation cues; sometimes we make mistakes and then correct them later. So when we are trying to make a human-like agent, we also have to make the speech look like, or at least sound like, a human. Right now, when you look at Apple's Siri or Google's Assistant, their audio is fine, but their responses are robotic in nature. So we need to bring the human element to it, to make people trust these systems more, to make these systems a little bit more human. 

(31:24)

And the hope is that when there are some human-like features in the system, maybe people will end up trusting it more. Right now there's a big trust gap between AI technologies and the broader public, and those concerns are not unfounded. There are genuine issues with AI: fairness issues, privacy issues, a hundred different problems. But at the same time, we need to solve all of them together. If we stop research in AI while we are only looking at the problems, then I'm worried that this will stunt the development, the rapid pace of development we are seeing right now in AI. And ChatGPT is a very good example; I'm happy that you brought it up. It's one of the technologies which actually has solved a lot of these problems. Is it perfect? 

(32:16)

Not at all. It has many problems. It possibly has some seemingly sentient characteristics, it confidently says things that are incorrect, but those things will get better the more people use technologies like these. I mean, you've got to realize one thing: AI is heavily dependent on the data it learns from. So the more we use it, the more it learns, and the better it can get at serving the general audience. So ChatGPT is imperfect, but it's a good stepping stone for a lot of these language or human-like technologies to come in the next few years. 

Penny Crosman (32:55):

Yes. That's a great summary and analysis of it. Well, Dr. Aniket Bera, thank you so much for joining us and to all of you, thank you for listening to the American Banker podcast. I produced this episode with audio production by Kevin Parise. Special thanks this week to Aniket Bera at Purdue University. Rate us, review us and subscribe to our content at www.americanbanker.com/subscribe. For American Banker, I'm Penny Crosman and thanks for listening.