Dr. Yaa Kumah-Crystal is an Assistant Professor at Vanderbilt, as well as a practicing pediatric endocrinologist. Dr. Kumah-Crystal, along with Dan Albert, Associate Director of Health IT Product Development at Vanderbilt, is building EVA (EHR Voice Assistant). It is a voice user interface for physicians that can naturalistically communicate with an EHR to ask questions and get answers back. Erum Khan is CEO of SoundMind, which creates voice AI applications to empower seniors and improve the caregiving experience. The three technologists joined us on the podcast to discuss voice technology and AI, what it can do and what it is doing across health and healthcare.
Talk about the evolution of voice technology
People [today are] really emphasizing usability more for the end user in medicine. Unfortunately, I think we [in healthcare] tend to fall decades behind in terms of the technology that we have access to and that we’re able to use. In the consumer world you […] get to say, “well, I’m going to choose this device over that one because it’s more useful and more user friendly.” Whereas with the EHR, that’s just what the company bought and that’s what you have to use to take care of your patients. And there’s very little empowerment of the provider. Speech, like you mentioned, is the most naturalistic way we have to communicate. We learn it starting at birth.
So what we are trying to do here is to say: what can we do, and what questions can we allow providers to ask as if they were talking to another person, and then get information back, to make the process of interacting with the EHR as seamless as possible.
My father’s a doctor and I remember when he brought home Dragon Dictate software back in 1997 and it was pretty groundbreaking for my siblings and me. […] I used Dragon Dictate one time to write a report for school, and I quickly saw the limitations of segmenting words and the overall accuracy of transcribing speech to text. [Look at] that time to where we are now. There have been such fast improvements that are enabling a lot of things when it comes to healthcare and beyond.
Explain where machine learning and artificial intelligence comes into play
The way it works effectively is somebody talks into a microphone. These tools then convert your speech into text and that kind of technology has been getting better and better over the years, [but] has been around for quite a long time. Dragon Dictate and things like that. So I can speak and I can say words and they’re recognized and the meaning of the words isn’t recognized in any way, but it doesn’t matter.
What’s really changed recently is the accuracy of the natural language processing that then takes that text that’s been recognized from speech and starts assigning meaning to it. And so what these tools do is they recognize what you can think of as intents. So what are you trying to accomplish? So an example for us would be that I can say to EVA, “tell me about my next patient,” and EVA can not only recognize the literal words but then realize that my intent is a patient summary. […] I want a summary for a patient and we then take that and turn it into a set of queries in our electronic health record to gather information. So that’s machine learning versus artificial intelligence. And what’s going on with this natural language processing is recognition of intents and entities […] The machine learning part of that is improving that over time so I can say variants of the same phrase, and over time the algorithm will get better and better at picking that up so that I don’t have to train the program on literally everything that somebody might say every different way while asking the same question.
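The intent-and-entity step described above can be sketched in a few lines. This is an illustrative toy, not EVA’s actual implementation: the intent names, phrase patterns, and entity slots here are all hypothetical, and a production system would use a trained NLU model rather than hand-written rules.

```python
import re

# Hypothetical intent patterns. A trained model would generalize across
# phrasings; here each intent is just a list of regular expressions with
# named groups acting as entity slots.
INTENT_PATTERNS = {
    "patient_summary": [
        r"tell me about (?P<patient>.+)",
        r"summarize (?P<patient>.+)",
    ],
    "medication_list": [
        r"what medications is (?P<patient>.+) on",
    ],
}

def recognize(utterance: str):
    """Return (intent, entities) for an utterance, or (None, {}) if no match."""
    text = utterance.lower().strip().rstrip("?")
    for intent, patterns in INTENT_PATTERNS.items():
        for pattern in patterns:
            match = re.fullmatch(pattern, text)
            if match:
                return intent, match.groupdict()
    return None, {}
```

So `recognize("Tell me about my next patient")` maps to the `patient_summary` intent with `patient` as the extracted entity; the intent then drives which EHR queries get issued.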
What about safety concerns, ensuring that accurate information is returned to the user?
We have folks watching [pilot programs] very closely as we try and learn from these things, but also over time a number of best practices have evolved to help address some of the kinds of concerns and risks that you bring up. So just as an example, when you say, “tell me about my next patient,” you get a picture of the patient that’s displayed and the patient name is right there and it’s spoken to you. So there are a number of different ways that we can build safety mechanisms into both the visual and spoken responses.
For our clients, who are seniors and oftentimes are known to be tech averse, there’s a big education component. So the way that we introduce this technology, we take the time to educate them on Alexa, [we say], “look, this is your new roommate. Alexa is going to be living here,” and then teaching them the things that they can ask and the things that they can access. We’re backing into healthcare. [We’re starting by] understanding how seniors and older adults can use the technology and just ask simple questions like, “what’s the weather?” [Or] “remind me to take my medication” and really empower them to take control over their own access to information. And then as they build more trust and more understanding of the technology, they’ll be able to feel more comfortable speaking to the voice assistants about their healthcare.
And also the concept of confidence scores and confirmation when you’re giving responses back to patients is a huge component of making sure you’re building for safety. Because it is a very high stakes game with medication and presenting [medical] information. So for example, if someone asks, “what problems are on this patient’s problem list,” [the system] can just start spouting back the problems on the problem list. But there’s this concept of, well, what’s the confidence score? How well do I think I understood your question? And based on that threshold, [I decide] how to deliver information back: certainly I could implicitly confirm what I heard, or explicitly say, “I heard you ask what problems are on the patient’s list, is that correct?” and have the person confirm.
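That threshold-driven choice between implicit confirmation, explicit confirmation, and asking again could be sketched like this. The threshold values are made-up placeholders; in practice they would be tuned against real recognition data.

```python
# Illustrative confidence thresholds (hypothetical values, not tuned).
HIGH_CONFIDENCE = 0.90
LOW_CONFIDENCE = 0.60

def respond(heard_question: str, answer: str, confidence: float) -> str:
    """Pick a response style based on how sure the recognizer is."""
    if confidence >= HIGH_CONFIDENCE:
        # Implicit confirmation: restate the question briefly, then answer.
        return f"{heard_question}: {answer}"
    if confidence >= LOW_CONFIDENCE:
        # Explicit confirmation: verify before giving medical information.
        return f"I heard you ask: {heard_question}. Is that correct?"
    # Too uncertain to act on at all.
    return "Sorry, I didn't catch that. Could you rephrase?"
```

The design point is that the stakes shape the thresholds: for medication or problem-list questions you would set the explicit-confirmation band wider than for, say, asking about the weather.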
What are some of the drawbacks to voice technology?
The biggest drawback is just the fact that voice, unlike a screen that is graphical and you can visually filter and focus on the information you want to see, voice is linear. As someone talks, you are collecting the information over time […] so you need to be able to summarize something in a very concise way to answer someone’s question. Otherwise they’re going to get fed up. So for example, many of our patients are on several medications. When I ask “what medications is the patient on,” am I going to read you back a list of 15 things? No, […] another individual would say, well, they’re on Tylenol and propranolol and Vancomycin and a few others. Do you want to hear the rest of them? That would make sense. But a layer on top of that is you can’t just arbitrarily pick three medications that you’re going to list back. The end user who is a provider would want the three most relevant medications for that patient. So you can think of all these conditional statements you have to put in there to be able to give back something that’s actually useful. They’re asking a question, but you almost have to read their mind to understand, well, what do they actually want back? And that’s been the most challenging part.
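The “read back the top few, offer the rest” pattern could be sketched as below. The relevance scores are assumed to come from somewhere upstream (for example, matching medications against the patient’s active problems); both the scoring source and the function name are hypothetical.

```python
def summarize_medications(meds, relevance, limit=3):
    """Speak the most relevant medications first, offering the rest on request.

    meds: list of medication names.
    relevance: dict mapping name -> score (hypothetical; assumed to be
        computed elsewhere from the patient's context).
    """
    if len(meds) <= limit:
        return "The patient is on " + ", ".join(meds) + "."
    top = sorted(meds, key=lambda m: relevance.get(m, 0), reverse=True)[:limit]
    remaining = len(meds) - limit
    return (f"The patient is on {', '.join(top)}, and {remaining} others. "
            "Do you want to hear the rest?")
```

This is exactly the pile of conditionals described above: a graphical list can just show all 15 rows, but a linear spoken answer has to rank, truncate, and invite a follow-up.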
The other thing about the [voice technology] itself, certainly in a healthcare environment, [is that] you have to be very conscious about privacy and health information. So you don’t want to speak a patient’s private information in a context where it’s going to be overheard inappropriately. So what we’re saying is we need an interface that allows flexibility. Sometimes results are spoken, sometimes they’re visual; sometimes I speak to it, sometimes I type in information.
What is voice tech going to do for healthcare in the long run?
You can imagine being able to access information and perform tasks that can alleviate some stress and help [prompt] behavioral decisions that can lead to better healthcare outcomes. So for example, being able to call the front desk or call your family easily without having to fiddle with the phone. Or of course, empowering the individual to remind themselves to take their medication, or even having a caretaker put in notifications of a treatment plan that a doctor has prescribed. And there’s the healthcare part where it’s medicine and figuring out how to treat disease. But then of course there’s the preventative part where you can take more command of your life with access to information, and it’s useful to you and you enjoy engaging with it.
For provider interactions with the EHR, what’s really interesting and hasn’t been explored yet is the role of decision support when you have a voice user interface. There’s this concept of alert fatigue for when users are getting popups all the time, they tend to click away and disregard them. But could we potentially improve safety by interjecting relevant components of information? So, “who am I seeing today?” “You’re seeing so and so and oh, by the way, she recently had an emergency department visit.” Or if I’m trying to prescribe a medication: “Oh, it looks like Sally’s allergic to so and so, and a preferred medication would be this, would you like to try that instead?”
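The interjection idea above (answering the question asked, then appending one high-priority alert rather than a popup barrage) could be sketched like this. The alert structure and priority field are invented for illustration.

```python
def answer_with_alerts(answer: str, alerts: list, max_alerts: int = 1) -> str:
    """Append only the highest-priority alerts to a spoken answer.

    Keeping max_alerts small is the point: a voice interjection that fires
    on everything would just recreate alert fatigue in audio form.
    alerts: list of dicts with hypothetical "priority" and "text" keys.
    """
    relevant = sorted(alerts, key=lambda a: a["priority"], reverse=True)[:max_alerts]
    for alert in relevant:
        answer += f" By the way, {alert['text']}"
    return answer
```

So a “who am I seeing today?” answer carries along the single most urgent flag, such as a recent emergency department visit, instead of surfacing every rule that matched.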
Also, we’re going to be looking at time on task studies, [looking at] how easy it is to accomplish the task using a voice interface compared to the traditional keyboard and mouse. Because there’s this concept of foraging for information and trying to find what you need just buried somewhere in the EHR, and that’s really impacting user satisfaction and leading to provider burnout. But if you can just ask for what you need and have it magically appear, that would make for a great experience for the provider and the patient.