In a recent study, Google researchers say they have developed an artificial intelligence (AI) system that outperformed physicians on measures of diagnostic accuracy and empathy, raising questions about whether AI can deliver the human side of healthcare.
Google AI development
The company has created an advanced language model called AMIE (Articulate Medical Intelligence Explorer), trained on real-world data from medical records and transcripts of doctor appointments. The system was used to conduct medical interviews, as reported in a January 11 study posted on the preprint site arXiv.
Methodology of the study
In the study, twenty actors playing patients took part in text-based online medical consultations without knowing whether they were talking to the AI or to real doctors. Surprisingly, the AI outperformed the doctors on 24 of 26 conversational measures, showing greater empathy and courtesy and a better ability to put patients at ease. The AI also matched or exceeded primary care physicians in all six diagnostic categories examined.
What the AMIE study says
“At the heart of medicine is the doctor-patient dialogue, in which skillful history-taking paves the way for accurate diagnoses, effective management and lasting trust.
Artificial intelligence (AI) systems capable of diagnostic dialogue could increase the accessibility, consistency and quality of care.
However, approximating the expertise of clinicians is an outstanding grand challenge.
Here we present AMIE (Articulate Medical Intelligence Explorer), a Large Language Model (LLM)-based artificial intelligence system optimized for diagnostic dialogue.
AMIE uses a novel self-play-based simulated environment with automated feedback mechanisms to scale learning across different disease conditions, specialties and contexts.
We designed a framework for evaluating clinically significant axes of performance, including history-taking, diagnostic accuracy, management reasoning, communication skills, and empathy.
We compared the performance of AMIE with that of primary care physicians (PCPs) in a double-blind, randomized crossover study of text-based consultations with validated patient actors in the style of an objective structured clinical examination (OSCE).
The study included 149 case scenarios from clinical providers in Canada, the United Kingdom and India, 20 PCPs for comparison with AMIE, and assessments by medical specialists and patient actors.
AMIE demonstrated greater diagnostic accuracy and superior performance on 28 out of 32 axes according to medical specialists and 24 out of 26 axes according to patient actors.
Our research has several limitations and should be interpreted with due caution. Clinicians were limited to unfamiliar synchronous text chats that allow for large-scale LLM-patient interactions but are not representative of usual clinical practice.
While more research is needed before AMIE can be translated into real-world settings, the findings represent a milestone toward conversational diagnostic AI.”
Precautions and limitations
The researchers stressed that the technology is still experimental and has not been tested on real patients. Additionally, the study has not yet been peer-reviewed.
Statements and interpretations
Alan Karthikesalingam, MD, PhD, clinical research scientist at Google Health and co-author of the study, told Nature in a January 12 article:
“We want the results to be interpreted with caution and humility. This in no way means that a language model is better than doctors at taking medical history.”
He noted that the primary care physicians in the study were probably not used to interacting with patients via text chat, which could have affected their performance.
He added that the AI might seem more thoughtful than humans because, unlike them, it does not get tired.
Future potential of AI in healthcare
Despite the caveats, the researchers say the tool shows “significant potential” to transform history-taking and diagnostic conversations in healthcare.