Google's AI-powered medical chatbot has achieved a passing score on a tough US medical licensing exam, but its responses still lag behind those of human doctors, a peer-reviewed study said Wednesday.
Last year, the launch of ChatGPT, whose developer OpenAI is backed by Google rival Microsoft, kicked off a race among tech giants in the burgeoning field of AI.
While much has been made about the future possibilities, and dangers, of AI, healthcare is an area where the technology has already shown tangible progress, with algorithms capable of reading certain medical scans.
Google first introduced its artificial intelligence tool for answering medical questions, called Med-PaLM, in a preprint study in December. Unlike ChatGPT, it has not been released to the public.
The US tech giant says Med-PaLM is the first large language model, an AI technique trained on large amounts of human-produced text, to pass the US Medical Licensing Exam (USMLE).
A passing score for the exam, which is taken by medical students and trainee doctors in the United States, is around 60 percent.
In February, a study said that ChatGPT had achieved pass or near-pass results.
In a peer-reviewed study published in the journal Nature on Wednesday, Google researchers said Med-PaLM had scored 67.6 percent on USMLE-style multiple-choice questions.
“Med-PaLM works encouragingly, but remains inferior to physicians,” the study said.
To identify and reduce “hallucinations,” the name given when AI models provide false information, Google said it had developed a new evaluation benchmark.
Karan Singhal, a Google researcher and lead author of the new study, told AFP that the team used the benchmark to test a newer version of their model with “super exciting” results.
Med-PaLM 2 scored 86.5 percent on the USMLE exam, beating the previous version by nearly 20 percentage points, according to a preliminary non-peer-reviewed study published in May.
– ‘Elephant in the room’ –

James Davenport, a computer scientist at the UK’s University of Bath who was not involved in the research, said “there is an elephant in the room” for these AI-powered medical chatbots.
There is a big difference between answering “medical questions and real medicine,” which includes diagnosing and treating genuine health problems, he said.
Anthony Cohn, an AI expert at the UK’s University of Leeds, said hallucinations would probably always be a problem for such large language models, due to their statistical nature.
Therefore, these models “should always be viewed as assistants rather than final decision makers,” Cohn said.
Singhal said that in the future, Med-PaLM could be used to help clinicians offer alternatives that might not otherwise have been considered.
The Wall Street Journal reported earlier this week that Med-PaLM 2 has been in trials at the prestigious Mayo Clinic research hospital in the US since April.
Singhal said he could not speak about specific partnerships.
But he stressed that any tests would not be “clinical, or patient-facing, or capable of causing harm to patients.”
Instead, they would be for “more administrative tasks that can be automated relatively easily, with little risk,” he added.