New findings from the National Institutes of Health (NIH) reveal that “an artificial intelligence (AI) model solved medical quiz questions—designed to test health professionals’ ability to diagnose patients based on clinical images and a brief text summary—with high accuracy. However, physician-graders found the AI model made mistakes when describing images and explaining how its decision-making led to the correct answer.” The findings were announced on the NIH’s website.
National Library of Medicine (NLM) acting director Stephen Sherry cautions that the results show “AI is not advanced enough yet to replace human experience, which is crucial for accurate diagnosis.”
Both the AI model and human physicians answered questions from an “image challenge…that provides real clinical images and a short text description that includes details about the patient’s symptoms and presentation, then asks users to choose the correct diagnosis from multiple-choice answers.” The AI model answered 207 such questions and provided a “written rationale to justify each answer.” Nine physicians then answered their assigned questions and were asked to “score the AI’s ability to describe the image, summarize relevant medical knowledge, and provide its step-by-step reasoning.”
The AI model “selected the correct diagnosis more often than physicians in closed-book settings, while physicians with open-book tools performed better than the AI model, especially when answering the questions ranked most difficult.” However, the AI model “often made mistakes when describing the medical image and explaining its reasoning behind the diagnosis — even in cases where it made the correct final choice.”