
Research Suggests Diagnostic Decision Support Systems More Effective at Diagnosing Disease Than LLMs

June 2, 2025

In the study, both LLMs and DDSSs performed well, but the latter performed somewhat better.

New research published in JAMA Network Open suggests that diagnostic decision support systems (DDSSs) were more effective than generative AI and large language models (LLMs) at diagnosing disease.

Computer scientists at Massachusetts General Hospital (MGH) developed their own DDSS, called DXplain, in 1984. It “relies on thousands of disease profiles, clinical findings, and data points to generate and rank potential diagnoses for use by clinicians.” Researchers with MGH compared “ChatGPT, Gemini, and DXplain at diagnosing patient cases, revealing that DXplain performed somewhat better, but the LLMs [ChatGPT and Gemini] also performed well. The investigators envision pairing DXplain with an LLM as the optimal way forward, as it would improve both systems and enhance their clinical efficacy.”

Corresponding author Mitchell Feldman wrote that DDSSs “can enhance and expand clinicians’ diagnoses, recalling information that physicians may forget in the heat of the moment.” He also wrote that “combining the powerful explanatory capabilities of existing diagnostic systems with the linguistic capabilities of [LLMs] will enable better automated diagnostic decision support and patient outcomes.”

According to the research, all three systems (DXplain, ChatGPT, and Gemini) “listed the correct diagnosis most of the time,” at 72%, 64%, and 58%, respectively. Without lab data, “DXplain listed the correct diagnosis 56% of the time, outperforming ChatGPT (42%) and Gemini (39%), though the results were not statistically significant.” Preliminary work building on these findings “reveals that LLMs could be used to pull clinical findings from narrative text, which could then be plugged into DDSSs.”