This is an overview page with metadata for this scholarly work. The full article is available from the publisher.
Toward Accurate and Actionable Differential Diagnosis with Lean LLM Orchestration
Citations: 0
Authors: 1
Year: 2025
Abstract
Large language models (LLMs) can assist clinicians with diagnostic reasoning, yet their autonomous diagnostic performance remains uncertain. We evaluated OpenMedicine AI, an LLM-powered diagnostic agent with a deterministic controller, on 302 New England Journal of Medicine Clinicopathological Conference (CPC) cases, a benchmark renowned for diagnostic difficulty. Models produced ranked differential-diagnosis lists. Accuracy was assessed by inclusion of the ground-truth diagnosis within the Top-n list (Top-n accuracy) and by Capture@K, an actionability metric under which a case counts as “captured” if any of the top K differentials would appropriately lead a clinician to order the diagnostic test of record (DToR) or its immediate precursor. Across 302 CPCs, OpenMedicine AI achieved 46.0% Top-1 and 79.1% Top-10 accuracy, outperforming AMIE (32.5%, 68.9%) and physicians (15.6%, 20.9%). Paired McNemar tests confirmed superiority at all thresholds (p < 10⁻⁵). For actionability, at Capture@10 OpenMedicine AI matched or exceeded AMIE in 97.0% of cases and physicians in 96.7%. Of the 302 cases, it rescued 99 that physicians missed (odds ratio [OR] 16.5) and 44 that AMIE missed (OR 7.3), reducing misses by 31 and 13 per 100 cases, respectively. These gains correspond to a number needed to assess (NNA) of 3.21 versus physicians and 7.95 versus AMIE. A safety margin was already evident at Capture@3, with rescues outnumbering failures to rescue versus physicians (109 vs 15; OR 7.27; 95% CI, 4.24 to 12.47; p = 8.7×10⁻¹⁹) and versus AMIE (61 vs 15; OR 4.07; 95% CI, 2.31 to 7.15; p = 9.84×10⁻⁸), corresponding to 31 and 15 fewer misses per 100 cases, respectively. These findings indicate that a lightweight, deterministic controller layered over state-of-the-art LLMs can narrow the gap between diagnostic recall and clinical actionability.
By producing high-quality differentials and prioritizing rational next tests, this approach offers a scalable, resource-efficient path to improved diagnostic performance in high-complexity clinical scenarios.
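The paired statistics quoted for Capture@3 can be retraced from the discordant counts alone. The following is a minimal sketch, not the authors' analysis code: it assumes the reported odds ratio is the ratio of discordant pairs (rescues divided by failures to rescue), that "fewer misses per 100 cases" is the net number of rescues over all 302 cases, and that the p-values come from a two-sided exact (binomial) McNemar test. All function names are hypothetical.

```python
from math import comb

TOTAL_CASES = 302  # number of CPC cases stated in the abstract


def discordant_odds_ratio(rescues: int, failures: int) -> float:
    """Paired odds ratio, assumed here to be rescues / failures to rescue."""
    return rescues / failures


def fewer_misses_per_100(rescues: int, failures: int, total: int) -> float:
    """Net rescues expressed per 100 assessed cases (assumed definition)."""
    return 100 * (rescues - failures) / total


def mcnemar_exact_p(rescues: int, failures: int) -> float:
    """Two-sided exact McNemar p-value from the discordant counts.

    Under the null hypothesis each discordant pair is equally likely to
    fall on either side, so the smaller count follows Binomial(n, 0.5).
    """
    n = rescues + failures
    k = min(rescues, failures)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Capture@3 discordant counts as given in the abstract.
comparisons = {"physicians": (109, 15), "AMIE": (61, 15)}

for name, (rescues, failures) in comparisons.items():
    print(
        name,
        round(discordant_odds_ratio(rescues, failures), 2),
        round(fewer_misses_per_100(rescues, failures, TOTAL_CASES)),
        f"{mcnemar_exact_p(rescues, failures):.2e}",
    )
```

Under these assumptions the printed odds ratios and per-100 reductions should agree with the Capture@3 figures in the abstract after rounding, and the p-values should land in the same order of magnitude. The confidence intervals are not reproduced here because the abstract does not say how they were computed.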