This is an overview page with metadata for this scientific paper. The full article is available from the publisher.

Toward Accurate and Actionable Differential Diagnosis with Lean LLM Orchestration

2025 · 0 citations · medRxiv · Open Access

Abstract

Large language models (LLMs) can assist clinicians with diagnostic reasoning, yet their autonomous diagnostic performance remains uncertain. We evaluated OpenMedicine AI, an LLM-powered diagnostic agent with a deterministic controller, on 302 New England Journal of Medicine Clinicopathological Conference (CPC) cases, a benchmark renowned for diagnostic difficulty. Models produced ranked differential-diagnosis lists. Accuracy was assessed by inclusion of the ground-truth diagnosis within the Top-n list (Top-n accuracy) and by Capture@K, an actionability metric under which a case counts as “captured” if any of the Top-K differentials would appropriately lead a clinician to order the diagnostic test of record (DToR) or its immediate precursor. Across 302 CPCs, OpenMedicine AI achieved 46.0% Top-1 and 79.1% Top-10 accuracy, outperforming AMIE (32.5%, 68.9%) and physicians (15.6%, 20.9%). Paired McNemar tests confirmed superiority at all thresholds (p < 10⁻⁵). For actionability, at Capture@10 it matched or exceeded AMIE in 97.0% of cases and physicians in 96.7%. Of the 302 cases, it rescued 99 that physicians missed (odds ratio [OR] 16.5) and 44 that AMIE missed (OR 7.3), reducing misses by 31 and 13 per 100 cases, respectively. These gains correspond to a number needed to assess (NNA) of 3.21 versus physicians and 7.95 versus AMIE. A safety margin was already evident at Capture@3, with rescues outnumbering failures to rescue versus physicians (109 vs 15; OR 7.27; 95% CI, 4.24 to 12.47; p = 8.7×10⁻¹⁹) and versus AMIE (61 vs 15; OR 4.07; 95% CI, 2.31 to 7.15; p = 9.84×10⁻⁸), corresponding to 31 and 15 fewer misses per 100 cases, respectively. These findings indicate that a lightweight, deterministic controller layered over state-of-the-art LLMs can narrow the gap between diagnostic recall and clinical actionability. By producing high-quality differentials and prioritizing rational next tests, this approach offers a scalable, resource-efficient path to improved diagnostic performance in high-complexity clinical scenarios.
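
The paired statistics quoted in the abstract (odds ratio, confidence interval, misses avoided per 100 cases, NNA) can all be derived from the two discordant counts of a paired comparison. The following is a minimal sketch, not the authors' analysis code. It assumes that the odds ratio is the ratio of rescues to failures to rescue, that the interval is a Wald interval on the log odds ratio, and that the p-value comes from an exact two-sided binomial McNemar test; the abstract does not state which variants were used. The function name `paired_summary` and its parameters are hypothetical.

```python
import math


def paired_summary(rescues, failures, n_cases, z=1.959964):
    """Summarise a paired comparison from its discordant counts.

    rescues  : cases the system captured and the comparator missed
    failures : cases the comparator captured and the system missed
    n_cases  : total number of paired cases
    All formulas below are assumptions about how such figures are
    usually computed, not a statement of the paper's exact method.
    """
    # Conditional (matched-pairs) odds ratio: ratio of discordant counts.
    odds_ratio = rescues / failures

    # Wald confidence interval on the log odds ratio.
    se_log_or = math.sqrt(1 / rescues + 1 / failures)
    ci_low = math.exp(math.log(odds_ratio) - z * se_log_or)
    ci_high = math.exp(math.log(odds_ratio) + z * se_log_or)

    # Net misses avoided, per 100 cases and as a number needed to assess.
    net = rescues - failures
    misses_per_100 = 100 * net / n_cases
    nna = n_cases / net

    # Exact two-sided McNemar test: binomial tail on the discordant pairs.
    discordant = rescues + failures
    smaller = min(rescues, failures)
    tail = sum(math.comb(discordant, k) for k in range(smaller + 1))
    p_exact = min(1.0, 2 * tail / 2 ** discordant)

    return {
        "odds_ratio": odds_ratio,
        "ci_low": ci_low,
        "ci_high": ci_high,
        "misses_per_100": misses_per_100,
        "nna": nna,
        "p_exact": p_exact,
    }


# Capture@3 counts quoted in the abstract, over 302 cases.
vs_physicians = paired_summary(rescues=109, failures=15, n_cases=302)
vs_amie = paired_summary(rescues=61, failures=15, n_cases=302)
```

Under these assumptions, the Capture@3 counts versus physicians (109 vs 15) give an odds ratio of about 7.27 with a 95% interval of about 4.24 to 12.47 and about 31 fewer misses per 100 cases; the counts versus AMIE (61 vs 15) give about 4.07 with an interval of about 2.31 to 7.15 and about 15 fewer misses per 100 cases.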

Authors

Erica Yang

Institutions

Open Society (CZ)

Topics

Artificial Intelligence in Healthcare and Education · Clinical Reasoning and Diagnostic Skills · Machine Learning in Healthcare
