This is an overview page with metadata for this scientific paper. The full article is available from the publisher.

Performance of large language model in cross-specialty medical scenarios

2025 · 0 citations · Journal of Translational Medicine · Open Access

Abstract

Large language models (LLMs) show transformative potential in healthcare, yet their diagnostic and therapeutic accuracy across medical specialties remains inadequately characterized. This study compared the diagnostic and therapeutic capabilities of GPT-4o, GPT-3.5-Turbo, and Claude-3-Sonnet across 12 medical specialties using standardized clinical vignettes. Fifty PubMed-derived clinical cases from 2007 to 2024 were assessed. All LLMs received identical text-based case descriptions, with or without images, and generated free-text diagnostic and therapeutic recommendations for blinded, randomized evaluation. Two board-certified physicians independently evaluated the LLM outputs, with a senior clinician adjudicating discrepancies. Among the three evaluated LLMs, GPT-4o demonstrated superior diagnostic accuracy (median 10; IQR, 7.5–10), outperforming Claude-3-Sonnet (median 8; IQR, 2.8–10; P = .02) and GPT-3.5-Turbo (median 4; IQR, 1–9.3; P < .0001). Its narrow IQR and low variation (SD = 2.9; range = 5.0) reflected high consistency in diagnostic outputs across diverse medical fields. For therapeutic recommendations, GPT-4o (median 10; IQR, 0–10) outperformed GPT-3.5-Turbo (median 0; IQR, 0–6.3; P = .0005) but showed no significant advantage over Claude-3-Sonnet (median 5; IQR, 0–10; P = .45). These results indicate that advanced LLMs, particularly GPT-4o, have significant potential to support clinical diagnostics, showing high accuracy and consistency across specialties. However, their inconsistent performance in generating therapeutic recommendations presents a major barrier to clinical adoption.

Topics

Artificial Intelligence in Healthcare and Education · Genomics and Rare Diseases · Machine Learning in Healthcare
