This is an overview page with metadata for this scientific paper. The full article is available from the publisher.

Quantitative Synthesis of Large Language Model Performance in Medical Reasoning Tasks

2026 · 0 citations

Year: 2026

Abstract

This study presents a comprehensive evaluation of AI diagnostic models across diverse clinical cases sourced from Thieme (77 cases) and Elsevier (48 cases). The dataset spans frequent (42 cases), less frequent (44 cases), and rare (39 cases) conditions, ensuring balanced assessment. Diagnostic performance, treatment recommendation accuracy, and linguistic reliability were compared across multiple state-of-the-art AI systems. Results highlight strong overall diagnostic capabilities, with notable variations across specialties and disease frequencies. While Pediatrics consistently demonstrated the highest performance, Surgery emerged as the most challenging specialty. Among models, GPT-4o achieved superior diagnostic consensus, treatment recommendation accuracy, and linguistic precision, underscoring its clinical utility. The findings provide empirical benchmarks for advancing AI-based medical decision support systems.

Topics

Topic Modeling; Machine Learning in Healthcare; Artificial Intelligence in Healthcare and Education
