OpenAlex · Updated hourly · Last update: 30.03.2026, 06:37

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Comparison of GPT-4o and o3-Mini on Otolaryngology USMLE-Style Questions

2025 · 0 citations · Journal of Craniofacial Surgery
Open full text at the publisher

Citations: 0
Authors: 4
Year: 2025

Abstract

The objective was to compare the accuracy of two large language models, GPT-4o and o3-Mini, against medical-student performance on otolaryngology-focused, USMLE-style multiple-choice questions. With permission from AMBOSS, we extracted 146 Step 2 CK questions tagged "Otolaryngology" and stratified them by AMBOSS difficulty (levels 1-5). Each item was presented verbatim to GPT-4o and o3-Mini through their official APIs; outputs were scored correct/incorrect. Historical, de-identified student responses to the same items served as the comparator. Accuracy (%) was calculated per difficulty tier. Group differences were assessed with one-way ANOVA followed by independent-samples t tests (α = 0.05). Mean accuracy across all items was 93.35% for o3-Mini and 90.45% for GPT-4o (P = 0.465). Both models outperformed students (55.44%; P = 0.008 and 0.012, respectively). Performance of GPT-4o and o3-Mini remained ≥86% across all five difficulty levels, whereas student accuracy declined from 85.6% (level 1) to 26.7% (level 5). At the hardest tier, o3-Mini achieved 100% accuracy. GPT-4o and o3-Mini markedly exceed average medical-student performance on ENT-specific USMLE-style questions, maintaining high accuracy even at the greatest difficulty. These findings support the integration of advanced language models as adjunctive learning tools in otolaryngology.
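The analysis described in the abstract (one-way ANOVA across the three groups, then pairwise independent-samples t tests at α = 0.05) can be sketched as follows. This is a minimal illustration using `scipy.stats`, not the authors' code; the per-item correctness data below are synthetic placeholders generated to match the reported mean accuracies, not the study's actual responses.

```python
# Hypothetical sketch of the reported analysis: per-item correctness (1/0)
# for each group, compared with one-way ANOVA, then pairwise t tests.
# All data here are synthetic placeholders, not the study's data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_items = 146  # number of AMBOSS Step 2 CK otolaryngology items (from the abstract)

# Synthetic per-item correctness, drawn to roughly match the reported means.
o3_mini = rng.binomial(1, 0.9335, n_items)
gpt_4o = rng.binomial(1, 0.9045, n_items)
students = rng.binomial(1, 0.5544, n_items)

# One-way ANOVA across the three groups.
f_stat, p_anova = stats.f_oneway(o3_mini, gpt_4o, students)
print(f"ANOVA: F={f_stat:.2f}, p={p_anova:.2e}")

# Pairwise independent-samples t tests against the student comparator.
alpha = 0.05
for name, model in (("o3-Mini", o3_mini), ("GPT-4o", gpt_4o)):
    t, p = stats.ttest_ind(model, students)
    print(f"{name} vs students: t={t:.2f}, p={p:.2e}, significant={p < alpha}")
```

With group differences of this size (≈90% vs ≈55% over 146 items), both the ANOVA and the pairwise tests come out highly significant on the synthetic data, mirroring the direction of the reported result.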

Topics

Artificial Intelligence in Healthcare and Education · Radiology practices and education · Clinical Reasoning and Diagnostic Skills