This is an overview page with metadata for this scientific article. The full article is available from the publisher.
Comparison of GPT-4o and o3-Mini on Otolaryngology USMLE-Style Questions
Citations: 0
Authors: 4
Year: 2025
Abstract
The objective was to compare the accuracy of two large language models (GPT-4o and o3-Mini) against medical-student performance on otolaryngology-focused, USMLE-style multiple-choice questions. With permission from AMBOSS, we extracted 146 Step 2 CK questions tagged "Otolaryngology" and stratified them by AMBOSS difficulty (levels 1-5). Each item was presented verbatim to GPT-4o and o3-Mini through their official APIs; outputs were scored correct/incorrect. Historical, de-identified student responses to the same items served as the comparator. Accuracy (%) was calculated per difficulty tier. Group differences were assessed with one-way ANOVA followed by independent-samples t tests (α = 0.05). Mean accuracy across all items was 93.35% for o3-Mini and 90.45% for GPT-4o (P = 0.465). Both models outperformed students (55.44%; P = 0.008 and P = 0.012, respectively). GPT-4o and o3-Mini maintained ≥86% accuracy across all five difficulty levels, whereas student accuracy declined from 85.6% (level 1) to 26.7% (level 5). At the hardest tier, o3-Mini achieved 100% accuracy. GPT-4o and o3-Mini markedly exceed average medical-student performance on ENT-specific USMLE-style questions, maintaining high accuracy even at the greatest difficulty. These findings support the integration of advanced language models as adjunctive learning tools in otolaryngology.
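The per-tier accuracy tabulation described in the abstract can be sketched as follows. This is a minimal illustration with placeholder records, not the study's actual data or code; the record format and function name are assumptions for demonstration.

```python
# Hedged sketch of per-difficulty-tier accuracy scoring, as described in the
# abstract. The records below are illustrative placeholders, not study data.
from collections import defaultdict

# Each record: (AMBOSS difficulty level 1-5, whether the answer was correct)
results = [
    (1, True), (1, True), (2, True), (2, False),
    (3, True), (4, True), (5, True), (5, True),
]

def accuracy_by_tier(records):
    """Return percent correct for each difficulty tier."""
    hits, totals = defaultdict(int), defaultdict(int)
    for level, correct in records:
        totals[level] += 1
        hits[level] += int(correct)
    return {level: 100 * hits[level] / totals[level] for level in sorted(totals)}

print(accuracy_by_tier(results))

# Group comparisons (one-way ANOVA, then independent-samples t tests at
# alpha = 0.05) would typically use scipy.stats.f_oneway and
# scipy.stats.ttest_ind on the per-item 0/1 correctness vectors.
```

The dictionary output maps each difficulty level to its accuracy percentage, mirroring the tier-wise comparison reported in the results.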
Similar Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,339 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,211 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,614 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,478 citations