This is an overview page with metadata for this scientific article. The full article is available from the publisher.
Quantifying the speed-accuracy trade-off of large language models on oral and maxillofacial surgery multiple-choice questions
Citations: 2
Authors: 6
Year: 2025
Abstract
Large language models (LLMs) such as GPT-4o, Copilot and Gemini are entering dental curricula, yet their suitability for real-time decision support remains unclear because most evaluations report accuracy alone. This prospective in silico diagnostic-accuracy study benchmarked six engines (GPT-4o, OpenAI o3, Copilot-Quick, Copilot-Deep, Gemini-Flash and Gemini-Pro) against 1766 single-best-answer multiple-choice questions from a contemporary oral and maxillofacial surgery (OMFS) board-review text. Textbook answer keys served as the reference standard. Overall and domain-level accuracy, intra-model answer consistency and per-batch response latency were recorded; χ² tests compared accuracies, and Kruskal-Wallis tests with multiplicity-adjusted Mann-Whitney U tests compared response times. Accuracy differed significantly across engines (χ² = 97.31, p < 0.001), ranging from 77.9% for Copilot-Quick to 88.3% for Gemini-Pro. Reasoning-optimised variants (o3, Copilot-Deep, Gemini-Pro) exceeded their speed-tuned counterparts by 3.8 to 6.2 percentage points, with the largest gains in the trauma, craniofacial deformity and orthognathic surgery domains. These improvements incurred a marked latency penalty: median response times of 2.1-3.1 s versus 0.1-0.2 s for the faster engines. Each additional 3-6 correct answers per 100 items therefore required roughly 2-3 s of extra processing. Items missed by all models clustered around rare numeric facts and negatively worded stems. Reasoning-optimised LLMs deliver clinically meaningful accuracy gains on OMFS board questions, but educators and clinicians must weigh this benefit against slower output and maintain expert oversight to mitigate residual knowledge gaps.
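The accuracy comparison reported above (a χ² test across six engines answering the same 1766 items) can be sketched as follows. The per-engine accuracies below are hypothetical values back-filled from the reported 77.9%-88.3% range, not the paper's actual data, and `chi_square_2xk` is an illustrative helper rather than the authors' analysis code.

```python
# Hypothetical sketch of a chi-square comparison of per-engine accuracies.
# All counts are illustrative, derived from the reported accuracy range.
N = 1766  # items answered by every engine

# Hypothetical per-engine accuracies spanning the reported 77.9%-88.3% range.
accuracies = {
    "Copilot-Quick": 0.779,
    "GPT-4o": 0.820,
    "Gemini-Flash": 0.830,
    "Copilot-Deep": 0.850,
    "OpenAI o3": 0.870,
    "Gemini-Pro": 0.883,
}

# Correct-answer counts implied by the hypothetical accuracies.
correct = {name: round(acc * N) for name, acc in accuracies.items()}

def chi_square_2xk(correct_counts, total_per_engine):
    """Pearson chi-square statistic for a 2 x k table of correct/incorrect
    counts, where every engine answered the same number of items."""
    k = len(correct_counts)
    pooled_accuracy = sum(correct_counts.values()) / (k * total_per_engine)
    stat = 0.0
    for c in correct_counts.values():
        expected_correct = pooled_accuracy * total_per_engine
        expected_wrong = (1 - pooled_accuracy) * total_per_engine
        stat += (c - expected_correct) ** 2 / expected_correct
        stat += ((total_per_engine - c) - expected_wrong) ** 2 / expected_wrong
    return stat  # compare against a chi-square distribution with k-1 df

stat = chi_square_2xk(correct, N)
print(f"chi-square = {stat:.1f} on {len(correct) - 1} df")
```

With six engines, the statistic is referred to a χ² distribution with 5 degrees of freedom; a value of the magnitude reported in the abstract (97.31) is far beyond the p < 0.001 critical value of about 20.5.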
Related works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,339 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,211 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,614 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,478 citations