This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Performance of Large Language Models on Official Periodontology Questions: A 13-Year Analysis of the Turkish Dental Specialization Examination
0
Citations
2
Authors
2026
Year
Abstract
Background: This study systematically evaluated the performance of large language models (LLMs) on official periodontology questions from the Turkish Dental Specialization Examination (DUS).

Methods: A total of 180 text-based questions (159 multiple-choice questions (MCQs) and 21 combination-type MCQs (C-MCQs)) were categorized into nine domains across 13 years (2012–2024). In April 2025, eight LLMs were tested: ChatGPT-4o, ChatGPT-4o mini (OpenAI), Gemini 1.5 Flash, Gemini 1.5 Pro, Gemini 2.0 Flash (Google DeepMind), Copilot (Microsoft), DeepSeek-V3 (DeepSeek), and Qwen 2.5-Max (Alibaba Cloud). Each question was submitted independently via official interfaces. Accuracy rates were compared across models, domains, years, and question types using Pearson’s chi-square test, with Cramér’s V and Phi coefficients reported as effect sizes.

Results: Accuracy differed significantly by domain (χ²(8, N = 1440) = 38.20, p < .001, Cramér’s V = .163). Gemini 2.5 Pro achieved the highest performance, scoring 100% in six domains and ≥87.5% in the others. ChatGPT-4o mini and Qwen 2.5-Max underperformed, particularly in Periodontium and Periodontal Treatment. Year-based analysis showed stable performance across 2012–2024 (χ²(12, N = 1440) = 14.51, p = .269). No difference emerged between MCQs and C-MCQs (χ²(1, N = 1440) = 1.42, p = .233).

Conclusion: LLM accuracy in periodontology is domain- and model-dependent. Advanced systems such as Gemini 2.5 Pro show potential as supportive tools for education and clinical decision-making, yet persistent weaknesses in reasoning- and calculation-intensive areas underscore the need for expert oversight.
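The reported statistics can be cross-checked from the abstract alone. The sketch below is not the authors' analysis code. It assumes that each comparison is a contingency table of category × outcome (correct/incorrect), so that min(rows − 1, cols − 1) = 1, and that N = 1440 is the 180 questions multiplied by the eight models. The function names `cramers_v` and `chi2_sf` are illustrative, and the p-value helper only covers 1 or an even number of degrees of freedom, which is all the abstract requires.

```python
import math

def cramers_v(chi2, n, rows, cols):
    """Cramér's V from a chi-square statistic for a rows x cols table."""
    return math.sqrt(chi2 / (n * min(rows - 1, cols - 1)))

def chi2_sf(chi2, df):
    """Upper-tail p-value of the chi-square distribution.

    Uses closed forms only: df == 1 via the complementary error function,
    and even df via the finite series exp(-x/2) * sum((x/2)^k / k!).
    """
    half = chi2 / 2.0
    if df == 1:
        return math.erfc(math.sqrt(half))
    if df % 2 == 0:
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    raise ValueError("only df == 1 or even df are handled in this sketch")

# Assumption: every question was answered once by each of the eight models.
n_responses = 180 * 8

# Domain effect: 9 domains x 2 outcomes (assumed table shape).
v_domain = cramers_v(38.20, n_responses, rows=9, cols=2)
p_domain = chi2_sf(38.20, df=8)

# Year effect (df = 12) and question-type effect (df = 1).
p_year = chi2_sf(14.51, df=12)
p_type = chi2_sf(1.42, df=1)

print(f"N = {n_responses}")
print(f"Cramér's V (domain) = {v_domain:.3f}, p = {p_domain:.2e}")
print(f"p (year) = {p_year:.3f}, p (question type) = {p_type:.3f}")
```

Under these assumptions the recomputed values agree with the abstract to the reported precision, which suggests the analysis unit was the individual model response rather than the question.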