This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Comparative analysis of the performance of artificial intelligence language models on Turkish dental specialty examination questions
Citations: 0
Authors: 3
Year: 2026
Abstract
The aim of this study was to compare the performance of three large language models (LLMs), namely ChatGPT-4o, Gemini 2.0 Pro, and DeepSeek-R1, on multiple-choice questions from the Turkish Dental Specialty Examination (DUS). Ethical approval was not required for this study. A total of 1506 questions from DUS examinations conducted between 2012 and 2021 were entered independently into each LLM in the original Turkish using a zero-shot prompt structure. Canceled questions and those containing visual material were excluded. The study was conducted in July 2025 with the model versions named above. The number of correct answers was recorded for total performance, for basic and clinical sciences, and for 15 sub-disciplines. Statistical analyses were performed using Cochran’s Q test; where the result was significant, Bonferroni-corrected McNemar tests were applied for pairwise comparisons. A p-value < 0.05 was considered statistically significant. A significant difference was found among the three models in total correct answers (p < 0.001). Gemini 2.0 Pro gave significantly more correct answers than ChatGPT-4o (p < 0.001) and DeepSeek-R1 (p < 0.001). No significant difference was observed in basic sciences (p = 0.050), whereas Gemini 2.0 Pro outperformed both other models in clinical sciences (p < 0.001). Subgroup analysis showed that Gemini 2.0 Pro outperformed ChatGPT-4o in periodontology (OR: 4.43, p = 0.002), restorative dentistry (OR: 2.75, p = 0.002), prosthodontics (OR: 2.65, p < 0.001), orthodontics (OR: 3.05, p = 0.003), and pediatric dentistry (OR: 2.98, p < 0.001), and outperformed DeepSeek-R1 in prosthodontics (OR: 2.20, p = 0.002). Although performance differences exist among LLMs, rapid advances in these technologies are likely to reduce such disparities, enabling broader applications of LLMs in dental education.
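The abstract names the tests but not the software or the exact test variants, so the following is only a minimal sketch of how such an analysis could be set up, not the authors' procedure. It assumes each model's answer to each question is coded 1 (correct) or 0 (incorrect). All function names are hypothetical. The exact binomial form of McNemar's test is an assumption, and the p-value helper for Cochran's Q uses the closed form for a chi-square distribution with 2 degrees of freedom, so it applies only when exactly three models are compared.

```python
from itertools import combinations
from math import comb, exp


def cochran_q(rows):
    """Cochran's Q for k related binary outcomes.

    rows: one tuple of 0/1 scores per question, one entry per model.
    Returns (Q, degrees_of_freedom).
    """
    k = len(rows[0])
    col_totals = [sum(r[j] for r in rows) for j in range(k)]
    row_totals = [sum(r) for r in rows]
    grand_total = sum(row_totals)
    numerator = (k - 1) * (k * sum(c * c for c in col_totals) - grand_total ** 2)
    denominator = k * grand_total - sum(t * t for t in row_totals)
    if denominator == 0:
        raise ValueError("Q is undefined when every row is all 0s or all 1s")
    return numerator / denominator, k - 1


def chi2_sf_2df(q):
    """Upper-tail chi-square probability, valid for 2 degrees of freedom only."""
    return exp(-q / 2.0)


def mcnemar_exact(x, y):
    """Exact two-sided McNemar test on paired 0/1 scores.

    Returns (b, c, p), where b counts questions only x got right
    and c counts questions only y got right.
    """
    b = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 0)
    c = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 1)
    n = b + c
    if n == 0:
        return b, c, 1.0  # no discordant pairs, so no evidence of a difference
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return b, c, min(1.0, 2.0 * tail)


def pairwise_bonferroni(scores):
    """Bonferroni-adjusted exact McNemar p-values for every pair of models.

    scores: dict mapping a model label to its list of 0/1 scores.
    """
    pairs = list(combinations(scores, 2))
    adjusted = {}
    for first, second in pairs:
        _, _, p = mcnemar_exact(scores[first], scores[second])
        adjusted[(first, second)] = min(1.0, p * len(pairs))
    return adjusted
```

In use, one would first call `cochran_q` on the full question-by-model score table and run `pairwise_bonferroni` only if the omnibus p-value falls below 0.05, mirroring the two-step procedure the abstract describes.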