This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Comparative analysis of the performance of artificial intelligence language models on Turkish dental specialty examination questions

2026 · 0 citations · BMC Oral Health · Open Access

Citations: 0 · Authors: 3 · Year: 2026

Abstract

This study compared the performance of three large language models (LLMs), ChatGPT-4o, Gemini 2.0 Pro, and DeepSeek-R1, on multiple-choice questions from the Turkish Dental Specialty Examination (DUS). Ethical approval was not required. A total of 1506 questions from DUS examinations held between 2012 and 2021 were entered independently into each model in the original Turkish, using a zero-shot prompt structure. Canceled questions and questions containing visual material were excluded. The study was conducted in July 2025. The number of correct answers was recorded for overall performance, for basic and clinical sciences, and for 15 sub-disciplines. Differences among the models were tested with Cochran’s Q test; where the result was significant, Bonferroni-corrected McNemar tests were applied. A p-value < 0.05 was considered statistically significant. The three models differed significantly in total correct answers (p < 0.001). Gemini 2.0 Pro gave significantly more correct answers than ChatGPT-4o (p < 0.001) and DeepSeek-R1 (p < 0.001). No significant difference was observed in basic sciences (p = 0.050), whereas Gemini 2.0 Pro outperformed both other models in clinical sciences (p < 0.001). In the subgroup analysis, Gemini 2.0 Pro outperformed ChatGPT-4o in periodontology (OR: 4.43, p = 0.002), restorative dentistry (OR: 2.75, p = 0.002), prosthodontics (OR: 2.65, p < 0.001), orthodontics (OR: 3.05, p = 0.003), and pediatric dentistry (OR: 2.98, p < 0.001), and outperformed DeepSeek-R1 in prosthodontics (OR: 2.20, p = 0.002). Although performance differences exist among LLMs, rapid advances in these technologies are likely to reduce such disparities, enabling broader applications of LLMs in dental education.
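The testing procedure named in the abstract (an omnibus Cochran's Q test, followed by Bonferroni-corrected pairwise McNemar tests when the omnibus result is significant) can be sketched as follows. This is a minimal illustration and not the authors' code: the function names, the choice of the exact binomial form of the McNemar test, and the ten-row answer matrix are all assumptions made here, and the data are invented purely for demonstration.

```python
import math
from itertools import combinations


def cochran_q(rows):
    """Cochran's Q for three related binary samples.

    rows: one tuple per question, one 0/1 entry per model (1 = correct).
    Returns (Q, p). The p-value uses the chi-square survival function with
    2 degrees of freedom, exp(-Q/2), so exactly three models are required.
    """
    k = len(rows[0])
    if k != 3:
        raise ValueError("this sketch only handles three models (df = 2)")
    col_totals = [sum(r[j] for r in rows) for j in range(k)]
    row_totals = [sum(r) for r in rows]
    n = sum(row_totals)
    denom = k * n - sum(t * t for t in row_totals)
    if denom == 0:
        # Every question was answered identically by all models;
        # returning (0, 1) here is a convention chosen for this sketch.
        return 0.0, 1.0
    q = (k - 1) * (k * sum(c * c for c in col_totals) - n * n) / denom
    return q, math.exp(-q / 2)


def mcnemar_exact(a, b):
    """Two-sided exact McNemar test on paired 0/1 outcomes."""
    only_a = sum(1 for x, y in zip(a, b) if x == 1 and y == 0)
    only_b = sum(1 for x, y in zip(a, b) if x == 0 and y == 1)
    n = only_a + only_b
    if n == 0:
        return 1.0  # no discordant pairs, nothing to test
    tail = sum(math.comb(n, i) for i in range(min(only_a, only_b) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def pairwise_mcnemar_bonferroni(rows, names):
    """Bonferroni-adjusted McNemar p-values for every pair of models."""
    pairs = list(combinations(range(len(names)), 2))
    adjusted = {}
    for i, j in pairs:
        p = mcnemar_exact([r[i] for r in rows], [r[j] for r in rows])
        adjusted[(names[i], names[j])] = min(1.0, p * len(pairs))
    return adjusted


MODELS = ("ChatGPT-4o", "Gemini 2.0 Pro", "DeepSeek-R1")

# Invented toy data (NOT from the study): 1 = correct, 0 = incorrect.
TOY_ANSWERS = [
    (1, 1, 1), (0, 1, 1), (0, 1, 0), (0, 1, 0), (1, 1, 1),
    (0, 1, 1), (0, 1, 0), (0, 0, 0), (0, 1, 1), (0, 1, 0),
]

q, p_overall = cochran_q(TOY_ANSWERS)
print(f"Cochran's Q = {q:.3f}, p = {p_overall:.4f}")

# Pairwise tests are run only when the omnibus test is significant.
adjusted = {}
if p_overall < 0.05:
    adjusted = pairwise_mcnemar_bonferroni(TOY_ANSWERS, MODELS)
    for pair, p_adj in adjusted.items():
        print(pair, f"adjusted p = {p_adj:.4f}")
```

With three models there are three pairwise comparisons, so the Bonferroni step multiplies each raw p-value by 3 and caps the result at 1. For a real analysis, an established statistics package would be a safer choice than hand-written formulas.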

Topics

Artificial Intelligence in Healthcare and Education · Dental Research and COVID-19 · Dental Radiography and Imaging