This is an overview page with metadata for this scientific article. The full article is available from the publisher.
Comparative performance of AI models on case-based oral medicine questions across Bloom’s taxonomy levels and subtopics
Citations: 0
Authors: 2
Year: 2026
Abstract
Artificial intelligence (AI) chatbots are increasingly used by dental students for self-directed learning, yet their performance in specialty-level subjects such as oral medicine remains underexplored. Because oral medicine encompasses diagnostic and clinical reasoning across interdisciplinary domains, assessing AI competence in this field is necessary. This study aimed to evaluate and compare the performance of four advanced AI chatbots (ChatGPT-4, Microsoft Copilot, Google Gemini, and DeepSeek) in answering case-based oral medicine multiple-choice questions (MCQs) across Bloom's cognitive levels and key subtopics. A total of 114 high-quality, case-based MCQs were developed and validated against authoritative references. Each question was classified according to Bloom's taxonomy and mapped to one of six oral medicine subdomains. The chatbots' responses were evaluated for accuracy, response time, and word count. Statistical comparisons were performed using Cochran's Q test, the Friedman test, McNemar's test, and Cohen's kappa for inter-model agreement. All four chatbots demonstrated high overall accuracy (≥ 97.4%), with Microsoft Copilot achieving the numerically highest score (99.1%), although no statistically significant differences were observed among the models. ChatGPT-4 generated the fastest responses (mean: 7.0 s), while Copilot provided the most detailed explanations. Performance was consistent across cognitive levels, with near-perfect accuracy in the "Applying" and "Analyzing" domains. Accuracy across subtopics was also high, although minor discrepancies were noted in infectious diseases and oral potentially malignant disorders. Inter-chatbot agreement ranged from moderate to perfect (kappa = 0.315 to 1.00). Advanced AI chatbots, including ChatGPT-4, Copilot, Gemini, and DeepSeek, demonstrated similarly high performance in answering case-based multiple-choice questions in oral medicine.
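The abstract reports inter-chatbot agreement via Cohen's kappa on the models' MCQ results. A minimal sketch of how pairwise kappa could be computed on binary correct/incorrect answer vectors is shown below; the function and the example vectors are illustrative assumptions, not the study's actual data or code.

```python
def cohens_kappa(a, b):
    """Cohen's kappa for two binary (1 = correct, 0 = incorrect) vectors.

    Observed agreement is compared against the agreement expected by
    chance, derived from each rater's marginal proportion of 1s.
    """
    assert len(a) == len(b) and len(a) > 0
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    pa1 = sum(a) / n  # proportion of questions model A answered correctly
    pb1 = sum(b) / n  # proportion of questions model B answered correctly
    expected = pa1 * pb1 + (1 - pa1) * (1 - pb1)
    if expected == 1.0:
        return 1.0  # edge case: chance agreement already perfect
    return (observed - expected) / (1 - expected)

# Hypothetical correctness vectors for two chatbots on 10 MCQs:
model_a = [1, 1, 1, 0, 1, 1, 1, 1, 0, 1]
model_b = [1, 1, 1, 1, 1, 1, 0, 1, 0, 1]
print(round(cohens_kappa(model_a, model_b), 3))
```

With near-ceiling accuracy such as the ≥ 97.4% reported here, kappa can drop even when raw agreement is high, because the chance-expected agreement is also high; this is one reason the reported kappa range (0.315 to 1.00) is wider than the accuracy range.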