This is an overview page with metadata for this scientific article. The full article is available from the publisher.
Comparative performance of AI models on case-based oral medicine questions across Bloom’s taxonomy levels and subtopics
Citations: 0
Authors: 2
Year: 2026
Abstract
Artificial intelligence (AI) chatbots are increasingly used by dental students for self-directed learning, yet their performance in specialty-level subjects such as oral medicine remains underexplored. Because oral medicine encompasses diagnostic and clinical reasoning across interdisciplinary domains, assessing AI competence in this field is necessary. This study aimed to evaluate and compare the performance of four advanced AI chatbots (ChatGPT-4, Microsoft Copilot, Google Gemini, and DeepSeek) in answering case-based oral medicine multiple-choice questions (MCQs) across Bloom's cognitive levels and key subtopics. A total of 114 high-quality, case-based MCQs were developed and validated against authoritative references. Each question was classified according to Bloom's taxonomy and mapped to one of six oral medicine subdomains. The chatbots' responses were evaluated for accuracy, response time, and word count. Statistical comparisons were performed using Cochran's Q test, the Friedman test, McNemar's test, and Cohen's kappa for inter-model agreement. All four chatbots demonstrated high overall accuracy (≥ 97.4%), with Microsoft Copilot achieving the numerically highest score (99.1%), although no statistically significant differences were observed among the models. ChatGPT-4 generated the fastest responses (mean: 7.0 s), while Copilot provided the most detailed explanations. Performance was consistent across cognitive levels, with near-perfect accuracy in the "Applying" and "Analyzing" domains. Accuracy across subtopics was also high, although minor discrepancies were noted in infectious diseases and oral potentially malignant disorders. Inter-chatbot agreement ranged from moderate to perfect (kappa = 0.315 to 1.00). Advanced AI chatbots, including ChatGPT-4, Copilot, Gemini, and DeepSeek, demonstrated similarly high performance in answering case-based multiple-choice questions in oral medicine.
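The abstract reports inter-chatbot agreement via Cohen's kappa on the models' MCQ results. A minimal sketch of how pairwise kappa could be computed on binary correct/incorrect answer vectors is shown below; the function and the example vectors are illustrative assumptions, not the study's actual data or code.

```python
def cohens_kappa(a, b):
    """Cohen's kappa for two binary (1 = correct, 0 = incorrect) vectors.

    Observed agreement is compared against the agreement expected by
    chance, derived from each rater's marginal proportion of 1s.
    """
    assert len(a) == len(b) and len(a) > 0
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    pa1 = sum(a) / n  # proportion of questions model A answered correctly
    pb1 = sum(b) / n  # proportion of questions model B answered correctly
    expected = pa1 * pb1 + (1 - pa1) * (1 - pb1)
    if expected == 1.0:
        return 1.0  # edge case: chance agreement already perfect
    return (observed - expected) / (1 - expected)

# Hypothetical correctness vectors for two chatbots on 10 MCQs:
model_a = [1, 1, 1, 0, 1, 1, 1, 1, 0, 1]
model_b = [1, 1, 1, 1, 1, 1, 0, 1, 0, 1]
print(round(cohens_kappa(model_a, model_b), 3))
```

With near-ceiling accuracy such as the ≥ 97.4% reported here, kappa can drop even when raw agreement is high, because the chance-expected agreement is also high; this is one reason the reported kappa range (0.315 to 1.00) is wider than the accuracy range.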