OpenAlex · Aktualisierung stündlich · Letzte Aktualisierung: 17.05.2026, 00:48

Dies ist eine Übersichtsseite mit Metadaten zu dieser wissenschaftlichen Arbeit. Der vollständige Artikel ist beim Verlag verfügbar.

Human Expertise Outperforms Artificial Intelligence in Medical Education Assessments: MCQ Creation Highlights the Irreplaceable Role of Teachers

2025·0 Zitationen·Avicenna Journal of MedicineOpen Access
Volltext beim Verlag öffnen

0

Zitationen

4

Autoren

2025

Jahr

Abstract

Introduction: Multiple-choice questions (MCQs) are vital tools for assessment in education because they allow for the direct measurement of various knowledge, skills, and competencies across a wide range of disciplines. While artificial intelligence (AI) holds promise as a supplementary tool in medical education, particularly for generating large volumes of practice questions, it cannot yet replace the nuanced and expert-driven process of question creation that human educators provide. This study seeks to close the gap, particularly with regard to difficulty index, discrimination index, and distractor efficiency. Materials and Methods: A total of 50 medical students received a set of fifty randomized, blinded, validated MCQs by human physiology experts. Of these, 25 were made by AI, and the remaining 25 were made by qualified, experienced professors. Using the item response theory (IRT) framework, we calculated key metrics like item reliability, difficulty index, discrimination index, and distractor functionality. Results: = 0.45). However, significant differences emerged in other key quality metrics. The discrimination index, which reflects a question's ability to distinguish between high- and low-performing students, was notably higher for expert-created MCQs (Mean = 0.48, SD = 0.12) than for those generated by AI (Mean = 0.32, SD = 0.10), indicating a moderate-to-large effect (p = 0.0082, Chi-square = 11.7, df = 3). Similarly, distractor efficiency (DE), which evaluates the effectiveness of incorrect answer options, was significantly greater in expert-authored questions (Mean = 0.24, SD = 7.2) compared to AI-generated items (Mean = 0.4, SD = 8.1), with a moderate effect size (p = 0.0001, Chi-square = 26.2, df = 2). These findings suggest that while AI can replicate human-level difficulty, expert involvement remains crucial for ensuring high-quality discrimination and distractor performance in MCQ design. Conclusion: The findings suggest that AI holds promise, particularly in generating questions of appropriate difficulty, but human expertise remains essential in crafting high-quality assessments that effectively differentiate between levels of student performance and challenge students' critical thinking. As AI technology continues to evolve, ongoing research and careful implementation will be essential in ensuring that AI contributes positively to medical education.

Ähnliche Arbeiten

Autoren

Institutionen

Themen

Artificial Intelligence in Healthcare and EducationIntelligent Tutoring Systems and Adaptive LearningClinical Reasoning and Diagnostic Skills
Volltext beim Verlag öffnen