OpenAlex · Updated hourly · Last updated: 15.05.2026, 02:58

This is an overview page with metadata for this scientific work. The full article is available from the publisher.

When AI models take the exam: large language models vs medical students on multiple-choice course exams

2025 · 4 citations · Medical Education Online · Open Access

Citations: 4 · Authors: 10 · Year: 2025

Abstract

= 442) were summarized as mean ± SD or median (IQR). Pairwise differences between models were explored with McNemar's test; student-LLM contrasts were descriptive. Across courses, LLMs consistently exceeded the student median and, in several instances, the highest student score. Mean LLM course scores ranged from 7.46 to 9.88, versus student means of 4.28 to 7.32. OpenAI o1 achieved the highest mean in three courses; Copilot led in Cardiovascular Medicine (text-only subset due to image limitations). All LLMs answered every MCQ, and short-term test-retest agreement was high (AC1 0.79-1.00). Aggregated across courses, LLMs averaged 8.75 compared with 5.76 for students. On department-set Spanish MCQ exams with negative marking, LLMs outperformed enrolled medical students, answered every item, and showed high short-term reproducibility. These findings support cautious, faculty-supervised use of LLMs as adjuncts to MCQ assessment (e.g. automated pretesting, feedback). Confirmation across institutions, languages, and image-rich formats, and evaluation of educational impact beyond accuracy, are needed.
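The abstract states that pairwise differences between models were explored with McNemar's test, which compares two models on the same items using only the discordant pairs (items one model answered correctly and the other did not). As an illustrative sketch, not the authors' code, an exact two-sided McNemar test can be computed from the two discordant counts with the standard library alone:

```python
from math import comb

def mcnemar_exact(b: int, c: int) -> float:
    """Exact two-sided McNemar p-value from discordant pair counts.

    b: items model A answered correctly and model B did not.
    c: items model B answered correctly and model A did not.
    Under the null, the smaller count follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs: no evidence of a difference
    k = min(b, c)
    # Two-sided exact p-value: double the lower binomial tail, capped at 1.
    p = 2 * sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(p, 1.0)

# Hypothetical example: of 9 discordant items, model A was right on 8.
print(round(mcnemar_exact(8, 1), 4))  # → 0.0391
```

The counts here are invented for illustration; with real exam data they would come from tallying per-item correctness of two models across the paper's 442 questions.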
