OpenAlex · Updated hourly · Last updated: 02.05.2026, 00:27

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Applicability of ChatGPT to generate multiple-choice questions for ophthalmology resident exam

2025 · 0 citations · British Journal of Ophthalmology

0 citations · 7 authors · 2025

Abstract

BACKGROUND/AIMS: Developing high-quality multiple-choice questions (MCQs) for medical education is a challenging and time-consuming task. This study aimed to assess the applicability of Chat Generative Pre-trained Transformer (ChatGPT) in generating MCQs for ophthalmology residents.

METHODS: ChatGPT 4 was used to generate 100 MCQs, while an additional 100 MCQs were authored by university faculty. Item-writing flaws in both sets of questions were evaluated by a single reviewer. A quality assessment panel, consisting of board-certified ophthalmology subspecialists, compared the quality of the two sets. Ophthalmology residents then answered all MCQs in a randomised order. The item difficulty and discrimination indices were calculated and compared between the two sets of questions.

RESULTS: Item-writing flaws were more frequent in ChatGPT-generated MCQs (56%) than in human-authored MCQs (27%, p<0.001). While ChatGPT-generated questions were comparable to human-written ones on most quality parameters, distractor quality was significantly higher in human-generated MCQs (p=0.006). The mean resident scores were 46.5±9.5 for the ChatGPT-generated MCQs and 49.0±10.9 for the human-written MCQs (p=0.051). The difficulty index was 0.47±0.21 and 0.51±0.19, respectively (p=0.12). The discrimination index was significantly lower for the ChatGPT questions (0.20±0.19 vs 0.28±0.16, p<0.001).

CONCLUSIONS: While ChatGPT can efficiently generate MCQs for ophthalmology residents, it has notable limitations, including higher rates of item-writing flaws and lower-quality distractors. Additionally, ChatGPT-generated MCQs are less effective at distinguishing high-performing from low-performing examinees. Integrating ChatGPT with human expertise is essential to enhance the quality and reliability of artificial intelligence-generated MCQs.
