OpenAlex · Aktualisierung stündlich · Letzte Aktualisierung: 24.05.2026, 23:32

Dies ist eine Übersichtsseite mit Metadaten zu dieser wissenschaftlichen Arbeit. Der vollständige Artikel ist beim Verlag verfügbar.

Large Language Models as Tools to Generate Radiology Board-Style Multiple-Choice Questions

2024·48 Zitationen·Academic RadiologyOpen Access
Volltext beim Verlag öffnen

48

Zitationen

6

Autoren

2024

Jahr

Abstract

Rationale and ObjectivesTo determine the potential of large language models (LLMs) to be used as tools by radiology educators to create radiology board-style multiple choice questions (MCQs), answers, and rationales.MethodsTwo LLMs (Llama 2 and GPT-4) were used to develop 104 MCQs based on the American Board of Radiology exam blueprint. Two board-certified radiologists assessed each MCQ using a 10-point Likert scale across five criteria—clarity, relevance, suitability for a board exam based on level of difficulty, quality of distractors, and adequacy of rationale. For comparison, MCQs from prior American College of Radiology (ACR) Diagnostic Radiology In-Training (DXIT) exams were also assessed using these criteria, with radiologists blinded to the question source.ResultsMean scores (±standard deviation) for clarity, relevance, suitability, quality of distractors, and adequacy of rationale were 8.7 (±1.4), 9.2 (±1.3), 9.0 (±1.2), 8.4 (±1.9), and 7.2 (±2.2), respectively, for Llama 2; 9.9 (±0.4), 9.9 (±0.5), 9.9 (±0.4), 9.8 (±0.5), and 9.9 (±0.3), respectively, for GPT-4; and 9.9 (±0.3), 9.9 (±0.2), 9.9 (±0.2), 9.9 (±0.4), and 9.8 (±0.6), respectively, for ACR DXIT items (p < 0.001 for Llama 2 vs. ACR DXIT across all criteria; no statistically significant difference for GPT-4 vs. ACR DXIT). The accuracy of model-generated answers was 69% for Llama 2 and 100% for GPT-4.ConclusionA state-of-the art LLM such as GPT-4 may be used to develop radiology board-style MCQs and rationales to enhance exam preparation materials and expand exam banks, and may allow radiology educators to further use MCQs as teaching and learning tools. To determine the potential of large language models (LLMs) to be used as tools by radiology educators to create radiology board-style multiple choice questions (MCQs), answers, and rationales. Two LLMs (Llama 2 and GPT-4) were used to develop 104 MCQs based on the American Board of Radiology exam blueprint. Two board-certified radiologists assessed each MCQ using a 10-point Likert scale across five criteria—clarity, relevance, suitability for a board exam based on level of difficulty, quality of distractors, and adequacy of rationale. For comparison, MCQs from prior American College of Radiology (ACR) Diagnostic Radiology In-Training (DXIT) exams were also assessed using these criteria, with radiologists blinded to the question source. Mean scores (±standard deviation) for clarity, relevance, suitability, quality of distractors, and adequacy of rationale were 8.7 (±1.4), 9.2 (±1.3), 9.0 (±1.2), 8.4 (±1.9), and 7.2 (±2.2), respectively, for Llama 2; 9.9 (±0.4), 9.9 (±0.5), 9.9 (±0.4), 9.8 (±0.5), and 9.9 (±0.3), respectively, for GPT-4; and 9.9 (±0.3), 9.9 (±0.2), 9.9 (±0.2), 9.9 (±0.4), and 9.8 (±0.6), respectively, for ACR DXIT items (p < 0.001 for Llama 2 vs. ACR DXIT across all criteria; no statistically significant difference for GPT-4 vs. ACR DXIT). The accuracy of model-generated answers was 69% for Llama 2 and 100% for GPT-4. A state-of-the art LLM such as GPT-4 may be used to develop radiology board-style MCQs and rationales to enhance exam preparation materials and expand exam banks, and may allow radiology educators to further use MCQs as teaching and learning tools.

Ähnliche Arbeiten

Autoren

Institutionen

Themen

Radiology practices and educationArtificial Intelligence in Healthcare and EducationInnovations in Medical Education
Volltext beim Verlag öffnen