This is an overview page with metadata for this scientific paper. The full article is available from the publisher.

Accuracy of LLMs in medical education: evidence from a concordance test with medical teacher

2025 · 17 citations · BMC Medical Education · Open Access

Abstract

BACKGROUND: The use of generative AI in medical education is increasing at an unprecedented rate. The accuracy of these models needs to be assessed to ensure patient safety. This study assesses the accuracy of ChatGPT, Gemini, and Copilot in answering multiple-choice questions (MCQs) compared with a qualified medical teacher.

METHODS: The study randomly selected 40 MCQs from past United States Medical Licensing Examination (USMLE) papers and submitted them to three LLMs: ChatGPT, Gemini, and Copilot. Each LLM's answers were then compared with those of a qualified medical teacher and with the answers of the other LLMs. Fleiss' Kappa was used to determine the concordance among the four responders (3 LLMs + 1 medical teacher). In case of poor agreement among the responders, Cohen's Kappa was used to assess the agreement between pairs of responders.

RESULTS: ChatGPT demonstrated the highest accuracy (70%, Cohen's Kappa = 0.84), followed by Copilot (60%, Cohen's Kappa = 0.69), while Gemini showed the lowest accuracy (50%, Cohen's Kappa = 0.53). The Fleiss' Kappa value of -0.056 indicated significant disagreement among all four responders.

CONCLUSION: The study provides an approach for assessing the accuracy of different LLMs. It concludes that ChatGPT (70%) is far superior to the other LLMs when asked medical questions across different specialties, while, contrary to expectations, Gemini (50%) performed poorly. Compared with a medical teacher, the low accuracy of the LLMs suggests that general-purpose LLMs should be used with caution in medical education.
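The two agreement statistics named in the methods can be sketched as follows. This is a minimal illustration using the standard textbook definitions of Cohen's and Fleiss' Kappa, not the authors' analysis code. The function names and the toy answer lists are hypothetical and do not reproduce the study's data.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa: chance-corrected agreement between two raters."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(rater_a)
    # Observed agreement: share of items with identical labels.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement from each rater's marginal label frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if p_e == 1:
        return 1.0  # both raters used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


def fleiss_kappa(items):
    """Fleiss' kappa: items is a list of per-question label lists,
    one label per rater, with the same number of raters per question."""
    if not items:
        raise ValueError("need at least one item")
    n_raters = len(items[0])
    if n_raters < 2 or any(len(item) != n_raters for item in items):
        raise ValueError("every item needs the same number (>= 2) of ratings")
    totals = Counter()
    agreement_sum = 0.0
    for item in items:
        counts = Counter(item)
        totals.update(counts)
        # Per-item agreement: share of rater pairs that chose the same label.
        pairs = sum(c * c for c in counts.values()) - n_raters
        agreement_sum += pairs / (n_raters * (n_raters - 1))
    p_bar = agreement_sum / len(items)
    n_ratings = len(items) * n_raters
    p_e = sum((t / n_ratings) ** 2 for t in totals.values())
    if p_e == 1:
        return 1.0  # every rating is the same label
    return (p_bar - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical toy answers to five MCQs (NOT the study's data).
    teacher = ["A", "C", "B", "D", "A"]
    llm_1 = ["A", "C", "B", "A", "A"]
    llm_2 = ["A", "B", "B", "D", "C"]
    llm_3 = ["B", "C", "D", "D", "A"]

    print("Cohen's kappa, LLM 1 vs teacher:", cohen_kappa(llm_1, teacher))
    # Transpose to one list of four labels per question for Fleiss' kappa.
    per_question = [list(q) for q in zip(teacher, llm_1, llm_2, llm_3)]
    print("Fleiss' kappa, all four responders:", fleiss_kappa(per_question))
```

In this sketch, Cohen's Kappa compares one LLM against the teacher at a time, while Fleiss' Kappa takes all four responders together. Values near 1 indicate strong agreement, and values at or below 0 indicate agreement no better than chance.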

Topics

Artificial Intelligence in Healthcare and Education; Clinical Reasoning and Diagnostic Skills; Explainable Artificial Intelligence (XAI)
