Dies ist eine Übersichtsseite mit Metadaten zu dieser wissenschaftlichen Arbeit. Der vollständige Artikel ist beim Verlag verfügbar.

AI in Medical Education: A Comparative Analysis of GPT-4 and GPT-3.5 on Turkish Medical Specialization Exam Performance

2023·11 Zitationen·medRxivOpen Access

Volltext beim Verlag öffnen

Zitationen

Autoren

2023

Jahr

Abstract

Abstract Background/aim Large-scale language models (LLMs), such as GPT-4 and GPT-3.5, have demonstrated remarkable potential in the rapidly developing field of artificial intelligence (AI) in education. The use of these models in medical education, especially their effectiveness in situations such as the Turkish Medical Specialty Examination (TUS), is yet understudied. This study evaluates how well GPT-4 and GPT-3.5 respond to TUS questions, providing important insight into the real-world uses and difficulties of AI in medical education. Materials and methods In the study, 1440 medical questions were examined using data from six Turkish Medical Specialties examinations. GPT-4 and GPT-3.5 AI models were utilized to provide answers, and IBM SPSS 26.0 software was used for data analysis. For advanced enquiries, correlation analysis and regression analysis were used. Results GPT-4 demonstrated a better overall success rate (70.56%) than GPT-3.5 (40.17%) and physicians (38.14%) in this study examining the competency of GPT-4 and GPT-3.5 in answering questions from the Turkish Medical Specialization Exam (TUS). Notably, GPT-4 delivered more accurate answers and made fewer errors than GPT-3.5, yet the two models skipped about the same number of questions. Compared to physicians, GPT-4 produced more accurate answers and a better overall score. In terms of the number of accurate responses, GPT-3.5 performed slightly better than physicians. Between GPT-4 and GPT-3.5, GPT-4 and the doctors, and GPT-3.5 and the doctors, the success rates varied dramatically. Performance ratios differed across domains, with doctors outperforming AI in tests involving anatomy, whereas AI models performed best in tests involving pharmacology. Conclusions In this study, GPT-4 and GPT-3.5 AI models showed superior performance in answering Turkish Medical Specialization Exam questions. Despite their abilities, these models demonstrated limitations in reasoning beyond given knowledge, particularly in anatomy. The study recommends adding AI support to medical education to enhance the critical interaction with these technologies.

Autoren

Mustafa Eray Kılıç

Institutionen

Dokuz Eylül University(TR)

Themen

Artificial Intelligence in Healthcare and EducationCardiac, Anesthesia and Surgical OutcomesRadiomics and Machine Learning in Medical Imaging

Volltext beim Verlag öffnen

AI in Medical Education: A Comparative Analysis of GPT-4 and GPT-3.5 on Turkish Medical Specialization Exam Performance

Abstract

Ähnliche Arbeiten

Autoren

Institutionen

Themen