This is an overview page with metadata for this scientific paper. The full article is available from the publisher.

Performance of ChatGPT in Israeli Arabic-language OBGYN national medical licensure exam

2026 · 0 citations · BMC Medical Education · Open Access


Abstract

Previous studies of ChatGPT's performance on medical exams have reached contradictory results. Its performance in languages other than English, including Arabic, which is the official language of medical education and practice in many countries, has yet to be explored. We aim to evaluate the performance of ChatGPT on Arabic-language Israeli OBGYN medical licensure exams for foreign university alumni.

We conducted a performance study using a consecutive sample of text-based multiple-choice questions originating from authentic Arabic-language Israeli OBGYN medical licensure exams for foreign university alumni. ChatGPT-3.5 (using a newly created account) answered all questions in Arabic. We compared ChatGPT's performance across the four fields of the exam (Obstetrics, Reproductive medicine and Infertility, Gynecology, and Gynecologic Oncology), and also compared its performance in Arabic with previously published results on English-language medical tests.

Overall, 123 authentic questions were analyzed. ChatGPT correctly answered 54 questions (43.9%, 95% CI: 35.1% – 52.7%), a score below 50%. There was no statistically significant difference in ChatGPT's performance across the four subjects of the exam: Gynecologic Oncology (61.5%, 95% CI: 35.1% – 87.9%), Gynecology (44.0%, 95% CI: 24.5% – 63.5%), Obstetrics (42.3%, 95% CI: 28.9% – 55.7%), Reproductive medicine and Infertility (39.4%, 95% CI: 22.7% – 56.1%), p = .579. Compared with ChatGPT's performance on 9,091 English-language questions in the field of medicine, its performance in Arabic was lower (43.9% in Arabic vs. 60.7% in English, p < .001).

ChatGPT-3.5 correctly answered approximately 44% of Arabic OBGYN medical licensure exam questions. At the time of writing, and considering the results of our analysis, ChatGPT-3.5 cannot be considered a reliable primary tool for exam preparation in Arabic. Further research and efforts should be made to improve ChatGPT's performance in languages other than English, especially Arabic.
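The overall accuracy and its confidence interval can be checked with a few lines of arithmetic. The sketch below assumes a normal-approximation (Wald) interval; the abstract does not say which method the authors used, so this is an illustration, not their procedure. The function name `wald_interval` is hypothetical. The inputs (54 correct out of 123) are taken from the abstract.

```python
import math


def wald_interval(successes: int, total: int, z: float = 1.96) -> tuple[float, float, float]:
    """Return (proportion, lower, upper) using the normal-approximation interval.

    Assumption: a Wald interval with z = 1.96 for 95% coverage. The source
    does not name its interval method; this is only one plausible choice.
    """
    p = successes / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return p, p - half_width, p + half_width


# 54 correct answers out of 123 questions, as reported in the abstract.
p, low, high = wald_interval(54, 123)
print(f"{p:.1%} (95% CI: {low:.1%} to {high:.1%})")
# prints "43.9% (95% CI: 35.1% to 52.7%)"
```

Under this assumption the result agrees with the reported 43.9% (35.1% – 52.7%). For small subgroups such as Gynecologic Oncology, a Wald interval can be unreliable, which may be part of why that interval is so wide.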

Topics

Artificial Intelligence in Healthcare and Education; Explainable Artificial Intelligence (XAI); Adversarial Robustness in Machine Learning
