This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Performance of ChatGPT in Israeli Arabic-language OBGYN national medical licensure exam
Citations: 0
Authors: 5
Year: 2026
Abstract
Previous studies of ChatGPT performance on medical exams have reached contradictory results. Its performance in languages other than English, including Arabic, which is the official language of medical education and practice in many countries, has yet to be explored. We aimed to evaluate the performance of ChatGPT on the Arabic-language Israeli OBGYN medical licensure exams for foreign university alumni. We conducted a performance study using a consecutive sample of text-based multiple-choice questions drawn from authentic Arabic-language Israeli OBGYN medical licensure exams for foreign university alumni. ChatGPT-3.5 (using a newly created account) answered all questions in Arabic. We compared ChatGPT performance across the four fields of the exam (Obstetrics, Reproductive Medicine and Infertility, Gynecology, and Gynecologic Oncology) and compared its Arabic performance with previously published results on English-language medical tests. Overall, 123 authentic questions were analyzed. ChatGPT correctly answered 54 questions (43.9%, 95% CI: 35.1% – 52.7%), a score below 50%. Performance did not differ significantly among the four subjects of the exam: Gynecologic Oncology (61.5%, 95% CI: 35.1% – 87.9%), Gynecology (44.0%, 95% CI: 24.5% – 63.5%), Obstetrics (42.3%, 95% CI: 28.9% – 55.7%), Reproductive Medicine and Infertility (39.4%, 95% CI: 22.7% – 56.1%), p = .579. Compared with ChatGPT performance on 9,091 English-language medical questions, performance in Arabic was lower (43.9% in Arabic vs. 60.7% in English, p < .001). ChatGPT-3.5 correctly answered approximately 44% of Arabic OBGYN medical licensure exam questions. At the time of writing, given these results, ChatGPT-3.5 cannot be considered a reliable primary tool for exam preparation in Arabic.
Further research and effort are needed to improve ChatGPT performance in languages other than English, especially Arabic.
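The overall accuracy and its confidence interval reported in the abstract can be reproduced from the two counts given there (54 correct out of 123). The sketch below assumes a normal-approximation (Wald) interval; the abstract does not state which interval method the authors used, so this is an assumption, and the function name `wald_ci` is illustrative only.

```python
import math


def wald_ci(successes: int, n: int, z: float = 1.959964):
    """Return (proportion, lower, upper) for a Wald interval.

    Assumption: a normal-approximation interval; the abstract does not
    name the method. z = 1.959964 corresponds to a 95% interval.
    """
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p, p - half_width, p + half_width


# Counts taken from the abstract: 54 correct answers out of 123 questions.
p, lower, upper = wald_ci(54, 123)
print(f"{p:.1%} (95% CI: {lower:.1%} – {upper:.1%})")
# prints: 43.9% (95% CI: 35.1% – 52.7%)
```

Under this assumption the result matches the reported 43.9% (95% CI: 35.1% – 52.7%). The per-subject intervals cannot be checked the same way from the abstract alone, because the number of questions per subject is not given.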