This is an overview page with metadata for this scientific work. The full article is available from the publisher.
Performance of ChatGPT and Large Language Models on Medical Licensing Exams Worldwide: A Systematic Review and Network Meta-Analysis With Meta-Regression
Citations: 1
Authors: 9
Year: 2025
Abstract
Large language models (LLMs) are increasingly being tested on national medical licensing examinations, yet existing research is fragmented across models, exam systems, and languages. This study is the first meta-analysis to systematically assess LLM performance across multiple medical licensing exams and languages using pooled estimates, network meta-analysis, and moderator-aware meta-regression. We synthesized accuracy data from 120 evaluations covering 10 exam systems in nine languages, identified through comprehensive searches of PubMed, Web of Science, and Institute of Electrical and Electronics Engineers (IEEE) Xplore, covering the period from 2021 to June 2025. The random-effects meta-analysis showed that 13 of 16 models exceeded the 60% passing threshold, with GPT-o1 leading at 95.4%, followed by DeepSeek-R1 (92.0%) and GPT-4o (89.4%). P-score rankings and network meta-analysis confirmed the superior performance of GPT-o1, while GPT-3.5 and LLaMA-13B consistently underperformed. Meta-regression revealed significant variation in accuracy by model version, exam system, and language, with lower performance on Chinese and Japanese exams and higher performance in German and Peruvian settings. After adjustment for exam and language, GPT-o1 and DeepSeek-R1 achieved similar accuracy, both significantly higher than GPT-4. Model type, exam system, and language explained most of the between-study heterogeneity (R² = 88.99%), and sensitivity analyses supported the robustness of the pooled estimates. Overall, several LLMs now approach or exceed the accuracy required to pass standardized medical licensing exams, supporting their potential role in medical education and decision support.
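The random-effects pooling the abstract refers to is commonly computed with the DerSimonian-Laird estimator on logit-transformed accuracies. The sketch below illustrates that procedure on made-up study counts; the numbers are placeholders, not data from this paper.

```python
import math

# Hypothetical per-study data: (correct answers, total questions) for one model.
# Illustrative placeholders only, not results reported in the paper.
studies = [(172, 200), (88, 100), (230, 250), (140, 160)]

def logit_proportion(k, n):
    """Logit-transform a proportion and return it with its sampling variance."""
    p = k / n
    y = math.log(p / (1 - p))
    v = 1 / (n * p * (1 - p))  # large-sample variance of the logit
    return y, v

ys, vs = zip(*(logit_proportion(k, n) for k, n in studies))

# Fixed-effect (inverse-variance) pooled estimate and Cochran's Q
w = [1 / v for v in vs]
y_fe = sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)
Q = sum(wi * (yi - y_fe) ** 2 for wi, yi in zip(w, ys))

# DerSimonian-Laird between-study variance tau^2 (truncated at zero)
k = len(studies)
c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
tau2 = max(0.0, (Q - (k - 1)) / c)

# Random-effects weights, pooled logit, and back-transformed pooled accuracy
w_re = [1 / (v + tau2) for v in vs]
y_re = sum(wi * yi for wi, yi in zip(w_re, ys)) / sum(w_re)
pooled_accuracy = 1 / (1 + math.exp(-y_re))
print(f"pooled accuracy = {pooled_accuracy:.3f}")
```

The same logit scale is also what a meta-regression would model, with exam system, language, and model version entering as moderator covariates.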
Related Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,324 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,189 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,588 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,470 citations
Authors
Institutions
- Peking University (CN)
- New York College of Podiatric Medicine (US)
- Chongqing Medical University (CN)
- American University of Integrative Sciences (SX)
- University of Gondar (ET)
- Gandhi Medical College & Hospital (IN)
- Gandhi Medical College (IN)
- University Hospitals of Leicester NHS Trust (GB)
- University College Cork (IE)
- Marquette University (US)