OpenAlex · Updated hourly · Last updated: 28 Mar 2026, 12:21

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Performance of ChatGPT and Large Language Models on Medical Licensing Exams Worldwide: A Systematic Review and Network Meta-Analysis With Meta-Regression

2025 · 1 citation · Cureus · Open Access

1 citation · 9 authors · 2025

Abstract

Large language models (LLMs) are increasingly being tested on national medical licensing examinations, yet existing research is fragmented across models, exam systems, and languages. This study is the first meta-analysis to systematically assess LLM performance across multiple medical licensing exams and languages using pooled estimates, network meta-analysis, and moderator-aware meta-regression. We synthesized accuracy data from 120 evaluations covering 10 exam systems in nine languages, identified through comprehensive searches of PubMed, Web of Science, and Institute of Electrical and Electronics Engineers (IEEE) Xplore, covering the period from 2021 to June 2025. The random-effects meta-analysis showed that 13 of 16 models exceeded the 60% passing threshold, with GPT-o1 leading at 95.4%, followed by DeepSeek-R1 (92.0%) and GPT-4o (89.4%). P-score rankings and network meta-analysis confirmed the superior performance of GPT-o1, while GPT-3.5 and LLaMA-13B consistently underperformed. Meta-regression revealed significant variation in accuracy by model version, exam system, and language, with lower performance on Chinese and Japanese exams and higher performance in German and Peruvian settings. After adjustment for exam and language, GPT-o1 and DeepSeek-R1 achieved similar accuracy, both significantly higher than GPT-4. Model type, exam system, and language explained most of the between-study heterogeneity (R² = 88.99%), and sensitivity analyses supported the robustness of the pooled estimates. Overall, several LLMs now approach or exceed the accuracy required to pass standardized medical licensing exams, supporting their potential role in medical education and decision support.

Similar works