This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Evaluating the Performance of a ChatGPT Model in Rheumatology Exams.
Citations: 0
Authors: 3
Year: 2026
Abstract
BACKGROUND: Large language models (LLMs) are rapidly advancing, with the potential to improve healthcare. While LLM performance on medical licensing exams has been studied extensively, their performance on rheumatology exams requires specific evaluation.

OBJECTIVES: To assess Chat Generative Pre-Trained Transformer (ChatGPT) performance on 200 validated Israeli rheumatology board exam questions.

METHODS: ChatGPT performance was evaluated using 200 multiple-choice questions from the 2023 and 2024 official Israeli rheumatology board examinations. Three gpt-4-turbo-based variants were assessed: a base model (Model 1), a few-shot chain-of-thought (CoT) model (Model 2), and a knowledge-augmented prompting model incorporating rheumatology guidelines (Model 3). Model 1 was assessed using both the original Hebrew version and a validated English translation, while Models 2 and 3 were assessed using the English version only.

RESULTS: Overall, Model 3 achieved the highest numerical accuracy (81%), followed by Model 1 (English, 77%), Model 2 (75%), and Model 1 (Hebrew, 74.5%); however, these differences were not statistically significant. Performance varied markedly by question type. For text-only questions (n=177), accuracies ranged from 78.5% to 83.1%, with Model 3 showing the highest point estimate (83.1%). In contrast, all models performed substantially worse on questions that included images (n=23), with accuracies ranging from 34.8% to 65.2%; Model 3 again yielded the highest numerical accuracy (65.2%).

CONCLUSIONS: The study highlights the potential role of LLMs in rheumatology board examinations but also emphasizes their critical limitations. Future research should focus on addressing these limitations, especially image interpretation and management of complex cases, to enable efficient application of LLMs in rheumatology.
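The overall and per-question-type accuracies reported for Model 3 can be cross-checked with simple arithmetic. The sketch below is only a consistency check: the subgroup sizes and percentages come from the abstract, but the counts of correct answers are inferred by rounding and are not figures reported by the authors. The helper names `implied_correct` and `pooled_accuracy` are hypothetical.

```python
# Figures taken from the abstract: subgroup sizes and Model 3 accuracies (percent).
N_TEXT, N_IMAGE = 177, 23
ACC_TEXT_M3, ACC_IMAGE_M3 = 83.1, 65.2


def implied_correct(n_questions: int, accuracy_pct: float) -> int:
    """Nearest whole number of correct answers implied by a rounded percentage.

    This is an inference from the published percentages, not a reported count.
    """
    return round(n_questions * accuracy_pct / 100)


def pooled_accuracy(groups: list[tuple[int, float]]) -> float:
    """Overall accuracy (percent) across subgroups given as (size, accuracy_pct)."""
    correct = sum(implied_correct(n, acc) for n, acc in groups)
    total = sum(n for n, _ in groups)
    return 100 * correct / total


overall_m3 = pooled_accuracy([(N_TEXT, ACC_TEXT_M3), (N_IMAGE, ACC_IMAGE_M3)])
print(f"Model 3 pooled accuracy: {overall_m3:.1f}%")
```

Under these assumptions, the inferred 147 correct text-only answers and 15 correct image answers pool to 162 of 200, which matches the 81% overall accuracy stated for Model 3.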