OpenAlex · Updated hourly · Last updated: 30.03.2026, 05:42

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

ChatGPT's Performance on the Orthopaedic In-Training Examination (OITE): No Better Than a PGY3 Resident?

2025 · 1 citation · Cureus · Open Access

Citations: 1 · Authors: 7 · Year: 2025

Abstract

Background

Orthopaedic In-Training Examination (OITE) performance is an important metric for assessing resident knowledge and directing education. ChatGPT (OpenAI, San Francisco, California, United States) is a novel artificial intelligence (AI) large language model (LLM) able to emulate human conversation and draw on large pools of knowledge. Recent reports have shown ChatGPT achieving passing performance on medical licensing (United States Medical Licensing Examination (USMLE)) and legal examinations. Whether ChatGPT can successfully complete the OITE and serve as a didactic component of resident education remains unknown. This study was therefore conceived to compare ChatGPT's OITE performance with local and national resident performance over the last five years.

Purpose

As ChatGPT and other LLMs grow in popularity, they are increasingly used by orthopaedic trainees to aid information acquisition and didactic learning. This study was conducted to elucidate the accuracy with which LLMs can answer questions on the OITE, a standardized knowledge and decision-making examination that serves as a benchmark for all orthopaedic trainees, and thereby to evaluate the current utility of this technology as a didactic tool throughout training.

Patients and methods

ChatGPT was provided 200 randomly chosen questions (10 sets of 20) from the 2018-2022 OITEs. Images in the selected questions that lacked direct links were uploaded to an image hosting service, and the links were provided alongside the corresponding question text when entered into ChatGPT. The primary outcome of interest was the percentage of correct responses. ChatGPT's performance was compared against institutional resident averages, as well as national orthopaedic resident averages, for each PGY class. Statistical analysis comprised the one-sample Wilcoxon test and the Kruskal-Wallis test with the Dunn-Sidak correction.
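The statistical approach named above can be sketched with SciPy: a one-sample Wilcoxon signed-rank test comparing per-set scores against a fixed benchmark, a Kruskal-Wallis test across groups, and a Šidák-adjusted alpha for pairwise follow-up comparisons. This is an illustrative sketch only, not the authors' code; all scores and group sizes below are hypothetical placeholders, not study data.

```python
from scipy.stats import wilcoxon, kruskal

# Hypothetical percent-correct scores for 10 question sets of 20 questions each
chatgpt_scores = [40, 35, 50, 45, 30, 55, 40, 45, 35, 50]
resident_avg = 60.0  # hypothetical fixed resident benchmark

# One-sample Wilcoxon signed-rank test: does the median score differ from the
# benchmark? Implemented by testing the differences against zero.
w_stat, p_wilcoxon = wilcoxon([s - resident_avg for s in chatgpt_scores])

# Kruskal-Wallis test across hypothetical groups (ChatGPT vs. two PGY classes)
pgy1 = [55, 58, 60, 62, 57]
pgy3 = [65, 68, 70, 66, 69]
h_stat, p_kruskal = kruskal(chatgpt_scores, pgy1, pgy3)

# Sidak-corrected significance threshold for m pairwise post-hoc comparisons
m = 3  # three pairwise comparisons among the three groups
alpha_sidak = 1 - (1 - 0.05) ** (1 / m)
```

A significant Kruskal-Wallis result would then justify pairwise post-hoc tests (Dunn's test in the paper), each judged against `alpha_sidak` rather than 0.05.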
Results

ChatGPT underperformed the national averages of allopathic/Accreditation Council for Graduate Medical Education (ACGME)-accredited orthopaedic surgery residents for every PGY year (p<0.01). The local institution's PGY3 (p=0.0444), PGY4 (p=0.0045), and PGY5 (p=0.0004) resident classes also performed significantly better on the OITE. ChatGPT performed best on the 2021 exam (47.3%) and worst on the 2020 exam (35.3%); its overall performance did not differ significantly across the five years (p>0.05).

Conclusions

ChatGPT scored lower than, but statistically equivalent to, PGY1-2 orthopaedic surgery residents at our institution, although its performance was below national PGY1-5 resident averages on the 2018-2022 OITEs. This performance is likely to improve in future iterations of ChatGPT, which remains a text-based language tool not yet validated for image interpretation. Future generative language applications of ChatGPT are broad-ranging for continuing education and the assessment of residents-in-training.

Clinical relevance

As ChatGPT continues to gain popularity, it will inevitably be used by orthopaedic trainees in preparation for the OITE and future board examinations. To our knowledge, this study is the first of its kind to evaluate ChatGPT's performance on the annual OITE, providing insight into its current accuracy and limitations. These findings help clarify its potential role as an adjunctive tool in resident education and future orthopaedic training.
