This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Assessing the performance of ChatGPT-4o on the Turkish Orthopedics and Traumatology Board Examination
Citations: 3
Authors: 3
Year: 2025
Abstract
OBJECTIVES: This study aims to assess the overall performance of ChatGPT version 4-omni (GPT-4o) on the Turkish Orthopedics and Traumatology Board Examination (TOTBE) using actual examinees as a reference point to evaluate and compare the performance of GPT-4o with that of human participants.
MATERIALS AND METHODS: In this study, GPT-4o was tested with multiple-choice questions that formed the first step of 14 TOTBEs conducted between 2010 and 2023. The assessment of image-based questions was conducted separately for all exams. The questions were classified based on the subspecialties for the five exams (2010-2014). The performance of GPT-4o was assessed and compared to those of actual examinees of the TOTBE.
RESULTS: The mean total score of GPT-4o was 70.2±5.64 (range, 61 to 84), whereas that of actual examinees was 58±3.28 (range, 53.6 to 64.6). Considering accuracy rates, GPT-4o demonstrated 62% accuracy on image-based questions and 70% accuracy on text-based questions. It also demonstrated superior performance in the field of basic sciences, whereas actual examinees performed better in the specialty of reconstruction. Both GPT-4o and actual examinees exhibited the lowest scores in the subspecialty of lower extremity and foot.
CONCLUSION: Our study results showed that GPT-4o performed well on the TOTBE, particularly in basic sciences. While it demonstrated accuracy comparable to actual examinees in some areas, these findings highlight its potential as a helpful tool in medical education.
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,773 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,682 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 8,242 citations
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
2019 · 6,898 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations