This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
The Comparative Performance of Large Language Models on the Hand Surgery Self-Assessment Examination
Citations: 5
Authors: 4
Year: 2024
Abstract
BACKGROUND: Generative artificial intelligence (AI) models have emerged as capable of producing human-like responses and have showcased their potential in general medical specialties. This study explores the performance of AI systems on the American Society for Surgery of the Hand (ASSH) Self-Assessment Exams (SAE).
METHODS: ChatGPT 4.0 and Bing AI were evaluated on a set of multiple-choice questions drawn from the ASSH SAE online question bank spanning 5 years (2019-2023). Each system was evaluated with 999 questions. Images and video links were inserted into question prompts to allow for complete AI interpretation. The performance of both systems was standardized using the May 2023 version of ChatGPT 4.0 and Microsoft Bing AI, both of which had web browsing and image capabilities.
RESULTS: ChatGPT 4.0 scored an average of 66.5% on the ASSH questions. Bing AI scored higher, with an average of 75.3%, outperforming ChatGPT 4.0 by an average of 8.8 percentage points. As a benchmark, a minimum passing score of 50% was required for continuing medical education credit. On analysis of variance testing, both ChatGPT 4.0 and Bing AI performed worse on video-type and image-type questions. Responses from both models contained elements from sources such as PubMed, Journal of Hand Surgery, and American Academy of Orthopaedic Surgeons.
CONCLUSIONS: ChatGPT 4.0 with browsing and Bing AI can both be anticipated to achieve passing scores on the ASSH SAE. Generative AI, with its ability to provide logical responses and literature citations, presents a convincing argument for use as an interactive learning aid and educational tool.
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,774 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,685 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 8,244 citations
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
2019 · 6,898 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations