The Comparative Performance of Large Language Models on the Hand Surgery Self-Assessment Examination

2024 · 5 citations · Hand · Open Access

Abstract

BACKGROUND: Generative artificial intelligence (AI) models can produce human-like responses and have shown potential in general medical specialties. This study explores the performance of AI systems on the American Society for Surgery of the Hand (ASSH) Self-Assessment Exams (SAE). METHODS: ChatGPT 4.0 and Bing AI were evaluated on multiple-choice questions drawn from the ASSH SAE online question bank spanning 5 years (2019-2023). Each system was evaluated on 999 questions. Images and video links were inserted into the question prompts so that the AI systems could interpret each question in full. To standardize performance, both systems were tested in their May 2023 versions (ChatGPT 4.0 and Microsoft Bing AI), each with web-browsing and image capabilities. RESULTS: ChatGPT 4.0 scored an average of 66.5% on the ASSH questions. Bing AI scored higher, with an average of 75.3%, outperforming ChatGPT 4.0 by an average of 8.8 percentage points. As a benchmark, a minimum passing score of 50% was required for continuing medical education (CME) credit. Analysis of variance (ANOVA) showed that both ChatGPT 4.0 and Bing AI performed worse on video-type and image-type questions. Responses from both models contained elements from sources such as PubMed, the Journal of Hand Surgery, and the American Academy of Orthopedic Surgeons. CONCLUSIONS: ChatGPT 4.0 with browsing and Bing AI can both be expected to achieve passing scores on the ASSH SAE. With its ability to provide logical responses and literature citations, generative AI makes a convincing case for use as an interactive learning aid and educational tool.
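As a quick check of the arithmetic in the results, the short sketch below recomputes the gap between the two reported averages and each model's margin over the 50% CME passing mark. It uses only the averages stated in the abstract. The names `reported_scores`, `PASSING_SCORE`, and `margin_over_passing` are hypothetical and chosen purely for illustration; they are not part of the study's methods.

```python
# Average scores reported in the abstract (percent correct).
# Variable and function names here are hypothetical, for illustration only.
reported_scores = {"ChatGPT 4.0": 66.5, "Bing AI": 75.3}
PASSING_SCORE = 50.0  # minimum score for CME credit, as stated in the abstract


def margin_over_passing(score: float, passing: float = PASSING_SCORE) -> float:
    """Return how many percentage points a score sits above (or below) the passing mark."""
    # round() hides binary floating-point noise such as 25.299999999999997
    return round(score - passing, 1)


# Difference between the two averages, in percentage points
gap = round(reported_scores["Bing AI"] - reported_scores["ChatGPT 4.0"], 1)
print(f"Bing AI lead: {gap} percentage points")

for model, score in reported_scores.items():
    status = "pass" if score >= PASSING_SCORE else "fail"
    print(f"{model}: {score}% ({margin_over_passing(score):+} points vs. passing, {status})")
```

The 8.8 figure is a difference of two percentages, so it is expressed in percentage points rather than as a relative percent improvement. In relative terms, Bing AI's average is roughly 13% higher than ChatGPT 4.0's.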

Institutions

Einstein Healthcare Network (US)

Topics

Artificial Intelligence in Healthcare and Education; Diversity and Career in Medicine; Clinical Reasoning and Diagnostic Skills
