This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Artificial Intelligence in the Trauma Bay: A Pilot Comparison With Surgical Trainees
Citations: 0
Authors: 9
Year: 2026
Abstract
Background: Large language models (LLMs) have demonstrated strong performance on general medical knowledge assessments; however, their accuracy within high-acuity, guideline-driven surgical environments such as the trauma bay remains incompletely characterized.
Objective: To compare the accuracy of a contemporary LLM, Google Gemini, with that of junior general surgery residents on trauma knowledge questions derived from national practice management guidelines.
Methods: Thirty multiple-choice questions were developed from current trauma guidelines issued by nationally recognized professional organizations and independently validated by faculty trauma surgeons. Six junior general surgery residents (PGY-1 to PGY-2) completed the assessment, generating 180 total responses. The LLM was tested on the same questions under standardized conditions. Accuracy was calculated with 95% confidence intervals and compared using a two-proportion z-test.
Results: Residents answered 157 of 180 questions correctly (87.2%, 95% CI 81.6-91.3). The LLM answered 27 of 30 questions correctly (90.0%, 95% CI 74.4-96.5). There was no statistically significant difference in accuracy between groups (P = .67).
Conclusion: In this pilot study, an LLM demonstrated accuracy comparable to that of junior surgical residents when evaluated on trauma guideline-based questions. Although no significant difference was found, the findings of this exploratory study support cautious exploration of guideline-grounded artificial intelligence as an adjunct in surgical education while underscoring the need for broader validation. Further adequately powered studies are required to confirm these preliminary findings.
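The accuracy comparison reported in the abstract can be sketched in a few lines of Python. This is an illustrative reconstruction, not the authors' analysis code. The abstract does not name the confidence-interval method, so the Wilson score interval is an assumption here, as is the pooled-variance form of the two-proportion z-test. The function and variable names are hypothetical.

```python
import math

Z_95 = 1.959964  # two-sided 95% critical value of the standard normal


def wilson_interval(correct, total, z=Z_95):
    """Wilson score interval for a proportion (assumed method, not stated in the abstract)."""
    p_hat = correct / total
    denom = 1 + z**2 / total
    center = (p_hat + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / total + z**2 / (4 * total**2)) / denom
    return center - half, center + half


def two_proportion_z_test(c1, n1, c2, n2):
    """Pooled two-proportion z-test; returns (z, two-sided p-value).

    Treats every response as an independent trial, which is an assumption
    of the test itself rather than something the abstract verifies.
    """
    p1, p2 = c1 / n1, c2 / n2
    pooled = (c1 + c2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Counts taken from the Results section of the abstract.
residents = (157, 180)
llm = (27, 30)

res_ci = wilson_interval(*residents)
llm_ci = wilson_interval(*llm)
z_stat, p_value = two_proportion_z_test(*residents, *llm)

print(f"Residents: {residents[0] / residents[1]:.1%}, 95% CI {res_ci[0]:.1%} to {res_ci[1]:.1%}")
print(f"LLM:       {llm[0] / llm[1]:.1%}, 95% CI {llm_ci[0]:.1%} to {llm_ci[1]:.1%}")
print(f"z = {z_stat:.3f}, two-sided P = {p_value:.2f}")
```

Under these assumptions the sketch gives intervals of roughly 81.6 to 91.3% and 74.4 to 96.5%, and a two-sided P of about .67, consistent with the reported values. The wide interval for the LLM reflects its single pass over 30 questions.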