This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
2050 AtlasGPT: A Language Model Grounded in Neurosurgery
Citations: 1
Authors: 20
Year: 2025
Abstract
INTRODUCTION: Large language models (LLMs) have shown promising performance on medical licensing exams, but their ability to excel in subspecialty domains and their robustness under adversarial conditions remain unclear.
METHODS: AtlasGPT was built using GPT-4 with retrieval-augmented generation from expert-verified neurosurgical knowledge sources. Its performance was compared to GPT-4 and Gemini Advanced on a 149-question neurosurgery exam. Adversarial testing assessed robustness to misinformation. Answer explanations were rated by 15 independent neurosurgeons and compared to the question bank.
RESULTS: Across all 149 questions, AtlasGPT achieved 90.6% accuracy, outperforming GPT-4 (80.5%, P=0.020) and Gemini Advanced (80.5%, P=0.020). On text-only questions, AtlasGPT, Gemini Advanced and GPT-4 achieved 95.6%, 92.9% and 87.8% accuracy, respectively. In adversarial testing, AtlasGPT was fooled 14% of the time, compared to 44% for GPT-4 and 68% for Gemini Advanced. Neurosurgeons rated AtlasGPT's answer explanations as significantly more comprehensive, relevant, and better-referenced than the question bank's explanations (P<0.001).
CONCLUSIONS: AtlasGPT demonstrates the potential of subspecialty-focused LLMs to outperform general models, exhibit robustness to misinformation, and generate high-quality explanations. Domain-specific LLMs may improve medical knowledge, decision-making, and educational materials in complex fields like neurosurgery.
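The abstract reports accuracies as percentages of the 149 questions but not the underlying counts. As a rough reading aid, the sketch below back-calculates the number of correct answers implied by each rounded percentage. The counts are inferred, not reported by the authors, and the names `reported_accuracy` and `implied_correct` are hypothetical. No attempt is made to reproduce the P values, since the abstract does not name the statistical test used.

```python
# Figures taken from the abstract; the per-model counts are back-calculated
# assumptions, not values reported by the authors.
TOTAL_QUESTIONS = 149

reported_accuracy = {
    "AtlasGPT": 0.906,
    "GPT-4": 0.805,
    "Gemini Advanced": 0.805,
}


def implied_correct(accuracy, total=TOTAL_QUESTIONS):
    """Nearest whole number of correct answers matching a rounded accuracy."""
    return round(accuracy * total)


implied = {model: implied_correct(acc) for model, acc in reported_accuracy.items()}

for model, correct in implied.items():
    # Re-derive the percentage from the inferred count as a consistency check.
    print(f"{model}: ~{correct}/{TOTAL_QUESTIONS} correct ({correct / TOTAL_QUESTIONS:.1%})")
```

Under these assumptions the reported percentages correspond to about 135 correct answers for AtlasGPT and about 120 for each general model, a gap of roughly 15 questions.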
Similar works
The SCARE 2020 Guideline: Updating Consensus Surgical CAse REport (SCARE) Guidelines
2020 · 5,572 citations
Virtual Reality Training Improves Operating Room Performance
2002 · 2,788 citations
An estimation of the global volume of surgery: a modelling strategy based on available data
2008 · 2,507 citations
Objective structured assessment of technical skill (OSATS) for surgical residents
1997 · 2,258 citations
Does Simulation-Based Medical Education With Deliberate Practice Yield Better Results Than Traditional Clinical Education? A Meta-Analytic Comparative Review of the Evidence
2011 · 1,708 citations