This is an overview page with metadata for this scientific paper. The full article is available from the publisher.

2050 AtlasGPT: A Language Model Grounded in Neurosurgery

2025 · 1 citation · Neurosurgery

Abstract

INTRODUCTION: Large language models (LLMs) have shown promising performance on medical licensing exams, but their ability to excel in subspecialty domains and their robustness under adversarial conditions remain unclear. METHODS: AtlasGPT was built using GPT-4 with retrieval-augmented generation from expert-verified neurosurgical knowledge sources. Its performance was compared to GPT-4 and Gemini Advanced on a 149-question neurosurgery exam. Adversarial testing assessed robustness to misinformation. Answer explanations were rated by 15 independent neurosurgeons and compared to the question bank. RESULTS: Across all 149 questions, AtlasGPT achieved 90.6% accuracy, outperforming GPT-4 (80.5%, P=0.020) and Gemini Advanced (80.5%, P=0.020). On text-only questions, AtlasGPT, Gemini Advanced and GPT-4 achieved 95.6%, 92.9% and 87.8% accuracy, respectively. In adversarial testing, AtlasGPT was fooled 14% of the time, compared to 44% for GPT-4 and 68% for Gemini Advanced. Neurosurgeons rated AtlasGPT's answer explanations as significantly more comprehensive, relevant, and better-referenced than the question bank's explanations (P<0.001). CONCLUSIONS: AtlasGPT demonstrates the potential of subspecialty-focused LLMs to outperform general models, exhibit robustness to misinformation, and generate high-quality explanations. Domain-specific LLMs may improve medical knowledge, decision-making, and educational materials in complex fields like neurosurgery.

Topics

Surgical Simulation and Training
Artificial Intelligence in Healthcare and Education
