This is an overview page with metadata for this scholarly article. The full article is available from the publisher.

Generalist Large-Language Models for Spine Imaging Diagnostics: An Early Analysis of Detection Performance for Scoliosis and Lumbar Stenosis

2026 · 0 citations · World Neurosurgery · Open Access

Abstract

BACKGROUND: Web-based large language models (LLMs) are increasingly used by patients for medical self-assessment, but their efficacy in spine imaging diagnostics remains underexplored. This study systematically evaluated five leading multimodal LLMs (Grok 2, Grok 3, Grok 4, ChatGPT, and Gemini) for detecting scoliosis and lumbar spinal stenosis on radiographs and MRI.

METHODS: We assessed 171 full-length anterior-posterior radiographs (100 with scoliosis, 71 normal) and 200 axial T2-weighted lumbar spine MRIs (100 with severe stenosis, 100 normal) from public databases. Models were prompted zero-shot (without examples) to identify pathology and to state their certainty (0-100%). Analyses included McNemar's test for accuracy and ANOVA for confidence levels.

RESULTS: In scoliosis detection, Grok 4 achieved the highest accuracy (0.942), followed by Gemini (0.912), Grok 2 (0.890), ChatGPT (0.643), and Grok 3 (0.637). For stenosis, Gemini performed best (0.600), followed by Grok 4 (0.575), ChatGPT (0.545), Grok 2 (0.500), and Grok 3 (0.450). All models reported mean certainty above 70% (SD < 5.3%) for both pathologies. ChatGPT and Grok 3 reported lower confidence for incorrect scoliosis responses (p < 0.0001), whereas for stenosis only ChatGPT did so. Gemini reported higher confidence for incorrect stenosis responses (p < 0.0001).

CONCLUSIONS: LLMs perform well in scoliosis detection but struggle to identify lumbar stenosis. ChatGPT's superior confidence calibration suggests greater reliability. Performance inconsistencies across model iterations (e.g., Grok 3 underperforming Grok 2) underscore the need for specialized training on medical imaging. Although LLMs are promising for patient education about simple spine conditions, substantial improvements in accuracy and confidence metrics are essential before clinical adoption or broad patient use.
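The METHODS paragraph names McNemar's test for comparing accuracy. Because every model was scored on the same images, each pair of models produces paired correct/incorrect outcomes, and McNemar's test looks only at the discordant pairs (images where one model is right and the other is wrong). The following is a minimal Python sketch, assuming per-image correctness has been recorded as booleans in the same image order for both models. The function names (`discordant_counts`, `mcnemar_exact`, `mcnemar_chi2`) and the example data are hypothetical; the abstract does not say whether the exact binomial or the continuity-corrected chi-squared variant was used, so both are shown.

```python
from math import comb, erfc, sqrt
from typing import Sequence


def discordant_counts(a_correct: Sequence[bool], b_correct: Sequence[bool]) -> tuple[int, int]:
    """Count the off-diagonal cells of the paired 2x2 table.

    b: model A correct, model B wrong; c: model A wrong, model B correct.
    Assumes both lists are scored on the same images in the same order.
    """
    if len(a_correct) != len(b_correct):
        raise ValueError("both models must be scored on the same images")
    b = sum(1 for x, y in zip(a_correct, b_correct) if x and not y)
    c = sum(1 for x, y in zip(a_correct, b_correct) if y and not x)
    return b, c


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value: binomial test of the discordant pairs at p = 0.5."""
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence that the models differ
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def mcnemar_chi2(b: int, c: int) -> tuple[float, float]:
    """Continuity-corrected McNemar chi-squared statistic (1 df) and its p-value."""
    n = b + c
    if n == 0:
        return 0.0, 1.0
    stat = max(abs(b - c) - 1, 0) ** 2 / n
    # Survival function of chi-squared with 1 df: P(X > x) = erfc(sqrt(x / 2))
    return stat, erfc(sqrt(stat / 2))


if __name__ == "__main__":
    # Hypothetical per-image correctness for two models (not data from the study)
    model_a = [True, True, True, False, True, True, True, True]
    model_b = [True, False, False, False, True, False, True, True]
    b, c = discordant_counts(model_a, model_b)
    print(b, c, mcnemar_exact(b, c))  # prints: 3 0 0.25
```

The exact variant is usually preferred when the number of discordant pairs is small; with many discordant pairs the chi-squared approximation gives a similar p-value.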

Institutions

University of Pennsylvania (US)

Topics

Medical Imaging and Analysis
Artificial Intelligence in Healthcare and Education
Spine and Intervertebral Disc Pathology